Expert pours cold water on Apple's downbeat AI outlook — says lack of high-powered hardware could be to blame
If Apple wants to catch up with rivals, it will either have to buy a lot of Nvidia GPUs or develop its own AI ASICs.

Professor Seok Joon Kwon of Sungkyunkwan University believes (via Jukan Choi) that the recent Apple research paper which found fundamental reasoning limits of modern large reasoning models (LRMs) and large language models (LLMs) is flawed because Apple does not have enough high-performance hardware to test what high-end LRMs and LLMs are truly capable of. The professor argues that Apple does not have a large GPU-based cluster comparable to those operated by Google, Microsoft, or xAI, and says its own hardware is unsuitable for AI.
Better hardware needed
Apple's recently released research paper claimed that contemporary LLMs and LRMs fail to reason reliably as the complexity of the problems they are asked to solve in controlled puzzle environments increases, revealing fundamental limitations and undercutting the common belief that these models can think like a human being. The researchers also observed that models performed much better on well-known puzzles than on unfamiliar ones, indicating that their success likely stemmed from exposure during training rather than from adaptable, transferable problem-solving ability.
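To see what such controlled puzzle environments buy researchers, consider a classic example of this kind of task, the Tower of Hanoi: difficulty scales predictably with the number of disks, and every answer can be checked exactly. The sketch below is illustrative only and is not code from the Apple paper.

```python
# Illustrative sketch: Tower of Hanoi is the kind of controlled puzzle where
# difficulty scales predictably and every answer can be verified exactly,
# which is what lets researchers measure precisely where models break down.

def hanoi(n: int, src: str = "A", aux: str = "B", dst: str = "C") -> list[tuple[str, str]]:
    """Return the optimal move sequence for n disks as (from_peg, to_peg) pairs."""
    if n == 0:
        return []
    return (
        hanoi(n - 1, src, dst, aux)      # park n-1 disks on the spare peg
        + [(src, dst)]                   # move the largest disk
        + hanoi(n - 1, aux, src, dst)    # bring the n-1 disks back on top
    )

for n in range(1, 11):
    moves = hanoi(n)
    # The optimal solution length is 2**n - 1, so each extra disk roughly doubles
    # the number of steps a model must get right without a single mistake.
    assert len(moves) == 2**n - 1
    print(f"{n} disks -> {len(moves)} moves")
```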
However, the professor claims that the key conclusion of the Apple research — that the accuracy of the Claude 3.7 Sonnet Thinking and DeepSeek-R1 LRMs dropped to zero once complexity increased beyond a certain point, regardless of the available compute resources — is flawed.
"This directly contradicts observations from actual language model scaling laws," Seok Joon Kwon argues. "Hundreds of scaling-related studies to date have consistently shown that performance improves in a power-law manner as the number of parameters increases, and beyond a certain size, performance is observed to move towards saturation. At the very least, performance might reach saturation, but it does not decrease. […] This might be because Apple does not have a GPU-based AI data center large enough to test a parameter space big enough to confirm scaling trends. […] Verifying the scaling law is similar to verifying the scaling law of large language models, and for this, Apple's researchers should have tested combinations of training data, parameters, and computational load and shown the performance curve."
The release of Apple's paper preceded its annual WWDC conference, where Apple, as expected, revealed nothing significant about its AI effort, prompting criticism that it may be falling behind in the global race for AI. Seok Joon Kwon believes the timing was no coincidence: Apple, which is clearly behind the market leaders, intended to downplay the achievements of companies like Anthropic, Google, OpenAI, and xAI.
Fundamental hardware limitations
When Apple introduced its Apple Intelligence initiative in 2024, it focused on on-device processing and relatively basic tasks. At WWDC, the company revealed no progress on its own data center-grade AI, again limiting Apple Intelligence to on-device processing with strict privacy and performance constraints. While this approach strengthens its position among privacy-conscious users, it means the company lacks the ability to train the LLMs and LRMs that require substantial compute and user data to be competitive. At the same time, Apple now allows Siri and other AI tools to call out to external large language models when Siri cannot answer a query on its own: first ChatGPT (backed by GPT-4o), with Gemini expected to follow. In that case, ChatGPT only receives content explicitly approved by the user; Apple obscures the user's IP address and says no personal account data is shared with or retained by OpenAI.
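A rough sketch of that hybrid flow, using hypothetical function names rather than any real Apple API: answer on-device when possible, and only forward a request to an external model after explicit user approval, with identifiers stripped.

```python
# Hypothetical sketch of the hybrid on-device/external flow described above;
# this is not Apple's actual implementation, and all names are invented.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Answer:
    text: str
    source: str  # "on-device" or "external"

def handle_query(
    query: str,
    on_device_model: Callable[[str], Optional[str]],
    external_model: Callable[[dict], str],
    ask_user_consent: Callable[[str], bool],
) -> Answer:
    local = on_device_model(query)
    if local is not None:                          # the local model handled it
        return Answer(local, "on-device")
    if not ask_user_consent(query):                # user declines to share the request
        return Answer("I can't help with that without sending it off-device.", "on-device")
    redacted = {"prompt": query}                   # no account data or device identifiers attached
    return Answer(external_model(redacted), "external")

# Example with stub models standing in for the real on-device and cloud systems:
print(handle_query(
    "Summarize this note",
    on_device_model=lambda q: None,                # pretend the local model can't answer
    external_model=lambda req: f"(external answer to: {req['prompt']})",
    ask_user_consent=lambda q: True,
))
```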
Such a hybrid approach is not typical for Apple, and Professor Seok Joon Kwon believes it stems from Apple's fundamental focus on its closed ecosystem, which kept it from developing the data center-grade hardware required for training LRMs and LLMs. After all, Apple's M-series processors are designed primarily for client PCs: their GPUs lack the dedicated high-throughput FP16 matrix hardware relied on for AI training, and their memory subsystems use LPDDR5 rather than high-bandwidth HBM3E. Also, widely used machine learning frameworks such as PyTorch do not treat Apple silicon as a first-class training target, so porting workloads requires cumbersome conversions.
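To put the memory argument in perspective, here is a back-of-the-envelope comparison using publicly quoted peak bandwidth figures; the numbers are rough, order-of-magnitude values assumed for illustration rather than benchmarks.

```python
# Rough comparison of reported peak memory bandwidth (approximate figures,
# not benchmarks) between an LPDDR5-based client chip and an HBM3E accelerator.
configs_gbps = {
    "Apple M2 Ultra (LPDDR5 unified memory)": 800,
    "Nvidia H200 (HBM3E)": 4800,
}
baseline = configs_gbps["Apple M2 Ultra (LPDDR5 unified memory)"]
for name, bw in configs_gbps.items():
    print(f"{name}: ~{bw} GB/s ({bw / baseline:.1f}x the M2 Ultra)")
# Large-scale training is largely bound by memory bandwidth per accelerator,
# which is the gap the professor points to between LPDDR5 and HBM3E parts.
```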
As a result, if Apple wants to catch up with its rivals, it must develop dedicated server-grade processors with advanced memory subsystems and serious AI training and inference capabilities, rather than relying on the GPU and NPU designs of its client-focused M-series systems-on-chip.

Anton Shilov is a contributing writer at Tom’s Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers, and from modern process technologies and the latest fab tools to high-tech industry trends.
Comments
coolitic "Experts" that have a vested interest in non-stop AI-interest-churn.Reply
Andrew Kelley had a good take on this:
LLMs are a way to make software take orders of magnitude more computational power, electricity, and human labor, while delivering a product whose extremely volatile quality is impossible to assure. The work will never be completed; it will only create the need for ever more labor.
From: https://andrewkelley.me/post/why-we-cant-have-nice-software.html
Penzi: The fundamental nature of token-based LLMs strikes me as having an in-built limitation that likely conforms to Apple’s take rather than NEED MORE POWER.
bigdragon: I don't think this professor is being objective towards Apple's conclusions. I work in the "real world" and in an industry the AI tech bros are trying to stuff all sorts of AI crap into. LLMs are not trustworthy. I've seen Apple's research confirmed too many times to count. The inclination of these AI solutions to hallucinate and make up facts to support their conclusions makes them objectively dangerous in some scenarios. I'm honestly shocked a few companies haven't gone under yet for relying too much on AI.
This professor should work on fixing the hallucination problem. LLMs hallucinate no matter how much hardware and performance you throw at them. Begging Apple for more hardware performance is like begging for 1 more lane on a congested highway or 1 more turn after losing a game of Civilization -- hardware performance only masks a fundamental flaw in the system.