Apple says generative AI cannot think like a human - research paper pours cold water on reasoning models


Apple researchers have tested advanced AI reasoning models, known as large reasoning models (LRMs), in controlled puzzle environments and found that while they outperform 'standard' large language models (LLMs) on moderately complex tasks, both fail completely as complexity increases.
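
For context, the "controlled puzzle environments" in question are games such as Tower of Hanoi, where difficulty scales with a single parameter and every answer can be checked mechanically. As a minimal sketch (our illustration, not the researchers' code), the Python below shows why such puzzles make clean reasoning benchmarks: the optimal solution is known in closed form at every size.

    def hanoi_moves(n, src="A", dst="C", via="B"):
        """Return the optimal move sequence for n disks."""
        if n == 0:
            return []
        return (hanoi_moves(n - 1, src, via, dst)    # park the top n-1 disks on the spare peg
                + [(src, dst)]                       # move the largest disk
                + hanoi_moves(n - 1, via, dst, src)) # restack the n-1 disks on top

    for n in (3, 7, 10):
        moves = hanoi_moves(n)
        assert len(moves) == 2 ** n - 1   # the optimum is known in closed form
        print(n, "disks ->", len(moves), "moves")

Because a grader can verify any output exactly while the move count grows exponentially (2^n - 1), researchers can dial up complexity one disk at a time and observe precisely where a model's accuracy collapses.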

The researchers from Apple, a company not exactly at the forefront of AI development, believe that current LRMs and LLMs have fundamental limits in their ability to generalize reasoning, or rather, to think the way humans do.

Anton Shilov
Contributing Writer

Anton Shilov is a contributing writer at Tom’s Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers, and from modern process technologies and the latest fab tools to high-tech industry trends.

  • derekullo
    Maybe we just suck at asking questions :P
  • Konomi
    Doesn't need a research paper to tell you that, just common sense. It isn't a human, so we shouldn't expect it to think like one. But I suppose people need to find some validation in their efforts to make AI more than a meme...
  • SomeoneElse23
    Is that the POP of the AI bubble we're hearing?

    "AI", as it is today, is now basically Search 2.0.
  • Kindaian
    That precisely matches my experience.

    When I ask simple and common things, the LLM answers with an adequate response. When I ask more complex and niche things, the response is just random garble.

    Also, for code / programming, it is not only quite useless, but also dangerous. The samples they base their answers on often carry disclaimers like "don't use this in prod," but the LLM doesn't understand what that means and will just give you something full of vulnerabilities.

    To add insult to injury, the LLMs don't reason about the code they are building. They won't use best practices or patterns, or consider code re-usability or maintenance. They won't optimize for speed, memory footprint, or reliability. They will just give you an answer, and even if it happens to work, consider that a bizarre coincidence.

    Otherwise you are shooting yourself in the foot!

    With regard to the datasets used for training, and considering that people are using the regurgitated output of LLMs to "create" more content, the signal-to-noise ratio will only diminish over time, which means that the LLMs will become worse, not better.
  • shady28
    This jibes with the Atari 2600 beating ChatGPT 4.0 at chess at the beginner level.

    In my experience, AI is more like a powerful data aggregator; that is, it matches patterns, cross-indexes things, and puts together something that appears correct and accurate (but may not be).

    This does make it an incredibly powerful search engine, able not only to find what you're looking for but to pull related information from multiple sources into a more complete, consolidated answer.

    However, I've seen many instances where asking the same question two different ways produces different results, because it decided the question was better answered by one source vs. another.

    It's a far cry from intelligence able to solve a complex problem that hasn't been solved before or answer a question that hasn't been asked before, and any answer is somewhat dubious if it's used for anything important.
  • SomeoneElse23
    Kindaian said:
    Also, for code / programming, it is not only quite useless, but also dangerous. The samples they base their answers on often carry disclaimers like "don't use this in prod," but the LLM doesn't understand what that means and will just give you something full of vulnerabilities.

    To add insult to injury, the LLMs don't reason about the code they are building. They won't use best practices or patterns, or consider code re-usability or maintenance. They won't optimize for speed, memory footprint, or reliability. They will just give you an answer, and even if it happens to work, consider that a bizarre coincidence.

    Otherwise you are shooting yourself in the foot!

    I've had the same experience.

    I stopped using ChatGPT for coding help after it lied to me 3 times about code that simply did not work.

    Since they put so much effort into making "AI" friendly, they should at least change the programming-related answers to include a disclaimer:

    "I think this answer may be correct. But it may not be. It may have serious flaws or bad practices in it. Use at your own risk."
  • baboma
    >I stopped using ChatGPT for coding help after it lied to me 3 times about code that simply did not work.

    To channel Jobs, you're doing it wrong. Don't use ChatGPT for coding help. (Disclaimer: Yes, I also tried.)

    There are dedicated code-help AIs and best practices to avail yourself of. Read what the pros and experts are saying. There is lots of good advice to learn from. Here's one, off the cuff:

    https://fly.io/blog/youre-all-nuts/
  • A Stoner
    I thought this was widely known.

    The circular logic that happens when I try to discuss anything complex with LLMs is unbelievable. Then again, after talking to some humans, I think Apple is giving too many humans too much credit. Some of them seem to be just as programmed and incapable of real thought as the LLMs are.
  • Konomi
    A Stoner said:
    I thought this was widely known.

    The circular logic that happens when I try to discuss anything complex with LLMs is unbelievable. Then again, after talking to some humans, I think Apple is giving too many humans too much credit. Some of them seem to be just as programmed and incapable of real thought as the LLMs are.
    Take those humans back to where you found them and ask for a refund.
  • baboma
    A more digestible and cogent commentary on Apple's AI paper is here:

    https://garymarcus.substack.com/p/a-knockout-blow-for-llms
    Who is Gary Marcus: https://substack.com/@garymarcus
    Summation (in the author's words):
    "AI is not hitting a wall.
    But LLMs probably are (or at least a point of diminishing returns).
    We need new approaches, and to diversify which roads are being actively explored."

    My add: LLM progress may be hitting a wall, but its current capabilities are already enough to supplant many jobs and functions done by humans today. Countries are now focused on improving AI, and there are untold billions, if not trillions, of dollars being poured into its further development.