ChatGPT plays Red Dead Redemption II — AI vision system was overwhelmed

An AI plays RDR2
(Image credit: Rockstar)

A group of researchers from China and Singapore recently published a paper detailing the challenge of getting an AI to play Red Dead Redemption II (RDR2). They also assessed and commented on the AI’s game-playing performance. In the paper Towards General Computer Control: A Multimodal Agent for Red Dead Redemption II as a Case Study (PDF) we learn about the concept of General Computer Control (GCC) for AIs, as well as a six-module agent framework dubbed CRADLE, used to interface between GPT-4V and RDR2. In their conclusion, major issues facing the AI gaming agent are laid at the door of the GPT-4V vision system.

According to the research paper, this RDR2-playing project provides insight into how far AIs have progressed toward achieving Artificial General Intelligence (AGI). To this end, the researchers try to get an AI, powered by OpenAI’s GPT-4V, to interact with a computer by taking in visual and audio cues and using the machine intelligently, much as an average computer-savvy human would. In doing so, they aim to demonstrate that an AI can succeed at complex General Computer Control (GCC).

The researchers chose RDR2 as the game to put under the spotlight as they claim it has a “complex black box control system, which epitomizes the most demanding computer tasks and enables us to evaluate the performance boundaries of our framework in such virtual environments.” Indeed, it offers rich environments and diverse situations for players to navigate. Additionally, UI elements like dialogues, unique icons, in-game prompts, and instructions ensure no background knowledge is taken for granted – which is great for AI learning. Lastly, the researchers say that RDR2 game control via mouse and keyboard provides a better workout for GCC than most other software a computer user might run day-to-day.

Though the published paper focuses on RDR2, CRADLE is designed to be extended as part of its GCC purpose, “to support a broader spectrum of games, such as simulation and strategy games, as well as various software applications.” The key innovation here is the introduction of the CRADLE framework, so let’s look more closely at that now.

(Image credit: arxiv.org)

Above you can see an overview of how CRADLE handles the challenge of GCC gaming, specifically in RDR2. The researchers hoped to demonstrate CRADLE's ability to learn the game from scratch (without access to any internal game state or API) just like a human. Then, the AI agent was to progress in the game by navigating the world and completing tasks, following the main storyline in RDR2.
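
To make the "screen in, keyboard and mouse out" idea concrete, here is a minimal Python sketch of such a control loop. It is not CRADLE's code and does not call GPT-4V. The names FakeGame, stub_model, and run_episode are hypothetical, the "screenshot" is a short text caption rather than pixels, and the model is a hard-coded stub. The only point it illustrates is that the agent never reads the game's internal state and can only act through input events.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Action:
    """A low-level input event, the agent's only way to affect the game."""
    device: str   # "keyboard" or "mouse"
    command: str  # e.g. "hold w"


class FakeGame:
    """Toy stand-in for the game: the agent sees frames, never the state."""

    def __init__(self, distance: int) -> None:
        self._distance = distance  # hidden internal state

    def screenshot(self) -> str:
        # A real agent would receive pixels; a caption keeps the sketch short.
        if self._distance == 0:
            return "at the horse"
        return f"horse {self._distance} steps ahead"

    def send(self, action: Action) -> None:
        # Only "hold w" on the keyboard moves the character forward.
        if action == Action("keyboard", "hold w") and self._distance > 0:
            self._distance -= 1


def stub_model(frame: str, task: str) -> Tuple[bool, List[Action]]:
    """Hypothetical replacement for a multimodal model call.

    Returns (task_done, next_actions) based only on what is visible.
    """
    if frame == "at the horse":
        return True, []
    return False, [Action("keyboard", "hold w")]


def run_episode(
    game: FakeGame,
    model: Callable[[str, str], Tuple[bool, List[Action]]],
    task: str,
    max_steps: int = 20,
) -> Tuple[bool, List[str]]:
    memory: List[str] = []
    for _ in range(max_steps):
        frame = game.screenshot()           # 1. observe the screen
        done, actions = model(frame, task)  # 2. judge progress and plan
        memory.append(frame)                # 3. remember what was seen
        if done:
            return True, memory
        for action in actions:              # 4. act via input events only
            game.send(action)
    return False, memory


done, seen = run_episode(FakeGame(distance=3), stub_model, "go to the horse")
print(done, len(seen))
```

In this toy setup the loop ends once the stub reports the task as done from the frame alone, which loosely mirrors the idea of an agent deciding from what it sees whether a task is complete.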

Overall, CRADLE seems to have been moderately successful in RDR2 gaming. The researchers say they assessed representative tasks from the main storyline and open-ended missions. The key finding was that “CRADLE can complete all tasks in the main storyline consistently.” Some notable exceptions were: Protect Dutch, which involves a fast-paced gun battle; Search House, which requires the agent to explore a complex indoor environment; and the open-ended task with a long horizon.

(Image credit: arxiv.org)

You can see the importance of task inference and reflection in CRADLE, above. These refinements are especially important in the agent’s movement through the game and in understanding when tasks are complete. During the study, some of the repeated difficulties experienced by CRADLE were blamed on GPT-4V. Specifically, it is claimed that “GPT-4V’s spatial-visual recognition capability is insufficient for precise fine-grained control.” Moreover, GPT-4V is said to struggle with domain-specific concepts, such as unique icons within the game, with understanding mini-maps, and with general obstacles in the game environment.

(Image credit: arxiv.org)

The full study can be read via this link, but we wish that the researchers had shared some video of RDR2 gameplay using their AI agent. We also wonder how other multimodal AIs would perform in RDR2 via CRADLE.

Mark Tyson
News Editor

Mark Tyson is a news editor at Tom's Hardware. He enjoys covering the full breadth of PC tech, from business and semiconductor design to products approaching the edge of reason.

  • NinoPino
    Results achieved are really impressive.
  • thisisaname
    Would have been funnier if they answered the question "Can it play Crysis"


    Edit: spelling
  • Toadster88
    society worries about young kids learning shooting games because when they grow up they'll be bad actors
    so what do we do? teach AI shooting games (LOL)
  • watzupken
    This just proves while humans may not be super good at something, but we have been designed to be compact, and still easily handle a lot of day to day task. To get AI to play a game, I presume it requires very beefy AI hardware to store and process the data. All these requires a lot of power for just basic things people do.
  • taffyrailway
    > but we wish that the researchers had shared some video of RDR2 gameplay using their AI agent. We wonder how other multimodal AIs could perform in RDR2 via CRADLE?

    Link at the top of the paper to the project website

    https://baai-agents.github.io/Cradle/
    Which contains two videos of the AI in action and GitHub code.

    https://www.youtube.com/watch?v=Cx-D708BedY

    https://www.youtube.com/watch?v=Oa4Ese8mMD0
  • Joseph_138
    Get ready for play through videos on Youtube, done entirely by AI.
  • mtrantalainen
    watzupken said:
    > This just proves while humans may not be super good at something, but we have been designed to be compact, and still easily handle a lot of day to day task. To get AI to play a game, I presume it requires very beefy AI hardware to store and process the data. All these requires a lot of power for just basic things people do.

    You have to remember that this game was designed for humans and the UI is result of a lot of playtesting with actual humans.

    Imagine a human trying to play a game originally optimized for a robot!
  • ivan_vy
    Joseph_138 said:
    > Get ready for play through videos on Youtube, done entirely by AI.

    content farm are gonna have a field day generating cheap content, YT is about to be even worse to watch
  • NeonSplatters
    42% of CEOs think AI could destroy humanity.

    Chinese researchers: "Hey guys, let's let AI play an outlaw and member of the Van der Linde gang, who must deal with the decline of the Wild West while attempting to survive against government forces, rival gangs, and other adversaries!"
  • ivan_vy
    a matter of time (color me surprised if not already happening) that AI gonna be playing war ARMA-like simulators by the thousands of instances.