Microsoft Taught AI How To Beat 'Ms. Pacman'

Ms. Pacman doesn't seem like a hard game. All you have to do is guide its titular character through mazes to gobble up pellets and avoid a group of ghosts. Underneath that veneer of simplicity, however, is a complicated game that's surprisingly well-suited to helping AI learn how to think like humans do. Or at least that's what Microsoft said today when it announced that it made an AI capable of getting the game's highest possible score.

That's because the company took a novel approach to teaching its AI how to master Ms. Pacman. It gave 150 "agents" a specific task, such as eating a certain pellet or avoiding a ghost, and then created a "top agent" that decided how to move based on the other agents' feedback. The AI balanced each individual agent's desire to accomplish a certain goal with the top agent's mission to get the maximum score of 999,990.

The top agent took into account how many agents advocated for going in a certain direction, but it also looked at the intensity with which they wanted to make that move. For example, if 100 agents wanted to go right because that was the best path to their pellet, but three wanted to go left because there was a deadly ghost to the right, it would give more weight to the ones who had noticed the ghost and go left.
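To make that weighting concrete, here is a minimal Python sketch of the idea. It is an illustration only, not Microsoft's implementation: the function name `choose_direction` and the weights (1 for each pellet agent, 50 for each ghost agent) are made-up assumptions chosen to mirror the example above.

```python
from collections import defaultdict


def choose_direction(preferences):
    """Add up every agent's weight per direction and return the direction
    with the highest total.

    `preferences` is a list of (direction, weight) pairs. The weight stands
    for how strongly an agent wants that move (hypothetical scale).
    """
    totals = defaultdict(float)
    for direction, weight in preferences:
        totals[direction] += weight
    return max(totals, key=totals.get)


# 100 pellet agents mildly prefer going right (assumed weight: 1 each).
pellet_agents = [("right", 1.0)] * 100
# 3 ghost agents strongly prefer going left (assumed weight: 50 each).
ghost_agents = [("left", 50.0)] * 3

# right totals 100, left totals 150, so the intense minority wins.
print(choose_direction(pellet_agents + ghost_agents))  # left
```

A plain head count would send Ms. Pacman right, 100 votes to three. Summing intensities instead lets a handful of urgent warnings outweigh many mild preferences.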

That's similar to how many of us think. We collect information, decide how important it is, and then act based on that judgment. That skill is key to many games, including classics like Ms. Pacman. There's often so much happening on-screen at any given time that responding to each stimulus would be impossible. Instead, you have to make quick decisions based on the information at hand and hope that you made the right choice.

The system was also taught via reinforcement learning. This means it was given a positive or negative response for every action, then told to figure out how to get the most positive responses. Instead of teaching the AI by showing it how pro players approach Ms. Pacman--a process called supervised learning--the AI had to figure things out for itself. (It's kind of like having your kids solve a problem instead of solving it for them.)
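For readers who want to see the reward loop in action, here is a toy sketch in Python. It uses tabular Q-learning, a standard textbook method swapped in for illustration; it is not the architecture from Microsoft's paper. The five-square corridor, the reward values, and the `train` function with its parameters are all assumptions.

```python
import random


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values on a toy corridor of squares 0..4.

    Square 0 holds a ghost (reward -1) and square 4 holds a pellet
    (reward +1). Every other step earns nothing. All values are made up.
    """
    rng = random.Random(seed)
    last = 4
    actions = (-1, +1)  # step left or step right
    q = {(s, a): 0.0 for s in range(last + 1) for a in actions}
    for _ in range(episodes):
        state = 2  # always start in the middle
        while state not in (0, last):
            if rng.random() < epsilon:
                action = rng.choice(actions)  # explore occasionally
            else:
                action = max(actions, key=lambda a: q[(state, a)])  # exploit
            nxt = state + action
            reward = 1.0 if nxt == last else (-1.0 if nxt == 0 else 0.0)
            done = nxt in (0, last)
            best_next = 0.0 if done else max(q[(nxt, a)] for a in actions)
            # Nudge the estimate toward reward plus discounted future value.
            q[(state, action)] += alpha * (
                reward + gamma * best_next - q[(state, action)]
            )
            state = nxt
    return q


q = train()
for s in (1, 2, 3):
    best = max((-1, +1), key=lambda a: q[(s, a)])
    print(f"square {s}: prefers step {best:+d}")
```

Nobody tells the program that the pellet is on the right. It only sees the rewards, and after enough attempts it should come to prefer stepping right from every square.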

Just like Google's work on AlphaGo, teaching AI to become a Ms. Pacman whiz isn't Microsoft's end goal. Instead, the company said that the approach learned from this experiment could be used to train AI how to handle other tasks, such as managing schedules or improving natural language processing. The company shared what it learned in a paper titled "Hybrid Reward Architecture for Reinforcement Learning."

This effort highlights the value that games offer to AI research. Many of them come naturally to us, but teaching computers how to play them as well as AlphaGo played Go or Microsoft's AI played Ms. Pacman is much harder. The potential for mastery is there, as both of these AI proved, but the journey is the most exciting part. Games also make it easy for us to understand how far AI has come. Knowing an AI can recognize cats isn't impressive--watching it dominate a game shortly after it came into existence, let alone learned how to play the game, is much cooler.

Nathaniel Mott is a freelance news and features writer for Tom's Hardware US, covering breaking news, security, and the silliest aspects of the tech industry.

Comments from the forums

dstarr3

I suppose it is kind of scary to think that we already have AI-controlled cars when the most sophisticated AI we have is just now getting a grip on arcade and board games. We might be jumping the gun slightly on the whole AI-controlled car thing.
jimmysmitty

19816656 said:
I suppose it is kind of scary to think that we already have AI-controlled cars when the most sophisticated AI we have is just now getting a grip on arcade and board games. We might be jumping the gun slightly on the whole AI-controlled car thing.

Cars are different due to the radar sensors, the same ones used to tell you when you are getting close to hitting an object, and the cameras, which can interpret and analyze the data in the image in real time.

dstarr3

It's not so much about the sensing technology as it is about the decision-making technology. It reminds me of the interesting philosophical dilemma that an autonomous car could face: An accident occurs in front of the car, and not avoiding the accident would surely kill the passengers in the car. However, the only way to avoid the accident is to drive onto the sidewalk and kill pedestrians. So who does the computer decide to kill and how?

If computers are just now figuring out Pacman, I don't know if they're ready for problems as difficult as that one.
sh4dow83

19817003 said:
If computers are just now figuring out Pacman, I don't know if they're ready for problems as difficult as that one.

Oh please - as if humans have the capability of figuring out such a problem. You said it yourself - it's a dilemma.

And that's just the ideal case. That somebody will sit there, think "That's quite a problem" and ponder it.

But there's no time!!
So what happens instead?
Human instinct to protect oneself? Mowing down a bunch of kids because even someone who is terminally ill can't think clearly in that split second?
Maybe hesitation - possibly ending with a whole big family in that car, most of whom still have their lives ahead of them, getting killed instead of "just" one very old person?

Which I find raises the question - humans are so flawed that I wonder whether machines could ever, on average, make decisions that are worse. Because even if a machine made the decision purely based on the number of casualties - most of the time, that's probably a good guess.

Of course... knowing our screwed up world, I suspect that somebody would put in a whitelist for special rich people with special implants sooner or later...
darkfoxrs

A human can go to jail... but a car?
derekullo

If a kid jumps in front of your car from 20 feet away while you are going 40 miles per hour (about 59 feet per second), the two would collide in about 0.34 seconds.

For comparison a blink is 0.3 to 0.4 seconds.
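A quick way to check that arithmetic is a rough Python sketch, assuming the car holds a constant speed and the driver does not react at all (the helper name `time_to_collision` is made up for this example):

```python
FEET_PER_MILE = 5280
SECONDS_PER_HOUR = 3600


def time_to_collision(distance_ft, speed_mph):
    """Seconds until a car at constant speed covers `distance_ft`."""
    speed_fps = speed_mph * FEET_PER_MILE / SECONDS_PER_HOUR
    return distance_ft / speed_fps


speed_fps = 40 * FEET_PER_MILE / SECONDS_PER_HOUR
print(f"40 mph = {speed_fps:.1f} ft/s")                         # 58.7 ft/s
print(f"20 ft takes {time_to_collision(20, 40):.2f} seconds")   # 0.34 seconds
```

That comes to roughly a third of a second, which sits inside the 0.3 to 0.4 second range of a blink.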

An unlucky/inattentive human driver may not have time to react at all and collide at full speed.

A lucky/experienced human driver may brake hard and swerve out of the way into a randomly chosen lane, left or right, without caring about any cars on either side of him, knowing that cosmetic damage to a car can be repaired but a person is much harder to fix.
This driver may hit neither the kid nor any other vehicle, meaning all obstacles are avoided and no damage is done to anything or anyone. Of course, most drivers would be visibly shaken by this event.
The issue is that not everyone has the experience, reaction time and reliability to do this every single time.

A computer driving a car would know the car's exact limits rain or shine along with the current traction on each wheel.
I remember the 2016 GMC Sierra Denali commercial "1000 times a second".
That would allow the car to either swerve around the kid completely or maneuver into a nearby open lane, since it can monitor each lane and even a complete 360° around the vehicle.

A group of AI powered vehicles could even communicate with each other to help one of them avoid an obstacle.

Say car A and car B are driving side by side on a road with 2 lanes in one direction, a median, and 2 lanes in the opposite direction.
A kid jumps into the path of car A.
Car A asks car B whether it can slow down so that car A can immediately take its position to avoid the obstacle.
Car B says sure and instantly applies its brakes.
Car A then swerves into car B's lane avoiding the kid.

AI may make for faster trips as well.

Think of a red light.
When the light turns green all the cars don't go at once, they go in sequence.
Car 1 releases his brake and accelerates, car 2 sees the brake lights on car 1 have disappeared and in 0.3 seconds car 2 releases his brakes and pushes the accelerator and this process goes on and on till it gets to your car.

With AI as soon as the light turns green all the AI vehicles could accelerate at a steady rate all at the same time.
Technically the acceleration rate does not have to be steady; it just has to be the same for each car so none of them "move" in relation to each other, always 15 feet in front and 15 feet behind for all cars. That means you could have 10 Tesla Model S P100Ds accelerating at full speed in a line, and as long as each one of them maintains the same acceleration everything is fine. I'm sure some regulatory agency will institute a framework (maximum acceleration, maximum speed, minimum space between vehicles) with different values for when it's raining, of course; we still have to obey the laws of physics.
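Both halves of that argument can be sketched in a few lines of Python. This is a simplified model under stated assumptions: cars are treated as points spaced 15 feet apart, every car starts from rest, and the acceleration of 10 feet per second squared is an arbitrary example value. The helper names are made up.

```python
def start_delay(car_index, reaction_s=0.3):
    """Seconds before car number `car_index` (1 = first in line) moves
    when each human driver reacts `reaction_s` after the car ahead."""
    return (car_index - 1) * reaction_s


def positions(start_positions_ft, accel_ftps2, t):
    """Positions after `t` seconds if every car accelerates identically
    from rest at the same instant."""
    return [x0 + 0.5 * accel_ftps2 * t * t for x0 in start_positions_ft]


def gaps(xs):
    """Distance between each car and the one behind it."""
    return [front - back for front, back in zip(xs, xs[1:])]


# Human drivers: the tenth car waits on nine reaction times, about 2.7 s.
print(f"car 10 starts after {start_delay(10):.1f} s")

# AI platoon: ten cars, 15 feet apart, all accelerating together.
line = [-15.0 * i for i in range(10)]
after_two_seconds = positions(line, accel_ftps2=10.0, t=2.0)
print(gaps(line) == gaps(after_two_seconds))  # True: spacing is unchanged
```

Because every car gains the same distance at every instant, the spacing never changes, whatever the shared acceleration happens to be.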
chaosmassive

skynet is coming,
the end is near
urbanj

19817560 said:
When the light turns green...

Thing you forgot about in a world where ALL cars are AI, is that there would be NO RED LIGHT :P

...we might also consider not having windows at all in vehicles at that time either, as the sight of our AI-driven cars zipping past one another within fractions of an inch at high speeds would likely cause most people to change their pants upon exiting the vehicle :lol:

derekullo

19817871 said:
19817560 said:
When the light turns green...

Thing you forgot about in a world where ALL cars are AI, is that there would be NO RED LIGHT :P

...we might also consider not having windows at all in vehicles at that time either, as the sight of our AI-driven cars zipping past one another within fractions of an inch at high speeds would likely cause most people to change their pants upon exiting the vehicle :lol:

That is true: if all cars were AI then we wouldn't need red lights. But unless we pass a law making it illegal to drive, there will still be people driving who will need a red light, and our cars, not recognizing them as fellow AI, would most likely slow down around them while we give them the American Salute.

derekullo

A far more likely scenario, at least in the next couple of decades, is humans sharing the job of driving with AI rather than being totally replaced by it.

That line above does feel like a speech they would give before presenting the prototype of Skynet.