Are Games The Best Benchmark For AI?

Unlike the real world, games can provide a theoretically unlimited amount of data for training AI systems

A few years ago, OpenAI held a tournament to test the potential of OpenAI Five, an AI system designed to play the multiplayer online battle arena game Dota 2. At that tournament, OpenAI Five defeated a team of professional players twice. Later, when the system was made publicly available, it won against 99.4 per cent of the people who matched up against it online.

Since then, many tech firms have invested heavily in games for research, but that approach is now changing. According to researchers, labs today are moving away from games as benchmarks and shifting their focus to other domains, including natural language processing. While some believe that games might lead to newer insights, spawning AI systems with several commercial applications, others think that an AI created to play games is limited in scope and design.

The Scenario

Most AI applied to games falls into the category of reinforcement learning, in which a system made up of one or more agents is given a set of actions it can take in its environment. The system usually starts out knowing nothing about the environment and receives rewards based on how far its actions bring it closer to a goal. As the system gradually receives feedback from the environment, it learns sequences of actions that help maximise its rewards. But unlike the real world, games can provide a theoretically unlimited amount of data for training AI systems. For example, to develop OpenAI Five, OpenAI made the system play the equivalent of 180 years’ worth of games every day for weeks.
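To make the loop of actions, rewards and feedback concrete, here is a minimal sketch of tabular Q-learning, one common reinforcement learning method. It is not how OpenAI Five was built. The corridor "game", the reward of 1.0 at the goal, and every name and parameter value below are assumptions chosen purely for illustration.

```python
import random

# Hypothetical toy game: a corridor of 6 cells. The agent starts in cell 0
# and the episode ends when it reaches cell 5, the goal.
N_STATES = 6
ACTIONS = (-1, +1)  # action 0 moves left, action 1 moves right


def step(state, action_index):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + ACTIONS[action_index], 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0  # the only feedback is a reward at the goal
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values by trial and error, starting from no knowledge."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action], all zero
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore: try a random action
            else:
                # exploit: take the best-known action, breaking ties randomly
                best = max(q[state])
                action = rng.choice([a for a in range(2) if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Q-learning update: nudge the estimate towards the reward plus
            # the discounted value of the best action in the next state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best-known action for every non-goal cell."""
    return [max(range(2), key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(train()))  # 1 means "move right"
```

Because the game is simulated, the number of episodes is limited only by compute, which is the sense in which games offer a practically unlimited supply of training data.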

Games have been treated as potential AI benchmarks for decades: they distil problems into actions, states, and rewards, they require real reasoning to excel at, and they have a structure in keeping with how computers solve problems. Labs now focus on AI that can play imperfect information games, like poker, at a high skill level. In contrast to perfect information games such as chess, imperfect information games hide some information from the players during play, e.g., another player’s hand in poker.
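The difference between perfect and imperfect information can be sketched in a few lines of code. The example below assumes a hypothetical two-player toy card game with a three-card deck. The names State, observe and information_set are illustrative and not taken from any particular library. A player's observation masks the opponent's card, so several true states look identical from that player's seat.

```python
from dataclasses import dataclass

DECK = ("J", "Q", "K")  # hypothetical three-card deck, one card per player


@dataclass(frozen=True)
class State:
    hands: tuple  # hands[i] is player i's private card (None if hidden)
    pot: int


def observe(state, player):
    """What `player` can see: their own card and the pot, nothing else."""
    visible = tuple(
        card if i == player else None for i, card in enumerate(state.hands)
    )
    return State(hands=visible, pot=state.pot)


def information_set(observation, player):
    """Every full state that is consistent with what `player` observes."""
    own = observation.hands[player]
    states = []
    for other in DECK:
        if other == own:
            continue  # the opponent cannot hold the card this player holds
        hands = [None, None]
        hands[player] = own
        hands[1 - player] = other
        states.append(State(hands=tuple(hands), pot=observation.pot))
    return states


if __name__ == "__main__":
    true_state = State(hands=("K", "J"), pot=2)
    seen = observe(true_state, player=0)
    print(seen)
    print(information_set(seen, player=0))
```

In a perfect information game such as chess, the observation would equal the full state and the set of consistent states would have exactly one member. Here the player must reason over every state the opponent's hidden card allows.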

But despite their convenience from a research perspective, games have proven to be flawed AI benchmarks because of their abstractness and relative simplicity. Even the best game-playing systems today, such as AlphaStar, usually struggle to reason about the states of other, similar AI systems. They don’t adapt well to new environments, and can’t quickly solve problems they haven’t seen before, particularly problems that must be solved over a long time.

As an example, a reinforcement learning model that can play StarCraft 2 at an expert level won’t be able to play a game with similar mechanics at any level of competency, because even slight changes to the original game degrade the model’s performance.

Mike Cook, an AI researcher and game designer at Queen Mary University of London, agrees that games “are not that special” as a benchmark for AI. He says that the role games play in society and culture matters.

Innovative Implementations

Nvidia, which has a vested interest in gaming hardware, stands firm behind the idea that games remain an important area of AI research, particularly for reinforcement learning. According to Nvidia, games are “clearly defined sandboxes” with rules and objectives that the real world lacks.

Microsoft, too, believes in the power of games as a platform for AI development, pointing to efforts like the ongoing Project Paidia. A joint initiative between Microsoft Research Cambridge and the Microsoft-owned game studio Ninja Theory, Project Paidia aims to drive research in reinforcement learning by enabling AI systems to learn to cooperate with other video game players.

DeepMind recently created XLand, an engine that can generate environments in which researchers can train AI systems on several tasks. Each new task is generated according to a system’s training history, in a way that helps spread the system’s skills across several challenges, such as “capture the flag”.

After observing over a month of training, DeepMind claims that its systems in XLand demonstrate human-like behaviour such as teamwork, object permanence, awareness of time, and knowledge of the high-level structure of games that they may encounter.


Meanwhile, researchers at Meta (formerly Facebook) are not entirely convinced that state-of-the-art game environments like XLand can achieve what their creators set out to accomplish. Unlike humans, AI systems trained in XLand have to repeatedly stumble upon an interesting area by chance, and then be encouraged to revisit the area until it is no longer “interesting”.

One potential solution is to test AI against games that pose a more relevant, general challenge: games that require a mix of creativity, bluffing, intuition, and humour. AI that understands how humans think and generates its own thought process, demonstrating a theory of mind, is a holy grail for a subset of AI research.

Interesting algorithmic lessons could be learned from suitable games, such as simulations or games that require complex language use. AI systems need to learn to accomplish tasks from only a few demonstrations, without verbal explanations or prior training, but such a learning approach is yet to be discovered.
