How Did OpenAI’s Bot Defeat the Team of Dota2 Semi-Pros?

21 August 2018



Games have become a widely adopted way of benchmarking the advances of artificial intelligence. Since Deep Blue won a chess game against Garry Kasparov in 1997, AI has made many remarkable advances. Machines have topped the best humans at most games held up as measures of human intellect, including chess, Scrabble, Othello, and even Jeopardy!

A few years ago we witnessed Google’s AlphaGo defeating the world champion at Go (a 2,500-year-old game that is exponentially more complex than chess and demands intuition and strategy rather than brute-force search over all possible moves), showing the impressive advancement of artificial intelligence, especially in the past few years. Although impressive, it remained questionable whether this really counts as intelligence and whether games are the right way to measure it.

Time for AI to become the best Dota2 team

It’s 2018 now, and we are witnessing another human champion defeated by an AI agent, this time in Dota2. A few months ago, OpenAI, a non-profit, San Francisco-based AI research company backed by Elon Musk, Reid Hoffman, and Peter Thiel, announced that it was planning an official Dota2 match between its Dota2-playing AI agent and a team of former Dota2 pros. Not chess, not Othello or Go, but one of the world’s most popular online strategy games: Valve’s Dota 2. It immediately caught the attention of both the AI community and the Dota gaming community.

But this was not the first match for OpenAI’s Dota player. Its earlier generations were constrained to 1-vs-1 matches (which are considered less complex), and it had been tested many times before moving to 5-vs-5. In June this year, it managed to beat five teams of amateur players in 5-vs-5 matches, including one made up of Valve employees. The challenge to beat a team of former pros was issued right after this success.

To make the game manageable for the AI, a few low-impact constraints were introduced, such as a narrower hero pool to choose from and invincible item-delivery couriers. These are said to have little effect on how OpenAI Five’s success should be measured.

On August 5th, OpenAI Five (as the AI player is called) defeated the team of former pros in a best-of-three series of games, far more easily than expected. It won the first game without having a single tower destroyed by the opponent, which is truly remarkable. Moreover, its decisions and strategic play showed the kind of coordination that comes from good teamwork.

How did they do it?

OpenAI Five is actually a set of five single-layer, 1,024-unit long short-term memory (LSTM) recurrent neural networks, one assigned to each hero in the 5-vs-5 setting. The fact that it is not a multi-modular, super-complex system but just a set of simple recurrent neural networks makes it even more impressive.

“People used to think that this kind of thing was impossible using today’s deep learning,” said Greg Brockman, one of OpenAI’s co-founders and its CTO. This simple setup of five networks that do not even communicate with each other has defeated a team of human masters. Even though the heroes (in fact, the networks) don’t communicate, there is still noticeable team chemistry. Teamwork is in fact controlled by a hyperparameter that can be set between 0 and 1 and weights how much each hero should care about its own individual reward compared to the average reward of the whole team. In this way, OpenAI Five is able to come up with its own strategies.
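OpenAI has described this as a “team spirit” parameter. A minimal sketch of how such a blended reward could be computed is shown below; the function and variable names are illustrative, not OpenAI’s actual code.

```python
import numpy as np

def blended_rewards(individual_rewards, team_spirit):
    # Blend each hero's own reward with the team average.
    # team_spirit = 0.0 -> each hero optimizes only its own reward;
    # team_spirit = 1.0 -> each hero optimizes the team-average reward.
    # Illustrative sketch only, not OpenAI's implementation.
    individual_rewards = np.asarray(individual_rewards, dtype=float)
    team_average = individual_rewards.mean()
    return (1.0 - team_spirit) * individual_rewards + team_spirit * team_average

# Example: five heroes, raw per-hero rewards from the last game tick.
print(blended_rewards([1.0, 0.0, 0.5, -0.2, 0.3], team_spirit=0.8))
```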

Elon Musk on Twitter about OpenAI’s win against human pros

All of the networks receive an input every four frames describing the state of the game. Each input is in fact about 20,000 mostly floating-point numbers that encode vital information such as the location and health of visible units, giving the agent access to the same knowledge a human team can have. On average, its reaction time is faster than a human’s, giving OpenAI’s agents a slight advantage.
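To make that observation concrete, here is a toy sketch of how per-unit features such as position and health could be flattened into one fixed-size float vector; the field names, sizes, and layout are assumptions for illustration only, and the real OpenAI Five observation is far richer.

```python
import numpy as np

def encode_observation(visible_units, max_units=200, feats_per_unit=10):
    # Flatten per-unit features (position, health, ...) into one float vector.
    # Toy encoding: the real OpenAI Five input is ~20,000 floats covering far
    # more of the game state than this sketch.
    obs = np.zeros(max_units * feats_per_unit, dtype=np.float32)
    for i, unit in enumerate(visible_units[:max_units]):
        base = i * feats_per_unit
        obs[base:base + 3] = (unit["x"], unit["y"], unit["health"])
        # Remaining slots left as zeros, reserved for other per-unit features.
    return obs

sample_units = [{"x": 120.0, "y": -40.0, "health": 0.75}]
print(encode_observation(sample_units).shape)  # (2000,)
```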

The networks were trained with OpenAI’s Proximal Policy Optimization and self-play. They “play” 180 years’ worth of games every day — 80 percent against themselves and 20 percent against past selves. To make all of this possible, OpenAI used 256 Nvidia Tesla P100 graphics cards and 128,000 processor cores on Google’s Cloud Platform.
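The 80/20 split between playing the current version of itself and past versions is a standard self-play trick to keep strategies from collapsing into a narrow rut. A minimal sketch of such opponent sampling, with made-up names, might look like this:

```python
import random

def sample_opponent(current_policy, past_policies, p_current=0.8):
    # With probability 0.8 play a mirror match against the current policy,
    # otherwise play against a randomly chosen past version.
    # Illustrative sketch; not OpenAI's training code.
    if not past_policies or random.random() < p_current:
        return current_policy
    return random.choice(past_policies)
```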

OpenAI announced that they are working on improving OpenAI Five even further and that it will compete at The International, the biggest international Dota2 tournament, where teams play for millions of dollars in prize money. OpenAI Five will have a chance to become a Dota2 champion and to show once again how far artificial intelligence has advanced.

Evolving Simple Programs for Playing Atari Games

2 August 2018



While a great number of researchers in Artificial Intelligence have focused their efforts on Deep Reinforcement Learning trying to beat human players at Atari games, researchers from the University of Toulouse and the University of York have proposed a different approach using evolutionary algorithms. Using Cartesian Genetic Programming (CGP), they achieve state-of-the-art results on a number of games and are competitive on many others.

Not only does this represent (as the authors report) the first use of Cartesian Genetic Programming as a game-playing agent, it is also one of the few studies where CGP is applied to reinforcement learning tasks. Surprisingly, this evolutionary approach beats Deep Reinforcement Learning (Deep Q-Network) approaches on many benchmark games in the Arcade Learning Environment (ALE). Considering that deep RL methods have been applied to this problem for quite some time, gradually improving results through many refinements and combinations of techniques, the success of the proposed CGP approach is remarkable for a first attempt at the problem.

What is Cartesian Genetic Programming?

As the name suggests, Cartesian Genetic Programming is a special form of genetic programming in which programs are encoded as directed, acyclic graphs indexed by Cartesian coordinates. The idea is to represent a program as a graph containing three types of nodes: input nodes, output nodes, and functional nodes (defined by a set of evolved genes), laid out in a Cartesian coordinate system. Functional nodes encode an operation (function) and connect to input nodes or to other functional nodes via their Cartesian coordinates.
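As a rough illustration, a decoded CGP program can be thought of as the following data structure; the class and field names are ours, not the paper’s.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class CGPNode:
    x: int            # index of the first input node (an earlier node)
    y: int            # index of the second input node
    func: Callable    # operation applied to the two inputs
    p: float = 0.0    # extra parameter that some functions use

@dataclass
class CGPProgram:
    n_inputs: int          # number of program inputs
    nodes: List[CGPNode]   # functional nodes, ordered left to right
    outputs: List[int]     # indices of nodes that produce the program output
```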

As in all evolutionary algorithms, defining a proper genotype is very important and sometimes crucial for evolution to find an optimal solution. In CGP, the choice of the function set matters; in this work, many of the standard mathematical and comparison functions are used, together with a set of list-processing functions. The researchers tried to keep the function set as simple as possible, but they report that the set used here is still fairly large and leave the search for a minimal necessary function set as future work. The function set is given in the tables below.

A part of the function set used in the proposed method.

The Method

The Arcade Learning Environment (ALE) provides games in which playing can be seen as processing an image input and issuing a specific control command at each time step. As mentioned before, playing arcade games requires learning how to play by maximizing the game score, which is why reinforcement learning has been used in most of the previous approaches as well as in this work.

The Genotype

As discussed above, in CGP a program is defined by a graph, and this graph is positioned on a rectangular grid with R rows and C columns. Each node is allowed to connect to any node in the previous columns, subject to a distance (connectivity) parameter. A node is represented by four floating-point numbers in the range [0.0, 1.0], corresponding to its x input, y input, function, and parameter p. The x and y values determine the node’s inputs. The function gene is cast to an integer and used to index into the list of available functions, determining the function the node applies. The parameter p is passed to the function as an additional input, which some functions require.
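Below is a minimal sketch of how one node’s four genes could be decoded, reusing the illustrative CGPNode class above; the exact connection rule in the paper depends on its connectivity parameter, so the simple clamping here is an assumption.

```python
def decode_node(gene, node_index, n_inputs, functions):
    # gene is (x_gene, y_gene, f_gene, p_gene), all floats in [0.0, 1.0].
    # The x and y genes pick among the nodes that precede this one; the
    # function gene indexes the function list. Illustrative sketch only.
    x_gene, y_gene, f_gene, p_gene = gene
    n_prev = n_inputs + node_index                      # candidate inputs
    x = min(int(x_gene * n_prev), n_prev - 1)
    y = min(int(y_gene * n_prev), n_prev - 1)
    func = functions[min(int(f_gene * len(functions)), len(functions) - 1)]
    return CGPNode(x=x, y=y, func=func, p=p_gene)
```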

In the Cartesian grid, C columns and a single row are used, with all of the nodes arranged along it. The row starts with the input nodes, which are not evolved, and continues with the functional nodes in the order they appear in the genome. Output genes are rounded down to determine the indices of the nodes that produce the final program output. Each node in the graph has an output value, which is initially set to the scalar 0. At each step, the output values of the input nodes are first set to the program inputs; then the function of each program node is computed once, using the outputs of its connected nodes from the previous step as inputs. The scheme of an active graph is given in the image.
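Continuing the illustrative CGPProgram sketch above, one evaluation step could look like the following; this is our reading of the update rule described in the text, not the authors’ code.

```python
def step_program(program, inputs, prev_values):
    # prev_values holds the previous step's output of every node (input nodes
    # first, then functional nodes), all initialised to 0.0 before the first
    # step. Each functional node is computed once per step from the previous
    # step's values, so the graph can carry state between game frames.
    values = list(prev_values)
    values[:program.n_inputs] = inputs        # refresh the input nodes
    new_values = list(values)
    for i, node in enumerate(program.nodes):
        new_values[program.n_inputs + i] = node.func(values[node.x],
                                                     values[node.y],
                                                     node.p)
    program_output = [new_values[i] for i in program.outputs]
    return new_values, program_output
```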

Evolution

At initialization, a random genome is created using G uniformly distributed values in [0.0, 1.0]. This individual is evaluated and is considered the first elite individual. At each generation, λ offspring are generated using genetic mutation. These offspring are each evaluated and, if their fitness is greater than or equal to that of the elite individual, they replace the elite. This process is repeated until Neval individuals have been evaluated; in other words, for Neval / λ generations.
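The loop described above is essentially a (1 + λ) evolution strategy. A compact sketch, with evaluate, random_genome, and mutate left as user-supplied callables:

```python
def evolve(evaluate, random_genome, mutate, lam=10, n_evals=10000):
    # (1 + lambda) loop as described in the text: an offspring replaces the
    # elite when its fitness is greater than or equal to the elite's. Sketch only.
    elite = random_genome()
    elite_fitness = evaluate(elite)
    evaluations = 1
    while evaluations < n_evals:
        offspring = [mutate(elite) for _ in range(lam)]
        for child in offspring:
            child_fitness = evaluate(child)
            evaluations += 1
            if child_fitness >= elite_fitness:
                elite, elite_fitness = child, child_fitness
    return elite, elite_fitness
```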

Evolution Parameters

The genetic mutation operator randomly selects a fraction M of the program’s node genes and sets them to new random values drawn from a uniform distribution in [0.0, 1.0]; a minimal sketch of such a mutation follows the parameter list below. The final parameters used are:

  • C = 40 (40 active nodes)
  • r = 0.1 (connectivity parameter)
  • λ = 10 (number of offspring generated at each generation)
  • M_nodes = 0.1 (node gene mutation rate)
  • M_output = 0.6 (output gene mutation rate)
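Here is a minimal sketch of the mutation operator, under the assumption that the genome is a flat list of floats with node genes first and output genes last; the helper name and layout are ours.

```python
import random

def mutate(genome, n_node_genes, m_nodes=0.1, m_outputs=0.6):
    # Reset a fraction of the node genes and of the output genes to new
    # uniform random values in [0.0, 1.0]; the rest are copied unchanged.
    # Illustrative sketch, not the paper's exact implementation.
    child = list(genome)
    n_out = len(genome) - n_node_genes
    for idx in random.sample(range(n_node_genes),
                             max(1, int(m_nodes * n_node_genes))):
        child[idx] = random.random()
    if n_out > 0:
        for idx in random.sample(range(n_node_genes, len(genome)),
                                 max(1, int(m_outputs * n_out))):
            child[idx] = random.random()
    return child
```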

Playing Atari Games

Scheme: Using CGP to play Atari. Red, green, and blue pixel matrices are input to the evolved program, and evolved outputs determine the final controller action.

Given the genotype and evolution parameters, the CGP approach was evaluated on arcade games in the ALE environment, which exposes 18 actions: directional movements of the controller (8), button pressing (1), no action (1), and controller movement while pressing the button (8). The ALE game screen is provided as the RGB channels of a sequence of images, which form the input to the method. To make results comparable with human players, frame skipping was introduced with a probability of 0.25 to simulate delay or control lag.
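An evaluation loop in this setting could look roughly like the sketch below, assuming a Gym-style Atari environment and an act(observation) callable wrapping the evolved program. We model the 0.25 probability as repeating the previous action, which is one common way such control lag is implemented; that choice is an assumption rather than the paper’s exact setup.

```python
import random

def play_episode(env, act, repeat_prob=0.25):
    # Play one episode; with probability repeat_prob the previous action is
    # repeated instead of the chosen one, simulating human-like control lag.
    # Assumes the classic Gym step() API returning (obs, reward, done, info).
    obs = env.reset()
    total_reward, done, prev_action = 0.0, False, 0
    while not done:
        action = act(obs)
        if random.random() < repeat_prob:
            action = prev_action
        obs, reward, done, _ = env.step(action)
        total_reward += reward
        prev_action = action
    return total_reward
```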

The researchers explain the success on many of the arcade games by the evolution process, which selects individuals based on their overall performance rather than on individual actions. The policy that the agent represents will therefore tend towards actions which, on average, give very good rewards. In the paper, they show that the agents very often employ simple playing strategies that are effective on average.

The Kung-Fu Master crouching approach and the functional graph of the player

In conclusion, the success of Cartesian Genetic Programming on RL tasks is remarkable, and its ability to evolve simple yet effective programs is very clear. This interesting approach raises, once again, the question of whether we need evolutionary algorithms to push the capabilities of AI further.