
Reinforcement Learning for Games

This article assumes that the reader already knows what neural networks are and how they operate.
If you do not know what neural networks are, I recommend trying out the great tutorials at AI Junkie.

Neural networks are often overlooked when considering game AI, largely because they once received a lot of hype that never quite paid off. However, neural networks remain an area of intense research, and numerous learning algorithms have been developed for each of the three basic types of learning: supervised, unsupervised, and reinforcement learning.

Reinforcement learning is the class of learning that allows an agent to improve itself on its own, using only feedback from its environment. This is the class of algorithms we will focus on in this article, which discusses the use of genetic algorithms as well as an algorithm the author has researched for single-agent reinforcement learning. This article assumes that the neural networks are simple integrate-and-fire, non-spiking, sigmoidal-activation neural networks.

Genetic Algorithms


The concept


Genetic algorithms are among the simplest yet most effective reinforcement learning methods. They do have one key limitation, though: they have to operate on multiple agents (AIs). Nevertheless, genetic algorithms can be a great tool for creating neural networks through a process of evolution.

Genetic algorithms belong to the broader family of evolutionary algorithms. Their basic operation proceeds as follows:

1. Initialize a set of genes
2. Evaluate the fitnesses of all genes
3. Mate genes based on how well they performed (performing crossover and mutation)
4. Replace old genes with the new children
5. Repeat steps 2 - 4 until a termination criterion is met

The genes can either be a direct encoding of the AI's traits (in the neural network case, the neuron weights) or an indirect, "generative" encoding. Evaluating the fitnesses is where most of the processing time is spent, since it involves simulating a phenotype built from each genotype to measure how well it completes the task. In the mating step, genes are then selected based on their recorded fitnesses using a selection function. A popular choice is the fitness proportional "roulette wheel" selection function, which randomly chooses genes with probabilities proportional to their fitnesses. Selection functions like this one require all fitnesses to be non-negative, which is why the recorded fitnesses are typically rescaled so that the lowest is always 0.
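
As an illustration, here is a minimal sketch of fitness proportional ("roulette wheel") selection. The function name, the fallback for all-zero fitnesses, and the random engine setup are assumptions for this example; the selection code in the accompanying package may differ.

#include <vector>
#include <random>

// Pick the index of one parent with probability proportional to its fitness.
// Assumes the fitnesses have already been shifted so the lowest value is 0.
size_t RouletteSelect(const std::vector<float> &fitnesses, std::mt19937 &rng)
{
	float total = 0.0f;
	for (float f : fitnesses)
		total += f;

	// If every gene has zero fitness, fall back to a uniform pick
	if (total <= 0.0f)
	{
		std::uniform_int_distribution<size_t> uniform(0, fitnesses.size() - 1);
		return uniform(rng);
	}

	// Spin the wheel: walk the cumulative fitness until the spin point is passed
	std::uniform_real_distribution<float> dist(0.0f, total);
	float spin = dist(rng);
	float accum = 0.0f;

	for (size_t i = 0; i < fitnesses.size(); i++)
	{
		accum += fitnesses[i];
		if (spin <= accum)
			return i;
	}

	return fitnesses.size() - 1; // Guard against floating point round-off
}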

When a pair of parents is selected, their genes are crossed over using a crossover function. The exact function depends on your encoding, but in general you want some traits of each parent to make it into the child without the operation being destructive. After crossover, the genes are mutated (randomly altered slightly) to force the algorithm to explore more of the solution space.
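
For a direct encoding where a gene is simply a flat list of connection weights, crossover and mutation can be as simple as the following sketch. Uniform crossover and Gaussian perturbation are assumptions made for illustration; other operators work as well.

#include <vector>
#include <random>

// Uniform crossover: each weight is copied from one parent or the other at random.
// Assumes both parents have the same number of weights (a direct encoding).
std::vector<float> Crossover(const std::vector<float> &parentA,
	const std::vector<float> &parentB, std::mt19937 &rng)
{
	std::bernoulli_distribution coin(0.5);
	std::vector<float> child(parentA.size());

	for (size_t i = 0; i < child.size(); i++)
		child[i] = coin(rng) ? parentA[i] : parentB[i];

	return child;
}

// Mutation: occasionally nudge a weight by a small Gaussian amount to keep exploring.
void Mutate(std::vector<float> &gene, float mutationRate, float mutationStdDev, std::mt19937 &rng)
{
	std::bernoulli_distribution shouldMutate(mutationRate);
	std::normal_distribution<float> perturbation(0.0f, mutationStdDev);

	for (float &w : gene)
		if (shouldMutate(rng))
			w += perturbation(rng);
}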

Over time, the genotypes will improve, and will often even learn how to exploit faults in whatever system they operate in!

Next, we will discuss a particular gene encoding method for neural networks.

NEAT


The accompanying code runs its genetic algorithm following the NEAT (Neuro-Evolution of Augmenting Topologies) methodology created by Kenneth Stanley. As a result, the neural network genes are encoded by storing connections between neurons as index pairs (indices into the neuron array) along with the associated weights, as well as the bias for each neuron.

This is all the information needed to construct a fully functional neural network from the genes. In addition, both the neuron biases and the connection genes carry a special "innovation number". These numbers are unique: a counter is incremented each time an innovation number is assigned. When network genes are mated, we can then tell whether two connections share a heritage by checking whether their innovation numbers match. Matching genes can be crossed over directly, while genes without a matching innovation number are assigned randomly to the child networks.
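
A minimal sketch of what such a connection gene might look like is shown below. The field names and the global counter are assumptions for illustration, not the exact layout used in the accompanying code.

#include <cstddef>

// One connection gene in a NEAT-style genome: which neuron feeds which,
// the connection weight, and the innovation number used to line genes up
// during crossover.
struct ConnectionGene
{
	size_t m_inputIndex;         // Index of the source neuron in the neuron array
	size_t m_outputIndex;        // Index of the destination neuron
	float m_weight;              // Connection weight
	unsigned m_innovationNumber; // Unique, assigned from a global counter
};

// Global innovation counter: every newly created connection gets the next number,
// so two genes with the same innovation number share a heritage.
unsigned g_nextInnovationNumber = 0;

unsigned NextInnovationNumber()
{
	return g_nextInnovationNumber++;
}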

This description is light on detail; it is only intended to give an overview of how the genetic algorithm included in the software package works.

While this genetic algorithm works very well for many problems, it requires that many agents be simulated at a time rather than a single agent learning on its own. So, we will briefly cover another method of neural network training.

Local Dopamine Weight Update Rule with Weight and Output Traces


The concept


This method may well have been invented already, but so far I have not found a paper describing it. It governs how neuron weights are updated when learning in a single-agent scenario and is entirely separate from network topology selection. As a result, the included software package uses a genetic algorithm to evolve a topology for use with the single-agent reinforcement learning system. Of course, one could also simply grow a neural network by randomly attaching new neurons over time.

I arrived at this technique after a lot of trial and error while looking for a weight update rule that operates using only information available at the neuron/synapse level, which makes it at least biologically plausible. The method uses a reward signal, dopamine, to determine when the network should be rewarded or punished. To make it work, one adds an output trace (a floating point variable) to each neuron, as well as a weight trace to each neuron weight except the bias weight. Beyond that, one only needs the dopamine reward signal itself, which can take any value: positive means reward the network, negative means punish it, and 0 is neutral and does nothing. With this information in place, the neural network weights are updated after every normal update cycle using the following code (here in C++):

// Fire the neuron: squash the accumulated potential through the sigmoid
m_output = Sigmoid(potential * activationMultiplier);

// Leaky accumulation of the neuron's output, remapped from [0, 1] to [-1, 1]
m_outputTrace += -traceDecay * m_outputTrace + 2.0f * m_output - 1.0f;

// Weight update: each connection's trace tracks how much that connection
// contributed to the neuron firing; dopamine then scales the weight change
for(size_t i = 0; i < numInputs; i++)
{
	m_inputs[i].m_trace += -traceDecay * m_inputs[i].m_trace + m_outputTrace * (m_inputs[i].m_pInput->m_outputTrace * 0.5f + 0.5f);
	m_inputs[i].m_weight += alpha * m_inputs[i].m_trace * dopamine;
}

// Bias update: the bias is always eligible, so it uses the output trace directly
m_bias += alpha * m_outputTrace * dopamine;

Here m_output is the output of the neuron, traceDecay is a value in [0, 1] that defines how quickly the network forgets, alpha is the learning rate, and m_inputs is an array of connections.

This code works as follows:

The output trace is simply an average of the neuron's output over time that decays if left untouched. Whenever dopamine is non-zero (that is, the network does not yet have perfect fitness), the weight update moves each weight in the direction that would make the neuron produce its average output more often if dopamine is positive, and less often if dopamine is negative. It does so in proportion to the weight trace, which measures how much a connection has contributed to the firing of the neuron over time and therefore indicates how eligible that weight is for an update. The bias has no weight trace, since it is always eligible for an update.

This method solves the XOR problem with ease (it quickly learns an XOR gate). I have tested it with simple feed-forward neural networks (one hidden layer with 2 neurons), a growing neural cloud, and networks produced by the NEAT algorithm.
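
To give a feel for how the rule is driven, here is a hedged sketch of an XOR training loop. The Network type and its member functions (SetInput, Activate, GetOutput, ApplyReward) are hypothetical stand-ins for whatever network implementation applies the weight update shown above; the actual interface in the package will differ.

#include <cstdlib>
#include <cmath>

// Hypothetical network interface; the real class applying the dopamine
// update rule will look different.
struct Network
{
	void SetInput(int index, float value);
	void Activate();                  // Normal forward update cycle
	float GetOutput(int index) const;
	void ApplyReward(float dopamine); // Runs the trace/weight update shown above
};

void TrainXOR(Network &net, int iterations)
{
	for (int i = 0; i < iterations; i++)
	{
		// Pick a random XOR example
		int a = rand() % 2;
		int b = rand() % 2;
		float target = static_cast<float>(a ^ b);

		net.SetInput(0, static_cast<float>(a));
		net.SetInput(1, static_cast<float>(b));
		net.Activate();

		// Reward the network when it is close to the target, punish it otherwise.
		// Dopamine lands in [-1, 1]: 1 for a perfect answer, -1 for the opposite.
		float error = std::fabs(target - net.GetOutput(0));
		float dopamine = 1.0f - 2.0f * error;

		net.ApplyReward(dopamine);
	}
}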

Use in Games?


These methods may seem like total overkill for games, but they can do things that traditional methods cannot. For instance, with the genetic algorithm, you can create a physics-based character controller like this one.

The animation was not made by an animator; rather, the AI learned how to walk by itself, which results in an animation that reacts to the environment directly. The AI in the video was created using the same software package this article revolves around (linked below).

The second technique can be used to let game characters or enemies learn from experience. For instance, enemies can be assigned a reward based on how close they get to the player, so that they learn to close the distance given a few sensory inputs. The same method can be used in virtual pet games, where you reward or punish a pet to shape its behavior.
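
As a concrete, hedged example of that enemy scenario, the dopamine signal could simply be derived from how the distance to the player changes between updates. The names and the clamping range here are illustrative, not taken from the package.

#include <cmath>

// Compute a dopamine signal for a chasing enemy: positive when the enemy
// closed the distance to the player this frame, negative when it fell behind.
float ProximityReward(float previousDistance, float currentDistance, float scale)
{
	// Reward the change in distance, clamped to [-1, 1] so the learning
	// rate stays in control of the update size
	float reward = (previousDistance - currentDistance) * scale;
	return std::fmax(-1.0f, std::fmin(1.0f, reward));
}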

Using the Code


The software package accompanying this article contains a manual on how to use the code.

The software package can be found at: https://sourceforge.net/projects/neatvisualizers/

Conclusion


I hope this article has provided some insight and inspiration for the use of reinforcement learning neural network AI in games. Now get out there and make some cool game AI!

Article Update Log


2 July 2013: Initial release
14 July 2013: Heavy edits
