■ Reinforcement learning checkers/draughts player for beginners

Whereas some under lockdown and key will discover the foundation of all physics (isn't the answer 42?), I set myself the humbler task of teaching my computer the game of checkers. The irony is that I don't know how to play the game myself, but that is no impediment for reinforcement learning techniques, with which a PC will, after some trial and error, reach a playing strength far exceeding that of your humble narrator.

The game of checkers is the Drosophila of AI experimentation, where every aspiring AI amateur tries to teach the machine to outplay its master. It was one of the first big successes of artificial intelligence, when A. Samuel developed his self-learning checkers player in the 1950s on whatever they called computers back then. Since then it has been tackled countless times by scientists, culminating in its complete solution in 2007, when the Chinook team showed that perfect play by both sides leads to a draw.

So there's no merit in yet another retelling of the checkers AI tale, and I thought I'd take a different tack: use it to teach you, the accidental browser of this blog, the basics of reinforcement learning, which are much simpler than you might think. The full source code for learning and playing checkers with RL is included, and anyone can try it without relying on cloud mega-CPU resources or elaborate AI architectures.

You would think AI takes exceptional intelligence, but it requires little more than high school mathematics, so it is within the grasp of most people with functioning brain cells. In practice it is mostly art, combined with large data sets and massive CPU power; our checkers player needs neither of the latter two. The basic idea of RL is to let the PC play thousands of games against itself, keep track of the actions that lead to victories, and learn to recognise the board states most likely to bring reward rather than penalty. It is the ultimate carrot-and-stick game, and the computer can do it almost on its own, as long as we supply the following:

State transition model. A fancy name for "how the game is played": how the checkers board is set up, which moves are allowed, how you capture the opponent's pieces, etc. It is game-specific and takes up most of the source code.
Board value estimation. How does the PC choose the best move at each point in the game? It must be able to assign a numeric value to each board state and choose the one with the highest value (the estimated probability of winning). This is where most of the math comes in, but at its simplest the value estimator is a linear function of board features (e.g. the number of kings), V(s,w) = w·s, the dot product of a weight vector w with the feature vector s, and the task is to find the weights w that lead to victories for our guy.
Game exploration policy. Here we let the PC loose to play against itself and learn the above value function. Its only feedback is the outcome of the game (win or lose), which is passed back to earlier board positions with a discount, so positions further from the end receive less credit. The partially learned value function V(s,w) is used to make decisions (moves) during learning, with a fixed percentage of purely random (exploratory) moves.
The two opponents share the same value function V: the main player plays for maximum board value, whereas the "other" plays for minimum value (its maximum from its own point of view), aka the minimax principle.
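The three ingredients above can be made a little more concrete with a minimal C++ sketch (C++ being the language of the downloadable source). It is not taken from that source: the board encoding (32 playable squares holding 'm'/'k' for our men and kings, 'M'/'K' for the opponent's, '.' for empty), the three features (bias, men difference, kings difference) and the names extractFeatures, value and chooseMove are all hypothetical. Move generation, i.e. the state transition model, is assumed to have already produced the list of candidate boards, one per legal move.

```cpp
#include <array>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Hypothetical feature vector: {bias, men difference, kings difference}.
using Features = std::array<double, 3>;

// Hypothetical board encoding: 'm'/'k' = our men/kings, 'M'/'K' = opponent's, '.' = empty.
Features extractFeatures(const std::string& board) {
    Features f{1.0, 0.0, 0.0};  // f[0] is a constant bias term
    for (char c : board) {
        if (c == 'm') f[1] += 1.0;
        else if (c == 'M') f[1] -= 1.0;
        else if (c == 'k') f[2] += 1.0;
        else if (c == 'K') f[2] -= 1.0;
    }
    return f;
}

// Linear value estimate V(s,w) = w . s
double value(const Features& s, const Features& w) {
    double v = 0.0;
    for (std::size_t i = 0; i < s.size(); ++i) v += w[i] * s[i];
    return v;
}

// Epsilon-greedy choice among candidate successor boards (must be non-empty).
// The main player maximises V; the opponent minimises the same V (minimax principle).
std::size_t chooseMove(const std::vector<std::string>& candidates, const Features& w,
                       bool maximising, double epsilon, std::mt19937& rng) {
    std::uniform_real_distribution<double> coin(0.0, 1.0);
    if (coin(rng) < epsilon) {  // exploratory move: pick any candidate at random
        std::uniform_int_distribution<std::size_t> pick(0, candidates.size() - 1);
        return pick(rng);
    }
    std::size_t best = 0;
    double bestV = value(extractFeatures(candidates[0]), w);
    for (std::size_t i = 1; i < candidates.size(); ++i) {
        double v = value(extractFeatures(candidates[i]), w);
        if (maximising ? v > bestV : v < bestV) { best = i; bestV = v; }
    }
    return best;
}
```

With epsilon set to 0.0 the choice is purely greedy, which is how you would play after training; during self-play a small epsilon keeps the learner visiting positions its current V(s,w) would never pick.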

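This high school math boils down to finding the weights that best predict the game outcomes, a problem known as least squares estimation. Instead of solving the full optimization problem at once, the weights w are estimated progressively, using the following recursive stochastic semi-gradient descent formula:

$$w \leftarrow w + \alpha \, \big[ T(s) - V(s,w) \big] \, \nabla_w V(s,w)$$

It's euphemistically called semi-gradient because it is only half-correct mathematically: when the target is itself estimated from V, it also depends on w, but that dependence is ignored when taking the derivative!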

Here the target value T(s) is set by each game's outcome, and V(s,w) is our current value prediction, which we try to improve. As for the gradient (derivative), in the case of a linear value function it is simply the feature vector s itself! (With a neural network estimator it is computed by backpropagation.) The multiplier α is the learning rate, one of the many numerical factors we must tweak to ensure the PC learns a good game policy.

I won't pretend it is all too easy, as there are many core techniques to master and solution parameters to tweak by guesswork (trial and error), but it isn't rocket science. There is a free book on reinforcement learning, Reinforcement Learning: An Introduction, by the main man of RL, Richard Sutton (with Andrew Barto), and the complete source code will help with the practical questions that theory books cannot answer.

Click to download checkers AI learner/player (217 KB)

Includes VS6 (C++) source code with no external dependencies, both for training and for an ANN-based player, for your amusement and edification
The FANN library is included for neural network training

Along with the source code you will find a simple DOS (text mode) self-taught player, CHECKERS.EXE, which isn't much to look at but can be operated adequately with the keyboard (kings are shown in green). It became checkers-aware after just 15000 training games. As I don't know how to play checkers, it beats me every time. If you are a better draughts player, please let me know whether it is any good or rubbish. Thank you!

The player becomes harder to beat if you enable rollouts, in which it searches a few moves ahead before deciding on the best move. Just pass the number of look-ahead steps as a command-line argument, e.g. CHECKERS.EXE 4
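Looking a few moves ahead amounts to a depth-limited minimax search, with the learned V(s,w) scoring the positions at the search horizon. The C++ sketch below is hypothetical and does not claim to reproduce how the downloadable player implements its rollouts: an explicit toy game tree (Node) stands in for the real move generator, and each node simply stores its V(s,w) value instead of computing it from board features.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical toy game tree: each node holds V(s,w) for its position and its successors.
struct Node {
    double v;                    // learned value estimate of this position
    std::vector<Node> children;  // positions reachable in one move
};

// Depth-limited minimax: the main player maximises V, the opponent minimises it.
double minimax(const Node& n, int depth, bool maximising) {
    if (depth == 0 || n.children.empty()) return n.v;  // horizon or game over: trust V
    double best = maximising ? -std::numeric_limits<double>::infinity()
                             : std::numeric_limits<double>::infinity();
    for (const Node& c : n.children) {
        double score = minimax(c, depth - 1, !maximising);
        best = maximising ? std::max(best, score) : std::min(best, score);
    }
    return best;
}

// Index of the best move from the root, looking `depth` plies ahead (depth >= 1,
// root must have at least one child). After our move it is the opponent's turn.
std::size_t bestMove(const Node& root, int depth) {
    std::size_t best = 0;
    double bestScore = -std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < root.children.size(); ++i) {
        double score = minimax(root.children[i], depth - 1, false);
        if (score > bestScore) { bestScore = score; best = i; }
    }
    return best;
}
```

In a tree where the first move looks better on its own (V = 5 versus 1) but lets the opponent reply into a position worth -10, a one-ply search picks the first move while a two-ply search avoids it; that is the extra strength the command-line look-ahead buys.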
