26.Jan.2020

■ Cart-Pole control using artificial intelligence


If you have ever tried to balance your umbrella on the palm of your hand, or ridden a Segway self-balancing scooter, you are familiar with the classic inverted pendulum (or cart-pole) control problem. The idea is to move the "cart" so as to keep the pole standing upright on it for as long as possible.

This is a classic control problem that has received endless study from systems and artificial intelligence undergraduates over the years. But I never shy away from reinventing the wheel for personal edification, so here is another cart-pole study which you may find interesting, especially because you can try it out for yourself if you have MATLAB (all the code is included).

If you prefer Python, have a look at the OpenAI Gym CartPole model.

The simplest setting is a one-dimensional cart with an inverted pendulum (pole) mounted on it. The cart can move left and right to jerk the pole back to its upright position whenever it starts to fall. The only available control action is the force applied to the cart to move it left or right. A controller is successful if it keeps the pole upright (within a range of angles) for as long as possible, while the cart remains within a small box in the x-direction.

If, like me, you don't have an actual physical implementation of the system, you can build a first-principles model and use it in a computer simulation (the ODEs are integrated with the Euler method), then apply various control policies to see which works best. The mathematical model most people use for the inverted pendulum is based on Newton's laws of motion.
[Figure: inverted pendulum model]
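
To make this concrete, here is a minimal simulation sketch of the standard frictionless cart-pole model from the literature (Barto et al 1983), integrated with the Euler method. The parameter values and variable names are illustrative, not necessarily the ones used in my downloadable files:

    % Standard frictionless cart-pole model, Euler integration.
    % Parameters are illustrative, not necessarily those in my files.
    g = 9.81; M = 1.0; m = 0.1; l = 0.5;   % gravity, cart mass, pole mass, pole half-length
    F = 10; dt = 0.02;                     % applied force (N), Euler time step (s)
    x = 0; xdot = 0; th = 5*pi/180; thdot = 0;   % initial state: 5 degree tilt

    for k = 1:500
        % Newton's equations of motion for the pole and cart accelerations
        tmp   = (F + m*l*thdot^2*sin(th)) / (M + m);
        thacc = (g*sin(th) - cos(th)*tmp) / (l*(4/3 - m*cos(th)^2/(M + m)));
        xacc  = tmp - m*l*thacc*cos(th)/(M + m);
        % Euler integration step
        x     = x     + dt*xdot;
        xdot  = xdot  + dt*xacc;
        th    = th    + dt*thdot;
        thdot = thdot + dt*thacc;
    end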

The easiest way to keep the pole standing is classic feedback control (PID). Without knowing the exact model, the controller monitors the pole angle and tries to bring it back to 0 (vertical) in the face of various disturbances. I chose the design parameters with a little trial and error, and the result balances the pole nicely, using a smooth, continuous control action.
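
Here is roughly what that looks like in code: a minimal PID sketch on the pole angle, continuing from the simulation loop above. The gains are placeholders for illustration, not my tuned values:

    % Minimal PID sketch on the pole angle (setpoint = 0, vertical).
    % Gains are illustrative placeholders, not my tuned values.
    Kp = 100; Ki = 1; Kd = 20;
    g = 9.81; M = 1.0; m = 0.1; l = 0.5; dt = 0.02;
    x = 0; xdot = 0; th = 5*pi/180; thdot = 0; err_int = 0;

    for k = 1:500
        err = th;                              % deviation from the vertical
        err_int = err_int + err*dt;
        F = Kp*err + Ki*err_int + Kd*thdot;    % PID force: push cart under the falling pole
        % one Euler step of the cart-pole model above
        tmp   = (F + m*l*thdot^2*sin(th)) / (M + m);
        thacc = (g*sin(th) - cos(th)*tmp) / (l*(4/3 - m*cos(th)^2/(M + m)));
        xacc  = tmp - m*l*thacc*cos(th)/(M + m);
        x  = x  + dt*xdot;   xdot  = xdot  + dt*xacc;
        th = th + dt*thdot;  thdot = thdot + dt*thacc;
    end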

Self-taught control using artificial intelligence


There are many ways a computer can teach itself to control the cart-pole system. Most reported solutions use so-called bang-bang control (e.g. see Barto et al., 1983), where the agent is limited to two control actions, one kick to the left and one to the right, of equal magnitude. The system state x is described by 4 variables: the cart position and speed, and the pole angle and angular speed.

One approach is to try out various random decision surfaces on the state using linear weight coefficients: if w*x >= 0 we apply positive force, else we push in the other direction. Surprisingly, this tactic can find a "good" controller after just a few random trials, one that keeps the pole flying for 10 seconds or more, albeit erratically (see the blue curve in Figure 1). We tilt the pole 5 degrees off the vertical and let the random controller recover it. The controller just about manages, but the angle fluctuates widely. Compare this to the smooth response of the PID controller (black curve), which eliminates the angle offset completely after 4-5 seconds of balancing, using traditional feedback control.
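
A sketch of the random search, reusing the model constants from the simulation above. The failure limits (12 degrees, 2.4 m) are the classic ones from the literature and may differ from my files:

    % Random search for a linear bang-bang decision surface (sketch).
    g = 9.81; M = 1.0; m = 0.1; l = 0.5; dt = 0.02; Fmax = 10;
    best_t = 0; best_w = zeros(4,1);
    for trial = 1:100
        w = rand(4,1)*2 - 1;                 % random weights in [-1, 1]
        s = [0; 0; 5*pi/180; 0];             % state: [x; xdot; th; thdot]
        t = 0;
        while abs(s(1)) < 2.4 && abs(s(3)) < 12*pi/180 && t < 20
            F = Fmax*(2*(w'*s >= 0) - 1);    % bang-bang: kick right or left
            tmp   = (F + m*l*s(4)^2*sin(s(3))) / (M + m);
            thacc = (g*sin(s(3)) - cos(s(3))*tmp) / (l*(4/3 - m*cos(s(3))^2/(M + m)));
            xacc  = tmp - m*l*thacc*cos(s(3))/(M + m);
            s = s + dt*[s(2); xacc; s(4); thacc];
            t = t + dt;
        end
        if t > best_t, best_t = t; best_w = w; end   % keep the best performer
    end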

A similar idea is hill-climbing search. Instead of trying purely random weights for the decision surface (force direction), we apply small changes to the weights w and check for improvement (the controller keeping the pole within limits for a longer time). This can be combined with simulated annealing, to explore the optimization problem more robustly and escape from local optima. The resulting controller is a bit better, but not by much (see the MATLAB source files for more details).
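
In code, the hill climbing with annealing might look like this. I assume the episode loop from the previous sketch has been wrapped in a hypothetical function episode_time(w) returning how long the pole stayed up with weights w:

    % Hill climbing with simulated annealing (sketch).
    % episode_time(w) is a hypothetical wrapper around the episode loop above.
    w = rand(4,1)*2 - 1;
    cur_t = episode_time(w);
    T = 1.0;                                  % annealing "temperature"
    for iter = 1:500
        w_new = w + 0.1*randn(4,1);           % small perturbation of the weights
        t = episode_time(w_new);
        if t > cur_t || rand < exp((t - cur_t)/T)
            w = w_new; cur_t = t;             % accept improvements, and occasionally
        end                                   % worse moves to escape local optima
        T = 0.99*T;                           % cool down gradually
    end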
[Figure 1: cart-pole response curves, recovery of a 5 degree starting pole angle]

Perhaps the most successful AI algorithm is reinforcement learning, which I previously explored playing tic-tac-toe and backgammon (and others have used it to defeat the world Go champion, among other feats). In its simplest form (Q-learning), the computer learns the best action for each system state and stores it in a big table. But the cart-pole is a continuous problem, with an infinite number of states, so one approach is to split the range of each continuous state variable into boxes, effectively discretising the state space. Then traditional Q-learning algorithms can be applied. The resulting controller is not ideal (see the red curve in the figure above), but it keeps the angle tighter around vertical. Keep in mind that this controller was designed to keep the pole flying the longest, not to keep it at zero angle.
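
For the curious, here is a minimal sketch of the tabular Q-learning loop. The bin counts, the reward scheme and the helpers box_index() and cartpole_step() are placeholder assumptions of mine, not the exact scheme in the downloadable files:

    % Tabular Q-learning on a "boxes" discretisation of the state (sketch).
    % box_index() and cartpole_step() are hypothetical helpers: the first maps
    % the continuous state to a box number, the second applies one bang-bang
    % kick and integrates one Euler step, reporting failure (pole/cart limits).
    nb = [3 3 6 3];                           % bins per state variable
    Q  = zeros(prod(nb), 2);                  % one row per box, two actions
    alpha = 0.1; gamma = 0.99; epsgr = 0.1;   % learning rate, discount, exploration

    for ep = 1:2000
        sc = [0; 0; (rand - 0.5)*0.1; 0];     % start with a small random tilt
        s  = box_index(sc, nb);
        for k = 1:1000                        % cap the episode length
            if rand < epsgr, a = randi(2); else, [~, a] = max(Q(s,:)); end
            [sc, failed] = cartpole_step(sc, a);
            if failed
                Q(s,a) = Q(s,a) + alpha*(-1 - Q(s,a));   % terminal penalty
                break
            end
            s2 = box_index(sc, nb);
            Q(s,a) = Q(s,a) + alpha*(gamma*max(Q(s2,:)) - Q(s,a));  % zero reward per step
            s = s2;
        end
    end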

There are tons of tweaks you can apply here and there, so if you have MATLAB and nothing better to do on a cold rainy afternoon, download the cart-pole MATLAB files and experiment away for your best inverted pendulum controller!

Click to download cartpole AI matlab project (9KB)


