This is a classic control problem that has received endless study from systems and artificial intelligence undergraduates over the years. But I never shy away from reinventing the wheel for personal edification, so here's another cart-pole study which you may find interesting, especially because you can try it out for yourself if you have MATLAB (all the code is included).
If you prefer Python, have a look at the OpenAI Gym CartPole model.
The simplest setting is a one-dimensional cart with an inverted pendulum (pole) fixed on it. The cart can move left and right to jerk the pole back into its upright position whenever it starts to fall. The only available control action is the force applied to the cart to move it left or right. A controller is successful if it keeps the pole upright (within a range of angles) for as long as possible, while the cart stays within a small box in the x-direction.
If you are like me and don't have an actual physical implementation of the system, you can simulate a first-principles model on a computer (the ODEs are integrated with the Euler method), then apply various control policies to it and see which works best. The mathematical model most people use for the inverted pendulum is based on Newton's laws of motion.
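The post does not reproduce its equations, so here is a minimal sketch in Python (not the author's MATLAB code) of the frictionless cart-pole equations commonly used in the literature, advanced with one explicit Euler step. The masses, pole half-length, and 0.02 s step are assumed values, and the names `derivatives` and `euler_step` are made up for illustration.

```python
import math

# Assumed constants (common textbook values, not taken from the MATLAB files).
GRAVITY = 9.8    # m/s^2
M_CART = 1.0     # kg
M_POLE = 0.1     # kg
HALF_LEN = 0.5   # m, half the pole length
DT = 0.02        # s, Euler step size


def derivatives(state, force):
    """Time derivatives of state = (x, x_dot, theta, theta_dot).

    theta is measured from the vertical, positive when the pole leans
    towards +x. Friction is ignored.
    """
    x, x_dot, theta, theta_dot = state
    total = M_CART + M_POLE
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    temp = (force + M_POLE * HALF_LEN * theta_dot ** 2 * sin_t) / total
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_LEN * (4.0 / 3.0 - M_POLE * cos_t ** 2 / total))
    x_acc = temp - M_POLE * HALF_LEN * theta_acc * cos_t / total
    return x_dot, x_acc, theta_dot, theta_acc


def euler_step(state, force, dt=DT):
    """Advance the state by one explicit Euler step of length dt."""
    return tuple(s + dt * ds for s, ds in zip(state, derivatives(state, force)))
```

Calling `euler_step` repeatedly with zero force from a slightly tilted pole shows the open-loop instability: the angle grows instead of returning to vertical, which is why a controller is needed at all.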
The easiest way to keep the pole standing is feedback control (PID). Without knowing the exact model, the controller monitors the pole angle and tries to bring it back to 0 (= vertical) in the face of various disturbances. I chose the design parameters with a little trial and error, and the resulting controller holds the pole steady using a continuous control action.
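As a rough illustration of the idea, the sketch below applies a PID law to the pole angle only, in Python rather than MATLAB. The gains `kp`, `ki`, `kd` are my own guesses and not the author's tuned parameters, the physical constants are assumed, and the derivative is taken as a finite difference of the measured angle. Nothing here tries to hold the cart position.

```python
import math

# Assumed constants; not taken from the author's MATLAB files.
GRAVITY, M_CART, M_POLE, HALF_LEN, DT = 9.8, 1.0, 0.1, 0.5, 0.02


def euler_step(state, force):
    """One Euler step of the frictionless cart-pole; state = (x, x_dot, theta, theta_dot)."""
    x, x_dot, theta, theta_dot = state
    total = M_CART + M_POLE
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    temp = (force + M_POLE * HALF_LEN * theta_dot ** 2 * sin_t) / total
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_LEN * (4.0 / 3.0 - M_POLE * cos_t ** 2 / total))
    x_acc = temp - M_POLE * HALF_LEN * theta_acc * cos_t / total
    return (x + DT * x_dot, x_dot + DT * x_acc,
            theta + DT * theta_dot, theta_dot + DT * theta_acc)


def simulate_pid(theta0, seconds, kp=50.0, ki=20.0, kd=10.0):
    """Return the pole angle (rad) at every step under angle-only PID control.

    The gains are illustrative guesses. A pole leaning towards +x needs a
    push towards +x, so the force has the same sign as the angle error.
    """
    state = (0.0, 0.0, theta0, 0.0)
    integral, prev_error = 0.0, theta0
    angles = [theta0]
    for _ in range(int(round(seconds / DT))):
        error = state[2]                          # setpoint is 0 (vertical)
        integral += error * DT                    # I term: accumulated error
        derivative = (error - prev_error) / DT    # D term: finite difference
        prev_error = error
        force = kp * error + ki * integral + kd * derivative
        state = euler_step(state, force)
        angles.append(state[2])
    return angles
```

For example, `simulate_pid(math.radians(5), 10.0)` returns the angle trajectory for a 5 degree starting tilt, which can be plotted against the other controllers.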
One approach is to try out various random decision surfaces on the state using linear weight coefficients: if w*x >= 0 we use positive force, else we push in the other direction. Surprisingly, this tactic can find a "good" controller after a few random trials, one that keeps the pole flying for 10 seconds or more, albeit erratically (see the blue curve in the figure). We tilt the pole 5 degrees off the vertical and let the random controller recover it. The controller just manages to do it, but the angle fluctuates widely. Compare this to the smooth response of the PID controller (black curve), which eliminates the angle offset completely after 4-5 seconds of balancing, using traditional feedback control.
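A minimal Python sketch of this random search follows. The failure limits (12 degrees, 2.4 m), the 10 N force magnitude, the weight range, and the names `run_episode` and `random_search` are all assumptions made for illustration; the author's MATLAB files may use different values.

```python
import math
import random

# Assumed constants; not taken from the author's MATLAB files.
GRAVITY, M_CART, M_POLE, HALF_LEN = 9.8, 1.0, 0.1, 0.5
FORCE, DT, MAX_STEPS = 10.0, 0.02, 500          # 500 steps of 0.02 s = 10 s
X_LIMIT, THETA_LIMIT = 2.4, math.radians(12)    # assumed failure limits


def euler_step(state, force):
    """One Euler step of the frictionless cart-pole; state = (x, x_dot, theta, theta_dot)."""
    x, x_dot, theta, theta_dot = state
    total = M_CART + M_POLE
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    temp = (force + M_POLE * HALF_LEN * theta_dot ** 2 * sin_t) / total
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_LEN * (4.0 / 3.0 - M_POLE * cos_t ** 2 / total))
    x_acc = temp - M_POLE * HALF_LEN * theta_acc * cos_t / total
    return (x + DT * x_dot, x_dot + DT * x_acc,
            theta + DT * theta_dot, theta_dot + DT * theta_acc)


def run_episode(w, theta0=math.radians(5)):
    """Number of steps survived (at most MAX_STEPS) by the policy sign(w . state)."""
    state = (0.0, 0.0, theta0, 0.0)
    for step in range(MAX_STEPS):
        push = sum(wi * si for wi, si in zip(w, state))
        state = euler_step(state, FORCE if push >= 0 else -FORCE)
        if abs(state[0]) > X_LIMIT or abs(state[2]) > THETA_LIMIT:
            return step
    return MAX_STEPS


def random_search(n_trials, rng):
    """Draw n_trials random weight vectors and keep the longest-surviving one."""
    best_w, best_steps = None, -1
    for _ in range(n_trials):
        w = tuple(rng.uniform(-1.0, 1.0) for _ in range(4))
        steps = run_episode(w)
        if steps > best_steps:
            best_w, best_steps = w, steps
    return best_w, best_steps
```

A call such as `random_search(100, random.Random(0))` returns the best weights found and how many steps they survived; passing in the generator keeps runs repeatable.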
A similar idea is hill-climbing search. Instead of trying purely random weights for the decision surface (force direction), we apply small changes to the weights w and check for improvement (the controller keeping the pole within limits for a longer time). This can be combined with simulated annealing to explore the optimization problem more robustly and escape from local optima. The resulting controller is a bit better, but not by much (see the MATLAB source files for more details).
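The hill-climbing variant can be sketched in Python as below. This is plain hill climbing with Gaussian perturbations and does not include the simulated annealing step; the perturbation size `sigma`, the failure limits, and the function names are assumptions for illustration, not the author's settings.

```python
import math
import random

# Assumed constants; not taken from the author's MATLAB files.
GRAVITY, M_CART, M_POLE, HALF_LEN = 9.8, 1.0, 0.1, 0.5
FORCE, DT, MAX_STEPS = 10.0, 0.02, 500
X_LIMIT, THETA_LIMIT = 2.4, math.radians(12)


def euler_step(state, force):
    """One Euler step of the frictionless cart-pole; state = (x, x_dot, theta, theta_dot)."""
    x, x_dot, theta, theta_dot = state
    total = M_CART + M_POLE
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    temp = (force + M_POLE * HALF_LEN * theta_dot ** 2 * sin_t) / total
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_LEN * (4.0 / 3.0 - M_POLE * cos_t ** 2 / total))
    x_acc = temp - M_POLE * HALF_LEN * theta_acc * cos_t / total
    return (x + DT * x_dot, x_dot + DT * x_acc,
            theta + DT * theta_dot, theta_dot + DT * theta_acc)


def run_episode(w, theta0=math.radians(5)):
    """Number of steps survived (at most MAX_STEPS) by the policy sign(w . state)."""
    state = (0.0, 0.0, theta0, 0.0)
    for step in range(MAX_STEPS):
        push = sum(wi * si for wi, si in zip(w, state))
        state = euler_step(state, FORCE if push >= 0 else -FORCE)
        if abs(state[0]) > X_LIMIT or abs(state[2]) > THETA_LIMIT:
            return step
    return MAX_STEPS


def hill_climb(w0, n_iters, rng, sigma=0.1):
    """Perturb the current best weights and keep any candidate that is no worse."""
    best_w, best_steps = tuple(w0), run_episode(w0)
    for _ in range(n_iters):
        candidate = tuple(wi + rng.gauss(0.0, sigma) for wi in best_w)
        steps = run_episode(candidate)
        if steps >= best_steps:   # accepting ties lets the search drift across plateaus
            best_w, best_steps = candidate, steps
    return best_w, best_steps
```

Because a candidate is only accepted when it survives at least as long as the current best, the score never decreases, which is also why the search can stall in a local optimum.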
Figure 1: recovery of a 5 degree starting pole angle
Perhaps the most successful AI algorithm is reinforcement learning, which I previously explored by playing tic-tac-toe and backgammon (and which others have used for feats such as defeating the world Go champion). In its simplest form (Q-learning), the computer learns the best action for each system state and stores it in a big table. But the cart-pole is a continuous problem, with an infinite number of states, so one approach is to split the range of each continuous state variable into boxes, effectively discretising the state space. Then traditional Q-learning algorithms can be applied. The resulting controller is not ideal (see the red curve in the above figure), but it keeps the angle tighter around vertical. Keep in mind that this controller was designed to keep the pole flying the longest, not to keep it at zero angle.
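The boxes-plus-table idea can be sketched in Python as follows. The bin edges, the reward scheme (-1 on failure, 0 otherwise), the learning rate, discount, and exploration rate are all assumptions chosen for illustration and are not taken from the author's MATLAB files.

```python
import math
import random
from collections import defaultdict

# Assumed constants; not taken from the author's MATLAB files.
GRAVITY, M_CART, M_POLE, HALF_LEN = 9.8, 1.0, 0.1, 0.5
FORCE, DT, MAX_STEPS = 10.0, 0.02, 500
X_LIMIT, THETA_LIMIT = 2.4, math.radians(12)

# Assumed box boundaries for (x, x_dot, theta, theta_dot).
EDGES = ((-0.8, 0.8), (-0.5, 0.5),
         tuple(math.radians(d) for d in (-6, -1, 0, 1, 6)),
         (math.radians(-50), math.radians(50)))


def euler_step(state, force):
    """One Euler step of the frictionless cart-pole; state = (x, x_dot, theta, theta_dot)."""
    x, x_dot, theta, theta_dot = state
    total = M_CART + M_POLE
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    temp = (force + M_POLE * HALF_LEN * theta_dot ** 2 * sin_t) / total
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_LEN * (4.0 / 3.0 - M_POLE * cos_t ** 2 / total))
    x_acc = temp - M_POLE * HALF_LEN * theta_acc * cos_t / total
    return (x + DT * x_dot, x_dot + DT * x_acc,
            theta + DT * theta_dot, theta_dot + DT * theta_acc)


def bin_index(value, edges):
    """Index of the box containing value: the number of edges it exceeds."""
    return sum(value > e for e in edges)


def discretise(state):
    """Map a continuous state to a tuple of box indices (the table key)."""
    return tuple(bin_index(v, e) for v, e in zip(state, EDGES))


def q_update(q, s, a, reward, s_next, done, alpha=0.5, gamma=0.99):
    """Standard tabular Q-learning update for one transition."""
    target = reward if done else reward + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


def train(episodes, rng, epsilon=0.1):
    """Epsilon-greedy Q-learning; returns the table and each episode's length."""
    q = defaultdict(lambda: [0.0, 0.0])   # two actions: push left (0) or right (1)
    lengths = []
    for _ in range(episodes):
        state = (0.0, 0.0, rng.uniform(-0.05, 0.05), 0.0)
        steps = 0
        for steps in range(1, MAX_STEPS + 1):
            s = discretise(state)
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)               # explore, or break a tie at random
            else:
                a = 0 if q[s][0] > q[s][1] else 1  # exploit the current table
            state = euler_step(state, FORCE if a == 1 else -FORCE)
            failed = abs(state[0]) > X_LIMIT or abs(state[2]) > THETA_LIMIT
            q_update(q, s, a, -1.0 if failed else 0.0, discretise(state), failed)
            if failed:
                break
        lengths.append(steps)
    return q, lengths
```

With a reward of -1 only at failure and a discount below 1, the table learns to postpone failure rather than to hold the angle at zero, which matches the behaviour described above. How quickly the episode lengths improve depends heavily on the chosen boxes and learning parameters.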
There are tons of tweaks you can apply here and there, so if you have MATLAB and nothing better to do on a cold rainy afternoon, download the cart-pole MATLAB files and experiment away for your best inverted pendulum controller!