3 October 2018

■ Improve number crunching performance by switching C++ compiler


Artificial intelligence algorithms are very demanding in terms of CPU resources. Training an artificial neural network (ANN) with backpropagation takes millions of intense arithmetic iterations. AI research diverts good money to cloud computing for training ANNs with TensorFlow and similar deep learning libraries. I, on the other hand, am merely an AI amateur enthusiast. Instead of TensorFlow and losing my head in the clouds, I use a down-to-earth basic backpropagation algorithm and my laptop for some machine learning experiments.

I am trying to teach my computer to play the game of backgammon, and a basic training run of 250,000 games takes almost one hour. Tuning the learning parameters is an "art" (read: nobody knows how to do it properly), so it takes lots of runs to feel the effect of the learning rate, the number of hidden units and suchlike. It therefore becomes imperative to optimize the code and reduce the execution time of a single run of the algorithm.

So I ran the code through the Visual Studio profiler and saw that most of the time was spent in the forward pass of the ANN (predicting outputs), which means a bunch of floating point multiplications. Not much can be done to improve that part of the algorithm. Switching from double to float variables would save space but might make performance worse (or better? the jury is still out on this double/float choice).
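For reference, the hot spot boils down to loops like the following, a simplified sketch of one layer's forward pass (the function and variable names are illustrative, not the actual code):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Logistic squashing function used by the hidden and output units.
double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Forward pass for one layer: each output unit computes a dot product
// of the inputs with its weight row, then squashes the sum.  The inner
// multiply-accumulate loop is where the profiler shows the time going.
std::vector<double> forward(const std::vector<double>& in,
                            const std::vector<std::vector<double>>& w)
{
    std::vector<double> out(w.size());
    for (std::size_t j = 0; j < w.size(); ++j) {
        double sum = 0.0;
        for (std::size_t i = 0; i < in.size(); ++i)
            sum += w[j][i] * in[i];          // the hot multiply-accumulate
        out[j] = sigmoid(sum);
    }
    return out;
}
```

There is little fat to trim here: it is plain multiply-accumulate work, so the compiler's code generation matters a lot.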

PS. I went ahead and tried switching all double variables to float (single precision), and the results are platform dependent:
  • 32 bit: execution time triples; everything must be converted to double and back, so avoid at all costs!
  • 64 bit: a slight improvement, see below.
In x64, using the explicit single-precision expf() in the sigmoid squashing function instead of exp() gave a 4% speed increase! The weight values evolve differently because of the lower precision, but in the stochastic scheme of ANNs that may not be important.

Then it occurred to me that I have many versions of Visual Studio; would that make any difference? Getting a performance improvement just by switching compilers (without changing a line of code) would be fantastic. As it turns out, the Visual Studio 2012 compiler generates number crunching code 20% faster than VS6!

Compiler            Platform       Execution time
Visual Studio 6     x86 (32 bit)   140 sec
Visual Studio 2012  x86 (32 bit)   116 sec
Visual Studio 2012  x64 (64 bit)   113 sec

Table 1. Effect of compiler for running 12,000 training games

In all cases the release version of the code was timed, built for maximum speed (the /O2 compiler switch). Note how the 64 bit code is 2% better than the equivalent 32 bit code; not much, but every little helps! Surprisingly, tweaking other VS 2012 compiler options that look performance-sapping made no detectable difference.
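For the timings, a simple wall-clock harness like the one below does the job (a sketch; the actual measurement code isn't shown here, and std::chrono requires a C++11 compiler, so VS 2012 but not VS6):

```cpp
#include <cassert>
#include <chrono>

// Run a workload and return the elapsed wall-clock time in seconds.
// steady_clock is monotonic, so it is the right choice for benchmarking.
template <class F>
double time_run(F&& run)
{
    auto t0 = std::chrono::steady_clock::now();
    run();                                   // e.g. the 12,000 training games
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}
```

Whatever timer is used, the important part is timing the optimized release build; profiling or timing a debug build says nothing about the numbers in Table 1.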

After this windfall, no-effort gain, I spent a whole day trying to optimize the multiplication loops and matrix storage, and in the end all I managed was to shave off 4-5 seconds, which is a very poor return for the diurnal mental expenditure. The branch predictors and cache prefetchers got the better of me.
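A typical example of the kind of storage tweak tried (an illustration, not the actual code): flattening the weight matrix from a vector of vectors into one contiguous row-major array, so the inner loop walks memory sequentially and the hardware prefetcher can keep up.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Contiguous row-major matrix: element (r, c) lives at index r*cols + c.
// One allocation, one flat walk per row, instead of a pointer chase per row.
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> a;                   // rows * cols doubles

    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), a(r * c) {}

    double&       operator()(std::size_t r, std::size_t c)       { return a[r * cols + c]; }
    const double& operator()(std::size_t r, std::size_t c) const { return a[r * cols + c]; }
};
```

As the paragraph above notes, on modern CPUs the caches and prefetchers already handle the straightforward code well, so hand tweaks like this bought only a few seconds.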

The moral of the story is that my trusty and favorite VS6 is not the first choice for resource-hungry computations. Touché and conceded!


©2002-2018 ZABKAT LTD, all rights reserved