
Backpropagation Neural Networks.

The most popular and powerful type of NN, used in the Cortex software package for technical analysis of stock and FOREX financial markets.

While learning about Neural Networks, I have found two categories of tutorials. The first is written by mathematicians, for mathematicians. To understand it, you need to be used to using your brain in a particular way, which most of the population cannot do. From the user's point of view, this kind of tutorial is not very friendly, as it offers no tools, just the theory. And from the programmer's point of view...

You see, there is a difference between the way a mathematician (x = f(y)) and a programmer (pnInput = GetMovingAverage(pnSensorInput)) think. That difference makes understanding the algorithms a complex task for people who prefer Hungarian Notation. Also, the mathematical solution does not always focus on implementation details, so when the algorithm described in the textbook is translated into a computer program, it usually does not work. And then the programmer is left on his own.

In this article I will talk about one particular type of Neural Network, called the backpropagation network. It is the most popular network for practical applications and a very powerful tool.

I am going to use examples produced by the program that you can find on this site, called Cortex.

This tutorial is NOT just for programmers. Anyone can read and understand it. It's just that I had a choice between mathematical formulas and programming code to describe the algorithms, and the choice was made in favor of programming code. Skip it if you do not understand it.

What is a Neural Network?

The area of Neural Networks probably belongs to the borderline between Artificial Intelligence and Approximation Algorithms. Think of them as algorithms for "smart approximation". NNs are used (to name a few applications) as universal approximators (mapping input to output), tools capable of learning from their environment, tools for finding non-evident dependencies between data, and so on.

Neural Networking algorithms (at least some of them) are modelled after the brain (not necessarily the human brain) and the way it processes information. The brain is a very efficient tool. Despite having a response time about 100,000 times slower than computer chips, it (so far) beats the computer at complex tasks, such as image and sound recognition, motion control and so on. It is also about 10,000,000,000 times more efficient than the computer chip in terms of energy consumption per operation.

The brain is a multi-layer structure (think 6-7 layers of neurons, if we are talking about the human cortex) with 10^11 neurons, a structure that works as a parallel computer capable of learning from the "feedback" it receives from the world and changing its design (think of computer hardware changing while performing a task) by growing new neural links between neurons or altering the activities of existing ones. To make the picture a bit more complete, let's also mention that a typical neuron is connected to 50-100 other neurons, and sometimes to itself, too.

To put it simply, the brain is composed of neurons, interconnected.

Structure of a neuron.

Our "artificial" neuron will have inputs (all N of them) and one output. As you can see, the neuron has:
A set of nodes that connect it to inputs, outputs, or other neurons; these nodes are also called synapses.
A Linear Combiner, which is a function that takes all inputs and produces a single value. A simple way of doing it is by adding together the dInput values (in case you are not a programmer: the "d" prefix means "double", and we use it so that the name (dInput) tells us it holds a floating point number), each multiplied by the corresponding Synaptic Weight dWeight:

```
double dSum = 0.0;
for(int i = 0; i < nNumOfInputs; i++)
    dSum += dInput[i] * dWeight[i];
```

An Activation Function. We do not know what the input will be. Consider this example: the human ear can function near a working jet engine, and at the same time, if it were only ten times more sensitive, we would be able to hear a single molecule hitting the membrane in our ears! What does that mean? It means that the response should not be linear. When the input goes from 0.01 to 0.02, the difference should be comparable to going from 100 to 200.

How do we make the response non-linear? By applying the Activation Function. It will take ANY input from minus infinity to plus infinity and squeeze it into the -1 to 1 or the 0 to 1 interval.
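
For example, one widely used Activation Function (an assumption on my part - the article does not name the exact function Cortex uses) is the logistic sigmoid:

```
#include <cmath>

// Logistic sigmoid: squeezes any input from (-inf, +inf)
// into the 0 to 1 interval.
double Activation(double dSum)
{
    return 1.0 / (1.0 + exp(-dSum));
}
```

For the -1 to 1 interval, the hyperbolic tangent tanh() is the usual alternative.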

Finally, we have a threshold. What should the INTERNAL ACTIVITY of a neuron be when there is no input? Should there be some threshold input before we have any activity? Or should some level of activity be present (in this case it is called a bias rather than a threshold) when the input is zero?

For simplicity, we (as well as the rest of the world) will replace the threshold with an EXTRA input, whose weight can change during the learning process and whose input is fixed and always equal to (-1). The effect, in terms of the mathematical equations, is exactly the same, but the programmer has a little more breathing room ;)
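
A minimal sketch of a Linear Combiner with the threshold folded in as that extra input (the function name and the layout - threshold weight stored last - are my assumptions, not Cortex's):

```
// Linear combiner with the threshold replaced by an extra input.
// dWeight has nNumOfInputs + 1 entries: the last one is the "threshold"
// weight, and its input is always fixed at -1.
double Combine(const double* dInput, const double* dWeight, int nNumOfInputs)
{
    double dSum = 0.0;
    for(int i = 0; i < nNumOfInputs; i++)
        dSum += dInput[i] * dWeight[i];
    dSum += (-1.0) * dWeight[nNumOfInputs];  // the extra "threshold" input
    return dSum;
}
```

During learning, the threshold weight is then adjusted exactly like any other weight.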

A Neural Net.

A single neuron by itself is not a very useful pattern recognition tool. The real power of neural networks comes when we combine neurons into multi-layer structures, called... well... neural networks.

The following image represents a simple neural net. As you can see, there are 3 layers in our network (we can use more, but if we use fewer, we will have a less capable net; making 4 layers is sometimes useful when you are looking for non-evident things, and I have never seen a problem that required 5 layers; for 99 percent of tasks, 3 layers is the best choice). There are N neurons in the first layer, where N equals the number of inputs. There are M neurons in the output layer, where M equals the number of outputs. For example, when you are building a network capable of predicting the stock price, you might want (yesterday's) hi, lo, close and volume as inputs and close as the output.

You may have any number of neurons in the inner (also called "hidden") layers. Just remember that if you have too few, the quality of the prediction will drop, as the net doesn't have enough "brains". And if you have too many, the net will have a tendency to "remember" the right answers rather than predicting them. Then your neural net will work very well on familiar data, but will fail on data that was never presented before. Finding the compromise is more of an art than a science.

Teaching the Neural Net.

The NN receives inputs, which can be a pattern of some kind. In the case of image recognition software, for example, it would be pixels from a photosensitive matrix of some kind; in the case of stock price prediction, it would be the "hi" (input 1), "low" (input 2) and so on.

After a neuron in the first layer has received its input, it applies the Linear Combiner and the Activation Function to the inputs and produces the Output. This output, as you can see from the picture, will become an input (one of them) for the neurons in the next layer. So each layer feeds the data forward to the next layer, and so on, until the last layer is reached.
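
As a sketch (hypothetical types and layout, not the actual Cortex classes), the whole feed-forward pass is just a loop over layers, where each layer's outputs become the next layer's inputs:

```
#include <cmath>
#include <vector>

// One layer = a matrix of weights, one row per neuron; the last weight in
// each row is the threshold, with its input fixed at -1 (an assumed
// layout, not the actual Cortex data structures).
typedef std::vector<std::vector<double> > Layer;

std::vector<double> FeedForward(const std::vector<Layer>& vNet,
                                std::vector<double> vSignal)
{
    for(size_t l = 0; l < vNet.size(); l++)
    {
        std::vector<double> vOut;
        for(size_t n = 0; n < vNet[l].size(); n++)
        {
            const std::vector<double>& vW = vNet[l][n];
            double dSum = (-1.0) * vW.back();            // threshold input
            for(size_t i = 0; i + 1 < vW.size(); i++)
                dSum += vSignal[i] * vW[i];              // linear combiner
            vOut.push_back(1.0 / (1.0 + exp(-dSum)));    // activation
        }
        vSignal = vOut;   // this layer's outputs feed the next layer
    }
    return vSignal;
}
```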

Let's use our example with the stock price. We will try to use yesterday's stock price to predict today's price. Which is the same as using today's price to predict tomorrow's price...

When we work with yesterday's price, we not only know the price for the "day - 1", but also the price we are trying to predict, called the DESIRED OUTPUT of the Neural Net. When we compare the two values, we can compute the Error:
dError = dDesiredOutput - dOutput;

Now we can adjust this particular neuron to work better with this particular input. For example, if the dError is 10% of the dOutput, we can increase all synaptic weights of the neuron by 10%.

The problem with this approach is that the next input will require a different adjustment.

But what if for each pattern we perform a SMALL adjustment in the right direction? To do it, we need to introduce a couple of new variables.

The learning rate. Say we found that for this particular pattern, the adjustment should be 10%. Then we perform the following operation:
dNewWeight = dOldWeight + dAdjustment * dLearningRate;

The learning rate (dLearningRate) reflects the importance of a single pattern. For example, we can set it to 0.01; then it will take 100 patterns to make a 10% adjustment.

Momentum is not something that we NEED, but it can speed up calculations significantly. Consider this: we have 100 patterns, and we notice that each moves us 0.01% towards some value. Wouldn't it be better to move faster, as long as we keep moving in the same direction? Think of the learning rate as acceleration and of momentum as speed.
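
Put together, a single weight update with both variables looks roughly like this (an illustrative sketch; the function and variable names are mine):

```
// One weight update combining learning rate and momentum (illustrative).
// dGrad is the error gradient for this neuron, dInput the value that came
// through this synapse; dOldWeight remembers the previous weight so the
// momentum term can repeat the direction of the last change.
void UpdateWeight(double& dWeight, double& dOldWeight,
                  double dGrad, double dInput,
                  double dLr, double dMomentum)
{
    double dPrev = dWeight;
    dWeight += dMomentum * (dWeight - dOldWeight)  // keep the "speed"
             + dLr * dGrad * dInput;               // one small step
    dOldWeight = dPrev;
}
```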

As the NN learns, the errors will decrease (as the network is getting better) and we will have to adjust the learning rate and momentum. You can download the Cortex software, which does it already ;)

Once we have decided what adjustment we need to apply to the neurons in the output layer, we can backpropagate the changes to the previous layers of the network. Indeed, as soon as we have desired outputs for the output layer, we can make an adjustment to reduce the error (the difference between the output and the desired output). The adjustment will change the weights of the input nodes of the neurons in the output layer.

But the input nodes of the last layer are the OUTPUT nodes of the previous layer! So we have the actual output of the previous layer and its desired output (after correction) - and we can adjust the previous layer of the net! And so on, until we reach the first layer.

If you know C++, consider this example:

```
double dOld;
int i;
for(i = 0; i < m_nInputs - 1; i++)
{
    dOld = m_pdWeights[i];
    if(nLayerType != KH_INPUT_LAYER)
    {
        m_pdWeights[i] = m_pdWeights[i] +
            m_dMomentum * (m_pdWeights[i] - m_pdOldWeights[i])
            + m_dLr * m_dGrad * m_ppFrom[i]->m_dOutput;
    }
    else
    {
        m_pdWeights[i] = m_pdWeights[i] +
            m_dMomentum * (m_pdWeights[i] - m_pdOldWeights[i])
            + m_dLr * m_dGrad * pdIn[i];
    }
    m_pdOldWeights[i] = dOld;
}

// The "extra" node replacing the threshold; its input is always -1
dOld = m_pdWeights[i];
m_pdWeights[i] = m_pdWeights[i] +
    m_dMomentum * (m_pdWeights[i] - m_pdOldWeights[i]) +
    m_dLr * m_dGrad * (-1);
m_pdOldWeights[i] = dOld;
```

Notice the last few lines - we are dealing with the "extra" node that represents the threshold.

Feedforward Backpropagation Algorithm summary.

Initialization.

We need to create a network and to set the synaptic weights to some random values.
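
A minimal sketch of such an initialization (the interval -0.5 to 0.5 and the function name are my choices, not Cortex's):

```
#include <cstdlib>

// Set every synaptic weight to a small random value in (-0.5, 0.5).
// Starting from all identical weights would make the neurons behave
// identically and cripple the learning.
void InitWeights(double* pdWeights, int nNum)
{
    for(int i = 0; i < nNum; i++)
        pdWeights[i] = (double)rand() / RAND_MAX - 0.5;
}
```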

Feed forward the training patterns.

Generally, we want to have two different sets of data, one for "training" and one for "testing". The reason for that is simple - if we only test the NN on the same set of data, that was used for the training, we do not know if it learned to "predict" or to "memorize" the patterns.

Present a training example:

```
double Neuron::Forward(double* pnInput, int nLayerType)
{
    // Start with the "threshold" node: fixed input (-1) times its weight
    double dLinearCombiner = (-1) * m_pdWeights[m_nInputs - 1];

    if(nLayerType != KH_INPUT_LAYER)
        for(int i = 0; i < m_nInputs - 1; i++)
            dLinearCombiner +=
                m_pdWeights[i] * m_ppFrom[i]->m_dOutput;
    else
        for(int i = 0; i < m_nInputs - 1; i++)
            dLinearCombiner += pnInput[i] * m_pdWeights[i];

    m_dOutput = Activation(dLinearCombiner);

    return m_dOutput;
}
```

Back propagation.

After the input pattern was presented to the network and processed by all layers, we have errors (the difference between what we want and what we got) that can be used to adjust the network.

```
double Neuron::Back(double* pdIn, double dDesiredOutput,
    int nNum, int nLayerType, BOOL nDoUpdates)
{
    if(nLayerType == KH_OUTPUT_LAYER)
        m_dError = (dDesiredOutput - m_dOutput);
    else
    {
        // Hidden layers: collect the errors backpropagated
        // by the neurons of the next layer
        double d = 0.0;
        for(int i = 0; i < m_nOutputs; i++)
            d += m_ppTo[i]->m_dError
                * m_ppTo[i]->m_pdWeights[m_pnNodeNumber[i]];
        d /= m_nOutputs;
        m_dError = d;
    }

    m_dGrad = m_dError * m_dOutput * (1 - m_dOutput);

    {
        double dOld;
        int i;
        for(i = 0; i < m_nInputs - 1; i++)
        {
            dOld = m_pdWeights[i];
            if(nLayerType != KH_INPUT_LAYER)
                m_pdWeights[i] = m_pdWeights[i] +
                    m_dMomentum * (m_pdWeights[i]
                    - m_pdOldWeights[i]) +
                    m_dLr * m_dGrad * m_ppFrom[i]->m_dOutput;
            else
                m_pdWeights[i] = m_pdWeights[i] +
                    m_dMomentum * (m_pdWeights[i]
                    - m_pdOldWeights[i]) +
                    m_dLr * m_dGrad * pdIn[i];
            m_pdOldWeights[i] = dOld;
        }

        // The "extra" node replacing the threshold; its input is always -1
        dOld = m_pdWeights[i];
        m_pdWeights[i] = m_pdWeights[i] +
            m_dMomentum * (m_pdWeights[i]
            - m_pdOldWeights[i]) + m_dLr * m_dGrad * (-1);
        m_pdOldWeights[i] = dOld;
    }

    // Fuzzy controller

    ...
}
```

The "fuzzy controller" is used to come up with appropriate values of learning rate and momentum; without it, Neural Networks can be extremely slow, as oscillations, local minima and other obstacles can take a lot of processor time.

Practical considerations.

Regardless of whether you are using the Cortex neural network solution or your own program, there are some things to keep in mind.

Speed.

Neural networks are very fast when implemented as a hardware structure, but not all of them are fast on a computer (which was designed as a non-parallel tool). The smaller the NN is, the faster it works, which is especially important when you work with real-time applications, like voice recognition, for example.

The right choice of data.

You can feed ANYTHING into the NN. And that can become a problem. By providing irrelevant data (is "open" relevant for stock price prediction? How about 100 stock indicators that you can get online? How about currency exchange rates?) you are introducing noise into the system and making the learning more difficult.

Stability of a solution.

When you train a NN, you might see the error decreasing for the learning set of data and increasing for the "test" data. It might be because the network is "overtrained" and is now memorizing the patterns rather than learning to be creative. Or it can be due to some local minimum - in which case the situation may improve as the training continues. See the Cortex tutorial for examples.

When the NN parameters are wrong, or when your algorithm for adjusting weights, learning rate and momentum is not working well, oscillations may happen. Imagine a network that produced an error of -2. And it was adjusted. And the new error is +2. And the next error is -2 again... How long will it take for this system to learn? Very long... On the other hand, if the learning rate and momentum are small, the network parameters will improve towards the best solution, but at a very low speed. It might take hours or days, even on the fastest computer, to optimize such a network for a simple problem.

The solution is to use what I call a Fuzzy Controller, which is basically a way to dynamically adjust the learning rate and momentum depending on the current and past values of the network errors. It is implemented in the Cortex software.
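
The actual Cortex controller is not published in this article. Purely as an illustration, the simplest rule of this family (sometimes called the "bold driver" heuristic) backs off sharply when the error grows and creeps up while it improves:

```
// Illustration only - NOT the Cortex fuzzy controller. Shrink the learning
// rate when the last step made the error worse; grow it slowly while the
// error keeps improving.
double AdjustLearningRate(double dLr, double dError, double dPrevError)
{
    if(dError > dPrevError)
        return dLr * 0.5;    // overshooting - slow down
    return dLr * 1.05;       // still improving - speed up a little
}
```

A fuzzy controller generalizes this idea by grading the adjustment on the history of errors instead of a single hard comparison.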

Can we use our results?

When it comes to practical applications, we need to consider the fact that data are constantly changing. The network that works today may not work tomorrow - so we have to create (teach) a new one. Do we have enough time? Do we know for sure that it will work?

Consider the following example. We created a neural network that produces "buy", "sell" and "hold" signals for the stock market. It worked fine for a few days, and then we decided to teach a new net with the new data. And it works fine - except there is a "hold" where we had a "sell" yesterday. We have already sold our shares, but this new network does not approve - what should we do?

This problem, by the way, has nothing to do with neural computations. If done properly, the approach will work just fine.

Where now?

Download the Cortex package. It has a simple interface and a powerful underlying neural network. You can use it as a data analysing tool (which means - by itself), or - if you are an advanced user - you can teach a NN and then use the result from your own application. You can also use the built-in scripting language to perform some tasks automatically.

How? Cortex comes as three pieces. One is the user interface, another is a scripting language, and the last one is a DLL. After you have taught the NN, you don't need the User Interface - just the DLL.

If you don't know how - read the manual. And if you still don't understand, or simply do not want to concern yourself with DLLs, C++ and other programming issues - keep using the Cortex user interface; it can do everything you need.

As for the scripting language... Imagine that you have data that needs to be converted into a form Cortex can understand. You can do it using the scripting language.

Or imagine that you want to find out the MINIMUM number of neurons you can use in your network. You can do it by hand, trying N=5, N=6, N=7 and so on, waiting for each NN to learn so that you can take a look at the charts it produces. Or you can write a simple script (based on the examples that come with Cortex, so all you need to do is edit existing code), and Cortex will AUTOMATICALLY try all the combinations of parameters you want, while you are doing something else.

Or let's say you want to create and test a complete trading system, one that you can then use for "real" trading. You can do it using Cortex and its scripting language, whether or not the trading system uses Neural Networks.

There are more types of neural networks, some of them very specialized. I might write about them, so keep checking this page.