Thursday, 4 October 2012

Change in Neural Net Training Plans

After having spent the last few days training my NNs and seeing how long it is taking on my new data I have decided to change my training plans. I had been simultaneously training (on two separate computers) my decision tree idea alongside a more "normal" multi-class NN in the hope of eventually comparing the two. However, I anticipate that if I continued with this two pronged approach it would take about a month to finish, and I'd like quicker results than that. Also my attempt to use the hyperbolic tangent activation function hasn't been too successful and I'm not sure whether it's my coding or some deeper theoretical reason why it isn't working satisfactorily. Another reason is that the Coursera Neural Nets for Machine Learning course has just started, the syllabus for which is shown below:-

Lecture 1: Introduction
Lecture 2: The Perceptron learning procedure
Lecture 3: The backpropagation learning procedure
Lecture 4: Learning feature vectors for words
Lecture 5: Object recognition with neural nets
Lecture 6: Optimisation: How to make the learning go faster
Lecture 7: Recurrent neural networks and advanced optimisation
Lecture 8: How to make neural networks generalise better
Lecture 9: Combining multiple neural networks to improve generalisation
Deep Autoencoders (including semantic hashing and image search with binary codes)
Hopfield Nets and Simulated Annealing
Boltzmann machines and the general learning algorithm
Restricted Boltzmann machines and contrastive divergence learning
Applications of Restricted Boltzmann machines to collaborative filtering and document modelling.
Stacking restricted Boltzmann machines or shallow autoencoders to make deep nets.
The wake-sleep algorithm and its contrastive version
Recent applications of generatively pre-trained deep nets
Deep Boltzmann machines and how to pre-train them
Modelling hierarchical structure with neural nets

I think that rather than ploughing on with the training of my decision tree NN it would perhaps be better to finish this course before I get too carried away with myself with new NN ideas; for example, lecture 9, or the "stacking of Boltzman machines," might give me much better insight to the issues involved.

For these reasons I have decided to retrain my "reserve NN" on my enlarged data set with my new feature set, using both computers available to me, whilst I work through the above course. I expect that this reserve NN will be fully trained before the course ends, so then I will be free to experiment with my newly acquired knowledge.

Monday, 1 October 2012

Dominant Cycle Periods in End of Day Data

As part of the training of my neural net classifier it has become necessary, in order to speed up the training process, for me to have knowledge of the relative frequency of occurrences of specific dominant cycle periods. This is due to my having increased the size of my training data to almost five and a half million training/cross validation examples of my feature set. Given the amount of data now involved in training I have to subset it by period and train my neural nets on one period at a time. To decide the period training order I created a histogram of historical, dominant cycle period occurrences in all the commodity contracts/forex pairs I follow for all the data I have. The histogram is shown below.
The bin range is from 10 to 50, with a maximum occurring, the cyan line, at 18. The NN training schedule will follow this order:- 18, 19, 17, 20 etc... working downwards along those bins with the remaining outstanding maximum occurrences.

Looking at this histogram I'm struck by a couple of things: although all the indicators I use are adaptive to the current measured period I'm sure many people use "classical" TA indicators with their recommended default values, for example the 14 day RSI or 20 day Bollinger Band. The above histogram shows how "hit and miss" this can be. The 20 day Bollinger Band would seem to be OK with its look back parameter approximately covering the full cyclic period of the most frequently occurring periods in the data, but the RSI? The theoretical optimum for the RSI is half a period, which at 14 is a cyclic period of 28 days, the red line in the histogram. Not so good! And what about the Commodity Channel Index (CCI)? Its creator originally recommended a one third period look back, which based on the histogram above would imply a default setting of 6 or 7. Just goes to show what value there is in some very basic statistical analysis.