Wednesday, 24 September 2025

Expressing an Indicator in Neural Net form, Part 2.

Following on from my previous post, I have been experimenting with various tweaks to the basic set-up of expressing an indicator in the form of a neural net, shown again below to avoid the necessity of having readers flip between posts. 

The tweaks explored relate to the weight matrices, activation functions, targets and loss functions, the end results of which are now briefly summarized in turn.

The weight matrices are sparse, with shared weights within each separate matrix, because they simply calculate the 4 indicator outputs, which represent the indicator and 3 lagged versions of it and hence share the same weights. The input weights matrix is a 40 by 20 matrix with only 10 unique parameters to train/update (expressed during training by averaging across the 40 non-zero weights in the matrix). Similarly, the output weights matrix has only 1 unique trainable weight. Together with the bias units (not shown in the diagram above) and the 4 decision weights, the whole neural net has a total of 16 trainable weights.
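As a concrete illustration of this sort of weight sharing, here is a minimal NumPy sketch (the blog's own code is Octave, but the idea is language-independent). The tie pattern, names and shapes below are my own toy assumptions, not the indicator's actual sparsity pattern: the full 40 by 20 matrix is built from 10 unique parameters, and during training the dense gradient is averaged over each group of tied positions.

```python
import numpy as np

n_unique = 10
rng = np.random.default_rng(0)

# tie_index[i, j] = k means W[i, j] shares unique weight k; -1 marks a
# structurally zero entry of the sparse matrix. The placement below is
# a toy assumption standing in for the 4 lagged copies of the indicator.
tie_index = -np.ones((40, 20), dtype=int)
for lag in range(4):                      # 4 lagged outputs share weights
    for k in range(n_unique):
        tie_index[lag * 10 + k, lag * 5 + k % 5] = k

unique_w = rng.normal(size=n_unique)

def expand(unique_w, tie_index):
    """Build the full sparse weight matrix from the unique parameters."""
    W = np.zeros(tie_index.shape)
    mask = tie_index >= 0
    W[mask] = unique_w[tie_index[mask]]
    return W

def tied_gradient(dL_dW, tie_index, n_unique):
    """Average the dense gradient over each group of tied positions."""
    g = np.zeros(n_unique)
    for k in range(n_unique):
        g[k] = dL_dW[tie_index == k].mean()
    return g
```

One update step then expands the 10 unique weights into the full matrix for the forward pass, and collapses the full gradient back down to 10 averaged values for the parameter update.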

The activation units are exactly as discussed in my previous post and shown in the diagram above, namely tanh on the 2 hidden layers and a sigmoid on the final output layer.

For the targets I used the sign of the slope of a sliding, centred smooth of the OHLC bar opens, on the premise that the first trade opportunity would be acted upon at the opening price following a buy/sell signal calculated on the close of a bar. The smoothing eliminates the whipsaws that would result from a single adverse return in an otherwise smooth run of returns.
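A minimal NumPy sketch of how such targets could be built, assuming a hypothetical smoothing span of 5 bars (the post does not state one):

```python
import numpy as np

def centred_smooth_targets(opens, span=5):
    """Sign of the slope of a centred moving average of the bar opens.
    span is an assumed smoothing length; edge bars are handled by
    repeating the first/last open so the smooth stays centred."""
    opens = np.asarray(opens, dtype=float)
    kernel = np.ones(span) / span
    pad = span // 2
    padded = np.pad(opens, pad, mode='edge')
    smooth = np.convolve(padded, kernel, mode='valid')  # centred MA
    slope = np.gradient(smooth)          # centred first difference
    return np.sign(slope)                # +1 long, -1 short targets
```

Because the smooth is centred, these targets peek ahead by a couple of bars, which is fine for training labels but means they cannot be computed in real time at the right edge of the data.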

The loss is a combination of a weighted cross-entropy loss and a Sortino ratio loss. The weights for the weighted cross-entropy loss are simply the absolute values of the returns, the idea being that it is more important to get the direction right on the big returns. The Sortino loss is implemented by minimizing the negative of the Sortino ratio, i.e. maximizing the ratio. There are 2 versions of this Sortino loss: one with an analytical gradient that spreads the cost over each training sample per training epoch, similar to the cross-entropy loss, and a global loss which uses finite differences over the 4 output decision weights. Both of these Sortino losses were coded with the help of deepseek-talkai. The two losses are mixed via a mixing parameter, Lambda, which varies from 0 to 1, with 0 being a pure cross-entropy loss and 1 being a pure Sortino loss.
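A sketch of how the two losses and their mixing could look. The exact downside-deviation convention of the Sortino ratio and the 0/1 target encoding are my assumptions, not taken from the post:

```python
import numpy as np

def weighted_cross_entropy(p, y, returns):
    """Cross-entropy between sigmoid outputs p and 0/1 targets y,
    weighted by |return| so the big moves matter more."""
    w = np.abs(returns)
    eps = 1e-12
    ce = -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    return np.sum(w * ce) / np.sum(w)

def negative_sortino(strategy_returns, target=0.0):
    """Negative Sortino ratio: minimising this maximises the ratio.
    Downside deviation is the RMS of below-target excess returns."""
    excess = strategy_returns - target
    downside = np.minimum(excess, 0.0)
    dd = np.sqrt(np.mean(downside ** 2)) + 1e-12
    return -np.mean(excess) / dd

def mixed_loss(p, y, returns, strategy_returns, lam):
    """lam = 0 -> pure cross-entropy, lam = 1 -> pure Sortino loss."""
    return (1 - lam) * weighted_cross_entropy(p, y, returns) \
           + lam * negative_sortino(strategy_returns)
```

The analytical-gradient version would differentiate `negative_sortino` through each sample's contribution, while the global version would perturb the 4 decision weights by finite differences and re-evaluate it.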

The following .gif shows the training data equity curves for each type of loss (see the titles of each plot) for every hundredth epoch up to epoch 600 over a week's worth of 10 minute forex data limited to the trading hours described in my previous post. The legend in the top left corner identifies each equity curve.

Obviously these plots show that it is possible to improve results over those of the original indicator (shown in red in the above .gif), but of course these are in-sample results. My next post will be about cross validation and out-of-sample test results.
 

Wednesday, 3 September 2025

Expressing an Indicator in Neural Net Form

Recently I started investigating relative rotation graphs with a view to perhaps implementing a version of them for use on forex currency pairs. The underlying idea of a relative rotation graph is to plot an asset's relative strength compared to a benchmark, along with the momentum of this relative strength, in a form similar to a polar coordinate plot which rotates around a zero point representing zero relative strength and zero relative strength momentum. After some thought it occurred to me that rather than using relative strength against a benchmark I could use the underlying relative strengths of the individual currencies, as calculated by my currency strength indicator, against each other. Furthermore, these underlying log changes in the individual currencies can be normalised using the ideas of Brownian motion and then averaged together over different look-back periods to create a unique indicator.
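One plausible reading of the Brownian motion normalisation is to scale the cumulative log change over each look-back by the square root of the look-back length, i.e. the sqrt-of-time scaling of a Brownian range, before averaging. The look-back lengths and the exact scaling in this NumPy sketch are my assumptions, not the post's actual calculation:

```python
import numpy as np

def brownian_normalised_strength(log_changes, lookbacks=(10, 20, 40)):
    """Rolling sum of log changes over each look-back n, scaled by
    sigma * sqrt(n) (the expected scale of a Brownian excursion over n
    steps), then averaged across the look-backs."""
    x = np.asarray(log_changes, dtype=float)
    sigma = np.std(x) + 1e-12            # per-bar volatility estimate
    out = np.zeros_like(x)
    for n in lookbacks:
        cum = np.convolve(x, np.ones(n), mode='full')[:len(x)]  # rolling sum
        out += cum / (sigma * np.sqrt(n))  # sqrt-of-time normalisation
    return out / len(lookbacks)
```

The sqrt(n) divisor puts the different look-backs on a comparable scale, so averaging them does not let the longest look-back dominate.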

This indicator was coded up in the "traditional" way that will be familiar to anybody who has ever tried coding a trading indicator in any coding language. However, I thought that if the calculation of this indicator could be expressed in the form of a feedforward neural network, then the optimisation opportunities available using backpropagation and regularization could be used to tweak the indicator in more useful ways than just varying a look-back length and the amount of averaging. After some work I was able to express the indicator in just these two lines of Octave code:

% feed-forward pass: tanh hidden layer, then tanh output layer
indicator_middle_layer = tanh( full_feature_matrix * input_weights ) ;
indicator_nn_output = tanh( indicator_middle_layer * output_weights ) ;

Of course, prior to calling these two lines of code there is some feature engineering to create the full_feature_matrix input, and the input_weights and output_weights matrices taken together are mathematically equivalent to the original indicator calculations. Finally, because this is a neural net expression of the indicator, the non-linear tanh activation function is applied to the hidden middle layer and the output layer of the net.

The following plot shows the original indicator in black and the neural net version of it in blue

over the data shown in this plot of 10 minute bars of the EURUSD forex pair.

The red indicator in the plot above is the 1 bar momentum of the neural net indicator plot.

To judge the quality of this indicator I used an entropy measure (calculated over 14,000+ 10 minute bars of EURUSD), the results of which are shown next.

entropy_indicator_original = 0.9485
entropy_indicator_nn_output = 0.9933
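The post does not define its entropy measure, but one common choice is the Shannon entropy of the histogrammed indicator values, normalised by log(bins) so that a perfectly uniform distribution of readings scores 1.0. A NumPy sketch under that assumption:

```python
import numpy as np

def normalised_entropy(values, bins=20):
    """Shannon entropy of the histogrammed values, divided by log(bins)
    so a perfectly uniform distribution scores exactly 1.0. The bin
    count is an arbitrary choice here."""
    counts, _ = np.histogram(values, bins=bins)
    p = counts / counts.sum()
    p = p[p > 0]                         # 0 * log(0) treated as 0
    return -np.sum(p * np.log(p)) / np.log(bins)
```

Under this measure a reading near 1.0 means the indicator spends roughly equal time at all of its output levels rather than saturating at its extremes.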

An entropy reading of 0.9933 is probably as good as any trading indicator could hope to achieve (a perfect reading is 1.0), and so the next thing was to quickly back test the indicator's performance. Based on the logic of the indicator, the obvious long (short) signals are being above (below) the zero line, or equivalently the sign of the indicator, and for good measure I also tested the sign of the momentum and some simple moving averages thereof.

The following plot shows the equity curves of this quick test, where it is visually clear that the blue equity curves are "the best" when plotted in relation to the black "buy and hold" equivalent equity curve. These represent the equity curves of a 3 bar simple moving average of the 1 bar momentum of both the original formulation of the indicator and the neural net implementation. I would point out that these equity curves represent the theoretical equity resulting from trading the London session after 9:00am BST, stopping at 7:00am EST (usually about noon BST), and then trading again from 9:00am EST until 5:00pm EST. This schedule avoids the opening sessions (7:00 to 9:00am) in both London and New York because, from my observations of many OHLC charts such as the one shown above, there are frequently wild swings where price is being pushed to significant points such as volume profile clusters, accumulations of buy/sell orders etc., and in my opinion no indicator can be expected to perform well in that sort of environment. Also avoided are the hours prior to 7:00am BST, i.e. the Asian or overnight session.

Although these equity curves might not, at first glance, be that impressive, especially as they do not account for trading costs etc., my intent in doing these tests was to determine the configuration of a final "decision" output layer to be added to the neural net implementation of the indicator. A 3 bar simple moving average of the 1 bar momentum implies the necessity of including 4 consecutive readings of the indicator output as input to this final "decision" layer. The following simple, hand-drawn sketch shows what I mean:
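The arithmetic behind needing exactly 4 consecutive readings: the 3 bar SMA of the 1 bar momentum telescopes to (x_t - x_{t-3}) / 3, so it depends only on the current reading and the reading 3 bars back. A quick NumPy check:

```python
import numpy as np

x = np.array([0.2, 0.5, 0.1, 0.4, 0.9])   # toy indicator readings

mom = np.diff(x)                           # 1 bar momentum
sma3 = np.convolve(mom, np.ones(3) / 3, mode='valid')  # 3 bar SMA of it

# The sum of three consecutive 1-bar differences telescopes, so the
# decision layer only ever sees 4 consecutive indicator readings:
telescoped = (x[3:] - x[:-3]) / 3
assert np.allclose(sma3, telescoped)
```

This is why the decision layer in the sketch takes the indicator output and its 3 lags as its 4 inputs.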
A discussion of this will be the subject of my next post.