- I intend to use the indicator as the target function for future neural net training
- the indicator represents a reward to risk ratio, which indirectly reflects price action itself, but without the noise of said action
- this reward to risk ratio is of much more direct concern, from a trading perspective, than accurately predicting price
- since the indicator is now included as a feature in the matching algorithm, testing the indicator is, very indirectly, a test of the matching algorithm too

This shows two sampling distributions of the mean for Long MFE/MAE indicator values > 0.5, the upper pane for a sample size of 20 and the lower pane for a sample size of 75. For simplicity I shall only discuss the Long > 0.5 version of the indicator, but everything that follows applies equally to the Short version. As expected, the upper pane shows greater variance, and for the envisioned test a whole series of these sampling distributions will be produced for different sample sizes. The way I intend it to work is as follows:

- take a single bar in the history and see what the value of the MFE/MAE indicator is 3 bars later (assume > 0.5 for this exposition, so we compare to long sampling distributions only)
- get the top 20 matched bars for the above selected bar and the corresponding 20 indicator values for 3 bars later and take the mean of these 20 indicator values
- check if this mean falls within the sampling distribution of the mean of 20, as shown in the upper pane above by the vertical black line at 0.8 on the x axis. If it does fall within the sampling distribution, we do not reject the null hypothesis that the future indicator values of the 20 best historical matches and the value of the indicator after the bar to be matched come from the same distribution
- repeat the preceding step for the means of the top 21, 22, ... etc. matches until such time as the null hypothesis can be rejected, as shown in the lower pane above. At this point, we then declare an upper bound on the historical number of matches for the bar to be predicted
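The steps above can be sketched roughly as follows. This is only an illustration in Python with NumPy, not the code actually used: the function names, the bootstrap construction of the sampling distributions, the two-sided 95% acceptance region and the assumption that `matched_values` is already ordered from best match to worst are all my own choices for the sake of the example.

```python
import numpy as np

def sampling_distribution_of_mean(population, n, n_draws=5000, rng=None):
    """Bootstrap the sampling distribution of the mean for samples of size n.

    population: historical indicator values (e.g. all Long values > 0.5).
    """
    rng = np.random.default_rng(rng)
    draws = rng.choice(population, size=(n_draws, n), replace=True)
    return draws.mean(axis=1)

def upper_bound_on_matches(population, matched_values, n_start=20,
                           alpha=0.05, seed=0):
    """Largest N for which the mean of the best N matches is not rejected.

    matched_values: future indicator values of the matched bars,
    assumed to be ordered from best match to worst.
    Returns None if the null is already rejected at n_start.
    """
    rng = np.random.default_rng(seed)
    matched_values = np.asarray(matched_values, dtype=float)
    last_ok = None
    for n in range(n_start, len(matched_values) + 1):
        dist = sampling_distribution_of_mean(population, n, rng=rng)
        # central (1 - alpha) region of the sampling distribution
        lo, hi = np.quantile(dist, [alpha / 2, 1 - alpha / 2])
        mean_n = matched_values[:n].mean()
        if not (lo <= mean_n <= hi):
            break  # null hypothesis rejected at this N
        last_ok = n
    return last_ok
```

With real data, `population` would be the historical indicator values on the relevant side (Long > 0.5 here) and `matched_values` the indicator values 3 bars after each matched bar.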

In the accompanying chart, the cyan and red lines are the +/- 2 standard deviation limits around a notional mean value for the whole distribution of approximately 0.85, and the chart can be considered to be a type of control chart. The upper and lower control lines converge towards the right, reflecting the decreasing variance of increasingly large N sample means, as shown in the first chart above. The green line represents the cumulative N sample mean of the best N historical matches' future values. I have shown it as decreasing because it is to be expected that, as more matches are included, there is a greater chance that incorrect matches, unexpected price reversals etc. will be caught up in this mean calculation, resulting in the mean value moving into the left tail of the sampling distribution. This effect combines with the shrinking variance to reach a critical point (rejection of the null hypothesis) at which the green line exits below the lower control line.
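The control chart view can also be sketched numerically. The snippet below is a minimal illustration, assuming the control lines are set analytically at the mean +/- 2 standard errors, i.e. `mu +/- 2 * sigma / sqrt(N)`, with `mu` and `sigma` taken as known values for the whole distribution; the function name and this analytic form of the limits are assumptions made for the example rather than a description of the actual implementation.

```python
import numpy as np

def first_exit_below_lower_limit(matched_values, mu, sigma, n_sigmas=2.0):
    """Find the first N at which the cumulative mean drops below the lower line.

    matched_values: future indicator values of matches, best match first.
    mu, sigma: assumed mean and standard deviation of the whole distribution.
    Returns (first_exit_N or None, cumulative means, lower line, upper line).
    """
    values = np.asarray(matched_values, dtype=float)
    n = np.arange(1, len(values) + 1)
    cum_mean = np.cumsum(values) / n          # the "green line"
    half_width = n_sigmas * sigma / np.sqrt(n)  # shrinks as N grows
    lower = mu - half_width
    upper = mu + half_width
    below = np.flatnonzero(cum_mean < lower)
    first_exit = int(n[below[0]]) if below.size else None
    return first_exit, cum_mean, lower, upper
```

The upper bound on the number of matches would then be one less than the returned exit point, since that is the last N at which the green line is still inside the control lines.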

The purpose of all the above is to provide a principled manner to choose the number N of matches from the Cauchy-Schwarz matching algorithm to supply instances of training data to the envisioned neural net training. An incidental benefit of this approach is that it is indirectly a hypothesis test of the fundamental assumption underlying the matching algorithm, namely that past price action has predictive ability for future price action, and furthermore, it is a test of the MFE/MAE indicator. Discussion of the results of these tests will follow in a future post.