This post is about the selection of the ucl and lcl multipliers. The tests performed so far have established that all the data for any single sine wave period can be aggregated into one overall distribution for that period. The shape of each distribution, as seen in the density plots, suggests to me that taking the extreme values of each distribution is acceptable: the tails drop off so sharply that the extremes are not far removed from values that would chop off the tails but still encompass the vast majority of the data in that distribution (we are talking here about differences of only a few hundredths in the multiplier values). So that is exactly what I have done: aggregated all the distributions, taken the maximum and minimum values of each, and plotted them against the sine wave period on the x-axis (plot shown above).
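As a rough illustration of the aggregation step described above, here is a minimal Python sketch. It is not the code used to produce the plot: the function name `extremes_by_period`, the dictionary layout, the period range of 10 to 50 and the randomly generated placeholder data are all assumptions made purely for the example.

```python
import numpy as np


def extremes_by_period(samples_by_period):
    """Return {period: (minimum, maximum)} for the pooled samples of each period.

    `samples_by_period` is a hypothetical mapping from a sine wave period to
    all of the multiplier values aggregated for that period.
    """
    return {
        period: (float(np.min(values)), float(np.max(values)))
        for period, values in sorted(samples_by_period.items())
    }


# Placeholder data only: random numbers standing in for the real test results.
rng = np.random.default_rng(0)
samples_by_period = {
    period: rng.uniform(-1.0, 1.0, size=500) for period in range(10, 51)
}

extremes = extremes_by_period(samples_by_period)
periods = list(extremes)                      # x-axis values for the plot
minima = [extremes[p][0] for p in periods]    # lower curve
maxima = [extremes[p][1] for p in periods]    # upper curve
```

The `periods`, `minima` and `maxima` lists are then in the right shape to be plotted against each other with whatever plotting tool is to hand.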
The thing that strikes me when looking at this plot is that there seems to be a natural upper and lower boundary that is consistent across all periods. This is quite fortuitous, as it means it will not be necessary to write a complicated function with numerous if statements to check the period and apply a unique multiplier value for that period; it will suffice to apply the same upper and lower boundary values regardless of period. For the time being at least, I have decided to set these boundary values at the 0.025 and 0.975 quantile levels because
- "Quantiles are useful measures because they are less susceptible to long-tailed distributions and outliers. Empirically, if the data you are analyzing are not actually distributed according to your assumed distribution, or if you have other potential sources for outliers that are far removed from the mean, then quantiles may be more useful descriptive statistics than means and other moment-related statistics."
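To make the quantile idea concrete, the following is a minimal sketch of how a single pair of boundary values might be computed. It assumes, and this is only one possible reading of the approach, that the quantiles are taken over one pooled array of multiplier values; the function name `quantile_bounds` and the example data are hypothetical.

```python
import numpy as np


def quantile_bounds(values, lower_q=0.025, upper_q=0.975):
    """Return the (lower, upper) quantile values of a pooled sample.

    Assumption: `values` holds every multiplier value of interest pooled into
    one array; which values get pooled is a choice this sketch leaves open.
    """
    data = np.asarray(values, dtype=float)
    lower, upper = np.quantile(data, [lower_q, upper_q])
    return float(lower), float(upper)


# Illustrative data only: evenly spaced values plus one far-removed outlier.
clean = np.arange(0, 1001)
with_outlier = np.append(clean, 1_000_000)

clean_bounds = quantile_bounds(clean)
outlier_bounds = quantile_bounds(with_outlier)
```

Comparing `clean_bounds` with `outlier_bounds` shows the property the quotation describes: the single extreme value barely moves the quantile boundaries, whereas it would replace the maximum outright.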
The next thing to do is write this function and test it.