Wednesday, 7 March 2012

Robust Repeated Median

Following on from a quick post to this Trading Blox forum thread I provide the C++ .oct function code for my implementation of the Robust Repeated Median in the code box below. Due to formatting issues the headers are not shown; they should be:-
#include octave/oct.h
#include octave/dColVector.h
#include octave/parse.h // necessary for the call to feval
#include octave/ov.h // necessary for conversion to double
enclosed within a header brace e.g. "<>"
DEFUN_DLD (my_rep_median, args, , "Input is a column vector of values, output is a repeated median straight line fit of these values.")
{
    octave_value_list retval_list;

    if ( args(0).length () < 3 )
    {
        error ( "Invalid arguments. Input is a column vector of values" );
        return retval_list;
    }
    
    if ( error_state )
    {
        error ( "Invalid arguments. Input is a column vector of values" );
        return retval_list;
    }
    
    ColumnVector rep_median_compile_input = args(0).column_vector_value (); // input vector
    ColumnVector rep_median_compile_output = rep_median_compile_input; // copy of input vector to hold repeated median straight line fit

    octave_value_list median_calc_return; // declare an octave value to hold output of feval call to Octave median function
    double slope; // declare a double value to hold final value of slope
    double intercept; // declare a double value to hold final value of intercept
    ColumnVector i_slopes ( args(0).length () - 1 ); // array to hold individual point-to-point slopes
    ColumnVector i_intercepts ( args(0).length () - 1 ); // array to hold individual point-to-point intercepts
    ColumnVector slope_medians ( args(0).length () ); // array to hold medians of non-matched point-to-point slopes
    ColumnVector intercept_medians ( args(0).length () ); // array to hold medians of non-matched point-to-point intercepts
    int jj_tmp_count; // non-matched point-to-point count for filling above arrays (necessary because of jj != ii loop condition)

    for (octave_idx_type ii (0); ii < args(0).length (); ii++) 
    {

      jj_tmp_count = 0; // initialise to 0 for each separate ii loop

        for (octave_idx_type jj (0); jj < args(0).length (); jj++)
        {
             if (jj != ii) // condition avoids matching a point to itself!
             {
                i_slopes(jj_tmp_count) = ( rep_median_compile_input(jj) - rep_median_compile_input(ii) ) / ( (jj) - (ii) ); // slope between distinct points

                i_intercepts(jj_tmp_count) =  rep_median_compile_input(jj) - ( i_slopes(jj_tmp_count) * jj ); // calc intercept b = y - mx

                jj_tmp_count = jj_tmp_count + 1; // increment counter only when condition avoids matching a point to itself!
             }
        }                
 
                median_calc_return = feval ( "median", octave_value (i_slopes) );
                slope_medians(ii) = median_calc_return(0).double_value ();

                median_calc_return = feval ( "median", octave_value (i_intercepts) );
                intercept_medians(ii) = median_calc_return(0).double_value ();

    }    

        median_calc_return = feval ( "median", octave_value (slope_medians) );
        slope = median_calc_return(0).double_value (); 

        median_calc_return = feval ( "median", octave_value (intercept_medians) );
        intercept = median_calc_return(0).double_value (); 

    for (octave_idx_type ii (0); ii < args(0).length (); ii++) // loop to fill output column_vector with repeated_median line fit values
    {
        rep_median_compile_output(ii) = slope * ii + intercept;
    }

    retval_list(2) = intercept;
    retval_list(1) = slope;
    retval_list(0) = rep_median_compile_output;

    return retval_list;
}
For more information about the Robust Repeated Median, interested readers are referred to this webpage and this pdf file. Taking the example given in this linked pdf, the above function gives this plot
where it can be seen that the regression line is completely unaffected by the two outliers. As always, if readers see any errors in my code or can suggest improvements, please let me know.

No comments: