Matrix Science Mascot Parser toolkit
 

The Shapiro-Wilk W test.

#include <ms_shapiro_wilk.hpp>

Public Member Functions

 ms_shapiro_wilk ()
 
 ms_shapiro_wilk (std::deque< std::pair< size_t, double > > x, long n, long n1, long n2)
 The Shapiro-Wilk W test.
 
void appendSampleValue (double y)
 Add a new sample value to the list to be tested.
 
void calculate (long n, long n1, long n2)
 Calculate results using values previously added using appendSampleValue().
 
void clearSampleValues ()
 Clear current vector of X values.
 
double getErrorCode () const
 Returns the error code from the Shapiro-Wilk W test calculation.
 
double getPValue () const
 Returns the p-value for the Shapiro-Wilk W statistic.
 
double getResult () const
 Returns the Shapiro-Wilk W statistic.
 

Static Public Member Functions

static void swilk (bool init, std::deque< std::pair< size_t, double > > x, long n, long n1, long n2, std::deque< double > &a, double &w, double &pw, int &ifault)
 AS R94: Calculate the Shapiro-Wilk W test statistic and p-value directly.
 

Detailed Description

The Shapiro-Wilk W test.

Testing for normality

Testing for outliers and reporting a standard deviation for the protein ratio can only be performed if the peptide ratios are consistent with a sample from a normal distribution (in log space). If the peptide ratios do not appear to come from a normal distribution, this may indicate that the values are meaningless and that something went systematically wrong with the analysis. On the other hand, it may indicate something interesting, such as peptides that have been mis-assigned and actually come from two proteins with very different ratios, so that the distribution is bimodal. Whether a test passes or fails must be interpreted on a case-by-case basis.
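As an illustration of the "in log space" step, the sketch below converts raw peptide ratios into sorted (index, log-ratio) pairs, which is the container shape used by the constructor and by swilk(). The helper name toSortedLogRatios is hypothetical and not part of Mascot Parser. The natural logarithm is an assumption; since W is unchanged by rescaling the data, the choice of log base should not affect the result.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <deque>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Mascot Parser): log-transform peptide
// ratios and sort them in increasing order, keeping each original index.
std::deque<std::pair<std::size_t, double>>
toSortedLogRatios(const std::vector<double>& ratios)
{
    std::deque<std::pair<std::size_t, double>> out;
    for (std::size_t i = 0; i < ratios.size(); ++i) {
        // A logarithm is only defined for positive ratios (this also rejects NaN).
        if (!(ratios[i] > 0.0)) {
            throw std::invalid_argument("peptide ratios must be positive");
        }
        // first = original position, second = sample value in log space.
        out.emplace_back(i, std::log(ratios[i]));
    }
    // The test expects the sample values in increasing order.
    std::sort(out.begin(), out.end(),
              [](const std::pair<std::size_t, double>& a,
                 const std::pair<std::size_t, double>& b) {
                  return a.second < b.second;
              });
    return out;
}
```

The index in each pair makes it possible to trace a value back to the peptide it came from after sorting.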

Shapiro-Wilk W test

In the Shapiro-Wilk W test, the null hypothesis is that the sample is taken from a normal distribution. This hypothesis is rejected if the p-value for the test statistic W is less than 0.05. The routine used is valid for sample sizes between 3 and 2000.
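For reference, the statistic has the standard textbook form shown below. This is the general definition, not a description of the internals of this implementation. Here x_(i) is the i-th smallest sample value, x-bar is the sample mean, and the coefficients a_i are derived from the expected values and covariances of standard normal order statistics (Royston's algorithms use approximations for them).

```latex
W = \frac{\left( \sum_{i=1}^{n} a_i \, x_{(i)} \right)^{2}}
         {\sum_{i=1}^{n} \left( x_i - \bar{x} \right)^{2}},
\qquad 0 < W \le 1
```

Values of W close to 1 are consistent with normality; small values of W lead to a small p-value and hence to rejection of the null hypothesis.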

Source code for the Shapiro-Wilk W test algorithm

References:

  1. Royston, J. P. (1982): An Extension of Shapiro and Wilk's W Test for Normality to Large Samples. Journal of the Royal Statistical Society Series C (Applied Statistics) 31(2):115-124.
  2. Royston, J. P. (1982): Algorithm AS 181: The W Test for Normality. Journal of the Royal Statistical Society Series C (Applied Statistics) 31(2):176-180.
  3. Royston, P. (1995): Remark AS R94: A Remark on Algorithm AS 181: The W-test for Normality. Journal of the Royal Statistical Society Series C (Applied Statistics) 44(4):547-551.
  4. Algorithms AS R94, AS 66 and AS 241 from StatLib: http://lib.stat.cmu.edu/

Constructor & Destructor Documentation

◆ ms_shapiro_wilk() [1/2]

Default constructor.

◆ ms_shapiro_wilk() [2/2]

ms_shapiro_wilk ( std::deque< std::pair< size_t, double > >  x,
long  n,
long  n1,
long  n2 
)

The Shapiro-Wilk W test.

Note
This constructor can only be called from C++. For other languages, use the appendSampleValue() and calculate() functions.

After calling this constructor, check for any error using getErrorCode(), and then check whether getPValue() returns a value greater than 0.05 to determine whether the sample is consistent with a normal distribution.

Parameters
x   is a list of the sample values in increasing order. The first (optional) item in the pair is typically used for an index. This value is not used by the algorithm, but is provided as a convenience. The second item in the pair is the sample value.
n   is the total sample size (including any right-censored values).
n1  is the number of uncensored cases (n1 <= n).
n2  is the integer part of n/2.
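A minimal sketch of the construct-then-check sequence described above. It is written as a template so that it does not depend on how the toolkit headers are included; ShapiroWilk is assumed to be ms_shapiro_wilk or any type with the same constructor and getters. The function name consistentWithNormal is hypothetical, and the sketch assumes there are no right-censored values (n1 = n).

```cpp
#include <cstddef>
#include <deque>
#include <stdexcept>
#include <string>
#include <utility>

// Sketch only: ShapiroWilk is assumed to expose the constructor, getErrorCode()
// and getPValue() documented on this page.
template <class ShapiroWilk>
bool consistentWithNormal(
    const std::deque<std::pair<std::size_t, double>>& sortedValues,
    double alpha = 0.05)
{
    const long n  = static_cast<long>(sortedValues.size());
    const long n1 = n;      // assumption: no right-censored values
    const long n2 = n / 2;  // integer part of n/2

    ShapiroWilk test(sortedValues, n, n1, n2);

    // Codes 0 and 2 mean that W and the p-value were calculated.
    const int code = static_cast<int>(test.getErrorCode());
    if (code != 0 && code != 2) {
        throw std::runtime_error("Shapiro-Wilk test failed, error code " +
                                 std::to_string(code));
    }
    // The null hypothesis (normality) is not rejected when p > alpha.
    return test.getPValue() > alpha;
}
```

A call would look like consistentWithNormal<ms_shapiro_wilk>(values), with values already sorted in increasing order.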

Member Function Documentation

◆ appendSampleValue()

void appendSampleValue ( double  x)

Add a new sample value to the list to be tested.

Add a new sample value to the list to be tested when calling calculate(). Sample values must be added in increasing order.

Parameters
x   is the sample value to be added to the list.
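Because values must be appended in increasing order, a caller will usually sort first. The sketch below shows one way to do that before calling calculate(). The function name loadAndCalculate is hypothetical; ShapiroWilk is assumed to be ms_shapiro_wilk or a type with the same member functions, and the sample is assumed to contain no censored values.

```cpp
#include <algorithm>
#include <vector>

// Sketch only: sort a copy of the values, append them one by one, then
// run the calculation for an uncensored sample.
template <class ShapiroWilk>
void loadAndCalculate(ShapiroWilk& test, std::vector<double> values)
{
    // appendSampleValue() expects increasing order.
    std::sort(values.begin(), values.end());
    for (double v : values) {
        test.appendSampleValue(v);
    }

    const long n = static_cast<long>(values.size());
    // n1 = n (assumption: no censored cases), n2 = integer part of n/2.
    test.calculate(n, n, n / 2);
}
```

After this call, getErrorCode() should be checked before getResult() or getPValue() is used.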

◆ calculate()

void calculate ( long  n,
long  n1,
long  n2 
)

Calculate results using values previously added using appendSampleValue().

◆ clearSampleValues()

void clearSampleValues ( )

Clear current vector of X values.

If multiple calls to calculate() are to be made for different sample values, then call this function before calling appendSampleValue().
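The reuse pattern described above might look like the following sketch, which tests several groups of values with a single object. The function name pValuesPerGroup is hypothetical; ShapiroWilk is assumed to be ms_shapiro_wilk or a type with the same member functions. Each group is assumed to be sorted already and to contain no censored values. Reporting a failed calculation as NaN is a choice made for this example, not toolkit behaviour.

```cpp
#include <cmath>
#include <limits>
#include <vector>

// Sketch only: run the test once per group, clearing the stored sample
// values before each new group is appended.
template <class ShapiroWilk>
std::vector<double> pValuesPerGroup(
    ShapiroWilk& test,
    const std::vector<std::vector<double>>& sortedGroups)
{
    std::vector<double> pValues;
    for (const auto& group : sortedGroups) {
        test.clearSampleValues();  // discard values from the previous group
        for (double v : group) {
            test.appendSampleValue(v);
        }
        const long n = static_cast<long>(group.size());
        test.calculate(n, n, n / 2);

        // Codes 0 and 2 mean that a p-value was calculated.
        const int code = static_cast<int>(test.getErrorCode());
        pValues.push_back((code == 0 || code == 2)
                              ? test.getPValue()
                              : std::numeric_limits<double>::quiet_NaN());
    }
    return pValues;
}
```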

◆ getErrorCode()

double getErrorCode ( ) const

Returns the error code from the Shapiro-Wilk W test calculation.

Possible error codes are:

  • 0 for no error
  • 1 if n1 < 3
  • 2 if n > 5000 (a non-fatal error, but the accuracy of the p-value is not guaranteed in this case)
  • 3 if n2 < n/2
  • 4 if n1 > n or (n1 < n and n < 20)
  • 5 if the proportion censored (n - n1)/n > 0.8
  • 6 if the data have zero range
  • 7 if the sample values are not sorted in increasing order
  • 8 if an error is returned from ppnd7 (which should never occur in normal operation)
Returns
error code
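For logging or diagnostics, the codes listed above can be mapped to messages with a small helper such as the one sketched here. Both function names are hypothetical and not part of Mascot Parser; the message texts simply restate the list above. Since getErrorCode() is declared as returning double, a caller would cast the result to int first.

```cpp
#include <string>

// Hypothetical helper: human-readable text for each documented error code.
std::string describeShapiroWilkError(int code)
{
    switch (code) {
        case 0: return "no error";
        case 1: return "n1 < 3";
        case 2: return "n > 5000 (non-fatal; p-value accuracy not guaranteed)";
        case 3: return "n2 < n/2";
        case 4: return "n1 > n, or n1 < n and n < 20";
        case 5: return "proportion censored (n - n1)/n > 0.8";
        case 6: return "data have zero range";
        case 7: return "sample values not sorted in increasing order";
        case 8: return "error return from ppnd7";
        default: return "unknown error code";
    }
}

// W and the p-value are calculated only for codes 0 and 2.
bool shapiroWilkResultsAvailable(int code)
{
    return code == 0 || code == 2;
}
```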

◆ swilk()

void swilk ( bool  init,
std::deque< std::pair< size_t, double > >  x,
long  n,
long  n1,
long  n2,
std::deque< double > &  a,
double &  w,
double &  pw,
int &  ifault 
)
static

AS R94: Calculate the Shapiro-Wilk W test statistic and p-value directly.

Translated to C++ from the F77 version of AS R94 in StatLib.

Royston, P. (1995): Remark AS R94: A Remark on Algorithm AS 181: The W-test for Normality. Journal of the Royal Statistical Society Series C (Applied Statistics) 44(4):547-551.

In the F77 version of this function, the w parameter could be used to alter the function's behaviour. That feature has not been retained here.

Parameters
[in]      init    If false, initialise the scratch vector a.
[in]      x       Sample values sorted in increasing order.
[in]      n       Total sample size (usually x.size() cast to long).
[in]      n1      Sample size less censored cases (n1 <= n; often n1 = x.size()).
[in]      n2      (long)(n/2)
[in,out]  a       Scratch vector used by the algorithm.
[out]     w       The Shapiro-Wilk W statistic calculated from the data.
[out]     pw      The p-value of the statistic under the null hypothesis.
[out]     ifault  Error code, documented in getErrorCode(). If 0 or 2, then both w and pw were calculated. Otherwise an error occurred.
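A direct call to the static function could be wrapped as in the sketch below. The struct SwilkResult and the function runSwilk are hypothetical names. ShapiroWilk is assumed to be ms_shapiro_wilk or a type with a static swilk() of the same signature. The sketch passes an empty scratch vector with init set to false, on the assumption that swilk() then sizes and fills a itself; the required size of a is not stated here, so this should be verified against the toolkit. No censored values are assumed.

```cpp
#include <cstddef>
#include <deque>
#include <utility>

// Hypothetical result holder for the three output parameters.
struct SwilkResult {
    double w;    // Shapiro-Wilk W statistic
    double pw;   // p-value under the null hypothesis
    int ifault;  // error code, as documented for getErrorCode()
};

// Sketch only: call the static swilk() for an uncensored, sorted sample.
template <class ShapiroWilk>
SwilkResult runSwilk(
    const std::deque<std::pair<std::size_t, double>>& sortedValues)
{
    const long n = static_cast<long>(sortedValues.size());
    std::deque<double> a;  // assumption: initialised by swilk() when init is false
    SwilkResult r{0.0, 0.0, 0};

    ShapiroWilk::swilk(false,         // init: false requests initialisation of a
                       sortedValues,  // sample values in increasing order
                       n,             // total sample size
                       n,             // n1 = n (no censored cases)
                       n / 2,         // n2 = integer part of n/2
                       a, r.w, r.pw, r.ifault);
    return r;
}
```

The fields w and pw are meaningful only when ifault is 0 or 2.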
