The Shapiro-Wilk W test.
#include <ms_shapiro_wilk.hpp>
Public Member Functions | |
ms_shapiro_wilk () | |
ms_shapiro_wilk (std::deque< std::pair< size_t, double > > x, long n, long n1, long n2) | |
The Shapiro-Wilk W test. | |
void | appendSampleValue (double y) |
Add a new sample value to the list to be tested. | |
void | calculate (long n, long n1, long n2) |
Calculate results using values previously added using appendSampleValue(). | |
void | clearSampleValues () |
Clear current vector of X values. | |
double | getErrorCode () const |
Returns the error code for the Shapiro-Wilk W test. | |
double | getPValue () const |
Returns the P-value for the Shapiro-Wilk W-statistic. | |
double | getResult () const |
Returns the Shapiro-Wilk W-statistic. | |
Static Public Member Functions | |
static void | swilk (bool init, std::deque< std::pair< size_t, double > > x, long n, long n1, long n2, std::deque< double > &a, double &w, double &pw, int &ifault) |
AS R94: Calculate the Shapiro-Wilk W test statistic and p-value directly. | |
The Shapiro-Wilk W test.
Testing for outliers and reporting a standard deviation for the protein ratio can only be performed if the peptide ratios are consistent with a sample from a normal distribution (in log space). If the peptide ratios do not appear to come from a normal distribution, this may indicate that the values are meaningless and that something went systematically wrong with the analysis. On the other hand, it may indicate something interesting, such as peptides that have been mis-assigned and actually come from two proteins with very different ratios, so that the distribution is bimodal. Interpretation of test success or failure must be done on a case-by-case basis.
In the Shapiro-Wilk W test, the null hypothesis is that the sample is taken from a normal distribution. This hypothesis is rejected if the P-value for the test statistic W is less than 0.05. The routine used is valid for sample sizes between 3 and 2000.
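The decision rule above can be sketched as two small helpers. This is an illustration only: the names `sampleSizeSupported` and `rejectsNormality` are hypothetical and not part of this class, and the size limits are assumed to be inclusive.

```cpp
#include <cstddef>

// Sample-size limits quoted in the description (assumed inclusive).
constexpr std::size_t kMinSampleSize = 3;
constexpr std::size_t kMaxSampleSize = 2000;

// Hypothetical helper: true if the routine is documented as valid for n values.
inline bool sampleSizeSupported(std::size_t n) {
    return n >= kMinSampleSize && n <= kMaxSampleSize;
}

// Hypothetical helper: true if the null hypothesis of normality is rejected
// at significance level alpha (0.05 in the description above).
inline bool rejectsNormality(double pValue, double alpha = 0.05) {
    return pValue < alpha;
}
```

A P-value at or above the threshold does not prove normality; it only means the test found no evidence against it.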
ms_shapiro_wilk | ( | ) |
Default constructor.
ms_shapiro_wilk | ( | std::deque< std::pair< size_t, double > > | x, |
long | n, | ||
long | n1, | ||
long | n2 | ||
) |
The Shapiro-Wilk W test.
After calling this constructor, check for any error using getErrorCode(), then check that getPValue() returns a value greater than 0.05 to determine whether the sample is consistent with a normal distribution.
x | is a list of the sample values in increasing order. The first (optional) item in the pair is typically used for an index. This value is not used by the algorithm, but provided as a convenience. The second item in the pair is the sample value. |
n | is the total sample size (including any right-censored values). |
n1 | is the number of uncensored cases (n1 <= n). |
n2 | is the integer part of n/2. |
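As a rough sketch of how the four arguments might be prepared from unsorted data, the helper below sorts the values, keeps each value's original position in the first item of the pair, and derives n, n1 and n2. The struct `SwilkInput` and the function `makeSwilkInput` are hypothetical names used only for illustration, and the sketch assumes there are no censored values (so n1 equals n).

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

// Hypothetical bundle of the constructor arguments described above.
struct SwilkInput {
    std::deque<std::pair<std::size_t, double> > x; // (original index, value)
    long n;   // total sample size
    long n1;  // number of uncensored cases
    long n2;  // integer part of n/2
};

// Compare pairs by sample value only; the index is carried along unchanged.
inline bool bySampleValue(const std::pair<std::size_t, double>& a,
                          const std::pair<std::size_t, double>& b) {
    return a.second < b.second;
}

// Build the arguments from unsorted values, assuming no censoring.
inline SwilkInput makeSwilkInput(const std::vector<double>& values) {
    SwilkInput in;
    for (std::size_t i = 0; i < values.size(); ++i) {
        in.x.push_back(std::make_pair(i, values[i]));
    }
    std::stable_sort(in.x.begin(), in.x.end(), bySampleValue);
    in.n = static_cast<long>(in.x.size());
    in.n1 = in.n;      // assumption: every value is uncensored
    in.n2 = in.n / 2;  // integer division gives the integer part of n/2
    return in;
}
```

The fields would then be passed in order as x, n, n1 and n2.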
void appendSampleValue | ( | double | x | ) |
Add a new sample value to the list to be tested.
The value is included in the next call to calculate(). Sample values must be added in increasing order.
x | is the sample value to be added to the list. |
void calculate | ( | long | n, |
long | n1, | ||
long | n2 | ||
) |
Calculate results using values previously added using appendSampleValue().
void clearSampleValues | ( | ) |
Clear current vector of X values.
If multiple calls to calculate() are to be made for different sample values, then call this function before calling appendSampleValue().
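The clear, append, calculate sequence described above might look like the following sketch. It is written as a template so that it compiles without the library header; `Tester` is assumed to be ms_shapiro_wilk or any type with the same member functions. `NormalityOutcome` and `runNormalityTest` are hypothetical names, and the sketch assumes no censored values.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical holder for the three getters documented on this page.
struct NormalityOutcome {
    double errorCode;
    double w;
    double pValue;
};

// Runs one test on `values`, reusing an existing tester object.
template <typename Tester>
NormalityOutcome runNormalityTest(Tester& tester, std::vector<double> values) {
    // Values must be appended in increasing order.
    std::sort(values.begin(), values.end());

    // Discard anything left over from a previous calculation.
    tester.clearSampleValues();
    for (std::vector<double>::size_type i = 0; i < values.size(); ++i) {
        tester.appendSampleValue(values[i]);
    }

    const long n = static_cast<long>(values.size());
    tester.calculate(n, n, n / 2);  // n1 == n: assumes no censoring

    NormalityOutcome out;
    out.errorCode = tester.getErrorCode();
    out.w = tester.getResult();
    out.pValue = tester.getPValue();
    return out;
}
```

Calling the function repeatedly with the same tester object and different data follows the pattern described for clearSampleValues().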
double getErrorCode | ( | ) | const |
Returns the error code for the Shapiro-Wilk W test.
Possible error codes are:
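static void swilk | ( | bool | init, |
std::deque< std::pair< size_t, double > > | x, | ||
long | n, | ||
long | n1, | ||
long | n2, | ||
std::deque< double > & | a, | ||
double & | w, | ||
double & | pw, | ||
int & | ifault | ||
) | static |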
AS R94: Calculate the Shapiro-Wilk W test statistic and p-value directly.
Translated to C++ from the F77 version of AS R94 in StatLib.
Royston, P. (1995): Remark AS R94: A Remark on Algorithm AS 181: The W-test for Normality. Journal of the Royal Statistical Society Series C (Applied Statistics) 44(4):547-551.
In the F77 version of this function, the w parameter could be used to alter the function's behaviour. That feature has not been retained here.
[in] | init | If false, initialise the scratch vector a. |
[in] | x | Sample values sorted in increasing order. |
[in] | n | Total sample size (usually x.size() cast to long). |
[in] | n1 | Sample size less censored cases (n1 <= n; often n1 = x.size() ). |
[in] | n2 | (long)(n/2) |
[in,out] | a | Scratch vector used by the algorithm. |
[out] | w | The Shapiro-Wilk W statistic calculated from the data. |
[out] | pw | The P-value of the statistic under the null hypothesis. |
[out] | ifault | Error code, documented in getErrorCode(). If 0 or 2, then both w and pw were calculated. Otherwise an error occurred. |
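The note on ifault can be expressed as a one-line check. The helper name `swilkResultsAvailable` is hypothetical; it only encodes the statement above that codes 0 and 2 leave both w and pw calculated.

```cpp
// Hypothetical helper: true if w and pw were both calculated,
// which the parameter description states is the case for codes 0 and 2.
inline bool swilkResultsAvailable(int ifault) {
    return ifault == 0 || ifault == 2;
}
```

For any other code, the values of w and pw should not be relied upon.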