Protein abundance in one component relative to another in a quantitation experiment, derived from a sample of peptide ratios.
#include <ms_protein_quant_ratio.hpp>
Public Member Functions
ms_protein_quant_ratio (const ms_protein_quant_ratio &src) | |
Copy constructor. | |
ms_protein_quant_ratio (const std::string &accession, int dbIdx, const std::string &ratioName) | |
ms_protein_quant_ratio (const std::string &accession, int dbIdx, const std::string &ratioName, double value, double stdev, double stderror, double hypothesisPvalue, double normalityPvalue, unsigned int sampleSize) | |
~ms_protein_quant_ratio () | |
Destructor. | |
void | copyFrom (const ms_protein_quant_ratio *src) |
Copies all content from another instance of the class. | |
std::string | getAccession () const |
Return the protein accession associated with this ratio. | |
ms_peptide_quant_key_vector | getActiveKeys () const |
Return the peptide quant keys used in calculating the protein ratio. | |
int | getDB () const |
Return the protein database index associated with this ratio. | |
ms_peptide_quant_key_vector | getExcludedKeys () const |
Return peptide quant keys associated with the protein accession that were excluded from the protein ratio calculation. | |
double | getHypothesisPvalue () const |
Return the p-value of this protein ratio under the null hypothesis. | |
double | getHypothesisPvalue (std::string &whyUnavailable) const |
Return the p-value of this protein ratio under the null hypothesis. | |
double | getNormalityPvalue () const |
Return the p-value of the hypothesis that the peptide ratios were (log)normally distributed. | |
double | getNormalityPvalue (std::string &whyUnavailable) const |
Return the p-value of the hypothesis that the peptide ratios were (log)normally distributed. | |
ms_peptide_quant_key_vector | getOutlierKeys () const |
Return peptide quant keys associated with the protein accession that were excluded due to being outliers. | |
std::string | getRatioName () const |
Return the name of the ratio. | |
unsigned int | getSampleSize () const |
Return the size of the sample of peptide ratios used in deriving this protein ratio. | |
ms_peptide_quant_key_vector | getSkippedKeys () const |
Return peptide quant keys associated with the protein accession that were skipped due to being negative or infinite. | |
double | getStandardDeviation () const |
Return the sample standard deviation of the protein ratio, calculated from the sample of peptide ratios. | |
double | getStandardDeviation (std::string &whyUnavailable) const |
Return the sample standard deviation of the protein ratio, calculated from the sample of peptide ratios. | |
double | getStandardError () const |
Return the standard error of the protein ratio, calculated from the standard deviation of the sample. | |
double | getStandardError (std::string &whyUnavailable) const |
Return the standard error of the protein ratio, calculated from the standard deviation of the sample. | |
double | getValue () const |
Return the numerical value of the ratio. | |
bool | isLogNormal (double threshold=0.05) const |
Boolean flag: do the peptide ratios used in calculating this protein ratio appear to be (log)normally distributed? | |
bool | isMissing () const |
Boolean flag: is this ratio undefined ("missing")? | |
bool | isSignificant (double threshold=0.05) const |
Boolean flag: does the protein ratio differ statistically significantly from the value expected under the null hypothesis? | |
bool | operator!= (const ms_protein_quant_ratio &right) const |
C++ inequality. | |
ms_protein_quant_ratio & | operator= (const ms_protein_quant_ratio &right) |
C++ assignment operator. | |
bool | operator== (const ms_protein_quant_ratio &right) const |
C++ equality. | |
Protein abundance in one component relative to another in a quantitation experiment, derived from a sample of peptide ratios.
A protein ratio in Mascot is calculated from a sample of peptide ratios, where the peptides form a subset of identified peptides associated with a given protein accession; not all identified peptides have ratios. Protein ratios are usually calculated by ms_quantitation and classes derived from it.
Each protein ratio must be associated with a particular protein accession, database index and ratio name. The database index refers to databases used in a Mascot search, and can be 0 if the protein accession does not arise from a Mascot search. The ratio name is the name of a report ratio in a particular quantitation experiment. It can be any free-form text defined in the quantitation method, for example "115/114" or "control/diseased".
This class is merely a container of numerical values. It does not record how the protein ratio was calculated from peptide ratios, nor what kind of statistical significance testing was done, if any.
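The following conventions are followed:
- A ratio whose sample size would be 0 is undefined ("missing"): isMissing() returns true, and the values returned by the other methods are undefined.
- If the sample size is 1, the standard deviation and standard error are 0.
- A standard deviation, standard error or p-value of -1 means that the value is unavailable; the overloads that take a whyUnavailable argument report the reason.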
ms_protein_quant_ratio (const std::string &accession, int dbIdx, const std::string &ratioName)
Constructor for an undefined ("missing") protein ratio.
accession | The protein accession associated with this ratio. |
dbIdx | The database index associated with this ratio, usually related to databases used in a Mascot search. |
ratioName | The name of the ratio, e.g. "115/114". |
ms_protein_quant_ratio (const std::string &accession, int dbIdx, const std::string &ratioName, double value, double stdev, double stderror, double hypothesisPvalue, double normalityPvalue, unsigned int sampleSize)
accession | The protein accession associated with this ratio. |
dbIdx | The database index associated with this ratio, usually related to databases used in a Mascot search. |
ratioName | The name of the ratio, e.g. "115/114". |
value | The numerical value of this ratio. Usually ratios should be positive, but it is possible to give 0 or a negative value. Interpretation of such values is the caller's responsibility. |
stdev | The sample standard deviation of the ratio, based on the peptide ratios. Use 0 if sample size is 1, and -1 if it is unavailable. |
stderror | The standard error of the protein ratio, based on the standard deviation and sample size. Use 0 if sample size is 1, and -1 if it is unavailable. |
hypothesisPvalue | The p-value of a hypothesis test about this protein ratio. Use -1 if it is unavailable. |
normalityPvalue | The p-value of the test of the hypothesis that the peptide ratios are normally (log-normally) distributed. Use -1 if it is unavailable. |
sampleSize | The sample size, which must be at least 1. Use the constructor for the undefined ("missing") protein ratio if sample size is 0. |
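The argument conventions above can be checked mechanically. The following C++ sketch uses a hypothetical RatioArgs struct and helper functions; none of these names are part of Mascot Parser. It encodes one reading of the conventions:
- p-values are -1 or lie in [0, 1].
- Spreads are -1 or non-negative.
- Both spreads are exactly 0 when the sample size is 1.
- The sample size is at least 1.

```cpp
#include <cassert>

// Hypothetical struct mirroring the arguments of the full
// ms_protein_quant_ratio constructor; not part of Mascot Parser.
struct RatioArgs {
    double value;             // interpretation of <= 0 is left to the caller
    double stdev;             // -1 if unavailable
    double stderror;          // -1 if unavailable
    double hypothesisPvalue;  // -1 if unavailable
    double normalityPvalue;   // -1 if unavailable
    unsigned int sampleSize;  // must be at least 1
};

// A p-value argument is either -1 (unavailable) or a probability in [0, 1].
bool validPvalueArg(double p) {
    return p == -1.0 || (p >= 0.0 && p <= 1.0);
}

// A spread argument (stdev or stderror) must be exactly 0 for a single
// peptide ratio; otherwise it is -1 (unavailable) or non-negative.
bool validSpreadArg(double x, unsigned int sampleSize) {
    if (sampleSize == 1) {
        return x == 0.0;
    }
    return x == -1.0 || x >= 0.0;
}

// Checks one reading of the documented constructor conventions.
// value is deliberately not checked.
bool followsConventions(const RatioArgs& a) {
    return a.sampleSize >= 1 &&
           validSpreadArg(a.stdev, a.sampleSize) &&
           validSpreadArg(a.stderror, a.sampleSize) &&
           validPvalueArg(a.hypothesisPvalue) &&
           validPvalueArg(a.normalityPvalue);
}
```

For example, followsConventions({2.0, 0.0, 0.0, -1.0, -1.0, 1}) is true. The same arguments with a sample size of 0 are rejected, because such a ratio should be built with the "missing" constructor instead.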
ms_peptide_quant_key_vector getActiveKeys () const
Return the peptide quant keys used in calculating the protein ratio.
ms_peptide_quant_key_vector getExcludedKeys () const
Return peptide quant keys associated with the protein accession that were excluded from the protein ratio calculation.
A peptide ratio may be excluded manually from protein ratio calculation for reasons other than having a negative value or being an outlier. For example, visual inspection may reveal that a particular peptide ratio is of too poor quality, and that peptide ratio is then excluded from all protein ratio calculations. Its peptide quant key appears here if it is associated with this protein ratio.
double getHypothesisPvalue () const
Return the p-value of this protein ratio under the null hypothesis.
To interpret the p-value, you need to know how the protein ratio was calculated, what statistical assumptions were made, what the null hypothesis is, and which statistic the protein ratio is compared against.
Python: note that the name of this method in Python is getHypothesisPvalue_only().
double getHypothesisPvalue (std::string &whyUnavailable) const
Return the p-value of this protein ratio under the null hypothesis.
Same functionality as getHypothesisPvalue(), except that if the value is unavailable (negative), the reason is returned in the string argument.
See Multiple return values in Perl, Java, Python and C#.
Perl: note that this method does not return multiple values. If you want to receive the explanation string, give an empty string object as argument; for example
my $str = ''; # ensure the variable is defined
my $pvalue = $ratio->getHypothesisPvalue($str);
whyUnavailable | If p-value is unavailable, the reason is returned in this string argument. |
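All of the overloads taking a whyUnavailable argument follow the same pattern: a negative return value signals that the quantity is unavailable, and the string then holds the reason. The sketch below illustrates the calling pattern in C++ with a hypothetical stand-in class, RatioStandIn, which mimics only the two getHypothesisPvalue() overloads. It is not the Mascot Parser implementation. The documentation above also does not say whether the real method modifies the string when the value is available.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in mimicking only the two getHypothesisPvalue()
// overloads; NOT the Mascot Parser ms_protein_quant_ratio class.
class RatioStandIn {
public:
    RatioStandIn(double pvalue, std::string reason)
        : pvalue_(pvalue), reason_(std::move(reason)) {}

    // Plain overload: a negative value means "unavailable".
    double getHypothesisPvalue() const { return pvalue_; }

    // Same value; if it is unavailable (negative), the reason is also
    // written to whyUnavailable.
    double getHypothesisPvalue(std::string& whyUnavailable) const {
        if (pvalue_ < 0.0) {
            whyUnavailable = reason_;
        }
        return pvalue_;
    }

private:
    double pvalue_;
    std::string reason_;
};

// Calling pattern: pass a string, check the sign, then use the reason.
std::string describeHypothesisPvalue(const RatioStandIn& ratio) {
    std::string why;
    const double p = ratio.getHypothesisPvalue(why);
    if (p < 0.0) {
        return "unavailable: " + why;
    }
    return "p = " + std::to_string(p);
}
```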
double getNormalityPvalue () const
Return the p-value of the hypothesis that the peptide ratios were (log)normally distributed.
Peptide ratios are multiplicative, so it is reasonable to assume that they are log-normally distributed, that is, that the log-transformed peptide ratios are normally distributed. Normality is usually required for outlier testing and hypothesis testing. If the normality p-value is below a threshold, normality is rejected and the hypothesis p-value may be meaningless; the choice of threshold is up to you.
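Log space suits multiplicative ratios. A 2-fold increase (ratio 2) and a 2-fold decrease (ratio 0.5) average to 1.25 arithmetically, but their logs are symmetric about 0, so the mean of the logs maps back to 1. The helper below, geometricMeanRatio (a hypothetical name), computes that back-transformed mean of natural-log ratios. It illustrates the log-normal assumption and is not necessarily how Mascot computes a protein ratio.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Back-transformed mean of natural-log ratios, i.e. the geometric mean.
// Illustrative only; not necessarily how Mascot computes a protein ratio.
double geometricMeanRatio(const std::vector<double>& ratios) {
    assert(!ratios.empty());
    double sumLog = 0.0;
    for (double r : ratios) {
        assert(r > 0.0);  // logarithms need strictly positive ratios
        sumLog += std::log(r);
    }
    return std::exp(sumLog / static_cast<double>(ratios.size()));
}
```

With the ratios 2 and 0.5 this returns 1 (up to rounding), whereas the arithmetic mean would be 1.25.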
Python: note that the name of this method in Python is getNormalityPvalue_only().
double getNormalityPvalue (std::string &whyUnavailable) const
Return the p-value of the hypothesis that the peptide ratios were (log)normally distributed.
Same functionality as getNormalityPvalue(), except that if the value is unavailable (negative), the reason is returned in the string argument.
See Multiple return values in Perl, Java, Python and C#.
Perl: note that this method does not return multiple values. If you want to receive the explanation string, give an empty string object as argument; for example
my $str = ''; # ensure the variable is defined
my $pvalue = $ratio->getNormalityPvalue($str);
whyUnavailable | If p-value is unavailable, the reason is returned in this string argument. |
ms_peptide_quant_key_vector getOutlierKeys () const
Return peptide quant keys associated with the protein accession that were excluded due to being outliers.
The vector of outlier peptide quant keys may be empty if outlier detection is disabled or there were no outliers. It may be possible to override this decision and manually include peptide quant keys that would otherwise be ignored as outliers.
unsigned int getSampleSize () const
Return the size of the sample of peptide ratios used in deriving this protein ratio.
ms_peptide_quant_key_vector getSkippedKeys () const
Return peptide quant keys associated with the protein accession that were skipped due to being negative or infinite.
The vector of skipped keys may be empty if there were no negative or infinite peptide ratios. It is never possible to include skipped keys in protein ratio calculation.
This collection also includes peptide ratios that were left out of the protein ratio calculation for the Average protocol because their peptide intensities were not among the highest available.
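The documented skipping rule (negative or infinite peptide ratios are skipped) can be sketched as below. This is a simplified sketch:
- partitionPeptideRatios and PartitionedRatios are hypothetical names.
- Peptide ratios are identified by their position rather than by real peptide quant keys.
- The Average-protocol intensity cut described above is not modelled.
- Zero and NaN ratios are not covered by the documented rule. This sketch leaves them in the usable set, which you may want to change.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type: positions of usable and skipped peptide ratios.
struct PartitionedRatios {
    std::vector<std::size_t> usable;
    std::vector<std::size_t> skipped;
};

// Applies only the documented rule: negative or infinite ratios are skipped.
// Zero and NaN are not covered by that rule and stay in the usable set here.
PartitionedRatios partitionPeptideRatios(const std::vector<double>& ratios) {
    PartitionedRatios out;
    for (std::size_t i = 0; i < ratios.size(); ++i) {
        const double r = ratios[i];
        if (r < 0.0 || std::isinf(r)) {
            out.skipped.push_back(i);
        } else {
            out.usable.push_back(i);
        }
    }
    return out;
}
```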
double getStandardDeviation () const
Return the sample standard deviation of the protein ratio, calculated from the sample of peptide ratios.
Sample standard deviation is usually calculated using the standard formula. First compute the sample variance
$$\mathrm{variance} = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \mathrm{avg}\right)^2,$$
where $x_i$ are the log peptide ratios, $i$ runs from 1 to $n$ (the sample size) and $\mathrm{avg}$ is the sample mean. Then take the square root to get the standard deviation. The value returned by this function is transformed back from log space: $\exp\left(\sqrt{\mathrm{variance}}\right)$.
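A direct C++ transcription of the formula above follows. It assumes natural logarithms (consistent with the back-transformation by exp) and at least two strictly positive peptide ratios. The function name ratioStandardDeviation is hypothetical, and the library's own implementation may differ in details.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sample standard deviation of the natural-log peptide ratios, reported
// back in ratio space as exp(sqrt(variance)).
double ratioStandardDeviation(const std::vector<double>& ratios) {
    const std::size_t n = ratios.size();
    assert(n >= 2);  // the n - 1 denominator needs at least two ratios
    double mean = 0.0;
    for (double r : ratios) {
        assert(r > 0.0);
        mean += std::log(r);
    }
    mean /= static_cast<double>(n);

    double sumSquares = 0.0;
    for (double r : ratios) {
        const double d = std::log(r) - mean;  // deviation in log space
        sumSquares += d * d;
    }
    const double variance = sumSquares / static_cast<double>(n - 1);
    return std::exp(std::sqrt(variance));
}
```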
A related but different quantity is the standard error of the protein ratio estimator; see getStandardError().
Python: note that the name of this method in Python is getStandardDeviation_only().
double getStandardDeviation (std::string &whyUnavailable) const
Return the sample standard deviation of the protein ratio, calculated from the sample of peptide ratios.
Same functionality as getStandardDeviation(), except that if the value is unavailable (negative), the reason is returned in the string argument.
See Multiple return values in Perl, Java, Python and C#.
Perl: note that this method does not return multiple values. If you want to receive the explanation string, give an empty string object as argument; for example
my $str = ''; # ensure the variable is defined
my $stdev = $ratio->getStandardDeviation($str);
whyUnavailable | If standard deviation is unavailable, the reason is returned in this string argument. |
double getStandardError () const
Return the standard error of the protein ratio, calculated from the standard deviation of the sample.
The standard error of the protein ratio estimate is a function of the sample standard deviation and the sample size. In the simplest case, when the log protein ratio is the mean of the log peptide ratios, the log standard error is $s/\sqrt{n}$, where $s$ is the standard deviation of the log peptide ratios and $n$ is the sample size. In that case, the value returned by this function is transformed back from log space: $\exp\left(s/\sqrt{n}\right)$.
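For the simplest case only (log protein ratio equal to the mean of log peptide ratios), the formula above reduces to one line. In this hypothetical helper, logStdev is $s$, the standard deviation in log space. If you start from the back-transformed value $\exp(s)$, take its natural log first. The sketch requires $n \ge 2$, since for a single peptide ratio the documented convention is to report 0 rather than $\exp(0) = 1$. The median and weighted variants use different formulas and are not covered.

```cpp
#include <cassert>
#include <cmath>

// Standard error for the simplest case only: the log protein ratio is the
// mean of the log peptide ratios. logStdev is s (log space), n the sample size.
double ratioStandardErrorOfMean(double logStdev, unsigned int n) {
    assert(n >= 2);           // for n == 1 the convention is to report 0
    assert(logStdev >= 0.0);
    return std::exp(logStdev / std::sqrt(static_cast<double>(n)));
}
```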
The formula depends on the protein ratio type. For details of the calculation, see ms_quant_stats::arithmeticStandardErrorOfMean(), ms_quant_stats::arithmeticStandardErrorOfMedian(), and ms_quant_stats::weightedArithmeticStandardError().
Python: note that the name of this method in Python is getStandardError_only().
double getStandardError (std::string &whyUnavailable) const
Return the standard error of the protein ratio, calculated from the standard deviation of the sample.
Same functionality as getStandardError(), except that if the value is unavailable (negative), the reason is returned in the string argument.
See Multiple return values in Perl, Java, Python and C#.
Perl: note that this method does not return multiple values. If you want to receive the explanation string, give an empty string object as argument; for example
my $str = ''; # ensure the variable is defined
my $stderr = $ratio->getStandardError($str);
whyUnavailable | If standard error is unavailable, the reason is returned in this string argument. |
double getValue () const
Return the numerical value of the ratio.
Note that you need to know the context and how the protein ratio has been calculated to interpret the value.
bool isLogNormal (double threshold=0.05) const
Boolean flag: do the peptide ratios used in calculating this protein ratio appear to be (log)normally distributed?
The normality test hypothesis is that the peptide ratios are log-normally distributed, or equivalently that the log-transformed peptide ratios are normally distributed. If the p-value is below a threshold, say 0.05, then the hypothesis should be rejected. This method returns true if the p-value does not fall below the threshold, which means you have no evidence to reject the normality hypothesis.
(Strictly speaking, a p-value at or above the threshold is not enough evidence to accept the hypothesis, although that is what the method name implies.)
If the p-value is -1 (no testing was conducted), then there is no evidence against the hypothesis and the method returns true.
threshold | The significance threshold; default 0.05. |
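The decision rule described above fits in a few lines. looksLogNormal is a hypothetical free function that mirrors the documented behaviour. It treats any negative p-value as "no test conducted" (the documented marker is -1). It is not the library's code.

```cpp
#include <cassert>

// Mirrors the documented isLogNormal() rule; not the library's code.
bool looksLogNormal(double normalityPvalue, double threshold = 0.05) {
    assert(threshold > 0.0 && threshold < 1.0);
    if (normalityPvalue < 0.0) {
        return true;  // -1: no test conducted, so no evidence against normality
    }
    return normalityPvalue >= threshold;  // reject only when p < threshold
}
```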
bool isMissing () const
Boolean flag: is this ratio undefined ("missing")?
The ratio can be undefined ("missing") if there is not enough data to calculate it. If this method returns true, then the return values of other methods are undefined.
bool isSignificant (double threshold=0.05) const
Boolean flag: does the protein ratio differ statistically significantly from the value expected under the null hypothesis?
The null hypothesis depends on the context where the protein ratio was calculated. If the p-value is below threshold, then there is evidence against the hypothesis and this method returns true; in other words, the protein ratio differs from the hypothesised value in a "statistically significant" way.
If the p-value is -1 (no testing was conducted), then there is no evidence against the hypothesis and the method returns false.
threshold | The significance threshold; default 0.05. |
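The documented rule for isSignificant() can be sketched as follows. looksSignificant is a hypothetical name, and a negative p-value is taken to mean that no test was conducted. Note the asymmetry with isLogNormal(): an untested ratio is treated as log-normal but never as significant.

```cpp
#include <cassert>

// Mirrors the documented isSignificant() rule; not the library's code.
bool looksSignificant(double hypothesisPvalue, double threshold = 0.05) {
    assert(threshold > 0.0 && threshold < 1.0);
    if (hypothesisPvalue < 0.0) {
        return false;  // -1: no test conducted, so no evidence against H0
    }
    return hypothesisPvalue < threshold;
}
```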