Matrix Science Mascot Parser toolkit
 
ms_protein_quant_ratio Class Reference

Protein abundance in one component relative to another in a quantitation experiment, derived from a sample of peptide ratios.

#include <ms_protein_quant_ratio.hpp>

Public Member Functions

 ms_protein_quant_ratio (const ms_protein_quant_ratio &src)
 Copying constructor.
 
 ms_protein_quant_ratio (const std::string &accession, int dbIdx, const std::string &ratioName)
 
 ms_protein_quant_ratio (const std::string &accession, int dbIdx, const std::string &ratioName, double value, double stdev, double stderror, double hypothesisPvalue, double normalityPvalue, unsigned int sampleSize)
 
 ~ms_protein_quant_ratio ()
 Destructor.
 
void copyFrom (const ms_protein_quant_ratio *src)
 Copies all content from another instance of the class.
 
std::string getAccession () const
 Return the protein accession associated with this ratio.
 
ms_peptide_quant_key_vector getActiveKeys () const
 Return the peptide quant keys used in calculating the protein ratio.
 
int getDB () const
 Return the protein database index associated with this ratio.
 
ms_peptide_quant_key_vector getExcludedKeys () const
 Return peptide quant keys associated with the protein accession that were skipped due to being excluded from protein ratio calculation.
 
double getHypothesisPvalue () const
 Return the p-value of this protein ratio under the null hypothesis.
 
double getHypothesisPvalue (std::string &whyUnavailable) const
 Return the p-value of this protein ratio under the null hypothesis.
 
double getNormalityPvalue () const
 Return the p-value of the hypothesis that the peptide ratios were (log)normally distributed.
 
double getNormalityPvalue (std::string &whyUnavailable) const
 Return the p-value of the hypothesis that the peptide ratios were (log)normally distributed.
 
ms_peptide_quant_key_vector getOutlierKeys () const
 Return peptide quant keys associated with the protein accession that were excluded due to being outliers.
 
std::string getRatioName () const
 Return the name of the ratio.
 
unsigned int getSampleSize () const
 Return the size of the sample of peptide ratios used in deriving this protein ratio.
 
ms_peptide_quant_key_vector getSkippedKeys () const
 Return peptide quant keys associated with the protein accession that were skipped due to being negative or infinite.
 
double getStandardDeviation () const
 Return the sample standard deviation of the protein ratio, calculated from the sample of peptide ratios.
 
double getStandardDeviation (std::string &whyUnavailable) const
 Return the sample standard deviation of the protein ratio, calculated from the sample of peptide ratios.
 
double getStandardError () const
 Return the standard error of the protein ratio, calculated from the standard deviation of the sample.
 
double getStandardError (std::string &whyUnavailable) const
 Return the standard error of the protein ratio, calculated from the standard deviation of the sample.
 
double getValue () const
 Return the numerical value of the ratio.
 
bool isLogNormal (double threshold=0.05) const
 Boolean flag: do the peptide ratios used in calculating this protein ratio appear to be (log)normally distributed?
 
bool isMissing () const
 Boolean flag: is this ratio undefined ("missing")?
 
bool isSignificant (double threshold=0.05) const
 Boolean flag: is the protein ratio statistically significantly different from the null hypothesis?
 
bool operator!= (const ms_protein_quant_ratio &right) const
 C++ inequality.
 
ms_protein_quant_ratio & operator= (const ms_protein_quant_ratio &right)
 C++ assignment operator.
 
bool operator== (const ms_protein_quant_ratio &right) const
 C++ equality.
 

Detailed Description

Protein abundance in one component relative to another in a quantitation experiment, derived from a sample of peptide ratios.

A protein ratio in Mascot is calculated from a sample of peptide ratios, where the peptides form a subset of identified peptides associated with a given protein accession; not all identified peptides have ratios. Protein ratios are usually calculated by ms_quantitation and classes derived from it.

Each protein ratio must be associated with a particular protein accession, database index and ratio name. The database index refers to the databases used in a Mascot search, and can be 0 if the protein accession does not arise from a Mascot search. The ratio name is the name of a report ratio in a particular quantitation experiment. It can be any free-form text defined in the quantitation method, for example "115/114" or "control/diseased".

This class is merely a container of numerical values. It does not record how the protein ratio was calculated from peptide ratios, nor what kind of statistical significance testing was done, if any.

The following conventions are followed:

  • If the ratio is undefined ("missing"), the value, sample size, standard deviation and p-values are meaningless. They will be set to 0 or false, whichever is appropriate.
  • If sample size is 1, standard deviation must be 0. (This is by definition of how standard deviation is calculated.) If standard deviation could not be calculated for some other reason, it must be -1.
  • A negative p-value means no hypothesis testing was conducted. A value of 0 is implausibly small and should be disregarded.
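The conventions above can be restated as a small decision routine. The following is only a sketch: describeStdev() and describePvalue() are hypothetical helpers, not part of Mascot Parser, and the returned strings are illustrative.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: classifies a standard deviation by the conventions above.
std::string describeStdev(bool missing, unsigned int sampleSize, double stdev)
{
    // A defined ratio is documented to have a sample size of at least 1.
    assert(missing || sampleSize >= 1);
    if (missing)         return "ratio is missing; value is meaningless";
    if (stdev < 0.0)     return "unavailable";
    if (sampleSize == 1) return "single peptide ratio; no spread";
    return "available";
}

// Hypothetical helper: classifies a p-value by the same conventions.
std::string describePvalue(bool missing, double p)
{
    assert(p <= 1.0);  // a p-value can never exceed 1
    if (missing)  return "ratio is missing; value is meaningless";
    if (p < 0.0)  return "no test conducted";
    if (p == 0.0) return "implausibly small; disregard";
    return "available";
}
```

With a real ratio object, the arguments would come from isMissing(), getSampleSize(), getStandardDeviation() and getHypothesisPvalue().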

Constructor & Destructor Documentation

◆ ms_protein_quant_ratio() [1/2]

ms_protein_quant_ratio ( const std::string &  accession,
int  dbIdx,
const std::string &  ratioName 
)
Parameters
accession: The protein accession associated with this ratio.
dbIdx: The database index associated with this ratio, usually related to databases used in a Mascot search.
ratioName: The name of the ratio, e.g. "115/114".

◆ ms_protein_quant_ratio() [2/2]

ms_protein_quant_ratio ( const std::string &  accession,
int  dbIdx,
const std::string &  ratioName,
double  value,
double  stdev,
double  stderror,
double  hypothesisPvalue,
double  normalityPvalue,
unsigned int  sampleSize 
)
Parameters
accession: The protein accession associated with this ratio.
dbIdx: The database index associated with this ratio, usually related to databases used in a Mascot search.
ratioName: The name of the ratio, e.g. "115/114".
value: The numerical value of this ratio. Usually ratios should be positive, but it is possible to give 0 or a negative value. Interpretation of such values is the caller's responsibility.
stdev: The sample standard deviation of the ratio, based on the peptide ratios. Use 0 if sample size is 1, and -1 if it is unavailable.
stderror: The standard error of the protein ratio, based on the standard deviation and sample size. Use 0 if sample size is 1, and -1 if it is unavailable.
hypothesisPvalue: The p-value of a hypothesis test about this protein ratio. Use -1 if it is unavailable.
normalityPvalue: The p-value that the peptide ratios appear to be normally (lognormally) distributed. Use -1 if it is unavailable.
sampleSize: The sample size, which must be at least 1. Use the constructor for the undefined ("missing") protein ratio if sample size is 0.

Member Function Documentation

◆ getActiveKeys()

ms_peptide_quant_key_vector getActiveKeys ( ) const

Return the peptide quant keys used in calculating the protein ratio.

Returns
The vector of peptide quant keys used in calculating this protein ratio. The vector has either size zero or its size is the sample size, depending on which constructor was used to create the protein ratio object.

◆ getExcludedKeys()

ms_peptide_quant_key_vector getExcludedKeys ( ) const

Return peptide quant keys associated with the protein accession that were skipped due to being excluded from protein ratio calculation.

A peptide ratio may be excluded manually from protein ratio calculation for reasons other than having a negative value or being an outlier. For example, visual inspection may reveal that the quality of a particular peptide ratio is too poor, in which case that peptide ratio is excluded from all protein ratio calculations. The peptide quant key of the ratio would then appear here if it is associated with this protein ratio.

Returns
vector of excluded keys

◆ getHypothesisPvalue() [1/2]

double getHypothesisPvalue ( ) const

Return the p-value of this protein ratio under the null hypothesis.

Note that you need to know how the protein ratio was calculated, what statistical assumptions have been made and what the null hypothesis is as well as what statistic the protein ratio is compared against, to interpret the meaning of the p-value.

Python: note that the name of this method in Python is getHypothesisPvalue_only().

Returns
The p-value of this protein ratio under some hypothesis. A negative value (-1) means hypothesis testing was not conducted or the p-value could not be calculated.

◆ getHypothesisPvalue() [2/2]

double getHypothesisPvalue ( std::string &  whyUnavailable) const

Return the p-value of this protein ratio under the null hypothesis.

Same functionality as getHypothesisPvalue(), except that if the value is unavailable (negative), the reason is returned in the string argument.

See Multiple return values in Perl, Java, Python and C#.

Perl: note that this method does not return multiple values. If you want to receive the explanation string, give an empty string object as argument; for example

    my $str = ''; # ensure the variable is defined
    my $pvalue = $ratio->getHypothesisPvalue($str);
Parameters
whyUnavailable: If the p-value is unavailable, the reason is returned in this string argument.
Returns
The p-value of this protein ratio under some hypothesis. A negative value (-1) means hypothesis testing was not conducted or the p-value could not be calculated.
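In C++ the reason string is an ordinary output parameter. The sketch below shows one way to use it. StubRatio is a hypothetical stand-in that exposes only this overload so the example compiles without the Mascot Parser headers, and formatHypothesisPvalue() is likewise an illustrative helper, not a library function.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-in for ms_protein_quant_ratio (one overload only).
struct StubRatio {
    double p;
    std::string reason;
    double getHypothesisPvalue(std::string &whyUnavailable) const {
        if (p < 0.0) whyUnavailable = reason;  // reason is set only when unavailable
        return p;
    }
};

// Formats the hypothesis p-value, or the reason why it is unavailable.
// Templated so that it accepts any type offering the same overload.
template <class Ratio>
std::string formatHypothesisPvalue(const Ratio &ratio)
{
    std::string why;
    const double p = ratio.getHypothesisPvalue(why);
    assert(p <= 1.0);  // a p-value can never exceed 1
    std::ostringstream out;
    if (p < 0.0) out << "p-value unavailable: " << why;
    else         out << "p = " << p;
    return out.str();
}
```

Because the helper is a template, passing a real ms_protein_quant_ratio should work in the same way as passing the stand-in.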

◆ getNormalityPvalue() [1/2]

double getNormalityPvalue ( ) const

Return the p-value of the hypothesis that the peptide ratios were (log)normally distributed.

Peptide ratios are multiplicative, and it is reasonable to assume they are log-normally distributed, that is, the log-transformed peptide ratios have a normal distribution. Normality is usually required for outlier testing and hypothesis testing. The hypothesis p-value may be meaningless if the normality p-value is below a threshold, because a low normality p-value indicates non-normality; the threshold value is up to you to decide.

Python: note that the name of this method in Python is getNormalityPvalue_only().

Returns
The p-value that the peptide ratios used in calculating this protein ratio seem log-normally distributed. A very low value, e.g. below 0.05, indicates non-normality. A negative value (-1) means sample size was too small for normality testing or the value could not be calculated.
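One way to combine the two p-values is to consult the normality p-value before trusting the hypothesis p-value. This is a sketch of such a policy, not something the class does for you: the Verdict enum and assessRatio() are hypothetical names, and treating an unavailable normality p-value as "not rejected" is an assumption.

```cpp
#include <cassert>

enum class Verdict { NotTested, NormalityRejected, Significant, NotSignificant };

// Hypothetical helper: gates the hypothesis p-value on the normality p-value.
Verdict assessRatio(double hypothesisP, double normalityP, double threshold = 0.05)
{
    assert(threshold > 0.0 && threshold <= 1.0);
    if (hypothesisP < 0.0) return Verdict::NotTested;  // -1: no test conducted
    // Assumption: an unavailable normality p-value (-1) does not veto the result.
    if (normalityP >= 0.0 && normalityP < threshold) return Verdict::NormalityRejected;
    return hypothesisP < threshold ? Verdict::Significant : Verdict::NotSignificant;
}
```

The inputs would come from getHypothesisPvalue() and getNormalityPvalue(); a stricter policy could instead refuse to report significance whenever normality was not tested.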

◆ getNormalityPvalue() [2/2]

double getNormalityPvalue ( std::string &  whyUnavailable) const

Return the p-value of the hypothesis that the peptide ratios were (log)normally distributed.

Same functionality as getNormalityPvalue(), except that if the value is unavailable (negative), the reason is returned in the string argument.

See Multiple return values in Perl, Java, Python and C#.

Perl: note that this method does not return multiple values. If you want to receive the explanation string, give an empty string object as argument; for example

    my $str = ''; # ensure the variable is defined
    my $pvalue = $ratio->getNormalityPvalue($str);
Parameters
whyUnavailable: If the p-value is unavailable, the reason is returned in this string argument.
Returns
The p-value that the peptide ratios used in calculating this protein ratio seem log-normally distributed. A very low value, e.g. below 0.05, indicates non-normality. A negative value (-1) means sample size was too small for normality testing or the value could not be calculated.

◆ getOutlierKeys()

ms_peptide_quant_key_vector getOutlierKeys ( ) const

Return peptide quant keys associated with the protein accession that were excluded due to being outliers.

The vector of outlier peptide quant keys may be empty if outlier detection is disabled or there were no outliers. It may be possible to override this decision and manually include peptide quant keys that would otherwise be ignored as outliers.

Returns
vector of outlier keys

◆ getSampleSize()

unsigned int getSampleSize ( ) const

Return the size of the sample of peptide ratios used in deriving this protein ratio.

Returns
The number of peptide ratios used in ratio calculation. This is the same size as getActiveKeys().

◆ getSkippedKeys()

ms_peptide_quant_key_vector getSkippedKeys ( ) const

Return peptide quant keys associated with the protein accession that were skipped due to being negative or infinite.

The vector of skipped keys may be empty if there were no negative or infinite peptide ratios. It is never possible to include skipped keys in protein ratio calculation.

This collection will also include peptide ratios that were left out of the calculation of the ratio for the Average protocol because they were not among the highest available peptide intensities.

Returns
vector of skipped keys

◆ getStandardDeviation() [1/2]

double getStandardDeviation ( ) const

Return the sample standard deviation of the protein ratio, calculated from the sample of peptide ratios.

Sample standard deviation is usually calculated using the standard formula: compute the sample variance as variance = 1/(n-1) * sum( (x_i - avg)^2 ), where x_i are the log peptide ratios, i is iterated from 1 to n (the sample size) and avg is the sample mean, and then take the square root to get the standard deviation. The value returned by this function is transformed back from the log space: exp(sqrt(variance)).

A related but different quantity is the standard error of the protein ratio estimator; see getStandardError().

Python: note that the name of this method in Python is getStandardDeviation_only().

Returns
The sample standard deviation of this protein ratio. Negative (-1) means the value is unavailable.
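The formula described above can be sketched with the standard library alone. This is an illustration of the arithmetic, not the Mascot Parser implementation. The function name is hypothetical, and the edge cases (returning -1 for an empty or invalid sample, and 0 for a single ratio) are assumptions modelled on the class conventions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Back-transformed sample standard deviation of log peptide ratios.
double geometricStandardDeviation(const std::vector<double> &peptideRatios)
{
    const std::size_t n = peptideRatios.size();
    if (n == 0) return -1.0;  // nothing to compute
    double sumLog = 0.0;
    for (double r : peptideRatios) {
        // The logarithm needs a positive, finite ratio (this also rejects NaN).
        if (!(r > 0.0) || std::isinf(r)) return -1.0;
        sumLog += std::log(r);
    }
    if (n == 1) return 0.0;  // class convention for sample size 1
    const double avg = sumLog / static_cast<double>(n);
    double sumSq = 0.0;
    for (double r : peptideRatios) {
        const double d = std::log(r) - avg;
        sumSq += d * d;
    }
    const double variance = sumSq / static_cast<double>(n - 1);
    assert(variance >= 0.0);
    return std::exp(std::sqrt(variance));  // transform back from log space
}
```

For the two ratios 0.5 and 2.0 the log values are symmetric around zero, so the variance is 2·ln(2)² and the result is 2 raised to the power sqrt(2).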

◆ getStandardDeviation() [2/2]

double getStandardDeviation ( std::string &  whyUnavailable) const

Return the sample standard deviation of the protein ratio, calculated from the sample of peptide ratios.

Same functionality as getStandardDeviation(), except that if the value is unavailable (negative), the reason is returned in the string argument.

See Multiple return values in Perl, Java, Python and C#.

Perl: note that this method does not return multiple values. If you want to receive the explanation string, give an empty string object as argument; for example

    my $str = ''; # ensure the variable is defined
    my $stdev = $ratio->getStandardDeviation($str);
Parameters
whyUnavailable: If the standard deviation is unavailable, the reason is returned in this string argument.
Returns
The sample standard deviation of this protein ratio. Negative (-1) means the value is unavailable.

◆ getStandardError() [1/2]

double getStandardError ( ) const

Return the standard error of the protein ratio, calculated from the standard deviation of the sample.

The standard error of the protein ratio estimate is a function of sample standard deviation and sample size. In the simplest case when the log protein ratio is the mean of log peptide ratios, the log standard error is s/sqrt(n), where s is the standard deviation and n is the sample size. In that case, the value returned by this function is transformed back from log space: exp(s/sqrt(n)).

The formula depends on the protein ratio type. For details of the calculation, see ms_quant_stats::arithmeticStandardErrorOfMean(), ms_quant_stats::arithmeticStandardErrorOfMedian(), and ms_quant_stats::weightedArithmeticStandardError().

Python: note that the name of this method in Python is getStandardError_only().

Returns
The standard error of the protein ratio. Negative (-1) means the value is unavailable.
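For the simplest case only (log protein ratio equal to the mean of log peptide ratios), the calculation can be sketched as follows. The function name is hypothetical and the input is assumed to be the standard deviation in log space, before back-transformation. The ms_quant_stats functions named above remain the authoritative reference for the other ratio types.

```cpp
#include <cassert>
#include <cmath>

// Back-transformed standard error of the mean: exp(s / sqrt(n)).
// Returns -1 when the inputs do not allow a calculation (an assumption
// modelled on the "-1 means unavailable" convention).
double backTransformedStandardError(double logStdev, unsigned int sampleSize)
{
    assert(!std::isnan(logStdev));
    if (sampleSize == 0 || logStdev < 0.0) return -1.0;
    const double n = static_cast<double>(sampleSize);
    return std::exp(logStdev / std::sqrt(n));
}
```

With s = 2·ln(2) and n = 4 the log standard error is ln(2), so the returned value is 2.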

◆ getStandardError() [2/2]

double getStandardError ( std::string &  whyUnavailable) const

Return the standard error of the protein ratio, calculated from the standard deviation of the sample.

Same functionality as getStandardError(), except that if the value is unavailable (negative), the reason is returned in the string argument.

See Multiple return values in Perl, Java, Python and C#.

Perl: note that this method does not return multiple values. If you want to receive the explanation string, give an empty string object as argument; for example

    my $str = ''; # ensure the variable is defined
    my $stderr = $ratio->getStandardError($str);
Parameters
whyUnavailable: If the standard error is unavailable, the reason is returned in this string argument.
Returns
The standard error of the protein ratio. Negative (-1) means the value is unavailable.

◆ getValue()

double getValue ( ) const

Return the numerical value of the ratio.

Note that you need to know the context and how the protein ratio has been calculated to interpret the value.

Returns
The numerical value of this protein ratio.

◆ isLogNormal()

bool isLogNormal ( double  threshold = 0.05) const

Boolean flag: do the peptide ratios used in calculating this protein ratio appear to be (log)normally distributed?

The normality test hypothesis is that the peptide ratios are log-normally distributed, or equivalently that the log-transformed peptide ratios are normally distributed. If the p-value is below a threshold, say 0.05, then the hypothesis should be rejected. This method returns true if the p-value does not fall below threshold, which means you have no evidence to reject the normality hypothesis.

(Strictly speaking p above threshold is not enough evidence to explicitly accept the hypothesis, although that is what the method name implies.)

If the p-value is -1 (no testing was conducted), then there is no evidence against the hypothesis and the method returns true.

Parameters
threshold: The significance threshold; default 0.05.
Returns
True if the normality p-value is not below the threshold; false otherwise. If the normality p-value is not available, the method returns false.

◆ isMissing()

bool isMissing ( ) const

Boolean flag: is this ratio undefined ("missing")?

The ratio can be undefined ("missing") if there is not enough data to calculate it. If this method returns true, then the return values of other methods are undefined.

Returns
True if the ratio is missing; false otherwise.

◆ isSignificant()

bool isSignificant ( double  threshold = 0.05) const

Boolean flag: is the protein ratio statistically significantly different from the null hypothesis?

The null hypothesis depends on the context where the protein ratio was calculated. If the p-value is below threshold, then there is evidence against the hypothesis and this method returns true; in other words, the protein ratio differs from the hypothesised value in a "statistically significant" way.

If the p-value is -1 (no testing was conducted), then there is no evidence against the hypothesis and the method returns false.

Parameters
threshold: The significance threshold; default 0.05.
Returns
True if the hypothesis p-value is below the threshold; false otherwise. If the p-value is not available, the method returns false.
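The documented rule can be restated in a few lines. This is a sketch of the described behaviour, not the library source; isSignificantSketch() is a hypothetical name, and the sketch does not special-case a p-value of exactly 0.

```cpp
#include <cassert>

// Restates the rule above: significant only if a test was run and p < threshold.
bool isSignificantSketch(double hypothesisPvalue, double threshold = 0.05)
{
    assert(threshold > 0.0 && threshold <= 1.0);
    if (hypothesisPvalue < 0.0) return false;  // -1: no test conducted
    return hypothesisPvalue < threshold;       // strictly below the threshold
}
```

Note that a p-value equal to the threshold is not below it, so the sketch returns false in that case.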

The documentation for this class was generated from the following files: