Protein abundance in one component relative to another in a quantitation experiment, derived from a sample of peptide ratios.

#include <ms_protein_quant_ratio.hpp>

Public Member Functions
	ms_protein_quant_ratio (const ms_protein_quant_ratio &src)
	Copy constructor.

	ms_protein_quant_ratio (const std::string &accession, int dbIdx, const std::string &ratioName)

	ms_protein_quant_ratio (const std::string &accession, int dbIdx, const std::string &ratioName, double value, double stdev, double stderror, double hypothesisPvalue, double normalityPvalue, unsigned int sampleSize)

	~ms_protein_quant_ratio ()
	Destructor.

void	copyFrom (const ms_protein_quant_ratio *src)
	Copies all content from another instance of the class.

std::string	getAccession () const
	Return the protein accession associated with this ratio.

ms_peptide_quant_key_vector	getActiveKeys () const
	Return the peptide quant keys used in calculating the protein ratio.

int	getDB () const
	Return the protein database index associated with this ratio.

ms_peptide_quant_key_vector	getExcludedKeys () const
	Return peptide quant keys associated with the protein accession that were manually excluded from protein ratio calculation.

double	getHypothesisPvalue () const
	Return the p-value of this protein ratio under the null hypothesis.

double	getHypothesisPvalue (std::string &whyUnavailable) const
	Return the p-value of this protein ratio under the null hypothesis.

double	getNormalityPvalue () const
	Return the p-value of the hypothesis that the peptide ratios were (log)normally distributed.

double	getNormalityPvalue (std::string &whyUnavailable) const
	Return the p-value of the hypothesis that the peptide ratios were (log)normally distributed.

ms_peptide_quant_key_vector	getOutlierKeys () const
	Return peptide quant keys associated with the protein accession that were excluded due to being outliers.

std::string	getRatioName () const
	Return the name of the ratio.

unsigned int	getSampleSize () const
	Return the size of the sample of peptide ratios used in deriving this protein ratio.

ms_peptide_quant_key_vector	getSkippedKeys () const
	Return peptide quant keys associated with the protein accession that were skipped due to being negative or infinite.

double	getStandardDeviation () const
	Return the sample standard deviation of the protein ratio, calculated from the sample of peptide ratios.

double	getStandardDeviation (std::string &whyUnavailable) const
	Return the sample standard deviation of the protein ratio, calculated from the sample of peptide ratios.

double	getStandardError () const
	Return the standard error of the protein ratio, calculated from the standard deviation of the sample.

double	getStandardError (std::string &whyUnavailable) const
	Return the standard error of the protein ratio, calculated from the standard deviation of the sample.

double	getValue () const
	Return the numerical value of the ratio.

bool	isLogNormal (double threshold=0.05) const
	Boolean flag: do the peptide ratios used in calculating this protein ratio appear to be (log)normally distributed?

bool	isMissing () const
	Boolean flag: is this ratio undefined ("missing")?

bool	isSignificant (double threshold=0.05) const
	Boolean flag: is the protein ratio statistically significantly different from the null hypothesis?

bool	operator!= (const ms_protein_quant_ratio &right) const
	C++ inequality.

ms_protein_quant_ratio &	operator= (const ms_protein_quant_ratio &right)
	C++ assignment operator.

bool	operator== (const ms_protein_quant_ratio &right) const
	C++ equality.

Detailed Description

Protein abundance in one component relative to another in a quantitation experiment, derived from a sample of peptide ratios.

A protein ratio in Mascot is calculated from a sample of peptide ratios, where the peptides form a subset of identified peptides associated with a given protein accession; not all identified peptides have ratios. Protein ratios are usually calculated by ms_quantitation and classes derived from it.

Each protein ratio must be associated with a particular protein accession, database index and ratio name. The database index refers to databases used in a Mascot search, and can be 0 if the protein accession does not arise from a Mascot search. The ratio name is the name of a report ratio in a particular quantitation experiment. This can be any free form text, for example "115/114" or "control/diseased", defined in the quantitation method.

This class is merely a container of numerical values. It does not contain information about how the protein ratio was calculated from peptide ratios, nor about what kind of statistical significance testing was done, if any.

The following conventions are followed:

If the ratio is undefined ("missing"), the value, sample size, standard deviation and p-values are meaningless. They will be set to 0 or false, whichever is appropriate.
If sample size is 1, standard deviation must be 0. (This is by definition of how standard deviation is calculated.) If standard deviation could not be calculated for some other reason, it must be -1.
A negative p-value means no hypothesis testing was conducted. A value of 0 is implausibly small and should be disregarded.
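
The p-value convention above can be sketched as a small standalone helper. This is only an illustration: `PvalueStatus` and `classifyPvalue` are hypothetical names that are not part of Mascot Parser, and the sketch does not show how the library itself handles these values.

```cpp
#include <cassert>

// Hypothetical helper, not part of Mascot Parser: classify a p-value
// according to the conventions listed in the class description.
enum class PvalueStatus { NotTested, ImplausiblyZero, Usable };

PvalueStatus classifyPvalue(double p) {
    assert(p <= 1.0);  // a probability cannot exceed 1
    if (p < 0.0) {
        return PvalueStatus::NotTested;        // e.g. -1: no hypothesis testing was conducted
    }
    if (p == 0.0) {
        return PvalueStatus::ImplausiblyZero;  // implausibly small; should be disregarded
    }
    return PvalueStatus::Usable;
}
```

A caller could pass the result of getHypothesisPvalue() or getNormalityPvalue() to such a helper before interpreting the number.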

Constructor & Destructor Documentation

◆ ms_protein_quant_ratio() [1/2]

ms_protein_quant_ratio	(	const std::string &	accession,
		int	dbIdx,
		const std::string &	ratioName
	)

Parameters

accession	The protein accession associated with this ratio.
dbIdx	The database index associated with this ratio, usually related to databases used in a Mascot search.
ratioName	The name of the ratio, e.g. "115/114".

◆ ms_protein_quant_ratio() [2/2]

ms_protein_quant_ratio	(	const std::string &	accession,
		int	dbIdx,
		const std::string &	ratioName,
		double	value,
		double	stdev,
		double	stderror,
		double	hypothesisPvalue,
		double	normalityPvalue,
		unsigned int	sampleSize
	)

Parameters

accession	The protein accession associated with this ratio.
dbIdx	The database index associated with this ratio, usually related to databases used in a Mascot search.
ratioName	The name of the ratio, e.g. "115/114".
value	The numerical value of this ratio. Usually ratios should be positive, but it is possible to give 0 or a negative value. Interpretation of such values is the caller's responsibility.
stdev	The sample standard deviation of the ratio, based on the peptide ratios. Use 0 if sample size is 1, and -1 if it is unavailable.
stderror	The standard error of the protein ratio, based on the standard deviation and sample size. Use 0 if sample size is 1, and -1 if it is unavailable.
hypothesisPvalue	The p-value of a hypothesis test about this protein ratio. Use -1 if it is unavailable.
normalityPvalue	The p-value that the peptide ratios appear to be normally (lognormally) distributed. Use -1 if it is unavailable.
sampleSize	The sample size, which must be at least 1. Use the constructor for the undefined ("missing") protein ratio if sample size is 0.

Member Function Documentation

◆ getActiveKeys()

ms_peptide_quant_key_vector getActiveKeys ( ) const

Return the peptide quant keys used in calculating the protein ratio.

Returns: The vector of peptide quant keys used in calculating this protein ratio. The vector has either size zero or its size is the sample size, depending on which constructor was used to create the protein ratio object.

◆ getExcludedKeys()

ms_peptide_quant_key_vector getExcludedKeys ( ) const

Return peptide quant keys associated with the protein accession that were manually excluded from protein ratio calculation.

A peptide ratio may be excluded manually from protein ratio calculation for reasons other than having a negative value or being an outlier. For example, visual inspection may reveal that the quality of a particular peptide ratio is too poor, in which case that peptide ratio is excluded from all protein ratio calculations. The peptide quant key of the ratio then appears here if it is associated with this protein ratio.

Returns: vector of excluded keys

◆ getHypothesisPvalue() [1/2]

double getHypothesisPvalue ( ) const

Return the p-value of this protein ratio under the null hypothesis.

Note that you need to know how the protein ratio was calculated, what statistical assumptions have been made and what the null hypothesis is as well as what statistic the protein ratio is compared against, to interpret the meaning of the p-value.

Python: note that the name of this method in Python is getHypothesisPvalue_only().

Returns: The p-value of this protein ratio under some hypothesis. A negative value (-1) means hypothesis testing was not conducted or the p-value could not be calculated.

◆ getHypothesisPvalue() [2/2]

double getHypothesisPvalue ( std::string & whyUnavailable ) const

Return the p-value of this protein ratio under the null hypothesis.

Same functionality as getHypothesisPvalue(), except that if the value is unavailable (negative), the reason is returned in the string argument.

See Multiple return values in Perl, Java, Python and C#.

Perl: note that this method does not return multiple values. If you want to receive the explanation string, give an empty string object as argument; for example

    my $str = ''; # ensure the variable is defined
    my $pvalue = $ratio->getHypothesisPvalue($str);

Parameters

whyUnavailable If the p-value is unavailable, the reason is returned in this string argument.

Returns: The p-value of this protein ratio under some hypothesis. A negative value (-1) means hypothesis testing was not conducted or the p-value could not be calculated.

◆ getNormalityPvalue() [1/2]

double getNormalityPvalue ( ) const

Return the p-value of the hypothesis that the peptide ratios were (log)normally distributed.

Peptide ratios are multiplicative, and it is reasonable to assume they are log-normally distributed, that is, the log-transformed peptide ratios have a normal distribution. Normality is usually required for outlier testing and hypothesis testing. The hypothesis p-value may be meaningless if the normality p-value is below a threshold, because a low normality p-value indicates non-normality; the threshold value is up to you to decide.

Python: note that the name of this method in Python is getNormalityPvalue_only().

Returns: The p-value that the peptide ratios used in calculating this protein ratio seem log-normally distributed. A very low value, e.g. below 0.05, indicates non-normality. A negative value (-1) means sample size was too small for normality testing or the value could not be calculated.

◆ getNormalityPvalue() [2/2]

double getNormalityPvalue ( std::string & whyUnavailable ) const

Return the p-value of the hypothesis that the peptide ratios were (log)normally distributed.

Same functionality as getNormalityPvalue(), except that if the value is unavailable (negative), the reason is returned in the string argument.

See Multiple return values in Perl, Java, Python and C#.

Perl: note that this method does not return multiple values. If you want to receive the explanation string, give an empty string object as argument; for example

    my $str = ''; # ensure the variable is defined
    my $pvalue = $ratio->getNormalityPvalue($str);

Parameters

whyUnavailable If the p-value is unavailable, the reason is returned in this string argument.

Returns: The p-value that the peptide ratios used in calculating this protein ratio seem log-normally distributed. A very low value, e.g. below 0.05, indicates non-normality. A negative value (-1) means sample size was too small for normality testing or the value could not be calculated.

◆ getOutlierKeys()

ms_peptide_quant_key_vector getOutlierKeys ( ) const

Return peptide quant keys associated with the protein accession that were excluded due to being outliers.

The vector of outlier peptide quant keys may be empty if outlier detection is disabled or there were no outliers. It may be possible to override this decision and manually include peptide quant keys that would otherwise be ignored as outliers.

Returns: vector of outlier keys

◆ getSampleSize()

unsigned int getSampleSize ( ) const

Return the size of the sample of peptide ratios used in deriving this protein ratio.

Returns: The number of peptide ratios used in ratio calculation. This equals the size of the vector returned by getActiveKeys().

◆ getSkippedKeys()

ms_peptide_quant_key_vector getSkippedKeys ( ) const

Return peptide quant keys associated with the protein accession that were skipped due to being negative or infinite.

The vector of skipped keys may be empty if there were no negative or infinite peptide ratios. It is never possible to include skipped keys in protein ratio calculation.

This collection will also include peptide ratios that were left out of the calculation of the ratio for the Average protocol because they were not among the highest available peptide intensities.

Returns: vector of skipped keys

◆ getStandardDeviation() [1/2]

double getStandardDeviation ( ) const

Return the sample standard deviation of the protein ratio, calculated from the sample of peptide ratios.

Sample standard deviation is usually calculated using the standard formula: compute the sample variance as variance = 1/(n-1) * sum( (x_i - avg)^2 ), where x_i are the log peptide ratios, i is iterated from 1 to n (the sample size) and avg is the sample mean, and then take the square root to get the standard deviation. The value returned by this function is transformed back from the log space: exp(sqrt(variance)).
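
The formula above can be sketched in standalone C++ as follows. This is a minimal illustration under stated assumptions, not the Mascot Parser implementation: `geometricStdDev` is a hypothetical name, the input is assumed to contain only positive peptide ratios, and the result 0 for a sample of size 1 follows the convention given in the class description rather than the formula itself.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Mascot Parser: sample standard deviation
// of peptide ratios, computed in log space and transformed back with exp().
double geometricStdDev(const std::vector<double>& ratios) {
    assert(!ratios.empty());
    const std::size_t n = ratios.size();
    if (n == 1) {
        return 0.0;  // documented convention: standard deviation is 0 when sample size is 1
    }

    // avg: sample mean of the log peptide ratios x_i
    double sumLog = 0.0;
    for (double r : ratios) {
        assert(r > 0.0);  // assumption: negative or zero ratios were already skipped
        sumLog += std::log(r);
    }
    const double avg = sumLog / static_cast<double>(n);

    // variance = 1/(n-1) * sum( (x_i - avg)^2 )
    double sumSquares = 0.0;
    for (double r : ratios) {
        const double d = std::log(r) - avg;
        sumSquares += d * d;
    }
    const double variance = sumSquares / static_cast<double>(n - 1);

    // Transform back from log space: exp(sqrt(variance))
    return std::exp(std::sqrt(variance));
}
```

For identical peptide ratios the variance is 0, so the back-transformed value is exp(0) = 1, meaning a multiplicative spread of one (no spread).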

A related but different quantity is the standard error of the protein ratio estimator; see getStandardError().

Python: note that the name of this method in Python is getStandardDeviation_only().

Returns: The sample standard deviation of this protein ratio. Negative (-1) means the value is unavailable.

◆ getStandardDeviation() [2/2]

double getStandardDeviation ( std::string & whyUnavailable ) const

Return the sample standard deviation of the protein ratio, calculated from the sample of peptide ratios.

Same functionality as getStandardDeviation(), except that if the value is unavailable (negative), the reason is returned in the string argument.

See Multiple return values in Perl, Java, Python and C#.

Perl: note that this method does not return multiple values. If you want to receive the explanation string, give an empty string object as argument; for example

    my $str = ''; # ensure the variable is defined
    my $stdev = $ratio->getStandardDeviation($str);

Parameters

whyUnavailable If the standard deviation is unavailable, the reason is returned in this string argument.

Returns: The sample standard deviation of this protein ratio. Negative (-1) means the value is unavailable.

◆ getStandardError() [1/2]

double getStandardError ( ) const

Return the standard error of the protein ratio, calculated from the standard deviation of the sample.

The standard error of the protein ratio estimate is a function of sample standard deviation and sample size. In the simplest case when the log protein ratio is the mean of log peptide ratios, the log standard error is s/sqrt(n), where s is the standard deviation and n is the sample size. In that case, the value returned by this function is transformed back from log space: exp(s/sqrt(n)).

The formula depends on the protein ratio type. For details of the calculation, see ms_quant_stats::arithmeticStandardErrorOfMean(), ms_quant_stats::arithmeticStandardErrorOfMedian(), and ms_quant_stats::weightedArithmeticStandardError().
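
The simplest case described above, where the log protein ratio is the mean of the log peptide ratios, can be sketched as below. This is an illustration only: `simpleStandardError` is a hypothetical name, `s` is assumed to be the standard deviation in log space (before back-transformation), and the result 0 for a sample of size 1 follows the constructor documentation. It does not reproduce the ms_quant_stats functions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper, not part of Mascot Parser.
// logStdDev: standard deviation s of the log peptide ratios (log space)
// n:         sample size
double simpleStandardError(double logStdDev, std::size_t n) {
    assert(n >= 1);
    assert(logStdDev >= 0.0);
    if (n == 1) {
        return 0.0;  // documented convention: standard error is 0 when sample size is 1
    }
    // Log standard error is s/sqrt(n); transform back from log space.
    return std::exp(logStdDev / std::sqrt(static_cast<double>(n)));
}
```

With s = 2 and n = 4, the log standard error is 2/sqrt(4) = 1, so the back-transformed value is exp(1).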

Python: note that the name of this method in Python is getStandardError_only().

Returns: The standard error of the protein ratio. Negative (-1) means the value is unavailable.

◆ getStandardError() [2/2]

double getStandardError ( std::string & whyUnavailable ) const

Return the standard error of the protein ratio, calculated from the standard deviation of the sample.

Same functionality as getStandardError(), except that if the value is unavailable (negative), the reason is returned in the string argument.

See Multiple return values in Perl, Java, Python and C#.

Perl: note that this method does not return multiple values. If you want to receive the explanation string, give an empty string object as argument; for example

    my $str = ''; # ensure the variable is defined
    my $stderr = $ratio->getStandardError($str);

Parameters

whyUnavailable If the standard error is unavailable, the reason is returned in this string argument.

Returns: The standard error of the protein ratio. Negative (-1) means the value is unavailable.

◆ getValue()

double getValue ( ) const

Return the numerical value of the ratio.

Note that you need to know the context and how the protein ratio has been calculated to interpret the value.

Returns: The numerical value of this protein ratio.

◆ isLogNormal()

bool isLogNormal ( double threshold = 0.05 ) const

Boolean flag: do the peptide ratios used in calculating this protein ratio appear to be (log)normally distributed?

The normality test hypothesis is that the peptide ratios are log-normally distributed, or equivalently that the log-transformed peptide ratios are normally distributed. If the p-value is below a threshold, say 0.05, then the hypothesis should be rejected. This method returns true if the p-value does not fall below threshold, which means you have no evidence to reject the normality hypothesis.

(Strictly speaking p above threshold is not enough evidence to explicitly accept the hypothesis, although that is what the method name implies.)

If the p-value is -1 (no testing was conducted), then there is no evidence against the hypothesis and the method returns true.

Parameters

threshold The significance threshold; default 0.05.

Returns: True if the normality p-value is not below the threshold; false otherwise. If the normality p-value is not available, the method returns false.

◆ isMissing()

bool isMissing ( ) const

Boolean flag: is this ratio undefined ("missing")?

The ratio can be undefined ("missing") if there is not enough data to calculate it. If this method returns true, then the return values of other methods are undefined.

Returns: True if the ratio is missing; false otherwise.

◆ isSignificant()

bool isSignificant ( double threshold = 0.05 ) const

Boolean flag: is the protein ratio statistically significantly different from the null hypothesis?

The null hypothesis depends on the context where the protein ratio was calculated. If the p-value is below threshold, then there is evidence against the hypothesis and this method returns true; in other words, the protein ratio differs from the hypothesised value in a "statistically significant" way.

If the p-value is -1 (no testing was conducted), then there is no evidence against the hypothesis and the method returns false.
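
The decision rule described above can be sketched as a standalone function. This is an assumption-laden illustration, not the library source: `isSignificantSketch` is a hypothetical name, and how the real method treats a p-value of exactly 0 is not stated here.

```cpp
#include <cassert>

// Hypothetical helper, not part of Mascot Parser: mirrors the documented
// behaviour of isSignificant() for a given hypothesis p-value.
bool isSignificantSketch(double pValue, double threshold = 0.05) {
    assert(threshold > 0.0 && threshold <= 1.0);
    if (pValue < 0.0) {
        return false;  // -1: no testing was conducted, so no evidence against the hypothesis
    }
    return pValue < threshold;  // below threshold: evidence against the null hypothesis
}
```

Checking for the negative sentinel first matters, because a plain comparison `pValue < threshold` would otherwise report an untested ratio (-1) as significant.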

Parameters

threshold The significance threshold; default 0.05.

Returns: True if the hypothesis p-value is below the threshold; false otherwise. If the p-value is not available, the method returns false.

The documentation for this class was generated from the following files:

ms_protein_quant_ratio.hpp
ms_quant_ratios.cpp
