This class encapsulates a peptide from the mascot results file.
#include <ms_peptide.hpp>
Public Types | |
enum | PSM { PSM_COMPLETE = 0 , PSM_CROSSLINK_ALPHA = 1 , PSM_CROSSLINK_BETA = 2 } |
Type of data to return from accessor methods. | |
enum | PSM_TYPE { PSM_STANDARD = 0 , PSM_DECOY = 1 , PSM_ERRTOL = 2 , PSM_ERRTOLDECOY = 3 , PSM_LIBRARY = 4 , PSM_CROSSLINK = 5 , N_PSM_TYPES = 6 } |
Specifies the search pass and origin of the peptide match. | |
Public Member Functions | |
ms_peptide () | |
Default constructor. | |
ms_peptide (const int query, const int rank, const SEARCH_PHASE searchPhase, const double delta, const double observed, const int charge, const int numIonsMatched, const int peaksUsedFromIons1, const double ionsScore, const std::string &seriesUsedStr, const int peaksUsedFromIons2, const int peaksUsedFromIons3, const ms_peptide &pepAlpha, const ms_peptide &pepBeta, const ms_linker_site &linkerSite, const ms_mascotresults *pResults=0, const bool storeReloadableInfo=true) | |
Constructor to initialise a crosslinked peptide. | |
ms_peptide (const int query, const int rank, const SEARCH_PHASE searchPhase, const int missedCleavages, const double mrCalc, const double delta, const double observed, const int charge, const int numIonsMatched, const std::string &peptideStr, const int peaksUsedFromIons1, const std::string &varModsStr, const std::string &summedModsStr, const std::string &localModsStr, const std::string &monoLinkStr, const ms_linker_site_vector &loopLinks, const double ionsScore, const std::string &seriesUsedStr, const int peaksUsedFromIons2, const int peaksUsedFromIons3, const std::string &primaryNlStr, const std::string &summedModsNlStr, const std::string &localModsNlStr, const std::string &substStr, const std::string &componentStr, const ms_mascotresults *pResults=0, const bool storeReloadableInfo=true) | |
Constructor to initialise all values. | |
ms_peptide (const ms_peptide &src) | |
Copying constructor. | |
ms_peptide (int query, int rank, int missedCleavages, double mrCalc, double delta, double observed, int charge, int numIonsMatched, std::string peptideStr, int peaksUsedFromIons1, std::string varModsStr, double ionsScore, std::string seriesUsedStr, int peaksUsedFromIons2, int peaksUsedFromIons3, const ms_mascotresults *pResults=0, bool storeReloadableInfo=true) | |
Constructor to initialise most commonly used values. | |
~ms_peptide () | |
Destructor. | |
bool | anyPercolatorResults () const |
Return true if there are percolator results for this peptide. | |
bool | clearReloadableInfo () |
You may need to call this function to save memory. | |
void | copyFrom (const ms_peptide *src) |
Copies all content from another instance of the class. | |
std::string | getAmbiguityString (const PSM psmComponent=PSM_COMPLETE) const |
Used for X, B and Z residues in source databases where Mascot then substitutes for a residue. | |
bool | getAnyMatch () const |
Returns true if there was a peptide match to this spectrum. | |
void | getAnyProteinTermination (bool &isNterminus, bool &isCterminus, const PSM psmComponent=PSM_COMPLETE) const |
Determines if the Mascot search matched the peptide as a C or N terminus. | |
int | getCharge () const |
Returns the charge state for the parent mass. | |
std::string | getComponentStr (const PSM psmComponent=PSM_COMPLETE) const |
Returns the quantitation method component name used for the peptide match. | |
double | getDelta () const |
Returns the difference between the calculated and experimental relative masses. | |
double | getExpectationValue () const |
Returns the expectation value. | |
int | getFirstProtAppearedIn () const |
Returns the hit number of the first protein that contains this peptide. | |
ms_linker_site_vector | getIntactLinks () const |
Returns a vector of links between crosslinked peptides. | |
void | getIntactLinks (ms_linker_site_vector &vec) const |
Returns a vector of links between crosslinked peptides. | |
double | getIonsIntensity () const |
Returns the total intensity of all of the ions in the spectrum. | |
double | getIonsScore () const |
Returns the ions score. | |
bool | getIsFromErrorTolerant () const |
Returns true if this peptide came from the error tolerant search. | |
bool | getIsFromLibrary () const |
Returns true if this peptide came from the library search. | |
bool | getLessThanMinPepLen () const |
Returns true if the length of the peptide sequence is less than the minimum used for grouping. | |
std::string | getLocalModsNlStr (const PSM psmComponent=PSM_COMPLETE) const |
Returns neutral loss information associated with any query-level modification for the peptide. | |
std::string | getLocalModsStr (const PSM psmComponent=PSM_COMPLETE) const |
Query-level variable modifications as a string of digits. | |
ms_linker_site_vector | getLoopLinks (const PSM psmComponent=PSM_COMPLETE) const |
Returns a vector of looplinks. | |
void | getLoopLinks (ms_linker_site_vector &vec, const PSM psmComponent=PSM_COMPLETE) const |
Returns a vector of looplinks. | |
int | getMissedCleavages (const PSM psmComponent=PSM_COMPLETE) const |
Returns the number of missed cleavages. | |
std::string | getMonoLinkStr (const PSM psmComponent=PSM_COMPLETE) const |
Returns monolink information associated with a linker used as a variable modification. | |
double | getMrCalc (const PSM psmComponent=PSM_COMPLETE) const |
Returns the calculated relative mass for this peptide. | |
double | getMrExperimental () const |
Returns the observed mz value as a relative mass. | |
int | getNum13C (const double tol, const std::string tolu, const std::string mass_type) const |
Returns the number of 13C peaks offset required to get a match with the supplied tolerance. | |
int | getNumberOfLinkedPeptides () const |
Returns the number of peptides in this peptide-spectrum match. | |
int | getNumIonsMatched () const |
Returns the number of ions matched. | |
int | getNumProteins () const |
Returns the number of proteins that contain this peptide. | |
double | getObserved () const |
Returns the observed mass / charge value. | |
int | getPeaksUsedFromIons1 () const |
Returns the number of peaks used from ions1. | |
int | getPeaksUsedFromIons2 () const |
Returns the number of peaks used from ions2. | |
int | getPeaksUsedFromIons3 () const |
Returns the number of peaks used from ions3. | |
int | getPeptideLength (const PSM psmComponent=PSM_COMPLETE) const |
Returns the length in residues of the sequence found for the peptide. | |
std::string | getPeptideStr (bool substituteAmbiguous=true, const PSM psmComponent=PSM_COMPLETE) const |
Returns the sequence found for the peptide. | |
void | getPercolatorScores (double *posteriorErrorProbability, double *qValue, double *internalPercolatorScore, double *mascotIonsScore) const |
Returns the percolator scores and original Mascot ions score. | |
int | getPrettyRank () const |
Similar to getRank() except that equivalent scores get the same rank. | |
std::string | getPrimaryNlStr (const PSM psmComponent=PSM_COMPLETE) const |
Returns neutral loss information associated with any modification for the peptide. | |
const ms_protein * | getProtein (int num) const |
Returns a pointer to a protein that contains this peptide. | |
const std::vector< int > | getProteins () const |
Return a list of hit numbers for the proteins that contain this peptide. | |
int | getQuery () const |
Each peptide is associated with a query. | |
int | getRank () const |
Return the 'rank' of the peptide match. | |
std::string | getSeriesUsedStr () const |
Returns the series used as a string. | |
std::string | getSummedModsNlStr (const PSM psmComponent=PSM_COMPLETE) const |
Returns neutral loss information associated with any summed modification for the peptide. | |
std::string | getSummedModsStr (const PSM psmComponent=PSM_COMPLETE) const |
Summed variable modifications as a string of digits. | |
std::string | getVarModsStr (const PSM psmComponent=PSM_COMPLETE) const |
Variable modifications as a string of digits. | |
bool | isSamePeptideStr (ms_peptide *peptide, bool substituteAmbiguous=true) const |
Returns true if the two peptides are identical. | |
bool | isSameSummedModsStr (ms_peptide *peptide) const |
Returns true if the two summed variable modifications are identical. | |
bool | isSameVarModsStr (ms_peptide *peptide) const |
Returns true if the two variable modifications are identical. | |
ms_peptide & | operator= (const ms_peptide &right) |
C++ assignment operator. | |
void | setLocalModsStr (const std::string str, const PSM psmComponent=PSM_COMPLETE) |
Query-level variable modifications as a string of digits. | |
void | setSummedModsStr (const std::string str, const PSM psmComponent=PSM_COMPLETE) |
Summed variable modifications as a string of digits. | |
void | setVarModsStr (const std::string str, const PSM psmComponent=PSM_COMPLETE) |
Variable modifications as a string of digits. | |
This class encapsulates a peptide from the mascot results file.
This class is used for protein summary and peptide summary results. There is generally no need to create an object of this class. Simply open the results file as ms_proteinsummary or ms_peptidesummary and call ms_proteinsummary::getPeptide() or ms_peptidesummary::getPeptide().
To create an ms_peptide object that is not in a Mascot results file, use ms_aahelper::createPeptide().
enum PSM |
Type of data to return from accessor methods.
Mascot 2.7 and later can identify intact crosslinked peptides. The ms_peptide object contains data for both the peptide sequence(s) and the spectrum match.
The PSM flag specifies the type of data returned from accessor methods. It is used as an optional argument with, for example, getPeptideStr().
See Using enumerated values and static const ints in Perl, Java, Python and C# and Crosslinked search results.
enum PSM_TYPE |
Specifies the search pass and origin of the peptide match.
Mascot 3.0 and later stores the value of this enum in the psm_type columns of the MSR files for each peptide match.
ms_peptide | ( | ) |
Default constructor.
The constructor should generally only be used from within the library.
ms_peptide | ( | int | query, |
int | rank, | ||
int | missedCleavages, | ||
double | mrCalc, | ||
double | delta, | ||
double | observed, | ||
int | charge, | ||
int | numIonsMatched, | ||
std::string | peptideStr, | ||
int | peaksUsedFromIons1, | ||
std::string | varModsStr, | ||
double | ionsScore, | ||
std::string | seriesUsedStr, | ||
int | peaksUsedFromIons2, | ||
int | peaksUsedFromIons3, | ||
const ms_mascotresults * | pResults = 0 , |
||
bool | storeNonPersistantInfo = true |
||
) |
Constructor to initialise most commonly used values.
This constructor should generally only be used from within the library.
Not all values can be initialised from this constructor, but this constructor will remain available in future versions of the library.
Consider using ms_aahelper::createPeptide or ms_mascotresults::getPeptide(const int, const int) const instead of this constructor.
query | is the query number in the range 1..ms_mascotresfilebase::getNumQueries() |
rank | is the rank number. See getRank() for details |
missedCleavages | is the number of missed cleavages, or -1 if there was no match |
mrCalc | is the relative mass of the peptide. See getMrCalc() |
delta | is the difference between the calculated and experimental relative masses |
observed | is the observed mass / charge value. See getObserved() |
charge | is the charge state for the parent mass |
numIonsMatched | is the number of ions matched. See getNumIonsMatched() |
peptideStr | is the peptide sequence |
peaksUsedFromIons1 | is the number of peaks used from ions1. See getPeaksUsedFromIons1() |
varModsStr | is the variable modifications as a string of digits. See getVarModsStr() |
ionsScore | is the ions score. See getIonsScore() |
seriesUsedStr | is the ions series used as a string. See getSeriesUsedStr() |
peaksUsedFromIons2 | is the number of peaks used from ions2. See getPeaksUsedFromIons2() |
peaksUsedFromIons3 | is the number of peaks used from ions3. See getPeaksUsedFromIons3() |
pResults | is a pointer to the results object and should be supplied where possible |
storeNonPersistantInfo | should be set to true |
bool anyPercolatorResults | ( | ) | const |
Return true if there are percolator results for this peptide.
See Using Percolator scores for further information.
std::string getAmbiguityString | ( | const PSM | psmComponent = PSM_COMPLETE | ) | const |
Used for X, B and Z residues in source databases where Mascot then substitutes for a residue.
Mascot 2.0 and later will try and substitute ambiguous 'residues' with a residue to get a match. Specifically:
This string is read from the h1_q1_subst= line of the results file for the protein summary, and from the q1_p1_subst line of the results file for the peptide summary.
Calling getPeptideStr(true) will return the substituted string.
Calling this function will return a string with the format:
pos1,ambig1,matched1...,posn,ambign,matchedn
For example, the string 3,B,N,4,X,A for the sequence AFBXK means that the matched sequence is AFNAK.
If the peptide match is a crosslinked peptide, you must specify psmComponent. PSM_CROSSLINK_ALPHA returns the alpha peptide ambiguity string, and PSM_CROSSLINK_BETA returns the beta peptide ambiguity string. If you pass PSM_COMPLETE, the method returns the empty string.
psmComponent | Type of data to return: complete molecule, alpha peptide or beta peptide. |
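The ambiguity string format described above can be applied to an unsubstituted sequence with a few lines of string handling. The following is only a sketch: applyAmbiguityString is a hypothetical helper, not part of Mascot Parser, and it assumes that positions are 1-based residue positions and that each field is a single character.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of Mascot Parser): applies an ambiguity
// string of the form "pos,ambig,matched,..." to an unsubstituted sequence.
// Assumption: positions are 1-based residue positions.
std::string applyAmbiguityString(const std::string &sequence,
                                 const std::string &ambiguity)
{
    // Split the ambiguity string on commas.
    std::vector<std::string> fields;
    std::stringstream ss(ambiguity);
    std::string field;
    while (std::getline(ss, field, ',')) {
        fields.push_back(field);
    }

    std::string result = sequence;
    // Walk the fields in groups of three: position, ambiguous, matched.
    for (std::size_t i = 0; i + 2 < fields.size(); i += 3) {
        const std::size_t pos = std::stoul(fields[i]);
        const std::string &ambig = fields[i + 1];
        const std::string &matched = fields[i + 2];
        // Only substitute when the sequence really has the ambiguous
        // residue at the stated position.
        if (pos >= 1 && pos <= result.size() &&
            ambig.size() == 1 && matched.size() == 1 &&
            result[pos - 1] == ambig[0]) {
            result[pos - 1] = matched[0];
        }
    }
    return result;
}
```

With the example above, applying "3,B,N,4,X,A" to "AFBXK" gives "AFNAK". In practice, getPeptideStr(true) already returns the substituted string, so a helper like this is only useful when working from the raw values.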
bool getAnyMatch | ( | ) | const |
Returns true if there was a peptide match to this spectrum.
Internally (and in the results file) this is signified by the first value on the line, which is the number of missed cleavages. See also getMissedCleavages().
void getAnyProteinTermination | ( | bool & | isNterminus, |
bool & | isCterminus, | ||
const PSM | psmComponent = PSM_COMPLETE |
||
) | const |
Determines if the Mascot search matched the peptide as a C or N terminus.
This is determined from the residues string. The residues string is a series of residues separated by colons (':'). If an individual residue starts with a hyphen ('-') then the peptide is an N-terminus for a protein. If an individual residue ends with a one ('1') then the peptide is a C-terminus for a protein.
If the peptide match is a crosslinked peptide, you must specify psmComponent. PSM_CROSSLINK_ALPHA returns the values for the alpha peptide, and PSM_CROSSLINK_BETA for the beta peptide. If you pass PSM_COMPLETE, the returned values are false.
isNterminus | Set to true if the peptide was matched as an N-terminus, otherwise false. |
isCterminus | Set to true if the peptide was matched as a C-terminus, otherwise false. |
psmComponent | Type of data to return: complete molecule, alpha peptide or beta peptide. |
int getCharge | ( | ) | const |
Returns the charge state for the parent mass.
This will be 0 for an Mr value, or 1,2,3,4 etc. If an error has occurred, then the charge will be -100.
std::string getComponentStr | ( | const PSM | psmComponent = PSM_COMPLETE | ) | const |
Returns the quantitation method component name used for the peptide match.
If a search was performed using a 'precursor' quantitation method, then peptide hits may have a quantitation method component name associated with them.
The entry in the results file might be, for example:
q1_p2_comp=light
Component strings are only saved in the results file for methods with 'Isotopes' defined. For example, with the 15N Metabolic [MD] method. In other cases, for example with the SILAC methods, the component needs to be determined from the modifications assigned to the peptide.
To determine if a method has isotopes, use the following code (error handling omitted to make it more readable):
    my $resfile = msparser::ms_mascotresfile_msr->new($filename);
    my $qfile = new msparser::ms_quant_configfile;
    $qfile->setSchemaFileName(
        "http://www.matrixscience.com/xmlns/schema/quantitation_2"
        . " ../html/xmlns/schema/quantitation_2/quantitation_2.xsd"
        . " http://www.matrixscience.com/xmlns/schema/quantitation_1"
        . " ../html/xmlns/schema/quantitation_1/quantitation_1.xsd"
    );
    my $numIsotopes = 0;
    if ($resfile->getQuantitation($qfile)) {
        my $method = $qfile->getMethodByNumber(0);
        for (my $idx = 0; $idx < $method->getNumberOfComponents(); $idx++) {
            my $comp = $method->getComponentByNumber($idx);
            $numIsotopes += $comp->getNumberOfIsotopes();
        }
    }
    if ($numIsotopes > 0) {
        # ... expect getComponentStr to return values ...
    }
See also ms_mascotresults::getComponentString.
If the match is an intact crosslinked peptide, you must specify psmComponent. PSM_CROSSLINK_ALPHA returns the alpha component and PSM_CROSSLINK_BETA the beta component. If you pass PSM_COMPLETE, the method returns either the alpha component (if alpha component == beta component) or the empty string.
psmComponent | Type of data to return: complete molecule, alpha peptide or beta peptide. |
double getDelta | ( | ) | const |
Returns the difference between the calculated and experimental relative masses.
double getExpectationValue | ( | ) | const |
Returns the expectation value.
int getFirstProtAppearedIn | ( | ) | const |
Returns the hit number of the first protein that contains this peptide.
ms_linker_site_vector getIntactLinks | ( | ) | const |
Returns a vector of links between crosslinked peptides.
In Mascot 2.7, the number of links between crosslinked peptides is limited to 1.
void getIntactLinks | ( | ms_linker_site_vector & | vec | ) | const |
Returns a vector of links between crosslinked peptides.
In Mascot 2.7, the number of links between crosslinked peptides is limited to 1.
vec | is the vector of links, or the empty vector if getNumberOfLinkedPeptides() == 0. |
double getIonsIntensity | ( | ) | const |
Returns the total intensity of all of the ions in the spectrum.
This value is only guaranteed to be valid for ms_peptidesummary when sorting unassigned by intensity and using the ms_peptide object returned by the ms_mascotresults::getUnassigned() function.
The same value is returned by ms_inputquery::getTotalIonsIntensity().
double getIonsScore | ( | ) | const |
Returns the ions score.
Note that ms_protein::getPeptideIonsScore() returns the ions score in the context of the protein match and will generally be slightly lower than the return value from this function. The Mascot results pages display the score returned from this function because results from similar proteins are displayed together.
bool getIsFromErrorTolerant | ( | ) | const |
Returns true if this peptide came from the error tolerant search.
This peptide may have been found as a result of a standard error tolerant or an integrated error tolerant search. See Error tolerant searches.
bool getIsFromLibrary | ( | ) | const |
Returns true if this peptide came from the library search.
This peptide may have been found as a result of a library or an integrated library search. See Spectral libraries.
bool getLessThanMinPepLen | ( | ) | const |
Returns true if the length of the peptide sequence is less than the minimum used for grouping.
If the function returns true, this peptide has not been considered when grouping proteins.
In a crosslinked search, if the match is crosslinked, either alpha or beta sequence must be longer than MinPepLenInPepSummary for the match to be considered in protein grouping. The test for a standard (linear) match is to compare its sequence length, as before.
See MinPepLenInPepSummary in mascot.dat.
std::string getLocalModsNlStr | ( | const PSM | psmComponent = PSM_COMPLETE | ) | const |
Returns neutral loss information associated with any query-level modification for the peptide.
Neutral loss information is encoded in the same way as variable modification neutral losses (getPrimaryNlStr()). The neutral loss string is a string of digits, one digit for the N terminus, one for each residue and one for the C terminus. Each digit specifies the modification used to obtain the match: 0 indicates no modification, while numbers from 1 onwards are indices to the neutral loss vector of the modification, ms_modification::getNeutralLoss().
For example, suppose the peptide match has a query-level modification at residue 6, so that the query-level mods string is 000000100 and the neutral loss string is 000000200. To find the value of neutral loss 2, load the query-level modification object 1 (see getLocalModsStr()) and get the neutral loss vector with ms_modification::getNeutralLoss(). The neutral loss vector is 0-based, so neutral loss 2 is the second element of the vector, at index 1.
psmComponent | Type of data to return: complete molecule, alpha peptide or beta peptide. |
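The mapping from a neutral loss digit to a 0-based vector index can be sketched as follows. NeutralLossSite and findNeutralLossSites are hypothetical names used for illustration only and are not part of Mascot Parser. The sketch handles the digits 1..9 only, and treats string index 0 as the N terminus, as described above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record describing one neutral loss in the string.
struct NeutralLossSite {
    std::size_t position;     // 0 = N terminus, 1..n = residues, n+1 = C terminus
    int neutralLossNumber;    // digit from the neutral loss string (1-based)
    std::size_t vectorIndex;  // 0-based index into the neutral loss vector
};

// Hypothetical helper: lists every position in the neutral loss string
// that carries a non-zero digit, with the matching 0-based vector index.
std::vector<NeutralLossSite> findNeutralLossSites(const std::string &nlStr)
{
    std::vector<NeutralLossSite> sites;
    for (std::size_t i = 0; i < nlStr.size(); ++i) {
        const char c = nlStr[i];
        if (c >= '1' && c <= '9') {
            const int number = c - '0';
            sites.push_back({i, number, static_cast<std::size_t>(number - 1)});
        }
    }
    return sites;
}
```

For the string 000000200 this yields a single site at position 6 (residue 6) with neutral loss number 2 and vector index 1, matching the example above.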
std::string getLocalModsStr | ( | const PSM | psmComponent = PSM_COMPLETE | ) | const |
Query-level variable modifications as a string of digits.
One digit is used for the N terminus, one for each residue and one for the C terminus. Each digit specifies the modification used to obtain the match: 0 indicates no modification, while digits and letters 1..9 and A..W indicate the corresponding modification in the IT_MODS array of the input query. The digits can be used to look up the modification name using ms_inputquery::getLocalVarModName().
For example, suppose the query-level mods string of a peptide match is 000000100 (residue 6 modified). The modification object to load is the one identified by the digit at that position (here, 1). Note that query-level modifications can never include error tolerant modifications, so the string never contains an 'X'.
psmComponent | Type of data to return: complete molecule, alpha peptide or beta peptide. |
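Decoding the characters of a modification string into modification numbers might look like the sketch below. modCharToIndex and decodeModString are hypothetical helpers, not Mascot Parser functions. The sketch assumes that the letters continue the numbering after 9, so that 'A' is 10 and 'W' is 32; this is an inference from the "1..9 and A..W" range above rather than something stated explicitly.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: converts one character of a modification string
// to a modification number. Returns 0 for no modification and -1 for
// any character outside 0..9 and A..W (for example 'X').
// Assumption: 'A' follows '9', so 'A' = 10 ... 'W' = 32.
int modCharToIndex(char c)
{
    if (c >= '0' && c <= '9') {
        return c - '0';
    }
    if (c >= 'A' && c <= 'W') {
        return 10 + (c - 'A');
    }
    return -1;
}

// Hypothetical helper: decodes the whole string. Element 0 is the
// N terminus, elements 1..n the residues, and the last element the
// C terminus.
std::vector<int> decodeModString(const std::string &modsStr)
{
    std::vector<int> indices;
    indices.reserve(modsStr.size());
    for (char c : modsStr) {
        indices.push_back(modCharToIndex(c));
    }
    return indices;
}
```

For 000000100, the decoded vector has the value 1 at element 6 and 0 everywhere else.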
ms_linker_site_vector getLoopLinks | ( | const PSM | psmComponent = PSM_COMPLETE | ) | const |
Returns a vector of looplinks.
If this is a linear peptide match, psmComponent is ignored.
If this is a crosslinked peptide match, psmComponent chooses between the alpha or the beta peptide. If psmComponent is PSM_COMPLETE, looplinks for both peptides are returned.
psmComponent | One of ms_peptide::PSM. |
void getLoopLinks | ( | ms_linker_site_vector & | vec, |
const PSM | psmComponent = PSM_COMPLETE |
||
) | const |
Returns a vector of looplinks.
If this is a linear peptide match, psmComponent is ignored.
If this is a crosslinked peptide match, psmComponent chooses between the alpha or the beta peptide. If psmComponent is PSM_COMPLETE, looplinks for both peptides are returned.
vec | is vector of looplinks, which could remain empty. |
psmComponent | One of ms_peptide::PSM. |
int getMissedCleavages | ( | const PSM | psmComponent = PSM_COMPLETE | ) | const |
Returns the number of missed cleavages.
See also getAnyMatch().
In a crosslinked search, the peptide-spectrum match may contain one linear peptide or the complete crosslinked peptide (alpha and beta).
The psmComponent argument is optional in a non-crosslinked search. For a crosslinked match, it chooses the type of data to return. With PSM_COMPLETE, the method returns -1 if alpha or beta value is -1, and otherwise returns alpha missed cleavages + beta missed cleavages. With PSM_CROSSLINK_ALPHA or PSM_CROSSLINK_BETA, the method returns the alpha or beta value, respectively.
See Crosslinked search results.
psmComponent | Type of data to return: complete molecule, alpha peptide or beta peptide. |
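The PSM_COMPLETE rule described above can be written out as a small function. This is a sketch of the documented behaviour only; completeMissedCleavages is a hypothetical name, and the library's actual implementation is not shown here.

```cpp
// Hypothetical helper mirroring the documented PSM_COMPLETE rule:
// -1 if either component has no match, otherwise the sum of the
// alpha and beta missed cleavages.
int completeMissedCleavages(int alphaMissed, int betaMissed)
{
    if (alphaMissed == -1 || betaMissed == -1) {
        return -1;
    }
    return alphaMissed + betaMissed;
}
```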
std::string getMonoLinkStr | ( | const PSM | psmComponent = PSM_COMPLETE | ) | const |
Returns monolink information associated with a linker used as a variable modification.
Monolink information is encoded in a similar way to variable modifications. If the variable modification at position i is a linker (e.g. Xlink:DSS), the value at position i of the monolink string records the index of the monolink (e.g. "neutral loss" element 1 of Xlink:DSS (K)).
Use the relevant number from this function as an argument to ms_mascotresfilebase::getMonoLinkModification() to determine the monolink delta used for scoring.
psmComponent | Type of data to return: complete molecule, alpha peptide or beta peptide. |
double getMrCalc | ( | const PSM | psmComponent = PSM_COMPLETE | ) | const |
Returns the calculated relative mass for this peptide.
Takes into account any modifications.
In a crosslinked search, the peptide-spectrum match may contain one linear peptide or the complete crosslinked peptide (alpha and beta).
The psmComponent argument is optional in a non-crosslinked search. For a crosslinked match, it chooses the type of data to return. With PSM_COMPLETE, the method returns the mass of the intact molecule: alpha Mr + beta Mr + linker mass. With PSM_CROSSLINK_ALPHA or PSM_CROSSLINK_BETA, the method returns the alpha or beta value, respectively, without the linker.
See Crosslinked search results.
psmComponent | Type of data to return: complete molecule, alpha peptide or beta peptide. |
double getMrExperimental | ( | ) | const |
Returns the observed mz value as a relative mass.
This is equal to getMrCalc() + getDelta(), so note that this will be zero if there was no match because there is no calculated value and no delta. It is generally recommended that you call ms_mascotresfilebase::getObservedMrValue() since it will always return the relative mass, even for no match.
int getNum13C | ( | const double | tol, |
const std::string | tolu, | ||
const std::string | mass_type | ||
) | const |
Returns the number of 13C peaks offset required to get a match with the supplied tolerance.
Sometimes, peak detection chooses the 13C peak rather than the 12C. In extreme cases, it may pick the 13C2 peak. The normal test for a precursor match is
tol > absolute(exp - calc)
Assuming the mass values and tolerance are in Da, and the PEP_ISOTOPE_ERROR field is set to 1, the test will also succeed for
tol > absolute(exp - calc - 1)
If the PEP_ISOTOPE_ERROR field is set to 2, the test will succeed for the above two conditions, plus
tol > absolute(exp - calc - 2)
This means that you can use a tight mass tolerance and still get a match to a 13C peak. If you are using a very high accuracy instrument, note that the precise shifts are the carbon isotope spacings of 1.0033548 and 2.0067096, rather than 1 and 2. However, if average rather than isotopic masses are specified in the search, the values will be 0.9926548 and 1.9853096.
tol | Is the tolerance for the search. The return value from matrix_science::ms_searchparams::getTOL() may be used. |
tolu | Is the units of the tolerance supplied when performing the search. The return value from matrix_science::ms_searchparams::getTOLU() may be used. |
mass_type | will either be Monoisotopic or Average. The return value from matrix_science::ms_searchparams::getMASS() may be used. |
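The tolerance tests above can be combined into one loop that tries an offset of 0, 1 and 2 peaks in turn. The sketch below is not the library's implementation: num13COffset is a hypothetical function, it assumes that the tolerance and both masses are in Da (getNum13C() also accepts tolerance units, which are ignored here), and the return value of -1 for "no match" is an invention of this sketch.

```cpp
#include <cmath>

// Hypothetical sketch of the 13C offset test, for values in Da only.
// Returns the smallest offset n (0..maxOffset) for which
//   tol > absolute(exp - calc - n * spacing)
// or -1 if no offset gives a match (the -1 is an assumption of this
// sketch, not documented behaviour).
int num13COffset(double tol, double expMr, double calcMr,
                 bool monoisotopic, int maxOffset = 2)
{
    // Spacings quoted in the text above.
    const double spacing = monoisotopic ? 1.0033548 : 0.9926548;
    for (int n = 0; n <= maxOffset; ++n) {
        if (tol > std::fabs(expMr - calcMr - n * spacing)) {
            return n;
        }
    }
    return -1;
}
```

For example, with a tolerance of 0.01 Da and monoisotopic masses, an experimental value of 1001.0033548 against a calculated value of 1000.0 only matches with an offset of one 13C peak.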
int getNumberOfLinkedPeptides | ( | ) | const |
Returns the number of peptides in this peptide-spectrum match.
In Mascot 2.7, there are two possible return values:
int getNumIonsMatched | ( | ) | const |
Returns the number of ions matched.
Mascot begins by selecting a small number of experimental peaks on the basis of normalised intensity. It calculates a probability based score according to the number of matches. It then increases the number of selected peaks and recalculates the score. It continues to iterate until it is clear that the score can only get worse. It then reports the best score it found, which should correspond to an optimum selection, taking mostly real peaks and leaving behind mostly noise.
Mascot is not trying to find all possible matches in the spectrum. Many spectra have "peak at every mass" noise, and can match any ion series from any sequence if there is no intensity discrimination. So, you may look at a peptide view report and see obvious matches that are unlabelled. However, if the peak selection was to be extended to include these additional matches, it would also have to include a number of additional noise peaks, and the score would decrease.
int getNumProteins | ( | ) | const |
Returns the number of proteins that contain this peptide.
double getObserved | ( | ) | const |
Returns the observed mass / charge value.
int getPeaksUsedFromIons1 | ( | ) | const |
Returns the number of peaks used from ions1.
It is possible, but unusual, to specify which ions series particular ions come from: https://www.matrixscience.com/help/sq_help.html#IONS
If a search specifies that some ions are from the b series, some are from the y series and that some are from any series then these will be stored separately in Ions1, Ions2 and Ions3. The number of matches to each set of ions is available using getPeaksUsedFromIons1(), getPeaksUsedFromIons2() and getPeaksUsedFromIons3().
For most searches, getPeaksUsedFromIons1() is the only function that needs to be used.
int getPeaksUsedFromIons2 | ( | ) | const |
Returns the number of peaks used from ions2.
It is possible, but unusual, to specify which ions series particular ions come from: https://www.matrixscience.com/help/sq_help.html#IONS
If a search specifies that some ions are from the b series, some are from the y series and that some are from any series then these will be stored separately in Ions1, Ions2 and Ions3. The number of matches to each set of ions is available using getPeaksUsedFromIons1(), getPeaksUsedFromIons2() and getPeaksUsedFromIons3().
For most searches, getPeaksUsedFromIons1() is the only function that needs to be used.
int getPeaksUsedFromIons3 | ( | ) | const |
Returns the number of peaks used from ions3.
It is possible, but unusual, to specify which ions series particular ions come from: https://www.matrixscience.com/help/sq_help.html#IONS
If a search specifies that some ions are from the b series, some are from the y series and that some are from any series then these will be stored separately in Ions1, Ions2 and Ions3. The number of matches to each set of ions is available using getPeaksUsedFromIons1(), getPeaksUsedFromIons2() and getPeaksUsedFromIons3().
For most searches, getPeaksUsedFromIons1() is the only function that needs to be used.
int getPeptideLength | ( | const PSM | psmComponent = PSM_COMPLETE | ) | const |
Returns the length in residues of the sequence found for the peptide.
This should be faster than calling getPeptideStr() and determining the length of the returned string.
psmComponent | Type of data to return: complete molecule, alpha peptide or beta peptide. |
std::string getPeptideStr | ( | bool | substituteAmbiguous = true , |
const PSM | psmComponent = PSM_COMPLETE |
||
) | const |
Returns the sequence found for the peptide.
In a crosslinked search, the peptide-spectrum match may contain one linear peptide or the complete crosslinked peptide (alpha and beta). The psmComponent argument chooses the type of data to return. If you pass PSM_COMPLETE, the method returns a concatenated sequence: alpha sequence + "][" + beta sequence. The characters "][" represent the alpha C-terminus and beta N-terminus.
The psmComponent argument is optional in a non-crosslinked search.
See Crosslinked search results.
substituteAmbiguous | If true, and if there were any ambiguous residues, then the returned result will have the substituted residues rather than an 'X', 'B' or 'Z'. |
psmComponent | Type of data to return: complete molecule, alpha peptide or beta peptide. |
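If only the PSM_COMPLETE string is at hand, the alpha and beta sequences can be recovered by splitting on the "][" separator described above. The helper below is a hypothetical sketch, not a Mascot Parser function; calling getPeptideStr() with PSM_CROSSLINK_ALPHA or PSM_CROSSLINK_BETA is the direct way to get each sequence.

```cpp
#include <string>
#include <utility>

// Hypothetical helper: splits "alpha][beta" into its two sequences.
// For a linear peptide (no "][" present) the whole string is returned
// as the first element and the second element is empty.
std::pair<std::string, std::string>
splitCrosslinkedSequence(const std::string &complete)
{
    const std::string separator = "][";
    const std::string::size_type pos = complete.find(separator);
    if (pos == std::string::npos) {
        return std::make_pair(complete, std::string());
    }
    return std::make_pair(complete.substr(0, pos),
                          complete.substr(pos + separator.size()));
}
```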
void getPercolatorScores | ( | double * | posteriorErrorProbability, |
double * | qValue, | ||
double * | internalPercolatorScore, | ||
double * | mascotIonsScore | ||
) | const |
Returns the percolator scores and original Mascot ions score.
See Using Percolator scores for further information and Multiple return values in Perl, Java, Python and C#.
[out] | posteriorErrorProbability | is the probability that an individual match with this Percolator score is a random match. |
[out] | qValue | is equivalent to the FDR if all matches with this Percolator score or higher were accepted. |
[out] | internalPercolatorScore | is not used in Mascot Parser. See Percolator documentation. |
[out] | mascotIonsScore | is the ions score that Mascot originally assigned. |
int getPrettyRank | ( | ) | const |
Similar to getRank() except that equivalent scores get the same rank.
For a peptide summary, the top 10 peptide matches for each query are saved. These are scored with rank 1 to 10, and the rank can be obtained using the getRank() function. However, if, say, the top three matches have the same score, it is generally better to say (in a report) that these are all rank 1. The following table shows an example:
Rank | Pretty Rank | Score | Peptide | Protein |
---|---|---|---|---|
1 | 1 | 78 | ABCDEFG | |
2 | 1 | 78 | ABCDEGF | gi|123456 |
3 | 1 | 78 | BACDEFG | |
4 | 4 | 65 | ASDADSD | |
5 | 4 | 65 | SDFSGSD | |
6 | 6 | 12 | DFGHDFG | |
7 | 7 | 8 | SSDFDFD | |
8 | 8 | 7 | RTYRYRY | |
9 | 9 | 4 | RTYRYRY | |
10 | 10 | 2 | TYUTUTU | |
For a protein summary, getPrettyRank() will always return the same as getRank().
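The tie handling shown in the table can be sketched as follows. This is an illustration of the ranking rule only, not the library's implementation: the helper name prettyRanks is hypothetical, the scores are assumed to be ordered by getRank() (best first), and ties are detected by exact equality, which may differ from what Mascot Parser does internally.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Mascot Parser).
// Input: scores ordered by ordinary rank, best first.
// Output: one "pretty" rank per score, where equal adjacent scores
// share the rank of the first match in the tie.
std::vector<int> prettyRanks(const std::vector<double> &scoresByRank)
{
    std::vector<int> pretty(scoresByRank.size());
    for (std::size_t i = 0; i < scoresByRank.size(); ++i) {
        if (i > 0 && scoresByRank[i] == scoresByRank[i - 1]) {
            // Tie with the previous match: reuse its pretty rank.
            pretty[i] = pretty[i - 1];
        } else {
            // No tie: the pretty rank equals the ordinary 1-based rank.
            pretty[i] = static_cast<int>(i) + 1;
        }
    }
    return pretty;
}
```

Applied to the scores in the table (78, 78, 78, 65, 65, 12, 8, 7, 4, 2), this gives the pretty ranks listed there.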
std::string getPrimaryNlStr | ( | const PSM | psmComponent = PSM_COMPLETE | ) | const |
Returns neutral loss information associated with any modification for the peptide.
Neutral loss information is encoded in a similar way to variable modifications. Use the relevant number from this function as an offset into the array returned from ms_searchparams::getVarModsNeutralLosses() to determine the neutral loss used for scoring.
The neutral loss string is a string of digits, one digit for the N terminus, one for each residue and one for the C terminus. Each digit specifies the modification used to obtain the match: 0 indicates no modification, 1 indicates NeutralLoss1, 2 indicates NeutralLoss2 etc., in the masses section.
psmComponent | Type of data to return: complete molecule, alpha peptide or beta peptide. |
const ms_protein * getProtein | ( | int | num | ) | const |
Returns a pointer to a protein that contains this peptide.
It is normally better to use ms_mascotresults::getAllProteinsWithThisPepMatch() than this function.
This function returns all the proteins that are seen in the top level report that contain this peptide. If grouping is enabled, it will only return proteins that are returned from ms_mascotresults::getHit – it will not return proteins that would be returned from ms_mascotresults::getNextSimilarProtein().
The function will only return proteins up to the number of hits that were requested when creating the ms_mascotresults object.
See Maintaining object references: two rules of thumb.
num | should be in the range 1.. getNumProteins() or this function will return a null value. |
const std::vector< int > getProteins | ( | ) | const |
Return a list of hit numbers for the proteins that contain this peptide.
int getQuery | ( | ) | const |
Each peptide is associated with a query.
int getRank | ( | ) | const |
Return the 'rank' of the peptide match.
Each spectrum will match to a number of different peptides. The best (highest scoring) match will have a rank value of 1.
In a peptide summary, this will be in the range 1..10.
In a protein summary, this will be the hit number (1..50).
std::string getSeriesUsedStr | ( | ) | const |
Returns the series used as a string.
The string is a sequence of the digits 0, 1 and 2.
The position in the string indicates which ion series the digit refers to.
For example, 00020010000000000 would indicate that the b series was used for scoring, and that the y series was significant but not used for scoring. Note that earlier versions of Mascot did not look for all of these ion series, so the string will not necessarily be 17 characters long.
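A minimal sketch of reading such a string is shown below. It reports 1-based positions only, because the mapping from position to series name is not reproduced here; in the example above, position 4 corresponds to the b series and position 7 to the y series. The names SeriesFlags and classifySeries are hypothetical, and the interpretation of the digits (2 for used in scoring, 1 for significant but not used) is taken from that example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type (not part of Mascot Parser).
// Positions are 1-based indexes into the series-used string.
struct SeriesFlags {
    std::vector<int> usedForScoring;       // positions holding '2'
    std::vector<int> significantNotScored; // positions holding '1'
};

// Hypothetical helper: scans the string once and records the positions
// of each non-zero digit. Works for strings shorter than 17 characters.
SeriesFlags classifySeries(const std::string &seriesUsed)
{
    SeriesFlags flags;
    for (std::size_t i = 0; i < seriesUsed.size(); ++i) {
        const int position = static_cast<int>(i) + 1;
        if (seriesUsed[i] == '2') {
            flags.usedForScoring.push_back(position);
        } else if (seriesUsed[i] == '1') {
            flags.significantNotScored.push_back(position);
        }
    }
    return flags;
}
```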
std::string getSummedModsNlStr | ( | const PSM | psmComponent = PSM_COMPLETE | ) | const |
Returns neutral loss information associated with any summed modification for the peptide.
Neutral loss information is encoded in a similar way to summed modifications. Use the relevant number from this function as an offset into the array returned from ms_searchparams::getVarModsNeutralLosses() to determine the neutral loss used for scoring.
The neutral loss string is a string of digits, one digit for the N terminus, one for each residue and one for the C terminus. Each digit specifies the modification used to obtain the match: 0 indicates no modification, 1 indicates NeutralLoss1, 2 indicates NeutralLoss2 etc., in the masses section.
psmComponent | Type of data to return: complete molecule, alpha peptide or beta peptide. |
std::string getSummedModsStr | ( | const PSM | psmComponent = PSM_COMPLETE | ) | const |
Summed variable modifications as a string of digits.
When two modifications occur at the same site, the extra modification is stored in the q1_p1_summed_mods= string in the results file. Summed mods are only supported when using quantitation methods with exclusive modifications. The exclusive modification will always be returned by getVarModsStr(), while getSummedModsStr() is used for post translational modifications to the same site.
One digit is used for the N terminus, one for each residue and one for the C terminus. Each digit specifies the modification used to obtain the match: 0 indicates no modification, 1 indicates delta1, 2 indicates delta2 etc., in the masses section. An 'X' is used to indicate an error tolerant modification that can be retrieved using matrix_science::ms_peptidesummary::getErrTolModName().
To support numbers greater than 9, the letters A..W are permitted, with A being 10 and W being 32.
In a crosslinked search, the peptide-spectrum match may contain one linear peptide or the complete crosslinked peptide (alpha and beta). The psmComponent argument chooses the type of data to return.
The psmComponent argument is optional in a non-crosslinked search.
See Crosslinked search results.
psmComponent | Type of data to return: complete molecule, alpha peptide or beta peptide. |
std::string getVarModsStr | ( | const PSM | psmComponent = PSM_COMPLETE | ) | const |
Variable modifications as a string of digits.
One digit is used for the N terminus, one for each residue and one for the C terminus. Each digit specifies the modification used to obtain the match: 0 indicates no modification, 1 indicates delta1, 2 indicates delta2 etc., in the masses section. An 'X' is used to indicate an error tolerant modification that can be retrieved using matrix_science::ms_peptidesummary::getErrTolModName().
To support numbers greater than 9, the letters A..W are permitted, with A being 10 and W being 32.
See also Increase in the number of variable modifications.
In a crosslinked search, the peptide-spectrum match may contain one linear peptide or the complete crosslinked peptide (alpha and beta). The psmComponent argument chooses the type of data to return.
The psmComponent argument is optional in a non-crosslinked search.
See Crosslinked search results.
psmComponent | Type of data to return: complete molecule, alpha peptide or beta peptide. |
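The character encoding described above (0..9, then A..W for 10..32, and 'X' for an error tolerant modification) can be decoded as in the following sketch. The helper names and the two sentinel constants are hypothetical and not part of Mascot Parser, and the code assumes an ASCII-compatible character set in which 'A'..'W' are contiguous.

```cpp
#include <string>
#include <vector>

// Hypothetical sentinels (not part of Mascot Parser).
const int kErrorTolerantMod = -1; // the character was 'X'
const int kInvalidModChar = -2;   // the character is outside 0..9, A..W, X

// Hypothetical helper: converts one character of a modification string
// into a modification number (0 means no modification).
int decodeModChar(char c)
{
    if (c >= '0' && c <= '9') {
        return c - '0';
    }
    if (c >= 'A' && c <= 'W') {
        return c - 'A' + 10; // A is 10, W is 32
    }
    if (c == 'X') {
        return kErrorTolerantMod;
    }
    return kInvalidModChar;
}

// Hypothetical helper: decodes a whole string. The first element refers to
// the N terminus, the last to the C terminus, and the rest to the residues.
std::vector<int> decodeModsString(const std::string &mods)
{
    std::vector<int> decoded;
    decoded.reserve(mods.size());
    for (char c : mods) {
        decoded.push_back(decodeModChar(c));
    }
    return decoded;
}
```

The same decoding applies to the strings returned by getSummedModsStr(), which use the same character set.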
bool isSamePeptideStr | ( | ms_peptide * | peptide, |
bool | substituteAmbiguous = true |
||
) | const |
Returns true if the two peptides are identical.
Calling this function may be faster than calling getPeptideStr() twice and comparing the returned strings.
peptide | is a pointer to the peptide object with the sequence to compare. |
substituteAmbiguous | should be set to true to compare sequences after replacing any ambiguous residue (such as X) with the residue used to get a match. |
bool isSameSummedModsStr | ( | ms_peptide * | peptide | ) | const |
Returns true if the two summed variable modifications are identical.
Calling this function may be faster than calling getSummedModsStr() twice and comparing the returned strings.
peptide | is a pointer to the peptide object with the summed mods string to compare. |
bool isSameVarModsStr | ( | ms_peptide * | peptide | ) | const |
Returns true if the two variable modifications are identical.
Calling this function may be faster than calling getVarModsStr() twice and comparing the returned strings.
peptide | is a pointer to the peptide object with the mods string to compare. |
void setLocalModsStr | ( | const std::string | str, |
const PSM | psmComponent = PSM_COMPLETE |
||
) |
Sets the query-level variable modifications as a string of digits.
One digit is used for the N terminus, one for each residue and one for the C terminus. Each digit specifies the modification used to obtain the match: 0 indicates no modification, while digits and letters 1..9 and A..W indicate the corresponding modification in the IT_MODS array of the input query.
Note that query-level modifications can never include error tolerant modifications, so the string never contains an 'X'.
Setting the string does not cause a re-calculation of the peptide mass – this needs to be done manually.
In a crosslinked search, the peptide-spectrum match may contain one linear peptide or the complete crosslinked peptide (alpha and beta). The psmComponent argument chooses the type of data to set. The caller is responsible for data consistency between PSM_COMPLETE, PSM_CROSSLINK_ALPHA and PSM_CROSSLINK_BETA.
The psmComponent argument is optional in a non-crosslinked search.
See Crosslinked search results.
str | is the new query-level variable modifications as a string of digits. |
psmComponent | Type of data to set: complete molecule, alpha peptide or beta peptide. |
void setSummedModsStr | ( | const std::string | str, |
const PSM | psmComponent = PSM_COMPLETE |
||
) |
Sets the summed variable modifications as a string of digits.
When two modifications occur at the same site, the extra modification is stored in the q1_p1_summed_mods= string in the results file. Summed mods are only supported when using quantitation methods with exclusive modifications. The exclusive modification will always be returned by getVarModsStr(), while getSummedModsStr() is used for post translational modifications to the same site.
One digit is used for the N terminus, one for each residue and one for the C terminus. Each digit specifies the modification used to obtain the match: 0 indicates no modification, 1 indicates delta1, 2 indicates delta2 etc., in the masses section. An 'X' is used to indicate an error tolerant modification that can be retrieved using matrix_science::ms_peptidesummary::getErrTolModName().
To support numbers greater than 9, the letters A..W are permitted, with A being 10 and W being 32.
See also Increase in the number of variable modifications.
Setting the string does not cause a re-calculation of the peptide mass – this needs to be done manually.
In a crosslinked search, the peptide-spectrum match may contain one linear peptide or the complete crosslinked peptide (alpha and beta). The psmComponent argument chooses the type of data to set. The caller is responsible for data consistency between PSM_COMPLETE, PSM_CROSSLINK_ALPHA and PSM_CROSSLINK_BETA.
The psmComponent argument is optional in a non-crosslinked search.
See Crosslinked search results.
str | is the new summed variable modifications as a string of digits. |
psmComponent | Type of data to set: complete molecule, alpha peptide or beta peptide. |
void setVarModsStr | ( | const std::string | str, |
const PSM | psmComponent = PSM_COMPLETE |
||
) |
Sets the variable modifications as a string of digits.
One digit is used for the N terminus, one for each residue and one for the C terminus. Each digit specifies the modification used to obtain the match: 0 indicates no modification, 1 indicates delta1, 2 indicates delta2 etc., in the masses section. An 'X' is used to indicate an error tolerant modification that can be retrieved using matrix_science::ms_peptidesummary::getErrTolModName().
To support numbers greater than 9, the letters A..W are permitted, with A being 10 and W being 32.
See also Increase in the number of variable modifications.
Setting the string does not cause a re-calculation of the peptide mass – this needs to be done manually.
In a crosslinked search, the peptide-spectrum match may contain one linear peptide or the complete crosslinked peptide (alpha and beta). The psmComponent argument chooses the type of data to set. The caller is responsible for data consistency between PSM_COMPLETE, PSM_CROSSLINK_ALPHA and PSM_CROSSLINK_BETA.
The psmComponent argument is optional in a non-crosslinked search.
See Crosslinked search results.
str | is the new variable modifications as a string of digits. |
psmComponent | Type of data to set: complete molecule, alpha peptide or beta peptide. |