Mascot: The trusted reference standard for protein identification by mass spectrometry for 25 years

Machine learning core features

Refining results with machine learning requires a set of features (or metrics) for each peptide-spectrum match. When enabled, Percolator uses these features to find an optimal separation between correct and incorrect matches.

Two types of features are available: core features calculated by Mascot, and features predicted from the physico-chemical properties of peptides. Features calculated by Mascot are described below. For predicted features, please refer to Predicting retention time and spectral similarity with MS2Rescore.

Core features calculated by Mascot

Core features calculated by Mascot are always available and always enabled. The complete set of features that can be made available to Percolator is defined in code. You can choose a subset of these features using a setting in the Options section of the Mascot configuration file, mascot.dat. The default setting, as shipped, is:

PercolatorFeatures dM, mScore, MIT, MHT, peptideLength, z1, z2, z4, z7, isoSysDM, isoSysDMppm, isoSysDMz, 12C, mc0, mc1, mc2, varmods, varmodsCount, totInt, intMatchedTot, relIntMatchedTot, RMS, RMSppm, meanAbsFragDa, meanAbsFragPPM, rawScore
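
As a minimal sketch of how this setting could be read by a script, the Python below splits a PercolatorFeatures line into individual feature names. It assumes the line has exactly the form shown above (the keyword, a space, then a comma-separated list). The function name parse_percolator_features is hypothetical and is not part of Mascot.

```python
from typing import List


def parse_percolator_features(line: str) -> List[str]:
    """Split a 'PercolatorFeatures a, b, c' line into a list of feature names.

    Assumption: the keyword is followed by whitespace and then a
    comma-separated list, as in the default setting shown in this page.
    """
    keyword, _, value = line.strip().partition(" ")
    if keyword != "PercolatorFeatures":
        raise ValueError(f"not a PercolatorFeatures line: {line!r}")
    # Drop surrounding whitespace and ignore empty entries (e.g. a trailing comma).
    return [name.strip() for name in value.split(",") if name.strip()]


# The default setting, copied from the text above.
default_line = (
    "PercolatorFeatures dM, mScore, MIT, MHT, peptideLength, z1, z2, z4, z7, "
    "isoSysDM, isoSysDMppm, isoSysDMz, 12C, mc0, mc1, mc2, varmods, "
    "varmodsCount, totInt, intMatchedTot, relIntMatchedTot, RMS, RMSppm, "
    "meanAbsFragDa, meanAbsFragPPM, rawScore"
)

features = parse_percolator_features(default_line)
print(len(features), "features selected")
print(features)
```

A list like this is convenient for checking a customised setting against the table of supported features before restarting a search.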

Features are calculated by a Mascot executable called ms-createpip.exe. The full list of supported features is:

List of features available to Percolator

Feature name Description
retentionTime Retention time in seconds if available
dM Calculated minus observed peptide mass in Da
mScore Mascot score (always on)
lgDScore Mascot score minus Mascot score of next best non-isobaric peptide hit
mrCalc Calculated Mr
charge Charge
dMppm Calculated minus observed peptide mass in ppm
absDM Absolute value of calculated minus observed peptide mass in Da
absDMppm Absolute value of calculated minus observed peptide mass in ppm
isoDM Absolute value of calculated minus observed peptide mass, after eliminating possible isotope errors up to 2 Da, in Da
isoDMppm Absolute value of calculated minus observed peptide mass, after eliminating possible isotope errors up to 2 Da, in ppm
isoDmz Absolute value of calculated minus observed peptide m/z
isoSysDM Same as isoDM but corrected for systematic offset across all peptide matches.
isoSysDMppm Same as isoDMppm but corrected for systematic offset across all peptide matches.
isoSysDmz Same as isoDMz but corrected for systematic offset across all peptide matches.
mc Number of missed cleavages (always 0 if no enzyme)
varmods Number of modified sites divided by number of modifiable sites (set to 0 if number of modifiable sites is 0)
varcount Number of distinct varmods present
varmodsCount The number of variable mods used in the peptide. That is, if there are 10 Met and 5 of these are oxidised, this counts as varmodsCount=1. A peptide with Met-OX, phosphoS, deamidation, and acetylation, would count as varmodsCount=4.
modifiable Total number of modifiable sites
modified Total number of modified residues and termini
totInt Log total ion intensity. The 20 most intense peaks in each 100 Da bin are used for all features, and totInt reports this value
intMatchedTot Log total matched ion intensity
relIntMatchedTot Total matched ion intensity divided by total ion intensity as a percentage (no logs involved)
fragDeltaMed Median value of all matched fragment errors in Da
fragDeltaIqr Interquartile range value of all matched fragment errors in Da
fragDeltaMedPPM Median value of all matched fragment errors in ppm
fragDeltaIqrPPM Interquartile range value of all matched fragment errors in ppm
fragDeltaPolyFit 2nd order polynomial fit to m/z vs delta. Result is RSquared multiplied by the number of points divided by 100
longest Longest sequence matched ions, reported separately for each ion series (backbone only), as with fracIonsMatched
fracIonsMatched Fraction of calculated ions matched, reported separately for each ion series, with NLs lumped together (e.g. fracIonsMatchedB1, fracIonsMatchedB1deriv, fracIonsMatchedB2, fracIonsMatchedB2deriv)
matchedIntensity Matched ion intensity, reported separately for each ion series, as with fracIonsMatched
qmatch The number of peptide matches for which an ms-ms match was attempted
MIT Mascot identity threshold
MHT Mascot homology threshold
peptideLength Peptide length
z1 1 if charge = 1
z2 1 if charge = 2 or 3
z4 1 if charge = 4, 5, or 6
z7 1 if charge = 7 or more
12C 1 if peptide mass is 12C value (no isotope error)
mc0 1 if missed cleavages = 0 or if no enzyme
mc1 1 if missed cleavages = 1
mc2 1 if missed cleavages = 2 or more
RMS RMS m/z error for matched fragments
RMSppm RMS ppm error for matched fragments
meanAbsFragDa Mean absolute m/z error for matched fragments
meanAbsFragPPM Mean absolute PPM error for matched fragments
rawscore Simple binomial score using matches to main series sequence ions and p = 2*ITOL*n/100 where n is the number of peaks selected in each 100 Da bin
peptide The peptide string that was matched, interspersed with numbers to represent modifications, e.g. X.DAKAAM1AGRLM1IR.X
proteins A tab-separated list of accessions of proteins that contain this peptide. Must be the last feature in the list
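
To make a few of the definitions in the table concrete, the Python below sketches how some of the simpler features could be derived from one peptide-spectrum match. It is an illustration only and does not reproduce the calculation in ms-createpip.exe. The Match container and both function names are hypothetical, the ppm error is assumed to be relative to the calculated mass, and the fragment errors are assumed to be supplied already in Da.

```python
import math
import statistics
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Match:
    """Hypothetical container for one peptide-spectrum match."""
    mr_calc: float          # calculated peptide Mr in Da
    mr_obs: float           # observed peptide Mr in Da
    charge: int             # precursor charge state
    modified_sites: int     # number of modified sites
    modifiable_sites: int   # number of modifiable sites


def mass_and_charge_features(m: Match) -> Dict[str, float]:
    """Derive dM-style, charge-indicator and varmods features for one match."""
    d_m = m.mr_calc - m.mr_obs            # dM: calculated minus observed, in Da
    d_m_ppm = d_m / m.mr_calc * 1e6       # assumption: ppm relative to calculated Mr
    # varmods: modified / modifiable sites, set to 0 if nothing is modifiable
    varmods = m.modified_sites / m.modifiable_sites if m.modifiable_sites else 0.0
    return {
        "dM": d_m,
        "dMppm": d_m_ppm,
        "absDM": abs(d_m),
        "absDMppm": abs(d_m_ppm),
        # Charge indicators: exactly one of these is 1 for any positive charge.
        "z1": int(m.charge == 1),
        "z2": int(m.charge in (2, 3)),
        "z4": int(m.charge in (4, 5, 6)),
        "z7": int(m.charge >= 7),
        "varmods": varmods,
    }


def fragment_error_features(errors_da: List[float]) -> Dict[str, float]:
    """Summarise matched fragment errors (in Da) as RMS, mean absolute, median."""
    if not errors_da:
        raise ValueError("no matched fragments")
    n = len(errors_da)
    return {
        "RMS": math.sqrt(sum(e * e for e in errors_da) / n),
        "meanAbsFragDa": sum(abs(e) for e in errors_da) / n,
        "fragDeltaMed": statistics.median(errors_da),
    }


# Made-up example values, for illustration only.
example = Match(mr_calc=1500.0, mr_obs=1499.997, charge=3,
                modified_sites=1, modifiable_sites=2)
feats = mass_and_charge_features(example)
frag = fragment_error_features([0.01, -0.02, 0.02])
print(feats)
print(frag)
```

Splitting charge into indicator features such as z1, z2, z4 and z7 lets a linear model assign a separate weight to each charge group instead of treating charge as a single numeric trend.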