Mascot: The trusted reference standard for protein identification by mass spectrometry for 25 years

Posted by John Cottrell (May 8, 2013)

Site analysis

Site analysis in the Peptide View report is based on the score differences between matches with different arrangements of modifications. This started off as a rule of thumb, but was given a quantitative basis by Bernhard Kuster’s group at TUM, who analysed a collection of synthetic analogs of natural phosphopeptides to determine false localisation rate as a function of score difference (Confident Phosphorylation Site Localization Using the Mascot Delta Score).

Although the concept is simple, some aspects are far from self-evident. The first thing to clarify is that site analysis is only performed when three conditions are fulfilled:

1. The top peptide match to a spectrum carries one or more variable modifications for which alternative arrangements are possible
If a search includes Phospho (ST) and Phospho (Y) as variable modifications, and the top-ranking peptide match has two serines and no threonines or tyrosines, you will only see site analysis reported if one of the serines carries a phosphate. If the peptide mass corresponds to zero or two phosphates, then no alternative arrangements are possible and there will be no site analysis.
2. The score for the top match is significant
Site analysis seems pointless unless you are confident of the sequence.
3. There is at least one further match to an alternative arrangement of modifications, which need not be significant
Mascot only saves a maximum of 10 matches per spectrum, which means the lowest scoring match can still have quite a high score. Imagine that the top match is significant, with a score of 42 and the tenth match has a score of 28. If there is no match for an alternative arrangement, all we can say is that the score for the best alternative arrangement must be between 0 and 28, which is quite a wide range and could have a marked effect on the calculation. On the other hand, if the difference in scores between the top and bottom matches is more than 30, it makes little difference to the site analysis whether the score for the best alternative match is just below that of the tenth match or zero, so this condition is likely to be modified or dropped in a future release.
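
The effect of the unknown alternative score in condition 3 can be made concrete with a rough sketch. It assumes that an ions score is -10·log10(P) and that the relative likelihood of each arrangement is obtained by normalising 10^(score/10) across the candidates. That normalisation is an assumption made for illustration, not a description of how the Peptide View report calculates its percentages, and the function name `site_probabilities` is hypothetical.

```python
def site_probabilities(scores):
    """Normalise 10**(score/10) over candidate arrangements.

    Illustrative assumption only: treats each score as -10*log10(P).
    """
    top = max(scores)
    # Work with differences from the top score to keep the powers small.
    weights = [10 ** ((s - top) / 10) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Top match 42, tenth match 28: the unseen alternative lies between 0 and 28.
for alt in (28, 0):
    p_top = site_probabilities([42, alt])[0]
    print(f"top=42 alt={alt:2d} -> P(top arrangement) = {p_top:.4f}")

# A gap of more than 30 between the top and tenth matches (here 60 and 29).
for alt in (29, 0):
    p_top = site_probabilities([60, alt])[0]
    print(f"top=60 alt={alt:2d} -> P(top arrangement) = {p_top:.4f}")
```

Under these assumptions, the first case ranges from roughly 96% to better than 99.9% depending on where the unseen alternative falls, whereas in the second case the result stays above 99.9% either way.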

Site analysis is only ever performed for the highest scoring peptide sequence match. If there are significant matches to more than one sequence, this could be because the spectrum is chimeric, and contains fragments from co-eluting, isobaric peptides. Another possibility is that there are few or no peaks in the spectrum to distinguish between the peptides. Or, maybe one or more of the matches are false positives. In such cases, site analysis for the top match is on slightly shaky ground, never mind site analysis for anything other than the top match.

The higher the mass accuracy, the less likely you are to see significant matches to more than one sequence. One exception is when deamidation is included in the search, because it is common to get peptide sequences in the database that differ only in N<->D and Q<->E. Beware of searching with both deamidation and a non-zero setting for #13C, which is meant to allow for the wrong peak of the isotopic distribution being used for the precursor mass. Unless you have extremely high mass accuracy, this can lead to secondary matches to sequences that are deamidated when they should just have a 1 or 2 Da error on the precursor mass, or vice versa. The correct match will usually have the higher score because of better fragment ion matching, but this can still be a source of confusion.
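
To put a number on "extremely high mass accuracy", the sketch below compares the monoisotopic mass shift of deamidation (0.984016 Da) with the spacing between 13C and 12C (1.003355 Da). The two differ by about 0.019 Da, so a precursor tolerance wider than that difference cannot separate a deamidated peptide from an unmodified one picked one isotope peak too high. The function name and the example peptide masses are hypothetical, and the calculation ignores everything except the precursor tolerance.

```python
DEAMIDATION = 0.984016  # Da, monoisotopic shift for N->D or Q->E
C13_SPACING = 1.003355  # Da, mass difference between 13C and 12C


def ppm_needed(peptide_mass):
    """Tolerance (ppm) below which the two explanations no longer overlap."""
    return abs(C13_SPACING - DEAMIDATION) / peptide_mass * 1e6


for mass in (1000.0, 2000.0, 3000.0):
    print(f"{mass:.0f} Da peptide: tolerance must be under {ppm_needed(mass):.1f} ppm")
```

For a 2000 Da peptide the threshold comes out a little under 10 ppm, and it tightens as the peptide gets heavier.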

Site analysis doesn’t attempt to distinguish between site uncertainty and site occupancy. That is, if a peptide contains two phosphorylation sites and one phosphate, and we obtain matches with similar scores for both arrangements, this could be because of lack of information (no peaks to distinguish the two possibilities) or it could be because the sample is a mixture of the two forms (peaks for both possibilities are present). We recognise it would be useful to be more specific about this, and it’s something we’ll look at for a future release.

Finally, remember that site analysis only considers the modifications selected for the search. Imagine a search where we specify Phospho (ST) and get a match to a peptide with sequence xSxxxxxSYxxxxxxxxTx. Site analysis reports the phosphate is 99% localised on S8. But, if the search was repeated with Phospho (Y) included, the site analysis could easily change to 50% for S8 and 50% for Y9. Site analysis is only meaningful if all the sites for the target modification are included in the search. This is a particular challenge for something like methylation, which is listed in Unimod as being observed on 11 different residues.
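
The dependence on the search settings can be seen by simply listing the arrangements that are available to the search engine. This is a minimal sketch, not Mascot code: the helper `arrangements` is hypothetical, positions are 1-based, and the residue strings stand in for the variable modifications chosen in the search form.

```python
from itertools import combinations


def arrangements(sequence, residues, n_mods):
    """List every way to place n_mods modifications on the allowed residues."""
    sites = [i for i, aa in enumerate(sequence, start=1) if aa in residues]
    return [
        tuple(f"{sequence[i - 1]}{i}" for i in combo)
        for combo in combinations(sites, n_mods)
    ]


peptide = "xSxxxxxSYxxxxxxxxTx"

# Search with Phospho (ST) only: Y9 is never considered.
print(arrangements(peptide, "ST", 1))

# Search with Phospho (ST) and Phospho (Y): Y9 becomes a candidate.
print(arrangements(peptide, "STY", 1))
```

With only S and T allowed there are three candidate sites for a single phosphate, and Y9 is absent from the list, so no score is ever calculated for it. Adding Y brings the count to four. The same helper reproduces the earlier two-serine case: `arrangements("xSxxSx", "STY", 1)` gives two arrangements, while zero or two phosphates give only one, leaving nothing to compare.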

