Mascot: The trusted reference standard for protein identification by mass spectrometry for 25 years

Posted by John Cottrell (September 19, 2013)

Modifications round-up, part 1

Much of the complexity in Mascot is associated with modifications. It can be hard to find information about some aspects of handling modifications unless you already know what you are looking for. In this blog article, the first of two, I’ll collect together some of the topics that come up frequently in support emails. Note that Site analysis was covered in an earlier article.

Limits on the number of variable modifications allowed in a search

There are separate limits for standard searches and error tolerant searches. If Mascot security is disabled, these are global limits set in mascot.dat: MaxVarMods (default 9) and MaxEtVarMods (default 2). If Mascot security is enabled, these limits can also be set at security group level, in which case changing the mascot.dat settings may have no effect. The limit of 2 can be a bit restrictive for an error tolerant search, and I usually increase this to 4 or 6. The limit of 9 for a standard search is reasonable.
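The sketch below is a minimal Python model of the precedence just described. Only the names MaxVarMods and MaxEtVarMods and their defaults come from the text above. The dictionaries are hypothetical stand-ins for the mascot.dat values and for a security group's settings, and this is not how Mascot itself reads its configuration.

```python
# Hypothetical model of how the variable-modification limit for a search is
# chosen. Not Mascot code: the dicts stand in for mascot.dat and for the
# settings of the user's security group.
MASCOT_DAT_DEFAULTS = {"MaxVarMods": 9, "MaxEtVarMods": 2}


def var_mod_limit(error_tolerant, security_enabled,
                  group_settings=None, mascot_dat=None):
    """Return the maximum number of variable modifications for a search."""
    mascot_dat = mascot_dat or MASCOT_DAT_DEFAULTS
    key = "MaxEtVarMods" if error_tolerant else "MaxVarMods"
    # With security enabled, a group-level value wins, which is why
    # editing mascot.dat may then have no effect.
    if security_enabled and group_settings and key in group_settings:
        return group_settings[key]
    return mascot_dat[key]


def selection_allowed(variable_mods, error_tolerant, security_enabled,
                      group_settings=None):
    """True if the number of selected variable modifications is within the limit."""
    limit = var_mod_limit(error_tolerant, security_enabled, group_settings)
    return len(variable_mods) <= limit


mods = ["Oxidation (M)", "Deamidated (NQ)", "Phospho (ST)"]
print(selection_allowed(mods, error_tolerant=False, security_enabled=False))  # True
print(selection_allowed(mods, error_tolerant=True, security_enabled=False))   # False
print(selection_allowed(mods, error_tolerant=True, security_enabled=True,
                        group_settings={"MaxEtVarMods": 4}))                  # True
```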

The combinatorial explosion resulting from too many variable modifications is a real problem, leading to searches that take forever and a serious loss of sensitivity. Typically, you pick up a few additional matches to highly modified peptides but lose a much larger number of matches to peptides from low level proteins because of the increase in the score required for a match to be significant. On the other hand, a simple limit on the total number of variable modifications is a crude tool, because different modifications can have very different effects. Modifications that only apply to a protein terminus, such as Acetyl (Protein N-term), or to a specific residue at a peptide terminus, such as Gln->pyro-Glu (N-term Q), cause a negligible increase in the search space. The dramatic effects come from modifications that apply to multiple residues, independent of location, such as Phospho (ST) or Methyl (DE). These are the ones that need to be used very sparingly.
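The sketch below puts rough numbers on this by counting the modified forms of a single peptide. Each candidate site may or may not carry a variable modification, so the count is 2 raised to the number of sites. It is a simplification: it ignores precursor mass filtering and assumes that no residue is targeted by two different modifications. The function names and peptide sequences are made up for illustration.

```python
def modifiable_sites(peptide, anywhere="", n_term=None):
    """Count candidate sites: every residue in `anywhere` (location-independent
    specificities) plus one site if the peptide starts with the `n_term` residue."""
    sites = sum(1 for aa in peptide if aa in anywhere)
    if n_term is not None and peptide.startswith(n_term):
        sites += 1
    return sites


def count_variants(n_sites):
    """Each site is either modified or not, so there are 2**n_sites forms,
    including the unmodified peptide."""
    return 2 ** n_sites


# Phospho (ST): every S and T is a candidate site.
print(count_variants(modifiable_sites("ASTSSPTSGR", anywhere="ST")))    # 64
# Phospho (ST) plus Methyl (DE) on an acidic, S/T-rich peptide.
print(count_variants(modifiable_sites("DESTEDSTK", anywhere="STDE")))   # 256
# Gln->pyro-Glu (N-term Q): at most one site, however many Q residues there are.
print(count_variants(modifiable_sites("QLEQGR", n_term="Q")))           # 2
```

Every extra S, T, D or E doubles the count for a location-independent modification. A terminal specificity contributes at most one site, so it can never do more than double the count.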

Fixed and variable modifications with the same specificity

You cannot have two fixed modifications in a search with the same specificity. That is, you cannot have both Carbamidomethyl (C) and Propionamide (C) as fixed modifications. If you specify one of these as fixed and the other as variable, you will get matches where C is modified with one or the other but never unmodified. If you also want to match peptides with unmodified C, both modifications need to be variable.
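One way to picture this behaviour is as a set of possible states for each cysteine. The sketch below is a hypothetical model of the rule just described, not Mascot code, and the function name is made up.

```python
def cysteine_states(fixed=(), variable=()):
    """Possible states of a Cys residue, given fixed and variable
    modifications that all have specificity C (simplified model)."""
    if len(fixed) > 1:
        raise ValueError("two fixed modifications cannot share a specificity")
    # A fixed modification replaces the unmodified residue; each variable
    # modification is an alternative to that baseline.
    baseline = fixed[0] if fixed else "unmodified"
    return sorted({baseline, *variable})


print(cysteine_states(fixed=["Carbamidomethyl"], variable=["Propionamide"]))
# ['Carbamidomethyl', 'Propionamide']  (never unmodified)
print(cysteine_states(variable=["Carbamidomethyl", "Propionamide"]))
# ['Carbamidomethyl', 'Propionamide', 'unmodified']
```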

A frequently reported problem is getting the error message "Modification conflict" when submitting an iTRAQ or TMT search. This is because the labels are specified in the quantitation method as fixed modifications, and you’ve either selected them a second time in the search form or chosen another fixed modification with either K or N-term specificity.
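A check of this kind can be sketched by grouping all the fixed modifications by site, whether they come from the quantitation method or the search form. The (name, site) tuples and the function name are hypothetical. In Mascot the labels come from the quantitation method definition, not from a Python list.

```python
from collections import defaultdict


def fixed_mod_conflicts(quant_fixed, form_fixed):
    """Return {site: [names]} for every site claimed by more than one fixed
    modification. Each modification is a (name, site) tuple."""
    by_site = defaultdict(list)
    for name, site in [*quant_fixed, *form_fixed]:
        by_site[site].append(name)
    return {site: names for site, names in by_site.items() if len(names) > 1}


tmt = [("TMT6plex (K)", "K"), ("TMT6plex (N-term)", "N-term")]
print(fixed_mod_conflicts(tmt, [("Carbamidomethyl (C)", "C")]))
# {}
print(fixed_mod_conflicts(tmt, [("TMT6plex (K)", "K")]))
# {'K': ['TMT6plex (K)', 'TMT6plex (K)']}
print(fixed_mod_conflicts(tmt, [("Dimethyl (N-term)", "N-term")]))
# {'N-term': ['TMT6plex (N-term)', 'Dimethyl (N-term)']}
```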

Adding a new modification

Mascot takes its modifications from the Unimod database. The only way to add a new modification to the search form on the free, public Mascot Server is to add it to the public Unimod. For an in-house Mascot Server, if you don’t see the modification you require, first check the public Unimod database. If the modification is there, you just need to download a new unimod.xml file to your local Mascot Server. If the modification is not in Unimod, you can either add it to the public Unimod database and wait for the downloadable file to be rebuilt or add it to your local unimod.xml file using the Mascot configuration editor (follow the link under Mascot Utilities on your local Mascot home page).
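Before downloading a new file or adding an entry, you can check whether a modification is already in your local unimod.xml with a few lines of Python. This sketch assumes the Unimod 2 XML schema: namespace http://www.unimod.org/xmlns/schema/unimod_2, umod:mod elements with a title attribute, and umod:specificity children. The embedded excerpt is trimmed and illustrative, so check the element names against your own file. For the real file you would use ET.parse(path).getroot(), with path pointing at your local copy.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: namespace and element names follow the Unimod 2 schema.
NS = {"umod": "http://www.unimod.org/xmlns/schema/unimod_2"}

# Trimmed, illustrative excerpt (not a complete unimod.xml entry).
EXCERPT = """
<umod:unimod xmlns:umod="http://www.unimod.org/xmlns/schema/unimod_2">
  <umod:modifications>
    <umod:mod title="Phospho" record_id="21">
      <umod:specificity site="S" position="Anywhere" spec_group="1"/>
      <umod:specificity site="T" position="Anywhere" spec_group="1"/>
      <umod:specificity site="Y" position="Anywhere" spec_group="2"/>
      <umod:delta mono_mass="79.966331"/>
    </umod:mod>
  </umod:modifications>
</umod:unimod>
"""


def find_modification(root, title):
    """Return (site, position, spec_group) tuples for a modification title,
    or None if the title is not present."""
    for mod in root.iterfind("umod:modifications/umod:mod", NS):
        if mod.get("title") == title:
            return [(s.get("site"), s.get("position"), s.get("spec_group"))
                    for s in mod.iterfind("umod:specificity", NS)]
    return None


root = ET.fromstring(EXCERPT.strip())
print(find_modification(root, "Phospho"))
# [('S', 'Anywhere', '1'), ('T', 'Anywhere', '1'), ('Y', 'Anywhere', '2')]
print(find_modification(root, "FuzzyMod"))  # None
```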

Grouping of modifications

You’ll have noticed that some modifications have multiple specificities, e.g. Phospho (ST) and Methyl (DE). This is achieved by giving the specificities the same non-zero group number. If you look at the entry for Phospho in the configuration editor, you’ll see that S and T are both group 1 while Y is group 2. The rules governing grouping are:

  • Only simple residue specificities (Position = Anywhere) can be grouped. You cannot group specificities such as N-term Q, or group residue specificities with terminus specificities.
  • The neutral loss definitions must be identical. This is why Phospho (Y) cannot be grouped with Phospho (ST).
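The two rules can be expressed as a small validation function. This is a hypothetical sketch, and the Specificity class is made up for illustration. Position values follow Unimod's naming ("Anywhere", "Any N-term", and so on). The neutral loss values are illustrative, with Y given none so that it matches the Phospho example above. Only specificities that actually share a group number are checked.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Specificity:
    site: str
    position: str               # e.g. "Anywhere", "Any N-term", "Protein N-term"
    group: int                  # specificities sharing a non-zero number are grouped
    neutral_losses: tuple = ()  # monoisotopic masses of the neutral losses


def grouping_errors(specs):
    """Return a list of rule violations; an empty list means the grouping is valid."""
    groups = defaultdict(list)
    for spec in specs:
        if spec.group:
            groups[spec.group].append(spec)
    errors = []
    for number, members in sorted(groups.items()):
        if len(members) < 2:
            continue  # a group of one is not grouped with anything
        if any(m.position != "Anywhere" for m in members):
            errors.append(f"group {number}: only Position = Anywhere can be grouped")
        if len({m.neutral_losses for m in members}) > 1:
            errors.append(f"group {number}: neutral loss definitions differ")
    return errors


H3PO4 = 97.976896
phospho = [Specificity("S", "Anywhere", 1, (H3PO4,)),
           Specificity("T", "Anywhere", 1, (H3PO4,)),
           Specificity("Y", "Anywhere", 2)]
print(grouping_errors(phospho))  # []

# Putting Y in group 1 breaks the neutral loss rule.
print(grouping_errors(phospho[:2] + [Specificity("Y", "Anywhere", 1)]))
# ['group 1: neutral loss definitions differ']
```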

Non-specific modifications

Sometimes, we need to search for modifications that are non-specific or of unknown specificity. Having read the preceding note about grouping, you might be tempted to create a modification such as FuzzyMod (ACDEFGHIKLMNPQRSTVWY). If you try to use such a modification in a search, Mascot will almost certainly run out of memory or address space and crash because of the combinatorial explosion.

Consider the numbers. If modification of any residue is possible then, for a single 20-residue peptide from the database, there are 2^20 (1,048,576) possible modified peptides that need to be tested to see whether they fit the precursor mass and, if they do, matched to the MS/MS spectrum. This increases the search space by a factor of about 1 million. Even if the code can handle this in a reasonable amount of memory, you can’t escape the fact that, to be statistically significant, any match needs to be 1 million times better than if you were searching without the modification.
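Written out, each of the 20 residues is either modified or not. Summing over the number k of modified residues therefore gives the figure above. Mascot scores are on a -10 log10(P) scale. If, as a rough assumption, the significance threshold rises with the logarithm of the number of candidates, then a million-fold increase corresponds to roughly 60 additional score units:

```latex
\sum_{k=0}^{20} \binom{20}{k} = 2^{20} = 1\,048\,576 \approx 10^{6},
\qquad
\Delta S \approx 10 \log_{10}\!\left(2^{20}\right) \approx 60.2
```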

If multiple modifications per peptide are expected, this is a very difficult problem for database search, probably intractable. A truly non-specific modification is going to produce a population of peptides which won’t all be modified in the same way. Many arrangements of the modification will be represented, and sets of these will be isobaric. Thus, each MS/MS spectrum will contain fragments from a mixture of precursors and so will be of poor quality in terms of database matching.
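The isobaric arrangements are easy to count. The sketch below enumerates every placement of a given number of copies of a non-specific modification on a peptide. All placements with the same number of copies have the same precursor mass. Lowercase letters mark modified residues, a notation chosen here purely for illustration.

```python
from itertools import combinations
from math import comb


def positional_isomers(peptide, n_mods):
    """All placements of n_mods copies of a non-specific modification on
    distinct residues. Every placement has the same precursor mass."""
    isomers = []
    for sites in combinations(range(len(peptide)), n_mods):
        isomers.append("".join(aa.lower() if i in sites else aa
                               for i, aa in enumerate(peptide)))
    return isomers


two_mods = positional_isomers("PEPTIDE", 2)
print(len(two_mods), two_mods[:3])  # 21 ['pePTIDE', 'pEpTIDE', 'pEPtIDE']

# For a 20-residue peptide the counts grow quickly:
print(comb(20, 2), comb(20, 3))  # 190 1140
```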

If the modification is relatively rare, so that you don’t expect more than one instance on most peptides (for example, a cross-linker used to interrogate protein-protein interactions), the problem is much simpler:

  • Add the modification to your local Mascot server for all possible specificities, but don’t group them. This creates up to 20 separate new modifications.
  • Make sure the MS/MS data set includes some unmodified peptides so that you get a hit to the protein. If necessary, spike in some unmodified protein.
  • Perform an automatic error-tolerant search.
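The first step amounts to one specificity per residue, each with its own group number so that none are grouped. The sketch below simply generates that list, so you can see what you would be entering in the configuration editor. The ModDefinition class, the title FuzzyMod and the mass are hypothetical placeholders.

```python
from dataclasses import dataclass

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


@dataclass(frozen=True)
class ModDefinition:
    title: str       # name as it would appear on the search form
    site: str
    position: str
    spec_group: int  # distinct numbers, so no two specificities are grouped
    mono_mass: float


def ungrouped_definitions(base_title, mono_mass, residues=AMINO_ACIDS):
    """One ungrouped, location-independent specificity per residue."""
    return [ModDefinition(f"{base_title} ({aa})", aa, "Anywhere", i + 1, mono_mass)
            for i, aa in enumerate(residues)]


# Placeholder mass for a hypothetical cross-linker remnant.
defs = ungrouped_definitions("FuzzyMod", 100.0)
print(len(defs), defs[0].title, defs[-1].title)  # 20 FuzzyMod (A) FuzzyMod (Y)
```

Passing a shorter residues string, such as only the residues a reagent is known to react with, gives fewer than 20 entries.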

If the spectra are good, the highest scoring match for a peptide will have the modification at the correct location. If the spectra are not so good, you may get matches with similar scores for a range of possible locations. Note that the error-tolerant search will not give matches to peptides that carry the new modification with different specificities; a peptide with two FuzzyMod (H) would be OK but you wouldn’t match a peptide with FuzzyMod (H) and FuzzyMod (M).
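The restriction in the last sentence can be stated as a one-line rule. This is a simplified, hypothetical model of that behaviour, not Mascot's error-tolerant code.

```python
def error_tolerant_can_match(new_mod_instances):
    """True if every instance of the new modification on a peptide has the
    same specificity (simplified model of the restriction described above)."""
    return len(set(new_mod_instances)) <= 1


print(error_tolerant_can_match(["FuzzyMod (H)", "FuzzyMod (H)"]))  # True
print(error_tolerant_can_match(["FuzzyMod (H)", "FuzzyMod (M)"]))  # False
```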

Neutral losses

If you have ever used the configuration editor to add or edit a modification, you may have been puzzled by the neutral loss options:

  • Scoring: A neutral loss from the MS/MS fragments. The resultant fragments are considered for scoring, e.g. y-98 or b-98 for phosphopeptides. There can be up to 10 scoring neutral losses. During a search, Mascot iterates through the scoring neutral losses. The one that gives the highest score is chosen, and all the other neutral losses are treated as Satellite.
  • Satellite: A neutral loss specified as satellite is never considered for scoring. If a Satellite neutral loss gives a match to a peak, that peak is removed from the list of noise peaks, which improves the score. We introduced this when we first added support for multiple neutral losses because it looked like it might be useful, but none of the standard modifications in Unimod currently have satellite neutral losses.
  • Peptide: A neutral loss from the intact peptide precursor. This peak is matched and so not treated as a noise peak for scoring purposes.
  • Required Peptide: A required peptide neutral loss must be present in the spectrum. This carries some risk, because a perfectly good match could be rejected if this peak happened to be missing.
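A toy model can illustrate how scoring and satellite neutral losses interact. Here the score is simply the number of matched peaks, whereas Mascot's real score is probability based. The m/z values, tolerance and function names are made up. The point is only the control flow: try each scoring loss and keep the best. The others are treated as satellite, so their matches are not scored but are no longer counted as noise.

```python
def matched_peaks(peaks, ions, tol=0.5):
    """Peaks within tol of any calculated ion m/z."""
    return {p for p in peaks if any(abs(p - ion) <= tol for ion in ions)}


def choose_scoring_loss(peaks, fragments, scoring_losses, tol=0.5):
    """Toy model of scoring neutral loss selection.

    Returns (chosen loss, number of scored peaks, remaining noise peaks).
    """
    def hits_for(loss):
        return matched_peaks(peaks, [f - loss for f in fragments], tol)

    base = matched_peaks(peaks, fragments, tol)
    # Iterate through the scoring losses and keep the one with the best score
    # (ties go to the loss listed first).
    best_loss = max(scoring_losses, key=lambda loss: len(hits_for(loss)), default=None)
    scored = base | (hits_for(best_loss) if best_loss is not None else set())
    # The remaining losses act as satellite losses: not scored, but their
    # matches are removed from the noise peaks.
    satellite = set()
    for loss in scoring_losses:
        if loss != best_loss:
            satellite |= hits_for(loss)
    noise = sorted(set(peaks) - scored - satellite)
    return best_loss, len(scored), noise


peaks = [402.02, 500.0, 502.02, 681.99, 800.0]
fragments = [500.0, 600.0, 700.0]
print(choose_scoring_loss(peaks, fragments, [97.98, 18.01]))
# (97.98, 3, [800.0])
```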

