Non-specific modifications
Sometimes, we need to search for modifications that are non-specific or of unknown specificity. This is not the same as searching for unsuspected modifications, which can be done with an error tolerant search.
You might be tempted to create a modification such as FuzzyMod (ACDEFGHIKLMNPQRSTVWY). If you try to use such a modification in a search, the search will either take an impractically long time or Mascot will run out of memory or address space, because of the combinatorial explosion.
If modification of any residue is possible then, for a single 20 residue peptide from the database, there are 2^20 possible modified peptides that need to be tested to see if they fit the precursor mass and, if they do, matched to the MS/MS spectrum. This increases the search space by a factor of 1 million. Even if the code can handle this in a reasonable amount of memory, you can’t escape from the fact that, to be statistically significant, any match needs to be 1 million times better than if you were searching without the modification.
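The factor of 1 million can be checked with a line of arithmetic. The Python sketch below assumes the simplest case, in which each of the 20 residues is independently either modified or unmodified; the function name is illustrative only and has nothing to do with Mascot itself.

```python
def modified_forms(peptide_length: int) -> int:
    """Count modified/unmodified patterns when every residue is a candidate site."""
    # Each residue has two states (modified or not), so the count is 2**length.
    return 2 ** peptide_length


count = modified_forms(20)
print(count)  # 1048576, i.e. roughly 1 million
```

The count doubles with every extra residue, which is why restricting the specificity matters far more than any tuning of the search.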
If multiple non-specific modifications per peptide are expected, this is a very difficult problem for database search, probably intractable. A truly non-specific modification is going to produce a population of peptides which won’t all be modified in the same way. Many arrangements of the modification will be represented, and sets of these will be isobaric. Thus, each MS/MS spectrum will contain fragments from a mixture of precursors and so will be of poor quality in terms of database matching.
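To get a feel for how many isobaric arrangements there can be, the following Python sketch enumerates every way of placing a fixed number of copies of one modification on a peptide. It assumes that the modification adds the same mass wherever it sits, so all arrangements share one precursor mass; the peptide sequence and the function name are hypothetical.

```python
from itertools import combinations
from math import comb


def isobaric_arrangements(peptide: str, n_mods: int) -> list[tuple[int, ...]]:
    """Return all sets of residue positions that could carry n_mods modifications.

    Assumption: the mass shift is identical at every site, so every
    arrangement returned here has the same precursor mass.
    """
    return list(combinations(range(len(peptide)), n_mods))


peptide = "ACDEFGHIK"  # hypothetical 9-residue peptide
arrangements = isobaric_arrangements(peptide, 2)
print(len(arrangements))  # 36 positional isomers for two modifications
assert len(arrangements) == comb(len(peptide), 2)
```

Any of these 36 forms that are actually present in the sample and not separated before fragmentation will contribute fragments to the same MS/MS spectrum.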
If the modification is relatively rare and you don’t expect more than one instance on most peptides, as with a cross-linker being used to interrogate protein-protein interactions, the problem is much simpler:
- Add the modification to your local Mascot server for all possible specificities, but don’t group them. This creates up to 20 separate new modifications.
- Make sure the MS/MS data set includes some unmodified peptides so that you get a hit to the protein. If necessary, spike in some unmodified protein.
- Perform an automatic error-tolerant search.
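The first step, one ungrouped modification per residue, can be sketched in Python as a list of names. This only generates names in the "FuzzyMod (H)" style used in this section; how the entries are actually added to the Mascot server is not shown, and the helper name is an assumption made for illustration.

```python
RESIDUES = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acid codes


def ungrouped_modification_names(base_name: str, residues: str = RESIDUES) -> list[str]:
    """Build one modification name per residue, e.g. 'FuzzyMod (H)'."""
    return [f"{base_name} ({residue})" for residue in residues]


names = ungrouped_modification_names("FuzzyMod")
print(len(names))  # 20
print(names[:3])   # ['FuzzyMod (A)', 'FuzzyMod (C)', 'FuzzyMod (D)']
```

If only some residues are chemically plausible targets, pass a shorter `residues` string so that fewer than 20 modifications are created.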
If the spectra are good, the highest scoring match for a peptide will have the modification at the correct location. If the spectra are not so good, you may get matches with similar scores for a range of possible locations. Note that the error tolerant search will not give matches to peptides that carry the new modification with different specificities; a peptide with two FuzzyMod (H) would be OK but you wouldn’t match a peptide with FuzzyMod (H) and FuzzyMod (M).
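The restriction in the last sentence can be written down as a simple rule of thumb. The Python sketch below merely restates that rule, that all modified residues on a peptide must share one specificity; it is not Mascot code, and the peptide and function name are hypothetical.

```python
def matchable_in_error_tolerant_search(peptide: str, modified_positions: list[int]) -> bool:
    """True if every modified residue is the same amino acid (one specificity)."""
    # Collect the distinct residue letters at the modified positions.
    residues = {peptide[i] for i in modified_positions}
    return len(residues) <= 1


peptide = "AHGHMK"  # hypothetical peptide: H at positions 1 and 3, M at position 4
print(matchable_in_error_tolerant_search(peptide, [1, 3]))  # True: two FuzzyMod (H)
print(matchable_in_error_tolerant_search(peptide, [1, 4]))  # False: FuzzyMod (H) and FuzzyMod (M)
```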