Variable Modifications in Mascot 2.7
Most protein samples will exhibit some degree of modification, which needs to be considered when carrying out a database search.
In this article we’ll take a look at some important changes we introduced in Mascot 2.7 in how Mascot handles variable modifications.
Variable modification permutation in Mascot 2.6 and earlier
In Mascot 2.6 or earlier, variable modification permutation is handled in the following way:
- No upper limit on the number of modified sites
- Permutation has built-in limits:
  - If there are fewer than 16 possible arrangements, all are tested
  - Otherwise a sliding window is used, since testing all possibilities would be too slow
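To get a feel for why a cut-off is needed, the sketch below counts the ways a fixed number of identical modifications can be placed on a set of candidate residues. It is plain combinatorics and not Mascot's implementation: the function names are hypothetical, only the threshold of 16 comes from the text above, and the sliding window itself is not modelled.

```python
from itertools import combinations
from math import comb

# From the text: with fewer than 16 arrangements, Mascot 2.6 tests them all.
EXHAUSTIVE_THRESHOLD = 16


def count_arrangements(num_candidate_sites: int, num_mods: int) -> int:
    """Number of ways to place num_mods identical modifications on the sites."""
    return comb(num_candidate_sites, num_mods)


def all_arrangements(candidate_sites, num_mods):
    """Enumerate every arrangement as a tuple of modified site positions."""
    return list(combinations(candidate_sites, num_mods))


def strategy(num_candidate_sites: int, num_mods: int) -> str:
    """Hypothetical helper: which of the two 2.6 methods the text implies."""
    n = count_arrangements(num_candidate_sites, num_mods)
    return "test all" if n < EXHAUSTIVE_THRESHOLD else "sliding window"


if __name__ == "__main__":
    for sites, mods in [(5, 2), (6, 3), (10, 4)]:
        print(sites, mods, count_arrangements(sites, mods), strategy(sites, mods))
```

Even a peptide with six candidate residues and three modifications already exceeds the threshold, which is why heavily modified peptides fell into the sliding window case.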
Variable modification permutation in Mascot 2.7
Modification iteration in Mascot 2.7 uses a single, consistent method and no longer switches between two methods. This is controlled by three user-definable parameters, described in table 1 below, with default values which give a similar depth and speed of search to Mascot 2.6 or earlier:
Parameter | Description | Default value |
---|---|---|
MaxPepNumVarMods | Max no. of different variable modifications per peptide | 3 |
MaxPepNumModifiedSites | Max no. of modified residues per peptide | 5 |
MaxPepModArrangements | Max no. of arrangements of an individual varmod composition | 64 |
There are two main cases for changing the default values for these parameters:
- For lightly modified samples, or where site localisation isn’t important, you can decrease the limits to reduce search time
- Increase the limits to improve modification site analysis, or if you’re looking at a highly modified sample
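One way to picture how the three limits in table 1 interact is the minimal sketch below. It is an illustration under stated assumptions, not Mascot's code: the class and function names are hypothetical, only a single modification type is arranged at a time, and it assumes that enumeration simply stops once the arrangement limit is reached.

```python
from dataclasses import dataclass
from itertools import combinations, islice
from typing import Mapping, Sequence


@dataclass(frozen=True)
class PermutationLimits:
    """Defaults taken from table 1; the class itself is hypothetical."""
    max_pep_num_var_mods: int = 3        # MaxPepNumVarMods
    max_pep_num_modified_sites: int = 5  # MaxPepNumModifiedSites
    max_pep_mod_arrangements: int = 64   # MaxPepModArrangements


def composition_allowed(composition: Mapping[str, int],
                        limits: PermutationLimits) -> bool:
    """Check a composition (modification name -> count) against the first two limits."""
    distinct_mods = sum(1 for count in composition.values() if count > 0)
    modified_sites = sum(composition.values())
    return (distinct_mods <= limits.max_pep_num_var_mods
            and modified_sites <= limits.max_pep_num_modified_sites)


def capped_arrangements(candidate_sites: Sequence[int], num_mods: int,
                        limits: PermutationLimits):
    """Assumption: keep at most max_pep_mod_arrangements arrangements."""
    every = combinations(candidate_sites, num_mods)
    return list(islice(every, limits.max_pep_mod_arrangements))


if __name__ == "__main__":
    default = PermutationLimits()
    tight = PermutationLimits(max_pep_num_modified_sites=3,
                              max_pep_mod_arrangements=32)
    light = {"Oxidation (M)": 2, "Deamidated (NQ)": 1}
    print(composition_allowed(light, default))                    # three sites in total
    print(len(capped_arrangements(range(1, 11), 4, default)))     # capped by the default
    print(len(capped_arrangements(range(1, 11), 4, tight)))       # capped by the tighter limit
```

Lowering the limits shrinks the number of candidates considered per peptide, while raising them lets more arrangements of a heavily modified peptide be tested.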
Example 1: Speeding up an error tolerant search
One consequence of these changes is that we can speed up error tolerant searches. To test this, we took a subset of a Mouse label-free dataset from the PRIDE public repository (PXD013086) and processed it using Mascot Distiller to give a peaklist containing approximately 20,000 MS/MS spectra. The samples are lightly modified, with the PRIDE annotation specifying Oxidation and Deamidation as variable modifications. We then carried out error tolerant searches using Mascot 2.6 and 2.7 on equivalent hardware to compare search speed. With Mascot 2.7 we searched using both the default modification permutation settings and tighter settings intended to speed up the search. Results are summarised in table 2 below:
Mascot Version | ModArrangements | NumModifiedSites | NumVarMods | Search time (min) | Speed improvement (%) |
---|---|---|---|---|---|
2.6 | N/A | N/A | N/A | 95 | — |
2.7 | 64 | 5 | 3 | 79 | 17 |
2.7 | 32 | 3 | 3 | 61 | 35 |
Because the sample was only lightly modified, we were able to significantly speed up the search whilst also retaining results by tightening the modification permutation settings in Mascot 2.7.
Example 2: Improve site analysis
To look at how the new parameters can improve site analysis in Mascot 2.7, we took a single file from a middle-down Human Histone H4 dataset from the PRIDE public repository (PXD008296). The study looked at changes in modification patterns across the cell cycle on the N-terminal 23 residues. The variable modifications used for the search were taken from the paper and are listed below, together with the sequence of the N-terminal 23 residues:
- Variable modifications: Acetyl (K), Acetyl (Protein N-term), Phospho (ST), Dimethyl (K), Methyl (K), Methyl (R), Trimethyl (K)
- N-terminal 23 residues: SGRGKGGKGLGKGGAKRHRKVLR
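To see why this peptide is demanding for site analysis, the short sketch below lists the candidate residues for each residue-specific modification, using 1-based positions. The helper name is hypothetical, and Acetyl (Protein N-term) is left out of the mapping because its site is the N-terminus rather than a residue side chain.

```python
SEQUENCE = "SGRGKGGKGLGKGGAKRHRKVLR"

# Residue specificities as given in the modification names above.
VARIABLE_MODS = {
    "Acetyl (K)": "K",
    "Methyl (K)": "K",
    "Dimethyl (K)": "K",
    "Trimethyl (K)": "K",
    "Methyl (R)": "R",
    "Phospho (ST)": "ST",
}


def candidate_sites(sequence: str, residues: str) -> list:
    """Return 1-based positions of every residue that could carry the modification."""
    return [pos for pos, aa in enumerate(sequence, start=1) if aa in residues]


if __name__ == "__main__":
    for mod, residues in VARIABLE_MODS.items():
        print(mod, candidate_sites(SEQUENCE, residues))
    modifiable = candidate_sites(SEQUENCE, "KRST")
    print("modifiable residues:", len(modifiable))
```

The sequence has five lysines, four arginines and one serine, so ten residues can be modified, or eleven sites if the protein N-terminus is counted separately. With four competing lysine modifications, the number of possible arrangements grows quickly.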
The selected data file was processed with Mascot Distiller, and the fragment ions were decharged to singly charged. The resulting peaklist was searched using Mascot 2.6 and Mascot 2.7. The number of proteoforms (modification patterns) identified is presented in table 3 below:
Mascot Version | NumModifiedSites | NumVarMods | ModArrangements | No. proteoforms |
---|---|---|---|---|
2.6 | N/A | N/A | N/A | 81 |
2.7 | 11 | 7 | 64 | 106 |
2.7 | 11 | 7 | 128 | 109 |
2.7 | 11 | 7 | 256 | 109 |
2.7 | 11 | 7 | 512 | 109 |
As you can see, by increasing the permutation settings in Mascot 2.7, we’ve identified significantly more proteoforms than we found with Mascot 2.6. In addition, we can see many instances in the results where site localisation is improved in Mascot 2.7.
Figure 1 below shows a high-scoring match from the Mascot 2.6 search. This reports methylation of arginines 17 and 19. However, these residues are not methylated in vivo, and the sequence ladder shows no fragment ion matches over this region. This is a case where Mascot 2.6 has used the sliding window permutation method, which has probably clustered the modifications inaccurately.
Figure 1: Match to an MS/MS spectrum from the Mascot 2.6 search.
Figure 2 shows the match to the same spectrum from Mascot 2.7, which has reported acetylation of lysine 16 and dimethylation of lysine 20. These modifications are known to occur in vivo, and the sequence ladder shows fragment ion matches across the whole region.
Figure 2: Match from Mascot 2.7 to the same MS/MS peaklist as shown in figure 1.
So, by increasing the variable modification permutation limits in Mascot 2.7, we’ve been able to get more and better results in a search of a highly modified sample than we could with Mascot 2.6 or earlier.
You can find more details about the new variable modification permutation options on this help page, and a more detailed look at the data presented in this article in this presentation.
Keywords: error tolerant, modification, permutation, site analysis, variable modifications