Posted by Patrick Emery (July 15, 2020)

Variable Modifications in Mascot 2.7

Most protein samples will exhibit some degree of modification, which needs to be considered when carrying out a database search.

In this article we’ll take a look at some important changes we introduced in Mascot 2.7 to the way variable modifications are handled.

Variable modification permutation in Mascot 2.6 and earlier

In Mascot 2.6 and earlier, variable modification permutation is handled in the following way:

No upper limit on the number of modified sites
Permutation has built-in limits
- If there are fewer than 16 possible arrangements, all are tested
- Otherwise a sliding window is used, since testing all possibilities would be too slow

This approach generally works well. If a peptide has fewer than 16 possible variable modification permutations, all possibilities are tested. However, once permutation has switched to the sliding window method, it tends to cluster modifications on adjacent modifiable sites, and often stops before 16 different permutations have been tested.
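To get a feel for how quickly the 16-arrangement threshold is reached, the following minimal Python sketch counts the ways of placing a single variable modification on a peptide's modifiable sites. It is an illustration only, not Mascot's internal code: the site positions are hypothetical, and it assumes that "arrangements" means the distinct choices of which sites carry the modification.

```python
from itertools import combinations

# Hypothetical positions of six modifiable residues in a peptide.
SITES = [2, 5, 7, 11, 14, 18]

# Threshold described above for Mascot 2.6 and earlier.
EXHAUSTIVE_LIMIT = 16


def arrangements(sites, n_modified):
    """Return every way of placing n_modified copies of one modification."""
    return list(combinations(sites, n_modified))


for n_modified in range(1, 4):
    count = len(arrangements(SITES, n_modified))
    method = "all tested" if count < EXHAUSTIVE_LIMIT else "sliding window"
    print(f"{n_modified} modified site(s): {count} arrangements ({method})")
```

With six candidate sites, two modified residues give 15 arrangements, just under the threshold, while three give 20 and would have switched to the sliding window.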

Variable modification permutation in Mascot 2.7

Modification permutation in Mascot 2.7 uses a single, consistent method; it no longer switches between two methods. It is controlled by three user-definable parameters, described in table 1 below, whose default values give similar depth and speed of search to Mascot 2.6 and earlier:

Parameter	Description	Default value
MaxPepNumVarMods	Max no. of different variable modifications per peptide	3
MaxPepNumModifiedSites	Max no. of modified residues per peptide	5
MaxPepModArrangements	Max no. of arrangements of an individual varmod composition	64

Table 1: The new parameters that control variable modification permutation in Mascot 2.7. Values can be set globally in Mascot.dat, or locally in the MGF file header.
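One way to read the three limits together is sketched below in Python. This is an interpretation of the descriptions in table 1 and not Mascot's implementation: the class and function names are hypothetical, and the sketch says nothing about which arrangements Mascot chooses once the arrangement cap is reached.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PermutationLimits:
    """Hypothetical container holding the table 1 defaults."""
    max_pep_num_var_mods: int = 3
    max_pep_num_modified_sites: int = 5
    max_pep_mod_arrangements: int = 64


def composition_allowed(composition, limits):
    """Check a varmod composition (modification name -> number of copies)."""
    counts = [n for n in composition.values() if n > 0]
    return (
        len(counts) <= limits.max_pep_num_var_mods
        and sum(counts) <= limits.max_pep_num_modified_sites
    )


def arrangements_tested(possible_arrangements, limits):
    """Cap the arrangements considered for one composition."""
    return min(possible_arrangements, limits.max_pep_mod_arrangements)


defaults = PermutationLimits()
print(composition_allowed({"Acetyl (K)": 2, "Methyl (R)": 1}, defaults))
print(composition_allowed({"Acetyl (K)": 4, "Methyl (R)": 2}, defaults))
print(arrangements_tested(120, defaults))
```

Under the defaults, the first composition passes (two different modifications on three residues), the second is rejected because it needs six modified residues, and a composition with 120 possible arrangements would have at most 64 of them considered.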

There are two main cases for changing the default values for these parameters:

For lightly modified samples, or where site localisation isn’t important, you can decrease the limits to reduce search time
For highly modified samples, or to improve modification site analysis, you can increase the limits

Example 1: Speeding up an error tolerant search

One consequence of these changes is that we can speed up error tolerant searches. To test this, we took a subset of a mouse label-free dataset from the PRIDE public repository (PXD013086) and processed it using Mascot Distiller to give a peaklist containing approximately 20,000 MS/MS spectra. The samples are lightly modified, with the PRIDE annotation specifying Oxidation and Deamidation as variable modifications. We then carried out error tolerant searches using Mascot 2.6 and 2.7 on equivalent hardware to compare search speed. With Mascot 2.7 we searched using both the default modification permutation settings and tighter settings intended to speed up the search. Results are summarised in table 2 below:

Mascot Version	ModArrangements	NumModifiedSites	NumVarMods	Search time (min)	Speed improvement (%)
2.6	N/A	N/A	N/A	95	—
2.7	64	5	3	79	17
2.7	32	3	3	61	35

Table 2: Comparison of total search time for an error tolerant search on Mascot 2.6 and 2.7. Speed improvement is given relative to the Mascot 2.6 search time. The first pass search took approximately 2 minutes in all cases, so the majority of the search time is spent in the error tolerant pass. The identified matches were equivalent between all searches: Mascot 2.6 returned 4437 significant PSMs and 629 error tolerant PSMs, and Mascot 2.7 returned 4442 significant PSMs and 635 error tolerant PSMs.

Because the sample was only lightly modified, we were able to significantly speed up the search whilst also retaining results by tightening the modification permutation settings in Mascot 2.7.

Example 2: Improve site analysis

To look at how we can use the new parameters to improve site analysis in Mascot 2.7, we took a single file from a middle-down human histone H4 dataset from the PRIDE public repository (PXD008296). The study looked at changes in modification patterns across the cell cycle on the N-terminal 23 residues. The variable modifications used for the search were taken from the paper:

Acetyl (K), Acetyl (Protein N-term), Phospho (ST), Dimethyl (K), Methyl (K), Methyl (R), Trimethyl (K)

Given the peptide sequence being studied was:

SGRGKGGKGLGKGGAKRHRKVLR

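We can see that the default permutation settings in Mascot 2.7 are not sufficient for this: the search uses seven variable modifications, and the peptide contains ten modifiable residues (five lysines, four arginines and one serine) as well as the protein N-terminus, against default limits of three different modifications and five modified residues per peptide.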
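The site counts behind this statement can be checked with a short Python script. It is only a sketch: the variable and function names are hypothetical, the default limits are copied from table 1, and how Mascot itself counts the protein N-terminus relative to serine 1 is not something this script can tell us.

```python
PEPTIDE = "SGRGKGGKGLGKGGAKRHRKVLR"

# Residue specificity of each variable modification used in the search.
# Protein N-terminal acetylation targets the terminus, not a residue type.
VAR_MODS = {
    "Acetyl (K)": "K",
    "Acetyl (Protein N-term)": "",
    "Phospho (ST)": "ST",
    "Dimethyl (K)": "K",
    "Methyl (K)": "K",
    "Methyl (R)": "R",
    "Trimethyl (K)": "K",
}

# Defaults from table 1.
DEFAULT_MAX_VAR_MODS = 3
DEFAULT_MAX_MODIFIED_SITES = 5


def modifiable_positions(peptide, residues):
    """Return 1-based positions of the residues that can be modified."""
    return [i for i, aa in enumerate(peptide, start=1) if aa in residues]


target_residues = "".join(sorted(set("".join(VAR_MODS.values()))))
positions = modifiable_positions(PEPTIDE, target_residues)

print("Lysines:  ", modifiable_positions(PEPTIDE, "K"))
print("Arginines:", modifiable_positions(PEPTIDE, "R"))
print("Ser/Thr:  ", modifiable_positions(PEPTIDE, "ST"))
print("Variable modifications:", len(VAR_MODS),
      "(default limit", DEFAULT_MAX_VAR_MODS, ")")
print("Modifiable residues:", len(positions),
      "(default limit", DEFAULT_MAX_MODIFIED_SITES, ")")
```

The script finds lysines at positions 5, 8, 12, 16 and 20, arginines at 3, 17, 19 and 23, and a serine at position 1: ten modifiable residues and seven variable modifications, both well above the defaults.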

The selected data file was processed with Mascot Distiller, and the fragment ions were decharged to singly charged. The resulting peaklist was searched using Mascot 2.6 and Mascot 2.7. The numbers of proteoforms (modification patterns) identified are presented in table 3 below:

Mascot Version	NumModifiedSites	NumVarMods	ModArrangements	No. proteoforms
2.6	N/A	N/A	N/A	81
2.7	11	7	64	106
2.7	11	7	128	109
2.7	11	7	256	109
2.7	11	7	512	109

Table 3: The number of proteoforms identified by Mascot 2.6 and by Mascot 2.7.

As you can see, by increasing the permutation settings in Mascot 2.7, we’ve identified significantly more proteoforms than we found with Mascot 2.6. In addition, we can see many instances in the results where site localisation is improved in Mascot 2.7.

Figure 1 below shows a high scoring match from the Mascot 2.6 search. This reports methylation of arginines 17 and 19; however, these residues are not methylated in vivo, and the sequence ladder shows no fragment ion matches over this region. This is a case where Mascot 2.6 has used the sliding window permutation method and has probably clustered the modifications inaccurately.

Figure 1: Match to an MS/MS spectrum from the Mascot 2.6 search

Figure 2 shows the match to the same spectrum from Mascot 2.7, which has reported acetylation of lysine 16 and dimethylation of lysine 20. These modifications are known to occur in vivo, and the sequence ladder shows fragment ion matches across the whole region.

Figure 2: Match from Mascot 2.7 to the same MS/MS peaklist as shown in figure 1

So, by increasing the variable modification permutation limits in Mascot 2.7, we’ve been able to get more proteoforms and better site localisation in a search of a highly modified sample than we could with Mascot 2.6 or earlier.

You can find more details about the new variable modification permutation options on this help page, and a more detailed look at the data presented in this article in this presentation.

Keywords: error tolerant, modification, permutation, site analysis, variable modifications
