Trypsin autolysis
The most analysed protein
The Journal of Proteome Research has a paper from the Medical University of Graz on the importance of correctly identifying spectra from contaminant proteins, in particular trypsin autolysis peptides.
The authors point out that sequencing grade trypsin is modified by methylation or acetylation of the lysines to inhibit autolysis. Unless these variable modifications are selected in a search, simply including a contaminants database will not be sufficient to catch all trypsin autolysis peptides. As part of their study, data were acquired on an LTQ-Orbitrap Velos from a yeast cell lysate digested with Promega trypsin (Data Set 1 in the paper). The raw data are available on PRIDE.
Example search
This example uses three raw files from PRIDE project PXD002726. Files were processed into a single merged peak list using Mascot Distiller.
Based on the results of an error tolerant search, we chose Carbamidomethyl (C) as a fixed mod and Methyl (N-term), Methyl (K), Dimethyl (K), Dimethyl (N-term), Dehydro (C), Deamidated (NQ) and Carbamidomethyl (N-term) as variable mods. We also decided to use semiTrypsin as the enzyme, because it was clear that there were large numbers of non-specific peptides. Target FDR was set to 1%.
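To see why semiTrypsin enlarges the search space, the minimal Python sketch below counts candidate peptides for residues 52-115 of porcine trypsin (joined from the porcine limit-digest table further down) under fully specific and semi-specific rules. The cleavage rule (after K or R, not before P), the length limits and the zero-missed-cleavage default are simplifying assumptions for illustration, not Mascot's internal implementation.

```python
# Residues 52-115 of porcine trypsin, joined from the limit-digest table:
# SR, IQVR, LGEHNIDVLEGNEQFINAAK, IITHPNFNGNTLDNDIMLIK, LSSPATLNSR, VATVSLPR
FRAGMENT = ("SR" "IQVR" "LGEHNIDVLEGNEQFINAAK" "IITHPNFNGNTLDNDIMLIK"
            "LSSPATLNSR" "VATVSLPR")


def tryptic_sites(seq):
    """Cleavage positions (0..len) after K or R, except before P (simplified rule)."""
    sites = {0, len(seq)}
    for i in range(1, len(seq)):
        if seq[i - 1] in "KR" and seq[i] != "P":
            sites.add(i)
    return sorted(sites)


def candidate_peptides(seq, semi=False, min_len=6, max_len=30, missed=0):
    """Return the set of candidate peptide strings.

    Fully specific: both ends at cleavage sites.
    Semi-specific: at least one end at a cleavage site, the other anywhere.
    """
    sites = tryptic_sites(seq)
    site_set = set(sites)
    peptides = set()
    for start in range(len(seq)):
        for end in range(start + min_len, min(start + max_len, len(seq)) + 1):
            # Cleavage sites inside the peptide count as missed cleavages
            if sum(1 for s in sites if start < s < end) > missed:
                continue
            n_ok, c_ok = start in site_set, end in site_set
            if (n_ok and c_ok) or (semi and (n_ok or c_ok)):
                peptides.add(seq[start:end])
    return peptides


strict = candidate_peptides(FRAGMENT)
semi = candidate_peptides(FRAGMENT, semi=True)
print(len(strict), "fully tryptic vs", len(semi), "semi-tryptic candidates")
```

Every fully tryptic peptide also satisfies the semi-specific rule, so the semi-specific set is a superset; it also contains ragged ends such as LGEHNIDVLEGN.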
Most abundant autolysis products
The most abundant trypsin autolysis peptide is usually R.LGEHNIDVLEGNEQFINAAK.I, which the authors point out has been identified some 21,722 times in the PRIDE Cluster resource. The results report shows that this peptide is abundant in the Graz data in both modified and unmodified forms.
It is tempting to treat the number of spectra as pseudo-quantitative: there are 125 spectra for the unmodified peptide versus a total of 250 for the various modified forms. This could be misleading, however, because methylation is not favourable to CID fragmentation. Even so, it seems clear that methylation is far from complete, probably because of the steric issues identified in the paper. The authors also observe that cleavage occurs readily after a methylated or dimethylated lysine.
The searches described in the Graz paper all use strict tryptic specificity. When searched with semiTrypsin, R.LGEHNIDVLEGNEQFINAAK.I exhibits a near-complete family of C-terminal "ragged ends", going down to R.LGEHNIDVLEGN.E. The most abundant appears to be R.LGEHNIDVLEGNEQFINAA.K, which is represented by 375 spectra. Whether this truncation occurs in solution or in the ion source is hard to say.
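The ragged-end family can be enumerated directly. This minimal sketch uses standard monoisotopic residue masses and takes the flanking residues from porcine trypsin residues 57-78 (R, then the peptide, then I) as given in the porcine table below. The function names are illustrative, and only unmodified peptides are handled.

```python
# Standard monoisotopic residue masses (Da)
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def peptide_mr(seq):
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE[aa] for aa in seq) + WATER


def c_terminal_ladder(context, start, end, min_len):
    """List (notation, Mr) for context[start:end] and its C-terminal truncations.

    Notation is prev.PEPTIDE.next, with flanking residues taken from context.
    """
    ladder = []
    for stop in range(end, start + min_len - 1, -1):
        pep = context[start:stop]
        prev_aa = context[start - 1] if start > 0 else "-"
        next_aa = context[stop] if stop < len(context) else "-"
        ladder.append((f"{prev_aa}.{pep}.{next_aa}", peptide_mr(pep)))
    return ladder


# Residue 57 (R) through residue 78 (I) of porcine trypsin
CONTEXT = "R" + "LGEHNIDVLEGNEQFINAAK" + "I"
ladder = c_terminal_ladder(CONTEXT, start=1, end=21, min_len=12)
for notation, mr in ladder:
    print(f"{notation:26s} {mr:9.2f}")
```

The first entry reproduces the tabulated Mr of 2210.10, and each step down the ladder removes one residue mass.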
The reason for including the N-term variable mods was that these gave strong matches in the error tolerant search. These are not protein terminus modifications, so they must have been introduced after cleavage. Carbamidomethyl (N-term) is very common and could be due to residual iodoacetamide, but why do we see Methyl (N-term) and Dimethyl (N-term)? The most likely explanation is autolysis prior to or during methylation.
Clearly, many peptides would be missed by a vanilla search. At 1% FDR, with machine-learning refinement enabled, the semi-tryptic search with multiple variable mods gives 3482 PSMs and 827 distinct sequences, compared with 2267 and 565 for a search with strict trypsin and Carbamidomethyl (N-term) as the only variable mod.
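The size of the gain is easier to judge as a relative increase. A trivial Python check using only the counts quoted above:

```python
def percent_gain(new, old):
    """Relative increase of new over old, in percent."""
    return 100.0 * (new - old) / old


# Semi-tryptic multi-mod search vs strict trypsin search
psm_gain = percent_gain(3482, 2267)
seq_gain = percent_gain(827, 565)
print(f"PSMs +{psm_gain:.0f}%, distinct sequences +{seq_gain:.0f}%")
```

This works out at roughly 54% more PSMs and 46% more distinct sequences.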
Including autolysis products in routine searches
The Graz paper advocates editing the trypsin sequence in the FASTA file, replacing K with J, and defining J as a residue with the mass of dimethylated lysine. Unmodified or mono-methylated lysine can then be matched using J-specific mods, which keeps the overall search space small because only the trypsin sequence contains any J. This is fine as far as it goes, but it doesn’t catch the N-term modifications or the non-specific cleavage. The authors mention another solution: "combine the in silico generated search space with measured spectral libraries from contaminants."
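A minimal Python sketch of the FASTA edit, under stated assumptions: the mod names in `J_MODS` are hypothetical labels, not Unimod or Mascot entries, and the example header is made up. In a real search the J residue and these mods would have to be defined in the search engine's configuration, and the enzyme rule would presumably need to cleave after J as well as K, since cleavage occurs readily after methylated lysine. How the Graz paper configured this is not covered here.

```python
LYS_RESIDUE = 128.09496   # monoisotopic residue mass of K (Da)
METHYL = 14.01565         # mass added per methyl group (CH2)

# Hypothetical residue definition: J carries the mass of dimethyl-lysine
J_RESIDUE = LYS_RESIDUE + 2 * METHYL

# Hypothetical J-specific variable mods mapping J back to lower methylation states
J_MODS = {
    "Monomethyl (J)": -METHYL,       # J -> monomethyl-K
    "Unmodified (J)": -2 * METHYL,   # J -> unmodified K
}


def fasta_k_to_j(fasta_text):
    """Replace K with J in sequence lines only; header lines (starting '>') are kept."""
    out = []
    for line in fasta_text.splitlines():
        out.append(line if line.startswith(">") else line.replace("K", "J"))
    return "\n".join(out)


# Example: a porcine trypsin fragment from the limit-digest table (made-up header)
example = ">trypsin_fragment hypothetical header\nLGEHNIDVLEGNEQFINAAKIITHPNFNGNTLDNDIMLIK"
print(fasta_k_to_j(example))
print(f"J residue mass: {J_RESIDUE:.5f}")
```

Note that the J-specific mods only model the methylation state of lysine; the N-term modifications and semi-specific products are left to the spectral library approach quoted above.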
This is a far more powerful option, since it allows any number of modified and non-specific peptides from any number of contaminants to be intercepted with no increase in the search space. It is easy to create a library from search results with Mascot Server.
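To illustrate how library lookup can intercept spectra without touching the sequence database, here is a minimal sketch of a generic spectral-library match: a precursor filter, then a normalized dot product (cosine) of binned, square-root-scaled spectra. This is a common generic approach, not Mascot's library scoring; the bin width, tolerance, score threshold and the toy library entry (whose fragment peaks are made-up numbers) are all assumptions.

```python
import math
from collections import defaultdict

PROTON = 1.007276


def binned(peaks, bin_width=1.0):
    """Sum intensities into fixed-width m/z bins and take square roots."""
    vec = defaultdict(float)
    for mz, intensity in peaks:
        vec[int(mz / bin_width)] += intensity
    return {b: math.sqrt(v) for b, v in vec.items()}


def cosine(a, b):
    """Normalized dot product of two sparse binned spectra (1.0 = identical shape)."""
    num = sum(v * b.get(k, 0.0) for k, v in a.items())
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0


def library_match(precursor_mz, peaks, library, tol_ppm=10.0, min_score=0.7):
    """Best (name, score) among entries within the precursor tolerance, or None."""
    query = binned(peaks)
    best = None
    for name, (lib_mz, lib_peaks) in library.items():
        if abs(precursor_mz - lib_mz) / lib_mz * 1e6 > tol_ppm:
            continue  # only entries with a matching precursor are compared
        score = cosine(query, binned(lib_peaks))
        if score >= min_score and (best is None or score > best[1]):
            best = (name, score)
    return best


# Toy library: precursor is the 2+ m/z of LGEHNIDVLEGNEQFINAAK (Mr 2210.0968);
# the fragment peaks are made-up numbers, not a real spectrum.
LIBRARY = {
    "R.LGEHNIDVLEGNEQFINAAK.I (toy)": ((2210.0968 + 2 * PROTON) / 2,
                                        [(300.2, 40.0), (587.3, 100.0), (901.5, 70.0)]),
}
query_peaks = [(300.2, 35.0), (587.3, 100.0), (901.5, 75.0)]
print(library_match(1106.0557, query_peaks, LIBRARY))
```

Because only library entries with a matching precursor are scored, adding more contaminant spectra to the library does not change the sequence search space.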
Autolysis and Peptide Mass Fingerprint searches
Low-level digests can be dominated by autolysis peaks. The neutral peptide masses (Mr values) for limit digests of porcine and bovine trypsin are listed below, in that order. It is worth screening experimental data for both species, since the labelling of commercial material is not always reliable (recognised as long ago as Vestling, 1990).
The porcine trypsin peaks at 841.50 and 2210.10 (observed as MH+ at 842.51 and 2211.10) are often used in MALDI for internal mass calibration. Other peaks that have been observed by MALDI include 514.32, 1044.56, 2282.17 and 2298.17, the last being the 2282.17 peptide with oxidised Met (Parker, 1998).
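A minimal sketch of two-point internal calibration with these two peaks, assuming singly protonated ions (MH+ = Mr + 1.007276) and a simple linear correction in m/z. The measured peak positions are hypothetical, and instrument software normally applies its own (often non-linear) calibration model, so treat this as an illustration only.

```python
PROTON = 1.007276

# Monoisotopic Mr of the two porcine autolysis peptides, computed from residue
# masses (the table rounds them to 841.50 and 2210.10)
REF_MR = {"VATVSLPR": 841.5022, "LGEHNIDVLEGNEQFINAAK": 2210.0968}
REF_MH = sorted(mr + PROTON for mr in REF_MR.values())  # singly protonated ions


def two_point_calibration(measured_low, measured_high, true_low, true_high):
    """Return a function mapping measured m/z to corrected m/z (linear model)."""
    slope = (true_high - true_low) / (measured_high - measured_low)

    def correct(mz):
        return true_low + slope * (mz - measured_low)

    return correct


# Hypothetical measured positions of the two reference peaks
correct = two_point_calibration(842.5150, 2211.1210, *REF_MH)
print(f"{correct(1045.5750):.4f}")
```

The linear model corrects any error of the form measured = a × true + b exactly; other error shapes are only partly removed.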

Porcine trypsin

From | To | Mono. | Avg. | Sequence |
---|---|---|---|---|
52 | 53 | 261.14 | 261.28 | SR |
54 | 57 | 514.32 | 514.63 | IQVR |
108 | 115 | 841.50 | 842.01 | VATVSLPR |
209 | 216 | 905.50 | 906.05 | NKPGVYTK |
148 | 157 | 1005.48 | 1006.15 | APVLSDSSCK |
98 | 107 | 1044.56 | 1045.16 | LSSPATLNSR |
134 | 147 | 1468.72 | 1469.68 | SSGSSYPSLLQCLK |
217 | 231 | 1735.84 | 1736.97 | VCNYVNWIQQTIAAN |
116 | 133 | 1767.79 | 1768.99 | SCAAAGTECLISGWGNTK |
158 | 178 | 2157.02 | 2158.48 | SSYPGQITGNMICVGFLEGGK |
58 | 77 | 2210.10 | 2211.42 | LGEHNIDVLEGNEQFINAAK |
78 | 97 | 2282.17 | 2283.63 | IITHPNFNGNTLDNDIMLIK |
179 | 208 | 3012.32 | 3014.33 | DSCQGDSGG…SWGYGCAQK |
9 | 51 | 4474.09 | 4477.04 | IVGGYTCAA…VVSAAHCYK |
9 | 51 | 4488.11 | 4491.07 | IVGGYTCAA…VVSAAHCYK |

Bovine trypsin

From | To | Mono. | Avg. | Sequence |
---|---|---|---|---|
110 | 111 | 259.19 | 259.35 | LK |
157 | 159 | 362.20 | 362.49 | CLK |
238 | 243 | 632.31 | 632.67 | QTIASN |
64 | 69 | 658.38 | 658.76 | SGIQVR |
112 | 119 | 804.41 | 804.86 | SAASLNSR |
221 | 228 | 905.50 | 906.05 | NKPGVYTK |
160 | 169 | 1019.50 | 1020.17 | APILSDSSCK |
229 | 237 | 1110.55 | 1111.33 | VCNYVSWIK |
146 | 156 | 1152.57 | 1153.25 | SSGTSYPDVLK |
207 | 220 | 1432.71 | 1433.65 | LQGIVSWGSGCAQK |
191 | 206 | 1494.61 | 1495.61 | DSCQGDSGGPVVCSGK |
70 | 89 | 2162.05 | 2163.33 | LGEDNINVVEGNEQFISASK |
170 | 190 | 2192.99 | 2194.47 | SAYPGQITSNMFCAGYLEGGK |
90 | 109 | 2272.15 | 2273.60 | SIVHPSYNSNTLNNDIMLIK |
120 | 145 | 2551.24 | 2552.91 | VASISLPTS…LISGWGNTK |
21 | 63 | 4550.12 | 4553.14 | IVGGYTCGA…VVSAAHCYK |
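
Following the advice to screen data for both species, here is a minimal Python sketch that flags peaks matching either table. It assumes the peak list holds singly protonated MH+ values (as in MALDI), uses a 0.05 Da tolerance chosen for illustration, and covers only the unmodified limit-digest peptides listed above (no oxidised Met or methylated forms).

```python
PROTON = 1.007276

# Monoisotopic Mr values copied from the two tables above
AUTOLYSIS = {
    "porcine": [
        (261.14, "SR"), (514.32, "IQVR"), (841.50, "VATVSLPR"),
        (905.50, "NKPGVYTK"), (1005.48, "APVLSDSSCK"), (1044.56, "LSSPATLNSR"),
        (1468.72, "SSGSSYPSLLQCLK"), (1735.84, "VCNYVNWIQQTIAAN"),
        (1767.79, "SCAAAGTECLISGWGNTK"), (2157.02, "SSYPGQITGNMICVGFLEGGK"),
        (2210.10, "LGEHNIDVLEGNEQFINAAK"), (2282.17, "IITHPNFNGNTLDNDIMLIK"),
        (3012.32, "DSCQGDSGG…SWGYGCAQK"), (4474.09, "IVGGYTCAA…VVSAAHCYK"),
        (4488.11, "IVGGYTCAA…VVSAAHCYK"),
    ],
    "bovine": [
        (259.19, "LK"), (362.20, "CLK"), (632.31, "QTIASN"), (658.38, "SGIQVR"),
        (804.41, "SAASLNSR"), (905.50, "NKPGVYTK"), (1019.50, "APILSDSSCK"),
        (1110.55, "VCNYVSWIK"), (1152.57, "SSGTSYPDVLK"),
        (1432.71, "LQGIVSWGSGCAQK"), (1494.61, "DSCQGDSGGPVVCSGK"),
        (2162.05, "LGEDNINVVEGNEQFISASK"), (2192.99, "SAYPGQITSNMFCAGYLEGGK"),
        (2272.15, "SIVHPSYNSNTLNNDIMLIK"), (2551.24, "VASISLPTS…LISGWGNTK"),
        (4550.12, "IVGGYTCGA…VVSAAHCYK"),
    ],
}


def screen(peaks_mh, tol=0.05):
    """Return (observed MH+, species, sequence) for every table match within tol (Da)."""
    hits = []
    for mh in peaks_mh:
        mr = mh - PROTON  # convert singly protonated m/z back to neutral mass
        for species, table in AUTOLYSIS.items():
            for ref_mr, seq in table:
                if abs(mr - ref_mr) <= tol:
                    hits.append((mh, species, seq))
    return hits


for hit in screen([842.51, 906.51, 1500.00]):
    print(hit)
```

NKPGVYTK appears in both tables, so a peak at its mass cannot by itself distinguish porcine from bovine trypsin.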