Mascot: The trusted reference standard for protein identification by mass spectrometry for 25 years

Peak picking Thermo .RAW data with Mascot Distiller

Mascot Distiller ships with processing options files for each of the main instrument vendor raw file types. These .opt files are designed as a reasonable starting point for peak picking your own data, but to get the very best results you’ll need to tweak the parameters on a typical raw file from your instruments and then use those settings.

For Thermo .RAW files, we supply several sets of options. The main two are:

  • default.ThermoXcalibur.opt
  • prof_prof.ThermoXcalibur.opt

What are the differences between these options files, and which is the best one to use?

Differences between the default and prof_prof processing options

The main difference is that the default settings use the centroids directly for the MS/MS peak lists if the MS/MS scans in your raw data are saved as centroids, whereas the prof_prof processing options always uncentroid these scans. The MS/MS peak lists generated using the prof_prof method have additional information available, such as fragment ion charge states. Charge states are required if you wish to carry out de novo sequencing in Mascot Distiller, and they can be used to de-charge the peak lists. This is important if you are expecting charge states greater than 2+, as you would in top-down or middle-down experiments.

In both cases, Distiller will use profile data for peak detection in the MS scans, which is required if you want to carry out quantitation using survey scan based methods (e.g. intensity based Label-free, SILAC etc).

Therefore, the answer to the question of which are the best processing options to use depends on how your MS/MS scans are saved (as centroids or profile data), and on exactly what you’re trying to do with the data.

MS/MS scans saved as profile data

When the MS/MS scans are saved as profile data, the choice is simple: you should use the prof_prof.ThermoXcalibur.opt settings as your starting point for peak picking. The processing time in this case is similar between the two sets of options, but you’ll get much better results using the prof_prof options.

The example below uses a .RAW file with the MS/MS scans saved as profile data. The file was processed in Mascot Distiller with default.ThermoXcalibur.opt and with prof_prof.ThermoXcalibur.opt, and the resulting peak lists were searched using identical search settings. Results are summarised in Table 1 below:

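| Processing options           | # Sig. matches (1% FDR) | Processing time (HH:MM:SS) |
|------------------------------|-------------------------|----------------------------|
| default.ThermoXcalibur.opt   | 3820                    | 00:01:45                   |
| prof_prof.ThermoXcalibur.opt | 9329                    | 00:03:18                   |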

Table 1: Comparison of search results and processing time using default and prof_prof options on a .RAW file with 42021 MS/MS scans which have been saved as profile data.

Although processing time roughly doubled with prof_prof, peak picking was still very fast, and the number of significant matches more than doubled.

MS/MS scans saved as centroids

When the scans are saved as centroids, peak lists generated using the prof_prof options are expected to give slightly higher Mascot scores than those produced using the default options. This is because peak picking using the Distiller MDRO library should result in a cleaner peak list with fewer noise peaks. However, because the centroids have to be uncentroided back into profile data to do this, it comes at the expense of increased processing time.

If you need the additional fragment ion peak information, such as the charge state for de novo searching or de-charging for higher charge states, then you have no choice but to uncentroid the MS/MS ‘raw’ data. For other use cases, it’s a trade off between speed and results.

The example below uses four .RAW files from one project. The files were processed in Mascot Distiller with either the default or prof_prof options, and then searched using identical search settings. Results are summarised in Table 2 below:

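| Processing options           | # Sig. matches (1% FDR) | Average score of significant peptides | # peptides score >= 70 | Processing time (HH:MM:SS) |
|------------------------------|-------------------------|---------------------------------------|------------------------|----------------------------|
| default.ThermoXcalibur.opt   | 15981                   | 40                                    | 1192                   | 00:04:58                   |
| prof_prof.ThermoXcalibur.opt | 16173                   | 42                                    | 1353                   | 03:57:39                   |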

Table 2: Comparison of search results and processing time using default and prof_prof options on a set of four .RAW files with a total of 99745 MS/MS scans which have been saved as centroids.

As you can see, using the prof_prof processing options instead of default gives an increase of approximately 1% in the number of significant matches at a 1% PSM FDR, and the average score of significant peptide matches rises from 40 to 42.

The effect is larger for the higher scoring peptide matches: the prof_prof processed peak lists give 1353 matches with a score of 70 or above, compared with 1192 from the default processing options, an improvement of ~13.5%. If you are searching a very large database, that is a significant improvement.

However, this comes at the cost of a large increase in processing time. On this test system, processing time increased from ~5 minutes to ~4 hours. That is a high price to pay for slightly improved coverage of the data.
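The percentages and the slowdown quoted above can be checked directly from the Table 2 figures. The short Python sketch below is only a worked calculation. The helper names (`to_seconds`, `pct_increase`) are made up for illustration and are not part of Mascot Distiller.

```python
def to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def pct_increase(old: float, new: float) -> float:
    """Percentage increase going from old to new."""
    return 100.0 * (new - old) / old


# Figures copied from Table 2 (four .RAW files, MS/MS saved as centroids).
default = {"sig_matches": 15981, "score_ge_70": 1192, "time": "00:04:58"}
prof_prof = {"sig_matches": 16173, "score_ge_70": 1353, "time": "03:57:39"}

sig_gain = pct_increase(default["sig_matches"], prof_prof["sig_matches"])
high_gain = pct_increase(default["score_ge_70"], prof_prof["score_ge_70"])
slowdown = to_seconds(prof_prof["time"]) / to_seconds(default["time"])

print(f"Significant matches: +{sig_gain:.1f}%")
print(f"Matches with score >= 70: +{high_gain:.1f}%")
print(f"Processing time: {slowdown:.1f}x longer")
```

On these figures, the gain is about 1.2% in significant matches and about 13.5% in matches scoring 70 or above, for a processing time roughly 48 times longer.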

The prof_prof.ThermoXcalibur.opt processing options uncentroid the MS/MS centroids at a resolution of 600 points per Da, which is a high resolution that we’ve found gives good search results, but which accounts for a significant proportion of the processing time when processing centroided MS/MS scans. Table 3 below shows the effect of decreasing the uncentroiding resolution to 400 and 200 points per Da respectively:

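| Uncentroiding points per Da | # Sig. matches (1% FDR) | Processing time (HH:MM:SS) |
|-----------------------------|-------------------------|----------------------------|
| 200                         | 15895                   | 00:52:50                   |
| 400                         | 15913                   | 02:09:15                   |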

Table 3: Effects of changing the uncentroiding points per Da used by the prof_prof processing options on the number of significant matches and processing time

As you can see, reducing the uncentroiding points per Da substantially shortens the processing time, but at the cost of some significant matches: their number falls slightly below that obtained using the default processing options. However, if you need the fragment charge state information for de novo sequencing, or for de-charging the peak lists, the trade-off may be acceptable.
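To put the trade-off in numbers, the sketch below sets the Table 3 rows beside the two Table 2 results. It assumes that Table 3 was measured on the same four files as Table 2, which the comparison in the text implies but does not state outright. It also treats the prof_prof row of Table 2 as the 600 points per Da case. The variable names are illustrative only.

```python
def to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Significant matches with default.ThermoXcalibur.opt (Table 2).
DEFAULT_MATCHES = 15981

# (points per Da, significant matches, processing time)
rows = [
    (200, 15895, "00:52:50"),  # Table 3
    (400, 15913, "02:09:15"),  # Table 3
    (600, 16173, "03:57:39"),  # Table 2, prof_prof as shipped (assumed same data)
]

full_time = to_seconds(rows[-1][2])

summary = {}
for points_per_da, matches, hms in rows:
    summary[points_per_da] = {
        # Positive means more matches than the default options gave.
        "delta_vs_default": matches - DEFAULT_MATCHES,
        # Fraction of the processing time needed at 600 points per Da.
        "time_fraction": to_seconds(hms) / full_time,
    }

for points_per_da, stats in summary.items():
    print(
        f"{points_per_da} points/Da: "
        f"{stats['delta_vs_default']:+d} matches vs default, "
        f"{stats['time_fraction']:.0%} of the 600 points/Da time"
    )
```

Under these assumptions, 200 points per Da needs about 22% of the full prof_prof processing time and 400 points per Da about 54%, while both give slightly fewer significant matches than the default options (86 and 68 fewer, respectively).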

In general though, if you don’t require fragment ion charge state information, we recommend using the default.ThermoXcalibur.opt processing options with .RAW data files which have MS/MS scans saved as centroids. You’ll generally get good results from the peak lists if your data are good, and the processing time is significantly reduced.
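The recommendations in this article can be summarised as a small decision rule. The Python sketch below is purely illustrative. `ScanMode` and `suggest_options` are hypothetical names and not part of Mascot Distiller. The function does not inspect a .RAW file, so you need to know how your MS/MS scans were saved. It only suggests a starting point, which you would still tune on a typical raw file.

```python
from enum import Enum


class ScanMode(Enum):
    """How the MS/MS scans were saved in the .RAW file."""
    PROFILE = "profile"
    CENTROID = "centroid"


DEFAULT_OPT = "default.ThermoXcalibur.opt"
PROF_PROF_OPT = "prof_prof.ThermoXcalibur.opt"


def suggest_options(msms_mode: ScanMode, need_fragment_charge: bool) -> str:
    """Suggest a starting .opt file, following the article's recommendations.

    need_fragment_charge should be True if you need fragment ion charge
    states, e.g. for de novo sequencing or for de-charging peak lists.
    """
    if msms_mode is ScanMode.PROFILE:
        # Profile MS/MS: prof_prof gave much better results in Table 1.
        return PROF_PROF_OPT
    if need_fragment_charge:
        # Centroided MS/MS must be uncentroided to obtain charge states.
        return PROF_PROF_OPT
    # Centroided MS/MS, no charge states needed: favour processing speed.
    return DEFAULT_OPT


print(suggest_options(ScanMode.CENTROID, need_fragment_charge=False))
```

For centroided MS/MS data where charge states are needed, lowering the uncentroiding points per Da in a copy of the prof_prof settings is a further option, with the costs shown in Table 3.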