Posted by John Cottrell (October 19, 2016)

Retention time

Although retention time is not part of the Mascot scoring algorithm, it can be used by Percolator to improve the re-scoring of search results and it is important or essential information for many types of quantitation. This article examines how retention time is represented in the peak list, the search results, and various export formats.

Mascot Server works from a peak list, not the raw data, so the first link in the chain is the peak picking software. Unless this writes retention time (RT) into the peak list in a way that is ‘machine readable’, it won’t be available later in the workflow. Older peak picking software often ignored RT or encoded it into the scan title. For example, the Sciex Analyst ‘mascot.dll’ created MGF titles that looked like this:

TITLE=File: BSA 100fmol March 10, 04.wiff, Sample: BSA 100fmol March 10, 04 (sample number 1), Elution: 21.446 to 21.582 min, Period: 1, Cycle(s): 1263-1265 (Experiment 2) (Charge not auto determined)

This may be fine for a human to read, but extracting the information in software depends on the syntax being very well controlled; small changes can easily break integration. Because nothing about the syntax was standardised, each peak picking utility used a slightly different form, which made it very difficult for downstream applications to extract RT. It became clear that something had to be done, so we introduced a new MGF parameter in Mascot 2.1: RTINSECONDS. This can hold a single value or, when scans have been summed together, a range or list, e.g. RTINSECONDS=1067.3 or RTINSECONDS=257-259,264,269-278. Most software that supports the MGF format now outputs this field.
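As an illustration of why a dedicated field is easier to consume than a free-text title, here is a minimal Python sketch that parses the three RTINSECONDS forms mentioned above (single value, range, comma-separated list). The function name parse_rtinseconds is hypothetical, and the sketch assumes plain decimal numbers with no exponent notation; it is not how Mascot itself reads the field.

```python
def parse_rtinseconds(value):
    """Parse an RTINSECONDS value into a list of (start, end) tuples in seconds.

    A single value such as "264" becomes (264.0, 264.0); a range such as
    "257-259" becomes (257.0, 259.0). Assumes plain decimal numbers.
    """
    intervals = []
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate a stray comma
        if "-" in part:
            start, end = part.split("-", 1)
            intervals.append((float(start), float(end)))
        else:
            rt = float(part)
            intervals.append((rt, rt))
    return intervals


if __name__ == "__main__":
    print(parse_rtinseconds("1067.3"))
    print(parse_rtinseconds("257-259,264,269-278"))
```

Representing every entry as an interval keeps downstream code uniform: a single scan is simply an interval of zero width.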

The other peak list formats that capture RT information in a standardised way are the XML interchange formats: mzData, which is long obsolete, and mzML. One challenge of the mzML format is that there are several controlled vocabulary (CV) terms that could be used for RT. Originally, the preferred term was MS:1001114 retention time(s). This was later deprecated in favour of MS:1000016 scan start time. If you weren’t aware of the alternatives, you might also select MS:1000826 elution time or MS:1000894 retention time, or possibly MS:1000916 retention time window lower offset and MS:1000917 retention time window upper offset. Mascot Server 2.4.0 and later looks first for MS:1000016. If this is missing, it looks for MS:1000826.
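The lookup order described above (MS:1000016 first, MS:1000826 as a fallback) can be sketched in Python as follows. This is only an illustration of the ordering, not Mascot's implementation. The function name scan_rt_seconds is hypothetical, and the conversion from minutes to seconds based on the unit accession UO:0000031 is an added assumption about how the value might be normalised.

```python
import xml.etree.ElementTree as ET

# Lookup order from the text: scan start time first, elution time as fallback.
PREFERRED_ACCESSIONS = ("MS:1000016", "MS:1000826")
MINUTE_UNIT = "UO:0000031"  # Unit Ontology accession for "minute"


def scan_rt_seconds(scan_element):
    """Return RT in seconds from an mzML <scan> element, or None if absent."""
    params = {}
    for el in scan_element.iter():
        # Compare local names so the mzML XML namespace does not matter.
        if el.tag.rsplit("}", 1)[-1] == "cvParam":
            params.setdefault(el.get("accession"), el)
    for accession in PREFERRED_ACCESSIONS:
        el = params.get(accession)
        if el is not None and el.get("value"):
            rt = float(el.get("value"))
            if el.get("unitAccession") == MINUTE_UNIT:
                rt *= 60.0  # assumption: normalise minutes to seconds
            return rt
    return None


if __name__ == "__main__":
    snippet = (
        '<scan xmlns="http://psi.hupo.org/ms/mzml">'
        '<cvParam cvRef="MS" accession="MS:1000016" name="scan start time" '
        'value="21.446" unitCvRef="UO" unitAccession="UO:0000031" '
        'unitName="minute"/>'
        "</scan>"
    )
    print(scan_rt_seconds(ET.fromstring(snippet)))
```

A file that uses only one of the other terms, such as MS:1000894, would yield None here, which mirrors the point that the choice of CV term determines whether RT survives into the search results.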

When RT information is present in the MGF peak list as RTINSECONDS or in the mzML file as MS:1000016 or MS:1000826, it will be output to the Mascot search result file. It can then be displayed in the Peptide View report, extracted using Mascot Parser, and exported in a variety of formats. If you export results as CSV or XML, check Query Level Information and Query title to get RT exported in a column or element named RTINSECONDS. The CV terms used in the mzIdentML export have changed a couple of times. In Mascot 2.5.0 and earlier, MS:1001114 was used. In Mascot 2.5.1, a single value is exported using MS:1000894 while a range uses MS:1000916 and MS:1000917. Basically, this was a mistake, and it will change in Mascot 2.6.0 to MS:1000016 in all cases plus, for a range, MS:1000916 and MS:1000917. If you are running 2.5.1 and the change in the CV terms is causing a problem, you can download a patched version of the export script that reverts to using MS:1001114. Download the file for your platform, either Windows or Linux, unpack export_dat_2.pl and save to the Mascot cgi directory, replacing the file of the same name. Under Linux, you may need to chmod the file to make it executable.
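For anyone writing software that reads Mascot mzIdentML exports, the CV term history described above can be summarised as a small lookup. The Python sketch below is based only on the versions named in this article. The function name mzid_rt_terms is hypothetical, the 2.6.0 behaviour is the planned one, and treating every version after 2.5.1 like 2.6.0 is an assumption. It does not account for a 2.5.1 installation using the patched export script, which reverts to MS:1001114.

```python
def mzid_rt_terms(version, is_range=False):
    """Return the CV accessions expected for RT in a Mascot mzIdentML export.

    `version` is a tuple such as (2, 5, 1). The mapping follows the article
    text only; versions after 2.5.1 are assumed to behave like 2.6.0.
    """
    if version <= (2, 5, 0):
        return ["MS:1001114"]
    if version == (2, 5, 1):
        # Unpatched 2.5.1: a single value and a range use different terms.
        if is_range:
            return ["MS:1000916", "MS:1000917"]
        return ["MS:1000894"]
    # Planned for 2.6.0: scan start time in all cases, plus offsets for a range.
    terms = ["MS:1000016"]
    if is_range:
        terms += ["MS:1000916", "MS:1000917"]
    return terms


if __name__ == "__main__":
    print(mzid_rt_terms((2, 5, 1)))
    print(mzid_rt_terms((2, 6, 0), is_range=True))
```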

If the analysis results are spread across multiple raw files, such as fractions, this creates the additional complexity of connecting RT values to source files. The way Mascot Distiller handles this is to assign an index to each raw file in the MGF header, e.g.

_DISTILLER_RAWFILE[0]={1}C:\Users\billy\data\replicate\Orbi_0319_08.RAW
_DISTILLER_RAWFILE[1]={1}C:\Users\billy\data\replicate\Orbi_0319_09.RAW

This index can be referenced at scan level without causing unnecessary file bloat:

BEGIN IONS
TITLE=13670: Scan rt=4456.48 from file [0]
PEPMASS=1000.4724 298557.06
CHARGE=2+
SCANS[0]=8395
RAWSCANS[0]=sn8395
RTINSECONDS[0]=4456.4824
300.79907 215.73866
305.5925 197.11472
313.94009 166.48917
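To show how the raw file index ties an RT value back to its source file, here is a minimal Python sketch that reads the header and scan lines shown above. The function name read_indexed_rt is hypothetical. The meaning of the {1} prefix before each path is not explained here, so the sketch simply skips it, which is an assumption. The RT value is kept as text because it may not always be a single number.

```python
import re

# Header line, e.g. _DISTILLER_RAWFILE[0]={1}C:\path\file.RAW
# Assumption: the {n} prefix is not part of the path and can be skipped.
RAWFILE_RE = re.compile(r"^_DISTILLER_RAWFILE\[(\d+)\]=(?:\{\d+\})?(.*)$")
# Scan-level line, e.g. RTINSECONDS[0]=4456.4824
INDEXED_RT_RE = re.compile(r"^RTINSECONDS\[(\d+)\]=(.+)$")


def read_indexed_rt(lines):
    """Return a list of (title, raw_file_path, rt_text) tuples from MGF lines."""
    raw_files = {}
    results = []
    title = None
    for line in lines:
        line = line.strip()
        match = RAWFILE_RE.match(line)
        if match:
            raw_files[int(match.group(1))] = match.group(2)
            continue
        if line == "BEGIN IONS":
            title = None  # start of a new query
            continue
        if line.startswith("TITLE="):
            title = line[len("TITLE="):]
            continue
        match = INDEXED_RT_RE.match(line)
        if match:
            index = int(match.group(1))
            # raw_files.get returns None if the header had no such index.
            results.append((title, raw_files.get(index), match.group(2)))
    return results


if __name__ == "__main__":
    example = [
        r"_DISTILLER_RAWFILE[0]={1}C:\Users\billy\data\replicate\Orbi_0319_08.RAW",
        r"_DISTILLER_RAWFILE[1]={1}C:\Users\billy\data\replicate\Orbi_0319_09.RAW",
        "BEGIN IONS",
        "TITLE=13670: Scan rt=4456.48 from file [0]",
        "RTINSECONDS[0]=4456.4824",
    ]
    print(read_indexed_rt(example))
```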

Unfortunately, there is a bug with the exporting of indexed RT values, which will only be fixed when Mascot 2.6.0 is released. Indexed RT values are correctly written to the result file in 2.5.1 and earlier, but they are missing from the various export formats. If the peak list was created by Distiller, the indexed RT information embedded into the scan titles can be viewed and exported as a possible work-around.
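If you rely on the scan title work-around, the RT and file index have to be pulled out of the title text. The Python sketch below handles only the single-scan title form shown in the Distiller example above; other title forms, such as those for summed scans, are not covered, and the function name rt_from_distiller_title is hypothetical. As noted earlier, this kind of pattern matching is fragile and will break if the title syntax changes.

```python
import re

# Matches the fragment "rt=4456.48 from file [0]" in a Distiller scan title.
TITLE_RT_RE = re.compile(r"\brt=(\d+(?:\.\d+)?)\s+from file \[(\d+)\]")


def rt_from_distiller_title(title):
    """Return (rt_seconds, raw_file_index) or None if the title does not match."""
    match = TITLE_RT_RE.search(title)
    if match is None:
        return None
    return float(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    print(rt_from_distiller_title("13670: Scan rt=4456.48 from file [0]"))
```

Note that the title in the example carries RT rounded to two decimal places, so values recovered this way are slightly less precise than the RTINSECONDS field.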
