Obsolete data file formats

This page describes obsolete data file formats. Current formats are documented in Data file format.

For a Peptide Mass Fingerprint, the file should contain a list of peptide mass values, one per line, optionally followed by white space and a peak area or intensity value. The recommended data file format is Mascot generic format (MGF), but the peak list formats of a wide range of instrument data systems are directly compatible with these requirements. Mascot will automatically recognise the following obsolete formats:

For an MS/MS Ions Search, the data file must contain one or more MS/MS peak lists. The recommended data file formats are Mascot generic format (MGF) and mzML. Mascot Server 2.8 and earlier supported the following obsolete formats:

From Mascot Server 3.0, these formats are hidden by default and not selectable in the search form. The configuration option SearchSubmitAcceptedFileTypes defaults to MGF and mzML, but the above obsolete formats can be added back to SearchSubmitAcceptedFileTypes if required. Note, however, that support for the obsolete formats may be removed in a future Mascot release.

Finnigan (ASC) Files

Files in this format are created by the LIST command on the ICIS data system. The header block for each MS/MS dataset begins with a “LIST:” field. The text in this field is used by Mascot to identify the query, equivalent to an embedded TITLE parameter.

The ASC file header does not specify a charge state for the precursor peptide. The charge state can be specified globally on the search form, or by an embedded CHARGE parameter at the head of the data file.

The precursor peptide m/z value is parsed from the “Mode:” field. Mascot uses the prevailing CHARGE value to calculate Mr from the observed m/z.

A blank line to delimit MS/MS datasets is optional.

Example of Finnigan ASC format:

LIST: dp210198b 21-Jan-98 DERIVED SPECTRUM #9
Samp: Spot 6483 from Gel 29A44 Start : 18:37:54 100
Mode: ESI +DAU 808.3 @ 25eV UP LR
Oper: Administrator Inlet :
Base: 798.9 Inten : 25525 Masses: 225 > 2000
Norm: 798.9 RIC : 181489 #peaks: 586
Peak: 1000.00 mmu
Data: +/1>99
0
No. Mass Intensity %RA %RIC Flags
1 229.3 8 0.03 0.00 #
2 230.3 9 0.04 0.00 #
3 259.9 8 0.03 0.00 #
.
.
.
583 1831.0 5 0.02 0.00 #
584 1878.3 5 0.02 0.00 #
585 1881.8 8 0.03 0.00 #  
LIST: dp210198a 21-Jan-98 DERIVED SPECTRUM #9
Samp: Spot 6483 from Gel 29A44 Start : 18:27:30 95
Mode: ESI +DAU 973.9 @ 25eV AVER UP LR
Oper: Administrator Inlet :
Base: 974.5 Inten : 191564 Masses: 270 > 1800
Norm: 974.5 RIC : 341387 #peaks: 593
Peak: 1000.00 mmu
Data: +/1>95
0
No. Mass Intensity %RA %RIC Flags
1 297.9 10 0.01 0.00 #
2 326.7 8 0.00 0.00 #
3 345.1 237 0.12 0.07 #
.
.
.
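The header handling described above can be illustrated with a minimal Python sketch. It is not Mascot's parser: the function name parse_asc_headers is hypothetical, and the rule that the precursor m/z is the number immediately before the "@" in the "Mode:" field is an assumption drawn only from the example listing.

```python
import re

def parse_asc_headers(text):
    """Return one dict (title, precursor m/z) per MS/MS dataset in an ASC listing."""
    queries = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("LIST:"):
            # Each "LIST:" field opens a new dataset; its text serves as the title.
            queries.append({"title": line[len("LIST:"):].strip(), "mz": None})
        elif line.startswith("Mode:") and queries:
            # Assumption: the precursor m/z is the number just before "@".
            match = re.search(r"(\d+(?:\.\d+)?)\s*@", line)
            if match:
                queries[-1]["mz"] = float(match.group(1))
    return queries
```

Because a new dataset is detected from its "LIST:" field, the sketch does not rely on blank lines between datasets.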

Sequest (DTA) Files

Information on creating DTA files from RAW files can be found here.

The DTA format is very simple. The first line contains the singly protonated peptide mass (MH+) and the peptide charge state as a pair of space-separated values. Subsequent lines contain space-separated pairs of fragment ion m/z and intensity values.

N.B. In a DTA file, the precursor peptide mass is an MH+ value, independent of the charge state. In Mascot generic format, the precursor peptide mass is an observed m/z value, from which Mr, or the mass of the multiply protonated ion (M + nH, charge n+), is calculated using the prevailing charge state. For example, in Mascot:

PEPMASS=1000
CHARGE=2+

… means that the relative molecular mass Mr is 1998. This is equivalent to a DTA file which starts:

1999 2
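The arithmetic behind this equivalence can be sketched in Python. The function names are hypothetical, and the proton mass constant is a rounded textbook value; whether Mascot uses exactly this constant is an assumption.

```python
PROTON_MASS = 1.007276  # mass of a proton in Da (rounded; an assumption)

def mr_from_mz(mz, charge):
    """Relative molecular mass Mr from an observed m/z and charge state."""
    return charge * (mz - PROTON_MASS)

def mh_plus_from_mz(mz, charge):
    """Singly protonated mass (MH+), as written on the first line of a DTA file."""
    return mr_from_mz(mz, charge) + PROTON_MASS

def mz_from_mh_plus(mh_plus, charge):
    """Observed m/z, as used for PEPMASS, from a DTA-style MH+ value."""
    return (mh_plus - PROTON_MASS) / charge + PROTON_MASS
```

With PEPMASS=1000 and CHARGE=2+, mr_from_mz(1000, 2) is approximately 1997.99 and mh_plus_from_mz(1000, 2) is approximately 1998.99, which round to the values 1998 and 1999 quoted above.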

The DTA format uses the file name to identify the dataset. An example of a file name would be “Myoglobin_digest.0012.0015.3.dta”. This corresponds to scans 12 to 15 of an LC-MS run, averaged together, and a peptide charge state of 3+.
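As an illustration of this naming convention, the following Python sketch splits a DTA file name into its parts. The pattern and the function name parse_dta_name are assumptions based only on the example file name above, not a definition of the convention.

```python
import re

# Assumed layout: <base>.<first scan>.<last scan>.<charge>.dta
DTA_NAME = re.compile(
    r"^(?P<base>.+)\.(?P<first>\d+)\.(?P<last>\d+)\.(?P<charge>\d+)\.dta$",
    re.IGNORECASE,
)

def parse_dta_name(filename):
    """Return base name, scan range and charge, or None if the name does not fit."""
    match = DTA_NAME.match(filename)
    if match is None:
        return None
    return {
        "base": match.group("base"),
        "first_scan": int(match.group("first")),
        "last_scan": int(match.group("last")),
        "charge": int(match.group("charge")),
    }
```

For "Myoglobin_digest.0012.0015.3.dta" this yields scans 12 to 15 and a charge of 3.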

While it is perfectly possible to submit a native DTA file to Mascot, each file contains only a single MS/MS dataset. If you have a series of related datasets, such as from an LC-MS experiment, it is much better to concatenate the DTA files into a single data file so that the queries can be scored and reported collectively.

Remember to include at least one blank line between consecutive MS/MS datasets. A delimiter between datasets is essential because the DTA format is relatively unstructured. Without a delimiter, the first line of a new dataset (peptide mass, charge) could be mistaken for just another line from the previous dataset (fragment ion mass, intensity).
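A minimal Python sketch of such a concatenation is shown below. The function name concatenate_dta is hypothetical, and the sketch works on file contents that have already been read into strings. It also drops any blank lines inside an individual file, on the assumption that they would otherwise be read as dataset delimiters.

```python
def concatenate_dta(dta_texts):
    """Join the contents of several DTA files, separated by one blank line."""
    blocks = []
    for text in dta_texts:
        # Keep only non-empty lines so that no stray delimiter appears mid-dataset.
        lines = [line.strip() for line in text.splitlines() if line.strip()]
        if lines:
            blocks.append("\n".join(lines))
    # One blank line between consecutive datasets, newline at end of file.
    return "\n\n".join(blocks) + "\n"
```

The contents could be collected with, for example, pathlib.Path.read_text() on each .dta file in a directory, sorted by name so that the scan order is preserved.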

Utilities to concatenate DTA files automatically can be downloaded from the Xcalibur help page.

Micromass (PKL) Files

QTof users can export peak list data in either DTA or PKL format using the Micromass ProteinLynx package. Further information can be found here.

The PKL format is similar to the DTA file format, but supports multiple MS/MS datasets in a single file. The first line of a PKL dataset contains the observed m/z, intensity, and charge state of the precursor peptide as a triplet of space-separated values. Subsequent lines contain space-separated pairs of fragment ion m/z and intensity values.

Multiple MS/MS datasets are delimited by at least one blank line.
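The layout described above can be read with a short Python sketch. This is an illustration under the stated description only: the function name parse_pkl and the dictionary keys are hypothetical, and a malformed precursor line simply raises ValueError.

```python
def parse_pkl(text):
    """Split PKL text into datasets of precursor values plus fragment peaks."""
    datasets = []
    current = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            current = None  # a blank line closes the current dataset
            continue
        if current is None:
            # First line of a dataset: precursor m/z, intensity, charge.
            mz, intensity, charge = fields[:3]
            current = {
                "mz": float(mz),
                "intensity": float(intensity),
                "charge": int(charge),
                "peaks": [],
            }
            datasets.append(current)
        else:
            # Subsequent lines: fragment ion m/z and intensity.
            current["peaks"].append((float(fields[0]), float(fields[1])))
    return datasets
```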

PerSeptive (.PKS)

PSD peak lists exported from Grams as .PKS files contain data from a single PSD spectrum. Since the .PKS format does not include details of the precursor peptide m/z, this information must be entered manually into the PRECURSOR and CHARGE form fields. This limitation also means that multiple spectra cannot be merged into a single data file.

Example of the .PKS file format:

"Peak Table"
OP=0
Center X Peak Y Left X Right X Time X Mass Difference Name
STD.Misc Height Left Y Right Y %Height,Width,%Area,%Quan,H/A
818.39992 4265.0000 818.39992 818.39992 81554.550 0 818.3999
C 0.? 0 4265.0000 4265.0000
820.42154 3765.0000 820.42154 820.42154 81616.547 0 820.4215
C 0.? 0 3765.0000 3765.0000
842.38252 2571.0000 842.10681 842.62999 82290.021 0 842.3825
C 0.? 0 1800.0000 1800.0000
.
.
.
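For illustration, the peak values in a listing like this one could be extracted with the Python sketch below. It assumes, from the column headings in the example only, that the first two columns of each numeric line ("Center X" and "Peak Y") hold the m/z and intensity; the function name parse_pks_peaks is hypothetical.

```python
def parse_pks_peaks(text):
    """Collect (m/z, intensity) pairs from the numeric lines of a .PKS listing."""
    peaks = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            # Assumption: column 1 is "Center X" (m/z), column 2 is "Peak Y".
            center_x = float(fields[0])
            peak_y = float(fields[1])
        except ValueError:
            # Header lines and the second line of each record ("C 0.? ...")
            # do not start with two numbers, so they are skipped.
            continue
        peaks.append((center_x, peak_y))
    return peaks
```

The precursor m/z and charge are not available from the file, so they still have to be supplied separately.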

Sciex API III

Peak lists exported from PE Sciex API III contain data from a single MS/MS spectrum. Since the file format does not include details of the precursor peptide m/z, this information must be entered manually into the PRECURSOR and CHARGE form fields. This limitation also means that multiple spectra cannot be merged into a single data file.

Example of PE Sciex peak list format:

287.50 650 287.5
301.00 1150 301.0
305.00 1150 305.0
315.00 6550 315.0
321.00 16,000 321.0
333.00 3050 333.0
333.50 1800 333.5
370.00 1550 370.0
.
.
.
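A minimal Python sketch for reading such a peak list follows. It assumes that the first column is the m/z value and the second the intensity, and that intensities may carry a thousands separator, as in the "16,000" entry above. The function name parse_sciex_peaks is hypothetical.

```python
def parse_sciex_peaks(text):
    """Collect (m/z, intensity) pairs, tolerating commas in the intensity."""
    peaks = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skips blank lines and the "." continuation marks
        try:
            mz = float(fields[0])
            # Remove a thousands separator such as the comma in "16,000".
            intensity = float(fields[1].replace(",", ""))
        except ValueError:
            continue
        peaks.append((mz, intensity))
    return peaks
```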

Bruker (.XML)

Bruker XMASS and flexAnalysis save peak lists in a simple XML format. A DTD or XSD for the format is not publicly available. For each peak, Mascot takes the m/z value from the <mass> element and the intensity from the <absi> element.

The file format for MS/MS does not include details of the precursor peptide m/z, so this information must be entered manually into the PRECURSOR and CHARGE form fields. This limitation also means that multiple spectra cannot be merged into a single data file.
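The <mass> and <absi> extraction can be sketched in Python with the standard library XML parser. Since no DTD or XSD is available, the sketch makes no assumption about the enclosing element names: it accepts any element that has both a <mass> and an <absi> child. The function name parse_bruker_peaks is hypothetical.

```python
import xml.etree.ElementTree as ET

def parse_bruker_peaks(xml_text):
    """Collect (m/z, intensity) pairs from elements with <mass> and <absi> children."""
    root = ET.fromstring(xml_text)
    peaks = []
    for element in root.iter():
        mass = element.find("mass")
        absi = element.find("absi")
        # Only elements carrying both children are treated as peaks.
        if mass is not None and absi is not None:
            peaks.append((float(mass.text), float(absi.text)))
    return peaks
```

A hypothetical input such as `<pklist><pk><mass>842.51</mass><absi>1200</absi></pk></pklist>` would give a single peak; the wrapper names pklist and pk are invented for the illustration.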

mzData (.XML)

Mascot supports mzData version 1.05. Follow the link for a schema document and further information.