Mascot: The trusted reference standard for protein identification by mass spectrometry for 25 years

Mascot search overview

Mascot is a powerful search engine which uses mass spectrometry data to identify proteins from primary sequence databases.

Types of database search

While a number of similar programs are available, Mascot is unique in that it integrates all of the proven methods of searching. These search methods can be categorised as follows:

  • Peptide Mass Fingerprint in which the only experimental data are peptide mass values. (tutorial)
  • Sequence Query in which peptide mass data are combined with amino acid sequence and composition information. This is a super-set of a sequence tag query. (more information)
  • MS/MS Ion Search using uninterpreted MS/MS data from one or more peptides. (tutorial)

Data for a peptide mass fingerprint search is typically acquired using a MALDI MS instrument.

Data for an MS/MS search is typically acquired using LC-MS/MS in data-dependent acquisition (DDA) mode. Mascot can identify peptides from chimeric spectra, including spectra acquired using narrow window data-independent acquisition (DIA).

How Mascot works

The general approach for all types of search is to take a small sample of the protein of interest and digest it with a proteolytic enzyme, such as trypsin. The resulting digest mixture is analysed by mass spectrometry.

Different types of mass spectrometer have different capabilities. A simple instrument will measure a set of molecular weights for the intact mixture of peptides. An instrument with MS/MS capability can additionally provide structural information by recording the fragment ion spectrum of a peptide. Usually, the digest mixture will be separated by chromatography prior to MS/MS analysis, so that MS/MS spectra from individual peptides can be measured.
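To make the idea of a fragment ion spectrum concrete, the sketch below calculates the singly charged b and y ion m/z values for a peptide from standard monoisotopic residue masses. It is only an illustration: the function name fragment_ions is hypothetical, only unmodified residues are handled, and a real spectrum contains other ion series, charge states and neutral losses that are ignored here.

```python
# Monoisotopic residue masses in Da (unmodified amino acids).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O
PROTON = 1.007276   # charge carrier for a 1+ ion


def fragment_ions(peptide):
    """Return (b, y) lists of singly charged m/z values.

    b[i] is the b ion with i + 1 N-terminal residues;
    y[i] is the y ion with i + 1 C-terminal residues.
    """
    b_ions, y_ions = [], []
    for n in range(1, len(peptide)):
        b_ions.append(sum(MONO[aa] for aa in peptide[:n]) + PROTON)
        y_ions.append(sum(MONO[aa] for aa in peptide[-n:]) + WATER + PROTON)
    return b_ions, y_ions


if __name__ == "__main__":
    b, y = fragment_ions("GAK")  # hypothetical toy peptide
    print("b ions:", [round(m, 4) for m in b])
    print("y ions:", [round(m, 4) for m in y])
```

Complementary b and y ions always sum to the neutral peptide mass plus two protons, which is a convenient sanity check on any such calculation.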

The experimental mass values are then compared with calculated peptide mass or fragment ion mass values, obtained by applying cleavage rules to the entries in a comprehensive primary sequence database. By using an appropriate scoring algorithm, the closest match or matches can be identified. If the "unknown" protein is present in the sequence database, then the aim is to pull out that precise entry. If the sequence database does not contain the unknown protein, then the aim is to pull out those entries which exhibit the closest homology, often equivalent proteins from related species.
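The comparison described above can be sketched for the peptide mass case as follows. This is a toy illustration under stated assumptions, not Mascot's scoring: the cleavage rule is the commonly used trypsin rule (cleave after K or R unless the next residue is P) with no missed cleavages, the "score" is a plain count of matched masses within a fixed tolerance, and the entry names, sequences and function names are all hypothetical.

```python
# Monoisotopic residue masses in Da (unmodified amino acids).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565


def tryptic_peptides(sequence):
    """Split a protein sequence after K or R, except when followed by P."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        at_end = i == len(sequence) - 1
        if at_end or (aa in "KR" and sequence[i + 1] != "P"):
            peptides.append(sequence[start:i + 1])
            start = i + 1
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(MONO[aa] for aa in peptide) + WATER


def count_matches(observed, sequence, tol=0.01):
    """Count observed masses lying within tol Da of any calculated mass."""
    calculated = [peptide_mass(p) for p in tryptic_peptides(sequence)]
    return sum(any(abs(m - c) <= tol for c in calculated) for m in observed)


def rank_entries(observed, database, tol=0.01):
    """Return (entry name, match count) pairs, best match first."""
    hits = [(name, count_matches(observed, seq, tol))
            for name, seq in database.items()]
    return sorted(hits, key=lambda hit: -hit[1])


if __name__ == "__main__":
    # Hypothetical two-entry database and a simulated experiment.
    database = {"toy_A": "AKRPGKSW", "toy_B": "GGGGKAAAAR"}
    observed = [peptide_mass(p) for p in tryptic_peptides(database["toy_A"])]
    print(rank_entries(observed, database))
```

A simple count like this favours long proteins, which produce more peptides and therefore more chance matches. This is one reason an appropriate scoring algorithm is needed in practice.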

Databases

Peptide mass fingerprint (PMF) data can be searched against Fasta files. MS/MS data can be searched against both Fasta files and spectral libraries. Fasta files are text-format databases of protein sequences, while spectral libraries are databases of mass spectra.
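For readers unfamiliar with the Fasta format, each entry is a header line beginning with ">" followed by one or more lines of sequence. The sketch below reads such text into a dictionary keyed by the first word of each header. The function name parse_fasta and the two example entries are hypothetical, and real database headers follow source-specific conventions that are not parsed here.

```python
def parse_fasta(text):
    """Return {identifier: sequence} from Fasta-formatted text.

    The identifier is taken to be the first whitespace-delimited
    word after '>' (an assumption; header layouts vary by source).
    """
    records = {}
    identifier = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if identifier is not None:
                records[identifier] = "".join(chunks)
            words = line[1:].split()
            identifier = words[0] if words else ""
            chunks = []
        elif identifier is not None:
            chunks.append(line)  # sequence may wrap over several lines
    if identifier is not None:
        records[identifier] = "".join(chunks)
    return records


if __name__ == "__main__":
    # Hypothetical entries for illustration only.
    example = """>toy_A first toy protein
AKRPG
KSW
>toy_B second toy protein
GGGGKAAAAR
"""
    for name, sequence in parse_fasta(example).items():
        print(name, len(sequence), sequence)
```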

In-house Mascot Server licence

When you have an in-house Mascot Server licence, you can search any sequence database. Mascot ships with a number of predefined definitions for publicly available protein sequence databases, such as SwissProt, UniProt proteomes and NCBI nr, as well as a dozen spectral libraries. If you have a custom Fasta file, it is easy to set up as a searchable database in Mascot.

Public Mascot Server

The sequence databases that can be searched on the Matrix Science free, public Mascot server are:

SwissProt is a high quality, curated protein database. Sequences are non-redundant, rather than non-identical. SwissProt is ideal for peptide mass fingerprint searches and MS/MS searches of well characterised organisms, where it isn’t essential to match every single spectrum.

EMBL EST divisions contain "single-pass" cDNA sequences, or Expressed Sequence Tags, from a number of organisms. During a Mascot search, the nucleic acid sequences are translated in all six reading frames. There are 10 divisions:

  • Environmental_EST
  • Fungi_EST
  • Human_EST
  • Invertebrates_EST
  • Mammals_EST
  • Mus_EST
  • Plants_EST
  • Prokaryotes_EST
  • Rodents_EST
  • Vertebrates_EST
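Translation in all six reading frames means three frames on the given strand plus three on its reverse complement. The sketch below shows the idea using the standard genetic code. It is an illustration only: the function names are hypothetical, stop codons are written as "*", codons containing any other character are written as "X", and how Mascot itself treats stop codons and ambiguous bases is not covered here.

```python
# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; '*' marks a stop."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Return translations keyed '+1'..'+3' (forward) and '-1'..'-3' (reverse)."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translations("ATGGCCTAA").items():
        print(frame, protein)
```

Because every nucleic acid entry yields six candidate protein sequences, a search against a nucleic acid database has a larger search space than a search against a protein database of comparable size.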

contaminants is a database of common contaminants compiled by the Max Planck Institute of Biochemistry, Martinsried.

cRAP is a database of common contaminants compiled by the Global Proteome Machine Organization.

Selected UniProt proteomes

Database name | Organism | Taxonomy ID | UniProt ID | Coverage
UP6548_A_thaliana | Arabidopsis thaliana (Mouse-ear cress) (Strain: cv. Columbia) | 3702 | UP000006548 | 99.6%
UP9136_B_taurus | Bos taurus (Bovine) (Strain: Hereford) | 9913 | UP000009136 | 98.0%
UP1940_C_elegans | Caenorhabditis elegans (Strain: Bristol N2) | 6239 | UP000001940 | 99.7%
UP6906_C_reinhardtii | Chlamydomonas reinhardtii (Chlamydomonas smithii) (Strain: CC-503) | 3055 | UP000006906 | 96.0%
UP437_D_rerio | Danio rerio (Zebrafish) (Brachydanio rerio) (Strain: Tuebingen) | 7955 | UP000000437 | 96.9%
UP2195_D_discoideum | Dictyostelium discoideum (Slime mold) (Strain: AX4) | 44689 | UP000002195 | 96.0%
UP803_D_melanogaster | Drosophila melanogaster (Fruit fly) (Strain: Berkeley) | 7227 | UP000000803 | 99.3%
UP625_E_coli_K12 | Escherichia coli (strain K12) (Strain: K12 / MG1655 / ATCC 47076) | 83333 | UP000000625 | 100.0%
UP219602_F_oxysporum | Fusarium oxysporum f. sp. radicis-cucumerinum (Strain: Forc016) | 327505 | UP000219602 | 98.5%
UP317484_G_aquaeductus | Geodermatophilus aquaeductus (DSM 46834) | 1564161 | UP000317484 | 99%
UP5640_H_sapiens | Homo sapiens (Human) | 9606 | UP000005640 | 99.5%
UP589_M_musculus | Mus musculus (Mouse) (Strain: C57BL/6J) | 10090 | UP000000589 | 99.7%
UP254291_M_gilvum | Mycolicibacterium gilvum (NCTC10742) | 1804 | UP000254291 | 99.2%
UP808_M_pneumoniae | Mycoplasma pneumoniae (strain ATCC 29342 / M129) | 272634 | UP000000808 | 75.9%
UP59680_O_sativa | Oryza sativa subsp. japonica (Rice) (Strain: cv. Nipponbare) | 39947 | UP000059680 | 87.0%
UP8311_R_communis | Ricinus communis (Castor bean) | 3988 | UP000008311 | 90.5%
UP2494_R_norvegicus | Rattus norvegicus (Rat) (Strain: Brown Norway) | 10116 | UP000002494 | 97.8%
UP2311_S_cerevisiae | Saccharomyces cerevisiae (strain ATCC 204508 / S288c) (Baker’s yeast) | 559292 | UP000002311 | 98.9%
UP2485_S_pombe | Schizosaccharomyces pombe (strain 972 / ATCC 24843) (Fission yeast) | 284812 | UP000002485 | 97.8%
UP8227_S_scrofa | Sus scrofa (Pig) (Strain: Duroc) | 9823 | UP000008227 | 96.2%
UP241690_T_harzianum | Trichoderma harzianum CBS 226.95 | 983964 | UP000241690 | 98.6%
UP5226_T_rubripes | Takifugu rubripes (Japanese pufferfish) (Fugu rubripes) | 31033 | UP000005226 | 95.2%
UP279841_T_thermophilus | Thermus thermophilus | 274 | UP000279841 | 85.5%
UP186698_X_laevis | Xenopus laevis (African clawed frog) (Strain: J) | 8355 | UP000186698 | 95.6%
UP7305_Z_mays | Zea mays (Maize) (Strain: cv. B73) | 4577 | UP000007305 | 96.4%

Spectral libraries:

  • NIST_Mouse_IonTrap
  • NIST_S.cerevesiae_IonTrap
  • PRIDE_Contaminants
  • PRIDE_Human