Blog
Articles tagged: FDR
How does rescoring with machine learning work?
Mascot Server ships with Percolator, which is an algorithm that uses semi-supervised machine learning to improve the discrimination between correct and incorrect spectrum identifications. This is often termed rescoring with machine learning. What exactly does it mean, and how does it work? Identifying correct matches using a score threshold: when you submit a search against the target protein sequence database, [...]
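The score-threshold step can be sketched with the common target-decoy approach. This is a minimal illustration, not Mascot's or Percolator's implementation. It assumes separate target and decoy searches of equal size. The function name `fdr_threshold`, the 1% default and the toy scores are hypothetical. Percolator goes further by rescoring the matches before any threshold is applied; that learning step is not shown here.

```python
from typing import List, Optional

def fdr_threshold(target_scores: List[float], decoy_scores: List[float],
                  max_fdr: float = 0.01) -> Optional[float]:
    """Lowest score threshold whose estimated FDR is at or below max_fdr.

    Assumption: separate target and decoy searches of equal size, so the number
    of decoy matches at or above a threshold estimates the number of false
    target matches at or above it. Returns None if no threshold qualifies.
    """
    best = None
    # Try each distinct target score as a candidate threshold, highest first
    for t in sorted(set(target_scores), reverse=True):
        n_target = sum(1 for s in target_scores if s >= t)
        n_decoy = sum(1 for s in decoy_scores if s >= t)
        if n_decoy / n_target <= max_fdr:
            best = t  # remember the lowest threshold that meets the limit so far
    return best

# Toy example (hypothetical scores)
targets = [80, 75, 60, 55, 40, 30, 25, 20]
decoys = [35, 22, 18, 10]
print(fdr_threshold(targets, decoys, max_fdr=0.2))   # → 25
print(fdr_threshold(targets, decoys, max_fdr=0.01))  # → 40
```

Tightening the FDR limit raises the threshold and so accepts fewer target matches. That loss of sensitivity is what rescoring aims to recover.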
Identifying peptides from chimeric spectra (DDA, DIA)
Many traditional search engines assume the MS/MS spectrum of a peptide is produced by a single precursor. This is often true when you run a single-species tryptic digest in data dependent acquisition (DDA) mode with a narrow isolation window. Once you change the acquisition strategy or analyse a complex mixture, like an environmental sample, it’s possible for two or more [...]
Does your search engine show the evidence?
You’ve submitted a protein sequence database search and start looking at the results. Why did the search engine identify that protein? What is the peptide evidence? Which alternatives did the software consider? Is the software’s decision correct? These are basic yet important questions with any software-driven approach – which is the bulk of today’s MS/MS data analysis. A lot of [...]
Identify more HLA peptides
Endogenous peptides are challenging to identify by database searching. A Mascot no-enzyme search matches every subsequence of a protein to the observed spectrum, which creates a very large search space even when the precursor tolerance is tight. As a result, Mascot score thresholds tend to be conservative and sensitivity is reduced. Mascot ships with Percolator, which often improves discrimination between true [...]
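The sketch below shows why a no-enzyme search space is so large. It counts candidate peptides for a single protein in two ways: first, every subsequence within a length window; second, only fully tryptic peptides (cleavage after K or R, but not before P). It ignores precursor mass entirely. The 7 to 30 residue window, the function names and the toy sequence are hypothetical choices for illustration.

```python
def count_no_enzyme(protein: str, min_len: int = 7, max_len: int = 30) -> int:
    """Count every subsequence whose length lies in [min_len, max_len]."""
    n = len(protein)
    # For each length k there are n - k + 1 start positions (if k <= n)
    return sum(max(0, n - k + 1) for k in range(min_len, max_len + 1))

def count_tryptic(protein: str, min_len: int = 7, max_len: int = 30,
                  missed: int = 0) -> int:
    """Count tryptic peptides: cleave after K/R unless the next residue is P."""
    sites = [0]
    for i, aa in enumerate(protein[:-1]):
        if aa in "KR" and protein[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(protein))
    count = 0
    # Pair each cleavage site with the next 1 + missed sites
    for a in range(len(sites) - 1):
        for b in range(a + 1, min(a + 2 + missed, len(sites))):
            if min_len <= sites[b] - sites[a] <= max_len:
                count += 1
    return count

toy = "A" * 20 + "K" + "A" * 20  # hypothetical 41-residue sequence
print(count_no_enzyme(toy))  # → 564
print(count_tryptic(toy))    # → 2
```

Even for this short toy sequence, the no-enzyme search considers hundreds of candidates where a tryptic search considers two. More candidates mean more chances for a random match to score well, which is why the score thresholds become more conservative.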
Error tolerant searches now show statistical significance
The latest release of Mascot Server introduces some important changes to error tolerant searches. Matches from the second pass search now have expect values attached, indicating confidence levels. These are either estimates based on counting trials or empirical values derived from searching a decoy database. If you are not familiar with the error tolerant search, now is the time to [...]
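One way to turn a score into an expect value by counting trials is sketched below. It assumes a Mascot-style score S = -10 log10(P), where P is the probability that one random candidate scores at least S. It also assumes that the N trials are independent. Under those assumptions, the expected number of chance matches is E = N × P. This is a simplified model for illustration, not the exact calculation in Mascot's error tolerant search, and the function names are hypothetical.

```python
import math

def expect_value(score: float, n_trials: int) -> float:
    """Expected number of random matches scoring >= score among n_trials.

    Assumes score = -10 * log10(P), with P the per-candidate chance probability.
    """
    p = 10 ** (-score / 10)
    return n_trials * p

def identity_threshold(n_trials: int, alpha: float = 0.05) -> float:
    """Score at which expect_value(score, n_trials) equals alpha."""
    return 10 * math.log10(n_trials / alpha)

print(round(identity_threshold(1000), 2))  # → 43.01
print(expect_value(40, 1000))              # roughly 0.1
```

Under this model, more trials (for example, the extra modifications tried in a second-pass search) raise the threshold. The same score therefore earns a larger, less significant expect value.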
Validating intact crosslinked peptide matches
Intact crosslinked search results are more complex than those from conventional (non-crosslinked) searches, because there are many more degrees of freedom. The precursor mass could be within tolerance of a looplinked sequence, a linear sequence with a monolink, or several different alpha-beta candidates. The number of possibilities multiplies further if you also consider variable modifications such as oxidation of methionine. Mascot 2.7 uses the same scoring [...]
Human Proteome Project data interpretation guidelines
The Human Proteome Project (HPP) data interpretation guidelines were recently updated. Many of the guidelines are good practice and common sense in any proteomics study where reliable protein identification is critical, not just when studying the human proteome. The guidelines are easy to meet using Mascot Server 2.7. Core guidelines: the full list consists of 9 guidelines. The first one [...]
Protein FDR in Mascot Server 2.7
One of the new features in Mascot Server 2.7, now running on this web site, is an estimate of protein FDR. This is displayed in the Protein Family Summary for Fasta searches whenever automatic decoy is selected. The basis is the number of proteins inferred in the target database compared with the number in the decoy database. Conceptually, this is [...]
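The counting rule described here, decoy proteins relative to target proteins, fits in a few lines. This is a sketch under simple assumptions. Each inferred protein is given as a hypothetical (accession, is_decoy) pair, duplicates are collapsed, and no grouping into protein families is attempted. It is not how Mascot builds the Protein Family Summary.

```python
from typing import Iterable, Tuple

def protein_fdr(proteins: Iterable[Tuple[str, bool]]) -> float:
    """Estimate protein FDR as distinct decoy proteins / distinct target proteins."""
    items = list(proteins)  # materialize so the input can be scanned twice
    targets = {acc for acc, is_decoy in items if not is_decoy}
    decoys = {acc for acc, is_decoy in items if is_decoy}
    if not targets:
        return 0.0  # nothing inferred from the target database: report zero
    return len(decoys) / len(targets)

# Hypothetical inferred proteins; prot_A appears twice but counts once
inferred = [("prot_A", False), ("prot_B", False), ("prot_A", False), ("DECOY_1", True)]
print(protein_fdr(inferred))  # → 0.5
```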
Common myths about protein scores
Mascot Server is used in many different application areas by both mass spectrometry experts and non-experts. Over the years, we’ve spotted a few recurring misconceptions about how protein scores are interpreted and used. All the examples come from recent peer-reviewed papers. Protein scores in PMF searches: the very first thing to check is what type of experiment is being reported. [...]
What are you inferring?
Benchmarking protein inference is notoriously difficult. Artificial samples of known content tend to be too simple while real samples lack ground truth. An interesting approach was adopted for the ABRF iPRG 2016 study, and has been the subject of a publication from The et al. A collection of human Protein Epitope Signature Tags (PrESTs) were expressed in E. coli and [...]