Blog
Articles tagged: protein inference
Mascot workflow for LC-MS/MS data
Data analysis in mass spectrometry proteomics is complex and, nowadays, almost entirely software driven. Processing a raw file, peptide identification by database searching, protein inference and protein quantitation all have many steps and built-in assumptions, not to mention a huge number of parameters. Software continues to evolve, as does best practice. Whether you are new to mass spectrometry proteomics or [...]
Does your search engine show the evidence?
You’ve submitted a protein sequence database search and start looking at the results. Why did the search engine identify that protein? What is the peptide evidence? Which alternatives did the software consider? Is the software’s decision correct? These are basic yet important questions with any software-driven approach – which is the bulk of today’s MS/MS data analysis. A lot of [...]
Unipept and Mascot
Drawing conclusions from protein-level data is complicated in environmental and metaproteomics studies, where the sample is a mixture of hundreds or thousands of proteomes. The Unipept database is a useful, complementary resource for interrogating metaproteomics data and can be used in conjunction with Mascot’s protein inference.
Human gut example
Identify proteins by more than ‘gut’ feeling discussed analysing a human [...]
Human Proteome Project data interpretation guidelines
The Human Proteome Project (HPP) data interpretation guidelines were recently updated. Many of the guidelines are good practice and common sense in any proteomics study where reliable protein identification is critical, not just when studying the human proteome. The guidelines are easy to meet using Mascot Server 2.7.
Core guidelines
The full list consists of 9 guidelines. The first one [...]
Protein FDR in Mascot Server 2.7
One of the new features in Mascot Server 2.7, now running on this web site, is an estimate of protein FDR. This is displayed in the Protein Family Summary for Fasta searches whenever automatic decoy is selected. The basis is the number of proteins inferred in the target database compared with the number in the decoy database. Conceptually, this is [...]
Identify proteins by more than ‘gut’ feeling
Last month, we discussed benchmarking protein inference and the role of shared peptide matches. Excluding shared matches may be beneficial to protein identification accuracy if the sequence database contains perfect representations of all proteins in the sample. Many real-life data sets don’t meet this condition. Metaproteomics and environmental samples, such as the various human body sites, peat bog and ocean [...]
What are you inferring?
Benchmarking protein inference is notoriously difficult. Artificial samples of known content tend to be too simple while real samples lack ground truth. An interesting approach was adopted for the ABRF iPRG 2016 study, and has been the subject of a publication from The et al. A collection of human Protein Epitope Signature Tags (PrESTs) were expressed in E. coli and [...]
Protein inference for spectral library searches
The major new feature of Mascot Server 2.6, now running on this web site, is that searches of spectral libraries have been fully integrated with ‘conventional’ Mascot searches of Fasta files. The search engine for spectral library searches is MSPepSearch from Steve Stein and colleagues at NIST. We didn’t have any revolutionary ideas for improving spectral library scoring so, rather [...]
Creating a list of confidently identified proteins
This can be done very easily using Report Builder:
1. Select the Decoy checkbox when submitting the search
2. Open the result report as a Protein Family Summary
3. Switch to the Report Builder tab
4. Expand the decoy search section and set the peptide FDR to 1%
5. Expand the filters section and set ‘Num of significant unique sequences’ > 1
6. Optionally, expand the [...]
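For readers who post-process an exported protein list in a script, the ‘Num of significant unique sequences’ > 1 filter can be sketched roughly as follows. This is only an illustration of the filtering idea, not how Report Builder works internally: the `ProteinHit` class, its field names and the accessions are hypothetical, and it assumes the peptide FDR threshold has already been applied upstream.

```python
from dataclasses import dataclass


@dataclass
class ProteinHit:
    """Hypothetical record for one protein in an exported result list."""
    accession: str
    significant_unique_sequences: int


def confident_proteins(hits, min_exclusive=1):
    """Keep proteins with more than `min_exclusive` significant unique sequences."""
    return [h for h in hits if h.significant_unique_sequences > min_exclusive]


# Made-up example data, for illustration only.
hits = [
    ProteinHit("PROT_A", 5),
    ProteinHit("PROT_B", 1),  # single-peptide hit, filtered out
    ProteinHit("PROT_C", 2),
]

kept = [h.accession for h in confident_proteins(hits)]
print(kept)
```

The strict inequality mirrors the ‘> 1’ setting in the steps above, so proteins supported by a single significant sequence are dropped.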
Does protein FDR have any meaning?
It’s easy to grasp the concept of using a target/decoy search to estimate peptide false discovery rate. You search against a decoy database where there are no true matches available, so the number of observed matches provides a good estimate of the number of false matches in the results from the target. People debate implementation details, such as whether the [...]
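The counting idea can be sketched in a few lines. This is a minimal illustration, assuming separate target and decoy result lists and a simple score threshold; the function name, the scores and the threshold are made up, and it is not a description of how Mascot implements the estimate.

```python
def estimate_fdr(target_scores, decoy_scores, threshold):
    """Estimate FDR as decoy matches / target matches at or above a score threshold.

    Assumes the decoy database contains no true matches, so every decoy
    match above the threshold stands in for one false match in the target.
    """
    n_target = sum(1 for s in target_scores if s >= threshold)
    n_decoy = sum(1 for s in decoy_scores if s >= threshold)
    if n_target == 0:
        return 0.0  # nothing reported, so no false discoveries to estimate
    return n_decoy / n_target


# Made-up scores, for illustration only.
target_scores = [55.0, 48.2, 40.1, 33.7, 30.5, 22.0, 18.4, 15.9]
decoy_scores = [31.0, 19.5, 14.2, 12.8]

fdr = estimate_fdr(target_scores, decoy_scores, threshold=30.0)
print(f"Estimated FDR: {fdr:.0%}")
```

With these numbers, five target matches and one decoy match pass the threshold, giving an estimate of 20%. Raising the threshold trades sensitivity for a lower estimated FDR.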