Mascot: The trusted reference standard for protein identification by mass spectrometry for 25 years

Posted by John Cottrell (June 17, 2020)

Using the Quantitation Summary to create reports and charts

An earlier article described how to create a Quantitation Summary in Mascot Daemon. This is a spreadsheet-like text file, where the rows correspond to proteins and the columns contain expression data for various samples in the form of abundances or ratios of abundances.
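
To make the row-and-column layout concrete, here is a minimal sketch in plain Python of reading such a table. It assumes a tab-separated file with a header row. The column names (`Accession`, `control_1` and so on) and all values are made up for illustration and are not taken from a real Quantitation Summary, so check the header of your own file before adapting it.

```python
import csv
import io

# Made-up stand-in for a Quantitation Summary: tab-separated, one row per
# protein, one column per sample. Column names and values are hypothetical.
TOY_SUMMARY = (
    "Accession\tcontrol_1\tcontrol_2\ttreated_1\ttreated_2\n"
    "PROT_A\t1000\t1100\t2100\t1900\n"
    "PROT_B\t500\t\t480\t510\n"
    "PROT_C\t300\t320\t150\t160\n"
)


def read_summary(handle, id_column="Accession"):
    """Return (sample_names, {protein_id: [abundance or None, ...]})."""
    reader = csv.DictReader(handle, delimiter="\t")
    samples = [name for name in reader.fieldnames if name != id_column]
    table = {}
    for row in reader:
        # Empty cells become None so that missing values stay visible.
        table[row[id_column]] = [
            float(row[s]) if row[s] else None for s in samples
        ]
    return samples, table


samples, table = read_summary(io.StringIO(TOY_SUMMARY))
print(samples)
print(table["PROT_B"])
```

For a real file, replace the `io.StringIO` object with `open(path, newline="")`.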

A Quantitation Summary can be opened and manipulated in a spreadsheet program such as Excel, and it is possible to create charts in general-purpose software, although this can be hard work. A more efficient option is to use specialised software such as Perseus, from the Max Planck Institute. This is a good choice if you prefer to manipulate the data using a graphical user interface.

If you are willing to do a bit of scripting, the R language provides access to a huge range of statistical and graphical tools. Bioconductor is a collection of packages for genomic and proteomic applications. Currently, 135 packages are indexed under proteomics and 91 under mass spectrometry.

We’ll use a package called DEP (Differential Enrichment analysis of Proteomics data) to illustrate the types of analysis that can be achieved with a few lines of scripting.

Bioconductor DEP package

Sample data comes from a study to identify oncogenic microRNAs in non-small cell lung cancer.

 

Study to identify oncogenic microRNAs in non-small cell lung cancer

Quantitation used 10plex TMT. A total of 72 files downloaded from PRIDE project PXD004163 were processed and searched using Mascot Daemon. The Sample Map in Mascot Daemon looked like this: three replicates each of the control and of one of the microRNA treatments, and two replicates each of the other two treatments, which accounts for all 10 channels. Peptide FDR was set to 1% by target/decoy.

Sample map

Using DEP, we can very easily create a number of informative charts. Some are for QC, such as this one, which shows we have data for almost all 8021 proteins across all 10 channels – very few missing values.

Missing values
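
The idea behind this QC chart can be sketched in a few lines of plain Python: for each protein, count how many channels have a value, then tally the proteins by that count. This is only an illustration of the bookkeeping, not the DEP code that produced the chart. The protein names and abundances are made up, and missing values are assumed to be stored as `None`.

```python
from collections import Counter

# Hypothetical abundances: protein -> one value per channel, None = missing.
table = {
    "PROT_A": [1000.0, 1100.0, 2100.0, 1900.0],
    "PROT_B": [500.0, None, 480.0, 510.0],
    "PROT_C": [300.0, 320.0, 150.0, 160.0],
}


def observed_counts(table):
    """Map 'number of channels with a value' -> number of proteins."""
    return Counter(
        sum(v is not None for v in values) for values in table.values()
    )


def complete_fraction(table):
    """Fraction of proteins quantified in every channel."""
    n_channels = len(next(iter(table.values())))
    return observed_counts(table)[n_channels] / len(table)


counts = observed_counts(table)
print(dict(counts))
print(round(complete_fraction(table), 3))
```

A data set like the one in this article, with very few missing values, would give a complete fraction close to 1.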

A box plot shows the intensities before and after normalisation.

Box plot
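
As a rough illustration of what normalisation is trying to achieve, the sketch below log2-transforms each sample and shifts it so that all samples share the same median. This median centring is a deliberately simple stand-in and should not be read as the method behind the box plot above. The sample names and intensities are made up.

```python
import math
from statistics import median

# Hypothetical raw intensities: sample -> one value per protein.
raw = {
    "control_1": [1000.0, 500.0, 300.0],
    "treated_1": [2000.0, 1000.0, 600.0],  # same profile, twice the loading
}


def median_centre(raw):
    """Log2-transform each sample, then shift it so that its median
    equals the mean of all the sample medians."""
    logged = {s: [math.log2(v) for v in vals] for s, vals in raw.items()}
    medians = {s: median(vals) for s, vals in logged.items()}
    target = sum(medians.values()) / len(medians)
    return {
        s: [v - medians[s] + target for v in vals]
        for s, vals in logged.items()
    }


normalised = median_centre(raw)
for sample, values in normalised.items():
    print(sample, [round(v, 3) for v in values])
```

In this toy case the second sample differs from the first only by a factor of two in loading, so after centring the two samples become identical.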

PCA shows the replicates cluster nicely.

PCA plot

A heat map shows sample-to-sample similarity.

Heat map
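
One common way to measure sample-to-sample similarity is the Pearson correlation between the log abundances of each pair of samples. The sketch below assumes that measure and uses made-up log2 values. It shows how the matrix behind such a heat map could be built, not how the chart above was actually produced.

```python
import math

# Hypothetical log2 abundances: sample -> one value per protein.
samples = {
    "control_1": [10.0, 9.0, 8.0, 11.0],
    "control_2": [10.1, 9.1, 7.9, 11.2],
    "treated_1": [10.0, 10.5, 8.0, 9.0],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def correlation_matrix(samples):
    """Return {(sample_a, sample_b): r} for every ordered pair."""
    return {
        (a, b): pearson(samples[a], samples[b])
        for a in samples
        for b in samples
    }


matrix = correlation_matrix(samples)
for (a, b), r in matrix.items():
    print(a, b, round(r, 3))
```

Replicates of the same condition should give values close to 1, which is the pattern a heat map makes easy to see at a glance.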

Finally, a volcano plot shows fold changes between one treatment and the control. The outlier proteins are labelled with their identifiers.

Volcano plot
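
A volcano plot places each protein at x = log2 fold change and y = -log10 of its p-value, so proteins that change both strongly and significantly end up in the top corners. The sketch below shows only that coordinate calculation and a simple labelling rule. The fold changes and p-values are made up, the statistical test that would produce the p-values is outside its scope, and the two cut-offs are illustrative assumptions rather than the settings used for the chart above.

```python
import math

# Hypothetical per-protein results: (log2 fold change, p-value).
results = {
    "PROT_A": (1.8, 0.0004),
    "PROT_B": (0.1, 0.62),
    "PROT_C": (-1.3, 0.003),
    "PROT_D": (2.2, 0.20),
}


def volcano_points(results, min_abs_lfc=1.0, max_p=0.05):
    """Return {protein: (x, y, labelled)} where x is the log2 fold change,
    y is -log10(p), and labelled is True if both cut-offs are met."""
    points = {}
    for protein, (lfc, p) in results.items():
        labelled = abs(lfc) >= min_abs_lfc and p <= max_p
        points[protein] = (lfc, -math.log10(p), labelled)
    return points


points = volcano_points(results)
labels = sorted(name for name, (_, _, flag) in points.items() if flag)
print(labels)
```

Note that `PROT_D` has the largest fold change but is not labelled, because its p-value does not pass the cut-off.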

As always, detailed help and reference material for the Sample Map and Quantitation Summary can be found in the Mascot Daemon help file. Bioconductor packages are mostly well documented and there is plenty of tutorial material for R on the web. The Quantitation Summary and the R commands used to create these plots can be downloaded here.


One comment on “Using the Quantitation Summary to create reports and charts”

  1. Phillip Wilmarth said:

    Hi John,
    Nice post! I looked at data in the quantitative summary file using a Jupyter notebook here: https://github.com/pwilmart/PXD004163_Notebooks/tree/master. Notebooks are nice for telling data analysis stories.
    Cheers,
    Phil