NIST Human HCD Spectral Libraries
Mascot 2.6 and later can search spectral libraries using the MSPepSearch spectral library search engine from the US National Institute of Standards and Technology (NIST). Spectral libraries can be searched alongside FASTA sequence databases to give an integrated report, and you can easily generate spectral libraries from your own search results.
When we introduced spectral library searches in Mascot 2.6, we also added predefined Database Manager definitions for several freely available and commonly used spectral libraries, including libraries from NIST and the European Bioinformatics Institute (EBI). One of these was the “NIST_Human_HCD” library, a consensus library derived from over 10,000 raw data files. The current definition of the library in Mascot is for the 2016/05/03 release. However, NIST updated the Human HCD library in May 2020 and, as part of that update, split the spectra by quality into three separate libraries:
Library | Description |
---|---|
human_hcd_tryp_best | high-quality spectra, mostly tryptic peptides without missed cleavages |
human_hcd_tryp_good | medium-quality spectra, mostly tryptic peptides with missed cleavages |
human_hcd_semitryp | high- and medium-quality spectra, mostly semi-tryptic peptides |
According to the information on the NIST website, “the new libraries of consensus spectra contain 86% more peptides than the previous version with 4-15% increase in the number of positive IDs returned for typical samples at FDR=0.01”.
To confirm whether we see the expected improvement from the updated libraries, we searched the iPRG2016 dataset against the 2016 NIST_Human_HCD library, against each of the three newer libraries separately, and finally against all three updated libraries combined. We then counted the significant peptide-spectrum matches (PSMs) and the distinct peptide sequences they represent at the default score threshold of 300. Results are summarised in table 2 below:
Library | No. significant PSMs | No. significant sequences |
---|---|---|
Human_HCD 2016 | 2883 | 505 |
human_hcd_tryp_best | 3421 | 513 |
human_hcd_tryp_good | 805 | 140 |
human_hcd_semitryp | 850 | 216 |
human_hcd_tryp_best+human_hcd_tryp_good+human_hcd_semitryp | 5056 | 829 |
From these results, we can see that searching just the updated ‘Best’ library gave us an additional 538 PSMs but only 8 new sequences over the 2016 library, so there is a strong overlap in the results of those two searches and, presumably, in the peptides represented in the two libraries. The average quality of the spectra in the ‘Best’ library does appear to exceed that of the original 2016 release, though. We can also see that there is very little overlap in the peptide sequences identified by the searches against the ‘Best’, ‘Good’ and ‘Semi-tryptic’ spectral libraries. By searching all three updated libraries together, we gain an additional 2173 PSMs and 324 peptide sequences compared with the 2016 library.
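The arithmetic behind these comparisons can be reproduced with a short script. The sketch below is illustrative only: the counts are copied from the results table, while the `results` dictionary, the `gain` helper and the `combined` key are names invented for this example and are not part of Mascot. The final figure assumes that the combined search reports roughly the union of the three separate searches, which is an approximation.

```python
# Counts copied from the results table: (significant PSMs, significant sequences).
# The dictionary layout and the "combined" key are illustrative assumptions.
results = {
    "Human_HCD 2016": (2883, 505),
    "human_hcd_tryp_best": (3421, 513),
    "human_hcd_tryp_good": (805, 140),
    "human_hcd_semitryp": (850, 216),
    "combined": (5056, 829),
}


def gain(library, baseline="Human_HCD 2016"):
    """Return (extra PSMs, extra sequences) of a search relative to the baseline."""
    psms, seqs = results[library]
    base_psms, base_seqs = results[baseline]
    return psms - base_psms, seqs - base_seqs


best_gain = gain("human_hcd_tryp_best")
combined_gain = gain("combined")

# Rough overlap indicator: sequences counted more than once when the three
# separate searches are summed, relative to the combined search.
separate = ["human_hcd_tryp_best", "human_hcd_tryp_good", "human_hcd_semitryp"]
summed_seqs = sum(results[name][1] for name in separate)
redundancy = summed_seqs - results["combined"][1]

print("Best vs 2016:", best_gain)          # (538, 8)
print("Combined vs 2016:", combined_gain)  # (2173, 324)
print("Sequences counted more than once:", redundancy)  # 40
```

On these numbers, the three separate searches sum to 869 sequences against 829 for the combined search, so only about 40 identifications are duplicated across libraries, which is consistent with the low overlap described above.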
We have updated the NIST_Human_HCD definition to download the human_hcd_tryp_best library and added new definitions for NIST_Human_HCD_2_good and NIST_Human_HCD_3_semitryp. If you have enabled NIST_Human_HCD in Database Manager, updating is as simple as clicking the Update or Get New Files button. If you are using Mascot 2.6, there is a known issue with modification handling for the updated libraries, and we recommend that you update to Mascot 2.7 if you wish to use the NIST libraries.
Keywords: database manager, NIST, spectral library