Articles tagged: Fasta
Downloading UniProt proteomes via new API
The UniProt website received a snazzy facelift in June 2022. Both the browser interface and the REST API were updated. The previous version, now termed the legacy website, remains available until the 2022_04 release under a new URL (https://legacy.uniprot.org/) – so there is limited time to compare and admire the improvements! The new API is an almost seamless transition for Mascot [...]
NCBI nr in Mascot Server 2.8.1
The Mascot Server 2.8.1 patch was released in March 2022. One of the big improvements is faster compression of the NCBI nr database, available as the NCBIprot predefined definition. We’ve greatly decreased the time it takes to bring the database online, and removed an inadvertent limitation on database size. The patch is available to download on the [...]
The minutiae of database management
The two major inputs into a database search are sequence databases and mass spectrometry data. Management of the databases in Mascot Server has evolved and improved over the years and across versions. A few issues still come up regularly, and I am going to cover them in this article. Common error messages: When you activate a new [...]
Protein FDR in Mascot Server 2.7
One of the new features in Mascot Server 2.7, now running on this web site, is an estimate of protein FDR. This is displayed in the Protein Family Summary for Fasta searches whenever automatic decoy is selected. The basis is the number of proteins inferred in the target database compared with the number in the decoy database. Conceptually, this is [...]
What are you inferring?
Benchmarking protein inference is notoriously difficult. Artificial samples of known content tend to be too simple while real samples lack ground truth. An interesting approach was adopted for the ABRF iPRG 2016 study, and has been the subject of a publication from The et al. A collection of human Protein Epitope Signature Tags (PrESTs) were expressed in E. coli and [...]
Keeping genome databases up to date
Database Manager is a great tool for keeping your sequence databases up to date in Mascot. If the database is available as a ready-made FASTA file, all you need to do is enable it as a predefined definition, or set up a definition to download the file from a known URL (see the help for more details). Updating the database [...]
How to create a spectral library for contaminants
An earlier article highlighted how modified and non-specific peptides from contaminants can be matched using a spectral library without increasing the search space for the target proteins. This is particularly useful for sequencing grade trypsin, which is modified by methylation or acetylation of the lysines, creating a large number of modified non-specific peptides that are missed by typical search strategies. [...]
Disruption ahead for NCBI databases
NCBI has announced that it will drop ‘gi number’ unique identifiers on June 15, 2016. Details are given in section 1.4 of the GenBank release notes. This will create difficulties for users of many bioinformatic tools, not just Mascot. Particularly in the context of major projects, where analyses are performed over an extended time period and results consolidated by protein [...]