This class is used to encapsulate a complete NIST .msp, SpectraST .sptxt or X!Hunter ASL MGF file.
#include <ms_spectral_lib_file.hpp>
Public Member Functions | |
ms_spectral_lib_file (const char *fileName, const char *regexForAccession, const char *cdbFileName, const std::map< std::string, std::string > &modificationAliases) | |
Constructor just for C++, accepting a list of modification aliases. | |
ms_spectral_lib_file (const char *fileName, const char *regexForAccession, const char *cdbFileName=0) | |
Constructor. | |
~ms_spectral_lib_file () | |
Destructor. | |
void | appendErrors (const ms_errors &src) |
Copies all errors from another instance and appends them at the end of own list. | |
void | clearAllErrors () |
Remove all errors from the current list of errors. | |
void | copyFrom (const ms_errors *right) |
Use this member to make a copy of another instance. | |
std::vector< int > | findEntries (const char *sequence, const char *checksum=0, const char *accession=0, const char *mods=0) const |
Returns a list of entries that match the parameters. | |
std::string | getAccessionFromNumber (const int number) const |
Returns the accession given the offset into the file. | |
std::vector< std::string > | getAllMods () const |
Returns the complete list of mods named in the file. | |
std::string | getChecksumFromNumber (const int number) const |
Returns the spectrum checksum given the offset into the file. | |
ms_spectral_lib_entry | getEntryFromNumber (const int number) const |
Returns the individual spectrum from the msp file. | |
std::vector< std::string > | getEntryFromNumberAsText (const int number) const |
Returns the individual spectrum from the msp file as a vector of strings. | |
const ms_errs * | getErrorHandler () const |
Retrieve the error object using this function to get access to all errors and error parameters. | |
std::string | getFileName () const |
Returns the full file path passed to the constructor. | |
ms_spectral_lib::FILE_FORMAT | getFormat () const |
Returns the format of the file specified in the constructor. | |
int | getLastError () const |
Return the error code of the last error that occurred. | |
std::string | getLastErrorString () const |
Return the error description of the last error that occurred. | |
std::string | getModsFromNumber (const int number) const |
Returns the mods given the offset into the file. | |
int | getNumEntries () const |
Returns the number of spectra in the msp file. | |
int | getNumResidues () const |
Returns the number of residues in the msp file. | |
int | getPrecursorChargeFromNumber (const int number) const |
Returns the precursor charge given the offset into the file. | |
double | getPrecursorMZFromNumber (const int number) const |
Returns the precursor m/z value given the offset into the file. | |
long | getQmatch (double minMz, double maxMz) const |
Return the number of spectra in the library with a precursor mass within the passed m/z range. | |
std::string | getSequenceFromNumber (const int number) const |
Returns the spectrum peptide sequence given the offset into the file. | |
std::string | getStatsInformation () const |
Returns some unstructured text giving some statistics for the file. | |
bool | isValid () const |
Call this function to determine if there have been any errors. | |
bool | saveAs (const char *fileName, const bool replaceProteinName=true, ms_spectral_lib::FILE_FORMAT fileFormat=ms_spectral_lib::FORMAT_NIST_MSP, const int startNumber=1, const int endNumber=-1, const ms_spectral_lib_entry::WHAT_TO_ANNOTATE whatToAnnotate=ms_spectral_lib_entry::ANNOTATE_REPLACE_QUESTION_MARKS, const double annotateTol=0.6, const char *annotateTolu="Da", const ms_umod_configfile *unimod=0) const |
Save a copy of the file in the specified format. | |
bool | verifyThatModsAreInUnimod (const ms_umod_configfile &unimod) |
The modifications listed should all be in the passed unimod file. | |
This class is used to encapsulate a complete NIST .msp, SpectraST .sptxt or X!Hunter ASL MGF file.
Support for spectral library searches was added in Mascot Server 2.6 and Mascot Parser 2.6. Although external NIST software is used for the spectral library search itself, the msp and SpectraST files, which are plain text, still need to be processed.
This class is used to open an msp file, and individual entries can be obtained using ms_spectral_lib_file::getEntryFromNumber.
The format of the msp file is multiple lines, each with a [param][colon][space][value]. For example,
followed by, in this case, 150 lines of peak data, followed by a blank line.
The format is defined here.
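As a rough illustration of the [param][colon][space][value] layout, the sketch below splits one header line at the first colon-space pair. It is not Mascot Parser code: the function name splitMspHeaderLine is hypothetical, and lines without the separator (peak data, or the blank line that ends an entry) are simply reported as not matching.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of Mascot Parser.
// Splits "param: value" at the first ": ". Returns false, leaving the
// outputs untouched, when the line has no such separator or no param.
bool splitMspHeaderLine(const std::string& line,
                        std::string& param,
                        std::string& value) {
    const std::string::size_type pos = line.find(": ");
    if (pos == std::string::npos || pos == 0) {
        return false;
    }
    param = line.substr(0, pos);
    value = line.substr(pos + 2);  // skip the colon and the space
    return true;
}
```

Splitting at the first separator only means a value that itself contains a colon followed by a space is kept intact.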
The SpectraST files (.sptxt) have minor differences, for example additional lines and a different format for the peak list. See saveAs() for details.
The X!Hunter ASL MGF format is described here.
See Spectral libraries for related information.
ms_spectral_lib_file | ( | const char * | fileName, |
const char * | regexForAccession, | ||
const char * | cdbFileName = 0 |
||
) |
Constructor.
Commonly used constructor.
The use of a cdb index file is optional. If one is specified in the constructor, and the file doesn't exist or is incompatible, then the whole spectral library will be read and the index (re-)created during this constructor call. If no cdbFileName is specified, then the spectral library file will be opened, but only read on demand. For example, calling getNumEntries() requires the whole file to be read, but calling getEntryFromNumber(1) only requires the first entry to be read.
The cdb index contains a lookup for the accessions found in the spectral library, and so is dependent on the regexForAccession. This means that if you try to create an ms_spectral_lib_file with a different regular expression, then the cdb file will be automatically re-created.
fileName | is the full path to the spectral library file |
regexForAccession | is the regular expression used to extract the accession from the protein description. If a regular expression is not defined, then it is not possible to search for or return an accession. If the regexForAccession is of the incorrect format, then ms_errs::ERR_MSP_COMPILE_PARSE_RULE errors will be raised. |
cdbFileName | is the full path to the cdb index file for the library file. |
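To make the role of regexForAccession concrete, here is a minimal sketch that pulls an accession out of a protein description with std::regex. It rests on assumptions: the regular expression syntax accepted by Mascot Parser may differ from the ECMAScript syntax used by std::regex, and taking the first capture group as the accession is a convention chosen for this example. The name extractAccession is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper, not part of Mascot Parser.
// Returns the first capture group of `pattern` found in the protein
// description, or an empty string when the pattern does not match.
// A malformed pattern makes std::regex throw std::regex_error.
std::string extractAccession(const std::string& proteinDescription,
                             const std::string& pattern) {
    const std::regex re(pattern);
    std::smatch match;
    if (std::regex_search(proteinDescription, match, re) && match.size() > 1) {
        return match[1].str();
    }
    return std::string();
}
```

With an illustrative description such as `sp|P02769|ALBU_BOVIN` and the pattern `^[a-z]+\|([^|]+)\|`, the helper returns `P02769`.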
ms_spectral_lib_file | ( | const char * | fileName, |
const char * | regexForAccession, | ||
const char * | cdbFileName, | ||
const std::map< std::string, std::string > & | modificationAliases | ||
) |
Constructor just for C++, accepting a list of modification aliases.
Constructor for C++
The use of a cdb index file is optional. If one is specified in the constructor, and the file doesn't exist or is incompatible, then the whole spectral library will be read and the index (re-)created during this constructor call. If no cdbFileName is specified, then the spectral library file will be opened, but only read on demand. For example, calling getNumEntries() requires the whole file to be read, but calling getEntryFromNumber(1) only requires the first entry to be read.
The cdb index contains a lookup for the accessions found in the spectral library, and so is dependent on the regexForAccession. This means that if you try to create an ms_spectral_lib_file with a different regular expression, then the cdb file will be automatically re-created.
fileName | is the full path to the spectral library file |
regexForAccession | is the regular expression used to extract the accession from the protein description. If a regular expression is not defined, then it is not possible to search for or return an accession. If the regexForAccession is of the incorrect format, then ms_errs::ERR_MSP_COMPILE_PARSE_RULE errors will be raised. |
cdbFileName | is the full path to the cdb index file for the library file. |
modificationAliases | are a map of modification aliases, such as "CAM" => "Carbamidomethyl". In Mascot Server, these are read from the library_mod_aliases file. The modification aliases are used in the created CDB file and when saveAs() is called. |
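The alias map is an ordinary std::map from alias to full modification name. The sketch below shows one plausible way such a map could be consulted; how Mascot Parser applies the aliases internally is not described here, so the fallback of returning the name unchanged when no alias exists is an assumption, and resolveModificationAlias is a hypothetical name.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper, not part of Mascot Parser.
// Looks a modification name up in the alias map and returns the full
// name if an alias exists; otherwise returns the name unchanged.
std::string resolveModificationAlias(
        const std::map<std::string, std::string>& aliases,
        const std::string& name) {
    const std::map<std::string, std::string>::const_iterator it =
        aliases.find(name);
    return it != aliases.end() ? it->second : name;
}
```

A map built with `aliases["CAM"] = "Carbamidomethyl";` is the kind of object that would be passed as the modificationAliases argument.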
void appendErrors | ( | const ms_errors & | src | ) |
inherited |
Copies all errors from another instance and appends them at the end of own list.
src | The object to copy the errors across from. See Maintaining object references: two rules of thumb. |
void clearAllErrors | ( | ) |
inherited |
Remove all errors from the current list of errors.
The list of 'errors' can include fatal errors, warning messages, information messages and different levels of debugging messages.
All messages are accumulated into a list in this object, until clearAllErrors() is called.
See Error Handling.
void copyFrom | ( | const ms_errors * | right | ) |
inherited |
Use this member to make a copy of another instance.
right | is the source to initialise from |
std::vector< int > findEntries | ( | const char * | sequence, |
const char * | checksum = 0 , |
||
const char * | accession = 0 , |
||
const char * | mods = 0 |
||
) | const |
Returns a list of entries that match the parameters.
The library may contain multiple spectra for the same sequence and/or multiple spectra with the same checksum.
The function 'ands' all the parameters, so if sequence, checksum, accession and mods are all supplied, it will only return spectra which match all four parameters.
If a cdb index file has been created, then the lookup will be fast because an index is saved in the cdb file.
Use getEntryFromNumber() with each value from the returned list to get the relevant ms_spectral_lib_entry objects.
Example code:
sequence | is the peptide sequence to find. It should just contain upper case A-Z |
checksum | is the string that would be returned by ms_spectral_lib_entry::getPeakListChecksum() |
accession | is matched to the accession retrieved from the Protein= entry in the comment line. The accession is extracted from the protein using the regular expression passed to the constructor |
mods | is in the form exactly as in the spectral library file and as described in ms_spectral_lib_entry::getMods |
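The 'and' behaviour can be pictured with a small stand-alone filter. This is a sketch under stated assumptions rather than the library's implementation: EntryKey and findMatching are hypothetical, a null pointer is assumed to mean "do not filter on this field", and the real function may use the cdb index instead of a linear scan.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the fields that the lookup compares.
struct EntryKey {
    std::string sequence;
    std::string checksum;
    std::string accession;
    std::string mods;
};

// A null pointer is treated as "no constraint on this field".
static bool fieldMatches(const char* wanted, const std::string& actual) {
    return wanted == 0 || actual == wanted;
}

// Returns the 1-based numbers of all entries matching every supplied field.
std::vector<int> findMatching(const std::vector<EntryKey>& entries,
                              const char* sequence,
                              const char* checksum = 0,
                              const char* accession = 0,
                              const char* mods = 0) {
    std::vector<int> numbers;
    for (std::size_t i = 0; i < entries.size(); ++i) {
        const EntryKey& e = entries[i];
        if (fieldMatches(sequence, e.sequence) &&
            fieldMatches(checksum, e.checksum) &&
            fieldMatches(accession, e.accession) &&
            fieldMatches(mods, e.mods)) {
            numbers.push_back(static_cast<int>(i) + 1);
        }
    }
    return numbers;
}
```

Each returned number plays the same role as the values that would be passed on to getEntryFromNumber().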
std::string getAccessionFromNumber | ( | const int | number | ) | const |
Returns the accession given the offset into the file.
The accession is retrieved from the Protein= entry in the comment line. It is extracted from the protein description using the regular expression passed to the constructor. The number supplied has no relation to the spectrum id returned by mspepsearch.exe and is normally obtained using findEntries().
number | must be in the range of 1..getNumEntries() |
std::vector< std::string > getAllMods | ( | ) | const |
Returns the complete list of mods named in the file.
The order of the names is just the order that they appear in the file and there will be no duplicate names.
See ms_spectral_lib_entry::getMods for a description of the format in the msp file
See Using STL vector classes vectori, vectord and VectorString in Perl, Java, Python and C#
std::string getChecksumFromNumber | ( | const int | number | ) | const |
Returns the spectrum checksum given the offset into the file.
The number supplied has no relation to spectrum id returned by mspepsearch.exe and is normally obtained using findEntries()
number | must be in the range of 1..getNumEntries() |
ms_spectral_lib_entry getEntryFromNumber | ( | const int | number | ) | const |
Returns the individual spectrum from the msp file.
The number supplied has no relation to spectrum id returned by mspepsearch.exe. Either iterate through the whole file, using number = 1..getNumEntries() or use findEntries() to get the number.
If the function fails to load the entry, then a fatal error: ms_errs::ERR_MSP_NIST_FAILED_TO_LOAD_ENTRY or ms_errs::ERR_MSP_NIST_INDEX_OUT_OF_RANGE is set in the returned ms_spectral_lib_entry object. Call ms_spectral_lib_entry::isValid() on the returned object to determine if it is safe to use it.
See also getEntryFromNumberAsText()
number | must be in the range of 1..getNumEntries() |
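The iteration pattern described above (numbers run from 1 to getNumEntries(), and each returned entry is checked with isValid()) might look like the following sketch. So that it compiles without Mascot Parser, the function is a template and is exercised with hypothetical stand-ins (FakeEntry, FakeLibrary); only the member names getNumEntries, getEntryFromNumber and isValid are taken from this page.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins so the sketch compiles without Mascot Parser.
struct FakeEntry {
    bool valid;
    bool isValid() const { return valid; }
};

struct FakeLibrary {
    std::vector<FakeEntry> entries;
    int getNumEntries() const { return static_cast<int>(entries.size()); }
    FakeEntry getEntryFromNumber(const int number) const {
        assert(number >= 1 && number <= getNumEntries());
        return entries[static_cast<std::size_t>(number - 1)];
    }
};

// Walks every entry using 1-based numbers and collects the numbers of
// entries that report themselves as not valid.
template <class Library>
std::vector<int> numbersOfInvalidEntries(const Library& lib) {
    std::vector<int> bad;
    const int n = lib.getNumEntries();
    for (int number = 1; number <= n; ++number) {
        if (!lib.getEntryFromNumber(number).isValid()) {
            bad.push_back(number);
        }
    }
    return bad;
}
```

Because the function is a template, the same loop should also accept any other type that offers these two members with compatible signatures.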
std::vector< std::string > getEntryFromNumberAsText | ( | const int | number | ) | const |
Returns the individual spectrum from the msp file as a vector of strings.
The number supplied has no relation to spectrum id returned by mspepsearch.exe. Either iterate through the whole file, using number = 1..getNumEntries() or use findEntries() to get the number.
If the function fails to load the entry, then a fatal error: ms_errs::ERR_MSP_NIST_FAILED_TO_LOAD_ENTRY or ms_errs::ERR_MSP_NIST_INDEX_OUT_OF_RANGE is set in this object. Call isValid() to determine if it is safe to use the returned vector of strings.
See also getEntryFromNumber()
number | must be in the range of 1..getNumEntries() |
const ms_errs * getErrorHandler | ( | ) | const |
inherited
Retrieve the error object using this function to get access to all errors and error parameters.
See Error Handling.
std::string getFileName | ( | ) | const |
Returns the full file path passed to the constructor.
ms_spectral_lib::FILE_FORMAT getFormat | ( | ) | const |
Returns the format of the file specified in the constructor.
The format of the file specified in the constructor is detected automatically.
int getLastError | ( | ) | const |
inherited
Return the error code of the last error that occurred.
All errors are accumulated into a list in this object, until clearAllErrors() is called. This function returns the last error that occurred.
See Error Handling.
std::string getLastErrorString | ( | ) | const |
inherited |
Return the error description of the last error that occurred.
All errors are accumulated into a list in this object, until clearAllErrors() is called. This function returns the last error that occurred.
See Error Handling.
std::string getModsFromNumber | ( | const int | number | ) | const |
Returns the mods given the offset into the file.
The number supplied has no relation to spectrum id returned by mspepsearch.exe and is normally obtained using findEntries()
See ms_spectral_lib_entry::getMods for a description of the format in the msp file
number | must be in the range of 1..getNumEntries() |
int getNumEntries | ( | ) | const |
Returns the number of spectra in the msp file.
This value is retrieved from the cdb file if possible, or if no cdb file is specified, then the whole msp file will have to be parsed.
int getNumResidues | ( | ) | const |
Returns the number of residues in the msp file.
This value is retrieved from the cdb file if possible, or if no cdb file is specified, then the whole msp file will have to be parsed.
This is a slightly confusing number! It is the total count of residues over all sequences in the library. It is provided only because, at the end of each Mascot search, the number of sequences and residues searched is reported.
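Put another way, the value is the sum of the lengths of all peptide sequences. A minimal sketch, assuming the sequences are already available as strings (countResidues is a hypothetical name):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper, not part of Mascot Parser.
// Adds up the length of every peptide sequence.
long countResidues(const std::vector<std::string>& sequences) {
    long total = 0;
    for (const std::string& sequence : sequences) {
        total += static_cast<long>(sequence.size());
    }
    return total;
}
```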
int getPrecursorChargeFromNumber | ( | const int | number | ) | const |
Returns the precursor charge given the offset into the file.
For MSP files, the charge is taken from the Name: line
number | must be in the range of 1..getNumEntries() |
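As an illustration only, the sketch below reads a charge from the value of a Name: line. It assumes the common NIST layout in which the peptide sequence is followed by a slash and the charge, for example `AAAK/2`; that layout is not spelled out on this page, and chargeFromName is a hypothetical name.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper, not part of Mascot Parser.
// Returns the integer that follows the first '/' in the Name value,
// or 0 when there is no slash or no digits after it.
int chargeFromName(const std::string& name) {
    const std::string::size_type slash = name.find('/');
    if (slash == std::string::npos) {
        return 0;
    }
    const char* start = name.c_str() + slash + 1;
    char* end = 0;
    const long charge = std::strtol(start, &end, 10);
    return end == start ? 0 : static_cast<int>(charge);
}
```

Anything after the digits is ignored, so a suffix following the charge does not affect the result.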
double getPrecursorMZFromNumber | ( | const int | number | ) | const |
Returns the precursor m/z value given the offset into the file.
There are several possible values in the Comment line to use for the precursor mz: Parent=865.409, Mz_exact=865.4092, Mz_av=865.898
This function returns the Mz_exact value if it exists, otherwise it returns the Parent= value.
In .sptxt files, there is also a separate PrecursorMZ: line, but this is currently not used.
See also getQmatch()
number | must be in the range of 1..getNumEntries() |
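The preference order (Mz_exact first, then Parent) can be sketched as below. The parsing is an assumption made for illustration: it expects space-separated key=value fields in the comment text, as in the example values above, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper: finds "key=" at the start of the comment or after
// a space and parses the number that follows. Returns false if absent.
bool findNumericField(const std::string& comment,
                      const std::string& key,
                      double& value) {
    const std::string token = key + "=";
    std::string::size_type pos = comment.find(token);
    while (pos != std::string::npos) {
        if (pos == 0 || comment[pos - 1] == ' ') {
            const char* start = comment.c_str() + pos + token.size();
            char* end = 0;
            const double parsed = std::strtod(start, &end);
            if (end != start) {
                value = parsed;
                return true;
            }
        }
        pos = comment.find(token, pos + 1);
    }
    return false;
}

// Prefers Mz_exact and falls back to Parent, as described above.
bool precursorMzFromComment(const std::string& comment, double& mz) {
    return findNumericField(comment, "Mz_exact", mz) ||
           findNumericField(comment, "Parent", mz);
}
```

Mz_av is deliberately never consulted, so a comment that carries only that field yields no value.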
long getQmatch | ( | double | minMz, |
double | maxMz | ||
) | const |
Return the number of spectra in the library with a precursor mass within the passed m/z range.
This function calls getPrecursorMZFromNumber() to find the number of matches within a precursor mass range.
minMz | Is the lower limit to consider |
maxMz | Is the upper limit to consider |
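Conceptually this is a count over the precursor m/z values. The sketch below assumes that both limits are inclusive, which this page does not state, and works on a plain vector of m/z values rather than on the library object; countWithinRange is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper, not part of Mascot Parser.
// Counts the values that lie in [minMz, maxMz], both limits inclusive.
long countWithinRange(const std::vector<double>& precursorMzs,
                      double minMz,
                      double maxMz) {
    assert(minMz <= maxMz);
    long count = 0;
    for (const double mz : precursorMzs) {
        if (mz >= minMz && mz <= maxMz) {
            ++count;
        }
    }
    return count;
}
```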
std::string getSequenceFromNumber | ( | const int | number | ) | const |
Returns the spectrum peptide sequence given the offset into the file.
The number supplied has no relation to spectrum id returned by mspepsearch.exe and is normally obtained using findEntries()
number | must be in the range of 1..getNumEntries() |
std::string getStatsInformation | ( | ) | const |
Returns some unstructured text giving some statistics for the file.
bool isValid | ( | ) | const |
inherited |
Call this function to determine if there have been any errors.
This will return true unless there have been any fatal errors.
See Error Handling.
bool saveAs | ( | const char * | fileName, |
const bool | replaceProteinName = true , |
||
ms_spectral_lib::FILE_FORMAT | fileFormat = ms_spectral_lib::FORMAT_NIST_MSP , |
||
const int | startNumber = 1 , |
||
const int | endNumber = -1 , |
||
const ms_spectral_lib_entry::WHAT_TO_ANNOTATE | whatToAnnotate = ms_spectral_lib_entry::ANNOTATE_REPLACE_QUESTION_MARKS , |
||
const double | annotateTol = 0.6 , |
||
const char * | annotateTolu = "Da" , |
||
const ms_umod_configfile * | unimod = 0 |
||
) | const |
Save a copy of the file in the specified format.
SpectraST format files can be converted to files that can be read by NIST tools using this function.
SpectraST and NIST (MSP) files differ in the following ways:
fileName | is the file to write to |
replaceProteinName | specifies if the Protein="..." field in the comment section should be replaced with Protein="[1..n]:checksum" where
|
fileFormat | specifies the format as described above |
startNumber | is the (index) number of the first spectrum to be saved, and should be in the range 1..getNumEntries(). The default value is 1 |
endNumber | is the (index) number of the last spectrum to be saved, and should be greater than startNumber and in the range 1..getNumEntries(). A value of -1 (the default) specifies that saving should continue to the end of the file. |
whatToAnnotate | is used to specify whether existing annotation should be replaced. |
annotateTol | is the value in the units specified, for matching to the calculated data. Only peaks within this tolerance will be annotated. Other peaks will be annotated with a "?" |
annotateTolu | must be "Da", "mmu" or "ppm". |
unimod | is required if any entry has any modifications that are just specified by name. Otherwise, there is no way to calculate the fragment ion masses. |
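To show how the three tolerance units relate, the sketch below converts a tolerance to Da and tests whether an observed peak lies within it. It is an illustration under assumptions: 1 mmu is taken as 0.001 Da, ppm is taken relative to the calculated m/z, and the comparison is inclusive. The function names are hypothetical, and the library's own matching may differ.

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts a tolerance in "Da", "mmu" or "ppm"
// to Da at the given m/z.
double toleranceInDa(double tol, const std::string& unit, double mz) {
    if (unit == "Da") {
        return tol;
    }
    if (unit == "mmu") {
        return tol / 1000.0;        // 1 mmu = 0.001 Da
    }
    if (unit == "ppm") {
        return mz * tol / 1.0e6;    // parts per million of the m/z
    }
    throw std::invalid_argument("unit must be \"Da\", \"mmu\" or \"ppm\"");
}

// True when the observed peak is within the tolerance of the calculated one.
bool withinTolerance(double observedMz, double calculatedMz,
                     double tol, const std::string& unit) {
    return std::fabs(observedMz - calculatedMz) <=
           toleranceInDa(tol, unit, calculatedMz);
}
```

With the default of 0.6 Da, a peak observed at 500.5 against a calculated 500.0 would be within tolerance, whereas the same pair with a tolerance of 100 mmu would not.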
bool verifyThatModsAreInUnimod | ( | const ms_umod_configfile & | unimod | ) |
The modifications listed should all be in the passed unimod file.
If any mods are not found, then a warning, ms_errs::ERR_MSP_NIST_MODIFICATION_NOT_FOUND, is added.
unimod | is a reference to the unimod file |