Time alignment between multiple runs (raw files) is performed in Mascot Distiller for label free quantitation.
#include <ms_ms1quant_time_align.hpp>
Public Types | |
typedef std::vector< std::vector< std::map< double, double > > > | featureWidths_t |
[prj][m/z bin][rt->width] | |
typedef std::map< int, std::vector< int > > | fractionToSubProject_t |
the vector for each fraction contains 0 based subProject numbers and fraction numbers don't need to be sequential, hence the map | |
enum | status_t { ST_TA_NO_DATA , ST_TA_LOADED_FROM_XML , ST_TA_LOADED_FROM_CDB , ST_TA_CALCULATED } |
Public Member Functions | |
ms_ms1quant_time_align (const int binSize=12) | |
Default constructor. | |
ms_ms1quant_time_align (const ms_ms1quant_time_align_body &body) | |
Populated constructor. | |
void | appendErrors (const ms_errors &src) |
Copies all errors from another instance and appends them at the end of own list. | |
bool | calculateFromConsensuValues (bool replaceExisting=false) |
A retention time shift is calculated by taking the shift from project A to the consensus and then subtracting the shift from the consensus to project B. | |
void | clearAllErrors () |
Remove all errors from the current list of errors. | |
void | copyFrom (const ms_errors *right) |
Use this member to make a copy of another instance. | |
std::string | getAlgorithmName () const |
Return the name of the algorithm used for calculating the time alignment. | |
int | getBinSize () const |
Return the value supplied in the constructor, or loaded from an xml or cdb file. | |
ms_ms1quant_time_align_limits & | getCombinedLimits (const int fractionNum) |
The m/z and retention time limits for each fraction are calculated from the limits for each rawfile and search results file for that fraction. | |
ms_ms1quant_time_align_limits | getCombinedLimits (const int fractionNum) const |
The m/z and retention time limits for each fraction are calculated from the limits for each rawfile and search results file for that fraction. | |
const ms_errs * | getErrorHandler () const |
Retrieve the error object using this function to get access to all errors and error parameters. | |
double | getEstimatedFeatureWidth (const int subProjectId, const double mOverZ, const double rt) const |
For label free, return the estimated width of an XIC for a given mOverZ and retention time. | |
bool | getEvaluation (const int subProject, double &meanErrorsRaw, double &meanErrorsAligned, double &stdevErrorsRaw, double &stdevErrorsAligned, double &pearsonCoefficientRaw, double &pearsonCoefficientAligned) |
const featureWidths_t & | getFeatureWidths () const |
Returns the multidimensional vector that has the feature widths for each project. | |
std::vector< std::vector< std::vector< double > > > & | getFinalResults () |
const std::vector< std::vector< std::vector< double > > > & | getFinalResults () const |
fractionToSubProject_t | getFractionToSubProjectMap () const |
Return the map to enable looking up a list of sub projects for each fraction. | |
int | getLastError () const |
Return the error code of the last error that occurred. | |
std::string | getLastErrorString () const |
Return the error description of the last error that occurred. | |
double | getRToffset (const int myProjectId, const int otherProjectId, const double rtInOtherProject, const double mOverZ) const |
Get the offset between two projects. | |
const std::vector< std::vector< std::vector< double > > > & | getShiftsFromConsensus () const |
Returns the vector that has all the offsets from a consensus to a specified project. | |
const std::vector< std::vector< std::vector< double > > > & | getShiftsToConsensus () const |
Returns the vector that has all the offsets from one project to a consensus. | |
status_t | getStatus (void) const |
Get the status of the data. | |
std::string | getStatusAsText (void) const |
Get the status of the data as text for a message. | |
const std::vector< int > | getSubProjectToFractionMap () const |
Return a vector containing the fraction number that each subproject belongs to. | |
bool | isValid () const |
Call this function to determine if there have been any errors. | |
bool | loadXmlFile (const std::string &xmlFilename, const std::string &schemaDirectory) |
Populate the object from an XML file. | |
bool | saveXmlFile (const std::string &xmlFilename, const std::string &schemaDirectory) const |
Save just the time alignment data to an XML file. | |
void | setAlgorithmName (const char *algorithmName) |
void | setAverageFeatureWidths (const std::vector< double > &featureWidths) |
For supervised time alignment, a single feature width per subproject is used. | |
Protected Member Functions | |
void | setEvaluation (int subProject, double meanErrorsRaw, double meanErrorsAligned, double stdevErrorsRaw, double stdevErrorsAligned, double pearsonCoefficientRaw, double pearsonCoefficientAligned) |
Time alignment between multiple runs (raw files) is performed in Mascot Distiller for label free quantitation.
The time alignment results are stored within a .rov file in Distiller 2.9 and later.
See: ms_ms1quantitation::getTimeAlignmentData() for how to obtain this object. Alternatively, create an empty object and then call ms_ms1quant_time_align::loadXmlFile()
For example code, see Examples for the Mascot Parser quantitation module
enum status_t |
Flags used to determine whether time alignment needs to be performed (again)
ms_ms1quant_time_align | ( | const int | binSize = 12 | ) |
Default constructor.
The constructor will normally be called internally in Parser. See: ms_ms1quantitation::getTimeAlignmentData() for how to obtain this object. Alternatively, use this constructor and then call ms_ms1quant_time_align::loadXmlFile()
binSize | defaults to 12 daltons |
void appendErrors | ( | const ms_errors & | src | ) | inherited |
Copies all errors from another instance and appends them at the end of own list.
src | The object to copy the errors across from. See Maintaining object references: two rules of thumb. |
void clearAllErrors | ( | ) | inherited |
Remove all errors from the current list of errors.
The list of 'errors' can include fatal errors, warning messages, information messages and different levels of debugging messages.
All messages are accumulated into a list in this object, until clearAllErrors() is called.
See Error Handling.
void copyFrom | ( | const ms_errors * | right | ) | inherited |
Use this member to make a copy of another instance.
right | is the source to initialise from |
std::string getAlgorithmName | ( | ) | const |
Return the name of the algorithm used for calculating the time alignment.
Brief descriptive text that describes the algorithm used for the time alignment.
Currently restricted to "Unsupervised" or "Supervised"
int getBinSize | ( | ) | const |
Return the value supplied in the constructor, or loaded from an xml or cdb file.
For time alignment, m/z values are binned together. This is the size of each bin. Spectra with precursors in the same 'bin' will have the same time alignment shift.
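The binning described above can be pictured with a small standalone sketch. The helper `mzToBinIndex` is hypothetical and not part of Mascot Parser; it assumes bins start at m/z 0 and are `binSize` wide, which may differ from the origin and rounding that Parser actually uses.

```cpp
#include <cmath>

// Hypothetical helper (not a Parser function): map a precursor m/z onto a
// zero-based bin index. Assumes bins start at m/z 0 and are binSize wide.
// Returns -1 for a non-positive bin size or a negative m/z.
int mzToBinIndex(double mOverZ, int binSize = 12) {
    if (binSize <= 0 || mOverZ < 0.0) {
        return -1;
    }
    return static_cast<int>(std::floor(mOverZ / binSize));
}
```

Under this assumption, two precursors at m/z 500.3 and 503.9 fall into the same bin and would therefore share a time alignment shift, while m/z 504.0 starts the next bin.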
ms_ms1quant_time_align_limits & getCombinedLimits | ( | const int | fractionNum | ) |
The m/z and retention time limits for each fraction are calculated from the limits for each rawfile and search results file for that fraction.
fractionNum | can be obtained by calling getSubProjectToFractionMap() |
ms_ms1quant_time_align_limits getCombinedLimits | ( | const int | fractionNum | ) | const |
The m/z and retention time limits for each fraction are calculated from the limits for each rawfile and search results file for that fraction.
fractionNum | can be obtained by calling getSubProjectToFractionMap() |
const ms_errs * getErrorHandler | ( | ) | const | inherited |
Retrieve the error object using this function to get access to all errors and error parameters.
See Error Handling.
double getEstimatedFeatureWidth | ( | const int | subProjectId, |
const double | mOverZ, | ||
const double | rt | ||
) | const |
For label free, return the estimated width of an XIC for a given mOverZ and retention time.
Used in label free quantitation (Replicate, not Average). Should only be used where a retention time is predicted based on time alignment. If a match has been found from a Mascot database search, we should be confident that we can start searching for the XIC from that retention time. If the retention time has been predicted by the time alignment algorithm, then it isn't safe to assume that the predicted point will be in the XIC. It is safer to use a range, but we need an estimate of that range, and that is what is returned by this function.
The retention time of the query in another subproject is transformed into the retention time passed to this function by calling getRToffset() , so rt is not going to be an exact value, and it's unlikely that there will be a 'feature' in this subproject at that retention time. So, this function looks for the closest retention time.
It can't be assumed that all subprojects will have the same number of bins. For example, another subproject may have a peptide with m/z 5000 but this subproject may have only found (in the crude feature finding) values up to a maximum m/z 4800. In this case, the highest bin is used.
If there are no bins for a specific subproject, or an invalid subProjectId is passed, then zero will be returned.
For Waters MS^E DIA projects, it's possible to get LOGMSG_QUANT_TIME_ALIGN_INDEX_MZ_RANGE1 debug messages for some lower mass peptides. This is because an "average" charge will have been rounded up to the nearest integer and matrix_science::ms_ms1quant_match_component_body::updateMoverZ() is called, so the value returned by matrix_science::ms_ms1quant_match_component::getMoverz() may be lower than any of the precursor values in the 'survey' scans. If this happens, the debug message is generated and the nearest bin is used. This should give a very close approximation to the proper value, and the debug message can safely be ignored by the client application.
[in] | subProjectId | is a 1 based sub project number |
[in] | mOverZ | is the precursor m/z value. m/z values are 'binned' so similar m/z values may return the same width |
[in] | rt | is the predicted retention time of the ms/ms spectrum |
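The lookup rules described for this function (closest retention time, fall back to the highest bin, zero when there are no bins) can be sketched as follows. This is a standalone illustration, not the Parser implementation: `BinWidths`, `estimateWidth` and the m/z-to-bin rule (bins starting at m/z 0) are assumptions made for the example.

```cpp
#include <cmath>
#include <cstddef>
#include <iterator>
#include <map>
#include <vector>

// Hypothetical layout for one subproject: [m/z bin][rt -> width].
using BinWidths = std::vector<std::map<double, double>>;

// Sketch only: return the width stored at the retention time closest to rt.
// An m/z outside the available bins is clamped to the nearest bin, and 0.0
// is returned when there is nothing to look up.
double estimateWidth(const BinWidths& bins, double mOverZ, double rt,
                     int binSize = 12) {
    if (bins.empty() || binSize <= 0) {
        return 0.0;
    }
    const double raw = std::floor(mOverZ / binSize);
    std::size_t bin = 0;
    if (raw >= static_cast<double>(bins.size())) {
        bin = bins.size() - 1;                 // use the highest bin
    } else if (raw > 0.0) {
        bin = static_cast<std::size_t>(raw);
    }
    const std::map<double, double>& widths = bins[bin];
    if (widths.empty()) {
        return 0.0;
    }
    auto hi = widths.lower_bound(rt);          // first key >= rt
    if (hi == widths.begin()) {
        return hi->second;
    }
    if (hi == widths.end()) {
        return std::prev(hi)->second;
    }
    auto lo = std::prev(hi);                   // last key < rt
    return (rt - lo->first) <= (hi->first - rt) ? lo->second : hi->second;
}
```

Using `lower_bound` keeps the search logarithmic in the number of stored retention times; ties between two equally distant retention times are resolved towards the earlier one, which is an arbitrary choice for this sketch.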
bool getEvaluation | ( | const int | subProject, |
double & | meanErrorsRaw, | ||
double & | meanErrorsAligned, | ||
double & | stdevErrorsRaw, | ||
double & | stdevErrorsAligned, | ||
double & | pearsonCoefficientRaw, | ||
double & | pearsonCoefficientAligned | ||
) |
See Multiple return values in Perl, Java, Python and C#.
Use these values to assess how good the alignment is before and after the optimised shift has been found using a measure of correlation between the two datasets. The Pearson coefficient is a number between -1 and 1 where -1 is a perfect negative correlation, zero indicates no correlation and 1 is a perfect positive correlation. The correlation is a suitable test for alignment because, in quantitation tests, the datasets are likely to have some biological difference between them due to the test parameters. The correlation will still return a positive correlation even if the peak intensity for one dataset is much smaller than the other.
If the pearsonCoefficientRaw is close to zero (say -0.1 to +0.1) then the two datasets are very dissimilar. It's possible to then flag to the user that they may have loaded two completely unrelated datasets into the analysis by mistake. Also if pearsonCoefficientAligned isn't very high (say less than 0.8) the user can be warned that the alignment process hasn't worked very well and they may want to use a different method.
The correlation coefficient value can be used to determine if the time alignment operation has improved the correlation (and therefore alignment). Some threshold values can be defined to warn the user if the alignment is poor before and/or after the alignment operation, or even if two completely unrelated datasets have been supplied to the algorithm.
For each 'fraction', a consensus time alignment is calculated. For each subproject, the shift from the consensus to the subproject is calculated. The values here are for the correlation between the consensus and the project.
The 'raw' values are the intensity values from each scan and subproject or fraction. The aligned values are the intensity values after all optimised time shifts have been applied.
[in] | subProject | is the one based subproject number |
[out] | meanErrorsRaw | is the mean average of the absolute difference between the consensus and raw values for each subproject and fraction |
[out] | meanErrorsAligned | is the mean average of the absolute difference between the consensus and aligned values for each subproject and fraction |
[out] | stdevErrorsRaw | is the standard deviation of the difference between the consensus and raw values for each subproject and fraction |
[out] | stdevErrorsAligned | is the standard deviation of the difference between the consensus and aligned values for each subproject and fraction |
[out] | pearsonCoefficientRaw | is the Pearson correlation coefficient between the consensus and raw values for each subproject and fraction |
[out] | pearsonCoefficientAligned | is the Pearson correlation coefficient between the consensus and aligned values for each subproject and fraction |
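As an illustration of how the correlation values might be interpreted, the sketch below computes a Pearson coefficient and applies the example thresholds mentioned above (roughly ±0.1 for unrelated data and 0.8 for a poor alignment). Both `pearson` and `assessAlignment` are hypothetical helpers written for this example; they are not part of Parser, and the message strings are invented.

```cpp
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Pearson correlation coefficient of two equally sized series.
// Returns 0.0 when the coefficient is undefined (too few points,
// mismatched sizes, or a constant series).
double pearson(const std::vector<double>& x, const std::vector<double>& y) {
    const std::size_t n = x.size();
    if (n < 2 || y.size() != n) {
        return 0.0;
    }
    double meanX = 0.0;
    double meanY = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        meanX += x[i];
        meanY += y[i];
    }
    meanX /= static_cast<double>(n);
    meanY /= static_cast<double>(n);
    double sxy = 0.0;
    double sxx = 0.0;
    double syy = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double dx = x[i] - meanX;
        const double dy = y[i] - meanY;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    if (sxx == 0.0 || syy == 0.0) {
        return 0.0;
    }
    return sxy / std::sqrt(sxx * syy);
}

// Hypothetical interpretation of the two coefficients, using the example
// thresholds from the text; a real client would choose its own limits.
std::string assessAlignment(double pearsonRaw, double pearsonAligned) {
    if (std::fabs(pearsonRaw) <= 0.1) {
        return "datasets may be unrelated";
    }
    if (pearsonAligned < 0.8) {
        return "alignment is poor";
    }
    return "alignment looks acceptable";
}
```

Because the coefficient is scale invariant, a series and a ten-fold more intense copy of it still correlate perfectly, which is the property the description relies on when one dataset has much smaller peak intensities than the other.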
const ms_ms1quant_time_align::featureWidths_t & getFeatureWidths | ( | ) | const |
Returns the multidimensional vector that has the feature widths for each project.
To access the correct map/dictionary, index the return value with [prj][bin], where prj selects the project and bin selects the m/z bin.
The returned map/dictionary will have a set of values mapping specific retention times to widths.
std::vector< std::vector< std::vector< double > > > & getFinalResults | ( | ) |
const std::vector< std::vector< std::vector< double > > > & getFinalResults | ( | ) | const |
ms_ms1quant_time_align::fractionToSubProject_t getFractionToSubProjectMap | ( | ) | const |
Return the map to enable looking up a list of sub projects for each fraction.
int getLastError | ( | ) | const | inherited |
Return the error code of the last error that occurred.
All errors are accumulated into a list in this object, until clearAllErrors() is called. This function returns the last error that occurred.
See Error Handling.
std::string getLastErrorString | ( | ) | const | inherited |
Return the error description of the last error that occurred.
All errors are accumulated into a list in this object, until clearAllErrors() is called. This function returns the last error that occurred.
See Error Handling.
double getRToffset | ( | const int | myProjectId, |
const int | otherProjectId, | ||
const double | rtInOtherProject, | ||
const double | mOverZ | ||
) | const |
Get the offset between two projects.
Returns the estimated retention time offset between two projects for a given m/z value. To find a predicted retention time in myProject, call this function and then add the returned offset to the rtInOtherProject
This is implemented by finding the value in the array returned by getShiftsToConsensus() for otherProjectId and adding the value in the array returned by getShiftsFromConsensus() for myProjectId.
Both projects must belong to the same fraction for this to be meaningful.
For Waters MS^E DIA projects, it's possible to get LOGMSG_QUANT_TIME_ALIGN_INDEX_MZ_RANGE debug messages for some lower mass peptides. This is because an "average" charge will have been rounded up to the nearest integer and matrix_science::ms_ms1quant_match_component_body::updateMoverZ() is called, so the value returned by matrix_science::ms_ms1quant_match_component::getMoverz() may be lower than any of the precursor values in the 'survey' scans. If this happens, the debug message is generated and the nearest bin is used. This should give a very close approximation to the proper value, and the debug message can safely be ignored by the client application.
myProjectId | is a zero based project number |
otherProjectId | is a zero based project number |
rtInOtherProject | is the retention time in the other project which is to be 'aligned' to a retention time in my project |
mOverZ | is required because alignment is done in m/z bins |
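The composition described above can be written out as a short standalone sketch. The functions `rtOffset` and `predictedRt` are hypothetical and only mirror the description: they take the two shifts that would already have been looked up for the relevant m/z bin and retention time, since the indexing of the arrays themselves is not specified here.

```cpp
// Sketch only (not Parser code).
// shiftOtherToConsensus: value that would come from getShiftsToConsensus()
//                        for otherProjectId.
// shiftConsensusToMine:  value that would come from getShiftsFromConsensus()
//                        for myProjectId.
double rtOffset(double shiftOtherToConsensus, double shiftConsensusToMine) {
    return shiftOtherToConsensus + shiftConsensusToMine;
}

// The predicted retention time in "my" project is the retention time in the
// other project plus the offset between the two projects.
double predictedRt(double rtInOtherProject, double offset) {
    return rtInOtherProject + offset;
}
```

For example, if the other project is shifted by -3.5 to reach the consensus and the consensus is shifted by +1.25 to reach my project, the offset is -2.25, so a retention time of 1200.0 in the other project is predicted at 1197.75 in my project.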
const std::vector< std::vector< std::vector< double > > > & getShiftsFromConsensus | ( | ) | const |
Returns the vector that has all the offsets from a consensus to a specified project.
Return a read-only 3D vector that should be accessed with [prj][bin][rt], where prj selects the project, bin selects the m/z bin and rt selects the retention time entry.
The value in the array at those indexes is the predicted retention time shift from the 'consensus' to [prj].
To find the predicted retention time shift between project 'A' and project 'B', call getRToffset()
const std::vector< std::vector< std::vector< double > > > & getShiftsToConsensus | ( | ) | const |
Returns the vector that has all the offsets from one project to a consensus.
Return a read-only 3D vector that should be accessed with [prj][bin][rt], where prj selects the project, bin selects the m/z bin and rt selects the retention time entry.
The value in the array at those indexes is the predicted retention time shift from [prj] to the 'consensus'.
To find the predicted retention time shift between project 'A' and project 'B', call getRToffset()
ms_ms1quant_time_align::status_t getStatus | ( | void | ) | const |
Get the status of the data.
When the object is initially constructed, it sets the status to ms_ms1quant_time_align::ST_TA_NO_DATA
A successful call to loadXmlFile() will cause the flag to be set to ms_ms1quant_time_align::ST_TA_LOADED_FROM_XML, and if the object is initialised from a CDB file it will be set to ms_ms1quant_time_align::ST_TA_LOADED_FROM_CDB. If the data is calculated in msquantlib, the value is set to ms_ms1quant_time_align::ST_TA_CALCULATED.
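A client might use the status to decide whether time alignment still has to be run. The sketch below re-declares the enumeration locally so that it compiles without Parser; the enumerator names come from this page, but `needsTimeAlignment` and its rule (only ST_TA_NO_DATA requires a calculation) are assumptions for illustration.

```cpp
// Local copy of the enumerators listed for ms_ms1quant_time_align::status_t,
// declared here only so the example is self-contained.
enum status_t {
    ST_TA_NO_DATA,
    ST_TA_LOADED_FROM_XML,
    ST_TA_LOADED_FROM_CDB,
    ST_TA_CALCULATED
};

// Hypothetical policy: alignment data is missing only in the initial state.
bool needsTimeAlignment(status_t status) {
    return status == ST_TA_NO_DATA;
}
```

A stricter client could also choose to recalculate data that was loaded from a file; that decision is outside what this page specifies.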
std::string getStatusAsText | ( | void | ) | const |
Get the status of the data as text for a message.
When the object is initially constructed, it sets the status to ms_ms1quant_time_align::ST_TA_NO_DATA
A successful call to loadXmlFile() will cause the flag to be set to ms_ms1quant_time_align::ST_TA_LOADED_FROM_XML, and if the object is initialised from a CDB file it will be set to ms_ms1quant_time_align::ST_TA_LOADED_FROM_CDB. If the data is calculated in msquantlib, the value is set to ms_ms1quant_time_align::ST_TA_CALCULATED.
const std::vector< int > getSubProjectToFractionMap | ( | ) | const |
Return a vector containing the fraction number that each subproject belongs to.
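The relationship between this vector and the map returned by getFractionToSubProjectMap() can be sketched as an inversion. The helper `invertFractionMap` is hypothetical, and the example assumes that the position in the vector is the zero-based subproject number; the type alias mirrors `fractionToSubProject_t` from this class.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Same shape as ms_ms1quant_time_align::fractionToSubProject_t.
using fractionToSubProject_t = std::map<int, std::vector<int>>;

// Sketch only: build fraction -> list of zero-based subproject numbers from
// a vector holding the fraction number of each subproject. A map is used
// because fraction numbers do not need to be sequential.
fractionToSubProject_t invertFractionMap(
        const std::vector<int>& subProjectToFraction) {
    fractionToSubProject_t result;
    for (std::size_t i = 0; i < subProjectToFraction.size(); ++i) {
        result[subProjectToFraction[i]].push_back(static_cast<int>(i));
    }
    return result;
}
```

With the input {1, 1, 5, 5, 5}, fraction 1 maps to subprojects 0 and 1, and fraction 5 maps to subprojects 2, 3 and 4.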
bool isValid | ( | ) | const | inherited |
Call this function to determine if there have been any errors.
This will return true unless there have been any fatal errors.
See Error Handling.
bool loadXmlFile | ( | const std::string & | xmlFilename, |
const std::string & | schemaDirectory | ||
) |
Populate the object from an XML file.
This function is used to load the time alignment data as a discrete XML file.
If this function is successful, it will return true and the value returned by getStatus() will be ms_ms1quant_time_align::ST_TA_LOADED_FROM_XML
xmlFilename | The path and filename of the file to load |
schemaDirectory | is the directory which will contain the distiller_time_align_1.xsd file |
bool saveXmlFile | ( | const std::string & | xmlFilename, |
const std::string & | schemaDirectory | ||
) | const |
Save just the time alignment data to an XML file.
This function is used to save the time alignment data as a discrete XML file. The time alignment data can also be saved as part of the ms1 based quantitation results by calling ms_ms1quantitation::saveXmlFile with the saveTimeAlignmentData parameter set to true.
xmlFilename | The path and filename of the file to save |
schemaDirectory | is the directory which will contain the distiller_time_align_1.xsd file |
void setAlgorithmName | ( | const char * | algorithmName | ) |
Brief descriptive text that describes the algorithm used for the time alignment.
Currently restricted to "Unsupervised" or "Supervised"
algorithmName | describes the type of algorithm used. |
void setAverageFeatureWidths | ( | const std::vector< double > & | featureWidths | ) |
For supervised time alignment, a single feature width per subproject is used.
For supervised time alignment, the feature width for a subproject is calculated by taking the average xic width for all identified peptides in the subproject. Hence, just one value per subproject.
featureWidths | is a 1D vector with numSubproject values in seconds |
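The averaging described above can be sketched as follows. The helper `averageXicWidths` is hypothetical; it only shows how one value per subproject could be derived from the XIC widths of the identified peptides before being passed to setAverageFeatureWidths(). Returning 0.0 for a subproject without identified peptides is an assumption of this sketch.

```cpp
#include <numeric>
#include <vector>

// Sketch only: xicWidths[s] holds the XIC widths (in seconds) of all
// identified peptides in subproject s. The result has one averaged width
// per subproject.
std::vector<double> averageXicWidths(
        const std::vector<std::vector<double>>& xicWidths) {
    std::vector<double> averages;
    averages.reserve(xicWidths.size());
    for (const std::vector<double>& widths : xicWidths) {
        if (widths.empty()) {
            averages.push_back(0.0);   // assumption: no peptides, no width
        } else {
            const double sum =
                std::accumulate(widths.begin(), widths.end(), 0.0);
            averages.push_back(sum / static_cast<double>(widths.size()));
        }
    }
    return averages;
}
```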
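void setEvaluation | ( | int | subProject, |
double | meanErrorsRaw, | ||
double | meanErrorsAligned, | ||
double | stdevErrorsRaw, | ||
double | stdevErrorsAligned, | ||
double | pearsonCoefficientRaw, | ||
double | pearsonCoefficientAligned | ||
) | | protected |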
[in] | subProject | is the one based subproject number |
[out] | meanErrorsRaw | is the mean average of the absolute difference between the ? ? ? |
[out] | meanErrorsAligned | |
[out] | stdevErrorsRaw | |
[out] | stdevErrorsAligned | |
[out] | pearsonCoefficientRaw | |
[out] | pearsonCoefficientAligned |