Matrix Science Mascot Parser toolkit
 
ms_imputation_missforest Class Reference

#include <ms_imputation.hpp>


Public Member Functions

void appendErrors (const ms_errors &src)
 Copies all errors from another instance and appends them at the end of its own list.
 
void clearAllErrors ()
 Remove all errors from the current list of errors.
 
void copyFrom (const ms_errors *right)
 Use this member to make a copy of another instance.
 
std::vector< std::vector< ms_imputation_missing_val > > & getDataWithMissing ()
 Get the data array that the imputation method will impute.
 
const ms_errs * getErrorHandler () const
 Retrieve the error object, which gives access to all errors and error parameters.
 
int getLastError () const
 Return the error number of the last error that occurred.
 
std::string getLastErrorString () const
 Return the error description of the last error that occurred.
 
std::vector< std::vector< double > > impute () override
 Returns a 2D array of double with the Miss Forest prediction results in place of missing values.
 
bool isValid () const
 Call this function to determine if there have been any errors.
 

Protected Member Functions

void averageFindAverage ()
 Find average of each column.
 
std::vector< std::vector< double > > changems_imputation_missing_valArrayToDoubleArray (std::vector< std::vector< ms_imputation_missing_val > > dataProcess)
 Extracts only the values from a 2D array of missing value objects.
 
std::vector< double > changems_imputation_missing_valVecToDoubleVec (std::vector< ms_imputation_missing_val > missingObservation)
 Extracts only the values from a vector of missing value objects.
 
std::vector< double > getKnownValues (std::vector< ms_imputation_missing_val >)
 Loop over one observation and get the non-missing values.
 
std::vector< int > getMissingIndexes (std::vector< ms_imputation_missing_val >)
 Loop over one observation and get the missing indexes.
 
std::vector< std::vector< int > > removeDuplicateIndexes (std::vector< std::vector< int > > duplicateIndexes)
 The same combination of missing indexes can appear multiple times within a dataset. Duplicate combinations are removed from the missing indexes list to avoid building repeated ALGLIB models.
 
void setDataWithMissing (const std::vector< std::vector< ms_imputation_missing_val > > &dataWithMissingIn)
 Set the missing value array.
 
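The bookkeeping helpers listed above (getKnownValues(), getMissingIndexes() and removeDuplicateIndexes()) can be illustrated with a short C++ sketch. The struct `MissingVal` is a hypothetical stand-in for ms_imputation_missing_val, whose members are not documented on this page, and the free functions `knownValues`, `missingIndexes` and `uniquePatterns` only mirror the described behaviour; they are not the Parser implementations.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical stand-in for ms_imputation_missing_val: the real class's
// members are not documented here.
struct MissingVal {
    double value;  // meaningful only when missing == false
    bool missing;
};

// Mirrors getKnownValues(): collect the non-missing values of one observation.
std::vector<double> knownValues(const std::vector<MissingVal>& obs) {
    std::vector<double> out;
    for (const auto& v : obs)
        if (!v.missing) out.push_back(v.value);
    return out;
}

// Mirrors getMissingIndexes(): collect the column indexes that are missing.
std::vector<int> missingIndexes(const std::vector<MissingVal>& obs) {
    std::vector<int> out;
    for (int i = 0; i < static_cast<int>(obs.size()); ++i)
        if (obs[i].missing) out.push_back(i);
    return out;
}

// Mirrors removeDuplicateIndexes(): keep each distinct missing-index pattern
// once, in order of first appearance, so one model per pattern is enough.
std::vector<std::vector<int>> uniquePatterns(const std::vector<std::vector<int>>& patterns) {
    std::set<std::vector<int>> seen;
    std::vector<std::vector<int>> out;
    for (const auto& p : patterns)
        if (seen.insert(p).second) out.push_back(p);  // true only for a new pattern
    return out;
}
```

Using `std::set` to remember the patterns already seen keeps the first occurrence of each pattern in its original order; the Parser method may order its output differently.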

Detailed Description

An accurate prediction algorithm that uses an iterative random forest approach to create progressively improving predictions.

EXPERIMENTAL
This class is EXPERIMENTAL. Both the API and the implementation may change in a future version of Parser.

The original paper is available at https://academic.oup.com/bioinformatics/article/28/1/112/219101. While slower to complete than KNN, Miss Forest has proven more accurate than KNN, both in in-house testing and in the paper. Unlike KNN, Miss Forest does not need any complete observations to function.

Miss Forest uses average value imputation as a starting point, filling in every missing observation. Although the missing values are then no longer missing, the positions of the original missing values are saved and used throughout. Miss Forest then creates a random forest from the working (initially average-imputed) data for each variable that had original missing values. Each original missing value is predicted and updated within the working dataset. This process is repeated, and each new product is compared with the previous one, with the difference recorded. The forest creation, prediction, update and comparison steps are repeated until the difference between consecutive products increases. The final prediction is the product obtained just before the difference increased.
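The iteration and stopping rule described above can be sketched in C++ as follows. This is a minimal illustration, not Parser's implementation: the forest-fitting step is passed in as a hypothetical callable `step`, the names `imputationDifference` and `iterateUntilDivergence` are invented for this sketch, and the difference measure is the normalised squared difference used for continuous data in the MissForest paper, which this page does not confirm Parser uses. The `maxIterations` cap is also an addition.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Normalised squared difference between consecutive imputations:
// sum((next - prev)^2) / sum(next^2). Assumed metric (from the paper).
double imputationDifference(const Matrix& prev, const Matrix& next) {
    double num = 0.0, den = 0.0;
    for (std::size_t r = 0; r < next.size(); ++r)
        for (std::size_t c = 0; c < next[r].size(); ++c) {
            const double d = next[r][c] - prev[r][c];
            num += d * d;
            den += next[r][c] * next[r][c];
        }
    return den > 0.0 ? num / den : 0.0;
}

// Repeatedly apply `step` (hypothetical: refit one random forest per variable
// with missing values and re-predict the originally missing cells) until the
// difference between consecutive imputations grows, then return the
// imputation produced before that increase.
Matrix iterateUntilDivergence(Matrix current,
                              const std::function<Matrix(const Matrix&)>& step,
                              int maxIterations, int* iterationsUsed = nullptr) {
    double previousDiff = std::numeric_limits<double>::infinity();
    int it = 0;
    for (; it < maxIterations; ++it) {
        Matrix next = step(current);
        const double diff = imputationDifference(current, next);
        if (diff > previousDiff) break;  // difference increased: keep `current`
        previousDiff = diff;
        current = std::move(next);
    }
    if (iterationsUsed) *iterationsUsed = it;  // number of accepted updates
    return current;
}
```

Returning `current` rather than `next` when the difference grows matches the description above: the result is the product obtained before the increase.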

For example, consider this array of five observations of five variables, with missing values marked "x":

0.0 1.0 2.0 3.0 4.0
0.3 1.3 2.3 3.3 4.3
0.1 1.1 2.1 3.1 4.1
x 1.2 x 3.2 x
-0.1 x 1.9 x 3.9

It is imputed to:

0.0 1.0 2.0 3.0 4.0
0.3 1.3 2.3 3.3 4.3
0.1 1.1 2.1 3.1 4.1
0.16 1.2 2.16 3.2 4.15
-0.1 1.0 1.9 3.0 3.9

This example requires 6 iterations.
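The average value starting point for the example above can be sketched as follows. This sketch marks missing cells with NaN for brevity (Parser stores them as ms_imputation_missing_val objects), and the function name `fillWithColumnMeans` and the choice of 0 for an entirely missing column are assumptions. For the example data it fills columns 1 to 5 with 0.075, 1.15, 2.075, 3.15 and 4.075; the final values shown above come from the subsequent random forest iterations, not from this step.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Replace each missing (NaN) cell with the mean of the known values in its
// column. Returns a mask of the originally missing cells so that later
// iterations can update only those positions.
std::vector<std::vector<bool>> fillWithColumnMeans(Matrix& data) {
    const std::size_t rows = data.size();
    const std::size_t cols = rows ? data[0].size() : 0;
    std::vector<std::vector<bool>> missing(rows, std::vector<bool>(cols, false));
    for (std::size_t c = 0; c < cols; ++c) {
        double sum = 0.0;
        std::size_t count = 0;
        for (std::size_t r = 0; r < rows; ++r) {
            if (std::isnan(data[r][c])) {
                missing[r][c] = true;  // remember the original position
            } else {
                sum += data[r][c];
                ++count;
            }
        }
        // Assumption: an entirely missing column is filled with 0.
        const double mean = count ? sum / count : 0.0;
        for (std::size_t r = 0; r < rows; ++r)
            if (missing[r][c]) data[r][c] = mean;
    }
    return missing;
}
```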

The Miss Forest imputation class inherits from the general imputation method class and also has average value specific properties and methods, which provide its starting point. To use Miss Forest imputation, create an instance of the ms_imputation_missforest class, pass it to the ms_imputation constructor, and finally call ms_imputation.impute(). For example, with a valid ms_ms1quantitation object (ms1Quant), in C#:

    ms_imputation_missforest missForestImputation = new ms_imputation_missforest();
    ms_imputation Imputation = new ms_imputation(ms1Quant, missForestImputation, IMPUTATION_VARIABLE.IMPUTE_PEPTIDE_RATIO);
    VecVecdouble imputationRes = Imputation.impute();

Member Function Documentation

◆ appendErrors()

void appendErrors ( const ms_errors & src )
inherited

Copies all errors from another instance and appends them at the end of its own list.

Parameters
src  The object to copy the errors across from. See Maintaining object references: two rules of thumb.

◆ clearAllErrors()

void clearAllErrors ( )
inherited

Remove all errors from the current list of errors.

The list of 'errors' can include fatal errors, warning messages, information messages and different levels of debugging messages.

All messages are accumulated into a list in this object, until clearAllErrors() is called.

See Error Handling.

See also
isValid(), getLastError(), getLastErrorString(), getErrorHandler()
Examples
common_error.cpp, resfile_error.cpp, and resfile_summary.cpp.

◆ copyFrom()

void copyFrom ( const ms_errors * right )
inherited

Use this member to make a copy of another instance.

Parameters
right  The source to initialise from.

◆ getErrorHandler()

const ms_errs * getErrorHandler ( ) const
inherited

Retrieve the error object, which gives access to all errors and error parameters.

See Error Handling.

Returns
Constant pointer to the error handler
See also
isValid(), getLastError(), getLastErrorString(), clearAllErrors()
Examples
common_error.cpp and http_helper_getstring.cpp.

◆ getLastError()

int getLastError ( ) const
inherited

Return the error number of the last error that occurred.

All errors are accumulated into a list in this object, until clearAllErrors() is called. This function returns the last error that occurred.

See Error Handling.

See also
isValid(), getLastErrorString(), clearAllErrors(), getErrorHandler()
Returns
the error number of the last error, or 0 if there have been no errors.
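The accumulate-until-cleared behaviour described for getLastError() and clearAllErrors() can be illustrated with a small stand-in class. `ErrorListSketch` is hypothetical and is not the Parser ms_errors class; in particular, treating every stored entry as making the object invalid is an assumption, since the real list also holds warnings, information and debug messages. See the common_error.cpp example for real usage.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in, NOT Parser's ms_errors: it only mimics the
// documented contract that messages accumulate until clearAllErrors() and
// that getLastError() returns 0 when there have been no errors.
class ErrorListSketch {
public:
    void add(int number, std::string text) { errors_.emplace_back(number, std::move(text)); }

    // Append another instance's entries at the end of this list.
    void appendErrors(const ErrorListSketch& src) {
        errors_.insert(errors_.end(), src.errors_.begin(), src.errors_.end());
    }

    void clearAllErrors() { errors_.clear(); }

    // Assumption: any stored entry counts as an error.
    bool isValid() const { return errors_.empty(); }

    int getLastError() const { return errors_.empty() ? 0 : errors_.back().first; }

    // Assumption: an empty string when nothing has been recorded.
    std::string getLastErrorString() const {
        return errors_.empty() ? std::string() : errors_.back().second;
    }

private:
    std::vector<std::pair<int, std::string>> errors_;  // (number, message)
};
```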

◆ getLastErrorString()

std::string getLastErrorString ( ) const
inherited

Return the error description of the last error that occurred.

All errors are accumulated into a list in this object, until clearAllErrors() is called. This function returns the last error that occurred.

Returns
Most recent error, warning, information or debug message

See Error Handling.

See also
isValid(), getLastError(), clearAllErrors(), getErrorHandler()
Examples
common_error.cpp, config_enzymes.cpp, config_fragrules.cpp, config_license.cpp, config_mascotdat.cpp, config_masses.cpp, config_modfile.cpp, config_procs.cpp, config_quantitation.cpp, config_taxonomy.cpp, http_helper_getstring.cpp, and tools_aahelper.cpp.

◆ impute()

std::vector< std::vector< double > > impute ( )
override virtual

Returns a 2D array of double with the Miss Forest prediction results in place of missing values.

Returns
A 2D array with the Miss Forest predictions in place of missing observations

Implements ms_imputation_method.

◆ isValid()

bool isValid ( ) const
inherited

Call this function to determine if there have been any errors.

◆ setDataWithMissing()

void setDataWithMissing ( const std::vector< std::vector< ms_imputation_missing_val > > &  dataWithMissingIn)
protected inherited

Set the missing value array.

Parameters
dataWithMissingIn  The array of missing values to be imputed
