Note: This describes the structure of a multi-file project. Projects with a single file have a different, simpler structure; see Extracting quantitation results with the Distiller toolkit.
Distiller stores its results in a Distiller project (.rov) file.
The project file is the same format as a .ZIP file. You can view the contents by opening it with a .ZIP file viewer.
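Because the project file is a ZIP archive, one quick sanity check before passing it to a ZIP library is to look for the ZIP local file header signature at the start of the file. The following is a minimal sketch only: `looksLikeZipArchive` is a hypothetical helper, not part of the toolkit, and it assumes the archive is non-empty and has no data prepended.

```cpp
#include <array>
#include <fstream>
#include <ios>
#include <string>

// Hypothetical helper (not part of the toolkit).
// Returns true if the file begins with the ZIP local file header
// signature "PK\x03\x04". An empty archive, which starts with the
// end-of-central-directory record instead, is not detected.
bool looksLikeZipArchive(const std::string &pathname)
{
    std::ifstream in(pathname, std::ios::binary);
    std::array<char, 4> header{};
    if (!in.read(header.data(), static_cast<std::streamsize>(header.size()))) {
        return false; // file missing, unreadable, or shorter than four bytes
    }
    return header[0] == 'P' && header[1] == 'K'
        && header[2] == '\x03' && header[3] == '\x04';
}
```

A call such as `looksLikeZipArchive("Triple2.rov")` only confirms the leading bytes; it does not validate the archive.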
The data can also be extracted using the Mascot Distiller SDK (MDRO); see Distiller projects and the toolkit.
The master file for the Distiller multi-file project contains references to the subprojects.
The subproject files are held in a sub-folder with a name formed by appending ".files" to the name of the master file (without the ".rov" extension). For example, the master project file "Triple2.rov" would have a subfolder called "Triple2.files", in the same folder as the master file, to hold the subproject files.
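The naming rule above can be sketched with the C++17 `std::filesystem` library. `subprojectFolderFor` is a hypothetical helper name, not a toolkit function, and the sketch assumes that replacing whatever extension the master file has is acceptable (the text only describes the ".rov" case).

```cpp
#include <filesystem>

// Hypothetical helper (not part of the toolkit).
// Derives the sub-folder that holds the subproject files by swapping
// the master file's extension for ".files", keeping the parent folder.
std::filesystem::path subprojectFolderFor(const std::filesystem::path &masterProject)
{
    std::filesystem::path folder = masterProject;
    folder.replace_extension(".files");
    return folder;
}
```

For the example in the text, passing "Triple2.rov" yields "Triple2.files" in the same folder as the master file.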
The master project file contains a number of streams.
name | format | contents |
---|---|---|
rover_data+2 | XML | The master project containing lists of the subprojects, searches and sub-searches. |
rover_data | XML | The Distiller project to be used to create an ms_distiller_data. |
rover_data+ (cac0+N) | MIME | The combined peptide summary cache to be used to construct an ms_peptidesummary. N is the id attribute from the search in the master project lists. |
rover_data+ (1f40+M) | binary | The combined quantitation results file used to populate an ms_ms1quantitation. M is the projectStream attribute from the Distiller project. |
rover_data+ (bb8+M) | binary | The combined quantitation cache file used to populate an ms_ms1quantitation. M is the projectStream attribute from the Distiller project. |
Each subproject file contains a number of streams.
name | format | contents |
---|---|---|
mdro_search_status | XML | A search status list of the search results in the subproject. |
mdro_search_status+N | MIME | The Mascot search results to be used in the combined ms_mascotresfile. N is the id attribute from the search status. |
The lists of subprojects and searches are held in XML format in the stream rover_data+2 of the master project file. The presence of this stream indicates that this is the master file for a multi-file project.
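The presence check can be sketched as follows, assuming the caller has already listed the stream (entry) names of the .rov archive with a ZIP library of its choice. `isMultiFileMaster` is a hypothetical helper, and the stream name is hard-coded from the table above purely for illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper (not part of the toolkit).
// streamNames is assumed to hold the entry names read from the .rov
// archive. Returns true if the master project stream is among them.
bool isMultiFileMaster(const std::vector<std::string> &streamNames)
{
    const std::string masterStream = "rover_data+2"; // name taken from the table above
    return std::find(streamNames.begin(), streamNames.end(), masterStream)
        != streamNames.end();
}
```

In real code the literal would normally be replaced by the name reported by ms_distiller_master_project rather than a hard-coded string.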
This stream holds the list of subprojects in the master project (at XPath /MasterProject/SubProject).
It also holds the id used to generate the name of the combined peptide summary cache stream (at XPath /MasterProject/Searches/Search/@id).
The name of the stream is available from ms_distiller_master_project.
std::string masterProjectStreamName = matrix_science::ms_distiller_master_project::getMasterProjectStreamName();
This data should be loaded into an ms_distiller_master_project.
matrix_science::ms_distiller_master_project masterProject;
masterProject.loadXmlFile(xmlSchemaDirectory, masterProjectPathname);
The schema for this is in distiller_master_project_1.xsd.
The Distiller project parameters are held in the stream rover_data.
This data has the same format as that for single file Distiller project parameters.
The Mascot results are combined from the mdro_search_status+N stream from each of the subproject files, where N is the id from the search status in the subproject. Each subproject has its own Mascot results stream. See Search list and Mascot results for a single file Distiller project.
For a multi-file project, instead of creating an ms_mascotresfile for each subproject, a single ms_mascotresfile is built from the Mascot results of each subproject in turn. The results file of the first subproject is passed to the constructor and each subsequent one is then appended.
The last update time attribute of the first results file is used by Parser when determining the name of the subfolder used for the peptide summary cache file; see Mascot results.
std::vector<std::string> resfilePathnames;
... // extract subproject results files and populate filenames
matrix_science::ms_mascotresfile * resfile = new matrix_science::ms_mascotresfile(
    resfilePathnames[0].c_str(),
    0,                        // keepAliveInterval
    "<!-- %d seconds -->\n",  // keepAliveText
    matrix_science::ms_mascotresfile::RESFILE_NOFLAG,
    cacheDirectory.c_str(),
    xmlSchemaDirectory);
// Append the results of the remaining subprojects, in order.
for (std::size_t iResult = 1; iResult < resfilePathnames.size(); ++iResult) {
    resfile->appendResfile(resfilePathnames[iResult].c_str());
}
The peptide summary cache for the combined results files is in the stream rover_data+cac0 + N of the master project file, where N is the id of the search from the master project; for example "rover_data+cac1".
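To make the naming arithmetic concrete, the sketch below forms a stream name by adding an index to a hexadecimal base and printing the sum in lowercase hexadecimal. It is an illustration inferred from the "rover_data+cac1" example only: `offsetStreamName` and `peptideSummaryCacheStreamName` are hypothetical helpers, and applying the same rule to the 1f40 and bb8 bases from the table is an assumption.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper (not part of the toolkit).
// Builds "rover_data+" followed by (base + index) in lowercase
// hexadecimal without padding, e.g. base 0xcac0 and index 1 give "cac1".
std::string offsetStreamName(unsigned base, unsigned index)
{
    std::ostringstream name;
    name << "rover_data+" << std::hex << (base + index);
    return name.str();
}

// Combined peptide summary cache stream for the search with the given id,
// assuming the cac0 base listed in the table of master project streams.
std::string peptideSummaryCacheStreamName(unsigned searchId)
{
    return offsetStreamName(0xcac0u, searchId);
}
```

This is only meant to show how the example name arises; the stream name itself should still be obtained from the toolkit.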
The ms_distiller_master_project should be used to retrieve the name of the stream containing the combined peptide summary cache file.
std::string pepsumStreamName = masterProject.getSearch(1).getCombinedPeptideSummaryCacheStreamName();
The ms_distiller_data from the master project should be used for the settings when creating the peptide summary filename and again when creating the peptide summary.
std::string pepsumFilename = matrix_science::ms_peptidesummary::getCacheFilename( *resfile, distillerData, 1); // first search in the list
The peptide summary cache data should be copied to the correct file location before constructing the peptide summary; see Peptide summary cache for a single file Distiller project.
matrix_science::ms_peptidesummary * peptideSummary = new matrix_science::ms_peptidesummary( *resfile, distillerData, 1);
The combined (from all subprojects) quantitation data is stored in two streams of the master project file.
The stream names are obtained from the ms_distiller_data of the master project file. This is the same as for a single file project; see the single file Quantitation results.
matrix_science::ms_quant_method quantMethod = distillerData.quantMethod;
matrix_science::ms_ms1quantitation quant(peptideSummary, quantMethod);
std::string cdbStreamName = distillerData.getQuant(1).getResultsStreamName();
std::string cacheStreamName = distillerData.getQuant(1).getCacheStreamName();
... // extract the streams
quant.loadCdbFile(zipExtractPath + cdbStreamName, zipExtractPath + cacheStreamName);