Note: This describes the structure of a multi-file project. Projects with a single file have a different, simpler structure; see Extracting quantitation results with the Distiller toolkit.

Multi-file Quantitation Overview

Distiller stores its results in a Distiller project (.rov) file.

The project file is a ZIP archive, so you can view its contents by opening it with any ZIP file viewer.
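Because the project file is a ZIP archive, a quick sanity check before extracting any streams is to look for the ZIP local file header signature ("PK\x03\x04") at the start of the file. The sketch below is only a heuristic in standard C++; `looksLikeZipArchive` is a hypothetical helper, not part of the Distiller toolkit, and a real ZIP reader is still needed to list and extract the streams.

```cpp
#include <array>
#include <fstream>
#include <string>

// Heuristic check: returns true if the file at 'path' starts with the ZIP
// local file header signature "PK\x03\x04". It only inspects the first four
// bytes and does not validate the rest of the archive.
bool looksLikeZipArchive(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        return false; // file missing or unreadable

    std::array<char, 4> sig{};
    if (!in.read(sig.data(), static_cast<std::streamsize>(sig.size())))
        return false; // fewer than four bytes in the file

    return sig[0] == 'P' && sig[1] == 'K' && sig[2] == '\x03' && sig[3] == '\x04';
}
```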

The data can also be extracted using the Mascot Distiller SDK (MDRO); see Distiller projects and the toolkit.

Project and subproject files

The master file for the Distiller multi-file project contains references to the subprojects.

The subproject files are held in a sub-folder with a name formed by appending ".files" to the name of the master file (without the ".rov" extension). For example, the master project file "Triple2.rov" would have a subfolder called "Triple2.files", in the same folder as the master file, to hold the subproject files.
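The naming rule for the subproject folder can be sketched with C++17 `std::filesystem`. `subprojectFolder` is a hypothetical helper for illustration, not a toolkit function; it strips the last extension of the master file name (".rov") and appends ".files" in the same folder.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Returns the folder holding the subproject files of a multi-file project:
// the master file name without its ".rov" extension, plus ".files",
// in the same folder as the master file.
// For example "data/Triple2.rov" -> "data/Triple2.files".
fs::path subprojectFolder(const fs::path& masterProjectPath)
{
    const fs::path folder = masterProjectPath.parent_path();
    const std::string stem = masterProjectPath.stem().string(); // "Triple2"
    return folder / (stem + ".files");
}
```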

Master project file streams

The master project file contains a number of streams.

name	format	contents
`rover_data+2`	XML	The master project containing lists of the subprojects, searches and sub-searches.
`rover_data`	XML	The Distiller project, used to create an ms_distiller_data.
`rover_data+`(cac0+N)	MIME	The combined peptide summary cache, used to construct an ms_peptidesummary. N is the `id` attribute of the search in the master project lists.
`rover_data+`(1f40+M)	binary	The combined quantitation results file, used to populate an ms_ms1quantitation. M is the `projectStream` attribute from the Distiller project.
`rover_data+`(bb8+M)	binary	The combined quantitation cache file, used to populate an ms_ms1quantitation. M is the `projectStream` attribute from the Distiller project.
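The computed stream names in the table follow one pattern: the sum is written as a lower-case hexadecimal number after "rover_data+", as in the "rover_data+cac1" example given for search 1 in the peptide summary cache section. The hypothetical helpers below only illustrate that arithmetic (for example 0xcac0 + 1 = 0xcac1). The lower-case formatting for the 1f40 and bb8 streams is an assumption based on the table. In real code, take the names from ms_distiller_master_project and ms_distiller_data as described in the sections that follow.

```cpp
#include <sstream>
#include <string>

// Illustration only: builds "rover_data+" followed by (base + offset)
// written in lower-case hexadecimal (std::hex is lower-case by default).
std::string roverStreamName(unsigned base, unsigned offset)
{
    std::ostringstream name;
    name << "rover_data+" << std::hex << (base + offset);
    return name.str();
}

// N is the search id from the master project lists.
std::string peptideSummaryCacheStream(unsigned searchId)
{
    return roverStreamName(0xcac0, searchId);
}

// M is the projectStream attribute from the Distiller project.
std::string quantResultsStream(unsigned projectStream)
{
    return roverStreamName(0x1f40, projectStream);
}

std::string quantCacheStream(unsigned projectStream)
{
    return roverStreamName(0xbb8, projectStream);
}
```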

Subproject file streams

Each subproject file contains a number of streams.

name	format	contents
`mdro_search_status`	XML	A search status list of the search results in the subproject.
`mdro_search_status+N`	MIME	The Mascot search results to be used in the combined ms_mascotresfile. N is the `id` attribute from the search status.

Master project lists

The lists of subprojects and searches are held in XML format in the stream rover_data+2 of the master project file. The presence of this stream indicates that this is the master file for a multi-file project.
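Once the stream (entry) names of a project file have been listed with a ZIP reader, detecting a master file reduces to looking for this stream. In this sketch `isMultiFileMaster` is hypothetical and the default name "rover_data+2" is taken from this page; the value returned by ms_distiller_master_project::getMasterProjectStreamName() can be passed instead.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Returns true if the list of streams in a Distiller project file contains
// the master project stream, i.e. the file is the master file of a
// multi-file project.
bool isMultiFileMaster(const std::vector<std::string>& streamNames,
                       const std::string& masterStreamName = "rover_data+2")
{
    return std::find(streamNames.begin(), streamNames.end(), masterStreamName)
           != streamNames.end();
}
```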

This stream holds the list of subprojects in the master project (at XPath /MasterProject/SubProject).

It also holds the id used to generate the name of the combined peptide summary cache stream (at XPath /MasterProject/Searches/Search/@id).

The name of the stream is available from ms_distiller_master_project.

std::string masterProjectStreamName = matrix_science::ms_distiller_master_project::getMasterProjectStreamName();

This data should be loaded into an ms_distiller_master_project.

matrix_science::ms_distiller_master_project masterProject;
masterProject.loadXmlFile(xmlSchemaDirectory, masterProjectPathname);

The schema for this is in distiller_master_project_1.xsd.

Distiller project parameters

The Distiller project parameters are held in the stream rover_data.

This data has the same format as that for single file Distiller project parameters.

Mascot results

The Mascot results are combined from the mdro_search_status+N stream from each of the subproject files, where N is the id from the search status in the subproject. Each subproject has its own Mascot results stream. See Search list and Mascot results for a single file Distiller project.

For a multi-file project, a single ms_mascotresfile is built from the Mascot results of all the subprojects rather than one per subproject: the results file of the first subproject is passed to the constructor, and the results file of each subsequent subproject is appended.

The last update time attribute of the first results file is used by Parser when determining the name of the subfolder used for the peptide summary cache file; see Mascot results.

std::vector<std::string> resfilePathnames;

... // extract each subproject's results file and populate resfilePathnames

matrix_science::ms_mascotresfile * resfile = new matrix_science::ms_mascotresfile(
        resfilePathnames[0].c_str(),
        0, // keepAliveInterval
        "<!-- %d seconds -->\n", // keepAliveText
        matrix_science::ms_mascotresfile::RESFILE_NOFLAG,
        cacheDirectory.c_str(),
        xmlSchemaDirectory);
for (std::size_t iResult = 1; iResult < resfilePathnames.size(); ++iResult) // append the remaining subprojects
{
    resfile->appendResfile(resfilePathnames[iResult].c_str());
}

Combined peptide summary cache

The peptide summary cache for the combined results files is in the stream rover_data+(cac0+N) of the master project file, where N is the id of the search from the master project and the sum is written in hexadecimal; for example, the search with id 1 uses "rover_data+cac1".

The ms_distiller_master_project should be used to retrieve the name of the stream containing the combined peptide summary cache file.

std::string pepsumStreamName = masterProject.getSearch(1).getCombinedPeptideSummaryCacheStreamName();

The ms_distiller_data from the master project should be used for the settings when creating the peptide summary filename and again when creating the peptide summary.

std::string pepsumFilename = matrix_science::ms_peptidesummary::getCacheFilename(
        *resfile,
        distillerData,
        1); // first search in the list

The peptide summary cache data should be copied to the correct file location before constructing the peptide summary, see Peptide summary cache for a single file Distiller project.

matrix_science::ms_peptidesummary * peptideSummary = new matrix_science::ms_peptidesummary(
        *resfile,
        distillerData,
        1);

Quantitation results

The quantitation data combined from all the subprojects is stored in two streams of the master project file.

The stream names are obtained from the ms_distiller_data of the master project file. This is the same as for a single file project, see the single file Quantitation results.

matrix_science::ms_quant_method quantMethod = distillerData.quantMethod;
matrix_science::ms_ms1quantitation quant(peptideSummary, quantMethod);
std::string cdbStreamName = distillerData.getQuant(1).getResultsStreamName();
std::string cacheStreamName = distillerData.getQuant(1).getCacheStreamName();

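... // extract both streams into zipExtractPath (which must end with a path separator, since the stream names are appended directly)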

quant.loadCdbFile(zipExtractPath + cdbStreamName, zipExtractPath + cacheStreamName);