Matrix Science Mascot Parser toolkit
 
Using Percolator scores

Mascot Server ships with Percolator, which is an algorithm that uses semi-supervised machine learning to improve the discrimination between correct and incorrect spectrum identifications.

Percolator was developed by Lukas Käll, Jesse D Canterbury, Jason Weston, William Stafford Noble and Michael J MacCoss at the University of Washington, Department of Genome Sciences.

Percolator scores can be shown in Mascot Server reports instead of Mascot ions scores.

Workflow

Percolated results are available both for old and new searches. However, they are only available when a decoy search has been performed. The Mascot results file is not changed by Percolator, but additional files for each results file are created by running ms-createpip.exe and percolator.exe.

ms-createpip.exe calculates core features from the peptide match data. Mascot Server 3.0 and later can also fetch predicted features from other machine learning tools like MS2Rescore using an adapter interface. There are two sets of pip (Percolator input) and pop (Percolator output) files, whose names depend on Mascot options and ML adapter parameters. These are enumerated below.

Name
  Meaning
  Dependencies

Core pip file
  Meaning: Percolator input (pip) file created by ms-createpip.exe, containing features calculated from target and decoy peptide matches.
  Dependencies: The core pip file name and contents depend on mascot.dat options:
    • PercolatorFeatures (string)
    • PercolatorUseRT (true/false)
    • PercolatorUseProteins (true/false)
    • PercolatorExeFlags (string)
    • PercolatorTargetRankScoreThreshold (int)
    • PercolatorTargetRankRelativeThreshold (double)

Final pip file
  Meaning: Pip file created by insert_predicted_data.pl from the core pip file, containing features predicted by machine learning tools (e.g. MS2Rescore).
  Dependencies: The final pip file name and contents depend on the core pip file, the above mascot.dat options, plus any ML adapter parameters (e.g. with MS2Rescore, the MS2PIP model name).

Pop file
  Meaning: Percolator output (pop) file created by percolator.exe from a pip file. Percolator creates one pop file for target matches and one for decoy matches.
  Dependencies: The target and decoy pop file names depend on the above mascot.dat options plus any ML adapter parameters. The contents depend only on the pip file given as argument.

Workflow on the client side

When you use Parser in a client application, you need the results file and the target/decoy pop files. Normally these are downloaded from Mascot Server, and the easiest way is using ms_http_client_search.

Parser reads pop files from a fixed location:

The detailed workflow is:

  1. Mascot Server 3.0 and later: Set PERCOLATE=1 search parameter.
    Mascot Server 2.8 and earlier: Set Percolator option to 1 in mascot.dat on server side, so that Mascot runs ms-createpip.exe and percolator.exe at the end of the search (ExecAfterSearch).
  2. Run a Mascot search, making sure that the DECOY option is on. Download the results file.
  3. Download config/mascot.dat.
  4. Download the pop files.
  5. Check that there are sufficient queries in the search by comparing ms_mascotresfilebase::getNumQueries() with ms_mascotoptions::getPercolatorMinQueries().
  6. Check that there are sufficient sequences in the search by comparing ms_mascotresfilebase::getNumSeqsAfterTax() with ms_mascotoptions::getPercolatorMinSequences().
  7. Generate the pip and pop file names: call ms_mascotresfilebase::setPercolatorFeatures(). Use the same ms_mascotoptions object.
  8. Specify MSPEPSUM_PERCOLATOR as part of the flags2 parameter when creating an ms_peptidesummary object.
  9. Optionally, specify target FDR with ms_mascotresults_params::setTargetFDR().

If the downloaded pop file name doesn't match the one generated by setPercolatorFeatures(), the Mascot options may differ between client and server, or the Parser version may differ from the Mascot Server version. An easy workaround is to rename the downloaded pop files to match the file names returned by ms_mascotresfilebase::getPercolatorFileNames().
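The renaming workaround can be sketched as follows. The helper name is hypothetical and the code does not call Parser: it assumes you already hold the list of downloaded pop files and the list of expected names (for example, as obtained from getPercolatorFileNames()), paired by position.

```python
from pathlib import Path


def rename_pop_files(downloaded, expected):
    """Rename each downloaded pop file to the name Parser expects.

    Both arguments are equal-length sequences of paths, paired by position.
    Returns the list of new paths.
    """
    if len(downloaded) != len(expected):
        raise ValueError("need exactly one expected name per downloaded file")
    renamed = []
    for src, dst in zip(downloaded, expected):
        src, dst = Path(src), Path(dst)
        if src != dst:
            # Path.replace overwrites any existing file at dst.
            src.replace(dst)
        renamed.append(dst)
    return renamed
```

Because an existing file at the destination is overwritten, it is safer to run this on a private working copy of the results directory.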

Workflow on the server side

For completeness, the workflow in Mascot Server is as follows. The workflow is implemented in the server-side script mascot/bin/refine_results_with_ml.pl. The detailed steps are:

  1. Run a Mascot search, making sure that the DECOY option is on.
  2. Check that there are sufficient queries in the search by comparing ms_mascotresfilebase::getNumQueries() with ms_mascotoptions::getPercolatorMinQueries().
  3. Check that there are sufficient sequences in the search by comparing ms_mascotresfilebase::getNumSeqsAfterTax() with ms_mascotoptions::getPercolatorMinSequences().
  4. Run refine_results_with_ml.pl.

refine_results_with_ml.pl performs the steps:

  1. Generate the core pip file name.
  2. Retrieve the pip file name with ms_mascotresfilebase::getPercolatorFileNames().
  3. Run ms-createpip.exe, specifying the output filename as the core pip file.
  4. Generate the final pip and pop file names: call ms_mascotresfilebase::setPercolatorFeatures(). If any ML adapters should be enabled, pass the ML adapter parameters as the vector argument.
  5. If the core and final pip files are different, run insert_predicted_data.pl with the core pip file as --pipfile_in and the final pip file as --pipfile_out, giving the ML adapter parameters as arguments.
  6. Run percolator.exe with command-line parameters from getPercolatorExeFlags(), giving the final pip and pop file names as arguments.
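The only branch in the steps above is whether insert_predicted_data.pl needs to run at all. The sketch below shows that decision in isolation. It is not the logic of refine_results_with_ml.pl itself; the function name and the example file names are hypothetical, and no command-line arguments are constructed.

```python
def refinement_tools(core_pip, final_pip):
    """Return the tools to run, in order, for the given pip file names."""
    tools = ["ms-createpip.exe"]
    # Predicted features are only inserted when an ML adapter changes
    # the pip file name, i.e. when core and final names differ.
    if core_pip != final_pip:
        tools.append("insert_predicted_data.pl")
    tools.append("percolator.exe")
    return tools


# Hypothetical file names, for illustration only.
print(refinement_tools("F001234.core.pip", "F001234.core.pip"))
print(refinement_tools("F001234.core.pip", "F001234.ms2rescore.pip"))
```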

Finally:

  1. If the final pip and pop files were generated successfully, specify MSPEPSUM_PERCOLATOR as part of the flags2 parameter when creating an ms_peptidesummary object.
  2. Optionally, specify target FDR with ms_mascotresults_params::setTargetFDR().

The steps that generate the pip and pop files can be performed automatically after a search by specifying the appropriate options in getExecAfterSearch().

There is also a static function, staticGetPercolatorFileNames(), that can be called to get the file names without creating an ms_mascotresfile_msr or ms_mascotresfile_dat object.

Percolator Scores

A 'Percolator score' is calculated from the posterior error probability (PEP) by

    percolatorScore = -10 * log10(PEP)

This is analogous to the Mascot ions score, which is -10*log10(p-value). The posterior error probability is similar to but not the same as a p-value.
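As a quick numerical illustration of the formula, the conversion can be written as a small standalone function. The function name is hypothetical and this is not Parser code; it only evaluates the expression given above.

```python
import math


def percolator_score(pep):
    """Convert a posterior error probability (0 < pep <= 1) to a score."""
    if not 0.0 < pep <= 1.0:
        raise ValueError("PEP must be in the interval (0, 1]")
    return -10.0 * math.log10(pep)


for pep in (0.5, 0.05, 0.01, 0.001):
    print(f"PEP {pep:<6} -> score {percolator_score(pep):.2f}")
```

Each tenfold decrease in the PEP adds 10 to the score, which is the same scale behaviour as the Mascot ions score.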

Percolator processes the rank 1 matches found by Mascot, plus any other ranks as defined by ms_mascotoptions::getPercolatorTargetRankScoreThreshold and ms_mascotoptions::getPercolatorTargetRankRelativeThreshold.

Peptide matches that were not processed by Percolator get a score based on the rank 1 match, scaled by the Mascot ions score:

    rank2PercolatorScore = (rank2MascotIonsScore / rank1MascotIonsScore) * rank1PercolatorScore

If the PEP value for rank 1 is exactly 1, it is reset to 0.9999. This is to ensure that there is a tiny amount of spread in scores for lower ranking peptides.
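The scaling rule and the PEP reset can be combined in a short sketch. The names are hypothetical and the code is an illustration of the two rules above, not Parser's implementation. Rejecting a non-positive rank 1 ions score is an added assumption, made only to avoid dividing by zero.

```python
import math

PEP_RESET_VALUE = 0.9999  # used in place of a rank 1 PEP of exactly 1


def rank1_percolator_score(rank1_pep):
    """Score for the rank 1 match, applying the reset for PEP == 1."""
    if rank1_pep == 1.0:
        rank1_pep = PEP_RESET_VALUE
    return -10.0 * math.log10(rank1_pep)


def scaled_percolator_score(ions_score, rank1_ions_score, rank1_pep):
    """Score for a match that Percolator did not process.

    The rank 1 Percolator score is scaled by the ratio of Mascot ions scores.
    """
    if rank1_ions_score <= 0:
        raise ValueError("rank 1 ions score must be positive (assumption)")
    return (ions_score / rank1_ions_score) * rank1_percolator_score(rank1_pep)


# Rank 1: ions score 60, PEP 0.01. Rank 2: ions score 30.
print(scaled_percolator_score(30.0, 60.0, 0.01))
# With a rank 1 PEP of exactly 1, the reset keeps the result above zero.
print(scaled_percolator_score(30.0, 60.0, 1.0))
```

Without the reset, a rank 1 PEP of 1 would give a rank 1 score of 0, and every lower-ranking match would also score exactly 0.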

The function ms_peptide::getPercolatorScores() returns the posterior error probability which is used by Mascot Parser. A different score (calculated by Percolator itself) and the qValue can also be obtained, but these are unused by Mascot Parser.

When the MSPEPSUM_PERCOLATOR flag is specified, all Mascot scores are replaced with Percolator-derived scores. The original Mascot ions score for a peptide is still available by calling getPercolatorScores(). The following table describes how each of the existing Mascot functions has been changed to return Percolator values.

Mascot Parser function
  How value is calculated

ms_peptide::getIonsScore()
  -10 * log10(posterior error probability)
ms_protein::getPeptideIonsScore()
  Same value as getIonsScore() above, except that a minor correction for large proteins is applied in the same way as for the Mascot score.
ms_protein::getScore()
  Calculated using the Percolator scores rather than Mascot scores. Same rules for MudPIT and standard scoring apply.
ms_protein::getNonMudpitScore()
  Calculated using the Percolator scores rather than Mascot scores. Same rules for MudPIT scoring apply.
ms_mascotresults::getPeptideIdentityThreshold()
  Calculated as -10 * log10(sigthreshold), so sigthreshold = 0.05 gives a threshold score of ~13.
ms_mascotresults::getAvePeptideIdentityThreshold()
  With Percolator, the threshold is the same for every query, so this is exactly the same as getPeptideIdentityThreshold() above.
ms_mascotresults::getMaxPeptideIdentityThreshold()
  With Percolator, the threshold is the same for every query, so this is exactly the same as getPeptideIdentityThreshold() above.
ms_mascotresults::getHomologyThreshold()
  With Percolator, there is no homology threshold, so this always returns 0.
ms_mascotresults::getHomologyThresholdForHistogram()
  With Percolator, there is no homology threshold, so this always returns 0.
ms_mascotresults::getPeptideExpectationValue()
  Returns the posterior error probability by calculating it back from the score using 10 ^ (-score / 10). The same value can also be obtained by calling ms_peptide::getPercolatorScores() and retrieving the posterior error probability value.
ms_mascotresults::getProbFromScore()
  Calls getPeptideExpectationValue() as above.
ms_mascotresults::getIonsScoreHistogram()
  Returns a vector of Percolator scores rather than Mascot scores.
ms_mascotresults::getProteinScoreForHistogram()
  Returns the Percolator protein score rather than the Mascot protein score.
ms_mascotresults::getNumHitsAboveIdentity()
  Any peptide with a posterior error probability less than the significance value specified will be counted.
ms_mascotresults::getNumDecoyHitsAboveIdentity()
  Any peptide with a posterior error probability less than the significance value specified will be counted.
ms_mascotresults::getNumHitsAboveHomology()
  No homology thresholds, so always returns the same number as getNumHitsAboveIdentity().
ms_mascotresults::getNumDecoyHitsAboveHomology()
  No homology thresholds, so always returns the same number as getNumDecoyHitsAboveIdentity().
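Three of the calculations in the table are simple enough to reproduce by hand, which can help when checking values returned by Parser. The sketch below is standalone Python with hypothetical function names; it mirrors the formulas in the table and does not call Parser.

```python
import math


def identity_threshold(sig_threshold):
    """Threshold score for a significance threshold such as 0.05."""
    return -10.0 * math.log10(sig_threshold)


def pep_from_score(score):
    """Recover the posterior error probability from a Percolator score."""
    return 10.0 ** (-score / 10.0)


def count_hits_above_identity(peps, sig_threshold):
    """Count matches whose PEP is strictly below the significance value."""
    return sum(1 for pep in peps if pep < sig_threshold)


print(round(identity_threshold(0.05), 2))   # about 13
print(pep_from_score(20.0))                 # about 0.01
print(count_hits_above_identity([0.001, 0.04, 0.05, 0.2], 0.05))
```

Because the threshold is derived only from the significance value, it is the same for every query, which is why the average and maximum identity thresholds coincide with it.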

You currently cannot specify MSPEPSUM_PERCOLATOR with an Integrated error tolerant search.

The following configuration functions are relevant: