A common requirement is to repeat all searches against, for example, a different or updated database. The standard Mascot Server reports provide a "Search Selected" or "Re-search all" button to repeat a search, but doing this manually is tedious for more than a few searches. Mascot Daemon also allows searches to be repeated as a follow-on from a search, using the original peak lists.
Repeating searches in a batch can easily be achieved using Mascot Parser. It is assumed you have access to a Mascot Server installation.
To repeat a search, nph-mascot.exe has to be run with command parameter 4, and the search data needs to be taken from a MIME format input file. The format of the input file is described in the Mascot Installation and Setup manual, chapter 8.
In its simplest form, the MIME format file has the following structure:
    ----12345
    Content-Disposition: form-data; name="QUE"

    MASS=Monoisotopic
    CLE=Trypsin
    ...
    1234.012
    1567.086
    ----12345--
where MASS, CLE, etc. are the search parameters, and the two numbers are simply two masses for a peptide mass fingerprint search. Note the standard MIME format header and terminating line.
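As a rough illustration of this layout, the following Python sketch assembles such a file for a peptide mass fingerprint search. The helper name build_mime_input and the fixed boundary string are assumptions copied from the example above and are not part of Mascot Parser. The blank line after the Content-Disposition header follows ordinary MIME conventions.

```python
def build_mime_input(params, masses, boundary="--12345"):
    """Return MIME-style search input as text (hypothetical helper).

    params   -- dict of search parameter names to values, e.g. {"CLE": "Trypsin"}
    masses   -- iterable of peptide masses, passed through as text
    boundary -- boundary string; "--" is prepended to form each delimiter line
    """
    lines = [
        "--" + boundary,                               # opening delimiter
        'Content-Disposition: form-data; name="QUE"',  # part header
        "",                                            # blank line ends the header
    ]
    lines.extend("%s=%s" % (key, value) for key, value in params.items())
    lines.extend(str(mass) for mass in masses)
    lines.append("--" + boundary + "--")               # terminating delimiter
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = build_mime_input(
        {"MASS": "Monoisotopic", "CLE": "Trypsin"},
        ["1234.012", "1567.086"],
    )
    print(example, end="")
```

Passing the masses as strings keeps their original formatting intact.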
You can access the search parameters easily in Mascot Parser by using the params() method of the ms_mascotresfilebase object. Another way is to iterate through the keys using ms_mascotresfilebase::getSearchParametersKeyValues(), which avoids having to type all parameter names directly.
The following example prints all search parameters except DB, which is changed to another database. INTERMEDIATE and RULES are also skipped for reasons explained below.
C++:

    ms_mascotresfilebase resfile = ms_mascotresfilebase::createResfile(filename);
    if (!resfile.isValid()) {
        /* Error handling... */
    }

    std::vector<std::string> keys, values;
    resfile.getSearchParametersKeyValues(keys, values);

    for (size_t i = 0; i != keys.size(); ++i) {
        std::string key = keys[i];
        std::string val = values[i];
        if (!val.empty() && key != "INTERMEDIATE" && key != "RULES" && key != "DB")
            std::cout << key << "=" << val << std::endl;
    }

    std::cout << "DB=My_database" << std::endl;

Perl:

    my $resfile = msparser::ms_mascotresfilebase::createResfile($filename);
    if (not $resfile->isValid()) {
        # Error handling...
    }

    my $keys = msparser::VectorString->new;
    my $values = msparser::VectorString->new;
    $resfile->getSearchParametersKeyValues($keys, $values);

    for my $i (0 .. $keys->size()-1) {
        my $key = $keys->get($i);
        my $val = $values->get($i);
        if ($val ne '' and $key ne "INTERMEDIATE" and $key ne "RULES" and $key ne "DB") {
            print $key, "=", $val, "\n";
        }
    }

    print "DB=My_database\n";

Java:

    ms_mascotresfilebase resfile = ms_mascotresfilebase.createResfile(filename);
    if (!resfile.isValid()) {
        /* Error handling... */
    }

    VectorString keys = new VectorString();
    VectorString values = new VectorString();
    resfile.getSearchParametersKeyValues(keys, values);

    for (int i = 0; i != keys.size(); ++i) {
        String key = keys.get(i);
        String val = values.get(i);
        // Compare string contents with equals(), not object identity
        if (!val.isEmpty() && !key.equals("INTERMEDIATE") && !key.equals("RULES") && !key.equals("DB"))
            System.out.println(key + "=" + val);
    }

    System.out.println("DB=My_database");

Python:

    resfile = msparser.ms_mascotresfilebase.createResfile(filename)
    if not resfile.isValid():
        pass  # Error handling...

    keys = msparser.VectorString()
    values = msparser.VectorString()
    resfile.getSearchParametersKeyValues(keys, values)

    for i in range(keys.size()):
        key = keys.get(i)
        val = values.get(i)
        if len(val) > 0 and key != "INTERMEDIATE" and key != "RULES" and key != "DB":
            print("%s=%s" % (key, val))

    print("DB=My_database")

C#:

    ms_mascotresfilebase resfile = ms_mascotresfilebase.createResfile(filename);
    if (!resfile.isValid()) {
        /* Error handling... */
    }

    VectorString keys = new VectorString();
    VectorString values = new VectorString();
    resfile.getSearchParametersKeyValues(keys, values);

    for (int i = 0; i != keys.Count; ++i) {
        String key = keys[i];
        String val = values[i];
        if (val != "" && key != "INTERMEDIATE" && key != "RULES" && key != "DB")
            Console.WriteLine("{0}={1}", key, val);
    }

    Console.WriteLine("DB=My_database");
For MS-MS data, the complete set of ion peaks could be megabytes or even gigabytes of data, and it may make no sense to copy the data into the repeat search file. nph-mascot.exe supports a "query" statement for this purpose, which is returned by getRepeatSearchString():
C++:

    // Query numbers run from 1 to getNumQueries() inclusive
    for (int q = 1; q <= resfile.getNumQueries(); q++)
        std::cout << resfile.getRepeatSearchString(q) << std::endl;

Perl:

    for my $q (1 .. $resfile->getNumQueries()) {
        print $resfile->getRepeatSearchString($q), "\n";
    }

Java:

    for (int q = 1; q <= resfile.getNumQueries(); q++)
        System.out.println(resfile.getRepeatSearchString(q));

Python:

    for q in range(1, 1 + resfile.getNumQueries()):
        print(resfile.getRepeatSearchString(q))

C#:

    for (int q = 1; q <= resfile.getNumQueries(); q++)
        Console.WriteLine(resfile.getRepeatSearchString(q));
When nph-mascot.exe reads in a query statement, it loads the original input data from the Mascot results (.dat) file. To this end, you also need to add the INTERMEDIATE parameter that points to the original results file:
C++:

    std::cout << "INTERMEDIATE=" << filename << std::endl;

Perl:

    print "INTERMEDIATE=", $filename, "\n";

Java:

    System.out.println("INTERMEDIATE=" + filename);

Python:

    print("INTERMEDIATE=%s" % filename)

C#:

    Console.WriteLine("INTERMEDIATE={0}", filename);
In the example code, the search is run by calling nph-mascot.exe with two parameters, and the repeat search data is piped into the process's standard input:
nph-mascot.exe 4 -commandline > tmp.txt
The 4 indicates that this is a repeat search. The -commandline parameter is used to prevent progress reports and HTML being written to standard out.
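A minimal Python sketch of this step is shown below, assuming the search input has already been assembled as text. The function name run_repeat_search and the relative path ./nph-mascot.exe are illustrative assumptions; in practice the path and working directory depend on where the Mascot Server cgi directory is installed.

```python
import subprocess


def run_repeat_search(search_input, command=("./nph-mascot.exe", "4", "-commandline")):
    """Pipe search_input to the command's standard input and return its standard output.

    The default command mirrors the invocation described in the text; the
    executable path is an assumption and will usually need to be adjusted.
    """
    completed = subprocess.run(
        list(command),
        input=search_input,   # repeat search data goes to standard input
        capture_output=True,  # collect standard output instead of writing tmp.txt
        text=True,
        check=False,          # errors are reported in the output text, so do not raise
    )
    return completed.stdout
```

Capturing standard output directly avoids the temporary file, but redirecting to a file as in the command above works equally well.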
For a successful search, the output (e.g. tmp.txt) will be of the form:
SUCCESS ../data/20031007/F001547.dat
where the data file name is output following the text "SUCCESS".
If an error occurs, the output will be of the form:
FATAL_ERROR: M00027 Sorry, the database (SwissProt) is not currently available for searching [M00027]
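The two output forms can be told apart with a few lines of code. The sketch below is an assumption-laden illustration rather than part of the example programs: parse_search_output is a hypothetical name, and the function assumes that the results file path contains no whitespace.

```python
def parse_search_output(output):
    """Classify nph-mascot.exe output (hypothetical helper).

    Returns ("ok", results_path) when the text contains SUCCESS followed by
    a file name, otherwise ("error", message).
    """
    position = output.find("FATAL_ERROR")
    if position != -1:
        # Keep the error code and message, collapsing line breaks to spaces
        return ("error", " ".join(output[position:].split()))

    tokens = output.split()
    if "SUCCESS" in tokens:
        index = tokens.index("SUCCESS")
        if index + 1 < len(tokens):
            # The token after SUCCESS is the new results file name
            return ("ok", tokens[index + 1])

    return ("error", "unrecognised output: " + output.strip())
```

Splitting on whitespace means the function accepts the file name on the same line as SUCCESS or on the following line.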
The example code uses a simple approach to compare the old and new results. For searches with PMF data, the top protein hits from both searches are compared, and if their scores differ by more than 10, the difference is reported.
For searches with MS-MS data, peptide matches are compared, and a score difference of more than 10 will be reported.
These comparison rules will most likely need to be adjusted to get the best results for your own data.
Remember that if the searches were performed with a much older version of Mascot Server, then the scores may have changed a little because minor changes have been made to Mascot to optimize the scoring. However, a score difference of greater than 10 is likely to indicate a new protein sequence in the database.
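The comparison rule described above can be sketched in Python as follows. This is not the code from the example programs: compare_scores is a hypothetical helper, and it assumes the scores have already been extracted into dictionaries keyed by query number (for MS-MS data) or by any fixed label for the top protein hit (for PMF data).

```python
def compare_scores(old_scores, new_scores, threshold=10.0):
    """Return (key, old, new) tuples where the score changed by more than threshold.

    Only keys present in both dictionaries are compared; the default
    threshold of 10 is the value used in the text and is easy to change.
    """
    differences = []
    for key in sorted(old_scores.keys() & new_scores.keys()):
        old, new = old_scores[key], new_scores[key]
        if abs(new - old) > threshold:
            differences.append((key, old, new))
    return differences


if __name__ == "__main__":
    old = {1: 45.0, 2: 30.0, 3: 12.5}
    new = {1: 46.0, 2: 55.0, 3: 1.0}
    for query, before, after in compare_scores(old, new):
        print("query %d: score changed from %.1f to %.1f" % (query, before, after))
```

Keeping the threshold as a parameter makes it straightforward to tighten or relax the rule when comparing results from different Mascot Server versions.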
Example code is provided in various programming languages:
The sample program takes a single input file, repeats the search and compares results as above. To run the program on a whole directory of files, use 'find' under Unix, or a FOR loop in a batch/cmd file under Windows. For example, to repeat all searches from the year 2002 under Unix:
# find ../data/2002???? -name \*.dat | xargs -n 1 repeat_search.pl
Remember that new results will go in today's directory, so be sure not to include that directory with 'find', or the repeated searches will themselves be repeated again and again.
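A portable alternative to 'find' is to collect the file names in a script and skip today's directory explicitly. The Python sketch below assumes that results are stored in date-named subdirectories of the form YYYYMMDD, as in the ../data/20031007 example earlier; the function name results_files is hypothetical.

```python
import datetime
import glob
import os


def results_files(data_dir, year, today=None):
    """Return sorted .dat paths under data_dir/<year>MMDD, skipping today's directory.

    today -- directory name to exclude (YYYYMMDD); defaults to the current date
    """
    today = today or datetime.date.today().strftime("%Y%m%d")
    found = []
    for directory in glob.glob(os.path.join(data_dir, "%s????" % year)):
        if os.path.basename(directory) == today:
            continue  # new results are written here; do not search them again
        found.extend(glob.glob(os.path.join(directory, "*.dat")))
    return sorted(found)


if __name__ == "__main__":
    for path in results_files("../data", "2002"):
        print(path)
```

Each returned path can then be passed to the repeat search program in turn.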