
Posted by Richard Jacob (November 14, 2016)

Benchmarking your Mascot Server

In an earlier blog post we talked about choosing the right hardware for a Mascot Server and mentioned that the major performance bottleneck is the CPU. We also looked at the PassMark PerformanceTest CPUBenchmark as a way of predicting the performance of a Mascot Server. In this article we will look at how to measure Mascot Server performance and compare one server to another.

The key to any comparison is running a data set that challenges the system sufficiently and keeping all the search parameters and other variables consistent between searches. We recommend using a reasonably large data set that searches the complete database, to make sure the search performance is limited by the CPU and not by disk access or other factors. You can use any search for a benchmark test, but we provide a data set here that has been tested and can be used as a community standard.

The test data set

A test data set was selected from publicly available files at ProteomeXchange. The sample we have chosen contains proteins from many different species and hence requires searching against the full SwissProt database. We do not use exactly the same search parameters or database as the researchers who provided the data as our aim is not to fully reproduce the results but rather to obtain something reasonable when using the default SwissProt database. We used three fractions from one sample, 130729-27672-12-ATH, from the proteomics portion of a "MuSt multiomics" project available on ProteomeXchange, project PXD003791.

Run the test searches

To prevent human error when entering the search parameters, we advise running the test searches by re-searching an existing search result. It is important to test the different systems in a consistent way. You should run the search four times. There is often an improvement in the search time after the first search because Mascot Server will have cached the database in RAM. The next three searches should have search durations within a few seconds of each other.

To perform the test, download the results file, benchmark_PXD003791.zip, unzip it, and place the extracted file in the Mascot data directory. The default locations are c:\inetpub\mascot\data for Windows and /usr/local/mascot/data for Linux. You can then open the search result with the following URL: http://localhost/mascot/cgi/master_results_2.pl?file=../data/benchmark_PXD003791.dat (if you are not accessing the search result from the Mascot Server computer, replace localhost with the name of that computer). Once the search result is open, click the Re-search button to open a search form. Then, without making any changes to the search parameters, click Submit to start the search. Do this three more times, each time starting from the search result of the previous test. Make sure the Mascot Server is not used for any other search during this period.
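As a small convenience when testing several servers, the result URL above can be built per host. This is only a sketch: the function name and the example host name are hypothetical, and the path is simply the one quoted in this article.

```python
def result_url(host: str = "localhost",
               dat_file: str = "benchmark_PXD003791.dat") -> str:
    """Build the URL that opens a result file from the Mascot data directory.

    The path below is the one quoted in this article; adjust it if your
    Mascot Server is installed under a different web path.
    """
    return f"http://{host}/mascot/cgi/master_results_2.pl?file=../data/{dat_file}"


# Opening the result on the Mascot Server computer itself
print(result_url())

# Opening it from another machine ("mascot-server-01" is a made-up host name)
print(result_url("mascot-server-01"))
```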

You can check that Mascot Server is running correctly by viewing the Windows Task Manager or running the Linux top command. During the search, does Mascot max out the number of cores covered by the license? For example, a 2 CPU license running on a CPU with 10 physical cores will max out 8 cores, or 80% of the CPU power. You may see the CPU usage fluctuate every 10 or so seconds as it processes each chunk of data. If the Mascot Server application (nph-mascot.exe) is using nearly 100% of the CPU, or nearly all of the portion of the CPU that is covered by the license, the search times should be an accurate representation of the Mascot Server performance.
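The expected CPU usage in the example above can be worked out as follows. This is a rough sketch, and it assumes that each licensed CPU covers 4 cores, which is inferred from the example (a 2 CPU license maxing out 8 cores) rather than taken from the license terms; check your own license before relying on it.

```python
# Assumption inferred from the example in the text: 2 CPU license -> 8 cores
CORES_PER_LICENSED_CPU = 4


def expected_cpu_fraction(licensed_cpus: int, physical_cores: int) -> float:
    """Fraction of total CPU power a search should use when it is CPU bound."""
    # The search cannot use more cores than the machine physically has
    usable_cores = min(licensed_cpus * CORES_PER_LICENSED_CPU, physical_cores)
    return usable_cores / physical_cores


# 2 CPU license on a 10 core processor: 8 of 10 cores
print(expected_cpu_fraction(2, 10))  # 0.8

# 1 CPU license on a 4 core processor: all cores in use
print(expected_cpu_fraction(1, 4))   # 1.0
```

If the usage you observe during a search is well below this fraction, the search time is probably being limited by something other than the CPU.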

Analyzing the results

Once the searches have been run, open the Mascot Server search log and look at the values in the Dur(ation) column, which are measured in seconds. The search times themselves give an indication of the Mascot Server performance. Because the size of the SwissProt database varies, you can also express the result as the number of protein entries per second per core: divide the number of sequence entries in the database by the search time, and then by the number of cores used for the search. The number of protein entries in a database is reported at the top of the search report. As a benchmark, a computer using an Intel i7-4790K processor (4 cores) and running a 1 CPU Mascot Server license takes on average 1030 seconds to process the benchmark, which is about 134 protein entries per second per core.
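The calculation can be sketched in a few lines. The durations and the entry count below are illustrative placeholders, not measured values: they were picked only so that the result matches the figures quoted above. Use the Dur(ation) values from your own search log and the entry count from the top of your own search report.

```python
from statistics import mean


def entries_per_second_per_core(num_entries: int,
                                duration_s: float,
                                cores_used: int) -> float:
    """Database entries searched per second per core."""
    return num_entries / duration_s / cores_used


# Hypothetical durations in seconds for the four runs; the first run is
# discarded because the database was not yet cached in RAM.
durations = [1100, 1032, 1029, 1029]
warm_runs = durations[1:]

# The three warm runs should agree to within a few seconds
print("spread (s):", max(warm_runs) - min(warm_runs))

average_s = mean(warm_runs)

# Placeholder entry count; read the real number from your search report
num_entries = 552_000

rate = entries_per_second_per_core(num_entries, average_s, cores_used=4)
print(f"{rate:.0f} protein entries per second per core")
```

Comparing this normalized figure, rather than the raw search time, makes it easier to compare servers that searched different SwissProt releases or that run with different numbers of licensed cores.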
