PC specification for Mascot Server
Introduction
Any recent, high specification PC containing either Intel or AMD Ryzen processor(s) should make a suitable platform for Mascot. Systems with more than two processor sockets usually carry a substantial price premium. If you plan to do high throughput work, and need to run Mascot on more than two processors, a cluster of single or dual processor boxes will usually offer the most cost effective solution.
If you don’t have time to read the whole of this document, then choose a system with a 6-core or 8-core processor with high ‘turbo’ clock speed (at least 4GHz), at least 32GB of RAM (preferably 64GB), a 64-bit operating system, and the largest SATA SSD disks available (at least 2TB). For a 2 CPU Mascot licence, you will need a computer with at least 12 cores. It is beneficial to put the operating system, Mascot program files and database search results on a solid state NVMe disk. Sequence databases can be stored on a bulk SSD drive.
Our blog article “How long should a search take?” describes some issues relating to choosing how large a licence you need.
Hardware virtualisation is discussed in “Hardware virtualisation”.
Mascot Distiller hardware requirements are described in “Choosing hardware for Mascot Distiller”.
Processor (CPU)
The two main PC processor manufacturers are Intel and AMD. Matrix Science does not currently support Mascot on other processor architectures.
We have observed excellent scalability under Microsoft Windows and Linux for systems with modest numbers of cores per processor. That is, throughput from a system with an 8 core processor can be very close to double that obtained from a processor with 4 cores and the same architecture and clock speed. However, power management for processors with large numbers of cores means that the clock frequency is progressively reduced as the workload increases, which causes scaling to become non-linear.
Processor Speed
The main factors affecting Mascot performance are processor clock speed and number of cores.
It is not possible to compare processor speeds directly for different architectures. However, for any given processor model, the search speed will be proportional to processor speed unless:
- Disk access becomes a bottleneck, for example if the FASTA sequence database has to be read into memory from disk (see section on RAM below) or
- The processor cache is too small and causes a bottleneck between processor and memory.
The PassMark CPU benchmark is a reasonably good guide to the performance you can expect for Mascot searches. The important benchmark for processor speed is single thread performance, because the cost of a Mascot Server licence depends on the number of cores used for searches.
Multiple Cores
Mascot supports up to 4 cores for each CPU in the licence. For example, to use all cores on a system with:
- 1 x quad core processor requires a 1 CPU Mascot licence
- 1 x eight core processor requires a 2 CPU Mascot licence
- 1 x 16-core processor requires a 4 CPU Mascot licence
- 2 x eight core processors require a 4 CPU Mascot licence
However, it is always wise to have 1-4 spare cores for running reports, or downloading and preparing databases.
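The licence arithmetic above can be sketched in a few lines of Python. This is only an illustration of the 4-cores-per-CPU rule stated in this section; the function names are hypothetical and are not part of Mascot.

```python
import math

# From the text: each CPU in a Mascot licence covers up to 4 cores.
CORES_PER_LICENCE_CPU = 4


def licence_cpus_needed(search_cores: int) -> int:
    """Smallest CPU licence that lets searches use `search_cores` cores."""
    if search_cores < 1:
        raise ValueError("search_cores must be at least 1")
    return math.ceil(search_cores / CORES_PER_LICENCE_CPU)


def cores_left_spare(total_cores: int, licence_cpus: int) -> int:
    """Cores not used by searches, left free for reports and database updates."""
    used = min(total_cores, licence_cpus * CORES_PER_LICENCE_CPU)
    return total_cores - used


if __name__ == "__main__":
    for cores in (4, 8, 16):
        print(cores, "cores ->", licence_cpus_needed(cores), "CPU licence")
    print("12-core PC, 2 CPU licence, spare cores:", cores_left_spare(12, 2))
```

A 12-core system with a 2 CPU licence leaves 4 cores spare, which is within the 1-4 spare cores suggested above.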
Versions of Mascot prior to Mascot 2.3 were licensed on a “per socket” basis. Mascot versions prior to 2.2 are unlikely to work on most modern hardware.
Hybrid Architectures
Recent Intel processor families for desktop and mobile have a hybrid core architecture, starting from the 12th generation Intel Core processors. For example, Intel Core i7 12700 has 8 ‘P’ performance and 4 ‘E’ efficiency cores. Mascot can be run on both P and E cores, but the performance of the E cores is poor. If you have an Intel processor with hybrid architecture, we recommend configuring Mascot to run only on the P cores. Search for “ProcessorSet” in the Installation & Setup manual.
64-bit
Mascot requires 64-bit processor architecture. All modern Intel and AMD processors are 64-bit or, in Intel terms, have “Intel EM64T” technology. All recent versions of Windows and Linux are fully 64-bit.
Earlier versions of Mascot were available on both 32-bit and 64-bit systems. Support for 32-bit Linux was dropped in Mascot 2.5. Support for 32-bit Windows was dropped in Mascot 2.6.
Hyper-Threading or Simultaneous Multithreading technology
Hyper-Threading is available on Intel processors. Recent AMD processors (Zen microarchitecture) have similar technology called Simultaneous Multithreading. Hyper-Threading works by duplicating certain sections of the processor – those that store the architectural state – but not duplicating the main execution resources. This allows a Hyper-Threading equipped processor to pretend to be two “logical” processors to the host operating system, allowing the operating system to schedule two threads or processes simultaneously.
When HT is enabled, 2 logical “processors” per core will be visible to Mascot. So, for example, a single physical Intel Xeon® Gold 5210 with 14 cores and HT will appear to have 28 logical processors. A single physical AMD Ryzen 9 5900X with 12 cores and Simultaneous Multithreading will appear to have 24 logical processors.
Hyper-Threading can give up to a 12% performance increase. It is not equivalent to a true multi-core processor.
Hyper-Threading does not count towards the number of cores used for licensing purposes. For example, a 3-CPU licence with Intel Xeon® Gold 5210 with 14 cores uses 12 physical cores (24 logical processors).
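As a rough sketch of the counting described in this section, the following Python reproduces the logical processor figures quoted above. It assumes two hardware threads per core, and the function names are hypothetical.

```python
# From the text: each CPU in a Mascot licence covers up to 4 physical cores,
# and Hyper-Threading / SMT does not count towards the licence.
CORES_PER_LICENCE_CPU = 4


def logical_processors(physical_cores: int, threads_per_core: int = 2) -> int:
    """Logical processors visible to the operating system with HT/SMT enabled."""
    return physical_cores * threads_per_core


def licensed_usage(physical_cores: int, licence_cpus: int,
                   threads_per_core: int = 2) -> tuple[int, int]:
    """Return (physical cores used, logical processors used) for a licence."""
    used_physical = min(physical_cores, licence_cpus * CORES_PER_LICENCE_CPU)
    return used_physical, used_physical * threads_per_core


if __name__ == "__main__":
    print(logical_processors(14))   # 14-core processor with HT
    print(logical_processors(12))   # 12-core processor with SMT
    print(licensed_usage(14, 3))    # 3-CPU licence on a 14-core processor
```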
Intel or AMD Processors
We have found that performance for searches under identical conditions is roughly proportional to the results from these benchmarks. The performance ratios published here are similar but slightly harder to understand.
Random Access Memory (RAM)
RAM requirements depend on the selection of databases you plan to search as well as the number of simultaneous users of the Mascot Server system. Memory usage also depends on which task is being run, as described below.
RAM usage during database compression
Mascot Monitor makes a compressed copy of each FASTA database, in which the title lines have been removed and the sequence strings have been packed in a byte efficient manner. The compressed copy of each database is mapped into RAM and, if there is sufficient room, can even be locked into memory.
RAM usage during database compression is negligible for most databases, because only small parts of the database are kept in memory at any given time.
RAM usage during database compression (NCBIprot only)
The exception is NCBIprot. This database uses a taxonomy file, prot.av2taxid, that lists the NCBI taxonomy ID for each accession in the FASTA file. During database compression, Mascot imports prot.av2taxid into an efficient disk-based data structure, which is memory mapped for fast access. If the PC has sufficient RAM, the operating system will automatically cache frequently accessed portions of prot.av2taxid in memory. If there is insufficient RAM, prot.av2taxid lookups become disk bound, which greatly increases NCBIprot compression time.
If you intend to use NCBIprot, we recommend at least twice as much RAM as the size of the prot.av2taxid file, plus any operating system overhead. The current (July 2024) size is 22GB, so we recommend at least 48GB of RAM for NCBIprot. The size of the FASTA file itself is not important for RAM usage during database compression.
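The recommendation can be written as a one-line calculation. In the sketch below, the 4GB operating system overhead is an assumption chosen so that the result matches the 48GB figure quoted above; the text does not state the overhead, so adjust it for your own system. The function name is hypothetical.

```python
def ncbiprot_ram_gb(taxid_file_gb: float, os_overhead_gb: float = 4.0) -> float:
    """Recommended minimum RAM for NCBIprot compression.

    Twice the size of prot.av2taxid plus operating system overhead.
    The default overhead of 4 GB is an assumption, not a documented value.
    """
    return 2 * taxid_file_gb + os_overhead_gb


if __name__ == "__main__":
    # July 2024 size of prot.av2taxid quoted in the text: 22 GB
    print(ncbiprot_ram_gb(22), "GB")
```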
RAM usage during database search
Memory usage during a database search will fluctuate. Maximum memory usage can be estimated from the number of search threads. For each thread, you should allow approximately 60 MB for the operating system (Windows) and at least 200 MB for temporary storage. Thus, a search with 8 threads may use up to 2GB of RAM.
The actual memory usage depends on a number of factors, such as how many variable modifications are selected. A search with just 1 variable modification on a 1-CPU licence will peak at 500-600MB. The same search on a 4-CPU licence will use 4x the amount of RAM.
Mascot Server allows up to 10 concurrent searches by default (this soft limit can be increased). If you have more than one user, you should multiply the above guideline by the anticipated maximum number of concurrent searches.
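The per-thread guideline can be turned into a quick estimate. The sketch below uses the 60 MB and 200 MB per-thread figures from this section; since 200 MB is given as a minimum, treat the result as a planning figure rather than a hard ceiling. The function name is hypothetical.

```python
OS_MB_PER_THREAD = 60     # operating system allowance per thread (Windows)
TEMP_MB_PER_THREAD = 200  # temporary storage per thread ("at least")


def search_ram_mb(threads: int, concurrent_searches: int = 1) -> int:
    """Guideline RAM in MB for searches, each running `threads` threads."""
    per_search = threads * (OS_MB_PER_THREAD + TEMP_MB_PER_THREAD)
    return per_search * concurrent_searches


if __name__ == "__main__":
    print(search_ram_mb(8), "MB for one 8-thread search")
    print(search_ram_mb(8, concurrent_searches=10), "MB for ten at once")
```

One 8-thread search comes to 2080 MB, in line with the 2GB figure above; ten concurrent searches of that size would call for roughly 20GB.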
Memory mapping
When a search calls for a database that is not in memory, the search duration is increased by the time taken to read the database from disk. For a search that takes longer than a couple of minutes, this additional time will be negligible. For a short search, for example a PMF or an MS/MS search of a few spectra, reading from disk may take longer than the search itself.
Databases should always be memory mapped, even though a system might not have sufficient physical RAM to hold them all. Memory mapping only consumes virtual address space, and enables the file to be accessed more efficiently. However, it doesn’t guarantee that a particular database will be in memory when a search calls for it; some other process may have kicked it out. So, it may be advantageous to lock a small, frequently searched database into memory, guaranteeing that it is always resident in RAM.
Whether you have sufficient RAM to lock a database in memory can be estimated from the size of the FASTA file. For a protein database, the required RAM is roughly 80% of the FASTA file size, while for a nucleic acid database it is roughly 50%. Some examples are given in the following table.
Database | FASTA | RAM | Compression |
---|---|---|---|
Swiss-Prot | 272 MB | 245 MB | 1 : 0.9 |
NCBIprot | 339 GB | 271 GB | 1 : 0.8 |
Plants_EST | 18 GB | 8.9 GB | 1 : 0.5 |
We do not recommend that NCBIprot or UniRef100 be locked into memory.
In practice, it is rarely sensible for a database as large as an EST database to be locked in memory. Being composed of short stretches of nucleic acid sequence, it is not suitable for peptide mass fingerprint searches, and tends to be used as a database of last resort for large searches, where the overhead of reading it from disk represents only a small part of the total search time.
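The rule of thumb for locking a database in memory (roughly 80% of the FASTA size for protein, roughly 50% for nucleic acid) can be sketched as follows. The function names and the `free_ram_gb` parameter are hypothetical; actual compression ratios vary, as the Swiss-Prot row in the table shows.

```python
# Approximate fraction of the FASTA file size needed in RAM, from the text.
RAM_FRACTION = {"protein": 0.8, "nucleic": 0.5}


def locked_ram_gb(fasta_gb: float, kind: str) -> float:
    """Estimated RAM in GB needed to lock a compressed database in memory."""
    if kind not in RAM_FRACTION:
        raise ValueError("kind must be 'protein' or 'nucleic'")
    return fasta_gb * RAM_FRACTION[kind]


def can_lock(fasta_gb: float, kind: str, free_ram_gb: float) -> bool:
    """True if the estimate fits in the RAM you are prepared to dedicate."""
    return locked_ram_gb(fasta_gb, kind) <= free_ram_gb


if __name__ == "__main__":
    print(round(locked_ram_gb(339, "protein"), 1))  # NCBIprot estimate
    print(locked_ram_gb(18, "nucleic"))             # Plants_EST estimate
    print(can_lock(0.272, "protein", free_ram_gb=8))
```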
RAM usage during machine learning
Database search results can optionally be refined using machine learning. Refining takes place at the end of the database search. Mascot uses two third-party utilities for refining: MS2Rescore and Percolator.
Memory usage for MS2Rescore depends on the type of model being used. Predicting MS/MS fragmentation spectra using MS2PIP can use 500-1000MB per thread. Mascot allocates 4-32 threads for MS2Rescore depending on the number of logical cores on the system. Predicting retention times using DeepLC is multithreaded and may use up to 8-10GB of RAM. If your system has more than one Mascot user, multiply these figures by the number of concurrent users to get a worst-case scenario.
Memory usage for Percolator is negligible compared to the above factors.
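For planning purposes, the figures above can be scaled by thread count and number of concurrent users. The sketch below reports the MS2PIP and DeepLC ranges separately, because the text does not say whether the two peaks coincide. The thread count is passed in as a parameter, since the exact rule Mascot uses to pick between 4 and 32 threads is not given here. The function names are hypothetical, and 1000MB is treated as 1GB.

```python
def ms2pip_ram_gb(threads: int, users: int = 1) -> tuple[float, float]:
    """(low, high) GB for MS2PIP prediction at 500-1000 MB per thread."""
    return threads * 0.5 * users, threads * 1.0 * users


def deeplc_ram_gb(users: int = 1) -> tuple[float, float]:
    """(low, high) GB for DeepLC retention time prediction at 8-10 GB per user."""
    return 8.0 * users, 10.0 * users


if __name__ == "__main__":
    print("MS2PIP, 4 threads, 1 user:", ms2pip_ram_gb(4))
    print("MS2PIP, 32 threads, 2 users:", ms2pip_ram_gb(32, users=2))
    print("DeepLC, 2 users:", deeplc_ram_gb(users=2))
```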
RAM usage when viewing reports or exporting results
When you browse the results in a web browser or export to a file, relatively little memory is used compared to the above tasks.
Hard Disk Storage
The Mascot program files require little disk space in comparison to the sequence databases and the accumulating result files.
For the sequence databases, you will need to maintain free disk space of the order of 3 times the largest database. This is because, during a database update, there may be the current FASTA file, reference file and its associated compressed files plus the equivalent for the incoming database. Mascot also keeps a copy of one previous database. Current (July 2024) disk requirements for the common databases are:
Database | Total size of files (GB) | Max disk space (GB) |
---|---|---|
Swiss-Prot | 4.3 | 13 |
NCBIprot | 670 | 2010 |
Human_EST | 8.3 | 25 |
Mus_EST | 4.5 | 13 |
Plants_EST | 28 | 85 |
This illustrates how storage requirements are strongly dependent on which databases are required.
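The "3 times" guideline behind the table can be checked with a short calculation. This is a sketch only: the function name is hypothetical, and the published table values are rounded, so small differences are expected.

```python
def max_disk_gb(total_files_gb: float, factor: float = 3.0) -> float:
    """Disk space to allow for a database: current, incoming and one previous copy."""
    return total_files_gb * factor


if __name__ == "__main__":
    # Total file sizes (GB) taken from the table above (July 2024).
    sizes = {"Swiss-Prot": 4.3, "NCBIprot": 670, "Human_EST": 8.3}
    for name, size in sizes.items():
        print(f"{name}: allow about {max_disk_gb(size):.0f} GB")
```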
The space needed for result files depends on the overall search profile and on how long results are to remain on-line. Individual result file sizes range from 1MB for a peptide mass fingerprint search through to several GB for a large LC-MS/MS dataset.
Disk drives are very inexpensive, and most PCs support up to 2-3 NVMe drives and four SATA devices. It is difficult to have too much disk space, especially if you plan to search databases similar in size to NCBIprot.
If any databases are not memory mapped, short searches may be disk I/O bound, and a fast disk (e.g. Serial Attached SCSI, SAS) or a disk array (e.g. RAID), or solid state disk (NVMe, SSD) can then become an important factor in maximising throughput.
Operating System
Microsoft Windows
The following versions of Windows are supported:
Operating System | Max processor sockets | Max cores | Max RAM |
---|---|---|---|
Windows 8.1 (Pro) – 64 bit | 2 | 256 | 512 GB |
2012 R2 Server Standard – 64 bit | 64 | unlimited | 4 TB |
2012 R2 Server Datacenter – 64 bit | 64 | unlimited | 4 TB |
Windows 10 Pro – 64 bit | 2 | 256 | 2 TB |
Windows 10 Enterprise – 64 bit | 2 | 256 | 6 TB |
2016 Server Standard – 64 bit | 64 | unlimited | 24 TB |
2016 Server Datacenter – 64 bit | 64 | unlimited | 24 TB |
2019 Server Standard – 64 bit | 64 | unlimited | 24 TB |
2019 Server Datacenter – 64 bit | 64 | unlimited | 24 TB |
2022 Server Standard – 64 bit | 64 | unlimited | 256 TB |
2022 Server Datacenter – 64 bit | 64 | unlimited | 256 TB |
Windows 11 Pro – 64 bit | 2 | 256 | 2 TB |
Windows 11 Enterprise – 64 bit | 2 | 256 | 6 TB |
- Windows Server Core editions are not supported
- Windows Home Basic and Starter editions are not supported
- If you need to run Mascot on an earlier version of Windows, your licence allows running any earlier version of Mascot that supported the target Windows version
Linux
Mascot has very few system dependencies on Linux. The main requirement is glibc 2.17 or later, which is satisfied by most Linux distributions released since 2014. The table below lists the oldest supported versions for the major distributions.
Distribution | Minimum version | Recommended version |
---|---|---|
CentOS | CentOS 7 (glibc 2.17) | Red Hat Enterprise Linux, AlmaLinux or Rocky Linux (CentOS has been discontinued) |
Debian | Debian 9 ‘stretch’ (glibc 2.24) | Any later version |
Ubuntu | Ubuntu 16.04 LTS (glibc 2.23) | Any later version |
Amazon | Amazon Linux 2 (glibc 2.26) | Any recent version |
Mascot is tested in-house on various versions of Debian, Ubuntu and CentOS.
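If you are unsure whether a Linux system meets the glibc 2.17 requirement, one way to check is with Python's standard `platform.libc_ver()` function. This is a generic sketch and not a Mascot utility; the helper names are hypothetical, and on systems that do not use glibc the detection simply reports that glibc was not found.

```python
import platform

MIN_GLIBC = (2, 17)  # minimum glibc version stated in the text


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a string like '2.31' into (2, 31) so versions compare numerically."""
    return tuple(int(part) for part in text.split(".")[:2])


def glibc_ok(version_text: str, minimum: tuple[int, int] = MIN_GLIBC) -> bool:
    """True if the given glibc version is at least the minimum."""
    return parse_version(version_text) >= minimum


if __name__ == "__main__":
    lib, version = platform.libc_ver()
    if lib == "glibc" and version:
        status = "meets" if glibc_ok(version) else "does not meet"
        print(f"glibc {version} {status} the 2.17 requirement")
    else:
        print("glibc not detected on this system")
```

Comparing tuples of integers rather than strings matters here: as text, "2.9" would sort after "2.17", but numerically it is the older version.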
Web Server Software
Mascot requires a web server for administration and interactive use. In the case of Windows, Microsoft Internet Information Services (IIS) is the obvious choice unless you are committed to some other package. IIS is bundled with all supported Windows versions.
The Mascot installation program automatically configures IIS versions 8 and later.
Apache is a good choice for Linux. Apache can also be used under Windows.
Running a web browser on the same PC as the web server can take a surprising amount of processor time, so search times may suffer. If the same PC is also used for instrument control and data acquisition, you may need to adjust job priorities using Windows Task Manager to ensure that the instrument gets adequate priority.
Mascot Cluster Mode
A Mascot licence for 4 or more CPUs (i.e. 16 or more cores) automatically supports operation on a cluster of systems connected by a dedicated LAN. A cluster offers several advantages over a single, multi-processor system:
- Mass market, reliable, low cost PC hardware can be used.
- The cluster can be incrementally expanded as workload increases.
- Cluster nodes can use processors with a modest number of cores, circumventing the clock frequency throttling in processors with a high core count.
- The limited bandwidth of the PC bus is effectively multiplied by the number of systems in the cluster.
Further information here.