Mascot in the Cloud
Cloud computing is the IT paradigm of our times. More and more business services and applications are delivered as a subscription service, where the software runs in a data centre somewhere, operated by a third-party cloud platform. We are often asked whether you can run Mascot Server “in the cloud” and if so, whether it’s the right solution. The answers are yes and, as ever in life, it depends.
Two types of cloud delivery
It’s useful to separate cloud services into two broad categories: Software as a Service (SaaS) and Infrastructure as a Service (IaaS). SaaS used to be called simply web-based software: you pay a recurring subscription fee, the software runs on a third-party server and you access the functionality using a web browser. E-mail services are the classic example, and many common office applications like accounting packages and document processing are now delivered this way.
Infrastructure as a service is a different model. A cloud provider allocates a virtual machine (VM) for you, and you pay for the resource usage (CPU, memory, disk, network). You can install anything you like in the VM, although often it’s used as the backend for providing a software service. In fact, SaaS is often implemented on top of IaaS. In both cases, the VM runs somewhere in a data centre operated by the cloud provider.
Note that SaaS is not the same as a licence subscription. Some software product licences are available as a subscription, where you pay a monthly or annual fee for the licence, but the software is installed on your own PC. With software as a service, the subscription fee covers running and maintaining the hardware and software as well as the software licence.
Mascot subscription service?
SaaS is a good delivery model for a variety of web-based services that have relatively low resource requirements per user, little need for user-specific customisation, can benefit from economies of scale and have a large number of potential users. Conversely, SaaS is not a good choice for applications that use large or unpredictable amounts of CPU time or RAM per user or move multi-gigabyte files around. These lead to high underlying costs, so you never reach economies of scale: adding users will cost you more than the increase in subscription income.
We don’t provide Mascot Server or Mascot Distiller under a SaaS model, because proteomics data processing rarely fits the SaaS mould. Consider the typical Mascot workflow. A raw file arrives from the instrument; the file is peak picked and converted to MGF, then sent to Mascot Server for database searching; and the results are downloaded to a client application like Mascot Distiller, which runs quantitation and other post-processing.
Each step has huge variability. Both the raw file and the MGF file can be small or several gigabytes in size. Limiting the service to small files makes it less useful, but allowing large files makes network usage and storage per user unpredictable. A single experiment can have one or dozens of MS/MS runs. The database could be small or very large. The search duration is very hard to predict from the search parameters, unless you apply severe restrictions to control the size of the search space. Quantitation can take less or more time than the database search. Unless you’re using a basic workflow, you will need to customise the database, search parameters, quantitation method, and so on to match the experiment. Adding a bit of SaaS in the middle isn’t useful.
Mascot and Infrastructure as a Service
By contrast, infrastructure as a service is an alternative to buying and maintaining physical hardware. You simply click a button to provision a 32 vCPU virtual machine in the cloud when you need it. The obvious benefit is that there is no upfront capital payment for the hardware, and the VM is available within minutes. You just pay for resource usage, then delete the VM when you’re done.
Mascot Server can be run on any IaaS platform that provides virtual machines with Intel or AMD processors. We’ve added a help page, Mascot Server in the cloud, that describes the software architecture and provisioning requirements. The virtual CPU, RAM and disk requirements are basically the same as in general hardware virtualisation, except that a cloud platform typically gives less choice about CPU core mapping. The main thing to bear in mind is that Mascot Server is designed to provide a high-throughput database search service that is available 24/7. It’s not designed to be stopped and started for every database search.
If cost is not an issue, or your organisational policy requires moving everything into the cloud, then running Mascot Server in a cloud VM works just fine. In fact, performance and security are typically excellent, and cloud storage provides convenient data backup. If your workflow has any need for quantitation, it makes sense to run Mascot Server, Mascot Daemon and Mascot Distiller all in the same cloud for best inter-application network performance. Distiller and Daemon are normal applications that can be run in a VM without problems. However, since they are both GUI applications, you’ll need Remote Desktop access to the cloud VM, which is slightly less convenient and potentially less secure than running them on your own desktop.
An unlimited IT budget is rare, so there is almost always a question around cost. First of all, the Mascot licence is perpetual, and there is no additional fee for running Mascot in the cloud. There is also no fee for moving the licence from physical to virtualised hardware or vice versa. If you are buying a new licence, the cost is the same whether it is in-house or in the cloud.
The ongoing cost of cloud computing depends greatly on your usage patterns. As a rough rule of thumb, we have observed that the cost of running Mascot Server for one year on a cloud platform is in the same ballpark as buying a physical server with the same hardware specifications, assuming you run database searches continuously throughout the year. This applies to both 1-2 CPU licences and larger cluster installations. The reason is that you are essentially outsourcing hardware purchase, monitoring, maintenance, replacement and data centre security to a third party. You’re also paying for their electricity, network data transfer, storage and so on. Purely from a cost perspective, it makes sense to move Mascot to the cloud if the cost of your in-house overheads for physical hardware would exceed the cloud usage fee after the first year.
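To make the rule of thumb concrete, here is a minimal Python sketch of the comparison. It rests on a simplifying assumption of ours: the cloud bill is a flat annual fee, while the in-house option costs the server price up front plus a flat annual overhead. The function names and all figures are hypothetical and in arbitrary currency units; substitute numbers from your own quotes and bills.

```python
def cumulative_costs(years, cloud_fee_per_year, server_price, overhead_per_year):
    """Return (cloud_total, in_house_total) after `years` of continuous use.

    Assumed model (not an official pricing formula):
      cloud    = annual fee * years
      in-house = one-off server price + annual overhead * years
    """
    cloud_total = cloud_fee_per_year * years
    in_house_total = server_price + overhead_per_year * years
    return cloud_total, in_house_total


def cheaper_option(years, cloud_fee_per_year, server_price, overhead_per_year):
    """Name the cheaper option over the given period, or 'equal' on a tie."""
    cloud_total, in_house_total = cumulative_costs(
        years, cloud_fee_per_year, server_price, overhead_per_year
    )
    if cloud_total < in_house_total:
        return "cloud"
    if cloud_total > in_house_total:
        return "in-house"
    return "equal"


if __name__ == "__main__":
    # Hypothetical figures: one year of cloud usage costs about the same
    # as the server itself, as in the rule of thumb above.
    cloud_fee = 10_000
    server_price = 10_000
    for overhead in (4_000, 12_000):
        for years in (1, 3, 5):
            totals = cumulative_costs(years, cloud_fee, server_price, overhead)
            choice = cheaper_option(years, cloud_fee, server_price, overhead)
            print(f"overhead={overhead} years={years} totals={totals} -> {choice}")
```

Under this model, once the first year has paid for the server, each further year costs the annual overhead in-house and the annual fee in the cloud. The cloud stays cheaper over the long run only if the overhead is larger than the fee.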
If you only need Mascot intermittently, you can stop and start the cloud VM to save some money. If your licence is for 4 CPUs or more, you could configure Mascot in cluster mode with several search nodes, turn off nodes when they are not needed and keep the master node online for viewing search results. Mascot Server does not currently support automatic allocation of virtual resources based on current or predicted usage, also known as auto scaling. The reasons are the same ones that make SaaS a poor delivery model: it is very hard to predict how long processing a “random” data set will take, so it is also very hard to predict how many more CPUs should be added to reach the desired average search duration or number of searches per minute/hour. There is also a start-up delay when adding more nodes, as sequence databases need to be propagated before a search can start.
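As a rough illustration of how much turning off search nodes might save, the following Python sketch estimates a monthly bill for a cluster where the master node runs all month and the search nodes run only for a given number of hours. The hourly rates, node count and storage charge are hypothetical placeholders, and the billing model (a flat hourly rate while a VM is running, plus a fixed storage charge) is an assumption; check your provider’s actual pricing.

```python
HOURS_PER_MONTH = 730  # average month: 8760 hours / 12


def monthly_cluster_cost(master_rate, node_rate, node_count, node_hours,
                         storage_per_month=0.0):
    """Estimate one month's bill for a master node plus search nodes.

    master_rate, node_rate: hypothetical hourly VM prices.
    node_hours: hours per month that each search node is switched on.
    storage_per_month: assumed fixed charge for disks, which providers
        typically bill even while a VM is stopped.
    """
    if not 0 <= node_hours <= HOURS_PER_MONTH:
        raise ValueError("node_hours must be between 0 and HOURS_PER_MONTH")
    master_cost = master_rate * HOURS_PER_MONTH       # master stays online
    node_cost = node_rate * node_count * node_hours   # nodes run on demand
    return master_cost + node_cost + storage_per_month


if __name__ == "__main__":
    # Hypothetical rates: three search nodes, used 100 hours in the month.
    always_on = monthly_cluster_cost(0.5, 1.0, 3, HOURS_PER_MONTH)
    on_demand = monthly_cluster_cost(0.5, 1.0, 3, 100)
    print(f"always on: {always_on:.2f}, on demand: {on_demand:.2f}")
```

The estimate ignores the start-up delay described above, so it shows the cost side only; time spent propagating sequence databases to a restarted node is still billed as running time.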
We advise giving cloud computing a try to see what your actual monthly bill looks like. Provided you have a Mascot Server licence under warranty or support, we can give you a free, 1-CPU, 30-day licence for this purpose. We also provide a turn-key Amazon Machine Image (AMI) for provisioning Mascot Server on Amazon Web Services.
Keywords: cluster, licensing, pc hardware, virtual machine