Analyzing the disulfide bond structure of the SARS-CoV-2 Spike protein
The SARS-CoV-2 virus has been under intense investigation since its discovery two years ago. Almost every feasible analytical method has been applied and, according to PubMed, that includes over 3000 proteomics papers. Immunologically speaking, the most important protein is the spike protein, which is involved in the initial binding to the target cells. An example of the more encompassing spike protein analyses is “Virus-Receptor Interactions of Glycosylated SARS-CoV-2 Spike and Human ACE2 Receptor”. It covers the analysis of the protein sequence, glycosylation and disulfide bonds, which are then used for 3D modeling and interaction studies. The data is publicly available in PRIDE project PXD019939. In this blog post we use Mascot Server to reanalyse the disulfide bond data from the publication using the crosslinking feature introduced in Mascot Server 2.7.
The spike protein used in the analysis was purified from a prototype DNA vaccine that was designed to be a mimetic immunogen that can be used to stimulate an immune response (ref). The purified protein was not reduced but was alkylated, then submitted to enzymatic digestion, either with a multi-enzyme combination of trypsin, Lys-C and Glu-C (EKRZ) or with α-Lytic Protease (ASTV). On Mascot Server, we created enzyme definitions for both with appropriate cleavage rules. The publication also covers an in-depth study of the glycosylation structures on the spike protein. For the disulfide bond analysis, however, the samples were deglycosylated with PNGase F after digestion so that the glycans would not interfere.
The resulting peptides were analyzed with a Thermo Orbitrap Fusion Lumos using ETD fragmentation. We searched the two data sets against the SARS-CoV-2, contaminants and UniProt Human databases, all of which are available as predefined definitions in Mascot. We processed the raw data with Mascot Distiller using the default Thermo processing options. As many of the crosslinked peptides have higher precursor masses and charge states than typical peptides, we changed the Distiller peak list preferences to deconvolute the fragment ions, producing singly charged MH+ values. This allows for correct identification of 3+ fragment ions if any are present.
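To illustrate what deconvoluting to singly charged MH+ values means, here is a minimal Python sketch of the arithmetic. It is not Mascot Distiller's implementation; the function names are hypothetical and the sketch assumes the charge state of each fragment peak is already known.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton


def to_singly_charged(mz: float, charge: int) -> float:
    """Convert an observed m/z at a known charge to the equivalent MH+ value."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    # Neutral mass plus one proton: remove the other (charge - 1) protons.
    return mz * charge - (charge - 1) * PROTON_MASS


def to_mz(mh_plus: float, charge: int) -> float:
    """Inverse conversion: the m/z at which an MH+ value appears at a given charge."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (mh_plus + (charge - 1) * PROTON_MASS) / charge


# A 3+ fragment observed at m/z 500.0 is reported as a single MH+ value.
print(round(to_singly_charged(500.0, 3), 6))
```

Once every fragment peak is expressed as MH+, the search only has to consider singly charged fragment series, whatever the original charge state was.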
In the publication, the authors used the combination of sequence coverage and crosslinking to determine the exact residue where the signal peptide was cleaved. This was found to be at the glutamine in position 13. We made a custom database containing the spike protein sequence minus the signal peptide, to make sure the truncated N-terminus would be available under the enzymatic cleavage conditions used. For the crosslinking method, we copied the default “Disulfide bridge in Lysozyme” method as the starting point, then edited the accession numbers to use the spike protein sequence “SPIKE_SARS” and its edited version minus the signal peptide.
The complete search conditions were as listed:
Crosslinking : Disulfide bridge in SARS2 Spike glycoprotein
Enzyme : Trypsin+Lys-C+Glu-C
Variable modifications : Carbamidomethyl (C), Oxidation (M), Gln->pyro-Glu (N-term Q), Deamidated (NQ)
Linkers : Xlink:Disulfide (C)
Peptide mass tolerance : ± 10 ppm
Fragment mass tolerance : ± 0.1 Da
Max missed cleavages : 2
Instrument type : ETD-TRAP
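The peptide mass tolerance is relative (ppm) while the fragment tolerance is absolute (Da), which matters for crosslinked peptides because their precursor masses are large. The short Python sketch below shows how a ppm tolerance translates into a mass window. The helper names are hypothetical and the 3000 Da mass is only an illustrative value, not one taken from the search results.

```python
def ppm_window(mass: float, tol_ppm: float) -> tuple[float, float]:
    """Return the (low, high) mass window for a relative tolerance in ppm."""
    delta = mass * tol_ppm / 1e6
    return mass - delta, mass + delta


def within_ppm(observed: float, theoretical: float, tol_ppm: float = 10.0) -> bool:
    """True if the observed mass lies within tol_ppm of the theoretical mass."""
    return abs(observed - theoretical) <= theoretical * tol_ppm / 1e6


# At 10 ppm, a 3000 Da precursor is matched within roughly +/- 0.03 Da.
low, high = ppm_window(3000.0, 10.0)
print(f"{low:.3f} to {high:.3f} Da")
```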
Searches of both enzymatic preparations identified far more disulfide bonds than expected from reading the paper. Here we have exported the results into the xiVIEW csv format along with the protein hits in the FASTA format and the MGF file.
We then created a new data set in xiVIEW and uploaded the MGF, CSV and FASTA files in order to visualize the results. The α-Lytic Protease digest was processed in the same way.
The authors only report a disulfide bond between Cys15 and Cys136 in the paper, as that was important in proving that the signal peptide cleavage happened before Cys15, but we can see from the results uploaded to PRIDE that other possible disulfide bonds were identified. We compared the results from Mascot Server to the known disulfide bonds for the spike protein as recorded in UniProt. Mascot Server identified 28 disulfide bonds, of which 8 were previously known from UniProt. As the material analyzed was a mimetic immunogen rather than material extracted and purified from a SARS-CoV-2 virus preparation, it is not clear whether the 20 additional disulfide bonds can occur naturally. However, the concept of using a mimetic immunogen implies that they should also be found in nature.
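The comparison against UniProt boils down to a set operation on pairs of cysteine positions. The Python sketch below shows one way to do it. The function names are hypothetical, and the input lists are deliberately tiny: they contain only the three bonds named in this post, with Cys15-Cys136 standing in for the UniProt annotation, so they are not the full set of 28 identified or 8 known bonds.

```python
def normalise(pair):
    """Order a cysteine pair so that (136, 15) and (15, 136) compare equal."""
    return tuple(sorted(pair))


def split_known_novel(identified, known):
    """Split identified disulfide bonds into those already annotated and the rest."""
    known_set = {normalise(p) for p in known}
    identified_set = {normalise(p) for p in identified}
    confirmed = sorted(identified_set & known_set)
    novel = sorted(identified_set - known_set)
    return confirmed, novel


# Illustrative subset only: the three bonds mentioned in this post.
identified = [(136, 15), (301, 432), (291, 379)]
known = [(15, 136)]  # stand-in for the annotated bonds, not the full UniProt list

confirmed, novel = split_known_novel(identified, known)
print("already annotated:", confirmed)
print("not annotated:", novel)
```

Normalising the pair order first avoids counting the same bond twice when a search reports the two peptides in either order.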
Let’s take a look at some of these high scoring matches that are not listed in UniProt. We can see there is plenty of sequence coverage in both directions of fragmentation and both sides of the disulfide bond. The match to LPDDFTGCVIAWNSNNLDSK C8<–Xlink:Disulfide–>C1 CTLK, amino acid positions 301-432 is a reasonable example, with Mascot score 78 and expect value 5.8e-8:
We have previously published some guidelines for validating intact crosslinked peptide matches, and we used those to evaluate the matches here. The above match was identified with charge states of 2+ and 3+. In both cases, the rank 2 matches scored significantly lower than the rank 1 matches and did not have significant scores themselves. There is no evidence to suggest the match is not correct.
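As a sanity check on a match like this, the expected precursor mass of a disulfide-linked pair can be worked out by hand: add the two peptide masses and subtract two hydrogens for the bond. The Python sketch below does this for the LPDDFTGCVIAWNSNNLDSK / CTLK pair. It uses standard monoisotopic residue masses, assumes no other modifications on either peptide, and the function names are hypothetical; it is not how Mascot Server computes the value internally.

```python
# Standard monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565    # Da, added once per linear peptide
HYDROGEN = 1.007825  # Da, monoisotopic mass of a hydrogen atom


def peptide_mass(sequence: str) -> float:
    """Neutral monoisotopic mass of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def disulfide_linked_mass(alpha: str, beta: str) -> float:
    """Neutral mass of two peptides joined by one disulfide bond (loss of 2 H)."""
    return peptide_mass(alpha) + peptide_mass(beta) - 2 * HYDROGEN


alpha = "LPDDFTGCVIAWNSNNLDSK"
beta = "CTLK"
linked = disulfide_linked_mass(alpha, beta)
print(f"{alpha} + {beta}: {linked:.4f} Da")
```

Because the linked cysteines are not free, they carry no Carbamidomethyl group; any additional variable modification would simply add its delta mass to this total.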
And here is another example, YNENGTITDAVDCALDPLSETK.C C13<—Xlink:Disulfide—>C1 K.CYGVSPTK.L + Deamidated, amino acid positions 291-379 with Mascot score 72 and expect value 1.7e-7:
In this case the match is present in only one charge state, 4+. The rank 2 match is quite close in score, only 10 less, while the rank 3 match had a very low score of 2. The disulfide bond localization is the same for the rank 1 and rank 2 matches, but the deamidation location changes, slightly favoring N4 over N2. Again, there is no evidence to suggest that the match is anything but correct. It will be interesting to see how knowledge about the SARS-CoV-2 Spike protein expands over time and whether other analyses report the same disulfide bonds.
Keywords: crosslinking, SARS-CoV-2