Mascot 2.7 and later can search for intact crosslinked peptides as well as peptides with cleavable crosslinks. A typical example of intact crosslinking is disuccinimidyl suberate (DSS), which chemically bonds lysines in two different peptides. An MS-cleavable linker like disuccinimidyl sulfoxide (DSSO) also links lysines but mostly cleaves during CID.
Intact crosslinked peptide matches consist of the alpha peptide sequence, the beta peptide sequence and the linker. The intact link can occasionally cleave during MS/MS, which leaves behind a linker fragment called the monolink. Monolinks are modelled as ordinary variable modifications.
Conversely, cleavable crosslinks are always modelled as monolinks. The alpha and beta peptides hold different ends of the linker fragment, and there is one peptide-spectrum match for alpha and another for beta. However, if cleaving efficiency is less than 100%, some of the links may survive MS or MS/MS intact, so the search results may still contain intact crosslinked peptide matches. The rules are defined in the crosslinking method.
A linker is often chemically quenched during sample processing when one end reacts with a non-peptide molecule. The distinction between quenched linkers and linker fragments is made in the neutral loss elements of the linker definition. Mascot treats both types simply as monolinks.
By default, Parser opens crosslinked search results in a backwards-compatible mode, where intact crosslinked matches are invisible. Adding support in client code is more involved than just setting a constructor flag. Many of the methods either take a new argument in Parser 2.7, or they interpret existing arguments differently.
Term | Synonyms | Definition |
---|---|---|
linear peptide | | a single peptide sequence without branches or cycles |
(intact) crosslink | chemical crosslink; non-cleavable link | a link between two peptides that survives MS/MS (e.g. DSS/BS3, DST, ...) |
cleavable crosslink | | a link between two peptides that cleaves during CID or is cleaved by other means before MS/MS |
monolink | type 0 modification; deadend link | a linear peptide with a linker fragment whose other end is unattached |
looplinked peptide | type 1 modification; intrapeptide link; intramolecular link | a single peptide where two sites are linked, forming a loop |
crosslinked peptide | type 2 modification; interpeptide link; intermolecular link | two linear peptides joined by the intact crosslink |
alpha peptide, beta peptide | alpha chain, beta chain | the two linear peptides part of a crosslinked unit; alpha is the heavier or longer one |
homo-crosslinked peptide | | type 2 link where alpha and beta peptides have the same sequence |
hetero-crosslinked peptide | | type 2 link where alpha and beta peptides have different sequences |
protein intralink | | a protein where two non-overlapping peptides have an intact crosslink |
protein interlink | | two proteins with an intact crosslink (either homo-crosslinked or hetero-crosslinked proteins) |
crosslinking method | | a set of parameters and settings that define which types of links to search |
Note that a looplinked peptide (type 1) is different from a homo-crosslinked peptide (type 2).
Note also that either or both of the alpha or beta peptide in a crosslinked pair may contain looplinks. For simplicity of terminology, a linear peptide denotes both a single peptide sequence (with or without looplinks) and the alpha or beta peptide in a crosslinked pair (with or without looplinks).
To determine whether a results file might contain matches with intact crosslinks, looplinks or monolinks, use ms_mascotresfilebase::getCrosslinkingMethod(). If the search has a crosslinking method, some of the linear peptide matches in the 'peptides' section may contain monolinks or looplinks. There is currently no easy check to determine whether there are any matches with monolinks or looplinks, other than iterating through all peptide matches. No special constructor flags are needed for accessing linear peptide data.
To determine whether there actually are crosslinked matches, use ms_mascotresfile_dat::anyCrosslinkedMatches(). If the test returns true, then the results file contains crosslinked matches and the file should be opened in integrated crosslink mode.
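Three modes are available:
- Linear-only mode (the default): intact crosslinked matches are invisible.
- Integrated crosslink mode (MSPEPSUM_CROSSLINK_INTEGRATED): linear and intact crosslinked matches are reported together.
- Crosslink-only mode (MSPEPSUM_CROSSLINK_ONLY): only intact crosslinked matches are reported.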
The helper function ms_mascotresfilebase::get_ms_mascotresults_params() does not set MSPEPSUM_CROSSLINK_INTEGRATED automatically. This means code written for Parser 2.6 and earlier always opens crosslinked search results in linear-only mode.
The following search types are not supported in combination with crosslinking:
These restrictions may be lifted in a future release.
To open the file in integrated mode, pass the flag MSPEPSUM_CROSSLINK_INTEGRATED to the matrix_science::ms_peptidesummary constructor.
The integrated crosslink mode is similar to the integrated error tolerant mode (see Integrated error tolerant search) and the integrated spectral library mode (see Opening the file in integrated mode). A query can contain a mixture of up to 20 linear and intact crosslinked peptide matches (see ms_mascotresults::getMaxRankValue()).
The identity and homology thresholds are determined by pooling match data from both linear and crosslinked peptides. For example, the total number of trials (qmatch) is the sum of qmatch from the linear summary section and qmatch from the crosslinked summary section.
Integrated mode is the preferred mode for opening crosslinked search results.
To open the file in crosslink-only mode, pass the flag MSPEPSUM_CROSSLINK_ONLY to the matrix_science::ms_peptidesummary constructor.
If the search contains only linear matches, then opening the file in this mode means there are no matches available at all. Using this mode is not recommended, because you will get misleading significance thresholds.
The crosslinking method defines the linkers used in the search as well as parameters like whether to search for protein intralinks or protein interlinks. See the Mascot help page for more detail.
Use ms_mascotresfilebase::getCrosslinkingMethod() or ms_peptidesummary::getCrosslinkingMethod() to access the crosslinking method object.
Mascot Server 3.0 introduces a new file format, Mascot Search Results (MSR). The MSR format is fully documented in the Mascot Server Installation & Setup manual, chapter 8. Match data for crosslinks and looplinks is stored in the table psm__linked_sites, and monolinks are stored in psm__monolinks.
Mascot Server 2.8 and earlier saved results in a MIME-format file with .dat extension, now called the dat28 format. The format is documented in chapter 8 of the Mascot Server Installation & Setup manual. The information below gives more detail about how crosslinked matches are stored in this format.
When Mascot parses the crosslinking method at the beginning of the search, it assigns a variable mod number to each linker specificity. For example, if the method defines linker specificities Xlink:DSS (Protein N-term) and Xlink:DSS (K), the first one could be varmod number 1 and the second varmod number 2. The linked sites are encoded in the variable mods string, and additionally in the linked_sites attribute.
Here is an example of a match with one intact link between alpha K4 (1:4:2) and beta K5 (2:5:2) and no variable modifications.
    q9913_p1=-0.000397,11,66,16.87,1010001000200001201,0,0
    q9913_p1_sequence_1=1,801.470831,NLGKVGSK,0000200000;"ALBU_HUMAN":0:453:460:1
    q9913_p1_sequence_2=1,746.403463,ASSAKQR,000002000;"ALBU_HUMAN":0:215:221:1
    q9913_p1_linked_sites=1:4:2,2:5:2
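As an illustration of how the linked_sites value relates to the per-sequence variable mods strings, here is a minimal C++ sketch that parses the value from the example above. It is not part of Mascot Parser: `LinkedSite`, `parseLinkedSites` and `varModAt` are hypothetical names, and the field meanings (component, 1-based position, variable mod number) are inferred from this one example rather than taken from a format specification.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// One end of an intact link as written in the linked_sites attribute.
// Field meanings are inferred from the example, not from a format spec.
struct LinkedSite {
    int component;  // 1 = alpha (sequence_1), 2 = beta (sequence_2)
    int position;   // 1-based residue position within that sequence
    int varMod;     // variable mod number of the linker specificity (assumed)
};

// Split a value such as "1:4:2,2:5:2" into LinkedSite records.
// Hypothetical helper for illustration only.
std::vector<LinkedSite> parseLinkedSites(const std::string &value) {
    std::vector<LinkedSite> sites;
    std::istringstream entries(value);
    std::string entry;
    while (std::getline(entries, entry, ',')) {
        std::istringstream fields(entry);
        LinkedSite site{};
        char sep1 = 0, sep2 = 0;
        fields >> site.component >> sep1 >> site.position >> sep2 >> site.varMod;
        // Every entry must look like "int:int:int".
        assert(!fields.fail() && sep1 == ':' && sep2 == ':');
        sites.push_back(site);
    }
    return sites;
}

// Character of a variable mods string at a 1-based residue position.
// Index 0 is the N-terminus, so residue i is found at index i.
char varModAt(const std::string &varMods, int position) {
    assert(position >= 0 && position < static_cast<int>(varMods.size()));
    return varMods[position];
}
```

With the example data, `parseLinkedSites("1:4:2,2:5:2")` yields two records, and `varModAt("0000200000", 4)` and `varModAt("000002000", 5)` both return `'2'`, matching the variable mod number in the third field of each record.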
If a match has a monolink, the variable mod number of the linker is output in the variable mods string. The monolink string records the index of the corresponding neutral loss element in the linker definition. In the next example, the alpha peptide has a monolink at K8, and the intact link is between alpha K2 (1:2:2) and beta K5 (2:5:2). The monolink has neutral loss index 1, which (in this case) corresponds to monolink code [A]. NL index 0 (in this case) corresponds to the intact link [I]. The monolink string is only present if the peptide match has monolinks or looplinks.
    q17957_p1=-0.000976,10,65,29.12,0010001011200001001,0,0
    q17957_p1_sequence_1=2,1590.855163,LKCASLQKFGER,00200000200000;"ALBU_HUMAN":0:222:233:1
    q17957_p1_sequence_2=1,746.403463,ASSAKQR,000002000;"ALBU_HUMAN":0:215:221:1
    q17957_p1_linked_sites=1:2:2,2:5:2
    q17957_p1_monolink_1=00000000100000
    q17957_p1_monolink_2=000000000
If the match has a looplink, one end of the looplink is encoded exactly like a monolink. The other end can be inferred from the looplinked_sites attribute. Linear peptides may have monolinks and looplinks, as in the example below. The peptide has a looplink between K2 and K4. One end, K2, is encoded like a monolink, except its NL index corresponds to the intact link [I]. The peptide also has a normal monolink at K7.
    q9022_p1=3,1599.934799,0.000313,8,HKPKATKEQLK,39,0020000200000,17.70,1000011012000002000,0,0;"ALBU_HUMAN":0:559:569:1
    q9022_p1_monolink=0000000100000
    q9022_p1_looplinked_sites=2:4
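The following C++ sketch shows one way to read the three strings of this example together. It works on the raw strings only and does not use Mascot Parser; `SiteKind`, `decodeDigit`, `parseLoopLink`, `monoLinkNlIndex` and `classifySites` are hypothetical names. It assumes a single "start:end" pair in looplinked_sites, that the linker's variable mod number is known to the caller, and that letters in a mods string continue from 'A' = 10.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

enum class SiteKind { Unmodified, LoopLink, MonoLink, OtherVarMod };

// Decode one character of a mods string. Digits map to 0-9; treating
// letters as 'A' = 10 and upwards is an assumption of this sketch.
int decodeDigit(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    assert(c >= 'A' && c <= 'Z');
    return 10 + (c - 'A');
}

// Parse a single "start:end" pair such as "2:4" (1-based positions).
std::pair<int, int> parseLoopLink(const std::string &value) {
    std::istringstream in(value);
    int start = 0, end = 0;
    char sep = 0;
    in >> start >> sep >> end;
    assert(!in.fail() && sep == ':' && start < end);
    return {start, end};
}

// Neutral loss index at a site; 0 when the monolink string is absent.
int monoLinkNlIndex(const std::string &monoLink, std::size_t index) {
    if (monoLink.empty()) return 0;
    assert(index < monoLink.size());
    return decodeDigit(monoLink[index]);
}

// Classify each index of the variable mods string (index 0 = N-terminus).
// linkerVarMod is the variable mod number assigned to the linker specificity.
std::vector<SiteKind> classifySites(const std::string &varMods,
                                    const std::pair<int, int> &loop,
                                    int linkerVarMod) {
    std::vector<SiteKind> kinds(varMods.size(), SiteKind::Unmodified);
    for (std::size_t i = 0; i < varMods.size(); ++i) {
        const int pos = static_cast<int>(i);
        if (pos == loop.first || pos == loop.second) {
            // Both ends come from looplinked_sites; only one end is
            // flagged in the variable mods string.
            kinds[i] = SiteKind::LoopLink;
            continue;
        }
        const int modNum = decodeDigit(varMods[i]);
        if (modNum == 0) continue;  // nothing at this site
        kinds[i] = (modNum == linkerVarMod) ? SiteKind::MonoLink
                                            : SiteKind::OtherVarMod;
    }
    return kinds;
}
```

For the match above, classifying "0020000200000" with the loop pair parsed from "2:4" and linker variable mod 2 marks K2 and K4 as looplink ends and K7 as a monolink. `monoLinkNlIndex` returns 1 at K7 and 0 at K2, which is why the neutral loss index alone cannot distinguish "no monolink" from "first neutral loss element".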
Finally, the most complicated case is a crosslinked peptide where the alpha and beta peptides have variable mods (like Oxidation (M)), monolinks and looplinks, and the variable mods additionally have fragment neutral losses. Matches this complicated are not commonly seen.
The ms_peptide class represents the peptide-spectrum match. The original API design assumes a linear peptide. Some interface changes have been made to allow fetching data specific to alpha and beta peptides without introducing a new class or a new set of methods.
Most ms_peptide methods take a new argument psmComponent of type ms_peptide::PSM. This is one of PSM_COMPLETE, PSM_CROSSLINK_ALPHA or PSM_CROSSLINK_BETA.
The rules for whether the parameter needs to be specified are:
- If the match is a linear peptide, leave psmComponent at its default value (PSM_COMPLETE) or omit the parameter.
- If the match is an intact crosslinked peptide, set psmComponent to PSM_CROSSLINK_ALPHA or PSM_CROSSLINK_BETA to access the alpha or beta data, and PSM_COMPLETE to access match-level data.

In either case, the match may contain monolinks encoded as variable modifications, as well as looplinks.
In an intact crosslinked match, PSM_COMPLETE represents only the match-level data, such as match score. Where reasonable, methods may return data concatenated or summed from the alpha and beta peptides. For example, ms_peptide::getPeptideStr(true, ms_peptide::PSM_COMPLETE) returns a string where the alpha sequence is concatenated with the beta sequence. The two sequences are separated by "][", where the characters mark the alpha C-terminus and beta N-terminus, respectively. This ensures the concatenated variable mods string, primary NL string and related strings align correctly with the concatenated sequence string.
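To see why the "][" separator keeps the strings aligned, here is a small self-contained C++ sketch using the sequences and variable mods strings from the q9913 example. It only manipulates strings and does not call Mascot Parser; `joinSequences`, `joinModStrings` and `residueForModIndex` are hypothetical names, and the sketch assumes each per-component mods string has one extra character for each terminus.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Join alpha and beta sequences as described for PSM_COMPLETE.
std::string joinSequences(const std::string &alpha, const std::string &beta) {
    return alpha + "][" + beta;
}

// Per-component mods strings carry one character for each terminus,
// so plain concatenation lines up with the joined sequence.
std::string joinModStrings(const std::string &alphaMods,
                           const std::string &betaMods) {
    return alphaMods + betaMods;
}

// Sequence character described by index i of a joined mods string.
// Index 0 (alpha N-terminus) and the last index (beta C-terminus)
// have no character in the joined sequence, so they are rejected.
char residueForModIndex(const std::string &joinedSeq, std::size_t i) {
    assert(i >= 1 && i <= joinedSeq.size());
    return joinedSeq[i - 1];
}
```

Joining "NLGKVGSK" and "ASSAKQR" gives a 17-character sequence and a 19-character mods string. The two '2' characters in the joined mods string sit at indices 4 and 15, and `residueForModIndex` returns 'K' for both. Indices 9 and 10 map to ']' and '[', that is, the alpha C-terminus and the beta N-terminus.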
The table below summarises the return values when the match is an intact crosslinked peptide and the method depends on psmComponent. Methods that always return match-level data, like ms_peptide::getDelta(), do not take a new parameter.
Method | Return value with PSM_COMPLETE | Return value with PSM_CROSSLINK_ALPHA | Return value with PSM_CROSSLINK_BETA |
---|---|---|---|
ms_peptide::getAmbiguityString() | "" | alpha string | beta string |
ms_peptide::getAnyProteinTermination() | false, false | alpha values for isNterminus, isCterminus | beta values for isNterminus, isCterminus |
ms_peptide::getComponentStr() | alpha component if alpha component == beta component, otherwise "" | alpha component | beta component |
ms_peptide::getLocalModsNlStr() | alpha string + beta string | alpha string | beta string |
ms_peptide::getLocalModsStr() | alpha string + beta string | alpha string | beta string |
ms_peptide::getLoopLinks() | alpha looplinks followed by beta looplinks | alpha looplinks | beta looplinks |
ms_peptide::getMissedCleavages() | -1 if alpha/beta value is -1, otherwise alpha value + beta value | alpha value | beta value |
ms_peptide::getMonoLinkStr() | alpha string + beta string | alpha string | beta string |
ms_peptide::getMrCalc() | alpha value + beta value + linker mass | alpha value | beta value |
ms_peptide::getPeptideLength() | alpha length + length("][") + beta length | alpha length | beta length |
ms_peptide::getPeptideStr() | alpha string + "][" + beta string | alpha string | beta string |
ms_peptide::getPrimaryNlStr() | alpha string + beta string | alpha string | beta string |
ms_peptide::getSummedModsNlStr() | alpha string + beta string | alpha string | beta string |
ms_peptide::getSummedModsStr() | alpha string + beta string | alpha string | beta string |
ms_peptide::getVarModsStr() | alpha string + beta string | alpha string | beta string |
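The numeric rows of the table can be summarised in a few lines of C++. This is only a model of the documented behaviour, not the Parser implementation: the function names are hypothetical, the linker mass is supplied by the caller, and "-1 if alpha/beta value is -1" is read here as "-1 if either value is -1".

```cpp
#include <cassert>

// Match-level missed cleavages: -1 if either component reports -1
// (interpretation assumed by this sketch), otherwise the sum.
int combinedMissedCleavages(int alpha, int beta) {
    if (alpha == -1 || beta == -1) return -1;
    return alpha + beta;
}

// Match-level length counts the two-character "][" separator.
int combinedLength(int alphaLength, int betaLength) {
    assert(alphaLength > 0 && betaLength > 0);
    return alphaLength + 2 + betaLength;
}

// Match-level calculated Mr: both components plus the linker mass.
double combinedMrCalc(double alphaMr, double betaMr, double linkerMass) {
    return alphaMr + betaMr + linkerMass;
}
```

For the q9913 example, `combinedLength(8, 7)` gives 17, which is the length of "NLGKVGSK][ASSAKQR".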
The table below summarises the return values when the match is a linear peptide and the method depends on psmComponent. The argument is ignored in all cases.
Method | Return value with PSM_COMPLETE, PSM_CROSSLINK_ALPHA, PSM_CROSSLINK_BETA |
---|---|
ms_peptide::getAmbiguityString() | ambiguity string |
ms_peptide::getAnyProteinTermination() | isNterminus, isCterminus |
ms_peptide::getComponentStr() | peptide component |
ms_peptide::getLocalModsNlStr() | local mods NL string |
ms_peptide::getLocalModsStr() | local mods string |
ms_peptide::getLoopLinks() | looplinks |
ms_peptide::getMissedCleavages() | missed cleavages |
ms_peptide::getMonoLinkStr() | monolink string |
ms_peptide::getMrCalc() | MrCalc |
ms_peptide::getPeptideLength() | peptide sequence length |
ms_peptide::getPeptideStr() | peptide sequence |
ms_peptide::getPrimaryNlStr() | primary NL string |
ms_peptide::getSummedModsNlStr() | summed mods NL string |
ms_peptide::getSummedModsStr() | summed mods string |
ms_peptide::getVarModsStr() | variable mods string |
For backwards compatibility, the delta returned by ms_searchparams::getVarModsDelta() is the mass of the intact link. If you open crosslinked search results in Parser 2.6 or earlier, which are unaware of the new attributes, it will appear as if monolinks and looplinks all have the same delta. This is correct for looplinks but not for monolinks.
Determining the type of the variable mod number is a sequential procedure.
Processing intact link and looplink data is easiest in a separate loop. For intact links:
Processing looplink data is the same, except it is typically easiest to do it separately for alpha and beta peptides:
Note that the monolink index zero could mean either lack of modification (if the variable mods string also has a zero) or the first monolink in the linker definition. It's important to check both the variable mods string and the monolink string.
The following C++ code illustrates processing the variable mods, intact links, looplinks and monolinks of a linear peptide match.
    // convertToInt() is a user-supplied helper that maps one character of a
    // mods string to its integer value.
    std::string varModsStr = peptide.getVarModsStr();
    std::string monoLinkStr = peptide.getMonoLinkStr();
    ms_linker_site_vector intactLinks = peptide.getIntactLinks();
    ms_linker_site_vector loopLinks = peptide.getLoopLinks(ms_peptide::PSM_COMPLETE);

    const int numSites = static_cast<int>(varModsStr.length());
    for (int i = 0; i < numSites; ++i) {
        int modNum = convertToInt(varModsStr[i]);
        if (modNum == 0)
            continue;  // No modification at this site.

        // Monolink string could be empty, so check first.
        int monoLinkNum = 0;
        if (!monoLinkStr.empty())
            monoLinkNum = convertToInt(monoLinkStr[i]);

        // Is it an intact link?
        if (0 < intactLinks.getVarModIdxOfLinkedSite(ms_peptide::PSM_COMPLETE, i)) {
            // Yes, it's an intact link. Process in a separate loop as described above.
            continue;
        }

        // Is it a looplink in either PSM component?
        if (0 < loopLinks.getVarModIdxOfLinkedSite(ms_peptide::PSM_COMPLETE, i)) {
            // Yes, it's a looplink. Process in a separate loop as described above.
            continue;
        }

        double delta = 0.0;  // Mass delta of the modification at site i.

        // Is it a monolink?
        const ms_modification *monoLinkMod = resfile.getMonoLinkModification(modNum, monoLinkNum);
        if (monoLinkMod) {
            delta = monoLinkMod->getDelta(MASS_TYPE_MONO);
            continue;
        }

        // Not intact link, monolink or looplink, so it's a regular variable mod.
        delta = resfile.params().getVarModsDelta(modNum);
    }
Processing the alpha or beta peptide in a crosslinked match is very similar; simply replace ms_peptide::PSM_COMPLETE with ms_peptide::PSM_CROSSLINK_ALPHA or ms_peptide::PSM_CROSSLINK_BETA where relevant.
Protein inference and the ms_protein API are not affected by the presence of monolinks or looplinks, since these are just variable modifications.
However, there are a few details to consider with intact crosslinked peptides. A peptide match is assigned to a protein hit when the alpha sequence, the beta sequence or both appear in the protein. Methods that return sequence-level data, such as ms_protein::getPeptideStart(), now take a new parameter psmComponent. The change is analogous to the ms_peptide methods for an intact crosslinked peptide match. In fact, the easiest way to check whether psmComponent is assigned to the protein is by looking at the return value of getPeptideStart(): if it's -1, the psmComponent is not assigned to the protein hit.
The definition of duplicate peptide has been extended:
No client code changes are needed; duplicate checking is done internally, as before. (For more details, see Peptide match duplicates.)
Protein inference is not affected by intact crosslinks. That is, the intact crosslink between two proteins is not enough to cluster them in the same protein family. Clustering requires sharing the same significant alpha or beta sequence.
If a protein has any crosslinked peptides, these are ignored when emPAI is calculated.
The table summarises the return values when at least one psmComponent is assigned to the protein hit.
Method | Return value with PSM_COMPLETE | Return value with PSM_CROSSLINK_ALPHA | Return value with PSM_CROSSLINK_BETA |
---|---|---|---|
ms_protein::getPeptideStart() | -1 | alpha value (if alpha is in this protein) or -1 | beta value (if beta is in this protein) or -1 |
ms_protein::getPeptideEnd() | -1 | alpha value (if alpha is in this protein) or -1 | beta value (if beta is in this protein) or -1 |
ms_protein::getPeptideMultiplicity() | -1 | alpha value (if alpha is in this protein) or -1 | beta value (if beta is in this protein) or -1 |
ms_protein::getPeptideFrame() | -1 | alpha value (if alpha is in this protein) or -1 | beta value (if beta is in this protein) or -1 |
ms_protein::getPeptideResidueBefore() | '?' | alpha value (if alpha is in this protein) or '?' | beta value (if beta is in this protein) or '?' |
ms_protein::getPeptideResidueAfter() | '?' | alpha value (if alpha is in this protein) or '?' | beta value (if beta is in this protein) or '?' |
When the peptide match assigned to the protein hit is a linear peptide, the psmComponent argument is ignored.
Method | Return value with PSM_COMPLETE, PSM_CROSSLINK_ALPHA, PSM_CROSSLINK_BETA |
---|---|
ms_protein::getPeptideStart() | start position |
ms_protein::getPeptideEnd() | end position |
ms_protein::getPeptideMultiplicity() | multiplicity |
ms_protein::getPeptideFrame() | frame or -1 |
ms_protein::getPeptideResidueBefore() | residue before |
ms_protein::getPeptideResidueAfter() | residue after |
ms_aahelper::calcFragmentsEx() and related methods can fragment a crosslinked peptide. Fragmentation produces single-cleavage product ions. First, the beta sequence is treated as a modification attached to the alpha peptide, and the alpha sequence is fragmented as usual. Then the roles are reversed. The final list of fragments contains single-cleavage ions from alpha and single-cleavage ions from beta. There are no double-cleavage ions where the alpha and beta fragment simultaneously.
The ms_fragment class has two new methods: isFromAlpha() and isFromBeta(). Fragmenting a linear peptide produces ms_fragment objects where both methods return false. Fragmenting a crosslinked peptide produces fragments where one or the other flag is true. Because there are no double-cleavage ions, isFromAlpha() and isFromBeta() cannot both be true at the same time.
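The single-cleavage rule can be modelled with a short C++ sketch. It only enumerates ion labels and origin flags. It does not calculate masses and does not use ms_aahelper or ms_fragment. `FragmentSketch`, `addSingleCleavage` and `singleCleavageIons` are hypothetical names, and restricting the output to b and y series is a simplification made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-in for the origin information carried by a fragment.
struct FragmentSketch {
    std::string label;  // e.g. "b3"
    bool fromAlpha;
    bool fromBeta;
};

// Append single-cleavage b and y ions for one chain of n residues.
// The other chain is treated as a modification and is never cleaved.
void addSingleCleavage(std::size_t n, bool isAlpha,
                       std::vector<FragmentSketch> &out) {
    for (std::size_t i = 1; i < n; ++i) {
        out.push_back({"b" + std::to_string(i), isAlpha, !isAlpha});
        out.push_back({"y" + std::to_string(i), isAlpha, !isAlpha});
    }
}

// Fragment alpha first (beta acts as a modification), then swap roles.
std::vector<FragmentSketch> singleCleavageIons(const std::string &alphaSeq,
                                               const std::string &betaSeq) {
    assert(!alphaSeq.empty() && !betaSeq.empty());
    std::vector<FragmentSketch> ions;
    addSingleCleavage(alphaSeq.size(), true, ions);
    addSingleCleavage(betaSeq.size(), false, ions);
    return ions;
}
```

For "NLGKVGSK" and "ASSAKQR" this produces 14 alpha ions and 12 beta ions. Every entry has exactly one of the two flags set, mirroring the statement that isFromAlpha() and isFromBeta() are never both true.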
There is a helper method for linking two peptide objects: ms_aahelper::createCrosslinkedPeptide().
Fragmenting a peptide with looplinks is exactly the same as fragmenting a peptide without looplinks, with one exception. The looplink is assumed to be stronger than the peptide backbone. Thus, there are no fragments that start or end in the region spanned by a looplink. This includes regular series as well as internals.
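As a rough illustration of the looplink rule, the C++ sketch below lists the backbone cleavage positions that remain available. It is an interpretation, not the Parser algorithm: `allowedCleavages` is a hypothetical name, and the sketch assumes that a cleavage between residues i and i+1 is suppressed whenever the looplink spans that bond (loopStart <= i < loopEnd, 1-based). The exact boundary handling in ms_aahelper may differ.

```cpp
#include <cassert>
#include <vector>

// Cleavage i means the backbone bond between residue i and residue i+1
// (1-based). Assumption: bonds spanned by the looplink are not cleaved,
// because the looplink would still hold the two pieces together.
std::vector<int> allowedCleavages(int peptideLength, int loopStart,
                                  int loopEnd) {
    assert(1 <= loopStart && loopStart < loopEnd && loopEnd <= peptideLength);
    std::vector<int> allowed;
    for (int i = 1; i < peptideLength; ++i) {
        if (i >= loopStart && i < loopEnd) continue;  // inside the loop
        allowed.push_back(i);
    }
    return allowed;
}
```

For HKPKATKEQLK (11 residues) with a looplink between K2 and K4, `allowedCleavages(11, 2, 4)` keeps positions 1 and 4 to 10, so under this interpretation the b2 and b3 ions and their complementary y ions are not generated.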
There is a new method that returns the number of discovered intact crosslinks and looplinks: ms_peptidesummary::getNumDiscoveredIntactLinks().
Because monolinks share the same variable mod number, ms_peptidesummary::getNumDiscoveredVariableMods() has been extended. The method returns the modification names and deltas as well as counts, positions and sites.
ms_peptidesummary::getAllProteinsWithThisPepMatch() has been extended to return the psmComponent assigned to the protein hit.
There are no changes to the parameters of ms_peptidesummary::findPeptides(). If you search for a peptide sequence with findPeptides(), it will compare the input string to both the alpha sequence and beta sequence. If either one is a match, findPeptides() adds it to the return vector.
Calling ms_peptidesummary::getReadableVarMods() without psmComponent produces a human-readable string of variable modifications, monolinks, looplinks and intact links contained in the peptide match.