#!/usr/local/bin/perl
use strict;
use msparser;

sub get_params {
    my $resfile = msparser::ms_mascotresfilebase::createResfile($_[0]);
    return $resfile->params; # PROBLEM HERE
}

my $params = get_params($ARGV[0]);
print $params->getNumberOfDatabases, "\n"; # CRASH HERE
import matrix_science.msparser.*;

public class example {
    static {
        try {
            System.loadLibrary("msparserj");
        } catch (UnsatisfiedLinkError e) {
            System.exit(0);
        }
    }

    private static ms_searchparams get_params(String filename) {
        ms_mascotresfilebase resfile = ms_mascotresfilebase.createResfile(filename);
        return resfile.params(); // PROBLEM HERE
    }

    public static void main(String argv[]) {
        ms_searchparams params = get_params(argv[0]);
        System.gc();              // See below why these are needed
        System.runFinalization(); // to trigger the crash.
        System.out.println(params.getNumberOfDatabases()); // CRASH HERE
    }
}
#!/usr/bin/python
import msparser
import sys

def get_params(filename):
    resfile = msparser.ms_mascotresfilebase.createResfile(filename)
    return resfile.params() # PROBLEM HERE

params = get_params(sys.argv[1])
print(params.getNumberOfDatabases()) # CRASH HERE
using System;
using matrix_science.msparser;

class GarbageCollectionExample {
    private static ms_peptidesummary loadPeptideSummary(string filename) {
        ms_mascotresfilebase resfile = ms_mascotresfilebase.createResfile(filename);
        ms_datfile datfile = new ms_datfile("../config/mascot.dat");
        ms_mascotoptions opts = new ms_mascotoptions();
        ms_mascotresults_params resParams = new ms_mascotresults_params();
        resfile.get_ms_mascotresults_params(opts, resParams);
        return new ms_peptidesummary(resfile, resParams); // PROBLEM HERE
    }

    public static void Main(string[] argv) {
        ms_peptidesummary pepsum = loadPeptideSummary(argv[0]);
        for (int i = 1; i <= pepsum.getNumberOfHits(); i++) {
            ms_protein hit = pepsum.getHit(i);
            for (int e = 1; e <= hit.getNumPeptides(); e++) {
                int q = hit.getPeptideQuery(e), p = hit.getPeptideP(e);
                ms_peptide peptide = pepsum.getPeptide(q, p);
                Console.WriteLine(peptide.getPeptideStr()); // CRASH HERE
            }
        }
    }
}
The programs crash because the resfile object is deallocated from memory too early. The resfile object is lexically scoped to the get_params() function (or the loadPeptideSummary() function in the C# example). When get_params() or loadPeptideSummary() returns, resfile becomes unreachable and is eligible for garbage collection. In Perl and Python, it is collected at the end of the function; in Java and C#, it is collected at an arbitrary later point during execution, although in Java you can request collection at any time by calling System.gc(); System.runFinalization(); (as we do in the example program to illustrate the problem).
By itself, this is not a problem at all. The problem becomes clear when you look at the declaration of matrix_science::ms_mascotresfilebase::params(). Mascot Parser uses SWIG (Simplified Wrapper and Interface Generator, http://swig.org/) to generate the mappings between C++ and the target language. The params() method returns a C++ reference to an ms_searchparams object, not a copy, which the SWIG layer helpfully wraps into a native class object, thus masking the real nature of the return value.
So when the resfile object is freed at the end of its scope, all its internal data is deallocated too, including the ms_searchparams object. But we still hold a wrapper object pointing to that internal ms_searchparams object! It now points to an arbitrary chunk of memory that no longer holds a valid object. If you then try to call its methods, the program will most likely crash with a segmentation fault.
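The lifetime problem described above can be imitated in pure Python without Mascot Parser. The sketch below is only an analogy: ResultFile and SearchParams are hypothetical stand-ins, not msparser classes, and weakref.proxy plays the role of the non-owning wrapper around a C++ reference. Where the real library would read freed memory and crash, Python raises ReferenceError.

```python
import gc
import weakref


class SearchParams:
    """Hypothetical stand-in for the internal object owned by the result file."""

    def __init__(self, number_of_databases):
        self.number_of_databases = number_of_databases

    def get_number_of_databases(self):
        return self.number_of_databases


class ResultFile:
    """Hypothetical stand-in for the owning object; it owns its SearchParams."""

    def __init__(self):
        self._params = SearchParams(1)

    def params(self):
        # Hand out a non-owning handle, imitating a wrapped C++ reference.
        return weakref.proxy(self._params)


def get_params():
    resfile = ResultFile()
    return resfile.params()  # resfile becomes unreachable on return


params = get_params()
gc.collect()  # not needed in CPython, but makes the intent explicit

try:
    params.get_number_of_databases()
    outcome = "ok"
except ReferenceError:
    # The owner is gone, so the handle no longer refers to a live object.
    outcome = "dangling"

print(outcome)
```

The handle itself is still a perfectly ordinary Python object after get_params() returns; only the thing it refers to has disappeared, which is the situation the wrapped ms_searchparams object is in.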
Why is this not handled in SWIG? Because the SWIG layer cannot detect that the ms_searchparams object has an internal reference to the parent ms_mascotresfilebase object. This is hidden inside the C++ implementation. A general solution to this problem would need to track all pointer and reference assignments in the C++ code, and then increase and decrease the reference count of each wrapped object in the runtime environment of whichever programming language you are using. It is very difficult to implement correctly.
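To make the reference-counting idea concrete, here is what doing it by hand for a single parent and child pair might look like in pure Python. This is an illustrative sketch, not something SWIG or Mascot Parser provides: ResultFile, SearchParams and KeptAlive are hypothetical names, and weakref.proxy again imitates a non-owning wrapper. The KeptAlive handle stores a strong reference to the owner, so the owner cannot be collected while the handle is in use.

```python
import weakref


class SearchParams:
    """Hypothetical stand-in for the internal object owned by the result file."""

    def get_number_of_databases(self):
        return 1


class ResultFile:
    """Hypothetical stand-in for the owning object."""

    def __init__(self):
        self._params = SearchParams()

    def params(self):
        # Non-owning handle, imitating a wrapped C++ reference.
        return weakref.proxy(self._params)


class KeptAlive:
    """Pairs a borrowed child handle with a strong reference to its owner."""

    def __init__(self, owner, child):
        self._owner = owner  # strong reference: keeps the owner alive
        self._child = child

    def __getattr__(self, name):
        # Only called for names not found on KeptAlive itself,
        # so every other attribute is forwarded to the child handle.
        return getattr(self._child, name)


def get_params():
    resfile = ResultFile()
    return KeptAlive(resfile, resfile.params())


params = get_params()
count = params.get_number_of_databases()  # safe: the owner is still alive
print(count)
```

This works here only because the author of the wrapper knows which object owns which. A generated binding layer has no such knowledge of the C++ internals, which is why a general solution is so hard.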
Note that the problem in Java and C# is even more subtle than in Perl and Python. If you remove the System.gc() and System.runFinalization() lines from the example program, the program may run fine hundreds of times. Then one time, perhaps because the system is running low on memory, the JVM or .NET CLR garbage collector might run just before params.getNumberOfDatabases() is called, and bang! Your program, which has worked hundreds of times before, crashes for no apparent reason. This is obviously even harder to detect and debug in a large program.
Luckily there is a safe fix: follow the "Two rules of thumb" when writing programs using Mascot Parser.