All Mascot Parser classes are defined in the matrix_science
namespace. When you use Mascot Parser from Perl, Java, Python or C#, these classes are available in a package, module or namespace.
In Perl, all classes are available in the msparser
package. For example, the class matrix_science::ms_mascotresfilebase is accessible as msparser::ms_mascotresfilebase
.
In Java, all classes are available in the matrix_science.msparser
package. For example, the class matrix_science::ms_mascotresfilebase is accessible as matrix_science.msparser.ms_mascotresfilebase
.
In Python, all classes are available in the msparser
module. For example, the class matrix_science::ms_mascotresfilebase is accessible as msparser.ms_mascotresfilebase
.
In C#, all classes are available in the matrix_science.msparser
namespace. For example, the class matrix_science::ms_mascotresfilebase is accessible as matrix_science.msparser.ms_mascotresfilebase
.
In the documentation for ms_mascotresfilebase, the following 'enum' definition is used:
enum FLAGS { RESFILE_NOFLAG, RESFILE_USE_CACHE,
These values are available in the following way in the class they are defined in:
Perl: $msparser::ms_mascotresfilebase::RESFILE_NOFLAG
Java: ms_mascotresfilebase.RESFILE_NOFLAG
Python: msparser.ms_mascotresfilebase.RESFILE_NOFLAG
In C#, enum values are wrapped as proper C# enums. The enum values are then available in the following way, in the class and enumeration they are defined in:
ms_mascotresfilebase.FLAGS.RESFILE_NOFLAG
To use enumeration values as function parameters in C#, you will need to cast the enum value to the required parameter type, usually an int
or uint
. For example, to use the ms_mascotresfilebase.willCreateCache
method, the flags
parameter needs to be cast to a uint
:
    string cachefile;
    bool will_create_cache = ms_mascotresfilebase.willCreateCache(
        "F981123.dat",
        (uint) matrix_science.msparser.ms_mascotresfilebase.FLAGS.RESFILE_USE_CACHE,
        new matrix_science.msparser.ms_mascotoptions().getCacheDirectory(),
        out cachefile
    );
Some values are converted automatically between C++ and the calling code. If a function returns, say, std::string
, this is equivalent to a native string object.
In short:
C++ | Perl | Java | Python | C# |
---|---|---|---|---|
std::string, const std::string &, char *, const char * | ordinary Perl string | String | non-Unicode string object | string |
int, long, float, double | ordinary number | int, long, float, double | ordinary number object | int, long, float, double |
bool | ordinary number (0 means false, any other number means true) | boolean | bool | bool |
std::string & | usually a return value; see Multiple return values in Perl, Java, Python and C# | StringBuffer | usually a return value; see Multiple return values in Perl, Java, Python and C# | out string |
int & | usually a return value (as above) | int[] | usually a return value (as above) | out int |
unsigned int & | usually a return value (as above) | long[] | usually a return value (as above) | out uint |
Some keywords have a different interpretation:
C++ | Perl | Java | Python | C# |
---|---|---|---|---|
const | can be ignored | roughly equivalent to final but see special case const std::string & | can be ignored | roughly equivalent to C# const but see special case const std::string & |
inline, virtual | can be ignored | can be ignored | can be ignored | can be ignored |
In Perl, Java, Python and C#, all objects are references. The same is not true in C++, where a variable can refer to an object in three different ways: by value, by reference or by pointer. All three are mapped transparently to references in the calling language, so you do not need to worry about the distinction. Objects can be passed to C++ functions as they are.
You can recognise the three different ways easily. The three following lines all declare a function that takes an object as a parameter:
    void func1(ms_mascotresfile_msr resfile);
    void func2(ms_mascotresfile_msr &resfile);
    void func3(ms_mascotresfile_msr *resfile);
These three functions would all take an object reference in Perl, Java, Python and C#:
Perl:
    my $resfile = msparser::ms_mascotresfile_msr->new(...);
    func1($resfile);
    func2($resfile);
    func3($resfile);
Java:
    ms_mascotresfile_msr resfile = new ms_mascotresfile_msr(...);
    func1(resfile);
    func2(resfile);
    func3(resfile);
Python:
    resfile = msparser.ms_mascotresfile_msr(...)
    func1(resfile)
    func2(resfile)
    func3(resfile)
C#:
    ms_mascotresfile_msr resfile = new ms_mascotresfile_msr(...);
    func1(resfile);
    func2(resfile);
    func3(resfile);
For example, take a function such as ms_fileutilities::getLastModificationTime():
static time_t getLastModificationTime(const char *filename, ms_errs *err=NULL);
The function expects an ms_errs object as a parameter. You can simply pass an object reference:
Perl:
    my $errs = msparser::ms_errs->new();
    my $time = msparser::ms_fileutilities::getLastModificationTime($filename, $errs);
    if (!$errs->isValid()) { ... }
Java:
    ms_errs errs = new ms_errs();
    int time = ms_fileutilities.getLastModificationTime(filename, errs);
    if (!errs.isValid()) { ... }
Python:
    errs = msparser.ms_errs()
    time = msparser.ms_fileutilities.getLastModificationTime(filename, errs)
    if not errs.isValid():
        ...
C#:
    ms_errs errs = new ms_errs();
    long time = ms_fileutilities.getLastModificationTime(filename, errs);
    if (!errs.isValid()) { ... }
Some functions take or return arrays of values. In C++, these are called vectors, and each vector can only store values of one particular type. std::vector
is the "Standard Template Library" (STL) class for vectors. std::vector<int>
can only store integer values, while std::vector<std::string>
can only store string values. An example function that both returns a vector and takes vectors as arguments is getAllProteinsWithThisPepMatch(). When calling such functions, you must create a vector object of the correct type:
Perl:
    my $vectorOfInts = new msparser::vectori;          # For std::vector<int>
    my $vectorOfLongs = new msparser::vectorl;         # For std::vector<long>
    my $vectorOfDoubles = new msparser::vectord;       # For std::vector<double>
    my $vectorOfBools = new msparser::vectorb;         # For std::vector<bool>
    my $vectorOfStrings = new msparser::VectorString;  # For std::vector<std::string>
Java:
    vectori vectorOfInts = new vectori();                  // For std::vector<int>
    vectorl vectorOfLongs = new vectorl();                 // For std::vector<long>
    vectord vectorOfDoubles = new vectord();               // For std::vector<double>
    vectorb vectorOfBools = new vectorb();                 // For std::vector<bool>
    VectorString vectorOfStrings = new VectorString();     // For std::vector<std::string>
Python:
    vectorOfInts = msparser.vectori()          # For std::vector<int>
    vectorOfLongs = msparser.vectorl()         # For std::vector<long>
    vectorOfDoubles = msparser.vectord()       # For std::vector<double>
    vectorOfBools = msparser.vectorb()         # For std::vector<bool>
    vectorOfStrings = msparser.VectorString()  # For std::vector<std::string>
The different vector classes share a common interface, as shown next.
Vectors are similar to Perl arrays, but with two differences: appending and modifying items is more restricted, and vectors are strictly typed. Strict typing means that if you try to append an integer to VectorString
, for example, you will get a runtime exception. See Catching C++ exceptions in Perl, Java, Python and C#.
    # Create a vector of size 0. (In this example, we'll use integer vectors.)
    my $vector = new msparser::vectori;

    # Create a vector of any size, say 20. Items will be initialised to undef.
    my $vector2 = new msparser::vectori(20);

    # Append a value at the end of the vector.
    $vector->push(100);
    $vector->push(200);

    # Get the number of items in the vector. Vector indices run from 0
    # to $size - 1.
    my $size = $vector->size();

    # Get item at any index. If the index is out of range (negative or
    # greater than $size - 1), an exception will be thrown.
    my $val = $vector->get(1);

    # Set the value of item at a given index. After this call, $vector
    # is [200, 200].
    $vector->set(0, 200);

    # Remove the last item of the vector and return its value. If
    # $vector->size() is zero, this will throw an exception.
    $val = $vector->pop();

    # Predicate testing whether the vector is empty (equivalent to
    # testing $vector->size == 0).
    if ($vector->empty) { ... }

    # Iterate over all items in the vector.
    for my $i (0 .. $vector->size-1) {
        print $vector->get($i), "\n";
    }

    # Clear the vector, i.e. remove all of its elements.
    $vector->clear();
STL vectors are similar to Java Vectors; std::vector<int>
is analogous to Vector<Integer>
. However, there is no compile-time type checking, which means that if you try to append values to the vector that are not of the correct type, a runtime exception will be thrown. See Catching C++ exceptions in Perl, Java, Python and C#.
    // Create a vector of size 0. (In this example, we'll use integer vectors.)
    vectori vector = new vectori();

    // Create a vector of any size, say 20. Items will be initialised to null.
    vectori vector2 = new vectori(20);

    // Append a value at the end of the vector.
    vector.push(100);
    vector.push(200);

    // Get the number of items in the vector. Vector indices run from 0
    // to size - 1.
    int size = vector.size();

    // Get item at any index. If the index is out of range (negative or
    // greater than size - 1), an exception will be thrown.
    int val = vector.get(1);

    // Set the value of item at a given index. After this call, 'vector'
    // contains values [200, 200].
    vector.set(0, 200);

    // Remove the last item of the vector and return its value. If
    // vector.size() is zero, this will throw an exception.
    val = vector.pop();

    // Predicate testing whether the vector is empty (equivalent to
    // testing vector.size() == 0).
    if (vector.empty()) { ... }

    // Iterate over all items in the vector.
    for (int i = 0; i != vector.size(); i++)
        System.out.println(vector.get(i));

    // Clear the vector, i.e. remove all of its elements.
    vector.clear();
Vectors are similar to Python lists, but with two differences: appending and modifying items is more restricted, and vectors are strictly typed. Strict typing means that if you try to append an integer to VectorString
, you will get a runtime exception. See Catching C++ exceptions in Perl, Java, Python and C#.
    # Create a vector of size 0. (In this example, we'll use integer vectors.)
    vector = msparser.vectori()

    # Create a vector of any size, say 20. Items will be initialised to None.
    vector2 = msparser.vectori(20)

    # Append a value at the end of the vector. If the value is not of
    # the correct type (integer for vectori, string for VectorString, etc.),
    # an exception will be thrown. See below how to catch it.
    vector.append(100)
    vector.append(200)

    # Get the number of items in the vector. Vector indices run from 0
    # to size - 1.
    size = len(vector)

    # Get item at any index. If the index is out of range (negative or
    # greater than size - 1), an exception will be thrown.
    val = vector[1]

    # Set the value of item at a given index. After this call, vector
    # is [200, 200]. If the index is out of range, an exception will be
    # thrown.
    vector[0] = 200

    # Remove the last item of the vector and return its value. If
    # the vector is empty, this will throw an exception.
    val = vector.pop_back()

    # Predicate testing whether the vector is empty (equivalent to
    # testing len(vector) == 0).
    if vector.empty():
        ...

    # Iterate over all items in the vector.
    for item in vector:
        print(item)

    # Clear the vector, i.e. remove all of its elements.
    vector.clear()
STL vectors are similar to C# Lists; std::vector<int>
is analogous to List<int>
. However, there is no compile-time type checking, which means that if you try to append values to the vector that are not of the correct type, a runtime exception will be thrown. See Catching C++ exceptions in Perl, Java, Python and C#.
    // Create a vector of size 0. (In this example, we'll use integer vectors.)
    vectori vector = new vectori();

    // Create a vector of any size, say 20. Items will be initialised to null.
    vectori vector2 = new vectori(20);

    // Append a value at the end of the vector.
    vector.Add(100);
    vector.Add(200);

    // Get the number of items in the vector. Vector indices run from 0
    // to size - 1.
    int size = vector.Count;

    // Get item at any index. If the index is out of range (negative or
    // greater than size - 1), an exception will be thrown.
    int val = vector[1];

    // Set the value of item at a given index. After this call, 'vector'
    // contains values [200, 200].
    vector[0] = 200;

    // Remove the last item of the vector and return its value. If
    // vector.Count is zero, this will throw an exception.
    val = vector[size-1];
    vector.RemoveAt(size-1);

    // Predicate testing whether the vector is empty.
    if (vector.Count == 0) { }

    // Iterate over all items in the vector.
    for (int i = 0; i != vector.Count; i++)
        Console.WriteLine(vector[i]);

    // Clear the vector, i.e. remove all of its elements.
    vector.Clear();
Here is an example of how to call a method that takes vectors of multiple types as arguments.
    my $start = new msparser::vectori;
    my $end = new msparser::vectori;
    my $pre = new msparser::VectorString;
    my $post = new msparser::VectorString;
    my $frame = new msparser::vectori;
    my $multiplicity = new msparser::vectori;
    my $db = new msparser::vectori;

    my $accessions = $pepsum->getAllProteinsWithThisPepMatch(
        1, 1, $start, $end, $pre, $post, $frame, $multiplicity, $db
    );
In the API documentation, the method getAllProteinsWithThisPepMatch() returns a VectorString
. However, any vector returned from a function is automatically converted into an array reference in Perl.
You can access the elements of each vector by using the get()
method, and calling size()
returns the number of elements in the vector. For instance:
    for my $i (0 .. $multiplicity->size()-1) {
        print $multiplicity->get($i), "\n";
    }

    # This assumes $db->size() == scalar(@$accessions).
    for my $i (0 .. $#$accessions) {
        print $db->get($i), '::', $$accessions[$i], "\n";
    }
It is easy to convert STL vectors into Perl arrays:
    sub stl_to_array {
        [ map { $_[0]->get($_) } 0 .. $_[0]->size-1 ]
    }

    # Usage (assuming $db is a vectori object as above):
    my $db_arr = stl_to_array($db);

    # And now the loop is more symmetrical:
    for my $i (0 .. $#$accessions) {
        print $$db_arr[$i], '::', $$accessions[$i], "\n";
    }
Java:
    vectori start = new vectori();
    vectori end = new vectori();
    VectorString pre = new VectorString();
    VectorString post = new VectorString();
    vectori frame = new vectori();
    vectori multiplicity = new vectori();
    vectori db = new vectori();

    VectorString accessions = pepsum.getAllProteinsWithThisPepMatch(
        1, 1, start, end, pre, post, frame, multiplicity, db
    );
You can access the elements of each vector by using the get()
method, and calling size()
returns the number of elements in the vector. For instance:
    for (int i = 0; i != multiplicity.size(); i++)
        System.out.println(multiplicity.get(i));

    // This assumes db.size() == accessions.size().
    for (int i = 0; i != accessions.size(); i++)
        System.out.println(db.get(i) + "::" + accessions.get(i));
Python:
    start = msparser.vectori()
    end = msparser.vectori()
    pre = msparser.VectorString()
    post = msparser.VectorString()
    frame = msparser.vectori()
    multiplicity = msparser.vectori()
    db = msparser.vectori()

    accessions = pepsum.getAllProteinsWithThisPepMatch(
        1, 1, start, end, pre, post, frame, multiplicity, db
    )
You can then iterate over the items in each vector:
    for m in multiplicity:
        print(m)

    # This assumes db.size() == accessions.size().
    for i in range(accessions.size()):
        print("%d :: %s" % (db[i], accessions[i]))
C#:
    vectori start = new vectori();
    vectori end = new vectori();
    VectorString pre = new VectorString();
    VectorString post = new VectorString();
    vectori frame = new vectori();
    vectori multiplicity = new vectori();
    vectori db = new vectori();

    VectorString accessions = pepsum.getAllProteinsWithThisPepMatch(
        1, 1, start, end, pre, post, frame, multiplicity, db
    );
You can then iterate over the items in each vector; the Count
property returns the number of elements in the vector. For instance:
    for (int i = 0; i != multiplicity.Count; i++) {
        Console.WriteLine(multiplicity[i]);
    }

    // This assumes db.Count == accessions.Count.
    for (int i = 0; i != accessions.Count; i++) {
        Console.WriteLine("{0}::{1}", db[i], accessions[i]);
    }
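The Python loop over db and accessions assumes that both vectors have the same length. As a minimal sketch of making that assumption explicit, the helper below copies two vector-like objects into Python lists and pairs them up. The name pair_vectors is hypothetical and not part of Mascot Parser, and plain Python lists with made-up values stand in for vectori and VectorString so that the sketch runs without the library. It relies only on the vectors being iterable, as in the Python examples on this page.

```python
def pair_vectors(db, accessions):
    """Pair up two equally long vector-like objects as a list of tuples.

    Hypothetical helper: it only needs both arguments to be iterable.
    """
    db_list = list(db)
    accession_list = list(accessions)
    if len(db_list) != len(accession_list):
        # Fail early instead of silently reading past the shorter vector.
        raise ValueError(
            "vectors differ in length: %d vs %d"
            % (len(db_list), len(accession_list))
        )
    return list(zip(db_list, accession_list))


# Plain lists stand in for vectori and VectorString; the values are made up.
for number, accession in pair_vectors([1, 1, 2], ["ACC_A", "ACC_B", "ACC_C"]):
    print("%d :: %s" % (number, accession))
```

Copying into lists first means the length check happens once, before any output is produced.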
Exceptions from within Parser can be caught using exception handling mechanisms in the native language, as shown below. Exceptions will only be thrown by STL classes; Mascot Parser handles errors internally (see Error Handling).
Perl:
    my $vectori = new msparser::vectori;

    # This will always throw an exception for an empty vector.
    eval { $vectori->pop() };
    if ($@) {
        print $@;
    }
Java:
    vectori vector = new vectori();

    // This will always throw an exception for an empty vector.
    try {
        vector.pop();
    } catch (Exception e) {
        System.out.println(e);
    }
Python:
    vector = msparser.vectori()

    # This will always throw an exception for an empty vector.
    try:
        vector.pop_back()
    except Exception as e:
        print(e)
C#:
    vectori vector = new vectori();

    try {
        // This will always throw an exception for an empty vector.
        vector.RemoveAt(0);
    } catch (Exception e) {
        Console.Error.WriteLine(e);
    }
In C++, functions take a fixed number of arguments. However, some of these arguments can have default values. For example, the getDB() function has the following declaration:
std::string getDB(int idx = 1)
Whenever you see an equals sign (=) in the function documentation, the parameter next to the sign has a default value. This means that you can leave out the parameter when calling the constructor or method:
Assume $params
is of type ms_searchparams
.
    my $db = $params->getDB();    # Equivalent to $params->getDB(1)
    my $db2 = $params->getDB(2);  # Override default parameter

    # This will always print 'yes'.
    if ($params->getDB eq $params->getDB(1)) {
        print "yes\n";
    }
Assume params
is of class ms_searchparams
.
    String db = params.getDB();    // Equivalent to params.getDB(1)
    String db2 = params.getDB(2);  // Override default parameter

    // This will always print 'yes'.
    if (params.getDB().equals(params.getDB(1))) {
        System.out.println("yes");
    }
Assume params
is of type ms_searchparams
.
    db = params.getDB()    # Equivalent to params.getDB(1)
    db2 = params.getDB(2)  # Override default parameter

    # This will always print 'yes'.
    if params.getDB() == params.getDB(1):
        print("yes")
Assume _params
is of class ms_searchparams
.
    string db = _params.getDB();    // Equivalent to _params.getDB(1)
    string db2 = _params.getDB(2);  // Override default parameter

    // This will always print 'yes'.
    if (_params.getDB().Equals(_params.getDB(1))) {
        Console.WriteLine("yes");
    }
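Python has the same notion of default parameter values, which may make the C++ declaration easier to read. The sketch below is only an illustration: FakeSearchParams is a hypothetical stand-in for ms_searchparams, the database names are made up, and raising IndexError for an out-of-range index is an assumption of this sketch rather than documented Mascot Parser behaviour.

```python
class FakeSearchParams:
    """Hypothetical stand-in for ms_searchparams (not the real class)."""

    def __init__(self, databases):
        self._databases = list(databases)

    def getDB(self, idx=1):
        # 'idx=1' plays the role of 'int idx = 1' in the C++ declaration.
        if not 1 <= idx <= len(self._databases):
            # Assumption of this sketch: reject indices outside 1..N.
            raise IndexError("database index out of range: %d" % idx)
        return self._databases[idx - 1]


params = FakeSearchParams(["first_db", "second_db"])  # made-up names

db = params.getDB()     # Same call as params.getDB(1)
db2 = params.getDB(2)   # Override the default value

if params.getDB() == params.getDB(1):
    print("yes")
```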
Some functions in Mascot Parser are defined at class level rather than object level. In C++, C# and Java, these are called static functions, while in Perl and Python they are called class methods. Static functions can be used directly without creating an object of the class, as the following example shows:
Perl:
    my ($will_create_cache, $cachefile) = msparser::ms_mascotresfilebase::willCreateCache(
        "F981123.dat",
        $msparser::ms_mascotresfilebase::RESFILE_USE_CACHE,
        msparser::ms_mascotoptions->new->getCacheDirectory(),
    );

    if ($will_create_cache) {
        print "ms_mascotresfilebase will use $cachefile as the cache file.\n";
    } else {
        print "ms_mascotresfilebase will not use the cache.\n";
    }
Java:
    // The array must be allocated before it is passed to the method.
    String[] cachefile = new String[1];
    boolean will_create_cache = ms_mascotresfilebase.willCreateCache(
        "F981123.dat",
        ms_mascotresfilebase.RESFILE_USE_CACHE,
        new ms_mascotoptions().getCacheDirectory(),
        cachefile
    );

    if (will_create_cache) {
        System.out.println(
            "ms_mascotresfilebase will use " + cachefile[0] + " as the cache file."
        );
    } else {
        System.out.println(
            "ms_mascotresfilebase will not use the cache."
        );
    }
Python:
    will_create_cache, cachefile = msparser.ms_mascotresfilebase.willCreateCache(
        "F981123.dat",
        msparser.ms_mascotresfilebase.RESFILE_USE_CACHE,
        msparser.ms_mascotoptions().getCacheDirectory(),
    )

    if will_create_cache:
        print("ms_mascotresfilebase will use %s as the cache file." % cachefile)
    else:
        print("ms_mascotresfilebase will not use the cache.")
C#:
    string cachefile;
    bool will_create_cache = ms_mascotresfilebase.willCreateCache(
        "F981123.dat",
        (uint) matrix_science.msparser.ms_mascotresfilebase.FLAGS.RESFILE_USE_CACHE,
        new matrix_science.msparser.ms_mascotoptions().getCacheDirectory(),
        out cachefile
    );

    if (will_create_cache) {
        Console.WriteLine(
            "ms_mascotresfilebase will use {0} as the cache file.", cachefile);
    } else {
        Console.WriteLine(
            "ms_mascotresfilebase will not use the cache."
        );
    }
Some functions in Mascot Parser are used to initialise an object. These are not constructors, but rather take an object reference as an argument, whose member fields are then filled in.
In C++, these "object initialising functions" take a pointer or reference argument. In Perl, Java, C# and Python, you can simply pass an object reference. For example, ms_mascotresfilebase::getQuantitation() (which is used in the example below) takes an ms_quant_configfile object, while staticGetPercolatorFileNames() takes two vectors as arguments and fills them in.
Perl:
    my $resfile = msparser::ms_mascotresfilebase::createResfile($filename);
    my $qf = msparser::ms_quant_configfile->new();
    $qf->setSchemaFileName(
        "http://www.matrixscience.com/xmlns/schema/quantitation_2"
        . " ../html/xmlns/schema/quantitation_2/quantitation_2.xsd"
        . " http://www.matrixscience.com/xmlns/schema/quantitation_1"
        . " ../html/xmlns/schema/quantitation_1/quantitation_1.xsd"
    );

    if ($resfile->getQuantitation($qf)) {
        print "Quantitation name: ", $qf->getMethodByNumber(0)->getName(), "\n";
    } else {
        print "No quantitation section\n";
    }
Java:
    ms_mascotresfilebase resfile = ms_mascotresfilebase.createResfile(filename);
    ms_quant_configfile qf = new ms_quant_configfile();
    qf.setSchemaFileName(
        "http://www.matrixscience.com/xmlns/schema/quantitation_2"
        + " ../html/xmlns/schema/quantitation_2/quantitation_2.xsd"
        + " http://www.matrixscience.com/xmlns/schema/quantitation_1"
        + " ../html/xmlns/schema/quantitation_1/quantitation_1.xsd"
    );

    if (resfile.getQuantitation(qf))
        System.out.println("Quantitation name: " + qf.getMethodByNumber(0).getName());
    else
        System.out.println("No quantitation section");
Python:
    resfile = msparser.ms_mascotresfilebase.createResfile(filename)
    qf = msparser.ms_quant_configfile()
    qf.setSchemaFileName(
        "http://www.matrixscience.com/xmlns/schema/quantitation_2"
        + " ../html/xmlns/schema/quantitation_2/quantitation_2.xsd"
        + " http://www.matrixscience.com/xmlns/schema/quantitation_1"
        + " ../html/xmlns/schema/quantitation_1/quantitation_1.xsd"
    )

    if resfile.getQuantitation(qf):
        print("Quantitation name: %s" % qf.getMethodByNumber(0).getName())
    else:
        print("No quantitation section")
C#:
    ms_mascotresfilebase resfile = ms_mascotresfilebase.createResfile(filename);
    ms_quant_configfile qf = new ms_quant_configfile();
    qf.setSchemaFileName(
        "http://www.matrixscience.com/xmlns/schema/quantitation_2"
        + " ../html/xmlns/schema/quantitation_2/quantitation_2.xsd"
        + " http://www.matrixscience.com/xmlns/schema/quantitation_1"
        + " ../html/xmlns/schema/quantitation_1/quantitation_1.xsd"
    );

    if (resfile.getQuantitation(qf))
        Console.WriteLine("Quantitation name: " + qf.getMethodByNumber(0).getName());
    else
        Console.WriteLine("No quantitation section");
Some functions in Mascot Parser return multiple values. In C++, this is usually handled by returning values in pointer arguments. In the API documentation, parameters to these kinds of functions are marked either as "in" or "out": "in" means the parameter is read by the function, and "out" means the function returns a value in the parameter. If the function parameter has no "in" or "out", it is assumed to be "in" by default.
In Perl, "out" values are returned as a list, while in Python the values are returned as a tuple. In C# you mark the "out" parameters with the out
keyword. In Java, however, you must use arrays of length 1. The details for each language are as follows.
In Perl, a function can return a list of values. This means that a Mascot Parser function may also return multiple values. There are only a few functions in Mascot Parser that do so; these are documented in the API. An example is ms_mascotresfilebase::getKeepAlive().
    my ($kaTask, $kaPercentage, $kaAccession, $kaHit, $kaQuery, $kaText) =
        $resfile->getKeepAlive();
Values are returned in a list only for native types, such as integers, doubles and strings. Objects documented as [in,out] parameters should be passed to the function as normal.
Note that std::string &
parameters require a StringBuffer
and the value in the StringBuffer is modified by the Mascot Parser function. There is no need to follow the instructions below for StringBuffer parameters.
In Java, methods can only have one return value. If a Mascot Parser method returns multiple native values via pointers, you must use arrays of length 1 as parameters to that function. The return values will then be at the zeroth index of the arrays. For example, ms_mascotresfilebase::getKeepAlive() returns multiple values:
    int[] kaTask = {0};
    int[] kaPercentage = {0};
    StringBuffer kaAccession = new StringBuffer();
    int[] kaHit = {0};
    int[] kaQuery = {0};
    StringBuffer kaText = new StringBuffer();

    resfile.getKeepAlive(
        kaTask, kaPercentage, kaAccession, kaHit, kaQuery, kaText
    );
    // Results in kaTask[0], kaPercentage[0], ...
In Python, functions can return multiple values in tuples. The same applies to Mascot Parser functions that return multiple values; an example is ms_mascotresfilebase::getKeepAlive().
(kaTask, kaPercentage, kaAccession, kaHit, kaQuery, kaText) = resfile.getKeepAlive()
In C#, methods can have multiple return parameters, specified using the out
keyword. Therefore, any parameter documented as an "out" parameter in the Mascot Parser API must be preceded by the out
keyword when calling the method from C#. For example, ms_mascotresfilebase::getKeepAlive() returns multiple values:
    ms_mascotresfilebase.KA_TASK kaTask;
    int kaPercentage, kaHit, kaQuery;
    string kaAccession, kaText;

    string scriptname = resfile.getKeepAlive(
        out kaTask, out kaPercentage, out kaAccession,
        out kaHit, out kaQuery, out kaText
    );
    // Results in kaTask, kaPercentage, ...
With C#, every out
parameter must be supplied, even if the documentation states that there is a default value.
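The two calling conventions can be compared side by side in plain Python. This is a sketch under stated assumptions: both functions are hypothetical stand-ins that return made-up values, and neither is part of Mascot Parser. The first returns a tuple, as the Python binding does; the second fills length-1 lists, which mimics the length-1 arrays used from Java.

```python
def keep_alive_as_tuple():
    """Stand-in that returns all 'out' values at once, as a tuple."""
    return (3, 42, "ACC_1", 1, 7, "working")  # made-up values


def keep_alive_with_out_params(task, percentage, text):
    """Stand-in that writes each 'out' value into index 0 of a list."""
    task[0] = 3
    percentage[0] = 42
    text[0] = "working"


# Tuple style: unpack every returned value in one assignment.
(ka_task, ka_percentage, ka_accession,
 ka_hit, ka_query, ka_text) = keep_alive_as_tuple()

# Length-1 container style: allocate the holders first, read index 0 afterwards.
task, percentage, text = [0], [0], [""]
keep_alive_with_out_params(task, percentage, text)

print(ka_percentage, percentage[0])
```

The tuple form needs no set-up on the caller's side, which is why the Perl and Python bindings read more naturally than the Java one.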
Consider the following three programs in Perl, Java, and Python, respectively. (Error handling is omitted for clarity, but adding it would not fix the problem.)
Perl:
    #!/usr/local/bin/perl
    use strict;
    use msparser;

    sub get_params {
        my $resfile = msparser::ms_mascotresfilebase::createResfile($_[0]);
        return $resfile->params;  # PROBLEM HERE
    }

    my $params = get_params($ARGV[0]);
    print $params->getNumberOfDatabases, "\n";  # CRASH HERE
Java:
    import matrix_science.msparser.*;

    public class example {
        static {
            try {
                System.loadLibrary("msparserj");
            } catch (UnsatisfiedLinkError e) {
                System.exit(0);
            }
        }

        private static ms_searchparams get_params(String filename) {
            ms_mascotresfilebase resfile = ms_mascotresfilebase.createResfile(filename);
            return resfile.params();  // PROBLEM HERE
        }

        public static void main(String argv[]) {
            ms_searchparams params = get_params(argv[0]);

            System.gc();               // See below why these are needed
            System.runFinalization();  // to trigger the crash.

            System.out.println(params.getNumberOfDatabases());  // CRASH HERE
        }
    }
Python:
    #!/usr/bin/python
    import msparser
    import sys

    def get_params(filename):
        resfile = msparser.ms_mascotresfilebase.createResfile(filename)
        return resfile.params()  # PROBLEM HERE

    params = get_params(sys.argv[1])
    print(params.getNumberOfDatabases())  # CRASH HERE
All of the programs crash at the end of the main program while trying to print the number of databases. (Go ahead and try!) In this case, the params()
method of the resfile
object returns an object that contains an internal reference to the parent resfile
object. At the end of get_params()
, resfile
goes out of scope, and the object it refers to can be destroyed. The program will then crash when methods of the params
object are accessed, because the parent resfile
object no longer exists.
The same problem can be demonstrated in C#:
    using System;
    using matrix_science.msparser;

    class GarbageCollectionExample
    {
        private static ms_peptidesummary loadPeptideSummary(string filename)
        {
            ms_mascotresfilebase resfile = ms_mascotresfilebase.createResfile(filename);
            ms_datfile datfile = new ms_datfile("../config/mascot.dat");
            ms_mascotoptions opts = new ms_mascotoptions();
            ms_mascotresults_params resParams = new ms_mascotresults_params();
            resfile.get_ms_mascotresults_params(opts, resParams);
            return new ms_peptidesummary(resfile, resParams);  // PROBLEM HERE
        }

        public static void Main(string[] argv)
        {
            ms_peptidesummary pepsum = loadPeptideSummary(argv[0]);
            for (int i = 1; i <= pepsum.getNumberOfHits(); i++)
            {
                ms_protein hit = pepsum.getHit(i);
                for (int e = 1; e <= hit.getNumPeptides(); e++)
                {
                    int q = hit.getPeptideQuery(e), p = hit.getPeptideP(e);
                    ms_peptide peptide = pepsum.getPeptide(q, p);
                    Console.WriteLine(peptide.getPeptideStr());  // CRASH HERE
                }
            }
        }
    }
This program will crash within a few iterations when calling peptide.getPeptideStr()
. Again, the problem is that the ms_peptidesummary object contains an internal reference to an ms_mascotresfilebase object which goes out of scope at the end of loadPeptideSummary
.
If you are interested in gritty details, see Garbage collection problems (advanced reading).
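The failure mode can be imitated safely in pure Python. In the sketch below, Parent and Child are hypothetical stand-ins (not Mascot Parser classes), and a weak reference plays the role of the raw C++ pointer that a child object keeps to its parent. The real library crashes in this situation; the sketch raises an ordinary exception instead, so it is an analogy rather than a reproduction.

```python
import gc
import weakref


class Parent:
    """Hypothetical stand-in for the object that owns the data."""

    def __init__(self, number_of_databases):
        self.number_of_databases = number_of_databases

    def params(self):
        return Child(self)


class Child:
    """Hypothetical stand-in for an object that only borrows its parent."""

    def __init__(self, parent):
        # A weak reference does not keep the parent alive,
        # much like a raw pointer on the C++ side.
        self._parent = weakref.ref(parent)

    def getNumberOfDatabases(self):
        parent = self._parent()
        if parent is None:
            raise RuntimeError("parent object has already been destroyed")
        return parent.number_of_databases


def get_params():
    parent = Parent(2)
    return parent.params()  # The only strong reference to parent ends here.


orphan = get_params()
gc.collect()  # Make sure the parent is gone on any Python implementation.

try:
    orphan.getNumberOfDatabases()
    outcome = "no error"
except RuntimeError as error:
    outcome = str(error)

print(outcome)
```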
Note that in Python, if, for and while do not create a new lexical scope, but just inherit either the global or function (local) scope:
    #!/usr/bin/python
    import msparser
    import sys

    if sys.argv[1]:
        resfile = msparser.ms_mascotresfilebase.createResfile(sys.argv[1])
        params = resfile.params()

    if params:
        print(params.getNumberOfDatabases())  # No crash here.
There are two easy rules of thumb to avoid crashes such as above. In the following, suppose that $b
contains an object of class B
, and $a
contains an object of class A
.
1. If a function of B returns a pointer or a reference, and you store it in $a, you must keep a reference to $b for as long as you use $a.
2. If a function of B takes a pointer or a reference as an argument, and you pass it $a, you must keep a reference to $a for as long as you use $b.
There is one exception: if a function of B takes or returns only objects not in the matrix_science namespace, no coupling will be introduced, and no precautions are needed for $b and $a.
How do you recognise a C++ pointer or reference in the API documentation?
    ms_protein* ms_mascotresults::getHit(const int hit);
    ms_inputquery::ms_inputquery(const ms_mascotresfilebase &resfile, const int q);
The first line defines a function getHit
that returns a pointer. The second line defines a function (actually a constructor) ms_inputquery
that takes a reference as a parameter. So, be on the lookout for ampersands (&) and asterisks (*) when reading the API documentation.
In practice, the rules of thumb are easily followed by keeping the objects together in a hash, dictionary or array until they are no longer needed, or by creating a wrapper class that holds references to all affected objects. In short programs and scripts, you may wish to just make ms_mascotresfilebase
, ms_proteinsummary
and ms_peptidesummary
objects global instead of lexically scoped, or wrap them in singleton classes.
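One way to follow this advice is a small wrapper that owns the parent object together with everything created from it. The sketch below is a hedged illustration: ResultsBundle, derive and FakeResfile are hypothetical names invented for this example, and FakeResfile is a stand-in so that the code runs without Mascot Parser installed. In real code the parent would be the result file object and the factories would call methods such as params().

```python
class ResultsBundle:
    """Keeps a parent object and the objects derived from it alive together."""

    def __init__(self, parent):
        self.parent = parent
        self._derived = {}

    def derive(self, name, factory):
        """Create a dependent object once and hold a reference to it."""
        if name not in self._derived:
            self._derived[name] = factory(self.parent)
        return self._derived[name]


class FakeResfile:
    """Hypothetical stand-in for a result file object."""

    def params(self):
        return {"databases": 2}  # made-up content


def load_bundle():
    # Returning the bundle, not the bare params object, keeps the parent alive.
    bundle = ResultsBundle(FakeResfile())
    bundle.derive("params", lambda resfile: resfile.params())
    return bundle


bundle = load_bundle()
params = bundle.derive("params", lambda resfile: resfile.params())
print(params["databases"])
```

As long as the caller holds on to the bundle, the parent and every derived object stay reachable, so neither rule of thumb can be violated by accident.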
The following examples are written in Python, as Python code is syntactically close to natural language, and can serve as an effective pseudocode. Statements are terminated with semicolons to make the code more familiar for Perl, Java and C# programmers.
    resfile = msparser.ms_mascotresfilebase.createResfile(filename);
    params = resfile.params();
    q1 = msparser.ms_inputquery(resfile, 1);
Rule of thumb 1: keep a reference to resfile
for as long as you use params
.
Rule of thumb 2: keep a reference to resfile
for as long as you use q1
.
    resfile = msparser.ms_mascotresfilebase.createResfile(filename);
    pepsum = msparser.ms_peptidesummary(resfile);
    hit = pepsum.getHit(1);
Rule of thumb 1: keep a reference to pepsum
for as long as you use hit
.
Rule of thumb 2: keep a reference to resfile
for as long as you use pepsum
.
    datfile = msparser.ms_datfile();
    dbs = datfile.getDatabases();
Rule of thumb 1: keep a reference to datfile
for as long as you use dbs
.
    qf = msparser.ms_quant_configfile();
    success = resfile.getQuantitation(qf);
Rule of thumb 2: keep a reference to resfile
for as long as you use qf
.