Matrix Science Mascot Parser toolkit
 
ms_tinycdb Class Reference

Wrapper for the public domain tinycdb package http://www.corpit.ru/mjt/tinycdb.html by Michael Tokarev.

#include <ms_tinycdb.hpp>


Public Member Functions

 ms_tinycdb (const char *indexFileName, const char *versionNumber, const char *sourceFileName)
 Constructor for creating or reading a CDB index file.
 
 ~ms_tinycdb ()
 Destructor closes any files and frees memory.
 
void appendErrors (const ms_errors &src)
 Copies all errors from another instance and appends them at the end of own list.
 
void clearAllErrors ()
 Remove all errors from the current list of errors.
 
void closeIndexFile ()
 Close index file.
 
void copyFrom (const ms_errors *right)
 Use this member to make a copy of another instance.
 
bool finishCreate ()
 Finish creating the index file and open it for reading.
 
std::vector< std::string > getAllValuesFromKey (const std::string &keyName)
 Returns the text values for a key associated with the key name.
 
const ms_errs * getErrorHandler () const
 Retrieve the error object using this function to get access to all errors and error parameters.
 
OFFSET64_T getFileOffsetFromKey (const std::string &keyName)
 A useful function for returning file indexes.
 
std::string getIndexFileName () const
 Return the full path to index file.
 
int getIntFromKey (const std::string &keyName)
 Return an integer value associated with a key.
 
int getIntFromKey (const std::string &keyName, bool &found)
 Return an integer value associated with a key and flag to indicate if the value was found in the CDB file.
 
int getLastError () const
 Return the error number of the last error that occurred.
 
std::string getLastErrorString () const
 Return the error description of the last error that occurred.
 
std::string getValueFromKey (const std::string &keyName, const int count=0)
 Returns the text value of a key associated with the key name.
 
bool isCreating () const
 Return true if the file is open for writing.
 
bool isOpenForReading () const
 Check to see that the file is valid and open for reading.
 
bool isPossibleToCreate () const
 Check to see if the index file can be created.
 
bool isValid () const
 Call this function to determine if there have been any errors.
 
int makeExists (const char *key) const
 See if a key exists while making the file.
 
bool openIndexFile (const bool mayRetryBuilding)
 Open, or try to open an index file.
 
bool prepareToCreate ()
 Start creating the index file.
 
bool saveFileOffsetForKey (const std::string &keyName, OFFSET64_T offset)
 A useful function for storing file indexes.
 
bool saveIntForKey (const std::string &keyName, int value)
 Store an integer value associated with a key.
 
bool saveValueForKey (const char *keyName, const char *value, const unsigned int keyNameLen=0, const unsigned int valueLen=0)
 Add a key/value pair to the file.
 
void setIndexFileName (const char *filename)
 Set the index file name.
 

Detailed Description

Wrapper for the public domain tinycdb package http://www.corpit.ru/mjt/tinycdb.html by Michael Tokarev.

ms_tinycdb is a utility for creating and using a 'static' or 'constant' (read-only) database of arbitrary key/value pairs.

After creating the ms_tinycdb object, call openIndexFile(). If this returns true, then the index file exists, is complete, and values can be retrieved by calling the getValueFromKey() function.

If openIndexFile() returns false, it is because the index file does not exist, is incomplete, out of date or corrupt. To determine the reason, call getLastError() or getLastErrorString(). If the file exists but is, for example, incomplete, it may be worth trying to create it again. The function isPossibleToCreate() returns true if it is worth trying to (re-)create the file. See the documentation for openIndexFile() for a full list of errors.

To create an index file, call prepareToCreate() first, and then call saveValueForKey(), saveFileOffsetForKey() or saveIntForKey() to save all the key/value pairs that are required. Finally, call finishCreate() to close the index file. Once finishCreate() has been called, the file is automatically opened again for reading.

The keyName values can be any text, except values starting with =0. which are reserved for internal use:

  • =0.1 : For duplicate accessions with .cdb files created for Mascot results files.
  • =0.2 : The version string (if any) supplied to the constructor.
  • =0.3 : Used for cases where the CDB file is larger than the maximum allowed.
  • =0.4 : The 'source' file size.
  • =0.5 : The 'source' file date.

These values can be retrieved, for example by passing a value of "=0.2" to getValueFromKey().
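
The reserved prefix rule above can be checked before a key is saved. The following is a minimal sketch, assuming the rule is exactly "the key begins with =0."; isReservedKey and kReservedPrefix are hypothetical names for illustration and are not part of ms_tinycdb.

```cpp
#include <string>

// Prefix that the text above describes as reserved for internal keys.
const std::string kReservedPrefix = "=0.";

// Hypothetical helper (not part of ms_tinycdb): returns true if keyName
// begins with the reserved prefix and so should not be used by callers.
bool isReservedKey(const std::string &keyName) {
    return keyName.compare(0, kReservedPrefix.size(), kReservedPrefix) == 0;
}
```

An application could call such a helper on each user-supplied key before passing it to saveValueForKey(), so that keys colliding with the internal values are rejected early.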

Pseudo code for opening/creating a file with just one value:

create new ms_tinycdb
if !tinycdb->openIndexFile(true) then
  if tinycdb->getLastError() == ERR_CDB_BEING_CREATED then
    print "Another task is creating the index file, try again later"
  else if tinycdb->isPossibleToCreate() then
    print "(Re)-creating index file because : " . tinycdb->getLastErrorString()
    if tinycdb->prepareToCreate() then
      tinycdb->saveValueForKey("MyKey", "MyValue")
      tinycdb->finishCreate()
    else
      print "Cannot create index file because : " . tinycdb->getLastErrorString()
    end if
  else
    print "Cannot create index file because : " . tinycdb->getLastErrorString()
  end if
end if

# We've either just created the index file or it was created some time ago
if tinycdb->isValid() then
  print "My value = " . tinycdb->getValueFromKey("MyKey")
end if

See also the Examples for the Mascot tools module.
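
The open-or-create flow in the pseudo code can also be sketched in C++. The function below is written as a template so that it compiles without the Mascot Parser headers; it only assumes a type that offers the member functions named on this page. StubIndex is a hypothetical in-memory stand-in used purely to exercise the control flow, and its error code is invented. With the real library, the error code passed in would presumably be ms_errs::ERR_CDB_BEING_CREATED.

```cpp
#include <string>

// Works with any type that provides the member functions documented on this
// page. beingCreatedCode is the error number meaning "another task is
// creating the index" (an assumption: ms_errs::ERR_CDB_BEING_CREATED).
template <typename Index>
bool openOrBuild(Index &idx, int beingCreatedCode,
                 const char *key, const char *value) {
    if (idx.openIndexFile(true)) return true;          // already usable
    if (idx.getLastError() == beingCreatedCode) return false;  // try later
    if (!idx.isPossibleToCreate()) return false;       // rebuilding is pointless
    if (!idx.prepareToCreate()) return false;          // cannot start writing
    if (!idx.saveValueForKey(key, value)) return false;
    idx.finishCreate();                                // reopens for reading
    return idx.isOpenForReading();
}

// Hypothetical stand-in for ms_tinycdb; it keeps one key/value pair in memory.
struct StubIndex {
    static const int kStubMissing = 1;  // invented error code for the stub
    bool exists = false;
    bool creating = false;
    int lastError = 0;
    std::string k, v;

    bool openIndexFile(bool /*mayRetryBuilding*/) {
        if (!exists) lastError = kStubMissing;
        return exists;
    }
    int getLastError() const { return lastError; }
    bool isPossibleToCreate() const { return true; }
    bool prepareToCreate() { creating = true; return true; }
    bool saveValueForKey(const char *key, const char *value) {
        if (!creating) return false;
        k = key;
        v = value;
        return true;
    }
    bool finishCreate() { creating = false; exists = true; return true; }
    bool isOpenForReading() const { return exists; }
    std::string getValueFromKey(const std::string &key) const {
        return key == k ? v : std::string();
    }
};
```

Keeping the flow in one function means the caller only has to distinguish "index usable" from "index not usable", and can fall back to getLastErrorString() for the reason.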

Constructor & Destructor Documentation

◆ ms_tinycdb()

ms_tinycdb ( const char *  indexFileName,
const char *  versionNumber,
const char *  sourceFileName 
)

Constructor for creating or reading a CDB index file.

The constructor does not attempt to open or create an index file. The constructor will only fail if it is run on a big endian system, so there is currently no need to call isValid() after creating the object.

Parameters
indexFileName is the full path to the index file. If it isn't specified here, then call setIndexFileName() before calling openIndexFile() or prepareToCreate().
versionNumber is a string that defines the version of the index file. This value should be changed whenever the calling application changes the indexes saved in the file in a way that is not backward compatible. If an empty string is passed, then the version number will not be saved in the index file and there will be no check for consistency.
sourceFileName is an optional path to the 'source' of the index. If an index is based on a particular file (e.g. a Mascot results file), it may be useful to check if the file has changed size or date, because the index probably then needs to be rebuilt. If the size or date of the source file differs from the values saved in the index, then the ms_errs::ERR_CDB_SOURCE_CHANGE_RETRY or ms_errs::ERR_CDB_SOURCE_CHANGE_NO_RETRY error will be set. If this parameter is an empty string, then no checking will be performed.

◆ ~ms_tinycdb()

~ms_tinycdb ( )

Destructor closes any files and frees memory.

The index file is left open for reading until the destructor is called. In version 2.3.2 and earlier, the whole file is memory mapped, which can require a significant address space overhead for the calling application. In version 2.3.3 and later, the memory required is reduced to a few kB.

Member Function Documentation

◆ appendErrors()

void appendErrors ( const ms_errors &  src)
inherited

Copies all errors from another instance and appends them at the end of own list.

Parameters
src is the object to copy the errors across from. See Maintaining object references: two rules of thumb.

◆ clearAllErrors()

void clearAllErrors ( )
inherited

Remove all errors from the current list of errors.

The list of 'errors' can include fatal errors, warning messages, information messages and different levels of debugging messages.

All messages are accumulated into a list in this object, until clearAllErrors() is called.

See Error Handling.

See also
isValid(), getLastError(), getLastErrorString(), getErrorHandler()
Examples
common_error.cpp, resfile_error.cpp, and resfile_summary.cpp.

◆ closeIndexFile()

void closeIndexFile ( )

Close index file.

This function is called when the object is destroyed. You may wish to call it from an application if, for example, you load a file and then find the file is invalid when you test a user defined key. This means that it is only necessary to call isOpenForReading() in your client code.

◆ copyFrom()

void copyFrom ( const ms_errors *  right)
inherited

Use this member to make a copy of another instance.

Parameters
right is the source to initialise from.

◆ finishCreate()

bool finishCreate ( )

Finish creating the index file and open it for reading.

Returns
Currently always returns true

◆ getAllValuesFromKey()

std::vector< std::string > getAllValuesFromKey ( const std::string &  keyName)

Returns the text values for a key associated with the key name.

See also
getValueFromKey()
Parameters
keyName is the name of the key. It should not start with =0. as this is used for the version number and other internal values.
Returns
values as a vector of strings for the given key. See Using STL vector classes vectori, vectord and VectorString in Perl, Java, Python and C#.

◆ getErrorHandler()

const ms_errs * getErrorHandler ( ) const
inherited

Retrieve the error object using this function to get access to all errors and error parameters.

See Error Handling.

Returns
Constant pointer to the error handler
See also
isValid(), getLastError(), getLastErrorString(), clearAllErrors()
Examples
common_error.cpp, and http_helper_getstring.cpp.

◆ getFileOffsetFromKey()

OFFSET64_T getFileOffsetFromKey ( const std::string &  keyName)

A useful function for returning file indexes.

This function is useful for accessing file indexes that have been stored in a CDB file. To provide a quick look up into a source file (for example a Mascot results file), keys are used to store a byte offset into the file. The advantage of using these functions is that the offset is saved as a binary value and is always more compact than saving it in text format. One disadvantage of these functions is that there is currently no conversion between little endian and big endian format, so index files could not be moved between little endian and big endian systems.
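
The trade-off described above can be illustrated without the library. The following is a sketch only, assuming the offset is written as eight raw bytes in the host's byte order; the source does not describe the actual on-disk layout, and packOffsetNative, unpackOffsetNative and swapBytes are hypothetical helpers.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <string>

// Copy a 64-bit offset into a string as raw bytes in the host's byte order.
std::string packOffsetNative(std::int64_t offset) {
    char buf[sizeof offset];
    std::memcpy(buf, &offset, sizeof offset);
    return std::string(buf, sizeof offset);
}

// Read the raw bytes back, again using the host's byte order.
// Returns 0 if the input does not have the expected width.
std::int64_t unpackOffsetNative(const std::string &bytes) {
    std::int64_t offset = 0;
    if (bytes.size() == sizeof offset) {
        std::memcpy(&offset, bytes.data(), sizeof offset);
    }
    return offset;
}

// Reverse the byte order, which is what a host of the opposite endianness
// would effectively see when reading the same bytes.
std::string swapBytes(std::string bytes) {
    std::reverse(bytes.begin(), bytes.end());
    return bytes;
}
```

A large offset such as 1234567890123 takes 13 characters as decimal text but 8 bytes in this binary form. Reading the bytes back on the same host round-trips the value, while reading byte-swapped data yields a different number, which is why such an index would not be portable between little endian and big endian systems.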

Perl: 64 bit integer support is not available in all versions of Perl, so the saveIntForKey() and getIntFromKey() functions take 32 bit offsets on 32 bit versions of Perl. If used for file offsets, the file size limit is then 2Gb when using those functions.

Parameters
keyName is the name of the key. In C++, this key value may contain binary information. In other languages, it must be standard text.
Returns
the offset, or 0 if it is not found.

◆ getIntFromKey() [1/2]

int getIntFromKey ( const std::string &  keyName)

Return an integer value associated with a key.

This function is useful for retrieving an integer value from a CDB file. See also getFileOffsetFromKey() for using 64 bit integer values.

The advantage of using this function rather than getValueFromKey() is that the value is saved as a binary value and is always more compact than saving it in text format. One disadvantage of this function is that there is currently no conversion between little endian and big endian format, so index files could not be moved between little endian and big endian systems.

See also
saveIntForKey()
Parameters
keyName is the name of the key. In C++, this key value may contain binary information. In other languages, it must be standard text.
Returns
the integer value, or 0 if it is not found.

◆ getIntFromKey() [2/2]

int getIntFromKey ( const std::string &  keyName,
bool &  found 
)

Return an integer value associated with a key and flag to indicate if the value was found in the CDB file.

This function is useful for retrieving an integer value from a CDB file. See also getFileOffsetFromKey() for using 64 bit integer values.

The advantage of using this function rather than getValueFromKey() is that the value is saved as a binary value and is always more compact than saving it in text format. One disadvantage of this function is that there is currently no conversion between little endian and big endian format, so index files could not be moved between little endian and big endian systems.

See also
saveIntForKey()
Parameters
keyName is the name of the key. In C++, this key value may contain binary information. In other languages, it must be standard text.
found will be set to true if the value is found in the CDB file and false if it is not found.
Returns
the integer value, or 0 if it is not found.

◆ getLastError()

int getLastError ( ) const
inherited

Return the error number of the last error that occurred.

All errors are accumulated into a list in this object, until clearAllErrors() is called. This function returns the last error that occurred.

See Error Handling.

See also
isValid(), getLastErrorString(), clearAllErrors(), getErrorHandler()
Returns
the error number of the last error, or 0 if there have been no errors.

◆ getLastErrorString()

std::string getLastErrorString ( ) const
inherited

Return the error description of the last error that occurred.

All errors are accumulated into a list in this object, until clearAllErrors() is called. This function returns the last error that occurred.

Returns
Most recent error, warning, information or debug message

See Error Handling.

See also
isValid(), getLastError(), clearAllErrors(), getErrorHandler()
Examples
common_error.cpp, config_enzymes.cpp, config_fragrules.cpp, config_license.cpp, config_mascotdat.cpp, config_masses.cpp, config_modfile.cpp, config_procs.cpp, config_quantitation.cpp, config_taxonomy.cpp, http_helper_getstring.cpp, and tools_aahelper.cpp.

◆ getValueFromKey()

std::string getValueFromKey ( const std::string &  keyName,
const int  count = 0 
)

Returns the text value of a key associated with the key name.

Note, if you want to call this method with a high value for count, it is advised to use getAllValuesFromKey() instead, which is much more efficient.

See also
saveValueForKey()
getAllValuesFromKey()
Parameters
keyName is the name of the key. It should not start with =0. as this is used for the version number and other internal values.
count should be zero unless multiple occurrences of the key are expected in the file.
Returns
value as a string for the given key

◆ isCreating()

bool isCreating ( ) const

Return true if the file is open for writing.

If prepareToCreate() returns true, then the file is open for writing.

Returns
true if the cdb file is open for writing.

◆ isOpenForReading()

bool isOpenForReading ( ) const

Check to see that the file is valid and open for reading.

Returns
true if the index is useable.

◆ isPossibleToCreate()

bool isPossibleToCreate ( ) const

Check to see if the index file can be created.

This function will return true unless one of the following errors have occurred:

Returns
True if the calling application should try and (re)build the index file.

◆ isValid()

bool isValid ( ) const
inherited

Call this function to determine if there have been any errors.

◆ makeExists()

int makeExists ( const char *  keyName) const

See if a key exists while making the file.

Should only be called while creating an index file.

Parameters
keyName is the name of the key for the key/value pair.
Returns
A value of 1 if the key already exists.

◆ openIndexFile()

bool openIndexFile ( const bool  mayRetryBuilding)

Open, or try to open an index file.

After creating an ms_tinycdb object, call this function to try and open the file. If the file is opened successfully, the function will return true. If it returns false, one or more of the following errors will be set and can be retrieved using getLastError().

ms_errs::ERR_MISSING_CDB_FILE:

The file does not exist or cannot be opened.

ms_errs::ERR_CDB_FAIL_LOCK and ms_errs::ERR_CDB_BEING_CREATED:

The index file is locked, probably because another process is writing to it. The calling application can decide to wait before retrying or to continue without the index if that is possible.

ms_errs::ERR_FAIL_CDB_INIT:

Normally because the index file is incomplete.

ms_errs::ERR_CDB_INCOMPLETE_RETRY or ms_errs::ERR_CDB_INCOMPLETE_NO_RETRY:

If a versionNumber has been specified in the constructor, but there is no key/value pair for the version (using the reserved internal key "=0.2"), then one of these errors is set. Which of the two errors is reported depends on the mayRetryBuilding parameter.

ms_errs::ERR_CDB_OLD_VER_RETRY or ms_errs::ERR_CDB_OLD_VER_NO_RETRY:

If a versionNumber has been specified in the constructor, and there is a key/value pair for the version (using the reserved internal key "=0.2"), and the version string is different, then one of these errors is set. Which of the two errors is reported depends on the mayRetryBuilding parameter.

ms_errs::ERR_CDB_SOURCE_CHANGE_RETRY or ms_errs::ERR_CDB_SOURCE_CHANGE_NO_RETRY:

If a sourceFileName has been specified in the constructor then the size and last modified date are compared with the key/value pairs in the index file using the reserved keys "=0.4" and "=0.5". If the values differ, one of these errors is reported. Which of the two errors is reported depends on the mayRetryBuilding parameter.

ms_errs::ERR_CDB_TOO_LARGE or ms_errs::ERR_CDB_64_BIT_REMAKE:

The maximum size of an index file in Mascot Parser 2.3.3 and earlier was 2GB for a 32 bit application and 4GB for a 64 bit application because the offsets stored in the file were 32 bit. If the index file approaches this size then a value is saved in the partially complete file using the reserved key "=0.3" and this error is set. The maximum file size for Mascot Parser 2.4.1 and later is 256TB. If the file was originally created using an earlier version of Mascot Parser then it is worth trying to re-create the file. If the mayRetryBuilding flag is set then the second error is reported.

Parameters
mayRetryBuilding should be set to true if an attempt may be made to rebuild the index file if it is incomplete or if it is the wrong version. This just affects the errors put into the ms_errs object. For example, it would use ms_errs::ERR_CDB_OLD_VER_RETRY rather than ms_errs::ERR_CDB_OLD_VER_NO_RETRY.
Returns
true if the file was opened successfully.
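
The size limits quoted for ms_errs::ERR_CDB_TOO_LARGE line up with powers of two, which the arithmetic below checks. This is a worked calculation only, reading GB and TB as binary units (an assumption). It suggests that the 2GB and 4GB limits match signed and unsigned 32 bit offsets and that 256TB matches 48 bits, but the source does not state the offset width used in later versions. addressableBytes is a hypothetical helper.

```cpp
#include <cstdint>

constexpr std::uint64_t kKiB = 1024;
constexpr std::uint64_t kGiB = kKiB * kKiB * kKiB;
constexpr std::uint64_t kTiB = kGiB * kKiB;

// Number of bytes that can be addressed with an offset of the given width.
constexpr std::uint64_t addressableBytes(unsigned bits) {
    return std::uint64_t{1} << bits;
}

// 31 usable bits (a signed 32 bit offset) reach 2GB.
static_assert(addressableBytes(31) == 2 * kGiB, "2^31 bytes is 2GB");
// 32 bits (an unsigned 32 bit offset) reach 4GB.
static_assert(addressableBytes(32) == 4 * kGiB, "2^32 bytes is 4GB");
// 256TB corresponds to 2^48 bytes.
static_assert(addressableBytes(48) == 256 * kTiB, "2^48 bytes is 256TB");
```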

◆ prepareToCreate()

bool prepareToCreate ( )

Start creating the index file.

To create an index file, call this function first, and then call saveValueForKey(), saveFileOffsetForKey() or saveIntForKey() to save all the key/value pairs that are required. Finally, call finishCreate() to close the index file and open it again for reading.

Returns
true if the file is created successfully.

◆ saveFileOffsetForKey()

bool saveFileOffsetForKey ( const std::string &  keyName,
OFFSET64_T  offset 
)

A useful function for storing file indexes.

This function is useful for storing file indexes in a CDB file. To provide a quick look up into a source file (for example a Mascot results file), keys are used to store a byte offset into the file. The advantage of using these functions is that the offset is saved as a binary value and is always more compact than saving it in text format. One disadvantage of these functions is that there is currently no conversion between little endian and big endian format, so index files cannot be moved between little endian and big endian systems.

Perl: 64 bit integer support is not available in all versions of Perl, so the saveIntForKey() and getIntFromKey() functions take 32 bit offsets on 32 bit versions of Perl. If used for file offsets, the file size limit is then 2Gb when using those functions.

Parameters
keyName is the name of the key. In C++, this key value may contain binary information. In other languages, it must be standard text.
offset is the file offset to be associated with keyName.
Returns
True if the function succeeds.

◆ saveIntForKey()

bool saveIntForKey ( const std::string &  keyName,
int  value 
)

Store an integer value associated with a key.

This function is useful for storing an integer value in a CDB file. See also saveFileOffsetForKey() for using 64 bit integer values.

The advantage of using this function rather than saveValueForKey() is that the value is saved as a binary value and is always more compact than saving it in text format. One disadvantage of this function is that there is currently no conversion between little endian and big endian format, so index files could not be moved between little endian and big endian systems.

See also
getIntFromKey()
Parameters
keyName is the name of the key. In C++, this key value may contain binary information. In other languages, it must be standard text.
value is the integer value to be associated with keyName.
Returns
True if the function succeeds.

◆ saveValueForKey()

bool saveValueForKey ( const char *  keyName,
const char *  value,
const unsigned int  keyNameLen = 0,
const unsigned int  valueLen = 0 
)

Add a key/value pair to the file.

The prepareToCreate() function must have been called prior to calling this function.

C++: The keyName and value parameters will normally point to null terminated text strings. To use binary data (in either the key or the value or both), specify the length of the data in the keyNameLen and the valueLen parameters.

Parameters
keyName is the name of the key. It should not start with =0. as this is used for the version number and other internal values.
value is the value of the key to be stored.
keyNameLen is the length of the data pointed to by the keyName parameter. If keyNameLen is 0 (the default), then it is assumed that the keyName is a null terminated string.
valueLen is the length of the data pointed to by the value parameter. If valueLen is 0 (the default), then it is assumed that the value is a null terminated string.
Returns
true if the key/value pair is saved. It will fail if this would make the file too large, in which case the ms_errs::ERR_CDB_TOO_LARGE error will be set. It will also fail with the error ms_errs::ERR_WRITE_CDB_FILE if the file cannot be written to. Finally, it will fail without an error if the prepareToCreate() function has not been called.
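
The reason for the explicit keyNameLen and valueLen parameters can be seen with a small, library-independent sketch: data containing an embedded zero byte looks shorter when it is read as a null terminated string. lengthAsCString and makeBinaryKey are hypothetical helpers, and the saveValueForKey() call in the comment is an untested illustration of how the documented parameters might be used.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Length that a null terminated reading of the data would report.
std::size_t lengthAsCString(const std::string &data) {
    return std::strlen(data.c_str());
}

// Build a 5 byte key with a zero byte in the middle: 'a' 'b' 0 'c' 'd'.
std::string makeBinaryKey() {
    return std::string("ab\0cd", 5);
}

// With the real class, the full 5 bytes would presumably be passed as
// (assumption, not verified against the library):
//   cdb.saveValueForKey(key.data(), value.data(),
//                       static_cast<unsigned int>(key.size()),
//                       static_cast<unsigned int>(value.size()));
```

Here makeBinaryKey().size() is 5 while lengthAsCString(makeBinaryKey()) is 2, so leaving keyNameLen at its default of 0 would store only the first two bytes of such a key.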

◆ setIndexFileName()

void setIndexFileName ( const char *  filename)

Set the index file name.

Parameters
filename is the full path to the index file.
