Matrix Science Mascot Parser toolkit
 
ms_zip Class Reference

Utility class for compressing and decompressing data in compress format, and compressing in gzip format.

#include <ms_zip.hpp>


Public Member Functions

 ms_zip (const bool isZipped)
 Streaming mode constructor.
 
 ms_zip (const bool isZipped, const std::string &buffer)
 Buffer mode constructor.
 
 ms_zip (const bool isZipped, const unsigned char *buffer, const unsigned long len)
 Buffer mode constructor (only for C++).
 
 ms_zip (const ms_zip &src)
 Copying constructor.
 
void appendErrors (const ms_errors &src)
 Copies all errors from another instance and appends them at the end of own list.
 
void clearAllErrors ()
 Remove all errors from the current list of errors.
 
std::string compressMore (const std::string &dataIn)
 Feed more data to the compressor and retrieve the next chunk of compressed data.
 
void compressMore (const unsigned char *dataIn, const unsigned long inputLen, unsigned char *dataOut, unsigned long *outputLen)
 Feed more data to the compressor and retrieve the next chunk of compressed data.
 
void copyFrom (const ms_errors *right)
 Use this member to make a copy of another instance.
 
void copyFrom (const ms_zip *right)
 Call this member to copy all the information from another instance.
 
std::string decompressMore (const std::string &dataIn)
 Retrieve the next chunk of decompressed data.
 
void decompressMore (const unsigned char *dataIn, const unsigned long inputLen, unsigned char *dataOut, unsigned long *outputLen)
 Retrieve the next chunk of decompressed data.
 
const ms_errs * getErrorHandler () const
 Retrieve the error object using this function to get access to all errors and error parameters.
 
int getLastError () const
 Return the error number of the last error that occurred.
 
std::string getLastErrorString () const
 Return the error description of the last error that occurred.
 
std::string getUnZipped () const
 Return the uncompressed buffer.
 
unsigned long getUnZipped (unsigned char *buffer, const unsigned long len) const
 Copy the uncompressed buffer into a buffer.
 
unsigned long getUnZippedLen () const
 Return the length of the uncompressed buffer.
 
std::string getZipped () const
 Return the compressed buffer.
 
unsigned long getZipped (unsigned char *buffer, const unsigned long len) const
 Copy the compressed buffer into a buffer.
 
unsigned long getZippedLen () const
 Return the length of the compressed buffer.
 
bool isValid () const
 Call this function to determine if there have been any errors.
 
ms_zip & operator= (const ms_zip &right)
 C++ style assignment operator.
 

Detailed Description

Utility class for compressing and decompressing data in compress format, and compressing in gzip format.

ms_zip is used internally in Mascot Parser, for example for decompressing unimod.xml and other configuration files when they are downloaded from a remote web site.

There are two ways to use the library: buffer mode and streaming mode.

Usage in buffer mode:

  1. Create a new object using ms_zip::ms_zip(const bool, const unsigned char*, const unsigned long) or ms_zip::ms_zip(const bool, const std::string&). The parameter buffer is the data to process; the parameter isZipped determines whether the data should be compressed (false) or decompressed (true).
  2. Check for errors after creating the object using ms_zip::isValid().
  3. If there are no errors, the resulting data can be read from ms_zip::getZipped() (compress mode) or ms_zip::getUnZipped() (decompress mode).

In buffer mode, both the compressed and decompressed data are kept in an internal buffer. The uncompressed input must not exceed 10 MB. The format of the compressed data is identical to the output of the Unix command-line utility compress, except that the first four bytes of the buffer contain the length of the uncompressed data. For this reason, ms_zip buffer mode should only be used to decompress data that has been compressed with ms_zip buffer mode.
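
The buffer-mode steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not code from the Mascot Parser sources: the helper names compressInBufferMode and decompressInBufferMode are hypothetical, and both are templates over the class type so that the block compiles without the Mascot Parser headers. In real code Zip would be ms_zip. StandInZip is a hypothetical placeholder that performs no compression and exists only so the calls can be exercised.

```cpp
#include <string>

// Hypothetical helper: compress `input` in buffer mode.
// Returns false and fills `error` if the object reports an error.
template <typename Zip>
bool compressInBufferMode(const std::string &input, std::string &output, std::string &error)
{
    Zip zip(false, input);                 // false: `input` is uncompressed
    if (!zip.isValid()) {
        error = zip.getLastErrorString();
        return false;
    }
    output = zip.getZipped();              // compress mode: read the compressed side
    return true;
}

// Hypothetical helper: decompress `input` in buffer mode.
template <typename Zip>
bool decompressInBufferMode(const std::string &input, std::string &output, std::string &error)
{
    Zip zip(true, input);                  // true: `input` is compressed
    if (!zip.isValid()) {
        error = zip.getLastErrorString();
        return false;
    }
    output = zip.getUnZipped();            // decompress mode: read the uncompressed side
    return true;
}

// Hypothetical stand-in, NOT ms_zip: it stores the data unchanged behind a
// four-byte placeholder header so the helpers can be run without the library.
class StandInZip {
public:
    StandInZip(const bool isZipped, const std::string &buffer)
        : valid_(true)
    {
        if (!isZipped) {
            unzipped_ = buffer;
            zipped_ = std::string(4, '\0') + buffer;
        } else if (buffer.size() < 4) {
            valid_ = false;                // too short to hold the header
        } else {
            zipped_ = buffer;
            unzipped_ = buffer.substr(4);
        }
    }
    bool isValid() const { return valid_; }
    std::string getLastErrorString() const
    {
        return valid_ ? "" : "stand-in: buffer too short";
    }
    std::string getZipped() const { return zipped_; }
    std::string getUnZipped() const { return unzipped_; }
private:
    bool valid_;
    std::string zipped_;
    std::string unzipped_;
};
```

With the library available, the same helpers would be instantiated as compressInBufferMode<ms_zip>(...) and decompressInBufferMode<ms_zip>(...). The stand-in's error text is invented for the sketch and is not a Mascot Parser message.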

Usage in streaming compression mode:

  1. Create a new object using ms_zip::ms_zip(const bool) with parameter false.
  2. Check for errors after creating the object using ms_zip::isValid().
  3. If there are no errors, feed in data using ms_zip::compressMore(const unsigned char *, const unsigned long, unsigned char *, unsigned long *) or ms_zip::compressMore(const std::string &).
  4. Check for errors using ms_zip::isValid(). If there are no errors, continue from step 3 until you've run out of input data.
  5. Flush the stream by giving the empty string as parameter to ms_zip::compressMore(const std::string &) (or let inputLen be 0 in ms_zip::compressMore(const unsigned char *, const unsigned long, unsigned char *, unsigned long *) ).
  6. Repeat step 5 as long as ms_zip::isValid() returns true and ms_zip::compressMore() returns more data.

In streaming mode, only the most recent input and output chunk is stored in the internal buffer. The input to ms_zip::compressMore() must not exceed 10 MB. The compressed data stream follows the gzip format with a minimal file header. The data can be decompressed with any tool that supports gzipped files.
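
The streaming compression steps above can be sketched as a single loop. This is an illustrative sketch, not the library's own code: compressStream is a hypothetical helper name, and it is a template so that the block compiles without the Mascot Parser headers. In real code the first argument would be an ms_zip object created with ms_zip(false). StandInStream is a hypothetical placeholder that performs no compression; it only mimics the "empty return until flushed" behaviour so the loop can be run.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: feed `chunks` through a streaming compressor and
// append everything it returns to `output`. Zip is expected to be ms_zip.
template <typename Zip>
bool compressStream(Zip &zip, const std::vector<std::string> &chunks, std::string &output)
{
    if (!zip.isValid()) return false;              // step 2
    for (const std::string &chunk : chunks) {      // steps 3 and 4
        if (chunk.empty()) continue;               // an empty string would signal end of input
        output += zip.compressMore(chunk);
        if (!zip.isValid()) return false;
    }
    for (;;) {                                     // steps 5 and 6: flush until nothing is left
        const std::string tail = zip.compressMore(std::string());
        if (!zip.isValid()) return false;
        if (tail.empty()) break;
        output += tail;
    }
    return true;
}

// Hypothetical stand-in, NOT ms_zip: it performs no compression. It holds
// input back until flushed, then hands it out at most three bytes per call,
// which exercises both the empty-return path and the repeated-flush path.
class StandInStream {
public:
    bool isValid() const { return true; }
    std::string compressMore(const std::string &dataIn)
    {
        if (!dataIn.empty()) {
            pending_ += dataIn;
            return std::string();
        }
        const std::string head = pending_.substr(0, 3);
        pending_.erase(0, head.size());
        return head;
    }
private:
    std::string pending_;
};
```

Skipping empty chunks inside the feeding loop is a deliberate choice in this sketch: an empty string is the end-of-input signal, so passing one by accident would start the flush early.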

Usage in streaming decompression mode:

  1. Create a new object using ms_zip::ms_zip(const bool) with parameter true.
  2. Check for errors after creating the object using ms_zip::isValid().
  3. If there are no errors, feed in data using ms_zip::decompressMore(const unsigned char *, const unsigned long, unsigned char *, unsigned long *) or ms_zip::decompressMore(const std::string &).
  4. Check for errors using ms_zip::isValid(). If there are no errors, continue from step 3 until you've run out of input data.
  5. Flush the stream by giving the empty string as parameter to ms_zip::decompressMore(const std::string &) (or let inputLen be 0 in ms_zip::decompressMore(const unsigned char *, const unsigned long, unsigned char *, unsigned long *) ).
  6. Repeat step 5 as long as ms_zip::isValid() returns true and ms_zip::decompressMore() returns more data.

The input to ms_zip::decompressMore() must not exceed 10 MB. The compressed data stream is expected to be in gzip format with a minimal file header, for example a gzipped file.

Constructor & Destructor Documentation

◆ ms_zip() [1/4]

ms_zip ( const bool  isZipped,
const unsigned char *  buffer,
const unsigned long  len 
)

Buffer mode constructor (only for C++).

This constructor initialises ms_zip for buffer mode. ms_zip can be used to compress and uncompress a small amount of data (less than 10 MB) in a buffer. When decompressing in buffer mode, the only supported data format is data compressed by ms_zip in buffer mode.

The first four bytes of the compressed buffer store the length of the uncompressed data as an unsigned 32-bit little-endian integer. No other headers are written. The object will hold a copy of both the compressed and uncompressed data, so it is not suitable for use with large amounts of data. Use the streaming mode in that case.
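
As an illustration of the header layout just described, the length field can be decoded with standard C++ alone. This is a sketch based only on the description above; readLengthPrefix is a hypothetical helper name and is not part of Mascot Parser.

```cpp
#include <cstdint>
#include <string>

// Decode the four-byte little-endian unsigned length stored at the start of
// a buffer-mode compressed buffer. Returns false if the buffer is too short.
bool readLengthPrefix(const std::string &zipped, std::uint32_t &length)
{
    if (zipped.size() < 4) return false;
    length = 0;
    for (int i = 3; i >= 0; --i) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        length = (length << 8) | static_cast<unsigned char>(zipped[i]);
    }
    return true;
}
```

Reading the bytes individually keeps the result independent of the host's byte order, which a plain memcpy into an integer would not.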

After creating the object, call isValid() to determine if there are any errors. The errors can be retrieved using getLastErrorString().

Possible errors are:

Parameters
isZipped  should be true if the passed buffer contains compressed data (decompression mode), and false if it contains uncompressed data (compression mode).
buffer  is a pointer to the compressed or uncompressed data. If it is uncompressed, there is no assumption that the data is a null terminated string.
len  is the length of the passed buffer.

◆ ms_zip() [2/4]

ms_zip ( const bool  isZipped,
const std::string &  buffer 
)

Buffer mode constructor.

This constructor initialises ms_zip for buffer mode. ms_zip can be used to compress and uncompress a small amount of data (less than 10 MB) in a buffer. When decompressing in buffer mode, the only supported data format is data compressed by ms_zip in buffer mode.

The first four bytes of the compressed buffer store the length of the uncompressed data as an unsigned 32-bit little-endian integer. No other headers are written. The object will hold a copy of both the compressed and uncompressed data, so it is not suitable for use with large amounts of data. Use the streaming mode in that case.

After creating the object, call isValid() to determine if there are any errors. The errors can be retrieved using getLastErrorString().

Possible errors are:

Parameters
isZipped  should be true if the passed buffer contains compressed data (decompression mode), and false if it contains uncompressed data (compression mode).
buffer  is a string that contains the compressed or uncompressed data. If it is uncompressed, there is no assumption that the data is a null terminated string. C++ programmers should be aware that a std::string constructor needs to be passed the length parameter when creating a std::string that contains binary data. Otherwise, a 'zero' in the data will be considered to be a null terminator for a string.
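
The point about binary data in std::string can be shown with standard C++ alone. In this sketch, makeBinaryString is a hypothetical helper name; the std::string constructors it relies on are standard.

```cpp
#include <cstddef>
#include <string>

// Build a std::string from raw bytes that may contain zero bytes.
std::string makeBinaryString(const unsigned char *data, const std::size_t len)
{
    // The (pointer, length) constructor copies exactly len bytes, whereas
    // the pointer-only constructor stops at the first zero byte.
    return std::string(reinterpret_cast<const char *>(data), len);
}
```

A string built this way reports the full byte count from size(), so it can be passed to the std::string overloads of this class without truncation.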

◆ ms_zip() [3/4]

ms_zip ( const bool  isZipped)
explicit

Streaming mode constructor.

This constructor initialises ms_zip in streaming mode. ms_zip can be used for compressing arbitrarily large data streams into gzip format. When decompressing in streaming mode, the only supported data format is gzip format.

In streaming compression mode, the compressed data stream will start with a minimal gzip header, followed by the compressed data. The constructor does not take any input data as a parameter, unlike in buffer mode. Instead, input data is fed to the object using ms_zip::compressMore(), which returns sequential chunks of compressed data. End of input data is indicated by passing the empty string to ms_zip::compressMore().

In streaming decompression mode, the compressed data stream is expected to start with a minimal gzip header, followed by the compressed data. Input data is fed to the object using ms_zip::decompressMore(), which returns sequential chunks of decompressed data. End of input data is indicated by passing the empty string to ms_zip::decompressMore().

If the underlying zlib library cannot be initialised, the following errors are possible:

Parameters
isZipped  must be false for compression mode and true for decompression mode.

◆ ms_zip() [4/4]

ms_zip ( const ms_zip &  src)

Copying constructor.

Generally only used from C++, but will be called indirectly from other languages.

Parameters
src  is the ms_zip to make a copy of.

Member Function Documentation

◆ appendErrors()

void appendErrors ( const ms_errors &  src)
inherited

Copies all errors from another instance and appends them at the end of own list.

Parameters
src  The object to copy the errors across from. See Maintaining object references: two rules of thumb.

◆ clearAllErrors()

void clearAllErrors ( )
inherited

Remove all errors from the current list of errors.

The list of 'errors' can include fatal errors, warning messages, information messages and different levels of debugging messages.

All messages are accumulated into a list in this object, until clearAllErrors() is called.

See Error Handling.

See also
isValid(), getLastError(), getLastErrorString(), getErrorHandler()
Examples
common_error.cpp, resfile_error.cpp, and resfile_summary.cpp.

◆ compressMore() [1/2]

std::string compressMore ( const std::string &  dataIn)

Feed more data to the compressor and retrieve the next chunk of compressed data.

You may need to call ms_zip::compressMore() a few times with more input data until the compressed stream starts. This is indicated by returning a non-empty string.

Both the input and output strings are bounded: the input string length must not exceed 10 MB and the output string length will never exceed 10 MB.

End of input data is indicated by calling ms_zip::compressMore() with the empty string (length 0). The compressor will flush the output and return the last compressed bytes. If there are more bytes to return than the internal buffer allows, you must call the method again with the empty string. When there is no more data to be flushed, the returned string will be empty (length 0).

If the compressor has more output than fits in the output buffer, ms_zip pushes it into an internal output queue. The next call to ms_zip::compressMore() returns the chunk from the head of the internal queue and may append more to the tail of the queue. The operation is not visible to the caller. However, it's best to keep the size of input smaller than the output buffer, as otherwise ms_zip may use an unexpected amount of memory for the output queue.
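
One way to follow the advice about input size is to split the input before feeding it to the compressor. This is a sketch using standard C++ only; splitIntoChunks is a hypothetical helper name, and the chunk size is left to the caller.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split `data` into consecutive pieces of at most `maxChunk` bytes.
// Returns an empty vector if `data` is empty or `maxChunk` is 0.
std::vector<std::string> splitIntoChunks(const std::string &data, const std::size_t maxChunk)
{
    std::vector<std::string> chunks;
    if (maxChunk == 0) return chunks;
    for (std::size_t pos = 0; pos < data.size(); pos += maxChunk) {
        // substr clamps the count, so the final piece may be shorter.
        chunks.push_back(data.substr(pos, maxChunk));
    }
    return chunks;
}
```

Each piece can then be passed to ms_zip::compressMore() in turn. A chunk size such as 1 MB would stay well below the 10 MB limit, although that figure is an arbitrary illustrative choice and not a library recommendation.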

If this method is called in non-streaming mode, an empty string object will be returned and the error ms_errs::ERR_MSP_ZIP_NOTSTREAMING set.

Other possible error conditions:

Parameters
dataIn  Next chunk of raw binary data to compress. The data need not be null terminated. C++ programmers should be aware that a std::string constructor needs to be passed the length parameter when creating a std::string that contains binary data. Otherwise, a 'zero' in the data will be considered to be a null terminator for a string.
Returns
A string (possibly empty) of raw compressed binary data.

◆ compressMore() [2/2]

void compressMore ( const unsigned char *  dataIn,
const unsigned long  inputLen,
unsigned char *  dataOut,
unsigned long *  outputLen 
)

Feed more data to the compressor and retrieve the next chunk of compressed data.

You may need to call ms_zip::compressMore() a few times with more input data until the compressed stream starts. This is indicated by outputLen being set to a non-zero value.

Both the input and output buffers are bounded: inputLen and outputLen must not exceed 10 MB, and outputLen must always exceed 0. If any of these conditions fails, a corresponding error will be set.

End of input data is indicated by calling ms_zip::compressMore() with inputLen equal to 0 (dataIn can be nullptr). The compressor will flush the output and return the last compressed bytes. If there are more bytes to return than outputLen, you must call the method again with inputLen equal to 0. When there is no more data to be flushed, outputLen is set to 0.

If the compressor has more output than fits in the output buffer, ms_zip pushes it into an internal output queue. The next call to ms_zip::compressMore() returns the chunk from the head of the internal queue and may append more to the tail of the queue. The operation is not visible to the caller. However, it's best to keep the size of input smaller than the output buffer, as otherwise ms_zip may use an unexpected amount of memory for the output queue.

If this method is called in non-streaming mode, outputLen will be set to 0 and the error ms_errs::ERR_MSP_ZIP_NOTSTREAMING set.

Other possible error conditions:

Parameters
dataIn  Next chunk of raw binary data to compress. The data is not assumed to be null terminated.
inputLen  Length of data in dataIn.
dataOut  Pointer to the memory location where compressed data should be written. The caller must ensure at least outputLen bytes have been allocated.
outputLen  Maximum length of data to output. outputLen will be set to the actual length of dataOut on successful return.
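
Because outputLen is both an input (room available) and an output (bytes written), it has to be reset before every call. The flush phase of the pointer overload can be sketched as below. This is an illustrative sketch only: flushCompressor is a hypothetical helper name, written as a template so that the block compiles without the Mascot Parser headers, and StandInFlusher is a hypothetical placeholder that performs no compression. In real code Zip would be ms_zip in streaming compression mode.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: signal end of input (inputLen == 0) repeatedly and
// collect the flushed bytes until the object reports zero bytes written.
template <typename Zip>
bool flushCompressor(Zip &zip, std::vector<unsigned char> &sink, const std::size_t capacity)
{
    if (capacity == 0) return false;               // outputLen must exceed 0
    std::vector<unsigned char> buf(capacity);
    for (;;) {
        unsigned long outLen = static_cast<unsigned long>(buf.size()); // in: room available
        zip.compressMore(nullptr, 0, buf.data(), &outLen);             // out: bytes written
        if (!zip.isValid()) return false;
        if (outLen == 0) return true;              // nothing left to flush
        sink.insert(sink.end(), buf.begin(),
                    buf.begin() + static_cast<std::ptrdiff_t>(outLen));
    }
}

// Hypothetical stand-in, NOT ms_zip: it hands out preloaded bytes on flush,
// never more than the caller's outputLen per call.
class StandInFlusher {
public:
    explicit StandInFlusher(const std::vector<unsigned char> &pending) : pending_(pending) {}
    bool isValid() const { return true; }
    void compressMore(const unsigned char *, const unsigned long inputLen,
                      unsigned char *dataOut, unsigned long *outputLen)
    {
        if (inputLen != 0) { *outputLen = 0; return; }   // this sketch only flushes
        const std::size_t n = std::min<std::size_t>(*outputLen, pending_.size());
        const std::ptrdiff_t count = static_cast<std::ptrdiff_t>(n);
        std::copy(pending_.begin(), pending_.begin() + count, dataOut);
        pending_.erase(pending_.begin(), pending_.begin() + count);
        *outputLen = static_cast<unsigned long>(n);
    }
private:
    std::vector<unsigned char> pending_;
};
```

The helper does not check the 10 MB upper bound on outputLen; a caller using the real class would need to keep capacity within that limit.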

◆ copyFrom() [1/2]

void copyFrom ( const ms_errors *  right)
inherited

Use this member to make a copy of another instance.

Parameters
right  is the source to initialise from

◆ copyFrom() [2/2]

void copyFrom ( const ms_zip *  right)

Call this member to copy all the information from another instance.

Simply create an instance of the class using the default constructor and call this method.

Parameters
right  is a pointer to another instance to copy from.

◆ decompressMore() [1/2]

std::string decompressMore ( const std::string &  dataIn)

Retrieve the next chunk of decompressed data.

You may need to call ms_zip::decompressMore() a few times with more input data until the decompressed stream starts. This is indicated by returning a non-empty string.

Both the input and output strings are bounded: the input string length must not exceed 10 MB and the output string length will never exceed 10 MB.

End of input data is indicated by calling ms_zip::decompressMore() with the empty string (length 0). The decompressor will flush the output and return the last decompressed bytes. If there are more bytes to return than the internal buffer allows, you must call the method again with the empty string. When there is no more data to be flushed, the returned string will be empty (length 0).

If the decompressor has more output than fits in the output buffer, ms_zip pushes it into an internal output queue. The next call to ms_zip::decompressMore() returns the chunk from the head of the internal queue and may append more to the tail of the queue. The operation is not visible to the caller. However, it's best to keep the size of input smaller than the output buffer, as otherwise ms_zip may use an unexpected amount of memory for the output queue.

If this method is called in non-streaming mode, an empty string object will be returned and the error ms_errs::ERR_MSP_ZIP_NOTSTREAMING set.

Other possible error conditions:

Parameters
dataIn  Next chunk of raw binary data to decompress. The data need not be a null terminated string. C++ programmers should be aware that a std::string constructor needs to be passed the length parameter when creating a std::string that contains binary data. Otherwise, a 'zero' in the data will be considered to be a null terminator for a string.
Returns
A string (possibly empty) of raw decompressed binary data.

◆ decompressMore() [2/2]

void decompressMore ( const unsigned char *  dataIn,
const unsigned long  inputLen,
unsigned char *  dataOut,
unsigned long *  outputLen 
)

Retrieve the next chunk of decompressed data.

You may need to call ms_zip::decompressMore() a few times with more input data until the decompressed stream starts. This is indicated by outputLen being set to a non-zero value.

Both the input and output buffers are bounded: inputLen and outputLen must not exceed 10 MB, and outputLen must always exceed 0. If any of these conditions fails, a corresponding error will be set.

End of input data is indicated by calling ms_zip::decompressMore() with inputLen equal to 0 (dataIn can be nullptr). The decompressor will flush the output and return the last decompressed bytes. If there are more bytes to return than outputLen, you must call the method again with inputLen equal to 0. When there is no more data to be flushed, outputLen is set to 0.

If the decompressor has more output than fits in the output buffer, ms_zip pushes it into an internal output queue. The next call to ms_zip::decompressMore() returns the chunk from the head of the internal queue and may append more to the tail of the queue. The operation is not visible to the caller. However, it's best to keep the size of input smaller than the output buffer, as otherwise ms_zip may use an unexpected amount of memory for the output queue.

If this method is called in non-streaming mode, outputLen will be set to 0 and the error ms_errs::ERR_MSP_ZIP_NOTSTREAMING set.

Other possible error conditions:

Parameters
dataIn  Next chunk of raw binary data to decompress. The data is not assumed to be a null terminated string.
inputLen  Length of data in dataIn.
dataOut  Pointer to the memory location where decompressed data should be written. The caller must ensure at least outputLen bytes have been allocated.
outputLen  Maximum length of data to output. outputLen will be set to the actual length of dataOut on successful return.

◆ getErrorHandler()

const ms_errs * getErrorHandler ( ) const
inherited

Retrieve the error object using this function to get access to all errors and error parameters.

See Error Handling.

Returns
Constant pointer to the error handler
See also
isValid(), getLastError(), getLastErrorString(), clearAllErrors()
Examples
common_error.cpp, and http_helper_getstring.cpp.

◆ getLastError()

int getLastError ( ) const
inherited

Return the error number of the last error that occurred.

All errors are accumulated into a list in this object, until clearAllErrors() is called. This function returns the last error that occurred.

See Error Handling.

See also
isValid(), getLastErrorString(), clearAllErrors(), getErrorHandler()
Returns
the error number of the last error, or 0 if there have been no errors.

◆ getLastErrorString()

std::string getLastErrorString ( ) const
inherited

Return the error description of the last error that occurred.

All errors are accumulated into a list in this object, until clearAllErrors() is called. This function returns the last error that occurred.

Returns
Most recent error, warning, information or debug message

See Error Handling.

See also
isValid(), getLastError(), clearAllErrors(), getErrorHandler()
Examples
common_error.cpp, config_enzymes.cpp, config_fragrules.cpp, config_license.cpp, config_mascotdat.cpp, config_masses.cpp, config_modfile.cpp, config_procs.cpp, config_quantitation.cpp, config_taxonomy.cpp, http_helper_getstring.cpp, and tools_aahelper.cpp.

◆ getUnZipped() [1/2]

std::string getUnZipped ( ) const

Return the uncompressed buffer.

In buffer mode, return the uncompressed data.

In streaming compression mode, return the most recent input value to ms_zip::compressMore().

In streaming decompression mode, return the most recent value from ms_zip::decompressMore().

Note
In C++, the string will not be null terminated, so you should use the std::string::length function to determine the length of the data.
In Perl, use the length() function on the returned string. The uncompressed data may be binary or text.
Returns
uncompressed data as a string

◆ getUnZipped() [2/2]

unsigned long getUnZipped ( unsigned char *  buffer,
const unsigned long  len 
) const

Copy the uncompressed buffer into a buffer.

In buffer mode, return the uncompressed data.

In streaming compression mode, return the most recent input value to ms_zip::compressMore().

In streaming decompression mode, return the most recent value from ms_zip::decompressMore().

Parameters
buffer  is the location where the uncompressed data will be copied to. The calling application should make sure that the buffer is long enough. Use getUnZippedLen() to find the length of the uncompressed data.
len  is the length of the passed buffer.
Returns
The number of bytes copied to the buffer.

◆ getUnZippedLen()

unsigned long getUnZippedLen ( ) const

Return the length of the uncompressed buffer.

Returns
The length, in bytes, of the uncompressed data from getUnZipped(unsigned char*, const unsigned long).

◆ getZipped() [1/2]

std::string getZipped ( ) const

Return the compressed buffer.

In buffer mode, return the compressed data.

In streaming compression mode, return the most recent value from ms_zip::compressMore().

In streaming decompression mode, return the most recent value given to ms_zip::decompressMore().

Note
In C++, the string will not be null terminated, so you should use the std::string::length function to determine the length of the data.
In Perl, use the length() function on the returned string.
Returns
compressed data as a string

◆ getZipped() [2/2]

unsigned long getZipped ( unsigned char *  buffer,
const unsigned long  len 
) const

Copy the compressed buffer into a buffer.

In buffer mode, return the compressed data.

In streaming compression mode, return the most recent value from ms_zip::compressMore().

In streaming decompression mode, return the most recent value given to ms_zip::decompressMore().

Parameters
buffer  is the location where the compressed data will be copied to. The calling application should make sure that the buffer is long enough. Use getZippedLen() to find the length of the compressed data.
len  is the length of the passed buffer.
Returns
The number of bytes copied to the buffer.

◆ getZippedLen()

unsigned long getZippedLen ( ) const

Return the length of the compressed buffer.

Returns
The length, in bytes, of the compressed data from getZipped(unsigned char*, const unsigned long).

◆ isValid()

bool isValid ( ) const
inherited

Call this function to determine if there have been any errors.
