LZW Compressor
 All Classes Files Functions Variables Typedefs Enumerations Enumerator Pages
lzw_v6.cpp File Reference

LZW file compressor. More...

#include <algorithm>
#include <array>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <exception>
#include <fstream>
#include <ios>
#include <iostream>
#include <istream>
#include <limits>
#include <memory>
#include <ostream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

Go to the source code of this file.

Classes

class  EncoderDictionary
 Encoder's custom dictionary type. More...
 
struct  EncoderDictionary::Node
 Binary search tree node. More...
 
struct  ByteCache
 Helper structure for use in CodeWriter and CodeReader. More...
 
class  CodeWriter
 Variable binary width code writer. More...
 
class  CodeReader
 Variable binary width code reader. More...
 

Typedefs

using CodeType = std::uint32_t
 Type used to store and retrieve codes.
 

Enumerations

enum  MetaCode : CodeType { Eof = 1u << CHAR_BIT }
 Special codes used by the encoder to control the decoder. More...
 

Functions

std::size_t required_bits (unsigned long int n)
 Computes the minimum number of bits required to store the value of n. More...
 
void compress (std::istream &is, std::ostream &os)
 Compresses the contents of is and writes the result to os. More...
 
void decompress (std::istream &is, std::ostream &os)
 Decompresses the contents of is and writes the result to os. More...
 
void print_usage (const std::string &s="", bool su=true)
 Prints usage information and a custom error message. More...
 
int main (int argc, char *argv[])
 Actual program entry point. More...
 

Variables

const CodeType globals::dms {512 * 1024}
 Dictionary Maximum Size (when reached, the dictionary will be reset)
 

Detailed Description

LZW file compressor.

Author
Julius Pettersson
Version
6
Remarks
This version borrows heavily from Juha Nieminen's work.

This is the C++11 implementation of a Lempel-Ziv-Welch single-file command-line compressor. It was written with Doxygen comments.

See Also
http://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Welch
http://marknelson.us/2011/11/08/lzw-revisited/
http://www.cs.duke.edu/csed/curious/compression/lzw.html
http://warp.povusers.org/EfficientLZW/index.html
http://en.cppreference.com/
http://www.doxygen.org/
Remarks
DF: the data file
EF: the encoded file

Definition in file lzw_v6.cpp.

Enumeration Type Documentation

Special codes used by the encoder to control the decoder.

Todo:
Metacodes should not be hardcoded to match their index.
Enumerator
Eof 

End-of-file.

Definition at line 59 of file lzw_v6.cpp.

59  : CodeType {
60  Eof = 1u << CHAR_BIT,
61 };

Function Documentation

void compress ( std::istream &  is,
std::ostream &  os 
)

Compresses the contents of is and writes the result to os.

Parameters
[in]isinput stream
[out]osoutput stream

Definition at line 449 of file lzw_v6.cpp.

References CodeWriter::get_bits(), CodeWriter::increase_bits(), required_bits(), EncoderDictionary::reset(), CodeWriter::reset_bits(), EncoderDictionary::search_and_insert(), EncoderDictionary::search_initials(), EncoderDictionary::size(), and CodeWriter::write().

Referenced by main().

450 {
452  CodeWriter cw(os);
453  CodeType i {globals::dms}; // Index
454  char c;
455  bool rbwf {false}; // Reset Bit Width Flag
456 
457  while (is.get(c))
458  {
459  // dictionary's maximum size was reached
460  if (ed.size() == globals::dms)
461  {
462  ed.reset();
463  rbwf = true;
464  }
465 
466  const CodeType temp {i};
467 
468  if ((i = ed.search_and_insert(temp, c)) == globals::dms)
469  {
470  cw.write(temp);
471  i = ed.search_initials(c);
472 
473  if (required_bits(ed.size() - 1) > cw.get_bits())
474  cw.increase_bits();
475  }
476 
477  if (rbwf)
478  {
479  cw.reset_bits();
480  rbwf = false;
481  }
482  }
483 
484  if (i != globals::dms)
485  cw.write(i);
486 }
void decompress ( std::istream &  is,
std::ostream &  os 
)

Decompresses the contents of is and writes the result to os.

Parameters
[in]isinput stream
[out]osoutput stream

Definition at line 493 of file lzw_v6.cpp.

Referenced by main().

494 {
495  std::vector<std::pair<CodeType, char>> dictionary;
496 
497  // "named" lambda function, used to reset the dictionary to its initial contents
498  const auto reset_dictionary = [&dictionary] {
499  dictionary.clear();
500  dictionary.reserve(globals::dms);
501 
502  const long int minc = std::numeric_limits<char>::min();
503  const long int maxc = std::numeric_limits<char>::max();
504 
505  for (long int c = minc; c <= maxc; ++c)
506  dictionary.push_back({globals::dms, static_cast<char> (c)});
507 
508  // add dummy elements for the metacodes
509  dictionary.push_back({0, '\x00'}); // MetaCode::Eof
510  };
511 
512  const auto rebuild_string = [&dictionary](CodeType k) -> const std::vector<char> * {
513  static std::vector<char> s; // String
514 
515  s.clear();
516 
517  // the length of a string cannot exceed the dictionary's number of entries
518  s.reserve(globals::dms);
519 
520  while (k != globals::dms)
521  {
522  s.push_back(dictionary[k].second);
523  k = dictionary[k].first;
524  }
525 
526  std::reverse(s.begin(), s.end());
527  return &s;
528  };
529 
530  reset_dictionary();
531 
532  CodeReader cr(is);
533  CodeType i {globals::dms}; // Index
534  CodeType k; // Key
535 
536  while (true)
537  {
538  // dictionary's maximum size was reached
539  if (dictionary.size() == globals::dms)
540  {
541  reset_dictionary();
542  cr.reset_bits();
543  }
544 
545  if (required_bits(dictionary.size()) > cr.get_bits())
546  cr.increase_bits();
547 
548  if (!cr.read(k))
549  break;
550 
551  if (k > dictionary.size())
552  throw std::runtime_error("invalid compressed code");
553 
554  const std::vector<char> *s; // String
555 
556  if (k == dictionary.size())
557  {
558  dictionary.push_back({i, rebuild_string(i)->front()});
559  s = rebuild_string(k);
560  }
561  else
562  {
563  s = rebuild_string(k);
564 
565  if (i != globals::dms)
566  dictionary.push_back({i, s->front()});
567  }
568 
569  os.write(&s->front(), s->size());
570  i = k;
571  }
572 
573  if (cr.corrupted())
574  throw std::runtime_error("corrupted compressed file");
575 }
int main ( int  argc,
char *  argv[] 
)

Actual program entry point.

Parameters
argcnumber of command line arguments
[in]argvarray of command line arguments
Return values
EXIT_FAILUREfor failed operation
EXIT_SUCCESSfor successful operation

Definition at line 608 of file lzw_v6.cpp.

References compress(), decompress(), and print_usage().

609 {
610  if (argc != 4)
611  {
612  print_usage("Wrong number of arguments.");
613  return EXIT_FAILURE;
614  }
615 
616  enum class Mode {
617  Compress,
618  Decompress
619  };
620 
621  Mode m;
622 
623  if (std::string(argv[1]) == "-c")
624  m = Mode::Compress;
625  else
626  if (std::string(argv[1]) == "-d")
627  m = Mode::Decompress;
628  else
629  {
630  print_usage(std::string("flag `") + argv[1] + "' is not recognized.");
631  return EXIT_FAILURE;
632  }
633 
634  const std::size_t buffer_size {1024 * 1024};
635 
636  // these custom buffers should be larger than the default ones
637  const std::unique_ptr<char[]> input_buffer(new char[buffer_size]);
638  const std::unique_ptr<char[]> output_buffer(new char[buffer_size]);
639 
640  std::ifstream input_file;
641  std::ofstream output_file;
642 
643  input_file.rdbuf()->pubsetbuf(input_buffer.get(), buffer_size);
644  input_file.open(argv[2], std::ios_base::binary);
645 
646  if (!input_file.is_open())
647  {
648  print_usage(std::string("input_file `") + argv[2] + "' could not be opened.");
649  return EXIT_FAILURE;
650  }
651 
652  output_file.rdbuf()->pubsetbuf(output_buffer.get(), buffer_size);
653  output_file.open(argv[3], std::ios_base::binary);
654 
655  if (!output_file.is_open())
656  {
657  print_usage(std::string("output_file `") + argv[3] + "' could not be opened.");
658  return EXIT_FAILURE;
659  }
660 
661  try
662  {
663  input_file.exceptions(std::ios_base::badbit);
664  output_file.exceptions(std::ios_base::badbit | std::ios_base::failbit);
665 
666  if (m == Mode::Compress)
667  compress(input_file, output_file);
668  else
669  if (m == Mode::Decompress)
670  decompress(input_file, output_file);
671  }
672  catch (const std::ios_base::failure &f)
673  {
674  print_usage(std::string("File input/output failure: ") + f.what() + '.', false);
675  return EXIT_FAILURE;
676  }
677  catch (const std::exception &e)
678  {
679  print_usage(std::string("Caught exception: ") + e.what() + '.', false);
680  return EXIT_FAILURE;
681  }
682 
683  return EXIT_SUCCESS;
684 }
void print_usage ( const std::string &  s = "",
bool  su = true 
)

Prints usage information and a custom error message.

Parameters
scustom error message to be printed
suShow Usage information

Definition at line 582 of file lzw_v6.cpp.

Referenced by main().

583 {
584  if (!s.empty())
585  std::cerr << "\nERROR: " << s << '\n';
586 
587  if (su)
588  {
589  std::cerr << "\nUsage:\n";
590  std::cerr << "\tprogram -flag input_file output_file\n\n";
591  std::cerr << "Where `flag' is either `c' for compressing, or `d' for decompressing, and\n";
592  std::cerr << "`input_file' and `output_file' are distinct files.\n\n";
593  std::cerr << "Examples:\n";
594  std::cerr << "\tlzw_v6.exe -c license.txt license.lzw\n";
595  std::cerr << "\tlzw_v6.exe -d license.lzw new_license.txt\n";
596  }
597 
598  std::cerr << std::endl;
599 }
std::size_t required_bits ( unsigned long int  n)

Computes the minimum number of bits required to store the value of n.

Parameters
nnumber to be evaluated
Returns
Number of required bits.

Definition at line 434 of file lzw_v6.cpp.

Referenced by compress().

435 {
436  std::size_t r {1};
437 
438  while ((n >>= 1) != 0)
439  ++r;
440 
441  return r;
442 }