Exercise: Write a Huffman compressor and decompressor.
Huffman is a compression algorithm. This particular algorithm is an entropy encoder, that stores a series
of values (e.g. bytes from a file) in the least amount of bits, depending on frequency. This is opposed to
dictionary compression, which compresses by finding repeating strings of bytes in a file. Both forms of
compression complement each other, typical compression programs like winzip or winrar first do
dictionary compression followed by entropy encoding. So an entropy encoding like Huffman (an
alternative is arithmic coding) purely is used for representing data in the least amount of bits once the
repetition has already been taken out of the data.
Entropy encoding assigns sequences of bits to values, depending on how frequently that value occurs.
So, for english text, it would want to use less bits for "e" than for "q", because that way the overall text
can be represented in the least amount of bits. When decompressing, you can find the original values
from the bits. To make this possible, the bits assigned must be unique. How can we do this?
For compression, go through these steps:
• loading a file into memory (be sure to use binary mode). As an example use the included text file
shakespeare.txt, the fact that this is a text file will help with debugging, but your
algorithm should work with ANY file.
• Loop through all bytes of the file, counting the frequency of occurrence of each byte. You might
want to print out the resulting table of frequencies to get an impression of what you’re doing.
• Next, to assign a bit sequence to each character. The easiest way to guarantee they are unique
and optimal is by constructing a binary tree. You’ll need both leaf nodes (that contain a
character) and nodes that have two child nodes (hence, binary).
o Create a leaf node out of each byte that has a frequency > 0, and put them in a list of
nodes
o Find in your list of nodes the two nodes that have the lowest frequency, and take them
out of the list. Create a new node (and insert it into list) and make these two nodes its
children. Make the frequency of this new node the sum of its children.
o Repeat until the list contains just 1 node, the root of the tree.
• Once you have the tree, recurse through the tree in depth‐first order, and collect the list of bits
as a path from the root (where 0 means left branch, and 1 means right). Whenever you arrive at
a leaf, the list is the set of bits for that value. Store the bits. Note that the number of bits is
variable, it can easily be more than 8 for the less frequent values.
• Now loop through all the bytes in the file again. For each byte, write out the corresponding bits
to a file. To write bits to a file, you’ll need to accumulate them in bytes at a time… this requires
the use of bit operators.
• To be able to decompress the file, you’ll also need to store the tree in the file. The easiest way
to write a tree is to write it depth‐first, post order, that way you can reconstruct it using a stack.
Decompression is a lot simpler:
• Reconstruct the tree from the file. If you read a leaf, put it on a stack, if you read a node, take 2
children from the stack and put the node back on. This should leave you just the root on the
stack when you are done.
• To decompress, read bits from the remainder of the file, and use each bit to walk down the tree
(0 means go left, 1 means go right). Whenever you come to a leaf, write the value of that leaf to
the output file, and start again at the root of the tree.
The challenge, of course, is to be able to run the two algorithms back to back, and ensure that the final
output file is identical to the original input. Try this with a bunch of different files, if the files ever end up
different, there is a bug in your code. The exercise is not done until you have fixed these bugs.
I'm an amatuaer programmer who desires a job as professional programmer. I am really trying to improve my C++ programming skills.
I will post the complete source code to my solution. I would really like some professional programmers to critique my code for efficiency and my incorrect or correct use of C++ in terms of professional standards. I'll take critiques on my solution correctness but I'm most interested in how to improve my use of C++ and if my code has any big efficiency problems. So inspect my code, play with it compile it, and I prefer it to not be copied but used as a reference for anyone else needing a huffman solution. Give me feedback ! Thanks in advanced.
//main.cpp
#include "FileLoader.h"
#include "CharFreqInVector.h"
#include "LeafNode.h"
#include "TreeFromListConstruction.h"
#include "GenerateHuffFile.h"
#include "Decompressor.h"
#include <crtdbg.h>
#include <list>
#include <fstream>
int main()
{
{
vector<char> vFile;
vector<char> cvFile;
vector<char> charByte;
vector<int> frequency;
FileLoader File("shakespeare.txt");
vFile = File.loadFileIntoVector();
CharFreqInVector Frequency(vFile);
Frequency.calculateFrequency();
charByte = Frequency.getEachChar();//Frequency.getEachChar(); // need copies of the char and its frequency to put to a LeafNode
frequency = Frequency.getEachFrequency();//Frequency.getEachFrequency();
TreeFromListConstruction tree(charByte, frequency);
//passing into a class that can modify the list into a binary tree
tree.formTree(); //modify the list into a binary tree
GenerateHuffFile Huff(tree.getTree(), vFile, "compressed.txt",charByte, frequency);
Huff.writeHuffFile();
///////////////////////////////////////////////////////////////////////////////////////decompression time
FileLoader cFile("compressed.txt");
cvFile = cFile.loadFileIntoVector();
CharFreqInVector Formated(cvFile);
Formated.calcFreqInCompressed();
charByte = Formated.getEachChar();//Frequency.getEachChar(); // need copies of the char and its frequency to put to a LeafNode
frequency = Formated.getEachFrequency();//Frequency.getEachFrequency();
TreeFromListConstruction recontructedTree(charByte, frequency);
recontructedTree.formTree();
Decompressor uncompress(recontructedTree.getTree(), cvFile, "compressed.txt", "decompressed.txt", vFile.size());
uncompress.remakeFile();
}
cout << _CrtDumpMemoryLeaks(); //no memory leaks yay!
system("pause");
return 0;
}
//GenerateHuffFile.h
#ifndef GENERATEHUFFFILE_H
#define GENERATEHUFFFILE_H
#include "LeafNode.h"
#include "string"
class GenerateHuffFile
{
public:
GenerateHuffFile(vector<Node*> *aList, vector<char> vFile, string outPutFile,vector<char> charBytes, vector<int> frequencies);
void writeHuffFile();//writes the true and bits to file
void writeTreePortionToFile();//writes the tree into a file
void writeBitPortionToFile();//writes the bits to a file
void writeBits(vector<bool> bitVector,int i);
void convertBoolVecToBits();//turns the boolean vector representing bits to real bits to write to file
string getBitsInChar(char dummyByte);
private:
vector<Node*> *myList;
vector<char> vectorFile;
vector<char> charBytes ;
vector<int> frequencies ;
string outputFile;
ofstream fout;
char dummyByte;
int bitCounter;
int bitPosition;
int totalBitsInFile;
staticconstint bitsInAByte = 8;
};
#endif
//LeafNode.cpp
#include "LeafNode.h"
LeafNode::LeafNode(char character, int frequency)
{
characterByte = character;
frequencyNumber = frequency;
isNode = false;
}
LeafNode::LeafNode(char character)
{
characterByte = character;
}
void LeafNode::ConstructBitVector()
{
bitVector.push_back(bitOn); //give its own bit to the bit vector
}
void LeafNode::displayNodeData()
{
cout <<"char:" << characterByte << " freq:" << frequencyNumber << endl;
cout << "The bits assigned to this leaf node are: ";
for(intunsigned i = 1; i < bitVector.size(); i++) //display the bits assigned to the leaves and start on 1 to not include the root node's bit
{
cout << bitVector[i];
}
cout << endl;
}
void LeafNode::displayNodeData(ofstream *fout)
{
*fout << characterByte << frequencyNumber << ',';
/*bitVector.push_back(bitOn); //give its own bit to the bit vector
*fout << "The bits assigned to this leaf node are: ";
for(int unsigned i = 1; i < bitVector.size(); i++) //display the bits assigned to the leaves and start on 1 to not include the root node's bit
{
//cout << "\n bit Vector size is:" << bitVector.size() << endl;
*fout << bitVector[i];
}
*fout << endl;*/
}
// TreeFromListConstruction.cpp
#include "TreeFromListConstruction.h"
#include <cassert>
TreeFromListConstruction::TreeFromListConstruction(vector<char> charByte, vector<int> frequency)
{
for(unsignedint i = 0; i < charByte.size(); i++)
{
myList.push_back(new LeafNode(charByte[i], frequency[i]));
}
characterByte = charByte;
}
void TreeFromListConstruction::formTree()
{
// set least to first element
Node *temp = 0 ;
int leastFrequency = myList[0]->getFrequencyNumber();
int leastIndex = 0;
int loopTwice = 0;
int loopN = 0;
while(myList.size() > 1 )
{
while(loopTwice < 2) //loop twice to get lowest two
{
for(unsignedint i = 0; i < myList.size(); i++)
{
if(myList[i]->getFrequencyNumber() < leastFrequency)//if we find a frequency in the list lower then our current least then set it to least
{
leastIndex = i;
leastFrequency = myList[i]->getFrequencyNumber();
}
}
//// //cout << "the least frequncy is: " << leastfrequency << endl;
if(myList[leastIndex]->getIsNode())
{
lowestTwoFreqs.push_back(new Node);
}
else
{
lowestTwoFreqs.push_back(new LeafNode(myList[leastIndex]->getCharacterByte()));
}
*lowestTwoFreqs[lowestTwoFreqs.size() - 1] = *myList[leastIndex];
delete myList[leastIndex];
myList.erase(myList.begin()+ leastIndex);
if(myList.size() > 0)
{
leastFrequency = myList[0]->getFrequencyNumber();//initialize a few of these guys back
}
loopTwice++;
leastIndex = 0;
}
myList.push_back (new Node(lowestTwoFreqs[lowestTwoFreqs.size() - 1], lowestTwoFreqs[lowestTwoFreqs.size() - 2]));
loopN++;
loopTwice = 0;
}
myList[0]->clearBitVector(); //clear the root node's bit vector so it will not contribute to the leaf node bit assignment
for(unsignedint i = 0; i < myList.size(); i++)
{
myList[i]->ConstructBitVector();
}
/*for(unsigned int i = 0; i < myList.size(); i++)
{
myList[i]->displayNodeData();
}*/
}
void TreeFromListConstruction::deleteMemory()
{
for(unsignedint i = 0; i < myList.size(); i++)
{
delete myList[i];
}
}
TreeFromListConstruction::~TreeFromListConstruction()
{
deleteMemory();
for(unsignedint i = 0; i < lowestTwoFreqs.size(); i++)//destroy list of lowestfreqs
{
delete lowestTwoFreqs[i];
}
}
shakespeare.txt:
SCENE I. Orchard of Oliver's house.
Enter ORLANDO and ADAM
ORLANDO
As I remember, Adam, it was upon this fashion
bequeathed me by will but poor a thousand crowns,
and, as thou sayest, charged my brother, on his
blessing, to breed me well: and there begins my
sadness. My brother Jaques he keeps at school, and
report speaks goldenly of his profit: for my part,
he keeps me rustically at home, or, to speak more
properly, stays me here at home unkept; for call you
that keeping for a gentleman of my birth, that
differs not from the stalling of an ox? His horses
are bred better; for, besides that they are fair
with their feeding, they are taught their manage,
and to that end riders dearly hired: but I, his
brother, gain nothing under him but growth; for the
which his animals on his dunghills are as much
bound to him as I. Besides this nothing that he so
plentifully gives me, the something that nature gave
me his countenance seems to take from me: he lets
me feed with his hinds, bars me the place of a
brother, and, as much as in him lies, mines my
gentility with my education. This is it, Adam, that
grieves me; and the spirit of my father, which I
think is within me, begins to mutiny against this
servitude: I will no longer endure it, though yet I
know no wise remedy how to avoid it.
ADAM
Yonder comes my master, your brother.
ORLANDO
Go apart, Adam, and thou shalt hear how he will
shake me up.
Enter OLIVER
OLIVER
Now, sir! what make you here?
ORLANDO
Nothing: I am not taught to make any thing.
OLIVER
What mar you then, sir?
ORLANDO
Marry, sir, I am helping you to mar that which God
made, a poor unworthy brother of yours, with idleness.
OLIVER
Marry, sir, be better employed, and be naught awhile.
ORLANDO
Shall I keep your hogs and eat husks with them?
What prodigal portion have I spent, that I should
come to such penury?
OLIVER
Know you where your are, sir?
ORLANDO
O, sir, very well; here in your orchard.
OLIVER
Know you before whom, sir?
ORLANDO
Ay, better than him I am before knows me. I know
you are my eldest brother; and, in the gentle
condition of blood, you should so know me. The
courtesy of nations allows you my better, in that
you are the first-born; but the same tradition
takes not away my blood, were there twenty brothers
betwixt us: I have as much of my father in me as
you; albeit, I confess, your coming before me is
nearer to his reverence.
OLIVER
What, boy!
ORLANDO
Come, come, elder brother, you are too young in this.
OLIVER
Wilt thou lay hands on me, villain?
ORLANDO
I am no villain; I am the youngest son of Sir
Rowland de Boys; he was my father, and he is thrice
a villain that says such a father begot villains.
Wert thou not my brother, I would not take this hand
from thy throat till this other had pulled out thy
tongue for saying so: thou hast railed on thyself.
ADAM
Sweet masters, be patient: for your father's
remembrance, be at accord.
OLIVER
Let me go, I say.
ORLANDO
I will not, till I please: you shall hear me. My
father charged you in his will to give me good
education: you have trained me like a peasant,
obscuring and hiding from me all gentleman-like
qualities. The spirit of my father grows strong in
me, and I will no longer endure it: therefore allow
me such exercises as may become a gentleman, or
give me the poor allottery my father left me by
testament; with that I will go buy my fortunes.
OLIVER
And what wilt thou do? beg, when that is spent?
Well, sir, get you in: I will not long be troubled
with you; you shall have some part of your will: I
pray you, leave me.
ORLANDO
I will no further offend you than becomes me for my good.
OLIVER
Get you with him, you old dog.
ADAM
Is 'old dog' my reward? Most true, I have lost my
teeth in your service. God be with my old master!
he would not have spoke such a word.
Exeunt ORLANDO and ADAM
OLIVER
Is it even so? begin you to grow upon me? I will
physic your rankness, and yet give no thousand
crowns neither. Holla, Dennis!
Enter DENNIS
DENNIS
Calls your worship?
OLIVER
Was not Charles, the duke's wrestler, here to speak with me?
DENNIS
So please you, he is here at the door and importunes
access to you.
OLIVER
Call him in.
Exit DENNIS
'Twill be a good way; and to-morrow the wrestling is.
Enter CHARLES
CHARLES
Good morrow to your worship.
OLIVER
Good Monsieur Charles, what's the new news at the
new court?
CHARLES
There's no news at the court, sir, but the old news:
that is, the old duke is banished by his younger
brother the new duke; and three or four loving lords
have put themselves into voluntary exile with him,
whose lands and revenues enrich the new duke;
therefore he gives them good leave to wander.
OLIVER
Can you tell if Rosalind, the duke's daughter, be
banished with her father?
CHARLES
O, no; for the duke's daughter, her cousin, so loves
her, being ever from their cradles bred together,
that she would have followed her exile, or have died
to stay behind her. She is at the court, and no
less beloved of her uncle than his own daughter; and
never two ladies loved as they do.
OLIVER
Where will the old duke live?
CHARLES
They say he is already in the forest of Arden, and
a many merry men with him; and there they live like
the old Robin Hood of England: they say many young
gentlemen flock to him every day, and fleet the time
carelessly, as they did in the golden world.
OLIVER
What, you wrestle to-morrow before the new duke?
CHARLES
Marry, do I, sir; and I came to acquaint you with a
matter. I am given, sir, secretly to understand
that your younger brother Orlando hath a disposition
to come in disguised against me to try a fall.
To-morrow, sir, I wrestle for my credit; and he that
escapes me without some broken limb shall acquit him
well. Your brother is but young and tender; and,
for your love, I would be loath to foil him, as I
must, for my own honour, if he come in: therefore,
out of my love to you, I came hither to acquaint you
withal, that either you might stay him from his
intendment or brook such disgrace well as he shall
run into, in that it is a thing of his own search
and altogether against my will.
OLIVER
Charles, I thank thee for thy love to me, which
thou shalt find I will most kindly requite. I had
myself notice of my brother's purpose herein and
have by underhand means laboured to dissuade him from
it, but he is resolute. I'll tell thee, Charles:
it is the stubbornest young fellow of France, full
of ambition, an envious emulator of every man's
good parts, a secret and villanous contriver against
me his natural brother: therefore use thy
discretion; I had as lief thou didst break his neck
as his finger. And thou wert best look to't; for if
thou dost him any slight disgrace or if he do not
mightily grace himself on thee, he will practise
against thee by poison, entrap thee by some
treacherous device and never leave thee till he
hath ta'en thy life by some indirect means or other;
for, I assure thee, and almost with tears I speak
it, there is not one so young and so villanous this
day living. I speak but brotherly of him; but
should I anatomize him to thee as he is, I must
blush and weep and thou must look pale and wonder.
CHARLES
I am heartily glad I came hither to you. If he come
to-morrow, I'll give him his payment: if ever he go
alone again, I'll never wrestle for prize more: and
so God keep your worship!
OLIVER
Farewell, good Charles.
Exit CHARLES
Now will I stir this gamester: I hope I shall see
an end of him; for my soul, yet I know not why,
hates nothing more than he. Yet he's gentle, never
schooled and yet learned, full of noble device, of
all sorts enchantingly beloved, and indeed so much
in the heart of the world, and especially of my own
people, who best know him, that I am altogether
misprised: but it shall not be so long; this
wrestler shall clear all: nothing remains but that
I kindle the boy thither; which now I'll go about.
Alright, I've looked through a few huffman solutions before and am currently trying to perfect one at the moment. So far yours works great - the code is nice and neat. Only thing I can say so far Is i think it's possible to shorten it a bit. But we'll find out. I'll go over it tomorrow and let you know what a think. But by the looks of it you got the desired outcome you wanted.