Hey guys. So I still didn't get round to trying to implement the LZSS algorithm because I didn't understand it. After finding out what the terms used in an explanation I found meant, I decided to give it another go.
I've written most of the function, now, but I still can't understand what to do here or how to do it (the second line is the only one troubling me anymore)
* Place the coding position to the beginning of the input stream;
* find the longest match in the window for the lookahead buffer:
* P := pointer to this match;
* L := length of the match;
* is L >= MIN_LENGTH?
* YES: output P and move the coding position L characters forward;
* NO: output the first character of the lookahead buffer and move the coding positon one character forward;
* if there are more characters in the input stream, go back to step 2. |
And as an explanation I found on the same website of what those terms actually mean:
* Input stream: the sequence of characters to be compressed;
* Character: the basic data element in the input stream;
* Coding position: the position of the character in the input stream that is currently being coded (the beginning of the * lookahead buffer);
* Lookahead buffer: the character sequence from the coding position to the end of the input stream;
* The Window of size W contains W characters from the coding position backwards, i.e. the last W processed characters;
* A Pointer points to the match in the window and also specifies its length. |
Now it's specifically the second line of those instructions I'm having a hard time understanding.
find the longest match in the window for the lookahead buffer |
What is referred to by "the longest match?" and how am I supposed to find it?
Should I read a couple of characters (say, 4) and try to find where they're repeated (p = strstr(src, s)) or what?
Presumably the "longest match" would be the longest string that is repeated, yes? Then we're working backwards from the longest string to the shortest (the shortest thats >= minlength anyway). But what I'm unable to understand is how to find the longest match. Surely I'm not supposed to just read a bunch of characters, find a matching string and go from there? It says to find the longest match, which would imply that I should try to find the longest string that has a matching string. How would I do that?
If it helps, here's what I've written
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65
|
#ifndef _SIZE_T
#define _SIZE_T
typedef unsigned long size_t;
#endif /* ! _SIZE_T */
/* Here's a way of breaking dependancy on NULL existing */
#ifndef _NULL
# ifdef NULL
# define _NULL NULL
# else
# define _NULL ((void*)0)
# endif /* NULL */
#endif /* ! _NULL */
#ifndef DEFAULT_MINLENGTH
# define DEFAULT_MINLENGTH 4
#endif
#ifndef DEFAULT_N
# define DEFAULT_N 4096
#endif
/* lzss_com: compress and then copy at most n bytes from source to dest,
* assume dest is big enough. Returns the amount of bytes from src compressed,
* so pos should == n if src contained >= n bytes.
*
* @dest: destination buffer to place the compressed data into
* @src: a [null-terminated] source buffer to compress
* @n: maximum amount of bytes from src to compress
* (pass <= 0 for default, which is #defined as DEFAULT_N)
* @minlength: minimum string length to be compressed
* (pass <= 0 for default, which is #defined as DEFAULT_MINLENGTH)
*/
size_t lzss_com(char* dest, const char* src, size_t n, size_t minlength) {
size_t pos = 0; /* Store the current position in src */
const size_t src_len = strlen(src);
char* laBuf = _NULL, /* laBuf: the characters in src from pos to src_len */
* window = _NULL; /* Window: the last (pos) characters processed */
/* Check n and minlength for correctness; if <= 0, defaults should be used
* as specified above. */
if (n <= 0)
n = DEFAULT_N;
if (minlength <= 0)
minlength = DEFAULT_MINLENGTH;
/* Compress at most n bytes from src in a single loop: */
while (pos++ < n) {
window = (char*)(src + pos);
char* p;
size_t len = strlen(p);
/* Check len against minlength to see if we should bother
* compressing it */
if (len >= minlength) {
while (len--)
*dest++ = *p++;
} else {
*dest++ = *(src + pos++);
}
}
return pos; /* pos should == n */
}
| |
Thanks.
Edit: the reason p is uninitialized is because I don't know what to do to initialize it.