Still can't understand the LZSS algorithm

Hey guys. So I still didn't get round to trying to implement the LZSS algorithm because I didn't understand it. After finding out what the terms used in an explanation I found meant, I decided to give it another go.

I've written most of the function, now, but I still can't understand what to do here or how to do it (the second line is the only one troubling me anymore)
* Place the coding position to the beginning of the input stream;
* find the longest match in the window for the lookahead buffer:
* P := pointer to this match;
* L := length of the match;
* is L >= MIN_LENGTH?
* YES: output P and move the coding position L characters forward;
* NO: output the first character of the lookahead buffer and move the coding positon one character forward;
* if there are more characters in the input stream, go back to step 2.


And as an explanation I found on the same website of what those terms actually mean:
* Input stream: the sequence of characters to be compressed;
* Character: the basic data element in the input stream;
* Coding position: the position of the character in the input stream that is currently being coded (the beginning of the * lookahead buffer);
* Lookahead buffer: the character sequence from the coding position to the end of the input stream;
* The Window of size W contains W characters from the coding position backwards, i.e. the last W processed characters;
* A Pointer points to the match in the window and also specifies its length.


Now it's specifically the second line of those instructions I'm having a hard time understanding.
find the longest match in the window for the lookahead buffer

What is referred to by "the longest match?" and how am I supposed to find it?

Should I read a couple of characters (say, 4) and try to find where they're repeated (p = strstr(src, s)) or what?

Presumably the "longest match" would be the longest string that is repeated, yes? Then we're working backwards from the longest string to the shortest (the shortest thats >= minlength anyway). But what I'm unable to understand is how to find the longest match. Surely I'm not supposed to just read a bunch of characters, find a matching string and go from there? It says to find the longest match, which would imply that I should try to find the longest string that has a matching string. How would I do that?

If it helps, here's what I've written
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
#ifndef _SIZE_T
#define _SIZE_T
typedef unsigned long size_t;
#endif /* ! _SIZE_T */

/* Here's a way of breaking dependancy on NULL existing */
#ifndef _NULL
# ifdef NULL
#  define _NULL NULL
# else
#  define _NULL ((void*)0)
# endif /* NULL */
#endif  /* ! _NULL */

#ifndef DEFAULT_MINLENGTH
# define DEFAULT_MINLENGTH 4
#endif

#ifndef DEFAULT_N
# define DEFAULT_N 4096
#endif

/* lzss_com: compress and then copy at most n bytes from source to dest,
 * assume dest is big enough. Returns the amount of bytes from src compressed,
 * so pos should == n if src contained >= n bytes.
 *
 * @dest:      destination buffer to place the compressed data into
 * @src:       a [null-terminated] source buffer to compress
 * @n:         maximum amount of bytes from src to compress
 *             (pass <= 0 for default, which is #defined as DEFAULT_N)
 * @minlength: minimum string length to be compressed
 *             (pass <= 0 for default, which is #defined as DEFAULT_MINLENGTH)
 */
size_t lzss_com(char* dest, const char* src, size_t n, size_t minlength) {
    size_t pos = 0; /* Store the current position in src */
    const size_t src_len = strlen(src);
    char*  laBuf  = _NULL, /* laBuf: the characters in src from pos to src_len */
        *  window = _NULL; /* Window: the last (pos) characters processed */
        
    /* Check n and minlength for correctness; if <= 0, defaults should be used
     * as specified above. */
    if (n <= 0)
        n = DEFAULT_N;
    
    if (minlength <= 0)
        minlength = DEFAULT_MINLENGTH;
    
    /* Compress at most n bytes from src in a single loop: */
    while (pos++ < n) {
        window = (char*)(src + pos);
        char*  p;
        size_t len = strlen(p);
        
        /* Check len against minlength to see if we should bother
         * compressing it */
        if (len >= minlength) {
            while (len--)
                *dest++ = *p++;
        } else {
            *dest++ = *(src + pos++);
        }
    }
    
    return pos; /* pos should == n */
}


Thanks.

Edit: the reason p is uninitialized is because I don't know what to do to initialize it.
Last edited on
well u can refer to the book "Data Compression" by khalid .
You still haven't figured this out? Seriously? And I even gave you a heavily commented finished version...

find the longest match in the window for the lookahead buffer
In other words from the current reading position in the input buffer, find the longest string that is also in the sliding window buffer.
For example, if the string that begins at the reading position is "123456" and the sliding window contains "123012123461", the bold part would be the longest match. That's what you need to find.
well u can refer to the book "Data Compression" by khalid .

I'll think about it.

You still haven't figured this out? Seriously?

Well, I've been doing other things that I could do already, such as writing the rest of the program. I've only just now come to writing the compressor and decompressor.

And I even gave you a heavily commented finished version...

You did?

In other words from the current reading position in the input buffer, find the longest string that is also in the sliding window buffer.
For example, if the string that begins at the reading position is "123456" and the sliding window contains "123012123461", the bold part would be the longest match. That's what you need to find.

Thanks, that's all I need is a good explanation. I don't like copying other people's code anyhow... Thanks.

So I just have to read a bunch of characters (how do I know how many?), and then search the window buffer for that string, and replace it with the position and length.
Last edited on
You did?
Yes... Some time ago in a thread in Lounge.

how do I know how many?
Depends on the parameters of the algorithm. This was addressed in that post.
Yes... Some time ago in a thread in Lounge.

Oh. I don't remember that, I must not have noticed it.

Depends on the parameters of the algorithm. This was addressed in that post.

So should I take minlength characters?
closed account (z05DSL3A)
The LZSS algorithm
http://www.cplusplus.com/forum/general/17012/
Topic archived. No new replies allowed.