Does strtok lead to a memory leak?

Hi All,

I've been exploring the string library, with particular interest in tokenising. Below is a slightly modified example from www.cplusplus.com/reference/string/string/c_str/. My question was: what happens to all the memory allocated for cstr after it has been processed by strtok? The answer I got was: a memory leak...

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
//---------------------------------------------------------------------------
#include <iostream>
#include <cstring>
#include <string>
using namespace std;
//---------------------------------------------------------------------------
int main ()
{
  char * cstr, * p, * t;
  string str ("-Please, split this phrase into tokens.");

  cstr = new char [str.size()+1];
  strcpy (cstr, str.c_str());  //cstr is a copy of str + terminating '\0'

  int count = 0;
  p = strtok (cstr," .,-");
  while (p!=NULL)
  {
	 cout<<p<<endl;  //outputs word by word, ignores punctuation marks
	 count++; if (count==4)t = p;
	 p = strtok(NULL," .,-");  //continues from previous stop
  }

  cout<<endl<<"cstr contains: "<<cstr<<endl; //"-Please"
  delete[] cstr;

  cout<<"t contains: "<<t<<endl; //"phrase" - memory not released!!

  cout<<"str contains: "<<str<<endl<<endl; //full phrase intact

  system("pause");
  return 0;
}
//--------------------------------------------------------------------------- 


Has anyone ever thought about this before? Is there an explanation for it, or should one simply avoid using strtok to stay out of memory trouble?
strtok returns a pointer into the original buffer passed to it; you should not attempt to free it.
Since cstr is freed via delete [], there is no memory leak.
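To make this concrete, here is a small sketch (the helper name token_offsets is hypothetical, not from the thread) that records where each token returned by strtok sits relative to the start of the buffer. Every returned pointer lies inside the buffer that was passed in, so strtok allocates nothing that would need freeing. It also overwrites the delimiter after each token with '\0', which is why cstr prints as just "-Please" in the program above.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: returns the offset of every token from the start of buf.
// strtok never allocates; each pointer it returns points INTO buf.
// Side effect: strtok writes '\0' over the delimiter that ends each token.
std::vector<std::ptrdiff_t> token_offsets(char* buf, const char* delims) {
    std::vector<std::ptrdiff_t> offsets;
    for (char* p = std::strtok(buf, delims); p != nullptr;
         p = std::strtok(nullptr, delims)) {
        offsets.push_back(p - buf);  // pointer arithmetic within the same buffer
    }
    return offsets;
}
```

With the question's string and the delimiters " .,-", this should report six tokens, with "phrase" starting at offset 20.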
But as written, it is likely to crash: at line 20 (count++; if (count==4)t = p;) t is set to point into the cstr buffer, cstr is deleted at line 25, and yet we try to dereference t at line 27.

Move delete[] cstr; to line 30 or later, after the last use of t.
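A minimal sketch of that fix, repackaged as a hypothetical helper nth_token (not from the thread): the last read through t, copying the token into a std::string, happens before delete[] cstr, so t is never dereferenced after its buffer is released.

```cpp
#include <cstring>
#include <string>

// Hypothetical helper: returns the n-th token (1-based) of text, or "" if
// there are fewer than n tokens.
std::string nth_token(const std::string& text, const char* delims, int n) {
    char* cstr = new char[text.size() + 1];
    std::strcpy(cstr, text.c_str());           // copy of text + terminating '\0'

    char* t = nullptr;                          // will point INTO cstr
    int count = 0;
    for (char* p = std::strtok(cstr, delims); p != nullptr;
         p = std::strtok(nullptr, delims)) {
        if (++count == n) t = p;
    }

    // Last use of t: copy the characters out while cstr is still alive.
    std::string result = (t != nullptr) ? std::string(t) : std::string();
    delete[] cstr;                              // t dangles from here on; not used again
    return result;                              // result owns its own copy
}
```

For the question's string, nth_token(str, " .,-", 4) should return "phrase".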
Thanks for the prompt replies,

jsmith: That's right, I would have thought so too, since the memory is freed at line 25...

guestgulkan: That is the problem: it does NOT crash and outputs the 4th word AFTER the memory has been released.

Have you run this code? What results do you get?
Ah, it is to do with the way MSVC works.
At least in debug mode, delete fills the buffer being freed with a special fill byte (0xDD in the MSVC debug heap) that marks the memory as freed.
So 99.99% of the time it crashes, as it has trouble finding a zero byte to terminate the t string.

In release mode, it doesn't do this, so although the memory is marked as released, the data still remains there for an undefined length of time, until it gets overwritten or reused by a later memory allocation request.

So remember:
the results of using memory that has been deallocated are undefined.
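One way to avoid depending on freed memory at all is sketched below, assuming C++ is available: tokenize a private copy held in a std::vector<char> (so there is no delete[] to misplace) and copy each token into its own std::string. The helper name split_copy is hypothetical.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: strtok works on a private, '\0'-terminated copy held
// in a vector; each token is copied into an owning std::string. When the
// function returns, the vector frees its buffer automatically, and nothing
// returned points into it.
std::vector<std::string> split_copy(const std::string& str, const char* delims) {
    std::vector<char> buf(str.begin(), str.end());
    buf.push_back('\0');                       // strtok needs a terminated string

    std::vector<std::string> tokens;
    for (char* p = std::strtok(buf.data(), delims); p != nullptr;
         p = std::strtok(nullptr, delims)) {
        tokens.emplace_back(p);                // owning copy of the token
    }
    return tokens;
}
```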
Funnily enough, I'm using Turbo C++, not MSVC.

And it hasn't crashed once, apart from yielding a strange result ("toke,") with count==6.

But the next point you make is the answer I was looking for and had started to suspect: some data remains there without being physically deleted, but ready to be overwritten.

Thanks for the tip, guestgulkan!
Turbo C is an old 16-bit DOS program generator, meaning the programs it builds run in the 16-bit emulator, which emulates, as much as possible, the 'I can touch anything' (real) mode of the 80x86.

Even so, just because your program doesn't crash when you say
int xs[ 20 ];
xs[ 22 ] = 42;

does not mean that there isn't a problem. All it means is that you are lucky enough that xs[ 22 ] belongs to your program in a writable section of memory.

That can change depending on the compiler you use, its version, the computer you execute it on, the time of day, the OS version and updates, etc.

"Just because it works for me" is not a valid reason to permit it. To reiterate: the results of using memory that has been deallocated are undefined.
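Plain xs[ 22 ] indexing is not checked, which is why the mistake can go unnoticed. As a contrast (a sketch; try_store is a hypothetical name), std::array::at checks the index and throws std::out_of_range instead of touching memory outside the array.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: stores value at xs[i] only if i is in range.
// std::array::at throws std::out_of_range for i >= xs.size(),
// whereas xs[i] with a bad index is simply undefined behaviour.
bool try_store(std::array<int, 20>& xs, std::size_t i, int value) {
    try {
        xs.at(i) = value;
        return true;
    } catch (const std::out_of_range&) {
        return false;                          // e.g. index 22 ends up here
    }
}
```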


The confusion you have about the memory is normal, but the conclusion you are drawing is incorrect. When you delete an item, your program immediately loses the right to use it. The physical RAM boards sitting on your motherboard are still there, though... so if your program still has access rights to that section of memory, it can still be accessed.

This is because the C/C++ library that you ask for memory (with new) must itself have memory to give you. At first, it doesn't, so it asks the OS for a chunk of memory it can use. Then it hands you a piece of that chunk to satisfy your new request.

Getting memory from the OS (the 'true owner' of all memory on the system) is not a quick and easy thing to do, so the memory manager will get large chunks from the OS at a time, and partition it into the little pieces that your program uses as needed.

When you call delete, the memory manager may not be done with that large chunk (there could be other small pieces still in use). However, it could also say, "hey, none of this is in use anymore!" and actually release that chunk of RAM back to the OS. After that time, if you still try to access that memory then your program will crash and your users will start hating you.

There is a difference between memory being allocated or deallocated and memory simply existing physically. You don't have any rights to the memory unless it has been allocated to your exclusive use.
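The chunk-and-pieces idea above can be sketched with a deliberately tiny, hypothetical memory manager (ToyChunkManager is illustrative only; real new/delete implementations are far more elaborate). std::malloc stands in for "asking the OS for a large chunk". Freeing one piece takes away your right to it, but the chunk itself is only handed back once no piece is in use.

```cpp
#include <cstddef>
#include <cstdlib>

// HYPOTHETICAL toy memory manager, for illustration only.
// std::malloc/std::free stand in for getting/returning a chunk from the OS.
class ToyChunkManager {
public:
    explicit ToyChunkManager(std::size_t chunk_size) : chunk_size_(chunk_size) {}
    ~ToyChunkManager() { std::free(chunk_); }
    ToyChunkManager(const ToyChunkManager&) = delete;
    ToyChunkManager& operator=(const ToyChunkManager&) = delete;

    // Carve the next n bytes out of the chunk, fetching a chunk first if
    // needed. No alignment handling: suitable for char data only.
    char* allocate(std::size_t n) {
        if (chunk_ == nullptr) {
            chunk_ = static_cast<char*>(std::malloc(chunk_size_));
            if (chunk_ == nullptr) return nullptr;
            used_ = 0;
        }
        if (n > chunk_size_ - used_) return nullptr;  // toy: one chunk only
        char* piece = chunk_ + used_;
        used_ += n;
        ++live_;
        return piece;
    }

    // The caller loses the right to use `piece`; the chunk is released
    // only when the last live piece is returned.
    void deallocate(char* piece) {
        if (piece == nullptr || live_ == 0) return;
        if (--live_ == 0) {
            std::free(chunk_);
            chunk_ = nullptr;
        }
    }

    bool holds_chunk() const { return chunk_ != nullptr; }

private:
    std::size_t chunk_size_;
    char* chunk_ = nullptr;
    std::size_t used_ = 0;
    std::size_t live_ = 0;
};
```

After allocating two pieces and freeing only one, holds_chunk() is still true; after freeing the second, the chunk has been given back.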


strtok
The problem with strtok() is that it modifies your source string. You began handling it the proper way: get the string from the user, duplicate it, and tokenize the duplicate. Once done, free the duplicate. In all cases, there is never a need to access memory you don't have rights to. For example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void print_tokens( const char* str, const char* separators )
  {
  /* First we create our duplicate */
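  char* s = strdup( str );  /* strdup is POSIX; standard C only since C23 */
  if (s == NULL) return;    /* allocation failed: nothing to tokenize */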

  /* We tokenize over the duplicate */
  char* token = strtok( s, separators );
  while (token)
    {
    puts( token );
    token = strtok( NULL, separators );
    }

  /* Now we're done: destroy the duplicate */
  free( s );
  }

int main()
  {
  print_tokens( "-Please, split this phrase into tokens.", " .,-" );
  return 0;
  }

I'm not much of a fan of strtok(): it modifies its input and keeps hidden state between calls. I think it could have been written in a better way...
In C++, at least, there are many better ways to do it.
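As one such alternative (a sketch, not the only option; split_any is a hypothetical name), std::string::find_first_not_of and find_first_of can split on any of several delimiter characters without modifying the input and without strtok's hidden static state.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits s on any character in delims, skipping empty
// tokens (like strtok). The input is taken by const reference and is never
// modified; no state is kept between calls.
std::vector<std::string> split_any(const std::string& s, const std::string& delims) {
    std::vector<std::string> tokens;
    std::string::size_type start = s.find_first_not_of(delims);
    while (start != std::string::npos) {
        std::string::size_type end = s.find_first_of(delims, start);
        // If end is npos, substr simply takes the rest of the string.
        tokens.push_back(s.substr(start, end - start));
        start = s.find_first_not_of(delims, end);  // npos stays npos
    }
    return tokens;
}
```

For the thread's example string and the delimiters " .,-", this should yield the same six words the strtok loop prints, while leaving the original string intact.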

Hope this helps.