Remove char from file

Hi, I am doing some work with files in C/C++ and I wonder if there is a way to remove a word (an array of chars) from a file. Thanks for any suggestions.
AFAIK you have to rewrite the whole file from the position where you want to delete your word.
Unless you are working with really big files it is usually worth it to just load the entire file into memory, make the changes you want, and overwrite it back to disk.
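For files that do fit in memory, the load / modify / overwrite approach might look roughly like the sketch below. The function name remove_word_in_memory is made up for illustration, and the sketch assumes a text file without embedded NUL bytes (strstr stops at the first one) and removes only the first occurrence.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: remove the first occurrence of 'word' from the file.
   Returns 1 on success, 0 if the word is not found or on any error. */
int remove_word_in_memory(const char *filename, const char *word)
{
    size_t wlen = strlen(word);
    if (wlen == 0) return 0;

    FILE *f = fopen(filename, "rb");
    if (!f) return 0;

    /* Learn the size, then read the whole file into memory */
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return 0; }
    long size = ftell(f);
    if (size < 0) { fclose(f); return 0; }
    rewind(f);

    char *data = malloc((size_t)size + 1);
    if (!data) { fclose(f); return 0; }
    size_t got = fread(data, 1, (size_t)size, f);
    fclose(f);
    if (got != (size_t)size) { free(data); return 0; }
    data[size] = '\0';              /* terminate so strstr can be used */

    char *hit = strstr(data, word);
    if (!hit) { free(data); return 0; }

    /* Slide everything after the word over it */
    size_t tail = (size_t)size - (size_t)(hit - data) - wlen;
    memmove(hit, hit + wlen, tail);

    /* Overwrite the file with the shortened contents */
    size_t newsize = (size_t)size - wlen;
    f = fopen(filename, "wb");
    if (!f) { free(data); return 0; }
    int ok = fwrite(data, 1, newsize, f) == newsize;
    if (fclose(f) != 0) ok = 0;
    free(data);
    return ok;
}
```

Reopening with "wb" truncates the file, so no OS-specific call is needed here. The cost is memory proportional to the file size.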
The problem is that I am working with really big files, which is why I am trying to think of another solution.
Since files aren't the fanciest of data structures, all you can do is rewrite the entire thing past the modification.
Hmm, huge files... If all you are doing is removing data, then you've got it made.

A B C D E F G H I J K & & & L M N O P Q R S T U V W X Y Z

To remove: & & &
Needed: A buffer at least as large as the stuff to remove (preferably as large as possible). For this example, I'll use a 6 character buffer.
Step 1: find the stuff to remove:

A B C D E F G H I J K & & & L M N O P Q R S T U V W X Y Z
                      ^
[. . . . . .]

Step 2. read past it and note its length

A B C D E F G H I J K & & & L M N O P Q R S T U V W X Y Z
                            ^ length = 3
[. . . . . .]

Step 3. read till your buffer is full or EOF. Remember how many you read.

A B C D E F G H I J K & & & L M N O P Q R S T U V W X Y Z
                                        ^
[L M N O P Q]

Step 4. seek backward (removed-length + buffer-length)

A B C D E F G H I J K & & & L M N O P Q R S T U V W X Y Z
                      ^ seek( -9 )
[L M N O P Q]

Step 5. write your buffer

A B C D E F G H I J K L M N O P Q O P Q R S T U V W X Y Z
                                  ^
[L M N O P Q]

Step 6. seek forward the (removed-length)

A B C D E F G H I J K L M N O P Q O P Q R S T U V W X Y Z
                                        ^ seek( 3 )
[L M N O P Q]

Step 7. goto step 3 and repeat until eof

A B C D E F G H I J K L M N O P Q O P Q R S T U V W X Y Z
                                                    ^ read 6
[R S T U V W]

A B C D E F G H I J K L M N O P Q O P Q R S T U V W X Y Z
                                  ^ seek( -9 )
[R S T U V W]

A B C D E F G H I J K L M N O P Q R S T U V W U V W X Y Z
                                              ^ write 6
[R S T U V W]

A B C D E F G H I J K L M N O P Q R S T U V W U V W X Y Z
                                                    ^ seek( 3 )
[R S T U V W]

A B C D E F G H I J K L M N O P Q R S T U V W U V W X Y Z
                                                          ^ EOF after reading 3
[X Y Z . . .]

Step 8. remember that you may not have filled the buffer entirely when you hit EOF. Seek backward (removed-length + bytes-read)

A B C D E F G H I J K L M N O P Q R S T U V W U V W X Y Z
                                              ^ seek( -6 )
[X Y Z . . .]

Step 9. write what remains

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z X Y Z
                                                    ^
[X Y Z . . .]
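
Steps 2 through 9 above could be sketched in standard C roughly as follows. The name close_gap and its parameters are made up for illustration. The sketch assumes the stream was opened in "r+b" mode and that offsets fit in a long; for files over 2GB on platforms with a 32-bit long you would need something like fseeko (POSIX) or _fseeki64 (Windows) instead.

```c
#include <stdio.h>

/* Hypothetical helper: close a gap of 'len' bytes starting at offset 'pos'
   by sliding everything after it toward the front of the file.
   'buf' is the caller's work buffer of 'bufsize' bytes.
   Returns the number of bytes moved, or -1 on error.
   The file still has 'len' leftover bytes at the end afterwards. */
long close_gap(FILE *f, long pos, long len, char *buf, size_t bufsize)
{
    long moved = 0;

    if (pos < 0 || len < 0 || bufsize == 0) return -1;

    /* Step 2: position just past the stuff to remove */
    if (fseek(f, pos + len, SEEK_SET) != 0) return -1;

    for (;;) {
        /* Step 3: read until the buffer is full or EOF */
        size_t n = fread(buf, 1, bufsize, f);
        if (n == 0) {
            if (ferror(f)) return -1;
            break;                      /* nothing left to move */
        }

        /* Steps 4 and 8: seek backward (removed-length + bytes-read) */
        if (fseek(f, -(len + (long)n), SEEK_CUR) != 0) return -1;

        /* Steps 5 and 9: write the buffer */
        if (fwrite(buf, 1, n, f) != n) return -1;

        /* Step 6: seek forward the removed-length */
        if (fseek(f, len, SEEK_CUR) != 0) return -1;

        moved += (long)n;
    }

    if (fflush(f) != 0) return -1;
    return moved;
}
```

Calling it on the example file with pos = 11, len = 3 and a 6 byte buffer goes through the same three read/write rounds as the diagrams. The fseek calls between each read and write are also what the C standard requires when switching direction on an update stream.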


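Step 1 leaves open how to find the stuff to remove without loading the whole file. One possible sketch is a byte-at-a-time scan that keeps a sliding window of the last few bytes. The names find_in_file and MAX_PATTERN are made up for illustration, the pattern length is capped at an arbitrary 256 bytes, and offsets are assumed to fit in a long.

```c
#include <stdio.h>
#include <string.h>

#define MAX_PATTERN 256   /* arbitrary cap chosen for this sketch */

/* Hypothetical helper: return the offset of the first occurrence of the
   'len' bytes at 'pat' in the stream, or -1 if absent or on error. */
long find_in_file(FILE *f, const char *pat, size_t len)
{
    char window[MAX_PATTERN];
    size_t filled = 0;    /* bytes currently held in the window */
    long pos = 0;         /* file offset just past the last byte read */
    int c;

    if (len == 0 || len > MAX_PATTERN) return -1;
    if (fseek(f, 0, SEEK_SET) != 0) return -1;

    while ((c = getc(f)) != EOF) {
        if (filled == len) {
            /* Window is full: drop the oldest byte */
            memmove(window, window + 1, len - 1);
            filled--;
        }
        window[filled++] = (char)c;
        pos++;

        /* Compare once the window holds a full candidate */
        if (filled == len && memcmp(window, pat, len) == 0)
            return pos - (long)len;
    }
    return -1;
}
```

This does a memcmp for every byte read, which is simple rather than fast. stdio buffers the reads, so the per-byte getc is not a system call each time.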
Finally, you may notice that there is still extra data at the end of the file. Standard C has no way to truncate a file, and neither did standard C++ until C++17 added std::filesystem::resize_file, so otherwise you will have to use an OS-specific method. In any case, first close the file. We will be using OS-specific calls now.

Win32
Compile with or without Unicode support. Works for C or C++.
#include <stdint.h>
/* uint64_t is a typedef, not a macro, so test for its limit macro instead */
#ifndef UINT64_MAX
  #error I need uint64_t from <stdint.h> (ISO C99 and ISO C++11)
#endif

#include <windows.h>

/* Shorten the named (closed) file by 'bytes' bytes. */
BOOL shorten_file( LPCTSTR filename, long bytes )
  {
  HANDLE filehand;
  uint64_t filesize;
  BOOL result;
  LONG sizehigh;
  WIN32_FILE_ATTRIBUTE_DATA filedata;

  /* Learn the file's length */
  if (!GetFileAttributesEx( filename, GetFileExInfoStandard, &filedata )) return FALSE;
  /* Widen before shifting: shifting a 32-bit DWORD by 32 is undefined */
  filesize = ((uint64_t)filedata.nFileSizeHigh << 32) + filedata.nFileSizeLow;

  /* Calculate the new length */
  if (bytes < 0 || (uint64_t)bytes > filesize) return FALSE;
  filesize -= (uint64_t)bytes;
  sizehigh = (LONG)(filesize >> 32);

  /* Modify the file's length */
  filehand = CreateFile(
    filename,
    GENERIC_READ|GENERIC_WRITE,
    0,
    NULL,
    OPEN_EXISTING,
    FILE_FLAG_RANDOM_ACCESS,
    NULL
    );
  if (filehand == INVALID_HANDLE_VALUE) return FALSE;

  /* With a high part supplied, INVALID_SET_FILE_POINTER can be a valid
     low DWORD, so GetLastError() has to confirm a failure */
  result = SetFilePointer(
    filehand,
    (LONG)(filesize & 0xFFFFFFFF),
    &sizehigh,
    FILE_BEGIN
    ) != INVALID_SET_FILE_POINTER || GetLastError() == NO_ERROR;

  if (result) result = SetEndOfFile( filehand );

  CloseHandle( filehand );
  return result;
  }


Linux
You need large file support (LFS) to work with >2GB files, both in the kernel and in glibc, which I presume you already know.

Compile with -D_FILE_OFFSET_BITS=64 if you know for sure that will work, or to be more general, make sure to use getconf. See
http://www.suse.de/~aj/linux_lfs.html

Modern Linuxes take UTF-8, so the first argument will work for both ASCII and Unicode. Works with C or C++. Requires XSI-compliance.
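#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Shorten the named file by 'bytes' bytes. Returns 1 on success, 0 on failure. */
int shorten_file( const char *filename, long bytes )
  {
  off_t filesize;
  struct stat filedata;
  if (stat( filename, &filedata )) return 0;
  /* Refuse to remove more than the file holds */
  if (bytes < 0 || bytes > filedata.st_size) return 0;
  filesize = filedata.st_size -bytes;
  return truncate( filename, filesize ) == 0;
  }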
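
As a possible variation, assuming a POSIX system, the stream can also be shortened while it is still open by going through its file descriptor with fileno and ftruncate, which avoids looking the file up by name a second time. The name shorten_stream is made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L   /* for fileno and ftruncate under strict modes */
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical variant: shorten an open stream by 'bytes' bytes.
   Returns 1 on success, 0 on failure. */
int shorten_stream(FILE *f, off_t bytes)
{
    struct stat filedata;
    int fd;

    /* Push out anything stdio still holds so the size is accurate */
    if (fflush(f) != 0) return 0;

    fd = fileno(f);
    if (fd < 0 || fstat(fd, &filedata) != 0) return 0;

    /* Refuse to remove more than the file holds */
    if (bytes < 0 || bytes > filedata.st_size) return 0;

    return ftruncate(fd, filedata.st_size - bytes) == 0;
}
```

The stream has to be open for writing, and its position is not moved by ftruncate, so seek before doing any further I/O on it.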


Whew. Good luck!