Oct 22, 2008 at 8:01am UTC
Hi,
I'm stuck and hope you guys can help me out.
I wrote a program to analyse a file. Basically, it opens FileA, finds the wanted data with a while loop until EOF, and writes it to FileB. Now I have a problem finding duplicates within that single output file, because as far as I know one file can only have one fread pointer. Is there another way to find the duplicates? If anyone is free to help me solve this, I can send the whole output file.
Thanks
This is part of the output file:
01 19 2D
01 1D 2C
01 1C 2D
01 1E 2C
01 1D 2D
01 1F 2C
01 1E 2D
01 1B 2D
01 1A 2E
01 1A 2D
01 19 2E
01 18 2D
01 17 2E
01 17 2D
01 16 2E
01 15 2D
01 14 2E
01 13 2D
01 12 2E
01 11 2D
01 10 2E
01 0F 2D
01 0E 2E
01 0D 2D
01 0C 2E
01 0B 2D
01 0A 2E
01 09 2D
01 08 2E
01 08 2D
01 07 2E
01 09 2E
01 08 2F
01 0B 2E
01 0A 2F
01 0D 2E
01 0C 2F
01 0F 2E
01 0E 2F
01 11 2E
01 10 2F
01 13 2E
01 12 2F
01 15 2E
01 14 2F
01 18 2E
01 17 2F
01 1B 2E
01 1A 2F
01 1C 2E
01 1B 2F
01 1D 2E
01 1C 2F
01 1E 2E
02 1D 2F
01 19 2F
01 18 30
49 18 2F
04 17 30
01 16 2F
02 15 30
01 15 2F
01 14 30
01 13 2F
01 12 30
01 11 2F
01 10 30
01 0F 2F
01 0E 30
01 0D 2F
36 0C 30
01 0B 2F
01 0A 30
01 09 2F
01 08 30
01 07 2F
01 09 30
01 0B 30
01 0A 31
01 0D 30
01 0C 31
01 0F 30
01 0E 31
01 11 30
01 10 31
01 13 30
01 12 31
01 16 30
01 15 31
01 19 30
01 18 31
01 1A 30
01 19 31
01 1B 30
01 1A 31
01 1C 30
01 1B 31
01 17 31
01 16 32
01 16 31
01 15 32
01 14 31
01 13 32
01 13 31
01 12 32
01 11 31
01 10 32
01 0F 31
01 0E 32
01 0D 31
01 0C 32
01 0B 31
01 0A 32
01 09 31
01 0B 32
01 0D 32
01 0C 33
01 0F 32
01 0E 33
01 11 32
01 10 33
01 14 32
01 13 33
01 17 32
01 16 33
03 18 32
49 17 33
01 19 32
01 18 33
02 1A 32
01 19 33
01 15 33
01 14 34
81 14 33
01 13 34
01 12 33
01 11 34
01 11 33
01 10 34
01 0F 33
01 0E 34
01 0D 33
01 0B 33
01 0D 34
01 0F 34
01 12 34
01 11 35
01 15 34
01 14 35
03 16 34
01 17 34
01 13 35
01 12 35
01 10 35
01 10 00
01 14 00
1C 13 02
8C 0D 02
01 0D 01
01 18 02
01 19 02
49 1A 03
01 19 03
01 0A 03
19 0B 05
01 10 04
1C 1C 05
1C 1B 06
01 13 05
01 11 05
01 07 06
8C 08 0B
01 0A 0A
01 0B 0B
8C 04 0C
1A 1F 0D
01 20 0D
01 20 0F
01 21 0F
01 1D 10
01 1C 0F
01 09 10
1C 15 11
01 19 10
01 21 10
01 04 11
01 0C 12
88 02 14
01 01 14
8C 02 17
01 23 16
01 22 18
01 02 18
8C 03 19
8C 01 1B
01 04 1A
01 1B 1B
01 12 1B
49 0F 1B
65 0E 1C
8A 03 1C
01 07 1C
1C 14 1D
01 17 1C
01 1E 1E
8C 17 1E
01 15 1D
01 15 1F
01 19 1E
01 1B 1E
01 1B 1F
01 20 22
01 15 21
01 1D 22
01 21 23
01 1D 23
88 04 24
8A 04 23
01 10 25
11 05 26
01 03 28
01 13 28
81 09 29
19 07 29
01 04 2A
8C 1C 2B
01 0C 2C
19 05 2B
02 1D 2F
01 18 2F
01 15 30
01 17 33
02 1A 32
01 14 33
Oct 22, 2008 at 10:02am UTC
I don't get what you mean by one fread pointer.
I would use fgetc for this.
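#include <cstdio>
#include <cstring>

#define LINE_LEN 8

// Read the next LINE_LEN characters into c, skipping line breaks.
// Returns false at end of file (including a short final record).
bool read8(FILE* f, char* c)
{
    for (int i = 0; i < LINE_LEN; i++)
    {
        int ch;  // int, not char, so EOF can be told apart from real data
        do
        {
            ch = fgetc(f);
            if (ch == EOF)
                return false;
        } while (ch == '\n' || ch == '\r');
        c[i] = (char)ch;
    }
    c[LINE_LEN] = '\0';
    return true;
}

// Scan f from the beginning and report whether record c is already in it.
bool duplicate(FILE* f, const char* c)
{
    char dup[LINE_LEN + 1];
    rewind(f);  // always start at the top, even if the last scan stopped early
    while (read8(f, dup))
    {
        if (strcmp(c, dup) == 0)
            return true;
    }
    return false;
}

int main()
{
    char buff[LINE_LEN + 1];
    FILE* pFileIn = fopen("in.txt", "r");
    FILE* pFileOut = fopen("out.txt", "w+");
    if (pFileIn == NULL || pFileOut == NULL)
    {
        perror("Error opening file");
        if (pFileIn) fclose(pFileIn);
        if (pFileOut) fclose(pFileOut);
        return 1;
    }
    while (read8(pFileIn, buff))
    {
        if (!duplicate(pFileOut, buff))
        {
            // A seek is required when switching from reading to writing.
            fseek(pFileOut, 0, SEEK_END);
            fprintf(pFileOut, "%s\n", buff);
        }
    }
    fclose(pFileIn);
    fclose(pFileOut);
    return 0;
}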
I think this way is the most memory efficient, but it rescans out.txt for every record, so you should be able to find a faster way.
I would suggest doing this in C++; your code would probably be way shorter.
Last edited on Oct 22, 2008 at 10:03am UTC
Oct 23, 2008 at 7:43am UTC
Hi,
What I meant was that I can't read the start and the end of a text file at the same time, right?
Anyway, do you know how to read the last line of a text file? I used SEEK_END, but I just couldn't read the last line.
Oct 23, 2008 at 6:51pm UTC
Your best bet is to read the file into memory, then work on it in memory.
Create a vector<string> vFileList; and load each line into it with vFileList.push_back(sCurrentLine);
Once you've loaded the whole file, you can work on it from the vector with no problems.
Edit: If you just want to find duplicate lines as fast as possible, use a map. As you load each line, increment its value in the map by 1. Any line whose count ends up above 1 is a duplicate.
map<string, int> mLinesCounter;
:)
Last edited on Oct 23, 2008 at 6:52pm UTC
Oct 28, 2008 at 7:49pm UTC
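I think SEEK_END with an offset of 0 will take you past the last line, to the very end of the file.
If your file is not Unicode, I'm guessing it would be fseek(pFile, -8, SEEK_END), since each record is 8 characters long. Allow one or two extra bytes if the file ends with a line break.
Oct 29, 2008 at 1:46am UTC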
+1 to Zaita's solution, nice way of doing it.