I am interested in creating a fixed-size file (~25 MB) filled with random data. I am currently considering copying data from /dev/urandom into my file.
I am familiar with the following method of copying a file (first seen in a @JLBorges post). I love the efficiency and the fact that no intermediate buffer needs to be created.
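For reference, this is the kind of rdbuf()-based copy I mean (a minimal sketch of my own; the file names are placeholders):

```cpp
#include <fstream>

int main()
{
    std::ifstream infile("source.dat", std::ios::binary);
    std::ofstream outfile("copy.dat", std::ios::binary);

    // Stream the input file's filebuf straight into the output stream --
    // no explicit user-level buffer in my code.
    outfile << infile.rdbuf();
}
```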
Thank you, @coder777, for the input. Unfortunately, unless I'm missing something, this will not work for me. Feel free to correct me if I'm wrong.
I guess I should spell out my needs a little bit better. I want to write random data to a fixed size file as quickly as possible. The current plan is to allocate a buffer the size of a block on the file system, read from /dev/urandom into it, and write into the file. I will loop until I have written the entire required size (calculating the remainder during the last iteration).
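Roughly, that plan looks like this (a sketch only; the 4 KB block size, the file name, and the lack of error handling are all placeholders):

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <vector>

int main()
{
    const std::size_t block_size = 4096;                 // assumed filesystem block size
    const std::size_t total_size = 25 * 1024 * 1024;     // ~25 MB target file

    std::ifstream urandom("/dev/urandom", std::ios::binary);
    std::ofstream outfile("random.dat", std::ios::binary);
    std::vector<char> buffer(block_size);

    std::size_t written = 0;
    while (written < total_size)
    {
        const std::size_t chunk = std::min(block_size, total_size - written);
        urandom.read(buffer.data(), chunk);              // fill the buffer from urandom
        outfile.write(buffer.data(), chunk);             // copy it into the target file
        written += chunk;
    }
}
```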
I would like to avoid the intermediate buffer and copy directly from the input file's filebuf into the output file, which is what I believe is happening in the sample code I posted in my first post.¹ However, it appears it can only copy the entire contents of the input stream into the output stream. For /dev/urandom, that would never end.
Using a for loop to count values copied into the output file would be more overhead than creating and deleting the buffer. Writing 1 byte (or int or long long) at a time is not the answer. (Correct me if I'm wrong).
I would like to set the output filebuf's epptr, but that seems to be (understandably) inaccessible to me.
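(To illustrate what I mean: epptr() and setp() are protected members of std::basic_streambuf, so as far as I can tell they can only be reached from a derived class, e.g. something like this hypothetical wrapper, and not through the filebuf that an ofstream hands back.)

```cpp
#include <fstream>

// epptr()/setp() are protected in std::basic_streambuf, so the only way to
// even look at them is from a derived class like this hypothetical one.
struct peeking_filebuf : std::filebuf
{
    char* put_end() { return epptr(); }   // legal here, inaccessible from outside
};
```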
Other methods of generating large quantities of random data are welcome, too.
¹ There may be a hidden intermediate buffer in this scenario that I don't know about. If that is the case, this is all moot.
Sorry for the obfuscation. I have been overly sensitive to the possibility of sharing protected information and I didn't want to even come close to violating contractual obligations. I guess I can be a little bit more transparent.
We need to (semi-)securely erase files. We will be using a single-pass, random-data overwrite of an existing file before deleting it.*
I wanted to open /dev/urandom, open the file-to-be-deleted for read/write, set the file's write pointer to the beginning of the file, and stream the correct number of bytes from urandom into the file. My reason for asking the original question was to try to avoid writing to and reading from an intermediate buffer in RAM.
Currently we fill a buffer with random data (from urandom) and then write it to the file; we re-fill the buffer and write it out until the file has been overwritten. I am weighing the idea of filling the buffer one time and using it over and over to fill the file (sketched below). If that is acceptable to our customer, it will probably make this discussion moot. However, if we need to generate "random" data for the whole file, I would love to avoid the buffer and stream directly from urandom to the file.
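For the record, the fill-once-and-reuse variant I'm weighing looks roughly like this (my sketch only; the 1 MB buffer size is a placeholder and error handling is omitted):

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Overwrite an existing file, in place, with a single buffer of urandom data
// written repeatedly until the original size has been covered.
void overwrite_with_random(const std::string& path, std::size_t file_size)
{
    const std::size_t buf_size = 1024 * 1024;            // placeholder: 1 MB
    std::vector<char> buffer(std::min(buf_size, file_size));

    std::ifstream urandom("/dev/urandom", std::ios::binary);
    urandom.read(buffer.data(), buffer.size());          // fill the buffer once

    // Open for read/write so the existing contents are overwritten in place.
    std::fstream file(path, std::ios::in | std::ios::out | std::ios::binary);
    file.seekp(0);

    std::size_t written = 0;
    while (written < file_size)
    {
        const std::size_t chunk = std::min(buffer.size(), file_size - written);
        file.write(buffer.data(), chunk);
        written += chunk;
    }
    file.flush();
}
```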
I also realize that reading from urandom may be more time-consuming than writing to / reading from the buffer and I may be trying to optimize the wrong thing. But if I can't figure out how to do what I want to do, I can't do a comparison test.
*This method of "secure" erase was chosen by our customer and is not really up for discussion. I am only trying to determine the most efficient way to implement it.
Unfortunately the requirement (the method of erasing) is not up for discussion. The question is "Is there a more efficient way of implementing the requirement?"
There can be lots of lively discussion about saving all 0's, all 1's, copying a specific file onto the existing file, etc. I would gladly participate in those discussions were it not for the instructions from our customer. We were instructed to overwrite the file with random data. We're doing it. I'm trying to improve performance.
It's probably okay to create a large buffer and write the same random data over and over.
If you want to maximize performance, you should consider using the operating system's I/O interface (probably read() and write()). Write large amounts of data in each call (probably between 64 KB and 1 MB).
Your goal is to make this process I/O bound, which shouldn't be hard.
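Something along these lines, for example (a sketch under those assumptions; short reads and error handling are ignored, and the 1 MB chunk size is just an illustration):

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <vector>

// Overwrite an already-open file descriptor with random data using the
// OS-level calls, one large chunk per read()/write() pair.
void overwrite_fd(int fd, size_t file_size)
{
    const size_t chunk = 1024 * 1024;        // e.g. 1 MB per call
    std::vector<char> buf(chunk);

    int rnd = open("/dev/urandom", O_RDONLY);
    lseek(fd, 0, SEEK_SET);                  // start overwriting at the beginning

    size_t remaining = file_size;
    while (remaining > 0)
    {
        size_t n = remaining < chunk ? remaining : chunk;
        read(rnd, buf.data(), n);            // fill the buffer from urandom
        write(fd, buf.data(), n);            // write it over the file's contents
        remaining -= n;
    }
    close(rnd);
}
```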
Alternate method: flip a perfectly random coin. Heads is 0, tails is 1. Whichever side it lands on, fill the file with only that bit-value. Guaranteed to be perfectly random.
Other than the huge memory leak in line 5 and the fact that I will use write() in line 10, that is essentially what I will be doing. I was trying to avoid the intermediary buffer so I could copy directly from infile to outfile. (And I don't think our customer will accept xkcd as valid support during requirements testing, although I appreciate the reasoning.)
@dhayden,
Thanks for your comments. Hearing it from a disinterested third party confirms to me that we should reuse a single random buffer over and over for a single file. We will measure the performance improvement using this scenario. Thanks.
@JLBorges,
I'm sorry that I missed your post when it came in; it must have arrived while I was responding to @coder777. It is a great solution to the problem as I first posted it. However, as you can see from my later posts, I was a little too cryptic about my requirements. Since I want to overwrite an existing file in place, dd is probably not the correct answer for me.
It was just example code; I figured you knew to delete[] a 25 MB buffer. I was attempting to keep the code format from your OP.
I don't see a way of solving this problem without a buffer in between somewhere. rdbuf() still has a buffer; it's just abstracted away so you don't see it (fstream maintains its own internal buffer, according to the docs on this site).