class Bar
{
private:
float x,y,z;
};
template<class T=DateTime, class D=Bar>
class ChunkOfData : public Streamable
{
void readBinary( istream& is ) {
// some other is.read() calls
D *datum = new D;
is.read( reinterpret_cast<char*>(datum), sizeof( D ) );
delete datum; // <------------ segmentation fault here
}
};
NOTES:
1. if I do not call is.read( ... ), new and delete are fine
2. if I look at the contents of datum after the is.read( ... ), it's ok
3. if I don't use new, it works fine:
class Bar
{
private:
float x,y,z;
};
template<class T=DateTime, class D=Bar>
class ChunkOfData : public Streamable
{
void readBinary( istream& is ) {
// some other is.read() calls
D x;
is.read( reinterpret_cast<char*>(&x), sizeof( D ) );
} // does not die upon deallocation of automatic variable D x;
};
4. I don't know if this is relevant, but readBinary() is being called inside an operator<<() method.
5. The strange thing is, when I look at the results of is.read(), it all looks good. I only die if I do a new and a delete. If I create a memory leak by doing a new with no delete, it runs fine.
6. Also, I don't know if this makes a difference, but I am running this code inside a googletest.
I am running on an amd64 box:
gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5)
If anyone has any ideas on what I can try, please let me know. Thx.
I may have to go to C's fread() and fwrite() if I can't get this to work.
Either a) the call to read() is corrupting your memory, or b) you have some other memory bug elsewhere in the program that's corrupting your memory and it just happens to be kicking in here.
This is news to me. Does that mean that when we open a binary file for reading, we can only use an array of bytes to contain the data, e.g. with an fread() call? Then we iterate through the array element by element to process or interpret the data based on our business logic?
Dunno 'bout the byte array thing. But if your type isn't a POD, you cannot portably and reliably use raw memory operations on it. That's what stream operations are for.
Each non-POD type must know how to stream itself. It does so by calling on its constituent parts until there's a POD type that can interact with the stream directly.
kbw, I think you have made the most reasonable explanation so far, but I'm still trying to think through if this is the case here.
Bar, in my example of D above, is a POD, but ChunkOfData is not.
1. I have edited the OP: the class that I am trying to stream is called ChunkOfData, and because it inherits from Streamable, it is not POD (it's like SS in that C++0xFAQ). Not only that, ChunkOfData actually contains virtual methods (not shown), so it's also not POD in the sense of SSS (in that C++0xFAQ). However, Bar, as I have defined above, is a POD (no inheritance, no virtual methods) - it is, effectively, a struct.
2. In my requirements, I need to write out an array [] of Bar (or some other POD) contiguously onto disk, in the sense of C's fread() and fwrite() functions (and, I think, C++'s istream::read() and istream::write()...). In other words, I want to practically write out and read in the entire thing in one big chunk and not worry about iteration, as sohguanh mentioned.
3. I want to manipulate Bar like a C struct (like a POD). I want to write out an array [] of Bar like I write out an array of plain old records or an array of primitive types like int or double. So the part I am still uncertain about is, ChunkOfData is non-POD but Bar is POD. It would seem kbw's explanation is good, but I'm not sure yet - I will have to write some new testing code to see for sure.
If this turns out to be the problem, this would be one freakish explanation for a segfault that I have not seen before.
I wonder if C++0x resolves this POD issue in templates. I mean, is there a way I could specify or restrict D in the template to be of a POD type? Or is it an istream problem? Should istream somehow know if it is receiving a POD-type in its argument? The char* casting actually causes some issues.
I will write some test programs and report back a little later on what I find - ie whether the POD/non-POD issue is causing my segfault. Thank you.
kbw, you were right!!! Helios may be right, also - I have just verified that there is a POD issue, but that does not preclude the possibility of heap corruption!
Short Answer: I found that even though Bar does not inherit from anything, it does have a virtual destructor (I think I left it in a few weeks back while modifying code) which makes it non-POD, after all! So trying to read a contiguous array of Bar is no good. Though, oddly, writing a contiguous array of Bar (when I wrote out the binary file) seemed to work fine.
Long Answer: I debugged the code by writing a trivial class NewBar which just has data members (effectively a struct/record). I removed templates and all the subclassing garbage and just wrote the binary file reading code inside a simple function. Everything worked fine, even with the new's and deletes. When I replaced NewBar with Bar, I got a segfault, which suggested that I needed to take another look at Bar - that's when I found the virtual destructor.
This problem was rather scary for two reasons:
1. This segfault, technically, was not caused by a bad use of new/delete, but rather the improper interaction between POD and non-POD requirements - the compiler currently cannot catch this and in general, it doesn't seem to be well documented. Thanks to kbw's reference to that FAQ, I will definitely watch out for PODs whenever I play around with memory.
2. Under my current situation, if someone decides to make Bar virtual for subclassing or other reasons, my code will die horribly during run-time. It would be nice if it died during compile time (maybe C++0x addresses this?).
This is yet another example showing that I definitely need a unit test here to catch any modification to the code before something bad happens.
Thanks everyone (especially kbw for that fantastic insight!) - I really learned something new about C++ with this example!
Under my current situation, if someone decides to make Bar virtual for subclassing or other reasons, my code will die horribly during run-time. It would be nice if it died during compile time (maybe C++0x addresses this?).
Nope. If you make explicit pointer casts, you accept the possibility of shooting yourself in the foot.
This is a really bad way of doing serialization, anyway.
Helios, say I want to serialize Bar[] where I know how many members are in the contiguous array.
How do you suggest that I do it? Could you write some quick sample C++ code that's as efficient (or close to as efficient) as C's fread()/fwrite()?
I am guessing that in the proper C++ way, I need to stream them out one at a time (I have 1,000,000+ of these objects) to avoid issues if Bar ever changes. Is this right?
This example seems to be a case of C++'s multi-paradigm-ness (C++ OOP mixed with plain-old-memory-based-C) biting me in the derriere.
The proper way to do any serialization is to convert the internal representation to a consistent, platform-agnostic binary representation and write that to the file. Directly dumping memory contents to a file is just asking for trouble. This isn't just in C++. I would be saying the same if you were writing it in C.
oddly, writing a contiguous array ... seemed to work fine
The write didn't crash because you only read the vtable pointer. But the pointer will have turned up in the file. Have you tried to verify the file content? Of course, the read overwrote the vtable pointer so the call to the destructor was vectored to some indeterminate place.
the compiler currently cannot catch this ...
As helios said, the type system was bypassed with your use of reinterpret_cast. There's a huge responsibility associated with each cast. Casting is commonplace in C, but each and every cast in C++ should give you pause for thought.
if someone decides to make Bar virtual for subclassing or other reasons, my code will die horribly
I could not find a way to detect if a class is not a POD type at compile time in the 15 mins I've spent looking thru Loki, so that rules out something you can plug in, or a policy based solution. I agree, the solution does feel unsatisfactory.
Helios, I understand what you mean conceptually, but what do you mean specifically?
I should watch out for things like endian-ness or byte-alignment?
I am aware of some of the issues that you are raising. For example, I have 64-bit machines and 32-bit machines so, actually, by dumping directly to a shared disk from a 64-bit OS, I expect to have problems with 32-bit machines. I expect other issues if I am doing so from OSes with different endian-ness.
I guess I have to know the limitations of what I am doing and make sure I stay within those limitations. For example, I could change the code to work on both 32/64 bit machines or have binary files that are readable from Java or C++, but due to real-life limitations, I probably have to restrict my scope somewhat or I cannot finish my project.
I probably should try streaming the objects one-by-one just to see how much of a performance hit I take, since this approach is certainly more generic/portable/platform-agnostic. BTW, one of the main reasons I chose C++ over Java was to take advantage of the speed benefits of being able to play directly with memory and file I/O. AFAIK, streaming objects one by one is the only choice in Java (can't dump memory to disk), which makes it far safer, but less efficient. For caching huge chunks of memory (hundreds of megs to gigs), I thought direct memory to file I/O would work.
Maybe I have optimized too early; only tests will tell.
I totally respect the depth of experience of posters like you or kbw, so if you do have some specific suggestions, I am very open to considering them in the context of my project.
The write didn't crash because you only read the vtable pointer. But the pointer will have turned up in the file. Have you tried to verify the file content? Of course, the read overwrote the vtable pointer so the call to the destructor was vectored to some indeterminate place.
Wow! This is an amazing description of what happened. TYVM, kbw. After turning Bar into a POD (no vtable), the file verification code worked fine. I have not tried the crazy case of turning Bar into a non-POD, but your explanation makes perfect sense - the destructor being vectored into some indeterminate place would certainly explain the OP behavior of crashing with a segfault upon delete!
Thinking aloud, I am wondering why does the compiler need to have a vtable pointer for every object in my Bar array? If I do a:
Bar* barArray = new Bar[100];
to allocate my objects, shouldn't the compiler know that every single one of my objects is strictly a Bar object and not a child of Bar? If that's the case, shouldn't the compiler just keep track of one vtable for barArray rather than a vtable for every element in Bar[]?
Maybe I should take that back. If I do this:
Bar** barPtrArray = new Bar* [100];
then the compiler would need to keep the vtable for each and every element in Bar* since some subclass of Bar could live in each slot. Not so easy to distinguish then.
I think I need to read up on how vtables are kept by C++ compilers...
why does the compiler need to have a vtable pointer for every object
It clearly doesn't. It may be a locality-of-reference optimisation or something... in short, I don't know. The ARM had a lot to say on this sort of thing.
I don't think it's a good idea to try to work around implementation characteristics. C++ was designed to be as efficient as C as specified.
I should watch out for things like endian-ness or byte-alignment?
Endianness, yes. Alignment is only an issue for performance, not portability or safety. Also, as you've said, the size of the data matters.
For caching huge chunks of memory (hundreds of megs to gigs), I thought direct memory to file I/O would work.
There's generally an inverse relation between the complexity of an object and how many instances of it are allocated at any given time. In other words, if you need to commit hundreds of MiB, you probably have thousands of simple objects, possibly even PODs. It would be very rare to need to allocate thousands of binary trees that are also tightly related.
(Valid) Optimization relies heavily on making assumptions about your behavior. For example, you can choose to dump memory directly if you can assume that the program that will read the file has a memory model similar to the one that wrote it. If you can't, and you still perform the optimization, you'll just run into incorrect behavior.
I don't think I'm following your reasoning on the vtable part.
shouldn't the compiler know that every single one of my objects is strictly a Bar object and not a child of Bar?
What's the difference?
If that's the case, shouldn't the compiler just keep track of one vtable for barArray rather than a vtable for every element in Bar[]?
What if you wanted to do this? Foo *p=dynamic_cast<Foo *>(barArray+5);
How would the compiler know that barArray points to an array? Even if it did know that, how would it know to which element in the array barArray happens to be pointing at that point in the program, and thus where the vtable is in relation to barArray?
Helios, I meant that the compiler doesn't know whether barArray points to a homogeneous array or not. If the elements are, in fact, homogeneous (all truly Bar objects), a compiler could, in theory, optimize how it keeps track of vptrs for that array. As a counter-example, if the array in fact holds some BarChild1 objects and maybe some BarChild2 objects, then vptrs would have to be kept for every object of a heterogeneous array. Apparently there is some leeway in how a C++ compiler keeps track of vptrs, but I threw in that counter-example to suggest it may not be as easy as I first thought. As your example indicates, in the general case you really do need to keep a vptr for each object in the array.
All this got me thinking. For a polymorphic class, sizeof( someInstance ) does include the vptr - which is exactly why my is.read() was able to clobber it. But that also means a raw dump carries the vptr along, so the program on the other side (reading) would have to adhere to the exact conventions of the writer - exactly as you suggested in your last comment. The reader couldn't be something like a Java program, for instance, so dumping anything with a vptr in it prevents something like C++ streaming to a pipe with a Java program on the other end. Maybe when streaming gets complex (anything beyond PODs) and one still wants to maximize portability, that's when one should start to consider platforms like CORBA.
There's generally an inverse relation between the complexity of an object and how many instances of it are allocated at any given time. In other words, if you need to commit hundreds of MiB, you probably have thousands of simple objects, possibly even PODs. It would be very rare to need to allocate thousands of binary trees that are also tightly related.
(Valid) Optimization relies heavily on making assumptions about your behavior. For example, you can choose to dump memory directly if you can assume that the program that will read the file has a memory model similar to the one that wrote it. If you can't, and you still perform the optimization, you'll just run into incorrect behavior.
That's an excellent characterization of my problem. In fact, I have millions of very simple, small instances (PODs in fact), and the only data structure I need for them is a contiguous array. No binary trees or other types.
Thank you for taking the time to explain your thoughts - I think your last comment contains some excellent suggestions on how to think about the balance between optimization and writing reliable code!