The "best" way to return an aggregate value?

Hello, I'm new to the forum.

I'd like to get this straight:

When you return, say, a class object from a function by value, in principle (depending on the compiler) a copy of the object is made, which means another heap allocation if the object owns dynamic memory, and this is costly. Like this:

std::string f(void)
{
    std::string str("something"); // First allocation
    ...
    return str; // Second allocation. Copy is made on the heap, costly
}

So it's probably wiser to return a pointer (or reference):

std::string* f(void)
{
    std::string* str = new std::string("something"); // Allocation
    ...
    return str; // Copy of the pointer is made only, on the stack (fast!)
}

However, then you lose the "security" of memory handling. I mean you have to take care of freeing it in the calling function (which may be complicated and error-prone).

Then wouldn't it be the best "compromise" to return an aggregate value by returning an std::auto_ptr (or e.g. boost::shared_array if you have an array) referring to it? Those auto-pointers are aggregate values, but they are quite small and afaik don't reserve dynamic memory (thus they'd be created on the stack), so wouldn't making copies of them be fairly fast? And also you would not have to worry about freeing memory in the calling function.

So the "best" way to return an aggregate value would be:

auto_ptr<std::string> f(void)
{
    auto_ptr<std::string> str(new std::string("something")); // Allocation
    ...
    return str; // Copy of the auto_ptr is made, on the stack???
                // (No dynamic memory needed?)
}

If the last function was compiled with GNU gcc and -Waggregate-return, there would still be warnings, so you would not get rid of those, but in every other way, wouldn't this be better?
http://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#Warning-Options :
-Waggregate-return
Warn if any functions that return structures or unions are defined or called. (In languages where you can return an array, this also elicits a warning.)


Copying a string ( or any other sequence container ) has only linear complexity and, depending on the implementation, the internal buffer may not be copied at all.
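
A rough way to check the claim about copies, assuming a C++11 or later compiler: the sketch below counts copy and move constructions for an object returned by value. Tracked, make_tracked and copies_caused_by_return are hypothetical names made up for this illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical type that counts how often it is copied or moved.
struct Tracked {
    static int copies;
    static int moves;
    std::string payload;

    explicit Tracked(const char* text) : payload(text) {}
    Tracked(const Tracked& other) : payload(other.payload) { ++copies; }
    Tracked(Tracked&& other) noexcept : payload(std::move(other.payload)) { ++moves; }
};

int Tracked::copies = 0;
int Tracked::moves = 0;

// Returns a named local by value, like f() in the original question.
Tracked make_tracked() {
    Tracked result("something");
    return result;  // elided (NRVO) or moved; never deep-copied in C++11 and later
}

// Builds one object and reports how many deep copies the return caused.
int copies_caused_by_return() {
    Tracked::copies = 0;
    Tracked value = make_tracked();
    assert(value.payload == "something");
    return Tracked::copies;
}
```

On such a compiler the return is either elided or turned into a move, so the copy counter stays at zero. How many moves happen still depends on the compiler and its flags.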

If you don't want any copy to ever be done, pass an output parameter by reference
I knew about the "pass an output parameter by reference" option, but it is sometimes quite cumbersome to code. Say you have a function that you need often, for example one that translates a parameter to a different format before you call another function. Then you have to declare an extra variable and call the translation function on a separate line. That is harder to read than simply inserting the function call as an argument to the other function. So imagine this:

void translate(string& temp)
{
    // do something to the string
    ...
}

int main()
{
    ...
    string var1("something1");
    string var2("something2");
    translate(var1);
    translate(var2);
    ...
    function_often_called_and_with_lots_of_parameters(var1, var2, ...);
    ...
}

...compared to this:

int main()
{
    ...
    function_often_called_and_with_lots_of_parameters(translate("something1"), translate("something2"), ...);
    ...
}
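
For the nested-call style to compile, translate has to return its result by value instead of modifying a reference. A minimal sketch of that variant follows, assuming a C++11 or later compiler. The upper-casing body and join_with_dash (a stand-in for function_often_called_and_with_lots_of_parameters) are hypothetical and only there to make the example runnable.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical translation: upper-cases its input and returns the result by value.
std::string translate(std::string text) {
    std::transform(text.begin(), text.end(), text.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return text;  // a by-value parameter is moved out here, not deep-copied
}

// Hypothetical stand-in for the function that takes the translated arguments.
std::string join_with_dash(const std::string& first, const std::string& second) {
    return first + "-" + second;
}

// The nested-call style: no named temporaries in the caller.
std::string nested_call_example() {
    std::string combined = join_with_dash(translate("something1"), translate("something2"));
    assert(combined == "SOMETHING1-SOMETHING2");  // both arguments were translated
    return combined;
}
```

The temporaries returned by translate live until the end of the full expression, so binding them to const references in join_with_dash is safe.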

Also, I'm discussing the general case here, when you don't know on which platform or with which compiler your code will be built, so you don't know whether your compiler will make copies or not. And I'm not limiting my point to the "string" example. I'm talking about *any* aggregate types, say a vector of vectors, etc. In my opinion, though, if you find a good coding practice, you should stick to it because it helps you write less error-prone code.
Actually, (IMO) method 1 looks more readable than the second.
You shouldn't think about a solution which would be the most efficient in all cases
eg:
struct myAggregateType1 { int x, y; };
typedef vector<vector<int> > myAggregateType2;
myAggregateType1 is very cheap to copy and myAggregateType2 is not.

If you have a function which builds some object and has to return it, the most intuitive way is to return it by value.
If that object is too large, the simplest option is to modify a reference passed to the function ( so you don't have to manage memory allocation/deallocation )

Let's examine your idea of returning an auto_ptr:
auto_ptr<huge_type> build_huge_object()
{
    return auto_ptr<huge_type> ( new huge_type );
}
The first thing is that auto_ptr will soon be deprecated, so it is not a good choice.
A function called build_huge_object ( which has the job of creating an object of huge_type ) returning an auto_ptr is counter-intuitive for someone using it.
Returning a plain pointer to huge_type would be more intuitive, but the caller would have to keep track of that object and delete it when it is no longer needed.
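
For readers on a C++11 or later compiler, std::unique_ptr is the standard replacement for auto_ptr. Below is a minimal sketch of the same factory written with it. The huge_type definition and use_huge_object are hypothetical stand-ins, since the thread never defines them.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for huge_type: owns a large buffer.
struct huge_type {
    std::vector<int> data;
    huge_type() : data(1000000, 0) {}
};

// Transfers ownership to the caller; only the pointer is moved, not the object.
std::unique_ptr<huge_type> build_huge_object() {
    return std::unique_ptr<huge_type>(new huge_type);
}

// The caller writes no delete; the object is freed when owner goes out of scope.
std::size_t use_huge_object() {
    std::unique_ptr<huge_type> owner = build_huge_object();
    assert(owner != nullptr);
    return owner->data.size();
}
```

Unlike auto_ptr, a unique_ptr cannot be copied by accident. Ownership changes hands only through a move, which a by-value return does implicitly.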


PS: Please use format tags http://www.cplusplus.com/articles/firedraco1/
Actually, (IMO) method 1 looks more readable than the second.


Well, I suppose all programmers have their preferred way of coding and what they consider "beautiful". If I were reading code written in the style of my first example, my eyes would keep going back and forth between the final function call, the variable definitions and the translation calls, and in time I'd probably go crazy :)

You shouldn't think about a solution which would be the most efficient in all cases


Yes, of course. In general, I would not use a vector to save just two integers. I'd use a vector when I don't know the number of values I'm going to save.

The first thing is that auto_ptr will soon be deprecated


You're right. It's better to use some of the other alternatives.

A function called build_huge_object ( which has the job of creating an object of huge_type ) returning an auto_ptr is counter-intuitive for someone using it.


How is that counter-intuitive? You can think of it just as a pointer, except you don't have to free the memory it's pointing to. I suppose this is another point where many programmers would disagree about the "beauty" factor.
How is that counter-intuitive? You can think of it just as a pointer, except you don't have to free the memory it's pointing to. I suppose this is another point where many programmers would disagree about the "beauty" factor.
Say you have to store that in a variable:
huge_type my_nice_object = build_huge_object(); // :^( sad programmer, doesn't work
huge_type* my_nice_pointer = build_huge_object(); // :^( sad programmer, doesn't work
auto_ptr<huge_type> my_nice_smart_pointer = build_huge_object(); // :^) happy programmer, it works! 
Notice that the happy programmer had to look at the function prototype to know what type it returned.
Well don't you usually have to look at the function prototype?
Not really, if you know what a function does. And if you use a function many times you won't look at its prototype every time.
It is harder to remember that a function returns a smart pointer than that it returns an object or a built-in pointer.
I think the keyword here is consistency. If you select a way to do things, you'd better do it always.
The problem with smart pointers is that if you rely on them too much, you'd be better off performance-wise with a truly GCed language. Coding in a "smart pointers everywhere" style can easily make your applications slow, unresponsive, and hard to scale.


Returning a plain pointer to huge_type would be more intuitive, but the caller would have to keep track of that object and delete it when it is no longer needed.


What is wrong with that? If you don't want to do that - just use a language with a proper GC.
Trying to program C++ as if it was Java (and the other way round) is a bad idea.
Sometimes you just have to use C++ instead of Java. For example, if you're coding plugins for a framework that doesn't exist in Java. Or if Java libraries don't satisfy your needs. I don't think it's a bad idea at all to get rid of deletes. No need for deletes means no chance of forgetting one and leaking memory, and no time spent tracking such leaks down. BTW there are packages for doing garbage collection in C++, too.

Ok. Maybe I was a bit too enthusiastic about using smart pointers in return values. I suppose most compilers these days can optimize out the unnecessary copies and the overhead is negligible. For most people returning aggregates by value is sufficient.