Candidates for class membership

Hi there,
I'm having some doubts about how to arrange some functions for my simple SplayTree class, anyway the topic discussed here should be applicable to virtually any other class. If you have an opinion or a comment about this I'd be glad to hear it ;)

Let the following be a the class definition:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
template<class T> class SplayTree : public BaseTree<T> {
public:
	typedef T ValueType;

	// Exception classes
	class AlreadyExist {
	public:
		ValueType& get_value() {return present_node.value();}
		friend class SplayTree;
	private:
		AlreadyExist(BaseNode<T>& node) : present_node(node) {}
		BaseNode<T>& present_node;
	};
	class NotFound {
	public:
		friend class SplayTree;
	private:
		NotFound() : last_accessed(0) {}
		NotFound(BaseNode<T>& last) : last_accessed(new BaseNode<T>(last)) {}
		~NotFound() { delete last_accessed; }
		BaseNode<T>* last_accessed;
	};

	virtual ValueType& insert(const ValueType&) throw(AlreadyExist);
	virtual ValueType& find(const ValueType&) throw(NotFound);
	virtual ValueType& findmax() throw(NotFound);
	virtual ValueType& findmin() throw(NotFound);
	virtual void remove(const ValueType&) throw();
	virtual void clear() throw();

	SplayTree() : root(0) {}
	virtual ~SplayTree() { clear(); }

private:
	BaseNode<T>& insert_rec(const typename SplayTree<T>::ValueType&, BaseNode<T>&);
	BaseNode<T>& find_rec(const typename SplayTree<T>::ValueType&, BaseNode<T>&);
	BaseNode<T>& findmax_rec(BaseNode<T>&);
	BaseNode<T>& findmin_rec(BaseNode<T>&);
	void clear_rec(BaseNode<T>&);

	void splay(BaseNode<T>&);
	void zig(BaseNode<T>&, BaseNode<T>&);
	void zigzag(BaseNode<T>&, BaseNode<T>&, BaseNode<T>&);
	void zigzig(BaseNode<T>&, BaseNode<T>&, BaseNode<T>&);

	BaseNode<T>* root;
};


Now, Stroustrup says (just to mention one) that the only functions that may have membership in a class are those that must access to its representation.

In this case the representation can be thought either as the pointer BaseNode<T>* root or as the entire tree, made up by a bunch of BaseNode<T>s linked together.

The only private function here that access the pointer *root is splay(), others such as insert_rec(), zigzag() and so on doesn't. Should I export all those functions out of my class?
I mean: even if they don't access to the representation (well, not explicitly by the root pointer) they have been written only for use with that class and they'd be useless for anything else except it. Furthermore when exported those functions may also need their own namespace, something like SplayUtilities. In this situation, would you place those funcions inside or outside that class?

As I said before, any opinion is welcome!

Thanks.


EDIT: typos
Last edited on
I share the opinion of Scott Meyer about when to include something into a class as member and when not (http://www.drdobbs.com/cpp/184401197)

1. If it has to be virtual, it has to be a member. Period. No discussion here.
2. Some operators have to be members (conversion operators, assignment etc.) No discussion either.
3. If it's any other operator who can take advance of left side type conversion (operator ==, operator < etc..), make it a global function (as friend, if it needs access to private stuff). This has been somewhat discussed, but I think the major consens is in favour of this definition.
4. If it needs access to class internals, make it a member of the class.
5. If you plan to access the function in a generic way from a template and if the function can't be resolved from its signature alone, you have to make it part of some class (as namespaces can't be template parameter). No discussion here (except: "What are you talking about?" ;).
6. Else make it global. That means: if it can be implemented outside the class without making it friend, do so. This is the most controversial point.

The last point is source of a lot of discussion and probably the core of your question. My recommendation is: yes, put the function out of the class. And putting it into a namespace is a good idea, if the functionality belongs to the class and not to a "common concept".

(In other words: I wouldn't put a "find" function into a namespace of some "SplayTree" only because the find works with the SplayTree only. But if you have some function "unsplay" that really makes only sense with splay trees, put it in namespace "splay" or something like that. ;)


However! Every time you find that a lot of functions should be moved out of the class body (according to the definition above), you should check whether your class really provides good encapsulation of your data and your concept. Sometimes, you recognize, that by providing one nice small "getData" function that you added last december actually makes it possible to implement almost all functions outside of the class. Then most probably this "getData" breaks your class design and should be removed instead.

Hopes it helped?

Ciao, Imi.
PS: Sometimes, I add a "0. If you just need to get things done quickly or need to access something for debugging temporary, do what you want." ;-)
I just finished reading Scott's article, that was a nice piece of stuff, really interesting, thanks for sharing it ;)

After all I still think those functions should stay into the class, I'll try to explain the reason in the following lines. Any critic to it is welcome and appreciated!

The situation here is quite different from the one explained by Scott, that is: all the functions with the _rec suffix are recursive functions which are only being called by other class' functions (that's not big news, they're private). Furthermore each recursive function is being called by its interface-equivalent (same name without _rec suffix) with *root as argument to start a recursion in the tree. To me it feels like the recursive functions are like an implementation of the non-recursive interfaces.
Another reason that I just thought over (by reading your message and Scott's article) is related to encapsulation: The major reason for a function to get declared outside class' scope is to increase encapsulation, that is have less code which depends on class' implementation. In this case the class' implementation can be thought as the entire tree, made up of many BaseNode<T>s, whose root is the *root pointer. Let's say that in one year time I'll come up with some new class for my nodes, let's say UberNode<T> which has a completely different interface from BaseNode<T>. Changing BaseNode to UberNode would obviously affect the code from _rec suffixed functions, indeed they're dependent from the implementation, and should't that be reflected by making them class members?

I'd like and I'd be glad to know what you think about it!


Hopes it helped?

It sure did ;)


ciao! (are you italian?)
Last edited on
(that's not big news, they're private).

Oh, you were talking about the private functions? I'm sorry. I had only public functions in mind. Hm, but thinking about it, my policies for making private helpers class members or not are very similar to those for public functions.

1) If they need to be part of the class, e.g. operators, constructors etc, make them. No discussion possible.
2) If they can be implemented without making them friend to the class, do so in the cpp file (in anonymous namespace). Usually, you have to add another parameter holding the this-pointer. (I usually
pass this as reference, though).
3) Else, I use to put them in the class file. (Means: If I would have to make the helper function "friend", I rather implement it directly in the class file).


Now your case would fall into 2). I put rule 2) there mostly because allthough private functions *should* not be part of an interface, they actually are. They can change the class structure, e.g. a virtual function in the private section can make things very different for the whole class. Also, changes in private member signatures forces recompiles. And they can introduce some side effects, when you names clashes with functions defined elsewhere (e.g. in base classes). But that rarely happens in C++ (except you make the functions virtual). For that reasons, I try to clean up classes - even from small private helper functions (if possible). Scott Meyer in his article uses a different base argument: encapsulation. It is also true here, but more about this later.

You have templates.. So you can't implement them in anonymous namespace and can't hide them in a CPP file. Your clients have to recompile on every change anyway and the functions under question aren't virtual. So in your specific case, it might be OK to implement them within the class.

Still.... I would implement them in some "private" namespace, e.g. if your class reside in "namespace tree", you could use "namespace tree::private".


You name two reasons in favour of putting them into the class:

First, you recognize that your implementation helpers have heavy dependencies to your public interface and you want to shortens the distance of both implementations, so they can't be out of sync so easily. Is this right?

Well, I find this a rather weak reason. Actually, I think its even of advantage if the implementation is stown away somewhere else, which would turn your argument upside down. For the danger of getting out of sync, you could put a comment in the class "// implementations are in header splaytreeprivate.h".

The second reason you have seem to be about encapsulation. But honestly, I don't get the exact point you want to make here.

You speak about extensibility and what you would have to change if you introduce a new class. But: If your UberNode public inherits from BaseNode, then your argument is clearly wrong, as all code that applies to BaseNode will still be working with UberNode's (or else, you are violating the Liskov's substitution principle. And than you most probably have a big design flaw in your class hierarchy.)

If it doesn't inherit BaseNode, then I don't see a connection to this problem. It will have its own set of
helper functions then, and you can start deciding again where to put these. Even better - if you make the
utility functions global instead of members, you may be able to reuse them for any other class like UberNode, if functionality is similar. (Do this only, if the similarity is not accidential, but "by design"!)

But all this has nothing to do with encapsulation in the way Scott Meyer uses it. That's about changing
the implementation details of one class and looking at the amount of code you have to verify because of the change. It's not about extending the class hierarchy. And his argument is still an argument in favour of getting private helper functions out of the class (if you can without making them friend). If you change the class implementation detail - say: make the root pointer a smart pointer - you would have to verify the helper functions only, if they are part of the class (as they *could* directly access the new smart pointer (and may have trouble with the change).


I think, most people's argument about this issue is rather an aesthetic one: It just looks better for them, when everything is sorted nicely into one place. This feeling provokes statements like "They are used only by the class, so they belong to the class, so they should be part of the class." This logic is not correct. There is no concrete definition of "belong" in this abstraction level (there may be one in UML diagrams ;), so that is rather an empty idea. And I don't see a necessity of something to be part of something else, even if it belong to the other.

The other argument that usually brought up is consistency. If you put some helper (and public functions) inside the class and others outside, users get confused which one belongs to the class and which not and where to find the functions etc.. I think Scott Meyer dismantles this argument nicely: It is already the case. Programmer who think that "everything that belongs to this class must be declared within the class declaration itself" have already a misconception.


Ciao, Imi.
PS:
ciao! (are you italian?)

Aww. actually I am german. I don't know why I say "Ciao" always at the end. Just a habbit ^^
Last edited on
Actually, I think its even of advantage if the implementation is stown away somewhere else, which would turn your argument upside down. For the danger of getting out of sync, you could put a comment in the class "// implementations are in header splaytreeprivate.h".

Well, I understand that sometimes it could be OK to take away some parts of the implementation from the class itself, anyway I wouldn't apply this as a rule of thumb: it would lead to nearly empty classes made up of bare representation and interfaces. In other words: every interface function would be a wrapper for some external function that modify the class' representation, with the interface passing a [pointer to the representation (or a part of it)] to the external function as argument. In the other hand having only the class to access to its representation (i.e. without member functions delegating work to external functions) would lead to a huge amount of code and a big load of dependencies inside the class' scope.

I'd say that what is needed here is a compromise, I'd say:if you have to do some data processing on the representation and you have a lib already written that does it, use it, even by passing a [pointer to the representation (or part of it)] to it if needed (that is: allow the external lib to modify your representation). Else if the function you want to implement is specific to your class (i.e. is not like a sort algorithm that you could use somewhere else) AND to get it implemented you should pass a [pointer to the class' representation (or part of it)] to the function, well I don't see why not to let it be a class' member so that it can directly access to the representation and modify it in a more explicit way.
I hope you got the point this time, I'm not that good at explaining things ^^.


ciao!
Last edited on
So to sum up: We are left with the decission whether the following example 1 or example 2 is better, right?

Example 1:
1
2
3
4
struct Vector3D {
    ...
    Vector3D cross(); // cross products really make sense only for 3d vectors
};



Example 2:
1
2
struct Vector3D {...};
Vector3D cross(const Vector3D& v); // it *can* be implemented outside of Vector3D 



Let's assume that my comment in first example is right (there are applications for cross product for other vectors, but they are more rare ;).

In my examples, cross() could be implemented by using only the public interface of Vector3D. Still, it would only make sense to pass the whole vector reference, as its implementation decide which part of Vector3D's interface it needs.

Your argumentation is, that it should belong inside Vector3D, as the comment suggested: It does make only sense together with Vector3D and so it would "clean up" things by putting related code together.

My argument would be, that since it *can* be implemented outside of Vector3D, it should be.

If you agree with my setup and examples, I will still hold the flag, that implementing cross() outside of Vector3D is generally a better design. On the other hand, I would probably define it inside of Vector3D just out of reflex.


every interface function would be a wrapper for some external function that modify the class' representation, with the interface passing a [pointer to the representation (or a part of it)] to the external function as argument.


Well.. functions outside of classes can belong to the interface of this very class. It's a misconception of what "Object Oriented" means, if you suggest that only the class declaration with all class members belong to the interface of your data abstraction. That's not true at all!

An interface of a class (or better of 'one unit of data abstraction') is the class itself plus everything around it as global functions, helper classes, helper structs etc. (But not the implementation-only dependencies).

Speaking in my examples above, both have the very same functionality in their interface. "cross" in my second example does not belong to the class declaration, but is still part of the interface of the "3D - Vector".

But as I said above: Out of reflex, I would probably put the cross inside the class. Not every design has to be super-perfect in terms of extensibilityflexibilityabstracionabilitynonsense. And many programmer just want stuff together in one class and wouldn't even look next to the class definition.


There's one thing left to say: Putting cross() inside the class declaration has one advantage we haven't spoken about yet: You can change the *interface* of Vector3D without affecting the implementation of cross(). (Say, you decide to provide only length() and azimuth() but not x,y,z anymore). If you have to think about whether this could happen, then this is a sign that your Vector3D may not encapsulating the idea of a vector properly but exposing too much "plain" information (only a sign, not necesarily always the case).


Ciao, Imi.
Last edited on
Actually I was speaking about something like this:

1
2
3
4
5
6
7
8
9
10
11
class MyClass {
public:
    void some_function() { do_something(representation); }
private:
    SomeClass* representation;
}

void do_something(SomeClass* ptr_sc)
{
   /*modifies *ptr_sc */
}


Let's pretend do_something() is not like a standard algorithm or something which could be used for any SomeClass*, it's something that makes sense only with MyClass. Therefore there is no necessity of having it available in the global scope.
With this said I would proceed in either of the following two ways:
- if do_something() is only being used by some_function() -> copy-paste its implementation inside some_function()
- if do_something() is used by more than one of the MyClass' members -> make it a MyClass' private member.

My point is: isn't it better having a function which actually modifies the class' representation inside the class itself rather than having it outsides ?

Regarding your example, I share your opinion.


ciao!
Last edited on
If you use "normal" classes (no templates), I'd still prefer my "decission algorithm" above, means put do_something into an anonymous namespace and hide it completely from any header files. (For non-templates, you get a lot more sound and solid arguments for independent functions)


So here a sum up for the scenario "Where to put an independend helper function of a template class?":

Pro of putting common functionality into a class:
- Looks better to some people, when code locations are close to each other. (Some people just expect it to be part of the class)
- Doesn't depend on the classes interface (sometimes you can change interface of the class without to change the code of the private function)

Pro of putting functionality into global methods:
- Looks better to some people, when class declaration is smaller and not bloated with helper declarations.
- Doesn't depend on the classes internals (you can always change implementation without to change the code of the global function)
- No accidently overwriting (accidently introduced virtual functions with the same name in base classes)

I guess, the remaining points of concern are now so tiny, that the first argument "Looks better if code is close to each other" can override any of the other pros and contras. ;-)

For my personal opinion, the same as for public functions goes here: I would probably implement the function out of reflex in the class, but when I think a lot about "why", I would agree that it would make more sense to move it into an own place. (Note: I always try to "move it out of the class", when I have a separate CPP file where I can hide the existance easily).


Note, that this only applies for those independend helper functions which don't need access to internals of the class. It's no use to pass all private member variables of a class to an other function by non-const reference just to force to move the helper out of the class. In this case, the function surely is part of the private implementation of the class (and when you change anything in the implementation structure, you have to change the helper's interface anyway).


Ciao, Imi.
Topic archived. No new replies allowed.