Headers and Includes: Why and How

Pages: 12

May 2, 2009 at 3:12pm

************************************************* ** 0) Introduction ** *************************************************

This article is meant to address a common newbie problem regarding failure to understand #include, headers, and source file interaction. Several good practices are outlined and explained to show how to avoid some ugly snags. Later sections get into more advanced topics (inlining and templates), so even C++ coders with some experience under their belt might benefit from a read!

If you are already familiar with the basics, feel free to skip ahead to section 4. That is where practices and design strategies are discussed.

************************************************* ** 1) Why we need header files. ** *************************************************

If you're just starting out in C++, you might be wondering why you need to #include files and why you would want to have multiple .cpp files for a program. The reasons for this are simple:

(1) It speeds up compile time. As your program grows, so does your code, and if everything is in a single file, then everything must be fully recompiled every time you make any little change. This might not seem like a big deal for small programs (and it isn't), but when you have a reasonable size project, compile times can take several minutes to compile the entire program. Can you imagine having to wait that long between every minor change?

Compile / wait 8 minutes / "oh crap, forgot a semicolon" / compile / wait 8 minutes / debug / compile / wait 8 minutes / etc

(2) It keeps your code more organized. If you seperate concepts into specific files, it's easier to find the code you are looking for when you want to make modifications (or just look at it to remember how to use it and/or how it works).

(3) It allows you to separate interface from implementation. If you don't understand what that means, don't worry, we'll see it in action throughout this article.

Those are the upsides, but the big, obvious downside is that is makes things a little more complicated if you don't understand how it all works (in reality, though, it's simpler than the alternatives as the project gets larger)

C++ programs are built in a two stage process. First, each source file is compiled on its own. The compiler generates intermediate files for each compiled source file. These intermediate files are often called object files -- but they are not to be confused with objects in your code. Once all the files have been individually compiled, it then links all the object files together, which generates the final binary (the program).

This means that each source file is compiled separately from other source files. As a result of this, in terms of compiling, "a.cpp" is clueless as to what's going on inside of "b.cpp". Here's a quick example to illustrate:

// in myclass.cpp

class MyClass
{
public:
  void foo();
  int bar;
};

void MyClass::foo()
{
  // do stuff
}

// in main.cpp

int main()
{
  MyClass a; // Compiler error: 'MyClass' is unidentified
  return 0;
}

Edit & Run

Even though MyClass is declared in your program, it is not declared in main.cpp, and therefore when you compile main.cpp you get that error.

This is where header files come in. Header files allow you to make the interface (in this case, the class MyClass) visible to other .cpp files, while keeping the implementation (in this case, MyClass's member function bodies) in its own .cpp file. That same example again, but tweaked slightly:

// in myclass.h

class MyClass
{
public:
  void foo();
  int bar;
};

// in myclass.cpp
#include "myclass.h"

void MyClass::foo()
{
}

//in main.cpp
#include "myclass.h"  // defines MyClass

int main()
{
  MyClass a; // no longer produces an error, because MyClass is defined
  return 0;
}

Edit & Run

The #include statement is basically like a copy/paste operation. The compiler will "replace" the #include line with the actual contents of the file you're including when it compiles the file.

*************************************************************** ** 2) What is the difference between .h/.cpp/.hpp/.cc/etc ** ***************************************************************

All files are fundamentally the same in that they're all text files, however different kinds of files should have different extensions:

- Header files should use a .h__ extension (.h / .hpp / .hxx). Which of those you use doesn't matter.
- C++ Source files should use a .c__ extention (.cpp / .cxx / .cc). Which of those you use doesn't matter.
- C Source files should use .c (.c only).

The reason C and C++ source files are seperated here is because it makes a difference in some compilers. In all likelyhood (since you're reading this on a C++ site), you'll be using C++ code, so refrain from using the .c extension. Also, files with header extensions might be ignored by the compiler if you try to compile them.

So what's the difference between Header files and Source files? Basically, header files are #included and not compiled, whereas source files are compiled and not #included. You can try to side-step these conventions and make a file with a source extension behave like a header or vice-versa, but you shouldn't. I won't list the many reasons why you shouldn't (other than the few I already have) -- just don't.

The one exception is that it is sometimes (although very rarely) useful to include a source file. This scenario has to do with instantiating templates and is outside the scope of this article. For now... just put it in your brain: "do not #include source files"

Last edited on May 2, 2009 at 5:38pm

May 2, 2009 at 3:13pm

Disch (13742)

***************************************************** ** 3) Include guards ** *****************************************************

C++ compilers do not have brains of their own, and so they will do exactly what you tell them to. If you tell them to include the same file more than once, then that is exactly what they will do. And if you don't handle it properly, you'll get some crazy errors as a result:

// myclass.h

class MyClass
{
  void DoSomething() { }
};

1
2
3

// main.cpp
#include "myclass.h"   // define MyClass
#include "myclass.h"   // Compiler error - MyClass already defined

Edit & Run

Now you might be saying to yourself "well that's stupid, why would I include the same file twice?" Believe it or not, it will happen a lot. Not quite as illustrated above, though. Usually it happens when you include two files that each include the same file. Example:

1
2

// x.h
class X { };

// a.h
#include "x.h"

class A { X x; };

// b.h
#include "x.h"

class B { X x; };

// main.cpp

#include "a.h"  // also includes "x.h"
#include "b.h"  // includes x.h again!  ERROR

Edit & Run

Because of this scenario, many people are told not to put #include in header files. However this is bad advice and you should not listen to it. Sadly, some people are actually taught this in C++ courses that they are paying money for. If your C++ instructor tells you not to #include in header files, then [grudgingly] follow his instructions in order to pass the course, but once you're out of his/her class, shake that habit.

The truth is there is nothing wrong with putting #include in header files -- and in fact it is very benefitial. Provided you take two precautions:

1) Only #include things you need to include (covered next section)
2) Guard against incidental multiple includes with include guards.

An Include Guard is a technique which uses a unique identifier that you #define at the top of the file. Here's an example:

//x.h

#ifndef __X_H_INCLUDED__   // if x.h hasn't been included yet...
#define __X_H_INCLUDED__   //   #define this so the compiler knows it has been included

class X { };

#endif

This works by skipping over the entire header if it was already included. __X_H_INCLUDED__ is #defined the first time x.h is included -- and if x.h is included a second time, the compiler will skip over the header because the #ifndef check will fail.

Always guard your headers. Always always always. It doesn't hurt anything to do it, and it will save you some headaches. For the rest of this article, it is assumed all header files are include guarded (even if I don't explicitly put it in the example).

You do not need to guard your .cpp files, because they are not #included (Or at least they shouldn't be... right? RIGHT?)

***************************************************** ** 4) The "right way" to include ** *****************************************************

Classes you create will often have dependencies on other classes. A derived class, for example, will always be dependent on its parent, because in order to be derived from the parent, it must be aware of its parent at compile time.

There are two basic kinds of dependencies you need to be aware of:
1) stuff that can be forward declared
2) stuff that needs to be #included

If, for example, class A uses class B, then class B is one of class A's dependencies. Whether it can be forward declared or needs to be included depends on how B is used within A:

- do nothing if: A makes no references at all to B - do nothing if: The only reference to B is in a friend declaration - forward declare B if: A contains a B pointer or reference: B* myb; - forward declare B if: one or more functions has a B object/pointer/reference as a parementer, or as a return type: B MyFunction(B myb); - #include "b.h" if: B is a parent class of A - #include "b.h" if: A contains a B object: B myb;

You want to do the least drastic option possible. Do nothing if you can, but if you can't, forward declare if you can. But if it's necessary, then #include the other header.

Ideally, the dependencies for the class should be layed out in the header. Here is a typical outline of how a "right way" header might look:

myclass.h

//=================================
// include guard
#ifndef __MYCLASS_H_INCLUDED__
#define __MYCLASS_H_INCLUDED__

//=================================
// forward declared dependencies
class Foo;
class Bar;

//=================================
// included dependencies
#include <vector>
#include "parent.h"

//=================================
// the actual class
class MyClass : public Parent  // Parent object, so #include "parent.h"
{
public:
  std::vector<int> avector;    // vector object, so #include <vector>
  Foo* foo;                    // Foo pointer, so forward declare Foo
  void Func(Bar& bar);         // Bar reference, so forward declare Bar

  friend class MyFriend;       // friend declaration is not a dependency
                               //   don't do anything about MyFriend
};

#endif // __MYCLASS_H_INCLUDED__

This shows the two different kinds of dependencies and how to handle them. Because MyClass only uses a pointer to Foo and not a full Foo object, we can forward declare Foo, and don't need to #include "foo.h". You should always forward declare what you can -- don't #include unless it's necessary. Unnecessary #includes can lead to trouble.

If you stick to this system, you will bulletproof yourself, and will minimize #include related hazards.

Last edited on May 2, 2009 at 5:42pm

May 2, 2009 at 3:14pm

Disch (13742)

***************************************************** ** 5) Why that is the "right way" to include ** *****************************************************

Note: in this section I refer to the "right way" method outlined above as "mine". While I did come up with it on my own (after struggling through the mucky muck for a while) -- I can't say I was the first person who ever thought of it, so it isn't really "mine". But for purposes of this article, I call it "mine" for simplicity.

You: "So-and-so says that #including in a header is dangerous, but you say it's not! Why is your way so much better than what so-and-so says?"

So-and-so is partially right, but is explaining it wrong. Frivilous and careless #includes can lead to trouble. And one way to avoid those troubles is to never #include inside a header file. So yeah, so-and-so's heart is in the right place. But ultimately, using so-and-so's approach is going to give yourself TONS of additional work and headaches.

The concept I'm illustrating is very OO, and enhances encapsulation. The general idea is that it makes "myclass.h" fully self-contained and doesn't require any other area of the program (other than MyClass's implementation/source file) to know how MyClass works internally. If some other class needs to use MyClass, it can just #include "myclass.h" and be done with it!

The alternative (so-and-so's method), would require you to #include all of MyClass's dependencies before #including "myclass.h", since myclass.h can't include its dependencies itself. This is headache city, because now using a class isn't so straightforward.

Here is an example of why my method is good:

//example.cpp

//  I want to use MyClass
#include "myclass.h"   // will always work, no matter what MyClass looks like.
                       // You're done
               //  (provided myclass.h follows my outline above and does
               //   not make unnecessary #includes)

Here is an example of why so-and-so's method is bad:

//example.cpp

//  I want to use MyClass
#include "myclass.h"
   // ERROR 'Parent' undefined

so-and-so: "Hrm... okay...."

1
2
3

#include "parent.h"
#include "myclass.h"
   // ERROR 'std::vector' undefined

#include "parent.h"
#include <vector>
#include "myclass.h"
   // ERROR 'Support' undefined

so-and-so: "WTF? MyClass doesn't even use Support! But alright..."

#include "parent.h"
#include <vector>
#include "support.h"
#include "myclass.h"
   // ERROR 'Support' undefined

so-and-so: "Give me a break! I'm including it! What else do you want!"

Believe it or not, the above does happen. Little did poor so-and-so know, but "parent.h" uses Support, and therefore you must #include "support.h" before "parent.h".

And what happens if support.h needs something else? What if that something else needs something else? We're already up to 4 #includes just to use a single class! With so-and-so's method, not only do you have to remember which includes are needed for each class, but also the order in which you need to #include them. This becomes a huge nightmare very quickly.

And what happens if you want to tweak MyClass? Let's say you decide that it would be better to use std::list instead of std::vector. With so-and-so's method, you now have to go back and change every single file that #includes "myclass.h" and change it to include <list> instead of <vector> (which might be dozens of files depending on the size of the project and how often MyClass is used), whereas with my method you only have to change "myclass.h", and maybe "myclass.cpp".

The "right way" I illustrated above is all about encapsulation. Files that want to use MyClass don't need to be aware of what MyClass uses in order for it to work, and don't need to #include any MyClass dependencies. All you need to do to get MyClass to work is #include "myclass.h". Period. The header file is set up to be completely self contained. It's all very OO friendly, very easy to use, and very easy to maintain.

***************************************************** ** 6) Circular Dependencies ** *****************************************************

A circular dependency is when two (or more) classes depend on each other. For example, class A depends on class B, and class B depends on class A. If you stick to "the right way" and forward declare when you can instead of #including needlessly, this usually isn't a problem. As long as the circle is broken with a forward declaration at some point, you're fine.

Here's the perfect example of why you should only #include what is necessary:

// a.h -- assume it's guarded
#include "b.h"

class A { B* b; };

// b.h -- assume it's guarded
#include "a.h"

class B { A* a };

An initial glance might see nothing wrong with this. B is a dependency of A, so you include it, and A is a dependency of B, so you include it. So what's wrong with this?

This is a circular inclusion (also called an infinite inclusion) and is the result of one or more includes that shouldn't be there. Say for example you compile "a.cpp":

1
2

// a.cpp
#include "a.h"

The compiler will do the following:

#include "a.h"

   // start compiling a.h
   #include "b.h"

      // start compiling b.h
      #include "a.h"

         // compilation of a.h skipped because it's guarded

      // resume compiling b.h
      class B { A* a };        // <--- ERROR, A is undeclared

Even though you're #including "a.h", the compiler is not seeing the A class until after the B class gets compiled. This is because of the circular inclusion problem. This is why you should always forward declare when you're only using a pointer or reference. Here, "a.h" should not be #including b.h, but instead should just be forward declaring B. Likwise, b.h should be forward declaring A. If you make those changes, the problem is solved.

The circular inclusion problem may persist if both dependencies are #include dependencies (ie: they can't be forward declared). Here's an example:

// a.h (guarded)

#include "b.h"

class A
{
  B b;   // B is an object, can't be forward declared
};

// b.h (guarded)

#include "a.h"

class B
{
  A a;   // A is an object, can't be forward declared
};

You may note, however, that this situation is conceptually impossible. There is a fundamental design flaw. If A has a B object, and B has an A object, then A contains a B, which contains another A, which contains another B, which contains another A, which contains another B, etc, etc. You have an infinite recursion problem, and either class is simply impossible to instantiate. The solution is to have one or both classes contain a pointer or reference to the other, rather than a full object. Then you can forward declare, and then you can get around the circular inclusion problem.

Last edited on May 2, 2009 at 5:40pm

May 2, 2009 at 3:15pm

Disch (13742)

***************************************************** ** 7) Function inlining ** *****************************************************

The catch-22 with inline functions is that their function body needs to exist in every cpp file which calls them, otherwise you get linker errors (since they can't be linked during the linker process -- they need to be compiled into the code during the compiler process).

This might open circular reference holes or other scenarios that might complicate the "right way" outline.

class B
{
public:
  void Func(const A& a)   // parameter, so forward declare is okay
  {
    a.DoSomething();      // but now that we've dereferenced it, it
                          //  becomes an #include dependency
               // = we now have a potential circular inclusion
  }
};

The key is that while inline function need to exist in the header, they do not need to exist in the class definition itself. This lets us exploit a loophole:

// b.h  (assume its guarded)

//------------------
class A;  // forward declared dependency

//------------------
class B
{
public:
  void Func(const A& a);  // okay, A is forward declared
};

//------------------
#include "a.h"        // A is now an include dependency

inline void B::Func(const A& a)
{
  a.DoSomething();    // okay!  a.h has been included
}

While you might not think so at first glance... this is perfectly safe. The circular inclusion problem is avoided completely, even if a.h includes b.h. This is because the additional #includes don't come up until AFTER class B is fully defined, and they are therefore harmless.

You: "But putting function bodies at the end of my header is ugly. Isn't there a way to avoid that?"

Me: Yep! Just move the function bodies to another header:

// b.h

    // blah blah

class B { /* blah blah */ };

#include "b_inline.h"  // or I sometimes use "b.hpp"

// b_inline.h (or b.hpp -- whatever)

#include "a.h"
#include "b.h"  // not necessary, but harmless
                //  you can do this to make this "feel" like a source
                //  file, even though it isn't

inline void B::Func(const A& a)
{
  a.DoSomething();
}

This seperates the interface from the implementation, while still allowing the implementation to be inlined. You can also have a normal "b.cpp" file for the implementation that isn't inlined.

***************************************************** ** 8) Forward declaring templates ** *****************************************************

Forward declaring is pretty straight-forward when it comes to simple classes, but when dealing with template classes, things aren't so simple. Consider the following scenario:

// a.h

// included dependencies
#include "b.h"

// the class template
template <typename T>
class Tem
{
 /*...*/
  B b;
};

// class most commonly used with 'int'
typedef Tem<int> A;  // typedef'd as 'A'

// b.h

// forward declared dependencies
class A;  // error!

// the class
class B
{
 /* ... */
  A* ptr;
};

While this seems perfectly logical, it doesn't work! (Although, logically you really think it should. This is an irritation of the language). Because 'A' isn't really a class, but rather a typedef, the compiler will bark at you. Also notice that we can't just #include "a.h" here because of a circular dependency problem.

In order to forward declare 'A', we need to typedef it. Which means we need to forward declare what it's typedef'd as. The way to forward declare it is like so:

1
2

template <typename T> class Tem;  // forward declare our template
typedef Tem<int> A;               // then typedef 'A'

This is quite a bit uglier than class A;, but nonetheless is a necessary evil. This, however, makes templated classes hard to encapsulate, because it requires every class which forward declares them to know exactly how the template is layed out. If that ever changes you have a big mess to clean up.

A practical solution to this problem is to create an alternative header which has the forward declarations of your templated classes and their typedefs. Here's a more elegant way to approach the above example:

//a.h

#include "b.h"

template <typename T>
class Tem
{
 /*...*/
  B b;
};

//a_fwd.h

template <typename T> class Tem;
typedef Tem<int> A;

//b.h

#include "a_fwd.h"

class B
{
 /*...*/
  A* ptr;
};

This allows B to include a header which forward declares A without including the entire class definition.

Last edited on Jun 15, 2009 at 10:59am

May 2, 2009 at 4:22pm

helios (17607)

You should mention that only files that match source code patterns (*.c, *.cpp, *.cc, *.C, *.cxx, etc) are compiled, even if other files are passed to the compiler. For example, if you use the command line
g++ main.cpp file.c file.h
Only main.cpp and file.cpp will be compiled. A side effect of this is that header extensions are arbitrary.
Also mention that the compiler to use is inferred from the extension, except when the compiler is explicitly told to compile something as C or C++.

About section 7, are you sure forward declaration doesn't work when passing parameters by value, yet it works when passing pointers/references and manually making copies?

Finally, about templates, I'd say it's better practice to put the template definition in the class declaration. Particularly if there are non-template methods in the class.

May 2, 2009 at 4:56pm

Disch (13742)

g++ main.cpp file.c file.h
Only main.cpp and file.cpp will be compiled. A side effect of this is that header extensions are arbitrary.

I wasn't sure that was the case. iirc, you could compile headers in VS. I haven't tried it since i switched to CodeBlocks+GCC. But that's a valid point.

About section 7

Oh crap! That's what i get for not testing enough. You're totally right, forward declaring works fine. Only problem happens if its implicitly inlined, but that's another matter.

Finally, about templates, I'd say it's better practice to put the template definition in the class declaration.

Well -- I'm not a big fan of putting implementation in the class itself (unless it's a really small get() function or some other kind of 1-liner). I guess with templates it's alright because any dependencies can be forward declared and included after the class body (at least I think so, I'd have to actually test that).

There are other considerations, too, though. Like if the template class is exceedingly large and you want to ease compile time (though it would have to be pretty freaking big to make a difference)

Anyway overall I agree. I just included that bit out of completeness. I figured I should focus more on the instantiating method since everybody knows how to do the inlining method. But really -- the more I think about it, the more I think that should belong in another article (like one specifically talking about templates).

In response to that, I've decided to cut sections 7 and 9 completely, and touch up a few related things. I'll edit the posts once I get it straightened out on my local copy.

Thanks for the feedback!

May 2, 2009 at 5:21pm

helios (17607)

Visual C++ doesn't compile headers, either.

Last edited on May 2, 2009 at 5:22pm

May 2, 2009 at 5:33pm

Disch (13742)

Then I guess I didn't recall correctly XD

May 3, 2009 at 5:53pm

Oromis (80)

Thanks for the article.

When using short memberfunctions, the body is often included, like this:

// MyClass.h

class MyClass
{
  private:
    int a;

  public:
    void set(int n) { a = n;}
    int  get()      { return a;}
};

This results in an error when the header is included in both main.cpp and MyClass.cpp. Does this mean it should always be avoided?

Last edited on May 3, 2009 at 5:54pm

May 3, 2009 at 5:56pm

helios (17607)

Yes. it costs nothing to put the function definitions in a separate file and you avoid errors errors.

May 3, 2009 at 5:57pm

Oromis (80)

Oke, thanks :)

May 3, 2009 at 6:11pm

Disch (13742)

I disagree, actually.

There is nothing wrong with shortcutting functions like that. It's called "implicit inlining", and doing it may actually provide a performance boost for small functions like get/set members. Taking the functions out of the class definition and putting them in a .cpp file means they can no longer be inlined, which may result in [ever-so-slightly] less efficient code (though you'll only really notice if you're calling the funciton a whole lot -- otherwise it doesn't really matter).

That above MyClass.h example you gave should not generate any errors, no matter how many .cpp files you include it in. If you are getting errors it must be due to something else. Can you provide a minimal compilable example of where you are getting errors doing this?

May 3, 2009 at 6:41pm

helios (17607)

Taking the functions out of the class definition and putting them in a .cpp file means they can no longer be inlined

Functions without the inline keyword are never inlined. Functions with the inline keyword are inlined as the compiler sees fit.

May 3, 2009 at 6:45pm

Disch (13742)

I have read and heard otherwise from various sources.

Here is one of them:

http://www.parashift.com/c++-faq-lite/inline-functions.html#faq-9.8

May 3, 2009 at 6:58pm

helios (17607)

It says defining the function inside the class declaration is an alias for the inline keyword.

May 3, 2009 at 7:52pm

Disch (13742)

Er right.

Meaning you can have functions inlined without using the inline keyword. IE implicit inlining.

May 3, 2009 at 8:07pm

helios (17607)

Right. It still doesn't guarantee the compiler will inline, and it's possible to still have them in a separate file by using the inline keyword.

Meh. Haven't had a good argument in a while. I think I'll start a thread and try to defend a purposely wrong premise.

May 3, 2009 at 8:25pm

Disch (13742)

Well basically what I'm saying is this:

- If he shortcuts and puts the function bodies in the class definition, he's implicitly inlining (yes)
- If he moves them to a .cpp file, they're no longer inlined (no)
- You can't inline a function that isn't #included in all source files that use it. So even if he tries to add the inline keyword, if he moves the functions to a .cpp file, they won't be inlined (no)

I think I'll start a thread and try to defend a purposely wrong premise.

hahahaah. I can't tell if you're being serious or sarcastic or what. Either way it's funny. XD

May 3, 2009 at 8:31pm

Oromis (80)

I thought I tested it, but it compiles fine now... I must have made some other mistake. The fact that it's implicitly inlined explains why there is no problem with defining the function twice, I guess. Thanks.

May 5, 2009 at 6:28am

translore (31)

I had a teacher "so and so" tell me not to include inside header files and I think I even read this in a C++ text but I really do like the method in this article. It seems like it will help more in the OOP approach and will help make the code more robust! Thanks for the article...

Pages: 12

Headers and Includes: Why and How

C++

Forum