Also, keep in mind that inline is just a polite request to the compiler. It doesn't have to inline the function. More importantly, the compiler is in a better position than you are to know whether to actually inline the function. It knows exactly how many calls you'll make, exactly how large the function is, how it can be optimized at each call site, and probably a bunch of other stuff that I can't think of.
You can force inlining by pulling the code in with a #include instead of making a function call, if you feel strongly about it. Some compilers support a force-inline extension keyword, but even those won't inline when they can't (e.g. a recursive function). (The #include approach will also blow up if you try to inline recursively; it's not viable.)
The C/C++ inline keyword is terribly misunderstood.
It doesn't help that historically the meaning may have changed a bit (or maybe our assumptions about it).
The inline keyword is, first and foremost, about linkage.
You know the standard concept of a function, so keep that in mind for a moment. In this case, the code for the function is written within one compilation unit, and the linkage for that function is external, which is to say the linker will "realize" that the code for such a function is to be found in some "other" or "external" compilation unit, and only one will (or should) be found. The linker will issue a warning or error about duplicate functions if multiple definitions of such a function are found. By default, this scenario implies a function call.
Although "language lawyers" may have more detailed explanations, the colloquial version is this:
The inline keyword merely informs the compiler and linker that the function may be defined in more than one compilation unit, and that duplicates of such a named function should be assumed to merge into a single instantiation, as the code is probably in a header.
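To make the linkage point concrete, here is a minimal sketch. The header name ids.hpp and the function next_id are made up for illustration; the idea is that this definition would sit in a header included by several .cpp files.

```cpp
// Contents of a hypothetical header, ids.hpp, included by several .cpp files.
// Without `inline`, every including compilation unit would emit its own
// external definition of next_id() and the linker would report duplicates.
// With `inline`, the duplicate definitions are permitted and merged into one.
inline int next_id() {
    static int id = 0;  // a single counter shared by the whole program
    return ++id;
}
```

Note that nothing here says anything about whether calls to next_id() get expanded in place. That part is still up to the optimizer.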
There is a subtle implication that this also might mean the code doesn't even resolve to a function at link time, but that the linker should expect optimizations are applied to emit that code inline as assembler, instead of fashioning a call to a function.
However, emitting inline assembler for a short function is automatic to the optimization process, and you have very little control over it.
As others point out, there are non-standard (though rather common) "forced inline" keywords that can be used to further insist that the compiler/linker emit code as inline assembler, avoiding the function call overhead, but it is still only a hint. Many linkers and compilers still insist upon doing things "their way", and frankly they are often better about that decision than we are.
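For reference, a common way to wrap those vendor keywords looks roughly like this. The macro name FORCE_INLINE and the function add_one are my own for illustration; __forceinline (MSVC) and __attribute__((always_inline)) (GCC/Clang) are the actual vendor extensions, and the fallback is plain inline.

```cpp
// Pick the strongest "please inline this" spelling the compiler offers.
#if defined(_MSC_VER)
    #define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
    #define FORCE_INLINE inline __attribute__((always_inline))
#else
    #define FORCE_INLINE inline  // no extension known: ordinary inline
#endif

// Hypothetical small function we would like expanded at each call site.
FORCE_INLINE int add_one(int x) {
    return x + 1;
}
```

Even with this in place, the only way to know what was actually emitted is to look at the generated assembler for your compiler and flags.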
If you fashion C++/C code to depend upon an inline assembler generation of a function, it is rather unrepeatable across compilers. You may "get it to work" as you expect, through inspection of the optimized assembler output, for now, but some compiler update or switch to another compiler or platform can completely undo the assumptions that caused it to work, which is to say some future build could "out of line" that beautifully arranged "inline" optimization due to trivial changes.
There is only one way to be absolutely sure you are in control. As with all "there's only one way", it does break out into several tributaries. The central theme is that you must take the option out of the compiler/linker's domain and put it in yours. This may mean you write the code as a macro. It is ugly to do so, but if you must insist on this level of control you must go in realizing there is no certain, portable C/C++ way to ensure the code emitted during the build will, without question, end up being inline assembler when written as a function.
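As a small sketch of the macro route (SWAP_INTS is a made-up name, not anything standard): the preprocessor pastes the body at every use, so there is never a function for the compiler to decide about.

```cpp
// The body is textually substituted at each use, so no call is ever generated.
// The do { ... } while (0) wrapper makes the macro behave like one statement.
// Caveat: each argument is evaluated more than once, so never pass
// expressions with side effects such as i++.
#define SWAP_INTS(a, b)   \
    do {                  \
        int tmp_ = (a);   \
        (a) = (b);        \
        (b) = tmp_;       \
    } while (0)
```

Usage is just SWAP_INTS(x, y); with two int variables. The ugliness mentioned above shows up quickly: no type checking, no scoping, and repeated argument evaluation.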
You don't have to write the *code* as a macro.
You can write normal C++ and use a macro to dump that code where you want it, as I said above. Basically, you inject lines of code directly where you want them. This avoids some, but not all, of the problems that code written purely as a macro can create.