Peculiar Memory Usage

Is there something going on under-the-hood that I am unaware of? Why does this happen?

#include <iostream>
using namespace std;

int main() {

    int f[15];
    int b[10];

    cout << (long)f << endl;
    cout << (long)b << endl;
}


Outputs the expected values. The addresses differ by 60 bytes. (On my machine, an int is 4 bytes.)

f has address: 0
b has address: 60

#include <iostream>
using namespace std;

int main() {

    int f[10];
    int b[10];

    cout << (long)f << endl;
    cout << (long)b << endl;
}


f has address: 0
b has address: -40

#include <iostream>
using namespace std;

int main() {

    int f[5];
    int b[10];

    cout << (long)f << endl;
    cout << (long)b << endl;
}


f has address: 0
b has address: -40

I'm using relative addresses to demonstrate the phenomenon. In the first case, f is placed before b, as would be expected. In the second case, b is placed before f, for unknown reasons. In the third case, there are 20 extra bytes between them for unknown reasons, and b is placed first.

Can anyone shed some light? I'm using the GNU GCC compiler.
I can give you part of the answer. First, the "natural" order of pushing automatic objects on the stack is the order in which their definitions are introduced. (They are not actually pushed with a push instruction, but it doesn't matter.) The stack grows towards smaller addresses. That is, pushing on the stack does not move the top upwards, but instead moves the top downwards so to speak.

Why is that? If I recall correctly, the heap memory is below the stack (or the free store, which is one part of it). So the heap extends upwards when more dynamic memory is needed, and the run-time stack grows downwards. This way, the free space between them is exhausted by whichever side needs it.

The consequence is this. The address you output is indeed the smallest address in each array. In the second and third example, the usual (or whatever) order is used for the layout. In this case, b is pushed after f. Consequently, b has lower address, and f has higher address. The address of f is the address of b + the size of b, which is 40 bytes. So, b is positioned -40 bytes with respect to f.

In the first example, the compiler chooses an alternative order for placing the objects on the stack. It seems that it is desirable to push the smaller object first (b in this case), making its address bigger (because the stack grows towards smaller addresses). Why is this done? I don't know. Usually bigger objects are moved closer to the location that is guaranteed to be aligned, because bigger objects have greater alignment needs. In this case, however, the aligned location is the beginning of the stack frame. It is at the highest address, and if you look at the assembly output, you will see that it is aligned on a 16-byte boundary. And the smaller object is pushed first, closer to it, which is contrary to my intuition. Of course, I am not a compiler developer, so my guess is as good as yours.

Just to clarify, in the first example, for some unknown reason, b is pushed first, placing it above f in the address space. So, the address of b is the address of f + the size of f, and the size of f is 60 bytes.

Regards

EDIT: Of course, this behavior is compiler and platform dependent. The standard imposes no constraints AFAIK.
I think:
Arrays must be stored in memory contiguously.
The f[0] or b[0], which are the names of the two arrays and which are pointers, is the only specific address stored for each. The remainder of the memory required is reserved for the remaining elements if the array is not initialised. The system determines where the array name (pointer) is stored.
Presumably on your machine an int is 2 bytes; b[0] was allocated memory first, and f[0] immediately after the last element of b, or b[9].
Thanks simeonz, and not so much to buffbill.

I was trying to define 2 arrays with a char between them in memory, to see if I could change the char by using array indexing. But it appears my compiler won't let me put them in memory this way. I presume this would work, though, since array indexing is just like accessing through a pointer.
buffbill had responded with the best intentions. That's kind of rude, man.

Indeed you can overwrite some memory by indexing out of bounds. But you don't even need array to experiment with that:
char x;
char y;

You can do (&y)[1] = n if &x == &y + 1, or (&y)[-1] = n if &x == &y - 1. Of course, depending on the compiler, the offset may not be ±1, but generally it should be. Actually, not that it matters, but I think it should be the first case (+1). You can also overwrite y through x by doing the diametrically opposite thing. You can do it with arrays, but you don't have to; in practice (bugs and security attacks) it does happen with arrays. An object is effectively a one-element array representation-wise (even per the standard). Just remember that you can use negative indices. Using negative indices is not standard-compliant, but neither is out-of-bounds indexing, so that doesn't matter.

Regards
Deathly wrote:
I was trying to define 2 arrays with a char between them in memory, to see if I could change the char by using array indexing. But it appears my compiler won't let me put them in memory this way. I presume this would work though, since array indexing is just like accessing with pointer.

As simeonz says, that's not allowed. It is squarely in the realm of "undefined behavior", which is a C++ euphemism for "the compiler may not flag an error, but results are random." A lot of hard-to-track-down bugs are introduced by beginner programmers because they try something and it works in their test, without fully understanding the rules of the language. I think this is one area that makes C++ much harder for beginner programmers than anything else. That's the way we naturally explore new languages: try and see what works. Many languages will immediately flag an error when attempting to read past an array boundary (or anything that may lead to undefined behavior). C and C++ will let you compile and execute such code even though the results are undefined.

This may help: http://stackoverflow.com/questions/367633/what-are-all-the-common-undefined-behaviour-that-a-c-programmer-should-know-abo/367662#367662

Unfortunately there are lots more. Calling variadic functions such as printf() with the wrong types is a big one that's not listed.
Thanks again. The question about putting a char between two arrays and trying to access it by array indexing was actually an exercise out of Eckel's text. I think his intention was to demonstrate that C++ doesn't flag behavior such as reading beyond an array's bounds like Java does.

I would presume C++ uses this undefined-behavior designation (as opposed to outlawing it) in order to facilitate some types of coding scenarios. Could you provide any examples of where undefined behavior might offer a significant advantage over using standard notation?
Well, I don't understand the exact difference between undefined and implementation-defined behavior. Actually, they may be the same, but the terms are sometimes used in technical discussions as distinct.

There are two types of questions. First, do you want to outlaw something that you technically don't need to outlaw. In other words, do you want to outlaw constructs that do not render the program source ambiguous and impossible to compile. Sometimes it can be desirable to demand an error, just because you want to prevent a bug, not because the compilation is impossible.

Second, how efficiently can you outlaw particular usage in practice. It may turn out that you need run-time checks to be sure. For example, if I explicitly index the 6th element in a 5 element array, the compiler can look at the syntax and bark at me. But suppose I use an indexing variable and it happens to receive the value 5 (6th element) at run-time. Then the compiler has to be very intelligent to diagnose that statically. In fact, it may be impossible to diagnose statically. So, to avoid run-time checks, the standard may allow some leniency to keep the performance.

The other important thing, is that the compiler vendor may specify additional semantics, as long as they don't conflict with the standard. This allows support for non-portable low-level programming using C++. The advantage is that you can interact with your devices from the comfort of a much more self-explanatory language without having to manually optimize every arithmetic expression. The disadvantage is that some low-level features, especially processor instructions in CISC processors, may not have analogous C++ constructs.

Particularly, abusing pointer arithmetic and indexing non-portably, you can inspect memory ranges without using the standard C++ access methods. If you are in embedded software for example, you can use this for diagnostic purposes, and dump over some network channel the contents of some memory range. You can analyze the contents later using the so called memory map that is usually produced from the C++ linker.

Regards
C++ must compile to executable form on numerous platforms (CPUs and operating systems). Those systems all have different architectures and implementation quirks. So there are things that are defined by the platform which C and C++ must remain compatible with. In those areas, the language has to leave some wiggle room.

C++ is also designed to give the compiler a lot of leeway in how it can optimize the intentions of the programmer. The compiler is allowed to optimize memory layout, elide code and variables, transform code -- anything that, within the rules, does not change the externally observable behavior of the program. Many of those rules are inherited from C. And C is basically a high-level assembler.

In this example, C does not check for bounds errors on arrays because the cost is too high (and C++ must follow suit). C++ vectors provide both checked and unchecked access to elements to give developers a choice between speed and safety.

Some of the behavior is outside the scope of the language. Things like violations of the "one definition rule" (ODR) say "no diagnostic required" because ODR violations can only be caught by a linker, and they don't wish to impose behavior on linkers. Neither the C nor the C++ language standard defines the behavior of linkers.