Generics in C -- Argh

Hey all. As usual, my question isn't a simple "malloc kilz prg w/segfault".

I needed a generic list type in C, so I hacked up a simple library that does that.

I parameterized it like is done in C++ templates:
1
2
#include "list.h"
list_t (int) numbers;


However, I quickly realized that I'd have to use the GCC typeof() operator to avoid having to name the element type when using things like
1
2
/* Pretty GCC stuff */
int x = *list_first( numbers );
/* But I know the type anyway... */
int x = *list_first( int, numbers );

(This code has to be extremely portable -- so GCC extensions are out, especially for just a couple small things like that pretty stuff.)

Next it became apparent that the actual type of the elements in the list are not needed. I could just as easily have coded the above as:
1
2
3
4
#include "list.h"
list_t numbers;
...
int x = *list_first( int, numbers );

However, I liked having the element type in the typename of the list. I like the explicit typing, and it makes it easier for the programmer to remember to avoid abuses like adding differently typed data to the same list.
1
2
3
list_t numbers;
list_append( numbers, 12 );
list_append( numbers, "Hello" );

Of course, without the typeof() operator, or true generics (which I'll mention below) it is impossible to guarantee against such behavior...



Also, I still need to typedef stuff, since you can't use anonymous structs in argument lists and the like (at least not without complaints).
1
2
3
4
5
6
void print_list( list_t (point_t) ls );
...
list_t (point_t) points = {0};
...
print_list( points );
test-list.c:39: warning: anonymous struct declared inside parameter list
test-list.c:39: warning: its scope is only this definition or declaration, which
 is probably not what you want
test-list.c: In function 'main':
test-list.c:63: error: incompatible type for argument 1 of 'print_list'
...

In this case, it probably is what I want, but I don't want my library code generating error messages, so I have to eschew using the generic type directly in lieu of the aforementioned typedef:
1
2
3
4
5
6
7
typedef list_t (point_t) point_list_t;
...
void print_list( point_list_t ls );
...
point_list_t points = {0};
...
print_list( points );

That's a bit disappointing. So ultimately, there doesn't seem to be much point in parameterizing over a type at all. Except, of course, for the same pretty factor that was dismissed earlier.



True generics can be achieved in C. For example:

generic_list.h
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
#ifndef DATATYPE 
#error You must #define DATATYPE before #including this file!
#endif

/* The structure definition */
typedef struct DATATYPE ## _node_tag
  {
  struct DATATYPE ## _node_tag* next;
  struct DATATYPE ## _node_tag* prev;
  DATATYPE                      data;
  }
  DATATYPE ## _node_t;

typedef struct
  {
  DATATYPE ## _node_t** first;
  DATATYPE ## _node_t** last;
  unsigned long         length;
  }
  DATATYPE ## _list_t;

/* Prototypes */
DATATYPE* DATATYPE ## _list_first( DATATYPE ## _list_t list );
unsigned long DATATYPE ## _list_append( DATATYPE ## _list_t list, DATATYPE data );
etc

#undef DATATYPE 
my_prog.c
1
2
3
4
5
6
7
8
9
10
11
12
13
#include "generic_list.h"

#define DATATYPE int
#include "generic_list.h"

typedef struct { int x, y; } point_t;
#define DATATYPE point_t
#include "generic_list.h"
...

int_list_t     primes;
point_t_list_t points;
...
This, of course, includes all the overhead of regular C++ templates, plus it requires some careful compilation... (And this is all simplified considerably, of course. If I go this route I'll make the #including and the like much less gory for the user which is likely just myself anyway...)


So here's the question.
What do you gurus think? Should I stick with my simple untyped-generic above? Should I get rid of the "pretty template jazz" which has no apparent point?

Or should I just give in and use true generics?

Remember, I'm using ANSI C89 here. Thanks for reading. :-@
How about putting the structure functions in a macro? Then you can avoid the error-prone #define DATATYPE and replace it with this:
1
2
3
4
5
6
7
8
#include "generic_list.h"

LIST_INSTANCE(int);

//...
LIST_T(int) list;
LIST_INIT(int,list);
LIST_PUSH_BACK(int,list,8);
Last edited on
I suppose I should have mentioned that one of the nice points about my original approach is that the compilable parts (that appear in a .c source file) need no special handling to compile and link with the project.

That way I avoid inlining non-trivial chunks of code or multiple compilations of the same source file.

The #define DATATYPE isn't so error prone when kept away from the user -- the new approach simply moves it to a macro-local namespace.

I don't want the bloat of an include-only option. The code to link and unlink a node, sort a list, and search for a node is not sufficiently small.


However, I do like the idea that the LIST_T() macro simply resolves to a previously typedefed type -- that thought had missed me. That would automatically solve the problems with anonymous structs in argument lists and type compatabilities. Thanks!
In case it wasn't clear, I wasn't saying the macro should handle the code for say, initializing. This is what I meant:
#define LIST_INIT(type,obj) type##_list_init(&list)

LIST_INSTANCE would actually define all functions used by the type:
1
2
3
4
5
6
7
8
9
10
11
12
13
#define LIST_INSTANCE(type)\
typedef struct{\
	type data;\
	type##_node *next;\
} type##_node;\
\
typedef struct{\
	type##_node *first;\
} type##_list;\
\
void type##_list_init(type##_list *list){\
	list->first=0;\
} 
Last edited on
Ah, that is pretty much what I have got for approach number one... but using the LIST_INIT() macro is a definite improvement so that LIST_T() can resolve to a previously typedefed type. Thanks!
OK, so now I have another stylistic question...

There are some operations to perform on the elements of the list, such as comparison and finalization. And there are two ways to set this up.

First, a quick base type and a simple constructor:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
typedef struct
  {
  int x;
  int y;
  }
  point_t;

point_t point( int x, int y )
  {
  point_t result;
  result.x = x;
  result.y = y;
  return result;
  }


One
The first way I think matches the C++ paradigm more...
1
2
3
4
5
6
/* Prototypes for extant functions to operate over elements of a list */
int compare_points( const point_t* left, const point_t* right );
void copy_point( const point_t* source, point_t* target );
void free_point( point_t* const element );

DECLARE_LIST_T( point_t, compare_points, copy_point, free_point );

That was a complete example. The functions can be specified as NULL for default behavior:
1
2
3
...
DECLARE_LIST_T( point_t, compare_points, NULL, NULL );
/* Performs a binary copy of a point_t using memcpy() and does nothing to finalize a point_t. */

After this, of course, the user can create and use a list of points easily:
1
2
3
4
5
6
LIST_T( point_t ) points = EMPTY_LIST_T( point_t );

list_append( point_t, points, point( 42,  7 ) );
list_append( point_t, points, point( -3, 19 ) );
list_append( point_t, points, point( 24,  2 ) );
list_stable_sort( point_t, points );

Advantages: very compact, easy to use, easy to read code.
Disadvantages: requires an extra line of code or two to change the (eg) comparison function. I don't consider this particularly eggregious because a sorted list must always be sorted with the same function. The only time it makes a difference is when searching for particular members, and the list_find()... functions allow you to specify a comparison predicate that differs from the one given when the list type was declared.

Two
The second type I think matches the C paradigm more...
1
2
3
4
5
6
7
8
DECLARE_LIST_T( point_t );

LIST_T( point_t ) points = EMPTY_LIST;

list_append( point_t, points, point( 42,  7 ) );
list_append( point_t, points, point( -3, 19 ) );
list_append( point_t, points, point( 24,  2 ) );
list_stable_sort( point_t, points, compare_points );

Advantages: lazier setup, more C-like, explicit function usages
Disadvantages: more typing for things that typically don't change between invocations


Please critique. What do you like better? What would you do?

Thanks again for your time.
Personally, I like the terseness of the second one's declaration better. It's easier on the eye than DECLARE_LIST_T( point_t, compare_points, copy_point, free_point );.

Why not combine the two and add to the first one a second list_stable_sort that also takes a function pointer?
Last edited on
That is the only real difference between the two.

I agree about the niceness of DECLARE_LIST_T( point_t );

The reason I was considering otherwise is simply that you only declare the list type once, instead of multiple times. Whereas functions that operate over the list, such as list_sort() and list_insert_sorted(), are called many times.

But again, the first form introduces a possible ambiguity: when the item is appended is it a binary copy of the argument or a constructed copy of the argument (using copy_point)? With the first type, there must be two functions: list_append() and list_append_copy(). With the second it is always just a binary copy.

But again, it is really easy for the user to say:
1
2
list_sort( point_t, points, &fn1 );
list_insert_sorted( point_t, points, &fn2, pt );


Foo.

Well, I guess I'll go with the second way. It is easier to code anyway, and not that much more typing to use. And user errors can be eliminated by taking away the sort function argument from the insert_sorted() function...

Thanks!
I've changed my mind. While not as pretty, the first way is better.

When I finish, I might play around with some GCC extensions (like typeof), but I don't know yet exactly how badly that would impact portability. (As the GCC is, after all, probably the most portable compiler that exists... After all, given the choice between compiling with Sun CC and GCC... But then again, if restricting to the GCC I might as well just use C++...)

Current code:
1
2
3
4
5
6
list_sort( point_t, points );
list_insert_sorted( point_t, points, point( -3, 5 ) );
...
point_t* p;
for (p = list_first( point_t, points ); p; p = list_next( point_t, p ))
  printf( "(%d,%d)\n", p->x, p->y );

What it would look like with typeof nonsense:
1
2
3
4
5
6
list_sort( points );
list_insert_sorted( points, point( -3, 5 ) );
...
point_t* p;
for (p = list_first( points ); p; p = list_next( p ))
  printf( "(%d,%d)\n", p->x, p->y );
etc.


When done, I'll post it for you all to see and critique (and use, if you like). Playing with the CPP has been interesting -- especially to be able to create a macro that does:
 
point_t* pt = list_find( point_t, points, point( 12, -7 ) );
without macro argument side-effects and returning value -- all using ANSI C (no GNU extensions). [Some recursive list-ing was required to make it all re-entrant...] :-)
I can honestly say I have no idea what this thread is about, slightly saddening...
heh... alol

I'm just making a templated, doubly-linked list type in C89. (None of that fancy C++ stuff, where I could just use std::list.)

Compare:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
#include <iomanip>
#include <iostream>
#include <list>
using namespace std;

struct point_t
  {
  int x, y;



  point_t( int x, int y ):
    x( x ),
    y( y )
    { }
  };





int main()
  {
  list <point_t>            points;
  list <point_t> ::iterator pt;

  points.push_back( point_t(  2,  3 ) );
  points.push_back( point_t( -7, 19 ) );
  points.push_back( point_t( 12, 44 ) );
  points.push_back( point_t(  8,  5 ) );

  for (pt  = points.begin();
       pt != points.end();
       pt++)
    cout << "(" << setw( 2 ) << pt->x
         << "," << setw( 2 ) << pt->y << ")\n";



  return 0;
  }
#include <stdio.h>

#include "generic-list.h"


typedef struct
  {
  int x, y;
  }
  point_t;

point_t point( int x, int y )
  {
  point_t result;
  result.x = x;
  result.y = y;
  return result;
  }

DECLARE_LIST( point_t );

int main()
  {
  list_t (point_t) points = EMPTY_LIST;
  point_t*         pt;

  list_append( point_t, points, point(  2,  3 ) );
  list_append( point_t, points, point( -7, 19 ) );
  list_append( point_t, points, point( 12, 44 ) );
  list_append( point_t, points, point(  8,  5 ) );

  for (pt = list_first( point_t, points );
       pt;
       pt = list_next( point_t, pt ))
    printf( "(%2d,%2d)\n", pt->x, pt->y );


  list_finalize( point_t, points );

  return 0;
  }

Slick, no?

:^)


[edit] BTW, the size of the files, source and compiled in release mode and "strip -s"ed:
1
2
3
4
5
6
7
10/18/2009  02:46 PM    <DIR>          .
10/18/2009  02:46 PM    <DIR>          ..
10/18/2009  02:32 PM               752 a.c
10/18/2009  02:46 PM             8,704 a.c.exe
10/18/2009  02:18 PM               590 a.cpp
10/18/2009  02:45 PM           450,048 a.cpp.exe
               4 File(s)        460,094 bytes

:^]
Last edited on
Wait a second. How does list_next() know what the next node is from a data pointer? Shouldn't it need a node pointer?
A basic node looks like this:
1
2
3
4
5
6
struct node_t( T )  /* not real code */
  {
  node_t( T )* next;
  node_t( T )* prev;
  T            data;
  };

Conversion between a node_t( T )* and a T* is easy.

:^>
Last edited on
Oh, you sneaky bastard.
AARRGGHH!.

I want to make it possible to do things like
 
list_append( int, numbers, 42 );

The back end code requires a pointer to the object to append... so what I typically do is something like:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
#define list_append( T, list, data ) \
  do {                               \
    T datum = data;                  \
    duthomhas_list_append(           \
      (any_list_t*)&(list),          \
      (void*)&datum,                 \
      sizeof( T ),                   \
      false                          \
      );                             \
  } while (0)

#define list_append_copy( T, list, data ) \
  do {                                    \
    T datum = data;                       \
    duthomhas_list_append_copy(           \
      (any_list_t*)&(list),               \
      (void*)&datum,                      \
      sizeof( T ),                        \
      true                                \
      );                                  \
  } while (0)

unsigned long duthomhas_list_append(
  any_list_t* list,
  void*       data,
  unsigned    size,
  bool        copy
  );

What this means is that the user can pass in anything that evaluates to the proper value type as the third argument to the macro.

However, with macros that return value I need to be sneakier. Remember, now, that I am eschewing all GCC-isms. Remember also that the macro argument must not be evaluated more than once.

The first trick is to convert the value to something I can take the address of. Then I push that value onto a special private stack, where it can be accessed during the actual backend function call. The result of the function call (a pointer to a node) is added to the stack. Finally, the stack is cleaned up, returning the resulting node pointer.

Yes, it is a dirty trick. The following should do it:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
/*----------------------------------------------------------------------------
 * DUTHOMHAS_VALUE_TO_POINTER()
 * duthomhas_push_value()
 * duthomhas_push_node()
 * duthomhas_last()
 * duthomhas_pop()
 *
 *   Used to return values from a complex macro expansion.
 *   There is no need to call these directly.
 *
 *   The DUTHOMHAS_VALUE_TO_POINTER() macro converts a value to a binary copy
 *   of that value in a volatile spot. (Hence the need to next create a copy
 *   on our private stack.)
 *
 *   See the implementation in "generic-list.c" for more documentation.
 */
#define DUTHOMHAS_VALUE_TO_POINTER( T, value )          \
  ( (void)((*((T*)duthomhas_val2ptr__result) = value)), \
    (void*)duthomhas_val2ptr__result                    )

extern unsigned long duthomhas_val2ptr__result[];

void* duthomhas_push_value( unsigned size, void* value );
void* duthomhas_push_node( void* value );
void* duthomhas_last_value();
void* duthomhas_pop_value_node();

My question is that lines 17 through 19 have been producing a perplexing warning message from GCC 4.3.0:
warning: value computed is not used

This is obnoxious for two reasons:

  1. It is standard C code to use sequenced operations.
  2. The value is used -- it plainly has a side effect.

I've been googling this for a while now, and other than 'GCC is weird' kind of stuff, I don't know exactly how to fix it. The left expression is cast to (void) -- which should shut the compiler up (according to GCC-related postings).

Am I just going about this the hard way? Would it be better to just make the user pass a pointer to the result, like:
1
2
int* iptr;
list_find( int, numbers, 42, &iptr );
instead of the desired:
1
2
int* iptr;
iptr = list_find( int, numbers, 42 );


Thanks again for reading. (It is late, so if I have been unclear, let me know.)
1. "Duthomhas"?
2. Is it really necessary to use void * as the pointer-to-data type in the interface? I thought DECLARE_LIST defined all functions with the correct data type.
3. I don't understand why you need DUTHOMHAS_VALUE_TO_POINTER. It would probably help if you showed how you're using it.
1. "Duoas" is short, and easily readable and pronounceable, version of Dúthomhas.

2. No, to get true generics would mean having to compile the cpp file a whole bunch of times. This library only emulates it for the user. The back-end functions all work on a pointer to the data, which could be anything. The front-end macros do the work of making types match.

3. You know, I was going to type-up a big explanation of all the tricky stuff I had to do to make this work... but it finally dawned upon me that I didn't surround the body of my macro with parentheses. That is, I had:
1
2
3
#define foo( x ) \
  barf( x ),     \
  quux( unbarf() ) 
instead of:
1
2
3
#define foo( x )     \
  ( barf( x ),       \
    quux( unbarf() ) ) 

The exact details are a bit more involved, but not much more complex than that. When I've finished with my test suite I'll post the library for you all to peruse and enjoy|gasp in horror.

Thank you for helping me. (I wouldn't have figured it out had I not been trying to explain it here.)
Topic archived. No new replies allowed.