Character Array Manipulation Issues

I'm creating a large library of functions to help C++ users with a bunch of basic operations. It's similar to boost, but more general purpose functions and is much more user friendly. So far I've got a ton of the libraries to work, however my character/string manipulation library is having issues... All of the functions which return the static char string are over-allocating the character array... That is, when I execute the function I receive the correct character array I want, for the first half, the second half is a bunch of garbage because it seems to have allocated more memory than it should have...

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
#define ACAML_UPPERBEGIN 65
#define ACAML_UPPEREND 90
#define ACAML_LOWERBEGIN 97
#define ACAML_LOWEREND 122
#define ACAML_CAPSDIFFERENCE (ACAML_LOWERBEGIN - ACAML_UPPERBEGIN)
static char *toupper(char *string1) {
	unsigned short l1 = strlen(string1);
	static char *string2 = new char[l1];
	memcpy(&string2[0],&string1[0],l1);
	for (unsigned short i = 0; i < l1; i++)
		if (string2[i] >= ACAML_LOWERBEGIN && string2[i] <= ACAML_LOWEREND) string2[i] -= ACAML_CAPSDIFFERENCE;
	return string2;
}
void main() {
	cout << toupper("Hello World!") << endl;
}


It copies the specified character array to a new one so it can manipulate it separately, then steps through it and changes the letter's case. Is there a better method to allocate the memory? Malloc perhaps...? I already checked that strlen does return the # of character expected.
Last edited on
Couldn't you just do:

 
std::transform(start, end, &mytolowerfunc);


Where mytolowerfunc is a function that makes a character lowercase (like Cs tolower())

Oh btw, why are you using void main()? Use int main().
Last edited on
That function was only an example, I have like 30 functions total, nearly all of which have no built in c++ macro or std function that does the processes for me. The transform function from what i see executes another function on each iteration of the supplied array/vector/list/stack/etc, which means it is called outside the function (Or I have 2 functions), which is NOT user friendly (I want it to be all-in-one). And since it seems to use iterators and the like and does the operation on each iterator, it can't possibly be more efficient than having my function do the operations in a single lower-level loop as opposed to switching between the multiple memory locations (2 arrays+transform <-> tolower) for each iteration, when you get down to the levels of CPU cache. Does transform even work on a standard array, or just a STL class with their iterator? Tell me if I'm thinking of this wrong.

Void and int main makes absolutely no difference, seeing as I am not having the program return a value. All int does is make me need to return a value, which is one line of unneeded code.

My question was why is the function I provided over-allocating memory.
Last edited on
Void and int main makes absolutely no difference, seeing as I am not having the program return a value. All int does is make me need to return a value, which is one line of unneeded code.

-_-
You do not copy an EOS character, therefore the output string seems to be longer. You should do something like:

1
2
static char *string2 = new char[l1 + 1];
memcpy(&string2[0], &string1[0], l1 + 1);


Also, 'static' is not needed here:

static char *string2

nor there (how can an interface be local?):

static char *toupper(char *string1) {

Also, caller of this function must remember to deallocate memory. I would not call such design "user friendly".
The function and string2 were static because I was using it in threading, and it needed static to be able to receive the returning value. Also I had the function static because I had it in a .hpp file, which was included in a few .cpp files so the linker would complain because the function was already defined (Once in the main thread, once in each individual thread spawned thereafter I believe), static made it only defined once. Ok, thanks for reminding me I need the null 0. And about it returning a pointer, it's still fairly user friendly. Sure you need to remember to delete the pointer when you're done, but at least you don't need to do all the work. You're sending the function character pointers anyways, which means chances are you're already using pointers. Also, using the example of sprintf, it makes you allocate a character string buffer pointer beforehand, then pass it as an argument which it will populate. My function is more or less skipping one step by allocating the array for you, and returns it instead of requiring it as an argument. And in addition, I have a memory management library included in the set, so users can easily tell if they have memory leaks. Making a string v2 class would work if i wanted to add all the manipulation functions to it, but it's not as low-level and probably less efficient. If any of that made sense.
Last edited on
I think that a string class with "reference counting / copy on write" mechanism implemented would be as efficient, and more user friendly, but that's just my opinion. Anyway, good luck writing your library.
It's similar to boost, but more general purpose functions and is much more user friendly.


IMHO, no, and no. It is not more general purpose than boost. It is not more user-friendly either; it may
be that it is easier to understand for programmers who are not comfortable with templates.

If toupper() is to return a new dynamically allocated string, then you should pass string1 as a
const char*, not a char. But I hardly consider it user-friendly to require the user to free the
memory allocated by toupper().


And since it seems to use iterators and the like and does the operation on each iterator, it can't possibly be more efficient than having my function do the operations in a single lower-level loop


This isn't necessarily true. Your algorithm is far from optimal because:
1) You use new, which requires memory to be allocated. This is a linear walk of the heap to find a free block
large enough to suffice the request.
2) You walk the input string linearly once to do strlen().
3) You walk it a second time to do the memcpy().
4) You walk it essentially a third time (albeit the destination string, not the source, but still it walks the same
number of bytes) to perform the upper-case conversion.

Furthermore, using std::transform() can actually be faster thanks to inlining. If the compiler can see the
body of the function when it is compiling the call to std::transform(), the compiler can inline the body of
the function directly inside the std::transform() instantiation, which itself could be inlined in your
function.


Does transform even work on a standard array...?


Yes, it does. It works on any input/output iterator type. Pointers match that criterion.


Void and int main makes absolutely no difference, seeing as I am not having the program return a value. All int does is make me need to return a value, which is one line of unneeded code.


It makes a difference in that void main() is not C++ standards compliant whereas int main() is. And, thanks
to all the C programmers out there who wrote int main() and then never bothered to write a return statement
in the body of the function, it is legal and compliant to omit the return statement from main(), even if it returns
int. So no, it does not force you to write an extra line of code.


Is there a better method to allocate the memory? Malloc perhaps...?


Well, going your route, char* strdup( const char* src ) copies the src string to a newly allocated buffer which is exactly large enough to hold the source string. You could use that in lieu of your own new/memcpy. However
by your standards, more efficient would be to new/malloc the memory, then walk the source string and populate the buffer with the correct characters eg,

1
2
3
char* buf = new char[ strlen( src ) + 1 ];
for( size_t i = 0; i < strlen( str ) + 1; ++i )  // + 1 causes us to process the \0 also
    buf[ i ] = str[ i ] >= 65 && str[ i ] <= 90 ? str[ i ] + 32 : str[ i ];



You're sending the function character pointers anyways, which means chances are you're already using pointers.


Ok, sure, but not every pointer is to heap memory. It is NOT user friendly, no matter how you slice it, to
force the user to remember to free memory your function allocated, regardless of how detailed your
memory manager is.


Making a string v2 class would work ... but it's not as low-level and probably less efficient.


Probably less efficient? Hardly. You won't see any difference in execution time between:

1
2
3
4
class my_string {
  public:
      my_string toupper() const;
};


and

 
char* toupper( const char* );








Very well put, jsmith.
I want to correct one of my examples above because as it is written, it is definitely less efficient than OP's
solution (and it also has a typo):

1
2
3
4
size_t length = strlen( str ) + 1; // strlen( str ) does not include \0, but we need one
char* buf = new char[ length ];
for( size_t i = 0; i < length; ++i )  
    buf[ i ] = str[ i ] >= 65 && str[ i ] <= 90 ? str[ i ] + 32 : str[ i ];


Previous code would have walked the source string strlen( str ) + 3 times. Corrected
example walks it twice.
Topic archived. No new replies allowed.