I am looking to optimize the speed of the following code as much as possible.
I come from the Java world, so this isn't really my kind of thing...
Any help or tips would be appreciated.
I would prefer to use STL strings as opposed to char*, but if there is a big performance difference I'm open to char*.
That's a dangerous way to do it. The reference need not be to the internal data.
The least dangerous way to do that dangerous dangerousness is to use data() directly, as all known implementations simply return the string's internal buffer. (You'll have to const_cast<CharT*> it, though.)
But the standard doesn't guarantee that either data() or c_str() returns anything you can manipulate, so it is better just to use assign() or append().
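To illustrate the append() route, here is a minimal sketch. The function name join_pieces and its vector-of-strings input are made up for the example, since the original code isn't shown here. It reserves the total length once and then appends each piece, so nothing is ever written through data() or c_str().

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the original code): builds one string
// from several pieces using only the public std::wstring interface.
std::wstring join_pieces(const std::vector<std::wstring>& pieces) {
    // Add up the lengths first so the result is allocated only once.
    std::size_t total = 0;
    for (const std::wstring& p : pieces) {
        total += p.size();
    }

    std::wstring out;
    out.reserve(total);

    // append() copies onto the end; the string tracks its own length,
    // so nothing has to re-scan what has already been written.
    for (const std::wstring& p : pieces) {
        out.append(p);
    }
    return out;
}
```

Calling join_pieces on the pieces L"ab", L"cd" and L"e" should give L"abcde". The reserve() call is optional, but it avoids repeated reallocations when the total size is known up front.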
You're still calling wcslen in the loop, and wcsncat_s twice in the loop. Each of these walks the string from the start every time, and the string gets longer with each iteration, giving you an O(n^2) algorithm.
Maintain the end pointer yourself and copy the characters yourself.
You need to allocate memory dynamically for the buffer and grow it when you reach the end, or start with a large-ish fixed buffer and drop out of the loop when it's full. This will yield a linear algorithm, with running time directly proportional to the total number of characters you need to process.
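Roughly, the idea looks like the sketch below. The name concat_linear, the input type and the starting capacity of 16 are assumptions made for illustration, not taken from your code. It keeps its own end index, copies characters one at a time, and doubles the buffer when it fills up, so each character is touched a constant number of times on average.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the original code): concatenates
// NUL-terminated wide strings without ever re-scanning the output.
std::wstring concat_linear(const std::vector<const wchar_t*>& parts) {
    std::vector<wchar_t> buf(16);  // arbitrary starting capacity
    std::size_t end = 0;           // index one past the last character written

    for (const wchar_t* src : parts) {
        // Copy characters ourselves instead of calling wcslen/wcsncat_s.
        for (; *src != L'\0'; ++src) {
            if (end == buf.size()) {
                // Buffer is full: double it so the total copying work
                // stays proportional to the number of characters.
                buf.resize(buf.size() * 2);
            }
            buf[end++] = *src;
        }
    }
    return std::wstring(buf.data(), end);
}
```

If you go with the fixed-buffer variant instead, replace the resize with a break out of both loops once end reaches the buffer size.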