So I'm working on a series of string classes that work with UTF-encoded strings.
I'm now working on the substring functions and am a little torn about the right way to do them.
My first thought was to have the following functions:
```cpp
utfstr s = "01234567";
cout << s.Left(3);  // 012
cout << s.Mid(3);   // 34567
cout << s.Mid(3,3); // 345
cout << s.Right(3); // 567
cout << s - 3;      // 01234
```
I was satisfied with this approach until I realized that these functions use indeces, and indeces can't really be used anywhere else with these strings (UTF encodings are variable-width: a single code point can take several code units, so the strings aren't random-access friendly).
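To make the random-access problem concrete: in UTF-8 a code point takes 1 to 4 bytes (UTF-16 has the same issue with surrogate pairs), so a byte offset is not a character index. This is a minimal sketch assuming the string is stored as UTF-8 in a `std::string`; the helper names are hypothetical, not part of any library or of `utfstr`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Sketch: why a byte index is not a character index in UTF-8.
// Each code point is encoded as 1 to 4 bytes; continuation bytes have the
// bit pattern 10xxxxxx. Valid UTF-8 input is assumed (nothing is validated).
bool is_continuation_byte(char c) {
    return (static_cast<unsigned char>(c) & 0xC0) == 0x80;
}

// Counting code points means visiting every byte: O(n), not O(1).
std::size_t code_point_count(const std::string& s) {
    std::size_t n = 0;
    for (char c : s)
        if (!is_continuation_byte(c)) ++n;  // every code point has exactly one lead byte
    return n;
}

// True if byte offset i starts a code point (or is at/past the end).
bool is_boundary(const std::string& s, std::size_t i) {
    return i >= s.size() || !is_continuation_byte(s[i]);
}
```

For example, "€10" is 5 bytes but 3 code points, and byte offset 1 lands in the middle of the euro sign, so slicing there would produce invalid UTF-8.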
So pretty much, working with these strings means you'll have to use iterators. Find functions and the like will all be returning iterators instead of indeces, so it makes more sense to have these substring functions take iterators, right?
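If find-style functions return iterators, the substring functions can take those iterators directly. Here is a minimal sketch, assuming UTF-8 storage in a `std::string` and a forward-only iterator; `utf8_iterator`, `find_cp`, `Left` and `Mid` are hypothetical names, not the post's actual `utfstr` API.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch: a forward code-point iterator over a UTF-8 std::string.
// It exposes the underlying byte position via base(). Valid UTF-8 is assumed.
class utf8_iterator {
public:
    using byte_iter = std::string::const_iterator;

    utf8_iterator(byte_iter pos, byte_iter end) : pos_(pos), end_(end) {}

    // Advance past one whole code point (lead byte + continuation bytes).
    utf8_iterator& operator++() {
        if (pos_ != end_) ++pos_;
        while (pos_ != end_ && (static_cast<unsigned char>(*pos_) & 0xC0) == 0x80) ++pos_;
        return *this;
    }

    byte_iter base() const { return pos_; }
    bool operator==(const utf8_iterator& o) const { return pos_ == o.pos_; }
    bool operator!=(const utf8_iterator& o) const { return pos_ != o.pos_; }

private:
    byte_iter pos_, end_;
};

utf8_iterator begin_cp(const std::string& s) { return {s.begin(), s.end()}; }
utf8_iterator end_cp(const std::string& s) { return {s.end(), s.end()}; }

// Find returns an iterator, not an index. A byte-wise search is safe because
// UTF-8 is self-synchronizing: a valid needle can only match at a code point
// boundary of a valid haystack.
utf8_iterator find_cp(const std::string& s, const std::string& needle) {
    auto pos = s.find(needle);
    return pos == std::string::npos ? end_cp(s)
                                    : utf8_iterator(s.begin() + pos, s.end());
}

// Substring functions that accept the iterator returned by find_cp.
std::string Left(const std::string& s, utf8_iterator it) { return std::string(s.begin(), it.base()); }
std::string Mid(const std::string& s, utf8_iterator it) { return std::string(it.base(), s.end()); }
```

With this shape, `Left(s, find_cp(s, "3"))` on "01234567" yields "012" without the caller ever handling an index.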
But this brought up some other issues:
```cpp
utfstr s = "01234567";
utfstr::iterator i = s.begin() + 3;
cout << s.Left(i);    // 012 .. pretty straightforward
cout << s.Mid(i);     // 34567 .. same
// this one is the problem
cout << s.Mid(i,3);   // 345 .. could work. uses an integral count... is that okay?
cout << s.Mid(i,i+3); // 345 .. would this make more sense?
// the problem with this, though, is that now instead of the 2nd
// param being the 'length' as with the previous function,
// it's the 'end', which is inconsistent.
// these other two are a little perplexing as well
cout << s.Right(i);   // ??? this is nonsensical. Should I just omit this?
cout << s - i;        // ??? same
```
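One way to settle the `Mid(i,3)` vs `Mid(i,i+3)` question is to borrow the standard library's convention: a pair of iterators always means a half-open range [first, last) (as in `std::string`'s iterator-pair constructor), while a count parameter always means a length. Because the second argument's type differs, both overloads can coexist without ambiguity, and `Right` and `operator-` take a count instead of an iterator. The class below is a hypothetical sketch of that idea, assuming UTF-8 stored in a `std::string`; it is not the post's `utfstr`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for utfstr: a UTF-8 std::string wrapper whose iterator
// is the raw byte iterator, assumed to always sit on a code point boundary.
// Valid UTF-8 is assumed; nothing is validated.
class utf8str {
public:
    using iterator = std::string::const_iterator;

    explicit utf8str(std::string s) : s_(std::move(s)) {}
    iterator begin() const { return s_.begin(); }
    iterator end() const { return s_.end(); }

    // Move forward n code points, stopping at end().
    iterator advance(iterator it, std::size_t n) const {
        for (; n > 0 && it != end(); --n) {
            ++it;
            while (it != end() && is_cont(*it)) ++it;
        }
        return it;
    }

    // Two iterators: half-open range [first, last), like std::string(first, last).
    std::string Mid(iterator first, iterator last) const { return std::string(first, last); }

    // Iterator + count: `count` code points starting at first. The second
    // parameter's type differs, so the two overloads can't be mixed up.
    std::string Mid(iterator first, std::size_t count) const {
        return std::string(first, advance(first, count));
    }

    // With a count, Right and operator- are well defined again.
    std::string Right(std::size_t count) const { return std::string(back(count), end()); }
    std::string operator-(std::size_t count) const { return std::string(begin(), back(count)); }

private:
    static bool is_cont(char c) { return (static_cast<unsigned char>(c) & 0xC0) == 0x80; }

    // Start of the count-th code point from the end (begin() if fewer exist).
    iterator back(std::size_t count) const {
        iterator it = end();
        for (; count > 0 && it != begin(); --count) {
            --it;
            while (it != begin() && is_cont(*it)) --it;
        }
        return it;
    }

    std::string s_;
};
```

In this sketch `Right(i)` and `s - i` simply don't exist; a caller holding an iterator can write `Mid(i, end())` or `Mid(begin(), i)` instead.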
Here's the other thing: I don't want to completely omit the index versions, since they still have practical uses (checking whether the first few characters of a string match something specific, for example).
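Index-based overloads can live alongside the iterator ones; they just cost a linear walk from the start of the string on every call. A minimal sketch, assuming UTF-8 in a `std::string` and indices that count code points rather than bytes; `cp_to_byte`, `Left` and `Mid` here are hypothetical free functions:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch: index/count-based substring helpers for UTF-8.
// Converts a code point count into a byte offset by walking from byte `from`.
// This is O(n) per call; valid UTF-8 is assumed.
std::size_t cp_to_byte(const std::string& s, std::size_t cp, std::size_t from = 0) {
    std::size_t pos = from;
    while (cp > 0 && pos < s.size()) {
        ++pos;  // step past the lead byte
        while (pos < s.size() && (static_cast<unsigned char>(s[pos]) & 0xC0) == 0x80)
            ++pos;  // and past any continuation bytes
        --cp;
    }
    return pos;  // clamped to s.size()
}

// First n code points.
std::string Left(const std::string& s, std::size_t n) {
    return s.substr(0, cp_to_byte(s, n));
}

// `count` code points starting at code point index `start`.
std::string Mid(const std::string& s, std::size_t start, std::size_t count) {
    std::size_t b = cp_to_byte(s, start);
    std::size_t e = cp_to_byte(s, count, b);
    return s.substr(b, e - b);
}
```

For a pure prefix check, a byte comparison such as `s.compare(0, prefix.size(), prefix) == 0` also works on UTF-8 without any code point walk, as long as both strings are valid UTF-8.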
What do you guys think? Which way should I go with this?
EDIT:
Also, before you recommend going with substr instead of Mid for the above function: I'll probably have substr in addition to the above functions (it'll probably just call Mid()).
EDIT2:
And apparently I've been spelling "indexes" wrong for years now. Ignore that, please. ^^ I could've sworn it turned into a c when you made it plural.