Given a string "teeter", the first non repeating character would be 'r'.
in "toothless", it would be 'h'.
I'm wondering about the most efficient way to get this done? One option is to use a hash table, with the characters in the string as keys, and frequencies of each character (key) as values. Then return the first key with the value "1" (meaning it only appears once).
This seems to be a good way of doing it, but how exactly would I implement a hash table or array in which the indexes are chars?
What would be the big-O complexity of this algorithm?
1 - How many times do you need to traverse the string to determine if a letter is repeated?
2 - For practical purposes, stick to the ASCII set (and a simple array). Time complexity won't significantly increase with a hash table, but coding complexity does.
No it's actually from the book "Programming Interviews Exposed." I'm preparing for an interview, and don't exactly get the explanation the book gave for this problem...
The book implemented the proposed solution in Java, using the HashTable class. I was wondering how to to this in C++.
I don't know how to use the ASCII set (I just know that it is a set of integers representing each character), or how to use that as the INDEX of the array.
The book also mentions that an array is good for ASCII strings that are large, but for Unicode strings it would be inefficient, and that Hash Tables would be great for small strings of any character set...
How would I even make an array where I can use the chars as indexes for look up?
I think the trick is to write a program that starts from the center character and moves outwards, therefore 'r' would be the first non repeating character in teeter, and 'h' would be the first non repeating character in toothless, 'l' being the 2nd, 'e' being the 3rd. a single pass should do it
But how would you know if a character is repeating? More importantly, what about a case such as "151823440"? If I understood properly, your algorithm would return '2' as the first nonrepeating char, but the correct answer is '5'.
But how would you know if a character is repeating?
You are going to have to count them. Let's take an "algorithms class" approach to the problem:
Let's start with a simple algorithm:
for each character,
traverse the string counting its number of occurrences
if it was not repeated, break, otherwise repeat
This is an algorithm, with quadratic efficiency or O(n2), similar to a bubble sort where there would be two nested for loops. It can solve the problem but is not the most efficient.
A better algorithm is to make a single pass counting characters and storing their occurrences in some kind of container, such as a hash table, in one for loop and then extracting the data from the container. As Duoas said, this would be linear efficiency or O(n).
It's often useful to think about an algorithm in simple terms and then optimizing it to improve its efficiency.
I was looking for code to practice today and remembered this topic, I wrote the following code, and even though it has goto statements all over the place, it works. I could have used functions but I was just messing around with routing possibilities. It has a single pass through the string that will first check the characters already disqualified, then check against the characters not yet tested for non repeating status. Once a non repeating character is encountered, the program outputs the result and exits, it will preserve the case of the character, and it will also output the appropriate message if there are no non repeating characters present in the string. Its kinda neat. Anyhow, here it is;
#include <algorithm>
#include <cctype>
#include <functional>
#include <iostream>
#include <iterator>
#include <map>
#include <string>
usingnamespace std;
struct count_t: public map <char, int>
{
// This container associates each insert()ed character with
// the number of times it has been inserted.
//
// You can test whether or not a character is listed in this container
// with an insert count of one using the match1() method.
//
typedef map <char, int> ::iterator iterator;
void insert( char c ) { (operator [] ( c )) += 1; }
bool match1( char c ) { return count( c ) ? ((operator [] ( c )) == 1) : false; }
};
int main()
{
count_t counts;
string s, t;
cout << "This program finds the first alphanumeric character that has no repeat.\n""Please enter the text to evaluate:\n""> "
<< flush;
getline( cin, s );
// We are only interested in alphanumeric characters, so we'll populate our
// counts using a copy of the user's input string where all unwanted
// characters have been removed.
remove_copy_if(
s.begin(),
s.end(),
back_inserter( t ),
not1( ptr_fun <int, int> ( isalnum ) )
);
// Count the occurrances of each char in the modified string
for_each( t.begin(), t.end(), bind1st( mem_fun( &count_t::insert ), &counts ) );
// Now find the first character in the original string whose count == 1
string::iterator found = find_if( s.begin(), s.end(), bind1st( mem_fun( &count_t::match1 ), &counts ) );
if (found == s.end())
{
cout << "Your text does not contain any non-repeating alphanumeric characters.\n";
}
else
{
cout << string( distance( s.begin(), found ) + 2, ' ' ) << "^here\n"
<< "The first non-repeating character is '" << *found << "'.\n";
}
return 0;
}
This program finds the first alphanumeric character that has no repeat.
Please enter the text to evaluate:
> Happy, Happy, Joy, Joy, We're not happy enough!
^here
The first non-repeating character is 'W'.
A few notes about the algorithm. It is not possible to solve this problem without some form of lookback. What that means is that the input string cannot be traversed only once. At worst, it must be traversed at least twice. Thats O(n+n) == O(2n) == O(n).
My program above traverses the string three times, because of the extra step on lines 37-42 to remove all non-alphanumeric characters. This could have been combined into the next step (line 45) by adding the alphanumeric condition into the insert method (line 19), but that is less general. Either way would have been fine. It is just that this way allows more convenient control over which characters are ignored and which are not -- it keeps the two considerations (which characters to ignore and counting given characters) separate.
My program also makes full use of the STL to do stuff for me, instead of doing it myself, so it is much easier to read and maintain. The purpose of a high-level language is to enable programmers to think high-level thoughts instead of pretending to be an assembler. Two months from now I (or anyone else) can still come back and instantly understand what it is that this code does.
My generous use of whitespace and commentary makes it look longer, but it is actually shorter code also.
cool, thanks for the code, I like that you are only interested in alphanumeric characters, I didnt think of that. It was a fun project in any event Duaos
this is madness!! http://www.youtube.com/watch?v=AZPRf8qL8h0
also I appreciate and understand the concept and importance of object oriented programming, polymorphism, encapsulation, & inheritence. I am just beginning to learn about classes now, so I'm still early on, so even though my code is often way out there, it helps me get more familiar with the syntax and flow of the simpler commands and arguements. And also I totally agree with your assesment that the goal is high level language needs high level thoughts, so thank you for that reminder i need a kick in the butt from time to time
:)
You could store, in addition to the count, the offset in the string of the first occurrence of the character. That would
eliminate the extra traversal, but then you'd need a boost::multi_index_container so you avoid a linear search to
find the "left-most" unique character.
Just a thought...
And since I particularly like celebrating boost when it comes to providing homework solutions, you could also
eliminate some of the mem_fun stuff by using boost::lambda :)
Hmm, that's an interesting thought. (Which I had not thunk.)
Hey, BettyBoopTS, don't take it too hard. I'm not trying to beat-up on you. I get a good kick in the pants often enough myself.
For example, I am currently working on my wife's business webpage. I did something beautiful, using pure CSS and HTML (no stinkin' JavaScript). Then I began testing it in major browsers. Guess what?
What the heck!? I still don't know how to make IE stop goobering my images. So now I'm rewriting using stupid image maps and js for IE. "Internet Standards" indeed. Loopy bunch of nonsense... @#$()*!@&#(*@&%)!!&^%&#(@!
What I'm trying to decide right now is whether it is worthwhile to maintain two separate versions of the home page (which automatically switch to the JS version for IE users), or whether I should give up the beautiful CSS and just use the JS version for everyone. (The vast majority of her clientel will be likely be using IE with JavaScript in its default-enabled state.)
Too bad I'm not an HTML/CSS guru -- I barely know what I am doing to begin with and random nonsense like this just ticks me off. I mean, cummon, it isn't like my code isn't valid CSS3 and strict HTML DTD 4.01 or anything... That ought to count for something...
I don't take it hard but i am always willing to take advice : ) .
If this doesnt work out I can always open a restaurant/ bar called "The Last Byte" with user stations built into the table at every seat, and all along the bar,,serve drinks like "The infinite loop" that could be much likea long island ice tea",the "Semantic sugar"(gotta check if Helios has a copywrite on that statement first though)which would be like a Mint Julep,, the "srand', where the bartender adds random liquors and ingredients to the drink, stuff like that. Then of course the food,,, starting of course with the signature "The Last Byte ",, a 16 ounce steak with the fixings; "Spaghetti Logic ", spaghetti of course,'' order of french fries would be called 'an array of 'integers', and onion rings would be 'string of characters', a kiddie meal could be called a 'newbie', and many others!.
hmmmm,, probably have to move to Cambridge or the Silicon Valley to have it truly be succesful.
I read the other posts...
Here is a simple code but given the purpose exposed by "stingblah" in his posts, i think it woud do or at least give u a hint, I hope.... :)
You would add some function to compare if lower or capital letters maybe....
Also if someone knows a better way to get out of the inner loop without the goto-label it'd be highly appreciated.... Cottections are welcomed!