May 24, 2012 at 3:40am UTC
Count the number of occurrences of sequences of N (acquired through user input) or more consecutive 't' characters in a string consisting of the characters a, c, g, and t, and report this as a whole number. For example, actttaattttactttcctta has 3 poly-t sequences of length 3 or more, and only one of length 4 or more.
Last edited on May 24, 2012 at 3:56am UTC
May 24, 2012 at 10:43am UTC
You can use the standard algorithm std::search_n. Consider your example:
// Needs <algorithm>, <cstring>, <iostream>, <iterator>
const char a[] = "actttaattttactttcctta";
const char c = 't';
int count = 0;
auto pos = a;
auto end_pos = a + std::strlen( a );
int size;
std::cout << "Enter size of the subsequence: ";
std::cin >> size;
do
{
    auto start_pos = pos;
    // Find the next run of `size` consecutive occurrences of c
    pos = std::search_n( start_pos, end_pos, size, c );
    if ( pos != end_pos )
    {
        count++;
        std::advance( pos, size );  // step past the matched characters
    }
} while ( pos != end_pos );
std::cout << "count = " << count << std::endl;
After making some changes, I hope the code now works.
The code can be simplified by removing the variable pos:
const char a[] = "actttaattttactttcctta";
const char c = 't';
int count = 0;
auto start_pos = a;
auto end_pos = a + std::strlen( a );
int size;
std::cout << "Enter size of the subsequence: ";
std::cin >> size;
do
{
    start_pos = std::search_n( start_pos, end_pos, size, c );
    if ( start_pos != end_pos )
    {
        count++;
        std::advance( start_pos, size );
    }
} while ( start_pos != end_pos );
std::cout << "count = " << count << std::endl;
Last edited on May 24, 2012 at 11:02am UTC
May 24, 2012 at 11:52am UTC
@Vins3Xtreme
I'd like to note that in my opinion the statement
pos += n+1;
is incorrect. It should be
pos += n;
Last edited on May 24, 2012 at 11:56am UTC
May 24, 2012 at 12:52pm UTC
Nice. Looks like a perfect case for a Scala one-liner:
"t+".r.findAllIn("actttaattttactttcctta").filter(_.length >= 3).size
Or even shorter:
"t{3,}".r.findAllIn("actttaattttactttcctta").size
Last edited on May 24, 2012 at 3:23pm UTC
May 24, 2012 at 1:07pm UTC
It seems that I suggested an incorrect method! :)
Consider a string that contains six 't' characters:
std::string s = "tttttt";
Suppose we count the occurrences of sequences of three 't'. In fact the string contains a single run of six 't', which should count as one sequence of three or more, not as two separate sequences of three. My code, however, returns 2: after the first match it advances by exactly three characters and then finds the remaining three 't' as a second match.
So some other method is required.
May 24, 2012 at 1:09pm UTC
It's not all that much longer in C++
string in = "actttaattttactttcctta";
regex re("t{3,}");  // named object: sregex_iterator rejects a temporary regex
cout << distance(sregex_iterator(in.begin(), in.end(), re),
                 sregex_iterator()) << '\n';
Last edited on May 24, 2012 at 1:24pm UTC