Converting String to Numerical Value

Hi all,

This code snippet below

// skip previous process
for (unsigned i = 0; i < DNAStrings.size(); ++i) {
    // Do something with DNAStrings[i]
    cout << DNAStrings[i] << endl;
}


prints exactly this output:

AAA
TTT
CCC
GCC


My question is: how can I write a function
that takes DNAStrings[i] and creates an array
for each line, where each position holds the
numerical code of the corresponding character: A=0, C=1, T=2, G=3?
This would yield:

[0,] 0 0 0
[1,] 2 2 2
[2,] 1 1 1
[3,] 3 1 1

Simply create an array of the same size as the original (if the size of the array can vary, you'll need to use dynamic memory allocation). Then just use an if-else, or switch statement, to operate on each individual character. Here's an example:

    for (unsigned i=0; i<DNAStrings.size(); i++) {
        cout << DNAStrings[i] << endl;
    }
    cout << endl;
    
    //Allocate memory for a new 2-d array
    unsigned int** newArray = new unsigned int*[DNAStrings.size()];
    for (unsigned i=0; i<DNAStrings.size(); i++) newArray[i] = new unsigned int[DNAStrings[i].size()];

    /* Note: if this array is a fixed size every time (like 4x3 in this case), you can just simply do this:

       unsigned int newArray[4][3];

    */

    //Now fill the array with associated values
    for (unsigned i=0; i<DNAStrings.size(); i++) {
        for (unsigned j=0; j<DNAStrings[i].size(); j++) {
         
         /*This also works:
         if      (DNAStrings[i][j] == 'A') newArray[i][j] = 0;
         else if (DNAStrings[i][j] == 'C') newArray[i][j] = 1;
         else if (DNAStrings[i][j] == 'T') newArray[i][j] = 2;
         else                              newArray[i][j] = 3;
         */

            switch(DNAStrings[i][j]) {
         
               case 'A':
                   newArray[i][j] = 0;
               break;
  
               case 'C':
                   newArray[i][j] = 1;
               break;

               case 'T':
                   newArray[i][j] = 2;
               break;

               case 'G':
               default:   //Any other character is treated like 'G', matching the if-else version above
                   newArray[i][j] = 3;
               break;
               
            }
         }   
    }
    
    for (unsigned i=0; i<DNAStrings.size(); i++) {
        //Use each row's own length rather than a hard-coded 3
        for (unsigned j=0; j<DNAStrings[i].size(); j++) cout << newArray[i][j];
        cout << endl;
    }


And the resulting output is:

AAA
TTT
CCC
GCC

000
222
111
311



And then of course, if you are using dynamic memory, don't forget to use delete[] to free the memory from each array when you're done using it, as follows:

for (unsigned i=0; i<DNAStrings.size(); i++) {
   delete [] newArray[i];
   newArray[i] = 0;
}

delete [] newArray;
newArray = 0;
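
If you'd rather not manage the memory yourself, here is a rough sketch of the same conversion wrapped in functions that return std::vector, so no delete[] is needed. The names baseToCode, encodeDNA and encodeAll are made up for illustration, returning -1 for an unrecognized character is my own assumption, and it assumes a C++11 compiler:

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): maps one base to its code.
// Returns -1 for any character other than A, C, T or G (an assumption).
int baseToCode(char base) {
    switch (base) {
        case 'A': return 0;
        case 'C': return 1;
        case 'T': return 2;
        case 'G': return 3;
        default:  return -1;
    }
}

// Hypothetical helper: encodes one DNA string as a row of integer codes.
std::vector<int> encodeDNA(const std::string& dna) {
    std::vector<int> codes;
    codes.reserve(dna.size());
    for (char base : dna) codes.push_back(baseToCode(base));
    return codes;
}

// Encodes every string in the input; rows may have different lengths.
std::vector<std::vector<int>> encodeAll(const std::vector<std::string>& dnaStrings) {
    std::vector<std::vector<int>> result;
    result.reserve(dnaStrings.size());
    for (const std::string& dna : dnaStrings) result.push_back(encodeDNA(dna));
    return result;
}
```

Calling encodeAll(DNAStrings) should then give you one row of codes per string, and the vectors release their memory automatically when they go out of scope.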


If all of that was too little or too much, or if you're still having trouble, let me know.
It's no array, but this might help:
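#include <map>
//...
map<char, int> lookup;
lookup['A'] = 0;
lookup['C'] = 1;
lookup['T'] = 2;
lookup['G'] = 3;
//...
for( unsigned i = 0; i < DNAStrings.size(); ++i )
{
    // DNAStrings[i] is a whole string, so look up one character at a time
    for( unsigned j = 0; j < DNAStrings[i].size(); ++j )
        cout << lookup[DNAStrings[i][j]];
    cout << endl;
}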
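
One thing to watch with the map approach: operator[] quietly inserts a new entry with value 0 for any key it has not seen, so an unexpected character would look like an 'A'. Below is a small sketch that uses find() instead. The function names makeLookup and encodeWithMap are hypothetical, and returning -1 for unknown characters is just one possible convention:

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: builds the same A/C/T/G table as above.
std::map<char, int> makeLookup() {
    std::map<char, int> lookup;
    lookup['A'] = 0;
    lookup['C'] = 1;
    lookup['T'] = 2;
    lookup['G'] = 3;
    return lookup;
}

// Hypothetical helper: looks up each base with find(), so unknown
// characters become -1 and the map itself is never modified.
std::vector<int> encodeWithMap(const std::map<char, int>& lookup, const std::string& dna) {
    std::vector<int> codes;
    for (std::string::size_type j = 0; j < dna.size(); ++j) {
        std::map<char, int>::const_iterator it = lookup.find(dna[j]);
        codes.push_back(it != lookup.end() ? it->second : -1);
    }
    return codes;
}
```

Because the map is passed by const reference, the compiler will also reject any accidental use of operator[] inside the function.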