How can I learn to isolate particular issues in my software?

Hello. My name is Enrique. I am a novice programmer and I am having issues with string handling in my source code.

I am making a crude translation engine that just replaces words from one language with words from another, and I have run into one particular failure.

First, I want to know whether there are ways to isolate issues. I don't feel comfortable releasing my source code on the internet; I just want a way to isolate the problem and share only the particular code that is giving me trouble.

Do you know any way to do so? It is a very tiring task. I want to keep my rights over the application protected; I just want to share one particular part. The app is nothing special, it is just a way to start building a real career in programming. Real companies don't release their source code on the internet, so I should avoid that too.

My particular issue is the following.

I have a map container holding all the words of an English-Spanish dictionary. That works fine; all words were successfully retrieved.

Next, I separate the words in the text using strtok, and the string returned is right. But when I try to concatenate everything in each iteration, the string only stores the last word. I tried with a char array and got the same behavior; I replaced it with a string to avoid mistakes and it behaves the same.

Is it possible that a bad terminating character is causing trouble? Something is failing in the concatenation. That is my hypothesis: the string data is fine, but a bad terminator could be causing problems.

Tell me: if a string has its own terminator and I later append it to another string, does that fail?

How can I protect my source code? I need real support for my programming career, and I have no professional people near me, not yet. I want to keep my projects safe, because they are important to me.

How can I become a great programmer by myself? In particular, how can I debug potential bugs? Do you know any book that explains this kind of thing?

Also, I am using Codeblocks. Is Codeblocks a good choice? I am also using MinGW. Is that a proper choice? I don't use Microsoft's tools because they are paid, and I have no money to purchase things. I recently noticed a free version, but I always have problems running it on my computer.

Thanks for your help.

Please recommend books that explain how to isolate programming errors by myself.

EDIT: Here is part of the source code. Assume that the dictionary map container was successfully loaded; I have tested it and it works properly.

char *current_word = NULL;
cout << "TRADUCCION INSTANTANEA..." << endl;
current_word = strtok(content, " ");
string traduccion_final = "";
while(current_word != NULL){
    cout << "current_word: " << current_word << endl;

    string palabra = current_word;
    cout << "palabra = " << palabra << endl;

    string contraparte_en_diccionario = (string) dictionary_corpus.find(palabra)->second;

    cout << "contraparte_en_diccionario: = " << contraparte_en_diccionario << endl;
    const char *const_string = dictionary_corpus.find(palabra)->second.c_str();
    cout << "const_string..." << const_string;
    traduccion_final = traduccion_final + contraparte_en_diccionario;
    cout << " " << endl;
    current_word = strtok(NULL, " ");
}

cout << "Traduccion resultante: " << endl;
cout << endl;
cout << traduccion_final;
cout << endl;

> Next, I separate the words in the text using strtok, and the string returned is right.
> But when I try to concatenate everything in each iteration, the string only stores the last word.
OK, so I imagine you have a function like 'getNextWord' or 'splitLineIntoWords'.

So post this code.
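#include <iostream>
#include <string>

// Replace the body with the code you suspect is at fault.
std::string someFunction(const std::string& line) {
    // your suspect code.
    return line;
}

int main ( ) {
    //char line[] = "This is a line";
    std::string line = "This is a line";
    std::string result = someFunction(line);
    std::cout << result << '\n';
}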

Basically, you extract the code you think is causing the problem, and you create the smallest possible test case around it.

For one thing, it might actually help you figure out the answer to your question in the first place.
For another, if you can't figure it out, and it's still showing the problem, you have a nice simple program to post online.
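As a rough illustration of such a test case for the problem in this thread, the sketch below swaps the real dictionary for a three-word hard-coded map and wraps the strtok loop in one small function. The names `tiny_dictionary` and `translate_line` and the sample words are made up for the example, and copying unknown words through unchanged is an assumed policy; the full program was never posted, so this is only a guess at a reasonable reduction.

```cpp
#include <cstring>
#include <map>
#include <string>

// Hypothetical stand-in for the real dictionary: three hard-coded entries.
std::map<std::string, std::string> tiny_dictionary() {
    return {{"the", "el"}, {"cat", "gato"}, {"eats", "come"}};
}

// The suspect logic, reduced: split on spaces with strtok, look each word up,
// and concatenate the results. Unknown words are copied through unchanged
// (an assumption made for this example).
std::string translate_line(const char* text) {
    const std::map<std::string, std::string> dict = tiny_dictionary();

    // strtok writes '\0' into its input, so work on a private copy.
    std::string buffer(text);
    std::string result;

    for (char* word = std::strtok(&buffer[0], " "); word != nullptr;
         word = std::strtok(nullptr, " ")) {
        auto it = dict.find(word);
        if (!result.empty()) result += ' ';
        result += (it != dict.end()) ? it->second : std::string(word);
    }
    return result;
}
```

Calling `translate_line("the cat eats")` from `main` and printing the result should show `el gato come`. If a program this small still misbehaves the same way as the full one, it is short enough to post without giving away the rest of the application.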

The fact that you mentioned you're using strtok() in a C++ program tells me you're doing something wrong.
> Also, I am using Codeblocks. Is Codeblocks a good choice? I am also using MinGW. Is that a proper choice?
Yes, nothing wrong with that.

> I don't use Microsoft's tools because they are paid
Actually, there is a free 'Community' edition of Visual Studio, but there are not too many reasons to use it.

> How can I become a great programmer by myself? In particular, how can I debug potential bugs?
I guess the best way is learning by doing. Books may help, but only to a limited extent, since they can't know your specific problems.

Codeblocks comes with a debugger. Learn to step through your code line by line.
> How can I protect my source code? I need real support for my programming career, and I have no professional people near me, not yet. I want to keep my projects safe, because they are important to me.

Can you explain a bit more about what you mean by this? Do you mean that you want to make sure your code doesn't accidentally get deleted? That you want to stop other people from looking at it? Something else?

> Codeblocks comes with a debugger. Learn to step through your code line by line.

Words to live by. 90% of the problems people ask for help with here, they could solve for themselves if they just learned to use their debugger.
> Real companies don't release their source code on the internet

Intel and Microsoft have open source projects. Are they not real?
Red Hat's business is mostly based on open source. (Is yearly revenue of $3.4 billion "real"?)

> I want to keep my rights over the application protected

The creator of the code holds the copyright and thus has the right to decide the license terms that dictate what others can do with the program.

If someone violates the license under which they have access to your program (code), then you can sue them.
I edited the question and added the particular code that causes me problems.

When I talk about keeping it safe, I am talking about my rights over the application. I don't want my app to become open source; I only want to get support. I study at a university, but I haven't found people interested in helping me privately with my source code, so I am asking here. This is a good place, but it is public. When I build a real career with a real software product, I will need to take care of my own source code, in the future when I start my own serious software company.

Going back to the source code: in each iteration, the words of the text are properly replaced with their counterparts in the dictionary. The only problem is when I try to concatenate them.

Why can a concatenation fail? The strings already exist, but I don't know whether they are properly terminated. The process of breaking the text into tokens may have led me to potential bugs.

My main hypothesis is a bad concatenation due to bad null terminators.

I will test it by printing each value to see whether a null terminator is badly placed. Give me some time to test it. Meanwhile, give me ideas about how to address the issue.
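One possible way to run that kind of test is to print every character of each string in a form where invisible bytes become visible. The helper below, `visible_dump`, is a hypothetical name and not part of the posted code. The sketch assumes the culprit could be a hidden character such as a carriage return ('\r') left over from reading a Windows-format dictionary file; nothing in this thread confirms that, but a '\r' sent to a console moves the cursor back to the start of the line, so later words overwrite earlier ones on screen and the string can look as if it only holds the last word.

```cpp
#include <cstdio>
#include <string>

// Returns a readable dump of a string: printable ASCII characters as-is,
// every other byte as a \xNN escape, so stray '\r' or '\0' bytes show up.
std::string visible_dump(const std::string& s) {
    std::string out;
    for (unsigned char c : s) {
        if (c >= 0x20 && c < 0x7f) {
            out += static_cast<char>(c);
        } else {
            char escape[8];
            std::snprintf(escape, sizeof escape, "\\x%02X",
                          static_cast<unsigned>(c));
            out += escape;
        }
    }
    return out;
}
```

Printing `visible_dump(contraparte_en_diccionario)` inside the loop would show a trailing carriage return as `\x0D`. Note that accented letters stored as multi-byte sequences would also appear as escapes with this simple version.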
Here is the output of the program: https://imgur.com/a/M1ff071
First, please use code tags. See http://www.cplusplus.com/articles/jEywvCM9/

The essential bits of your code:
char *current_word = strtok( content, " " );
std::string traduccion_final;
while ( current_word ) {
  std::string palabra = current_word;
  std::string contraparte_en_diccionario = (std::string) dictionary_corpus.find( palabra )->second;
  traduccion_final += contraparte_en_diccionario;
  current_word = strtok( nullptr, " " );
}
std::cout << "Traduccion resultante: \n\n" << traduccion_final << '\n';

1. Why do you use strtok and C-strings? Why not std::string for all steps?
2. What is the type of content?
3. What is the type of dictionary_corpus?
That is a lot of questions.
-- Keeping your code safe is not too hard. Write your code on a machine that is not on the internet, and it cannot be hacked remotely. That is the most basic form of security, and I dealt with this for many years on some very important code that had to be protected at all costs. It's annoying, but a side machine or phone can get you code snippets and help from the internet, and a USB stick can even copy the examples over; as soon as a machine is connected, though, it's at risk, and you have to do (hard) work to keep nosy people out (which is a full career in and of itself!). Buy 2 portable hard drives, back up to them daily, and swap them out frequently. One of them should be kept elsewhere so that it's safe from fire, flood, tornado, etc. There are other ways, but this one keeps the code safe and is inexpensive, if a little hands-on. The only way to get your code is to steal the hardware, and if it's also encrypted and locked down and your home/office is relatively secure, you should be good. This is an extremely paranoid approach, but you did ask, and again, it's simple and cheap.

-- Debugging is half science and half art. Learn to use a debugger. Learn to log what your code does to a file (entering function x with parameters a, b, c ... exiting function x with results d, e, f ...) that you can trace. Learn to break code into small parts that can be tested completely, and assemble those into larger pieces so that, if they do not work, the fault is in the assembly and not in the bits you tested. Multithreaded debugging is a nightmare, so hone your single-threaded skills before going there. Also, step 1 in finding a bug is to know you have one. How do you know it's there? By testing carefully. It's hard to test your own code carefully, because people have blind spots; it's better to have several friends look at it too.
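A minimal sketch of that entry/exit logging idea follows. The names `log_line` and `add_traced` and the log format are invented for the illustration; real projects often use a logging library instead, so treat this as one simple way to do it and not the recommended one.

```cpp
#include <fstream>
#include <string>

// Appends one line to a log file; append mode keeps earlier entries.
void log_line(const std::string& path, const std::string& message) {
    std::ofstream log(path, std::ios::app);
    log << message << '\n';
}

// Example of a traced function: logs its parameters on entry
// and its result on exit.
int add_traced(int a, int b, const std::string& log_path) {
    log_line(log_path, "entering add_traced a=" + std::to_string(a) +
                       " b=" + std::to_string(b));
    int result = a + b;
    log_line(log_path, "exiting add_traced result=" + std::to_string(result));
    return result;
}
```

After a run, the log file can be read top to bottom to see which function was entered last and with which values, which narrows down where things went wrong.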

-- Releasing debug executables or libraries can expose your code. Always distribute only release builds, free of debug info.

Stop using C in C++. If you want to write C, that is another story. There are places where that is appropriate, but if you can't defend your choice of using C, then use C++.



2. content is a char *
3. dictionary_corpus is a map<string, string>

I use C strings because I am not a professional programmer and I have experimented with this kind of string before. Do you have an alternative solution for parsing tokens? Whenever I look on the internet I find strtok, and a book that I purchased uses it too.

Can the problem be solved if I use a C++ approach? Please guide me on this. The logic of the app is sound; I will replace this part with C++.

Thanks for your help.
I don't think there is a single C++ function to tokenize, but you can re-create it in just a few lines. I'll let you try it first...
Make a vector of strings.
Take substrings using find against a space or whatever delimiter you want.
push_back the substrings to the vector.
It's probably doable in a line or two, but even if you do it explicitly in 5 or 6 lines it's not too hard. And you can tailor it to your use case, if needed.
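For reference, those steps could look roughly like the sketch below. The function name `split_words` and the choice to skip empty pieces (from repeated delimiters) are assumptions made for the example, not something specified in this thread.

```cpp
#include <string>
#include <vector>

// Splits text on a single delimiter character using find and substr.
// Empty pieces, such as those between two consecutive spaces, are skipped.
std::vector<std::string> split_words(const std::string& text,
                                     char delimiter = ' ') {
    std::vector<std::string> words;
    std::string::size_type start = 0;
    while (start < text.size()) {
        std::string::size_type end = text.find(delimiter, start);
        if (end == std::string::npos) end = text.size();
        if (end > start) words.push_back(text.substr(start, end - start));
        start = end + 1;  // continue just past the delimiter
    }
    return words;
}
```

Unlike strtok, this leaves the input string untouched, and the resulting vector can be looped over as many times as needed.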

You can also use strtok on string.c_str() if you can't find or make a satisfactory answer.

There is no reason to rewrite a bunch of code that already works just to make it pretty or modern. Leave working code alone. If you need to do something else to it (besides how it looks), then clean it up at that point or rewrite it with the skills you picked up since the last time you edited it.
One can read from a stream word by word:
http://www.cplusplus.com/reference/sstream/istringstream/istringstream/
Formatted input and strtok() skip whitespace in a similar way.
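A short sketch of reading word by word with std::istringstream is shown here; the wrapper function `read_words` is a hypothetical name used only for this illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Extracts whitespace-separated words with operator>>, which skips
// spaces, tabs and newlines before each word.
std::vector<std::string> read_words(const std::string& line) {
    std::istringstream stream(line);
    std::vector<std::string> words;
    std::string word;
    while (stream >> word) {
        words.push_back(word);
    }
    return words;
}
```

One difference worth keeping in mind: strtok with " " as the delimiter splits only on spaces, whereas operator>> treats any whitespace as a separator.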


Note that you don't handle the failure to find a word in the dictionary.
std::map::find() can return map::end():
http://www.cplusplus.com/reference/map/map/find/

The end() "shall not be dereferenced":
http://www.cplusplus.com/reference/map/map/end/

Calling dictionary_corpus.find( palabra )->second without checking the result is thus extremely dangerous.
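A guarded lookup might look like the sketch below. The helper name `lookup_or_keep` and the decision to fall back to the untranslated word are assumptions for the example; the thread does not say what the program should do with unknown words.

```cpp
#include <map>
#include <string>

// Returns the translation when the word is in the dictionary; otherwise
// returns the word itself, so end() is never dereferenced.
std::string lookup_or_keep(const std::map<std::string, std::string>& dictionary,
                           const std::string& word) {
    auto it = dictionary.find(word);
    if (it == dictionary.end()) {
        return word;  // assumed fallback: keep the original word
    }
    return it->second;
}
```

In the posted loop this would replace the direct `find( palabra )->second` expression, for example `traduccion_final += lookup_or_keep(dictionary_corpus, palabra);`.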
> You can also use strtok on string.c_str() if you can't find or make a satisfactory answer.
No (C++98) or maybe? (C++11).
http://www.cplusplus.com/reference/string/string/c_str/
The pointer returned is a pointer to const data, so you shouldn't really be altering the internal data of another object in this way.

You can build something around http://www.cplusplus.com/reference/string/string/find_first_of/ which is basically the same interface as strtok(), without the messy string altering behaviour.

If you add http://www.cplusplus.com/reference/string/string/substr/ and make a little tokeniser class around them, you should be able to create a nice usable interface for yourself.
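A small tokeniser class along those lines could be sketched as follows. The class name `Tokenizer` and its `next` interface are invented for the example; it uses find_first_not_of to skip leading delimiters, find_first_of to locate the end of the token, and substr to copy it out, and it never modifies the text it was given.

```cpp
#include <string>
#include <utility>

class Tokenizer {
public:
    Tokenizer(std::string text, std::string delimiters)
        : text_(std::move(text)), delimiters_(std::move(delimiters)) {}

    // Stores the next token in `token` and returns true,
    // or returns false when no tokens are left.
    bool next(std::string& token) {
        std::string::size_type start =
            text_.find_first_not_of(delimiters_, position_);
        if (start == std::string::npos) return false;

        std::string::size_type end = text_.find_first_of(delimiters_, start);
        if (end == std::string::npos) end = text_.size();

        token = text_.substr(start, end - start);
        position_ = end;  // resume scanning after this token
        return true;
    }

private:
    std::string text_;
    std::string delimiters_;
    std::string::size_type position_ = 0;
};
```

Typical use would be a loop such as `Tokenizer t(line, " ,;"); std::string word; while (t.next(word)) { ... }`, which mirrors the strtok loop without the hidden global state.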
Good point, I haven't used strtok in a while and forgot it modifies the original. About the only C function I have not let go of is strstr, because it can be treated as a boolean for 'yes, that is in this string', while find is unfriendly about it.
Thanks to everyone. My hypothesis was right. I isolated the problem in the code that tokenizes the words from the dictionary, and I am getting good results.

I introduced a new error, but I know where things are going. I decided to null-terminate my string myself, and that fixed it.

I still need to handle special cases like empty lines in the dictionary and so on. Thanks for your help. I consider this question closed, at least for this case. I am releasing free code on my GitHub, github.com/enriquemesa8080 (other code, not this).

Do you know where I can get a ready-made English-Spanish word-to-word dictionary?

I want an English-to-Spanish word-by-word dictionary, but it seems expensive to build. Do you know where I can download a free one? This can be considered another question.

Thanks to all for your help. I decided to keep the source code as it is. A principle of programming is: if it works, don't touch it.

You can visit my public website at enriquemesa8080.zz.com. There you can download some interesting things. Let me know what you think about my website.

I don't know if such a thing exists in a downloadable format. You would have to spend some time searching.

You can, however, use any of many free tools that translate a single word to a word. Then you could loop over an English dictionary and generate the file you need.

Hopefully you understand that such 'translation' is nearly useless? One of my favorite examples of why this is awful is "the flesh is weak but the spirit is willing" -> the meat is bad but the vodka is available!

Or, in Spanish, como mucho? (wanted: how much? got: I eat too much).