Linear Regression with gradient descent

Hi, I have a problem. I need to write C++ code that reads the X and Y data from data.txt. data.txt contains the dataset for our linear regression problem. The first column contains data relating to the sulfur concentration in alcohol, and the second column contains distribution data relating to alcohol. The purpose of the linear regression is to minimize the cost function. I have no idea how to do it. Maybe I can find a good tutor? :)
Monika
Your professor is Evil. (This is a higher-level, algorithms class problem.)

Here's some help understanding how it all works, plus a link to some Python code you can play around with before writing your own C++ version.
http://spin.atomicobject.com/2014/06/24/gradient-descent-linear-regression/

Post back when you get stuck again.
nice link
I can't translate it to C++
:((
Hello, looking at it quickly, the "most difficult part" to translate would be loading the data, so let me give you a hand here:

std::vector< std::pair<double, double> > points;

std::ifstream in("./data.csv");
std::string row;

if (in.is_open()) {
    while(std::getline(in, row)) {
        double x, y;
        sscanf(row.c_str(), "%lf,%lf", &x, &y);
        points.push_back(std::make_pair(x, y));
    }
}

//
// points[n].first  -> x
// points[n].second -> y
//



// a prototype suggestion to start with 

std::pair<double, double> gradient_descent_runner
(
    const std::vector< std::pair<double, double> > & points,
    double starting_b,
    double starting_m,
    double learning_rate,
    std::size_t num_iterations
);


Then do the same for the other functions (work out their type signatures), and we'll go from there.

EDIT: we solved this piece:


// reading the file and storing the points into a data structure

std::vector< std::pair<double, double> > genfromtxt(const char * filepath)
{
    std::vector< std::pair<double, double> > points;
    std::ifstream in(filepath);
    std::string row;

    if (in.is_open()) {
        while(std::getline(in, row)) {
            double x, y;
            sscanf(row.c_str(), "%lf,%lf", &x, &y);
            points.push_back(std::make_pair(x, y));
        }
    }
    return points;
}

// same thing, but avoiding copies: fill a caller-supplied vector

void genfromtxt(
    const std::string & filepath,
    std::vector< std::pair<double, double> > & out_points
)
{
    std::ifstream in(filepath);
    std::string row;

    if (in.is_open()) {
        while(std::getline(in, row)) {
            double x, y;
            sscanf(row.c_str(), "%lf,%lf", &x, &y);
            out_points.push_back(std::make_pair(x, y));
        }
    }
}
When I pasted your code into Code::Blocks, it showed me these messages... What's happening?


mingw32-g++.exe -Wall -fexceptions -g  -c C:\Users\Monika\Desktop\MONIKA\PROJEKT\main.cpp -o obj\Debug\main.o
C:\Users\Monika\Desktop\MONIKA\PROJEKT\main.cpp:1:1: error: 'vector' in namespace 'std' does not name a type
C:\Users\Monika\Desktop\MONIKA\PROJEKT\main.cpp:20:11: error: 'string' in namespace 'std' does not name a type
C:\Users\Monika\Desktop\MONIKA\PROJEKT\main.cpp:20:25: error: ISO C++ forbids declaration of 'filepath' with no type [-fpermissive]
C:\Users\Monika\Desktop\MONIKA\PROJEKT\main.cpp:21:10: error: 'std::vector' has not been declared
C:\Users\Monika\Desktop\MONIKA\PROJEKT\main.cpp:21:16: error: expected ',' or '...' before '<' token
C:\Users\Monika\Desktop\MONIKA\PROJEKT\main.cpp: In function 'void genfromtxt(const int&, int)':
C:\Users\Monika\Desktop\MONIKA\PROJEKT\main.cpp:24:5: error: 'ifstream' is not a member of 'std'
C:\Users\Monika\Desktop\MONIKA\PROJEKT\main.cpp:24:19: error: expected ';' before 'in'
C:\Users\Monika\Desktop\MONIKA\PROJEKT\main.cpp:25:5: error: 'string' is not a member of 'std'
C:\Users\Monika\Desktop\MONIKA\PROJEKT\main.cpp:25:17: error: expected ';' before 'row'
C:\Users\Monika\Desktop\MONIKA\PROJEKT\main.cpp:27:9: error: 'in' was not declared in this scope
C:\Users\Monika\Desktop\MONIKA\PROJEKT\main.cpp:28:15: error: 'getline' is not a member of 'std'
C:\Users\Monika\Desktop\MONIKA\PROJEKT\main.cpp:28:32: error: 'row' was not declared in this scope
C:\Users\Monika\Desktop\MONIKA\PROJEKT\main.cpp:30:50: error: 'sscanf' was not declared in this scope
C:\Users\Monika\Desktop\MONIKA\PROJEKT\main.cpp:31:13: error: 'out_points' was not declared in this scope
C:\Users\Monika\Desktop\MONIKA\PROJEKT\main.cpp:31:34: error: 'make_pair' is not a member of 'std'
8-D We need the symbol definitions, i.e. the includes. I might have some typos too.

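#include <cstddef>   // std::size_t
#include <cstdio>    // sscanf
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <utility>   // std::pair, std::make_pair
#include <vector>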

std::pair<double, double> gradient_descent_runner
(
    const std::vector< std::pair<double, double> > & points,
    double starting_b,
    double starting_m,
    double learning_rate,
    std::size_t num_iterations
)
{
    return std::make_pair(0.0, 0.0);
}

void genfromtxt(
    const std::string & filepath,
    std::vector< std::pair<double, double> > & out_points
)
{
    // C++11
    std::ifstream in(filepath, std::ios_base::in|std::ios_base::binary);
    // C++98/03
    // std::ifstream in(filepath.c_str(), std::ios_base::in|std::ios_base::binary);

    std::string row;

    if (in.is_open()) {
        while(std::getline(in, row)) { // reading line by line CRLF
            double x, y;
            // parse "[double],[double]"; skip lines that do not match
            if (sscanf(row.c_str(), "%lf,%lf", &x, &y) == 2) {
                out_points.push_back(std::make_pair(x, y)); // add the new point (x, y)
            }
        }
    }
}

int main(int argc, char ** argv)
{
    // run from the same directory as the data file, or pass the full path,
    // e.g. something like "C:/Users/Monika/Desktop/MONIKA/PROJEKT/data.csv"
    std::string filepath("./data.csv");
    std::vector< std::pair<double, double> > points;

    genfromtxt(filepath , points);

    std::cout.precision(17); // enough significant digits to show a double exactly
    // C++11
    for (auto point : points) { // point is a std::pair of doubles
        std::cout << point.first << " " << point.second << std::endl;
    }

    // C++98/03
//  for (
//         std::vector< std::pair<double, double> >::iterator it = points.begin() ;
//         it != points.end() ;
//         ++it
//   ) {
//       std::cout << (*it).first << " " << (*it).second << std::endl;
//   }
    return 0;
}


So then, what do we put in gradient_descent_runner?
Okay, let's summarize. We have the code for linear regression. I have a file with the .csv extension.
My CSV file contains this data:


2548 ; 13495
2548 ; 16500
2823 ; 16500
2337 ; 13950
2824 ; 17450
2507 ; 15250
2844; 17710
2954 ; 18920
3086 ; 23875
3053 ; 23875
2395 ; 16430
2395 ; 16925
2710 ; 20970
2765 ; 21105
3055 ; 24565
3230 ; 30760
3380 ; 41315
3505 ; 36880
... 


When I write this line:
std::ifstream in(filepath, std::ios_base::in | std::ios_base::binary); // does it need to be binary?
I get this message:
error: no matching function for call to 'std::basic_ifstream<char>::basic_ifstream(const string&, std::_Ios_Openmode)'
How can I fix this?
I was going to respond earlier, but I could not spare the time. (I can't spare much now, either.)

Beware of answers that ask you to put away everything you've done and replace it with code containing a lot of stuff you don't understand. Your code to load the file was very good and clean. (I just wasn't sure about that ch bit. Was the first character in the file something to discard?)

It also used a very convenient data structure that you created (Point) which would serve your purposes much better than dealing with tuples.

@mmw
I assume you really are trying to help, but you are doing OP's homework without helping her understand it, which will hurt her later when she tries to do something a little more complicated.

Also, why are you using sscanf()? That's C stuff.
Why are you opening file in binary?
And, you know that you don't have to specify ios::in for an ifstream, right?
Hello,

Your data must be formatted the same way as in the Python example program; otherwise this is going nowhere, since we are first trying to translate from one language to another. Use the data file from the Python example.

For the error, enable C++11 or call the C++98 constructor:
// ifstream(const char * filename, ios_base::openmode mode = ios_base::in);
std::ifstream in(filepath.c_str(), std::ios_base::in|std::ios_base::binary);


see documentation:

http://www.cplusplus.com/reference/fstream/ifstream/ifstream/ 


better get a linux-box
Hello @Duoas,

0- I first try to identify what we need, the materials; the logic comes later. Identifying the types and the available data structures is the first step.

Even if it's a bit cryptic, we will then be able to compare the two programs practically line by line (and then understand the real thing, e.g. the implementation of the indirection).

We are translating bluntly: Python tuples to pair and vector. (I avoided new objects and typedefs on purpose; anyway, a point_2d is a pair.)

sub-zero: "but you are doing OP's homework", I don't, I peer program (and I have time still 136789 files going on, this thing rapes my cpu 8-) ), I don't give away everything, but the necessary base and material to enter the topic (I don't expect the op to digest the first steps in ten minutes, but maybe in two or three days):

std::pair<double, double> gradient_descent_runner

(@vobiscum you still need to do that for the other functions e.g providing their prototypes, I gave you the data-types-translation (parameter and return types) involved in the first-entry function, you should be able to figure out, this is an important step, even if you are not sure, give it a try)

and the fun will start,

we already moved from a
I can't translate it on C ++
:(( 
to a starting point, writing code, compiler issues, header inclusion issues et cetera, those problems need to be seen and experienced, even if not fully understood.

1- sscanf is perfectly fine in this case given the data format, and more straightforward to understand than writing a template and using operator indirections because my clang is broken 8-D. I like C stuff from time to time 8-). BTW, what's a C++ runtime? A giant wrapper around system calls written in C (nerding out).

2- Because the OP is on Windows, and I don't know which editor she uses; anyway, it doesn't hurt to read everything in binary.

3- Yes. Obviously the OP doesn't know about v & flag, flag |= a 8-), so this is optional; I can do it anyway, because the inner implementation will test whether the flag is set or not. NAH! 8-O (1)

or 1,2,3 could be summed up with: to set the gossips' tongues wagging 8-D or simply because eueueueue.

(1) this is a santa klaus smiley, yes it is.

--- touch base part --------------------- you must be 21 (not sure)

6-* "Beware of answers that ask you to put away everything you've done and replace it with code containing a lot of stuff you don't understand" ( this is I 8-) )

morality: I did not replace anything, as I answered long before the OP came up with a solution (and sadly overwrite many times, instead of continuing in another message, but I don't have an issue with that, not a syco rigidimus ape kind, or simply "insecure male", I am used to work with young/green fellows) and that's because of my move, we got finally something presented in a public display, something positive, I might be fun, but don't like "the behind the curtain crapshit, I teach you life from my tiny-ego-centrist mental disease" ( jumping from the stars), then you got the clap, figure out, or simply present some kind of excuse, don't you think you crossed (read totally broke in a pathologic way) the line of idiocraty?, THX.

* yes we jumped to six, why? because I like it, negative self-centered one, less... don't know I might be special.

And if you are so busy why even bothering? 1+1 = 2 Thank you, keep your racism for yourself, I am busy since my birth time but I manage; I am not the only one hopefully.
Reported for your abusive personal insults and derogatory language. Wow...
Hello, can't help you, Woww, there is no insult or derogatory language*, just a simple and rational answer to your childish rants and crossing the line stuff, if you don't like it "read it" again and again until you get it "then you got the clap, figure out**" . THX, and still waiting for your public excuses, I don't know in/on which planet you are living in/on, but please quit it, thank you for reading me, you got spanked and you did deserve it*** ; no need to come back teasing people Woww I reported the bad guy.... seriously? this is ridiculous.

see documentation:
http://www.dullesglassandmirror.com/mirror.asp

houps, might be derogatory!, did the same, reported you as: delusional, e.g reality, step down, and if you don't like it, same all same? look at yourself punk.

* you might have a problem ; checkin' out your dictionary, is it offensive and cynic? yes I admit it, it is, but why? still don't get it? **, *** do you think I answered this way as a free payload for fun? that I have any satisfaction to be forced to answer to your silly stand? seriously? again this is ridiculous. read yourself back one day you might get it and reach "the age of responsibility", but in your world that's someone else's fault, gratuity (sic).

Best Regards.
Hello, I am happy, we are back in the topic, linear regression as Duoas reported me again (just for telling him, you know kid get lost, derogatory language used, BTW we don't know exactly what is the new Wow Wow Wow Wow (a Duaos trademark, don't mess with it), some creature from Hell certainly, or not? I have to ask to Dean Winchester to be sure) , this is an amazing example and opportunity; what is linear regression? bad mouths would say @Duoas* ; but I would say maybe, but it is close ; it is reaching the infinity of lala land, or as Riddick taught us all, in his second Epic, you should have have tak' the money toon*. After all this fun (sic)(but not a problem, this is a part of the educational process with children) can we be back to the topic? I will willingly remove all that public answer (but me-self don't care) to @Da Punk? Thank you, and surely I have no personal grief with you Duoas, but please learn how to behave in society (this is a really serious statement, in the middle of all this firework), get out out from your pit, thank you, what a show! πεζὸς θρίαμβος and beer!!!!!!!!!!!!!!!!!!!!!!
I do suggest forgetting the gradient descent with Python and taking the analytical route.
See https://en.wikipedia.org/wiki/Simple_linear_regression#Fitting_the_regression_line
Particularly https://upload.wikimedia.org/math/0/d/d/0ddedb446f7520df577fcf48aa7012e2.png
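In case the image does not load: the standard closed-form least-squares fit for y = α + βx, written with averages, is given below. I am assuming this is the form the linked image shows; it is the one that the slope and intercept lines in this post compute.

```latex
\hat{\beta} = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^{2}} - \bar{x}^{2}},
\qquad
\hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x}
```

Here a bar denotes the average over all data points, so avgxy corresponds to the average of x*y and avgxx to the average of x*x.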

You can read an array of x from the file.
You can read an array of y from the file.
You can calculate arrays of x*x and x*y from the previous arrays.
You can calculate average for each array.

double slope = (avgxy - avgx * avgy) / (avgxx - avgx * avgx );
double intercept = avgy - slope * avgx;



Well, I do assume that you can read each pair of (x,y) and ignore the ';' while doing so. Such I/O seems trivial.

By "array" I naturally mean the dynamic std::vector.
You don't strictly need arrays for x*x and x*y; you can accumulate their sums directly while reading, which simplifies the code.