Jun 18, 2020 at 3:00pm UTC
I'm given years x and y and each year's "X" and "Y" data. I have to calculate K and B and output them, but my program's always slightly wrong. I can't figure out why.
#include <iomanip>
#include <iostream>
using namespace std;
int main() {
    double a, b;
    cin >> a >> b;
    double xtotal, ytotal =0.0;
    int diff = (int )(b-a);
    double xarr[diff];
    double yarr[diff];
    //cout<<"diff was "<<diff<<" "<<endl;
    for (int i=0; i<diff; i++){
        string dummy; cin>>dummy;
        double x, y; cin>>x>>y;
        xarr[i] =x; yarr[i]=y;
        xtotal+=x; ytotal+=y;
    }
    double avgx = xtotal/(b-a);
    double avgy = ytotal/(b-a);
    //cout<<"Xavg was "<<avgx<<" yavg was "<<avgy<<" ";
    // ez, use x-mean squared and diff x * diff y to calc avgY - k*avgX, where k = sum(varx)/sum(vary).
    double varx, vary=0.0;
    for (int i=0; i<diff; i++){
        varx+= (xarr[i]-avgx)*(xarr[i]-avgx);
        vary+= (xarr[i]-avgx)*(yarr[i]-avgy);
        //cout<<"At xarr[i]= "<<xarr[i]<<" Varx became "<<varx<<" Vary became "<< vary<<" ";
    }
    double var = vary/varx;
    cout<<setprecision(12)<<var<<" " <<setprecision(12)<<avgy-var*avgx;
    return 0;
}
input: 1920 2010
1920: 62 264
1921: 89 336
1922: 90 333
1923: 60 269
1924: 95 344
1925: 111 388
1926: 85 327
1927: 132 447
1928: 67 279
1929: 68 317
1930: 62 211
1931: 73 299
1932: 125 434
1933: 120 441
1934: 99 347
1935: 79 357
1936: 69 283
1937: 57 261
1938: 125 404
1939: 138 441
1940: 118 404
1941: 144 421
1942: 114 372
1943: 87 328
1944: 85 293
1945: 92 334
1946: 55 253
1947: 61 265
1948: 145 451
1949: 149 414
1950: 139 449
1951: 127 421
1952: 50 243
1953: 52 240
1954: 142 403
1955: 62 267
1956: 75 294
1957: 115 394
1958: 65 283
1959: 142 449
1960: 62 268
1961: 142 445
1962: 102 356
1963: 118 393
1964: 60 268
1965: 78 307
1966: 100 348
1967: 146 453
1968: 123 401
1969: 105 365
1970: 80 312
1971: 116 389
1972: 107 350
1973: 116 383
1974: 80 303
1975: 135 409
1976: 122 403
1977: 147 457
1978: 120 451
1979: 137 440
1980: 96 314
1981: 84 312
1982: 80 383
1983: 82 298
1984: 95 409
1985: 108 337
1986: 135 437
1987: 99 359
1988: 67 284
1989: 52 248
1990: 149 451
1991: 65 244
1992: 98 347
1993: 95 334
1994: 127 357
1995: 103 369
1996: 52 245
1997: 125 444
1998: 116 387
1999: 98 347
2000: 127 412
2001: 132 420
2002: 78 322
2003: 125 405
2004: 136 390
2005: 143 451
2006: 129 422
2007: 66 282
2008: 52 279
2009: 57 264
2010: 67 283
output: 2.12395042531 141.282641951
expected: 2.12418318843 141.253016439
any insight?
Last edited on Jun 18, 2020 at 3:01pm UTC
Jun 18, 2020 at 3:11pm UTC
varx+= (xarr[i]-avgx)*(xarr[i]-avgx);
vary+= (xarr[i]-avgx)*(yarr[i]-avgy);
> vary+= (xarr[i]-avgx)
Looks like an artifact of copy-pasting the same line? Or maybe that's intentional.
Edit: In case that isn't the only issue, more importantly, you should turn on compiler warnings.
double varx, vary=0.0;
for (int i=0; i<diff; i++){
varx+= (xarr[i]-avgx)*(xarr[i]-avgx);
What is the initial value of varx?
Turning on compiler warnings will give you a warning that varx is being used uninitialized.
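To make the point concrete, here is a minimal sketch of the same accumulation with every accumulator given its own initialiser. The helper name `column_sums` is made up for illustration and is not part of the original program. With GCC or Clang, `-Wall -Wextra` is a common starting set of warning flags.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Returns {sum of xs, sum of ys}.
// column_sums is a hypothetical helper, not from the original program.
std::pair<double, double> column_sums(const std::vector<double>& xs,
                                      const std::vector<double>& ys) {
    assert(xs.size() == ys.size());

    // One declaration per accumulator, each with its own initialiser.
    // In `double x_total, y_total = 0.0;` only y_total would be initialised.
    double x_total = 0.0;
    double y_total = 0.0;

    for (std::size_t i = 0; i < xs.size(); ++i) {
        x_total += xs[i];
        y_total += ys[i];
    }
    return {x_total, y_total};
}
```

Declaring one variable per line makes it much harder to leave one of them without a starting value.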
Last edited on Jun 18, 2020 at 3:13pm UTC
Jun 18, 2020 at 3:16pm UTC
int diff = (int )(b-a)
for (int i=0; i<diff; i++)
Almost certainly not. If a=1920 and b=2010 you should have 91 pieces of data, NOT 90. In code:
b-a+1
The above is what is giving you your cited error.
However, ... in addition:
int diff = (int )(b-a);
double xarr[diff];
double yarr[diff];
This is illegal in standard C++. For a "standard" array, the size must be known at compile time; sizing it from a run-time value only works as a compiler extension. Use vectors, or, at least, arrays with some surplus size.
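A rough sketch of both fixes so far, the inclusive count and the run-time-sized storage, might look like this. The names `record_count` and `make_buffer` are invented for the example, and it assumes the two years arrive in ascending order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of yearly records when both end years are included:
// 1920..2010 gives 91, not 90.
// record_count is a hypothetical helper, not from the original program.
std::size_t record_count(int first_year, int last_year) {
    assert(last_year >= first_year);  // assumption: years are in ascending order
    return static_cast<std::size_t>(last_year - first_year) + 1;
}

// Storage whose size is chosen at run time. std::vector's size constructor
// value-initialises the elements, so every entry starts at 0.0.
std::vector<double> make_buffer(int first_year, int last_year) {
    return std::vector<double>(record_count(first_year, last_year));
}
```

In the original program this would replace `double xarr[diff];` with something like `std::vector<double> xarr = make_buffer(a, b);`, with `a` and `b` read as `int`.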
double xtotal, ytotal =0.0;
Nope, this only initialises ytotal, not xtotal.
double varx, vary=0.0;
Same problem.
You do know that there are simpler formulae to do linear regression, don't you? You shouldn't have to use two separate loops.
Slope, m = ( N Sxy - Sx Sy ) / ( N Sxx - Sx Sx )
Intercept, c = ( Sy - m Sx ) / N
where
Sx = sum(x), Sy = sum(y), Sxx = sum(x * x), Sxy = sum(x * y)
and these sums can be incremented in a single loop as you read in x and y.
Last edited on Jun 18, 2020 at 3:41pm UTC
Jun 18, 2020 at 3:48pm UTC
.
Last edited on Jun 18, 2020 at 4:00pm UTC
Jun 18, 2020 at 3:51pm UTC
You are still getting the number of data points wrong when you divide by (b-a) to compute the two averages. It should be (b-a+1) in both places.
You don't need arrays.
#include <iostream>
#include <string>
using namespace std;
int main()
{
    int a, b;
    cin >> a >> b;
    int N = b - a + 1;
    double Sx = 0, Sy = 0, Sxx = 0, Sxy = 0;
    for ( int i = 1; i <= N; i++ )
    {
        string dummy;
        double x, y;
        cin >> dummy >> x >> y;
        Sx += x;
        Sy += y;
        Sxx += x * x;
        Sxy += x * y;
    }
    double m = ( N * Sxy - Sx * Sy ) / ( N * Sxx - Sx * Sx );
    double c = ( Sy - m * Sx ) / N;
    cout << "Slope: " << m << " intercept: " << c << '\n' ;
}
Slope: 1.91101 intercept: 78.2802
Last edited on Jun 18, 2020 at 3:56pm UTC
Jun 18, 2020 at 4:00pm UTC
Also, the first loop is to input the data; the second loop is for the actual calculation. Thanks for the help, guys.
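For anyone who wants to keep the two-loop structure, a sketch with the fixes from this thread applied could look roughly like the following. `Line` and `fit_line` are names invented here, and the data is assumed to have already been read into two vectors of equal length.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Least-squares line y = slope * x + intercept.
// Line and fit_line are hypothetical names, not from the original program.
struct Line {
    double slope;
    double intercept;
};

// Two-pass fit: the first loop finds the means, the second loop
// accumulates the mean-centred sums.
Line fit_line(const std::vector<double>& x, const std::vector<double>& y) {
    assert(!x.empty() && x.size() == y.size());
    const double n = static_cast<double>(x.size());  // actual count, e.g. 91

    double sum_x = 0.0;  // every accumulator is explicitly initialised
    double sum_y = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sum_x += x[i];
        sum_y += y[i];
    }
    const double mean_x = sum_x / n;
    const double mean_y = sum_y / n;

    double sxx = 0.0;  // sum of (x - mean_x)^2
    double sxy = 0.0;  // sum of (x - mean_x) * (y - mean_y)
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double dx = x[i] - mean_x;
        sxx += dx * dx;
        sxy += dx * (y[i] - mean_y);
    }
    assert(sxx != 0.0);  // the slope is undefined if all x values are equal

    const double slope = sxy / sxx;
    return {slope, mean_y - slope * mean_x};
}
```

Dividing by the vector's own size, rather than by a separately computed year difference, removes the chance of the count and the data drifting apart.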
Last edited on Jun 18, 2020 at 4:00pm UTC