WinSock - getting source code from website

So today was my first time using Winsock, and I'm trying to make a program to display the source code of a webpage, but it's not working. Here's my code:
#include <winsock2.h>
#include <windows.h>
#include <iostream>
#pragma comment(lib,"ws2_32.lib")

using namespace std;

int main (){
	WSADATA wsaData;

    if (WSAStartup(MAKEWORD(2,2), &wsaData) != 0) {
		cout << "WSAStartup failed.\n";
        system("pause");
		return 1;
    }

	SOCKET Socket=socket(AF_INET,SOCK_STREAM,IPPROTO_TCP);

	struct hostent *host;
	host = gethostbyname("www.google.com");

	SOCKADDR_IN SockAddr;
	SockAddr.sin_port=htons(8888);
	SockAddr.sin_family=AF_INET;
	SockAddr.sin_addr.s_addr = *((unsigned long*)host->h_addr);

	connect(Socket,(SOCKADDR*)(&SockAddr),sizeof(SockAddr));

	char buffer[1000];
	int nDataLength = recv(Socket,buffer,1000,0);
	cout << buffer;

	closesocket(Socket);
    WSACleanup();

	system("pause");
	return 0;
}


What's the problem?
Port 80, not 8888.
Oh.. Right. Thanks, lol.

Ok, now it connects. Do you know how to get the source code for the web page? I couldn't find any examples in C++...
It might be sending more than just source code at first. Namely, the header and whatnot. I haven't done work on webpages for a while, however, so I might be way off.
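To make the header/body distinction concrete: an HTTP response is a status line and header lines, then a blank line, then the body (the page source). A minimal sketch of separating the two, assuming the whole response has already been collected into a `std::string`; the name `split_response` is made up for illustration and is not part of Winsock:

```cpp
#include <string>
#include <utility>

// Splits a raw HTTP response into {head, body} at the first blank line.
// If no blank line has arrived yet, everything is treated as head.
std::pair<std::string, std::string> split_response(const std::string& raw) {
    const std::string separator = "\r\n\r\n";
    const std::string::size_type pos = raw.find(separator);
    if (pos == std::string::npos) {
        return {raw, std::string()};
    }
    return {raw.substr(0, pos), raw.substr(pos + separator.size())};
}
```

The `.second` of the returned pair would then be the part worth printing as "source code".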
Here's a working solution!
#include <winsock2.h>
#include <windows.h>
#include <iostream>
#pragma comment(lib,"ws2_32.lib")
using namespace std;

int main (){
	WSADATA wsaData;

    if (WSAStartup(MAKEWORD(2,2), &wsaData) != 0) {
		cout << "WSAStartup failed.\n";
        system("pause");
		return 1;
    }

	SOCKET Socket=socket(AF_INET,SOCK_STREAM,IPPROTO_TCP);

	struct hostent *host;
	host = gethostbyname("www.google.com");//change this to the host!

	SOCKADDR_IN SockAddr;
	SockAddr.sin_port=htons(80);
	SockAddr.sin_family=AF_INET;
	SockAddr.sin_addr.s_addr = *((unsigned long*)host->h_addr);

	connect(Socket,(SOCKADDR*)(&SockAddr),sizeof(SockAddr));
	// The path between GET and HTTP is left empty; put a path on the host there if you want one
	// (the site booby-traps index.htm(l), so I used nothing).
	send(Socket,"GET  HTTP/1.0\r\n\r\n", strlen( "GET  HTTP/1.0\r\n\r\n" ),0);
	char buffer[100000];
	
	int nDataLength = recv(Socket,buffer,100000,0);
	cout << buffer;

	closesocket(Socket);
    WSACleanup();

	system("pause");
	return 0;
}

It goes to the Google site, which seems to go into an endless loop by issuing redirects!
Thanks for the reply. I'm getting some source code now at least, but it's not the Google source code. Here's the output:

HTTP/1.0 404 Not Found
Date: Sat, 12 Dec 2009 00:43:43 GMT
Content-Type: text/html; charset=UTF-8
Server: gws
Content-Length: 1357
X-XSS-Protection: 0



<html><head>
<meta http-equiv="content-type" content="text/html;charset=utf-8">
<title>404 Not Found</title>
<style><!--
body {font-family: arial,sans-serif}
div.nav {margin-top: 1ex}
div.nav A {font-size: 10pt; font-family: arial,sans-serif}
span.nav {font-size: 10pt; font-family: arial,sans-serif; font-weight: bold}
div.nav A,span.big {font-size: 12pt; color: #0000cc}
div.nav A {font-size: 10pt; color: black}
A.l:link {color: #6f6f6f}
A.u:link {color: green}
//--></style>
<script><!--
var rc=404;
//-->
</script>
</head>
<body text=#000000 bgcolor=#ffffff>
<table border=0 cellpadding=2 cellspacing=0 width=100%><tr><td rowspan=3 width=1
% nowrap>
<b><font face=times color=#0039b6 size=10>G</font><font face=times color=#c41200
 size=10>o</font><font face=times color=#f3c518 size=10>o</font><font face=times
 color=#0039b6 size=10>g</font><font face=times color=#30a72f size=10>l</font><f
ont face=times color=#c41200 size=10>e</font>&nbsp;&nbsp;</b>
<td>&nbsp;</td></tr>
<tr><td bgcolor="#3366cc"><font face=arial,sans-serif color="#ffffff"><b>Error</
b></td></tr>
<tr><td>&nbsp;</td></tr></table>
<blockquote>
<H1>Not Found</H1>
The requested URL <code>/1.1</code> was not found on this server.

<p>
</blockquote>
<table width=100% cellpadding=0 cellspacing=0><tr


So it's not connecting to the website, and the output seems to be cut off, since it ends in <tr and the tag isn't closed (it's not a problem with the array size). Do you know what the problem is?
Why do you want the source code?
First of all, just because I am curious and I like to learn this stuff and get better at it. Secondly, I have a few programs that I would like to incorporate this into, where I need to connect to a website. In this case, not getting the source code is a problem because it means I'm not connecting to the right website.
The apparent cutoff is because of fragmentation: a single recv call returns only the data that has arrived so far. You need to keep reading.

These matters have been discussed in a number of threads. This one was my last attempt to discuss it.
http://www.cplusplus.com/forum/general/16659/
You don't need Winsock
Just use Inet or COM (anywhere from 1 line, with URLDownloadToFile, to 6 lines of code)
Don't do that. It's non-portable and ties you into using Internet Explorer technology.
closed account (S6k9GNh0)
george135, please give more of a third-party view of your suggestions. Don't tell someone to do something that might do structural harm to their program. If I didn't know any better though, I'd think you were a Microsoft Windows representative advertising their crappy software for them.
i don't want to take sides with someone, but this still is the "Windows Programming" forum... with emphasis on windows, i guess^^....
An http request usually has headers associated with it. I suggest you use something like httpfox and have a look at the headers, then replicate these headers in your request.
He has the header, it's in his post.
i'm really tired right now, but i think someone already mentioned about continuing to read data, since the part you posted was cut off, that will solve part of your problem...
now, for the other part. some google searches about http GET requests should solve the rest. looking at your code, you seem to omit the URI. the first line of a GET request works like this:
GET <URI> [HTTP version] <crlf>
since your GET request omits the URI, i'm assuming you just want the root directory... it's been a while since i did socket programming, but if i recall correctly, you still have to put "/" in as the URI if you just want the root directory of the main URL you're connecting to.
so, overall, the first line of your request would look something like:
GET / HTTP/1.1\r\n
hopefully that helps fix your problem!
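For reference, a small sketch of building such a request as a string. The helper name `build_get_request` is hypothetical. The `Host` header is included because HTTP/1.1 requires it, and `Connection: close` is an added assumption so that the server closes the socket once the response is complete:

```cpp
#include <string>

// Builds a minimal HTTP/1.1 GET request for the given host and path.
// An empty path is replaced by "/", i.e. the root of the site.
std::string build_get_request(const std::string& host, const std::string& path = "/") {
    const std::string uri = path.empty() ? "/" : path;
    return "GET " + uri + " HTTP/1.1\r\n"
           "Host: " + host + "\r\n"
           "Connection: close\r\n"
           "\r\n";  // the blank line ends the request headers
}
```

The result could then be passed to send() along the lines of `send(Socket, req.c_str(), (int)req.size(), 0)`.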
Hi everyone and thanks for the replies. I actually took a break from this for a while, but now I'm back and I've been reading your replies. First of all, Mal Reynolds suggested doing
GET / HTTP/1.1\r\n
with the slash in between the GET and HTTP and now I get the correct header,
HTTP/1.1 200 OK
Date: Sun, 20 Dec 2009 17:07:12 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
Set-Cookie: PREF=ID=7001ba1594cb1416:TM=1261328832:LM=1261328832:S=r79A_PQs0OqdK
t5M; expires=Tue, 20-Dec-2011 17:07:12 GMT; path=/; domain=.google.com
Set-Cookie: NID=30=d8IsEDuvj07cFctzKyUq5ry-O9_HfZGJ9tNl3sx_hoHvFjg8dh5K0b_Uf4UX6
ShIcTN9_mciC-01VFgjHaJ-pVhe7oM0zty2V0HNQKCE-cmqxz3KvfJBXVpVC_ez0-4L; expires=Mon
, 21-Jun-2010 17:07:12 GMT; path=/; domain=.google.com; HttpOnly
Server: gws
X-XSS-Protection: 0
Transfer-Encoding: chunked

But I don't get any body content.
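One thing to watch for once the body does arrive: the `Transfer-Encoding: chunked` header means the body is sent as a series of chunks, each preceded by its size in hexadecimal on its own line, ending with a chunk of size 0. A rough sketch of undoing that framing, assuming the complete body (everything after the blank line) is already in a string; `decode_chunked` is an illustrative name, and trailers and chunk extensions are not really handled:

```cpp
#include <cstdlib>
#include <string>

// Decodes a chunked HTTP body. Stops at the zero-size chunk or at
// the first sign of incomplete or malformed input.
std::string decode_chunked(const std::string& body) {
    std::string out;
    std::string::size_type pos = 0;
    while (pos < body.size()) {
        const std::string::size_type line_end = body.find("\r\n", pos);
        if (line_end == std::string::npos) break;      // size line incomplete
        // The chunk size is hexadecimal; strtoul stops at any ';' extension.
        const unsigned long size =
            std::strtoul(body.substr(pos, line_end - pos).c_str(), nullptr, 16);
        if (size == 0) break;                          // last chunk (or unparsable size)
        const std::string::size_type data_start = line_end + 2;
        if (size > body.size() - data_start) break;    // chunk not fully received
        out.append(body, data_start, size);
        pos = data_start + size + 2;                   // skip the CRLF after the data
    }
    return out;
}
```

Requesting with HTTP/1.0 instead should avoid chunked responses altogether, since chunked encoding only exists in HTTP/1.1.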

Here's the code I'm using again (edited after later posts too):
#include <winsock2.h>
#include <windows.h>
#include <iostream>
#pragma comment(lib,"ws2_32.lib")

using namespace std;

int main (){
	WSADATA wsaData;

    if (WSAStartup(MAKEWORD(2,2), &wsaData) != 0) {
		cout << "WSAStartup failed.\n";
        system("pause");
		return 1;
    }

	SOCKET Socket=socket(AF_INET,SOCK_STREAM,IPPROTO_TCP);

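	struct hostent *host;
	host = gethostbyname("www.cplusplus.com");
	// gethostbyname returns NULL if the name cannot be resolved.
	if (host == NULL) {
		cout << "Could not resolve host.\n";
		system("pause");
		return 1;
	}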

	SOCKADDR_IN SockAddr;
	SockAddr.sin_port=htons(80);
	SockAddr.sin_family=AF_INET;
	SockAddr.sin_addr.s_addr = *((unsigned long*)host->h_addr);

	cout << "Connecting...\n";
	if(connect(Socket,(SOCKADDR*)(&SockAddr),sizeof(SockAddr)) != 0){
		cout << "Could not connect";
		system("pause");
		return 1;
	}
	cout << "Connected.\n";

	send(Socket,"GET / HTTP/1.1\r\nHost: www.cplusplus.com\r\nConnection: close\r\n\r\n", strlen("GET / HTTP/1.1\r\nHost: www.cplusplus.com\r\nConnection: close\r\n\r\n"),0);
	char buffer[10000];

	int nDataLength;
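	while ((nDataLength = recv(Socket,buffer,10000,0)) > 0){
		// Print only the bytes recv actually returned; the buffer is not null-terminated.
		for (int i = 0; i < nDataLength; i++) {
			// Skip control characters other than line breaks.
			if (buffer[i] >= 32 || buffer[i] == '\n' || buffer[i] == '\r')
				cout << buffer[i];
		}
	}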

	closesocket(Socket);
        WSACleanup();

	system("pause");
	return 0;
}

You're still not reading in a loop.
I'm confused on how to do this. What condition should terminate the loop?
while (something)
      recv(Socket,buffer,1000,0);
This is what I did and it seems to work. Is this what you were talking about?
while (nDataLength != 0){
		nDataLength = recv(Socket,buffer,10000,0);
		cout << buffer;
	}
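On the termination condition: recv returns 0 when the server has closed the connection and a negative value (SOCKET_ERROR) on failure, so looping while the result is greater than 0 covers both. The loop above has two pitfalls: `nDataLength != 0` never becomes false if recv keeps failing, and `cout << buffer` treats the buffer as a null-terminated string, which it is not. A possible sketch, with the socket call abstracted behind a callable so the loop can be tried without a network; `read_all` and `RecvFn` are illustrative names, not Winsock API:

```cpp
#include <string>

// Reads until the peer closes the connection (result 0) or an error
// occurs (negative result), and returns everything received.
// RecvFn is any callable shaped like int(char* buf, int len). With Winsock
// it could be: [&](char* b, int n) { return recv(Socket, b, n, 0); }
template <typename RecvFn>
std::string read_all(RecvFn recv_fn) {
    std::string response;
    char buffer[4096];
    int n;
    while ((n = recv_fn(buffer, static_cast<int>(sizeof(buffer)))) > 0) {
        response.append(buffer, n);  // append only the n bytes actually received
    }
    return response;
}
```

This relies on the server closing the connection when it is done, which is what the `Connection: close` header in the request asks for.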