So today was my first time using Winsock, and I'm trying to make a program to display the source code of a webpage, but it's not working. Here's my code:
It might be sending more than just source code at first. Namely, the header and whatnot. I haven't done work on webpages for a while, however, so I might be way off.
#include <winsock2.h>
#include <windows.h>
#include <iostream>
#pragma comment(lib,"ws2_32.lib")
using namespace std;
int main (){
WSADATA wsaData;
if (WSAStartup(MAKEWORD(2,2), &wsaData) != 0) {
cout << "WSAStartup failed.\n";
system("pause");
return 1;
}
SOCKET Socket=socket(AF_INET,SOCK_STREAM,IPPROTO_TCP);
struct hostent *host;
host = gethostbyname("www.google.com");//change this to the host!
SOCKADDR_IN SockAddr;
SockAddr.sin_port=htons(80);
SockAddr.sin_family=AF_INET;
SockAddr.sin_addr.s_addr = *((unsigned long*)host->h_addr);
connect(Socket,(SOCKADDR*)(&SockAddr),sizeof(SockAddr));
send(Socket,"GET HTTP/1.0\r\n\r\n", strlen( "GET HTTP/1.0\r\n\r\n" ),0);// the path after GET is left empty; if you want, put an address within the host there (the site booby-traps index.htm(l), so I used nothing)
char buffer[100000];
int nDataLength = recv(Socket,buffer,100000,0);
cout << buffer;
closesocket(Socket);
WSACleanup();
system("pause");
return 0;
}
It goes to the Google site. It seems to go into an endless loop of redirects!
HTTP/1.0 404 Not Found
Date: Sat, 12 Dec 2009 00:43:43 GMT
Content-Type: text/html; charset=UTF-8
Server: gws
Content-Length: 1357
X-XSS-Protection: 0
<html><head>
<meta http-equiv="content-type" content="text/html;charset=utf-8">
<title>404 Not Found</title>
<style><!--
body {font-family: arial,sans-serif}
div.nav {margin-top: 1ex}
div.nav A {font-size: 10pt; font-family: arial,sans-serif}
span.nav {font-size: 10pt; font-family: arial,sans-serif; font-weight: bold}
div.nav A,span.big {font-size: 12pt; color: #0000cc}
div.nav A {font-size: 10pt; color: black}
A.l:link {color: #6f6f6f}
A.u:link {color: green}
//--></style>
<script><!--
var rc=404;
//-->
</script>
</head>
<body text=#000000 bgcolor=#ffffff>
<table border=0 cellpadding=2 cellspacing=0 width=100%><tr><td rowspan=3 width=1
% nowrap>
<b><font face=times color=#0039b6 size=10>G</font><font face=times color=#c41200
size=10>o</font><font face=times color=#f3c518 size=10>o</font><font face=times
color=#0039b6 size=10>g</font><font face=times color=#30a72f size=10>l</font><f
ont face=times color=#c41200 size=10>e</font> </b>
<td> </td></tr>
<tr><td bgcolor="#3366cc"><font face=arial,sans-serif color="#ffffff"><b>Error</
b></td></tr>
<tr><td> </td></tr></table>
<blockquote>
<H1>Not Found</H1>
The requested URL <code>/1.1</code> was not found on this server.
<p>
</blockquote>
<table width=100% cellpadding=0 cellspacing=0><tr
So it's not connecting to the website, and the response seems to be cut off, since it ends in <tr and the tag isn't closed (it's not a problem with the array size). Do you know what the problem is?
First of all, just because I am curious and I like to learn this stuff and get better at it. Secondly, I have a few programs that I would like to incorporate this into, where I need to connect to a website. In this case, I'd like to know whether I'm not getting the source code because I'm not connecting to the right website.
george135, please give more of a third-party view in your suggestions. Don't tell someone to do something that might do structural harm to their program. If I didn't know any better though, I'd think you were a Microsoft Windows representative advertising their crappy software for them.
An HTTP request usually has headers associated with it. I suggest you use something like HttpFox to have a look at the headers your browser sends, then replicate those headers in your request.
i'm really tired right now, but i think someone already mentioned about continuing to read data, since the part you posted was cut off, that will solve part of your problem...
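A rough sketch of what "continuing to read data" could look like. A single recv() call returns only whatever bytes have arrived so far, so the usual pattern is to call it in a loop until it returns 0 (the server closed the connection) or a negative value (an error). The helper name read_all and the 4096-byte chunk size are made up for illustration. It takes any recv-like callable, so it is not tied to Winsock.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: keeps calling a recv-like function until it reports
// end-of-stream (0) or an error (negative), collecting everything it read.
// recv_fn(buf, len) must return the number of bytes it wrote into buf,
// the same way Winsock's recv() does.
template <typename RecvFn>
std::string read_all(RecvFn recv_fn) {
    std::string response;
    char chunk[4096];  // arbitrary chunk size, chosen for this sketch
    for (;;) {
        int n = recv_fn(chunk, static_cast<int>(sizeof(chunk)));
        if (n <= 0) {
            break;  // 0 = peer closed the connection, negative = error
        }
        // Append exactly n bytes; the chunk is never treated as a
        // null-terminated string, so no stray garbage gets printed.
        response.append(chunk, static_cast<std::size_t>(n));
    }
    return response;
}

// Possible use with the socket from the original post (untested here):
//   std::string page = read_all([&](char* buf, int len) {
//       return recv(Socket, buf, len, 0);
//   });
//   std::cout << page;
```

Note that the loop only ends when the server closes the connection, so it assumes an HTTP/1.0 request or a "Connection: close" header. On a kept-alive connection recv() would just block waiting for more data.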
now, for the other part. some google searches about http GET requests should solve the rest. looking at your code, you seem to omit the URI. the first line of a GET request works like this: GET <URI> [HTTP version] <crlf>
since your GET request omits the URI, i'm assuming you just want the root directory... it's been a while since i did socket programming, but if i recall correctly, you still have to put "/" in as the URI if you just want the root directory of the main URL you're connecting to.
so, overall, the first line of your request would look something like: GET / HTTP/1.1\r\n
hopefully that helps fix your problem!
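To make the request line described above concrete, here is a minimal sketch of building the whole request as a string. The function name build_get_request is hypothetical. The Host header is included because HTTP/1.1 requires it. The "Connection: close" header is an extra assumption, added so the server closes the socket once the page has been sent.

```cpp
#include <string>

// Hypothetical helper: builds a minimal HTTP/1.1 GET request.
// The request target must never be empty; "/" asks for the root document.
std::string build_get_request(const std::string& host,
                              const std::string& path = "/") {
    const std::string target = path.empty() ? "/" : path;

    std::string request = "GET " + target + " HTTP/1.1\r\n";
    request += "Host: " + host + "\r\n";  // mandatory in HTTP/1.1
    request += "Connection: close\r\n";   // assumption: one request per connection
    request += "\r\n";                    // blank line ends the header section
    return request;
}

// Possible use with the socket from the original post (untested here):
//   std::string req = build_get_request("www.google.com");
//   send(Socket, req.c_str(), static_cast<int>(req.size()), 0);
```

Building the string once and passing its size to send() also avoids repeating the literal inside strlen().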
Hi everyone and thanks for the replies. I actually took a break from this for a while, but now I'm back and I've been reading your replies. First of all, Mal Reynolds suggested doing GET / HTTP/1.1\r\n
with the slash in between the GET and HTTP and now I get the correct header,