Hey guys, new to the site, not new to forums, so howdy everyone! Sorry if this is a noob question; we haven't covered anything in class about how to access the internet from an application, so I'm just curious whether this is too difficult a task for someone with roughly a year's experience...
I'm willing to bet that the website has what is called an RSS feed. I would start with this and go from there. That's just off the top of my head, though.
mlb.com does, but not with the information I need from that table. And even if it did, I wouldn't know how to get that stored in my application. Would I just use an ifstream, like with a text file?
If you know how to request the data from the site, it would be easy, but if not you have to parse the HTML (remember that any website can change its layout at any time, so parsing is not a good idea, at least in my opinion).
Yeah, the mlb.com website changes very rarely. What is this parsing you speak of? I think that's what I need. And thanks to thumper, that definitely looks like it will be helpful; I just need a few more puzzle pieces.
Parsing is sort of like how a web browser sorts through HTML code and turns it into images, layout, etc. Your program would 'parse' through a string that's sent to you when you receive something. This string would contain the entire HTML code for the web page. You'd search through that string and find HTML tags like '<table>'. After your program finds a '<table>' tag, it knows that the following characters are going to be data it needs to record. You'd record all of the following characters until you reach another HTML tag.
Parsing takes a lot of work, and if even the smallest thing in the web page changes it could throw your program off, which is why it's not a good idea.
I'll make a quick example in a few; I need to finish up some coursework.
#include <iostream>
#include <string>
#include <vector>
#include <limits>

using namespace std;

int main()
{
    // Our program will 'parse' through this string and pick out emails and website addresses.
    string parse_string = "Here's an email: yourname123@yahoo.com \n Here's a website: http://www.google.com/ ";
    parse_string += "Now there doesn't necessarily have to be a new line after everything. You'll notice";
    parse_string += " the program won't pick up any of this other text... But here's another email: ";
    parse_string += "somethingstupid@hotmail.com and what the heck, another website http://www.cplusplus.com/ ";

    // I'm going to make two vectors: one to hold emails, and another to hold websites.
    vector<string> websites;
    vector<string> emails;

    // I'm also going to make a variable to hold the size of parse_string, for easy access.
    const string::size_type parse_string_len = parse_string.size();

    string buffer; // temporary string

    // The next logical step is to create a for loop to go through our string.
    for (string::size_type i = 0; i < parse_string_len; ++i)
    {
        // Parse through until you find an '@'.
        if (parse_string[i] == '@')
        {
            cout << "I found an @!" << endl;
            for (string::size_type j = i; j > 0; --j)
            {
                // Parse backwards until you get to the whitespace that starts the email.
                if (parse_string[j] == ' ')
                {
                    cout << "I moved back to a whitespace!" << endl;
                    for (string::size_type y = j + 1; y < parse_string_len; ++y)
                    {
                        // Once again move forward and collect all the chars until the ending whitespace.
                        // If we've reached the whitespace, add the email to the vector, clear the buffer, and break.
                        if (parse_string[y] == ' ')
                        {
                            cout << "I've recorded the whole email!" << endl;
                            emails.push_back(buffer);
                            buffer.clear();
                            break;
                        }
                        // Otherwise add the char to the buffer.
                        buffer += parse_string[y];
                    }
                    break;
                }
            }
        }

        // Now we'll do websites. We're going to look for 4 characters in a row: 'http'.
        // Check that at least 4 characters remain so we never index past the end of the string.
        if (i + 3 < parse_string_len
            && parse_string[i]     == 'h'
            && parse_string[i + 1] == 't'
            && parse_string[i + 2] == 't'
            && parse_string[i + 3] == 'p')
        {
            // If 4 characters in a row spell "http", we've got ourselves a website URL!
            // Start recording characters at i (since it's currently on the 'h')
            // and keep recording until a whitespace is reached.
            for (string::size_type j = i; j < parse_string_len; ++j)
            {
                // If we reach the whitespace, add the URL to the vector, clear the buffer, and break.
                if (parse_string[j] == ' ')
                {
                    cout << "I've recorded a whole URL!" << endl;
                    websites.push_back(buffer);
                    buffer.clear();
                    break;
                }
                // Otherwise add the char to the buffer.
                buffer += parse_string[j];
            }
        }
    }

    // Finally, output the data.
    cout << "\n\n\nRecorded emails:\n";
    for (vector<string>::size_type i = 0; i < emails.size(); ++i)
        cout << "\t" << emails[i] << endl;

    cout << "\nRecorded URLs:\n";
    for (vector<string>::size_type i = 0; i < websites.size(); ++i)
        cout << "\t" << websites[i] << endl;

    cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    return 0;
}
You can change parse_string to whatever you like; the program will always pull out the emails and URLs. (:
All parsing through strings really amounts to is looking for significant characters, like the @ symbol in emails, and pulling out the characters around them.
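Since the original question was about a stats table, here's the same idea aimed at the '<table>' tags mentioned earlier: find a marker, then collect whatever sits between it and the closing marker. The HTML string here is completely made up, so treat this as a rough sketch rather than something to point at a real page.

#include <iostream>
#include <string>
#include <vector>

int main()
{
    // A made-up chunk of HTML, standing in for a page you've already downloaded somehow.
    std::string html = "<html><body><p>ignore this</p>"
                       "<table><tr><td>Player</td><td>HR</td></tr>"
                       "<tr><td>Smith</td><td>12</td></tr></table>"
                       "</body></html>";

    // First, cut out just the first <table>...</table> block.
    std::string::size_type start = html.find("<table>");
    std::string::size_type end = html.find("</table>");
    if (start == std::string::npos || end == std::string::npos)
    {
        std::cout << "No table found.\n";
        return 0;
    }
    std::string table = html.substr(start, end - start);

    // Then walk through that block and record whatever sits between <td> and </td>.
    std::vector<std::string> cells;
    std::string::size_type pos = 0;
    while ((pos = table.find("<td>", pos)) != std::string::npos)
    {
        pos += 4; // skip past "<td>" itself
        std::string::size_type close = table.find("</td>", pos);
        if (close == std::string::npos)
            break;
        cells.push_back(table.substr(pos, close - pos));
        pos = close;
    }

    for (std::vector<std::string>::size_type i = 0; i < cells.size(); ++i)
        std::cout << cells[i] << '\n';

    return 0;
}

Real pages rarely have tags this clean (<table class="...">, nested tables, and so on), which is exactly why the layout-change warning above applies.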
Oh okay, easy enough. I'll definitely be able to do that once I figure out all this HTTP request stuff. I've been attempting to read through some primers.
int main()
{
    std::ostringstream os;
    curlpp::Cleanup myCleanup;
    os << curlpp::options::Url("http://www.example.com");
    // I'm assuming you mean like this?
    std::cout << os.str() << std::endl;
    return 0;
}
**EDIT:: I got libcurl running in my application and tried this, and it didn't work... 3 errors:
curlpp hasn't been declared
expected ';' before 'myCleanup'
curlpp hasn't been declared
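Those "curlpp hasn't been declared" errors usually just mean the compiler never saw the curlpp headers. For reference, here's the same snippet with the includes spelled out and the call wrapped in a try/catch, since curlpp reports failures by throwing. The header names are the ones used in the curlpp examples; your install may lay them out differently, so treat this as a sketch.

#include <sstream>
#include <iostream>

#include <curlpp/cURLpp.hpp>
#include <curlpp/Easy.hpp>
#include <curlpp/Options.hpp>

int main()
{
    try
    {
        curlpp::Cleanup myCleanup; // RAII setup/teardown of the library
        std::ostringstream os;
        // This shortcut performs the request and streams the response body into os.
        os << curlpp::options::Url("http://www.example.com");
        std::cout << os.str() << std::endl;
    }
    catch (std::exception & e)
    {
        std::cerr << e.what() << std::endl;
    }
    return 0;
}

You also have to link against both libraries when you build (typically something like -lcurlpp -lcurl on the command line); getting Xcode to do that is its own little adventure of header search paths and linker flags.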
Oh okay, I can't find a process to run curlpp in Xcode... Sorry to sound so new; coding isn't anything tough for me, but linking and all that is still foreign to me. I may just toy with this curl thing and try to get it to work...
I may just toy with this curl thing and try to get it to work...
Good solution. That's how many of the world's best things were invented, including the lightbulb. You'll not only learn how to do it, but many, many (many) ways not to. xD
Regular expressions are great for this kind of thing. They are not exactly trivial to write, though; they take some time to master. But if you are planning on doing a lot of this kind of work, it could be well worth the effort.
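For example, here's a rough sketch using the standard <regex> header (needs a C++11 compiler; the patterns are deliberately naive and nowhere near RFC-correct, just enough to show the idea):

#include <iostream>
#include <regex>
#include <string>

int main()
{
    std::string text = "Contact me at yourname123@yahoo.com or visit http://www.cplusplus.com/ for more.";

    // Naive patterns: "word-ish@word-ish.word" for emails, "http(s):// followed by non-spaces" for URLs.
    std::regex email_re(R"([\w.]+@[\w.]+\.\w+)");
    std::regex url_re(R"(https?://\S+)");

    // sregex_iterator walks over every non-overlapping match in the string.
    for (auto it = std::sregex_iterator(text.begin(), text.end(), email_re);
         it != std::sregex_iterator(); ++it)
        std::cout << "email: " << it->str() << '\n';

    for (auto it = std::sregex_iterator(text.begin(), text.end(), url_re);
         it != std::sregex_iterator(); ++it)
        std::cout << "url:   " << it->str() << '\n';

    return 0;
}

The whole email/URL loop from the long example above collapses into those two patterns, which is the trade-off: far less code, but the patterns themselves take some care to get right.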