Hey guys, new to the site, not new to forums, so howdy everyone! Sorry if this is a noob question; we haven't covered anything in class about how to access the internet from an application, so I'm just curious whether this is too difficult a task for someone with roughly a year's experience...
I'm willing to bet that the website has what is called an RSS feed. I would start with this and go from there. That's just off the top of my head, though.
mlb.com does, but not with the information I need from that table. And even if it did, I wouldn't know how to get that stored in my application. Would I just use an ifstream, like with a text file?
If you know how to request the data from the site, it would be easy, but if not you have to parse the HTML (remember that any website can change its layout at any time, so parsing is not a good idea, at least in my opinion).
Yeah, the mlb.com website changes very rarely. What is this parsing you speak of? I think that's what I need. And thanks to thumper, that definitely looks like it will be helpful; I just need a few more puzzle pieces.
Parsing is sort of like how a web browser sorts through HTML code and turns it into images, layout, etc. Your program would 'parse' through a string that's sent to you when you receive something. This string would contain the entire HTML code for the web page. You'd search through that string and find HTML tags like '<table>'. After your program finds a '<table>' tag, it knows that the following characters are going to be data it needs to record. You'd record all of the following characters until you reach another HTML tag.
Parsing takes a lot of work, and if even the smallest thing in the web page changes it could throw your program off, which is why it's not a good idea.
I'll make a quick example in a few; I need to finish up some coursework.
#include <iostream>
#include <string>
#include <vector>
#include <limits>

using namespace std;

int main()
{
    // Our program will 'parse' through this string and pick out emails and website addresses.
    string parse_string = "Here's an email: yourname123@yahoo.com \n Here's a website: http://www.google.com/ ";
    parse_string += "Now there doesn't necessarily have to be a new line after everything. You'll notice";
    parse_string += " the program won't pick up any of this other text... But here's another email: ";
    parse_string += "somethingstupid@hotmail.com and what the heck, another website http://www.cplusplus.com/ ";

    // I'm going to make two vectors: one to hold emails, and another to hold websites.
    vector<string> websites;
    vector<string> emails;

    // I'm also going to make a variable to hold the size of parse_string, for easy access.
    const string::size_type parse_string_len = parse_string.size();

    string buffer; // temporary string

    // The next logical step is to create a for loop to go through our string.
    for (string::size_type i = 0; i < parse_string_len; ++i)
    {
        // Parse through until you find an '@'.
        if (parse_string[i] == '@')
        {
            cout << "I found an @!" << endl;
            for (string::size_type j = i; j > 0; --j)
            {
                // Parse backwards until you get to the whitespace that starts the email.
                if (parse_string[j] == ' ')
                {
                    cout << "I moved back to a whitespace!" << endl;
                    for (string::size_type y = j + 1; y < parse_string_len; ++y)
                    {
                        // Once again move forward and collect all the chars until the ending whitespace.
                        // If we've reached the whitespace, add the email to the vector, clear the buffer, and break.
                        if (parse_string[y] == ' ')
                        {
                            cout << "I've recorded the whole email!" << endl;
                            emails.push_back(buffer);
                            buffer.clear();
                            break;
                        }
                        // Otherwise add the char to the buffer.
                        buffer += parse_string[y];
                    }
                    break;
                }
            }
        }

        // Now we'll do websites. We're going to look for 4 characters in a row: 'http'.
        // Check that at least 4 characters remain so we never index past the end of the string.
        if (i + 3 < parse_string_len
            && parse_string[i]     == 'h'
            && parse_string[i + 1] == 't'
            && parse_string[i + 2] == 't'
            && parse_string[i + 3] == 'p')
        {
            // If 4 characters in a row spell "http", we've got ourselves a website URL!
            // Start recording characters at i (since it's currently on the 'h')
            // and keep recording until a whitespace is reached.
            for (string::size_type j = i; j < parse_string_len; ++j)
            {
                // If we reach the whitespace, add the URL to the vector, clear the buffer, and break.
                if (parse_string[j] == ' ')
                {
                    cout << "I've recorded a whole URL!" << endl;
                    websites.push_back(buffer);
                    buffer.clear();
                    break;
                }
                // Otherwise add the char to the buffer.
                buffer += parse_string[j];
            }
        }
    }

    // Finally, output the data.
    cout << "\n\n\nRecorded emails:\n";
    for (vector<string>::size_type i = 0; i < emails.size(); ++i)
        cout << "\t" << emails[i] << endl;

    cout << "\nRecorded URLs:\n";
    for (vector<string>::size_type i = 0; i < websites.size(); ++i)
        cout << "\t" << websites[i] << endl;

    cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    return 0;
}
You can change parse_string to whatever you like; the program will always pull out the emails and URLs. (:
All parsing through strings really amounts to is looking for significant characters, like the @ symbol in emails, and pulling out the characters around them.
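Since the original question was about a stats table, here's the same idea aimed at the '<table>' tags mentioned earlier: find a marker, then collect whatever sits between it and the closing marker. The HTML string here is completely made up, so treat this as a rough sketch rather than something to point at a real page.

#include <iostream>
#include <string>
#include <vector>

int main()
{
    // A made-up chunk of HTML, standing in for a page you've already downloaded somehow.
    std::string html = "<html><body><p>ignore this</p>"
                       "<table><tr><td>Player</td><td>HR</td></tr>"
                       "<tr><td>Smith</td><td>12</td></tr></table>"
                       "</body></html>";

    // First, cut out just the first <table>...</table> block.
    std::string::size_type start = html.find("<table>");
    std::string::size_type end = html.find("</table>");
    if (start == std::string::npos || end == std::string::npos)
    {
        std::cout << "No table found.\n";
        return 0;
    }
    std::string table = html.substr(start, end - start);

    // Then walk through that block and record whatever sits between <td> and </td>.
    std::vector<std::string> cells;
    std::string::size_type pos = 0;
    while ((pos = table.find("<td>", pos)) != std::string::npos)
    {
        pos += 4; // skip past "<td>" itself
        std::string::size_type close = table.find("</td>", pos);
        if (close == std::string::npos)
            break;
        cells.push_back(table.substr(pos, close - pos));
        pos = close;
    }

    for (std::vector<std::string>::size_type i = 0; i < cells.size(); ++i)
        std::cout << cells[i] << '\n';

    return 0;
}

Real pages rarely have tags this clean (<table class="...">, nested tables, and so on), which is exactly why the layout-change warning above applies.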
Oh okay, easy enough. I'll definitely be able to do that once I figure out all this HTTP request stuff. I've been attempting to read through some primers.
int main()
{
    std::ostringstream os;
    curlpp::Cleanup myCleanup;
    os << curlpp::options::Url("http://www.example.com");
    // I'm assuming you mean like this?
    std::cout << os.str() << std::endl;
    return 0;
}
**EDIT:: I got libcurl running in my application and tried this, and it didn't work... 3 errors:
curlpp hasn't been declared
expected ';' before 'myCleanup'
curlpp hasn't been declared
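Those "curlpp hasn't been declared" errors usually just mean the compiler never saw the curlpp headers. For reference, here's the same snippet with the includes spelled out and the call wrapped in a try/catch, since curlpp reports failures by throwing. The header names are the ones used in the curlpp examples; your install may lay them out differently, so treat this as a sketch.

#include <sstream>
#include <iostream>

#include <curlpp/cURLpp.hpp>
#include <curlpp/Easy.hpp>
#include <curlpp/Options.hpp>

int main()
{
    try
    {
        curlpp::Cleanup myCleanup; // RAII setup/teardown of the library
        std::ostringstream os;
        // This shortcut performs the request and streams the response body into os.
        os << curlpp::options::Url("http://www.example.com");
        std::cout << os.str() << std::endl;
    }
    catch (std::exception & e)
    {
        std::cerr << e.what() << std::endl;
    }
    return 0;
}

You also have to link against both libraries when you build (typically something like -lcurlpp -lcurl on the command line); getting Xcode to do that is its own little adventure of header search paths and linker flags.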
Oh okay, I can't find a process to run curlpp in Xcode... Sorry to sound so new; coding isn't anything tough for me, but linking and all that is still foreign to me. I may just toy with this curl thing and try to get it to work...
I may just toy with this curl thing and try to get it to work...
Good solution. That's how many of the world's best things were invented, including the lightbulb. You'll not only learn how to do it, but many, many (many) ways not to. xD
Regular expressions are great for this kind of thing. They are not exactly trivial to write, though; they take some time to master. But if you are planning on doing a lot of this kind of work, it could be well worth the effort.
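For example, here's a rough sketch using the standard <regex> header (needs a C++11 compiler; the patterns are deliberately naive and nowhere near RFC-correct, just enough to show the idea):

#include <iostream>
#include <regex>
#include <string>

int main()
{
    std::string text = "Contact me at yourname123@yahoo.com or visit http://www.cplusplus.com/ for more.";

    // Naive patterns: "word-ish@word-ish.word" for emails, "http(s):// followed by non-spaces" for URLs.
    std::regex email_re(R"([\w.]+@[\w.]+\.\w+)");
    std::regex url_re(R"(https?://\S+)");

    // sregex_iterator walks over every non-overlapping match in the string.
    for (auto it = std::sregex_iterator(text.begin(), text.end(), email_re);
         it != std::sregex_iterator(); ++it)
        std::cout << "email: " << it->str() << '\n';

    for (auto it = std::sregex_iterator(text.begin(), text.end(), url_re);
         it != std::sregex_iterator(); ++it)
        std::cout << "url:   " << it->str() << '\n';

    return 0;
}

The whole email/URL loop from the long example above collapses into those two patterns, which is the trade-off: far less code, but the patterns themselves take some care to get right.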