You want to use some data that is provided by some site, but the site doesn't expose any API to get that data. Well, a possible (and very dirty) trick is to fetch the page HTML and rip the data out of it for our own use.
Not so easy, dude… there are many hurdles:
The HTML page is full of garbage
Well, most pages have a similar structure:
<doctype....
<html>
<head> .. .. <script>..</script> . </head>
<body ....>
... .. <<<<<Tags jungle>>>> ..
<some sweet tag> <3 Your Data <3 </some tag>
.. .<more tags>
</body>
</html>
First we chop off everything before the <body> tag, which gets rid of the <head> and its <script> tags. Then we run a wild regular-expression replace that turns every opening tag <* into a <div tag.
You can continue this chopping and transforming to get closer to the pure data. Once the HTML has gone through the above process, we will have something that looks like this:
<div> <div> . . </div> <div> <div> <3 Your Data <3 </div> <div> .. .
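The chop-and-replace step described above can be sketched as a small function. This is only a rough illustration: the name toDivSoup is made up for this example, and it goes slightly beyond the single replace described in the text by also rewriting closing tags and tag names that contain digits (such as h1), which is an assumption on my part so that the tags pair up.

```javascript
// Hypothetical helper (not from the original page): turn raw HTML into a soup of <div> tags.
function toDivSoup(src) {
  // Drop everything before the opening <body> tag (doctype, <head>, head scripts).
  var start = src.search(/<body\b/i);
  var body = start === -1 ? src : src.slice(start);
  // Rewrite every opening tag name to "div"; attributes such as class and id survive.
  var soup = body.replace(/<[a-z][a-z0-9]*/gi, "<div");
  // Assumption beyond the text: rewrite closing tags too, so they match the new <div>s.
  return soup.replace(/<\/[a-z][a-z0-9]*/gi, "</div");
}

var sample = '<!doctype html><html><head><title>t</title></head>' +
  '<body class="page"><span class="temp">72</span></body></html>';
console.log(toDivSoup(sample));
// prints: <div class="page"><div class="temp">72</div></div></div>
```

The extra closing </div> at the end comes from the original </html> tag; a browser simply ignores a stray closing tag when the string is assigned to innerHTML.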
which we will host in (that is, assign to the innerHTML property of) an invisible div. This forces the browser to parse the HTML and make it part of your DOM.
Later you can query this data, using something like…
var data = document.getElementById("someId").innerText;
where someId is the id of the parent element containing the required data.
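Here is a minimal sketch of the hosting step, assuming the markup has already been cleaned up. The function name readFromHiddenHost and its parameters are hypothetical; it takes the document as an argument so that the same code can be exercised outside a browser, and it reads textContent rather than innerText because textContent does not depend on whether the element is rendered.

```javascript
// Hypothetical helper: park markup in a hidden div and read one element's text.
// `doc` is a DOM Document (the global `document` in a browser).
function readFromHiddenHost(doc, markup, targetId) {
  var host = doc.createElement("div");
  host.style.display = "none";   // keep the scraped markup out of sight
  host.innerHTML = markup;       // the browser parses the string into DOM nodes
  doc.body.appendChild(host);    // attach it so getElementById can find its children
  var target = doc.getElementById(targetId);
  return target ? target.textContent : null; // null when the element is missing
}

// Browser-only usage; skipped where no DOM is available (e.g. plain Node.js).
if (typeof document !== "undefined") {
  var text = readFromHiddenHost(
    document, '<div><div id="someId">Your Data</div></div>', "someId");
  console.log(text);
}
```

Returning null for a missing element gives the caller a cheap way to notice that the source page layout has changed.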
Let's take the New York weather page from AccuWeather.com as an example:
If you inspect the page in the IE developer tools, you will observe the following:
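// src holds the fetched page HTML as a string
if (src === "") return; // nothing to parse (assumes this code runs inside a function)
var i = src.indexOf("<body");
src = src.substr(i); // chopped the head part
tag = src.replace(/<[a-z]+/gi, "<div"); // turn every opening tag into a <div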
This should give us the output shown below:
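// park the cleaned-up markup in the host div so the browser parses it
document.getElementById("host").innerHTML = tag;
// read the element's text with jQuery's .text() (a jQuery object has no innerText property)
data = $(".info").find(".cond").find(".temp").text();
data = data.substr(0,3); // keep the first three characters, i.e. the temperature
alert("Today's Temp: "+data);
// host here is a div element
document.getElementById("host").innerHTML = "Today's Temp: "+data+"°";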
The result gives us the temperature.
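Taking the first three characters works only while the temperature text keeps the same width. As a hedged alternative, the number can be pulled out with a regular expression; parseTemperature below is a hypothetical helper, and the sample strings are made up rather than taken from the real page.

```javascript
// Hypothetical helper: pull the first (optionally negative or decimal) number out of a string.
function parseTemperature(text) {
  var match = /-?\d+(?:\.\d+)?/.exec(text);
  return match ? Number(match[0]) : null; // null when no number is present
}

console.log(parseTemperature(" 72° F ")); // prints 72
console.log(parseTemperature("-5°"));     // prints -5
console.log(parseTemperature("n/a"));     // prints null
```

A null result is also a handy signal that the page no longer contains what the scraper expects.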
Here is the full source code:
Which will give us the following result:
Cons: this approach is quite dirty in itself and will fail someday, as soon as somebody alters the source page's HTML layout. Not to mention browser compatibility, as different browsers treat things differently.
With the recent and continuing revolution going on around the web, things are getting easier, with sites exposing their data through APIs or at least through Microdata/Microformats.
Dude, the chances of cons are more.. but it's good for reducing the burden of parsing the junk HTML by filtering it beforehand..
Don't you think at times it's easier to google the API (search and download) rather than coding the filter.. isn't it??
I said, when there is no API.
Of course, when you have an API you must use it, as it is the best option.