Yahoo! recently released a new Fetch Page module which dramatically increases the number of useful things that Pipes can do. With this new "pipe input" module we're no longer restricted to working with well-organised data sets in supported formats such as CSV, RSS, Atom, XML, JSON, iCal or KML. Now we can grab any HTML page we like and use the power of the Regex module to slice and dice the raw text into shape.
In a nutshell, the Fetch Page module turns Yahoo! Pipes into a fully fledged web scraping IDE!
As it happens, I already have a web scraping project which has been broken for some time now. I don't have the energy to check out the hacky old PHP scrapers and debug the problem. But with Yahoo! Pipes and the Fetch Page module to hand, I can throw away my PHP scripts and their associated libraries, delete the cron jobs and free my overloaded webserver from the onerous responsibility. Time to get cracking. ...read more »