pipes

Yahoo! Pipes, Caching and robots.txt

Every time I create or use a pipe, I'm indirectly causing hits on some third party's website.  So I was curious to learn how the Yahoo! Pipes backend behaves.  What caching does it do?  Is it a good, well-behaved web citizen in general?

There isn't much official documentation to go on.  The Pipes Troubleshooting guide has some notes on how to stop Pipes from downloading a feed too frequently and how to stop Pipes from using feeds at all.

So I put myself in the shoes of a third-party website that the pipes hit, to find out more.  ...read more »

Yahoo! Pipes Tutorial - An example using the Fetch Page module to make a web scraper

Yahoo! recently released a new Fetch Page module which dramatically increases the number of useful things that Pipes can do.  With this new "pipe input" module we're no longer restricted to working with well-organised data sets in supported formats such as CSV, RSS, Atom, XML, JSON, iCal or KML.  Now we can grab any HTML page we like and use the power of the Regex module to slice and dice the raw text into shape.

In a nutshell, the Fetch Page module turns Yahoo! Pipes into a fully fledged web scraping IDE!
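For readers who want to see the same fetch-then-regex idea outside Pipes, here is a minimal Python sketch.  It is not how Pipes works internally; the function names, the sample markup and the `<h2 class="title">` pattern are all assumptions made up for illustration.

```python
import re
from typing import List
from urllib.request import urlopen


def fetch_page(url: str) -> str:
    """Download a page and return its text (the rough job of a page-fetching step)."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Assumption: each item of interest sits in an <h2 class="title"> element.
TITLE_RE = re.compile(r'<h2 class="title">\s*(.*?)\s*</h2>', re.DOTALL)


def extract_titles(html: str) -> List[str]:
    """Slice the raw HTML with a regex, then strip any nested tags."""
    return [re.sub(r"<[^>]+>", "", match).strip() for match in TITLE_RE.findall(html)]


# Made-up sample page so the sketch runs without touching the network.
SAMPLE_HTML = """
<html><body>
<h2 class="title">First <em>story</em></h2>
<p>Some text.</p>
<h2 class="title">
  Second story
</h2>
</body></html>
"""

if __name__ == "__main__":
    print(extract_titles(SAMPLE_HTML))
```

To run it against a live page, pass the result of `fetch_page(url)` to `extract_titles` and adjust the pattern to that page's markup.  Regex scraping of this kind is brittle and will need updating whenever the page layout changes.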

As it happens, I already have a web scraping project which has been broken for some time now.  I don't have the energy to check out the hacky old PHP scrapers and debug the problem.  But with Yahoo! Pipes and the Fetch Page module to hand, I can throw away my PHP scripts and their associated libraries, delete the cron jobs and free my overloaded webserver from the onerous responsibility.  Time to get cracking.  ...read more »

Geotagging Traffic Jams with Yahoo Pipes

The Highways Agency has been publishing traffic information as RSS feeds for a while now, but they contain too much noise to be useful for me. My daily commute is just 28 miles to work and back, and I certainly don't cover the entire East of England, so a lot of the alerts are of no interest to me.

With Yahoo Pipes, I can filter out the noise to leave just the incidents that I might be interested in. To start with, here's a simple pipe to cut down the alerts to just those with "A14" in the title: A14 traffic news.  ...read more »
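The same title filter can be sketched in a few lines of Python for comparison.  This is only an illustration of the idea, not the pipe itself: the function name and the sample feed below are made up, and a real feed would be downloaded rather than embedded.

```python
import xml.etree.ElementTree as ET
from typing import List

# Made-up sample feed; the real alerts would come from the published RSS.
SAMPLE_RSS = """<rss version="2.0"><channel>
<title>Sample traffic alerts</title>
<item><title>A14 eastbound: congestion</title></item>
<item><title>M11 northbound: roadworks</title></item>
<item><title>A140 southbound: accident</title></item>
</channel></rss>"""


def titles_containing(rss_text: str, needle: str) -> List[str]:
    """Return the titles of feed items whose title contains `needle`."""
    root = ET.fromstring(rss_text)
    matches = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        if needle in title:
            matches.append(title)
    return matches


if __name__ == "__main__":
    for title in titles_containing(SAMPLE_RSS, "A14"):
        print(title)
```

Note that a plain "contains" test also lets "A140" through, so a stricter match (for example a regex with a word boundary) may be needed to keep only the A14 itself.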
