Ruby Code to Scrape Images from TwitPic URLs

Twitpic logo

I’ve been trying to find an easy way to re-grab my TwitPic images back off TwitPic to be re-syndicated on a communal family website I am trying to create.  I found a great site which had some Rails code that scraped the image, so I took the code and modified it to make it compatible for pure Ruby (too be honest it didn’t need much modifying at all) .

Here’s my modified snippet of code that I’ve been using to grab the image from Twitpic with the Hpricot gem:

require 'rubygems'
require 'net/http'
require 'hpricot'

def rip_twitpic(url)
  begin
    code=url.match(/[w]+$/).to_s
    unless code.empty?
      uri=URI.parse(url)
      resp=Net::HTTP.get_response(uri)
      html=Hpricot(resp.body)
      html.at("#photo-display")['src']
    end
  rescue Exception => e
    puts "Error extracting twitpic: #{e}"
    url
  end
end

Just as it appears, this method will return the URL of the image embedded on the page that the TwitPic URL points too.

This will form the core part of a little project which will allow you to scrape TwitPic images and send them to a server of your choosing. I’ll try to release this ASAP but in the meantime, I thought a few of you might find this useful. I know I do.