Hpricot and UTF-8

I tried to use Hpricot to parse a page containing special (non-ASCII) characters, and I wanted the result in UTF-8. The docs tell you to do this:

require 'rubygems'
require 'open-uri'
require 'hpricot'
 
doc = Hpricot(open("http://url/"))

However, this won’t give you the output you want. Open-URI’s open method returns the page body as raw bytes in whatever character set the server sent it in; it does not convert anything. To get UTF-8, you need to convert it yourself with the iconv library:

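require 'rubygems'
require 'iconv'
require 'open-uri'
require 'hpricot'
 
f = open("http://url")
# f.charset is the charset parameter from the Content-Type header.
# Use f.read: readlines keeps each line's "\n", so readlines.join("\n")
# would double every line break in the document.
doc = Hpricot(Iconv.conv('utf-8', f.charset, f.read))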
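Iconv was deprecated in Ruby 1.9.3 and dropped from the standard library in Ruby 2.0 (it survives as a separate gem). On newer Rubies, String#encode can do the same conversion. This is a minimal sketch assuming you already have the body bytes and the charset name in hand; `to_utf8` is a hypothetical helper, and the ISO-8859-1 fallback for a missing charset is my assumption, not something Open-URI guarantees.

```ruby
# Minimal sketch: re-encode a page body to UTF-8 with String#encode
# (Ruby 2.0+), as an alternative to Iconv.conv.

# Hypothetical helper. In the Open-URI example, `charset` would be
# f.charset and `body` would be f.read.
def to_utf8(body, charset)
  charset ||= 'iso-8859-1' # assumption: fallback when no charset is declared
  # Tag the raw bytes with the declared charset, then transcode to UTF-8.
  body.dup.force_encoding(charset).encode('UTF-8')
end

latin1_body = "caf\xE9".b # "café" as ISO-8859-1 bytes (0xE9 is "é")
utf8 = to_utf8(latin1_body, 'iso-8859-1')

puts utf8                # café
puts utf8.encoding       # UTF-8
puts utf8.bytes.inspect  # [99, 97, 102, 195, 169]
```

With Open-URI and Hpricot, the last line of the example above would become something like `doc = Hpricot(to_utf8(f.read, f.charset))`.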
