I tried to use Hpricot to parse a page with special characters in a utf-8 encoding. The docs tell you to do this:
require 'rubygems'
require 'open-uri'
require 'hpricot'

doc = Hpricot(open("http://url/"))
However, this won’t give you the output you want. The open method from open-uri returns the body in whatever character set the page was served in; it does not transcode anything. If you want UTF-8, you need to convert it yourself with the iconv library:
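require 'rubygems'
require 'iconv'
require 'open-uri'
require 'hpricot'

f = open("http://url")
f.rewind  # make sure we read from the start of the response body
# f.read returns the whole body; readlines.join("\n") would double every line break
doc = Hpricot(Iconv.conv('utf-8', f.charset, f.read))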
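On Ruby 1.9 and later, String#encode can do the same kind of conversion without iconv. Here is a minimal sketch of that alternative, not the iconv approach above. The helper name to_utf8 is made up for illustration, and the hard-coded bytes and charset stand in for what f.read and f.charset would return:

```ruby
# Hypothetical helper (not part of open-uri or Hpricot): transcode raw page
# bytes to UTF-8 with String#encode instead of Iconv.
def to_utf8(raw, charset)
  # Label the bytes with the charset the server reported, then transcode.
  raw.dup.force_encoding(charset).encode('UTF-8')
end

# Stand-in for f.read: "café" as ISO-8859-1 bytes (0xE9 is "é" in Latin-1).
latin1_bytes = "caf\xE9".b

# Stand-in for f.charset.
utf8 = to_utf8(latin1_bytes, 'ISO-8859-1')

puts utf8.encoding
puts utf8.valid_encoding?
```

With a real page you would pass f.read and f.charset to the helper and hand the result to Hpricot, assuming the server reports its charset correctly.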





Thanks for this!
N.B. you’re missing a close parenthesis on the end of the last line.
Fantastic! Thanks so much. This solved my problem – on which I have been researching the whole day – within 5 seconds!
Thanks!
Thanks!
Now I wonder why open-uri doesn’t have a straightforward way of doing this.
Thanks so much! ;D
You are ma saviour. Thanks a lot.
Just wondering, what does the .rewind method do? Can’t really find it in the open-uri doc.
Thanks for posting this
#rewind moves the read position back to the beginning of the file, so the next read starts from the first line again. For more information check out the class docs here: http://ruby-doc.org/core/classes/IO.html#M002281
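To see what #rewind does, here is a small sketch using StringIO as a stand-in for the object that open returns (the sample text is made up):

```ruby
require 'stringio'

# Stand-in for the object returned by open("http://url").
io = StringIO.new("first line\nsecond line\n")

first_read = io.read   # reads everything and leaves the position at the end
at_end     = io.read   # nothing left to read, so this is an empty string

io.rewind              # move the read position back to the start

second_read = io.read  # the full content is available again
puts first_read == second_read
```

Without the rewind, a second read on an already-consumed stream gives you nothing back.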