po-ru.com: Fixing invalid UTF-8 in Ruby, revisited
-
require 'iconv'
# Convert UTF-8 to UTF-8; //IGNORE asks iconv to drop bytes it cannot convert.
ic = Iconv.new('UTF-8//IGNORE', 'UTF-8')
valid_string = ic.iconv(untrusted_string)
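Iconv was dropped from Ruby's standard library in 2.0 and now lives on as the iconv gem, so on a current Ruby the same kind of cleanup might be sketched with String#scrub, which has been available since Ruby 2.1. This is a swapped-in technique, not the idiom from the post, and its handling of edge cases may differ from Iconv's. The helper name fix_invalid_utf8 and the sample bytes are made up for illustration.

```ruby
# Hypothetical helper (not from the original post): returns a copy of the
# input with invalid UTF-8 byte sequences removed.
def fix_invalid_utf8(untrusted_string)
  # Work on a copy tagged as UTF-8 so the caller's string is left untouched.
  utf8 = untrusted_string.dup.force_encoding(Encoding::UTF_8)
  # scrub('') replaces each invalid sequence with the empty string,
  # which is similar in spirit to iconv's //IGNORE.
  utf8.scrub('')
end

# Sample input: valid "café", then a stray 0xFF byte that is never valid UTF-8.
untrusted_string = "caf\xC3\xA9 \xFF ok".b
valid_string = fix_invalid_utf8(untrusted_string)
```

Calling scrub with no argument would instead substitute U+FFFD for each bad sequence, which keeps a visible marker where data was lost; which behavior is preferable depends on what the later processing steps expect.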
-
When working with UTF-8-encoded text from an untrusted source like a web form, it’s a good idea to fix any invalid byte sequences at the first stage, to avoid breaking later processing steps that depend on valid input. For a long while, the Ruby idiom that I’ve used and recommended to others has been this: