When working with UTF-8-encoded text from an untrusted source, such as a web form, it’s a good idea to fix any invalid byte sequences as early as possible, so that they don’t break later processing steps that depend on valid input.
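To make “breaking later processing” concrete, here is a minimal sketch. It assumes Ruby 2.0 or later, where source files default to UTF-8, and the sample string is a hypothetical stand-in for untrusted input. One stray byte leaves the string tagged as UTF-8 but invalid. Nothing complains until a later step, here a regular-expression match, actually scans the characters:

```ruby
# Hypothetical untrusted input: 0xFF can never appear in valid UTF-8.
# The literal keeps the source file's encoding (UTF-8 by default in Ruby 2.0+).
input = "caf\xFF"

puts input.encoding         # UTF-8 (the tag says UTF-8...)
puts input.valid_encoding?  # false (...but the bytes disagree)

begin
  # A later processing step: regexp matching walks the characters
  # and raises when it reaches the invalid byte.
  input =~ /f/
rescue ArgumentError => e
  puts "later step failed: #{e.message}"
end
```

Because the failure surfaces far from where the bad bytes entered, cleaning the string at the boundary keeps every downstream step from having to defend against it.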
For a long while, the Ruby idiom that I’ve been using and recommending to others is this: