Suppose you have a string like "€foo\xA0", encoded as UTF-8. Is there a way to remove invalid byte sequences from this string (so you get "€foo")?
In Ruby 1.8 you could use Iconv.iconv('UTF-8//IGNORE', 'UTF-8', "€foo\xA0"), but that is now deprecated. "€foo\xA0".encode('UTF-8') doesn't do anything, since the string is already UTF-8. I tried:
"€foo\xA0".force_encoding('BINARY').encode('UTF-8', :undef => :replace, :replace => '')
which yields
"foo"
But that also loses the valid multibyte character €.
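To make the problem concrete, here is a minimal sketch that reproduces the attempt above and contrasts it with one possible approach. It assumes Ruby 2.1 or newer, where String#scrub exists; that method is not available in 1.9. The helper name strip_invalid_utf8 is made up for illustration and is not part of any library.

```ruby
# Reproduces the problem from the question, then shows one possible fix.
# Assumption: Ruby 2.1 or newer, where String#scrub is available.

input = "€foo\xA0" # tagged as UTF-8, but the trailing 0xA0 byte is invalid

# The attempt from the question: when converting from BINARY, every byte
# >= 0x80 is undefined, so the three bytes of "€" are dropped as well.
lossy = input.dup.force_encoding('BINARY')
             .encode('UTF-8', undef: :replace, replace: '')

# Hypothetical helper (the name is illustrative only): replace each invalid
# byte sequence with an empty string and leave valid characters untouched.
def strip_invalid_utf8(str)
  str.scrub('')
end

cleaned = strip_invalid_utf8(input)

puts input.valid_encoding? # false
puts lossy                 # foo
puts cleaned               # €foo
```

The BINARY round trip cannot tell valid multibyte characters from stray bytes, because it treats every non-ASCII byte the same way. A fix has to validate the bytes as UTF-8 and drop only the sequences that fail, which is what scrub does here.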