Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
553 views
in Technique[技术] by (71.8m points)

encoding - Is there a way in ruby 1.9 to remove invalid byte sequences from strings?

Suppose you have a string like "€fooxA0", encoded UTF-8, Is there a way to remove invalid byte sequences from this string? ( so you get "€foo" )

In ruby-1.8 you could use Iconv.iconv('UTF-8//IGNORE', 'UTF-8', "€fooxA0") but that is now deprecated. "€fooxA0".encode('UTF-8') doesn't do anything, since it is already UTF-8. I tried:

"€fooxA0".force_encoding('BINARY').encode('UTF-8', :undef => :replace, :replace => '')

which yields

"foo"

But that also loses the valid multibyte character €

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)
"€fooxA0".encode('UTF-16le', invalid: :replace, replace: '').encode('UTF-8')

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...