Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
293 views
in Technique[技术] by (71.8m points)

java - Convert UTF-8 Unicode string to ASCII Unicode escaped String

I need to convert unicode string to string which have non-ascii characters encoded in unicode. For example, string "漢字 Max" should be presented as "u6F22u5B57 Max".

What I have tried:

  1. Differenct combinations of

    new String(sourceString.getBytes(encoding1), encoding2)

  2. Apache StringEscapeUtils which escapes also ascii chars like double-quote

    StringEscapeUtils.escapeJava(source)

Is there an easy way to encode such string? Ideally only Java 6 SE or Apache Commons should be used to achieve desired result.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

This is the kind of simple code Jon Skeet had in mind in his comment:

final String in = "????asdf";
final StringBuilder out = new StringBuilder();
for (int i = 0; i < in.length(); i++) {
  final char ch = in.charAt(i);
  if (ch <= 127) out.append(ch);
  else out.append("\u").append(String.format("%04x", (int)ch));
}
System.out.println(out.toString());

As Jon said, surrogate pairs will be represented as a pair of u escapes.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...