What does it mean?
From W3C:
In Unicode it is possible to produce
the same text with different sequences
of characters. For example, take the
Hungarian word világ. The fourth
letter could be stored in memory as a
precomposed U+00E1 LATIN SMALL LETTER A WITH ACUTE (a single
character) or as a decomposed
sequence of U+0061 LATIN SMALL LETTER
A followed by U+0301 COMBINING ACUTE
ACCENT (two characters).
világ = világ
The Unicode Standard allows either of
these alternatives, but requires that
both be treated as identical. To
improve efficiency, an application
will usually normalize text before
performing searches or comparisons.
Normalization, in this case, means
converting the text to use all
precomposed or all decomposed
characters.
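As a minimal sketch of the point in the quote, the snippet below builds both spellings of "világ" by hand and compares them before and after normalization. The variable names are only illustrative, and it assumes Python 3 with the standard unicodedata module.

```python
import unicodedata

# Two spellings of the same word (variable names are illustrative only).
precomposed = "vil\u00e1g"   # á stored as the single code point U+00E1
decomposed = "vila\u0301g"   # a (U+0061) followed by COMBINING ACUTE ACCENT (U+0301)

# A plain comparison sees different code point sequences.
raw_equal = precomposed == decomposed

# After normalizing both sides to the same form, they compare equal.
nfc_equal = (
    unicodedata.normalize("NFC", precomposed)
    == unicodedata.normalize("NFC", decomposed)
)

print(raw_equal, nfc_equal)
print(len(precomposed), len(decomposed))
```

Normalizing both operands to the same form (NFC here, but NFD would work equally well for the comparison) is what makes the equality check reliable.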
There are four normalization forms
specified by the Unicode Standard:
NFC, NFD, NFKC and NFKD. The C stands
for (pre-)composed, and the D for
decomposed. The K stands for
compatibility. To improve
interoperability, the W3C recommends
the use of NFC normalized text on
the Web.
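To make the difference between the four forms concrete, here is a small sketch that runs one sample string through each of them. The sample (a precomposed é followed by the "fi" ligature U+FB01) is my own choice for illustration and is not taken from the W3C text.

```python
import unicodedata

# Illustrative sample: U+00E9 (é, precomposed) + U+FB01 (fi ligature,
# which only has a compatibility decomposition to "f" + "i").
sample = "\u00e9\ufb01"

forms = {
    name: unicodedata.normalize(name, sample)
    for name in ("NFC", "NFD", "NFKC", "NFKD")
}

for name, text in forms.items():
    # Show the code points so the differences are visible.
    print(name, [f"U+{ord(ch):04X}" for ch in text])
```

The C/D forms only compose or decompose the accent and leave the ligature alone, while the K forms also replace the ligature with plain "f" and "i". That replacement loses information, which is one reason NFC rather than NFKC is the usual choice for Web content.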
Besides improving interoperability, precomposed text usually looks better than decomposed text.
How can I fix this with free tools?
By using the function equivalent to Python's text = unicodedata.normalize('NFC', text)
in your favorite programming language.
(Or, if you weren't planning to write a program, your question should be moved to Super User or Webmasters.)
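If you do go the programming route, one possible shape for a small fix-up script is sketched below. The function name normalize_file_to_nfc is hypothetical, and the sketch assumes the file is UTF-8 encoded and small enough to read into memory; adjust the encoding argument if yours differs.

```python
import unicodedata
from pathlib import Path


def normalize_file_to_nfc(path, encoding="utf-8"):
    """Rewrite the file at `path` in NFC; return True if it was changed.

    Hypothetical helper for illustration. Assumes the whole file fits
    in memory and is stored in the given encoding.
    """
    file = Path(path)
    original = file.read_text(encoding=encoding)
    normalized = unicodedata.normalize("NFC", original)

    changed = normalized != original
    if changed:
        # Only touch the file when normalization actually altered it.
        file.write_text(normalized, encoding=encoding)
    return changed
```

Calling normalize_file_to_nfc("page.html") (the file name is just an example) would rewrite the file in place, so keep a backup or run it on a copy first.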