Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


In Unicode, why are there two representations for the Arabic digits?

I was reading about Arabic in Unicode on Wikipedia (Arabic Unicode) and I see that each of the Arabic digits has two Unicode code points. For example, 1 is defined as both U+0661 and U+06F1.

Which one should I use?


He who fights dragons too long becomes a dragon himself; gaze too long into the abyss, and the abyss gazes back…

1 Answer


According to the code charts, U+0660 .. U+0669 are ARABIC-INDIC DIGIT values 0 through 9, while U+06F0 .. U+06F9 are EXTENDED ARABIC-INDIC DIGIT values 0 through 9.
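Python's standard `unicodedata` module reads the same Unicode Character Database, so it can confirm these names and numeric values directly. A minimal sketch; `describe` is a hypothetical helper written only for this illustration:

```python
import unicodedata


def describe(ch: str) -> str:
    # Format one character as "U+XXXX NAME (value N)" using the
    # character database bundled with Python.
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} (value {unicodedata.digit(ch)})"


# Show both code points for each digit value 0-9, side by side.
for value in range(10):
    arabic_indic = chr(0x0660 + value)  # ARABIC-INDIC DIGIT ...
    extended = chr(0x06F0 + value)      # EXTENDED ARABIC-INDIC DIGIT ...
    print(describe(arabic_indic), "|", describe(extended))
```

Both characters for a given value report the same numeric value; only the name (and, in a suitable font, the glyph) differs.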

In the Unicode 3.0 book (5.2 is the current version, but these things don't change much once set), the U+066n series of glyphs are marked 'Arabic-Indic digits' and the U+06Fn series of glyphs are marked 'Eastern Arabic-Indic digits (Persian and Urdu)'. It also notes:

  • U+06F4 - 'different glyphs in Persian and Urdu'
  • U+06F5 - 'Persian and Urdu share glyph different from Arabic'
  • U+06F6 - 'Persian glyph different from Arabic'
  • U+06F7 - 'Urdu glyph different from Arabic'

For comparison:

  • U+066n: ٠١٢٣٤٥٦٧٨٩
  • U+06Fn: ۰۱۲۳۴۵۶۷۸۹

Or, enlarged by making the information into a title:

U+066n: ٠١٢٣٤٥٦٧٨٩

U+06Fn: ۰۱۲۳۴۵۶۷۸۹

Or:

     U+066n    U+06Fn
0      ٠         ۰
1      ١         ۱
2      ٢         ۲
3      ٣         ۳
4      ٤         ۴
5      ٥         ۵
6      ٦         ۶
7      ٧         ۷
8      ٨         ۸
9      ٩         ۹
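Because the two columns run in parallel (U+06F0 − U+0660 = 0x90), converting text from one series to the other is a character-for-character substitution. A minimal sketch, assuming you want to change only the digits and leave every other character alone; `to_extended` and `to_arabic_indic` are illustrative names, not a library API:

```python
# Build the two digit strings from their block starts; each has 10 characters
# in digit order 0..9, so position i holds the digit with value i.
ARABIC_INDIC = "".join(chr(0x0660 + d) for d in range(10))  # U+0660..U+0669
EXTENDED = "".join(chr(0x06F0 + d) for d in range(10))      # U+06F0..U+06F9

# str.maketrans pairs the two strings position by position.
TO_EXTENDED = str.maketrans(ARABIC_INDIC, EXTENDED)
TO_ARABIC_INDIC = str.maketrans(EXTENDED, ARABIC_INDIC)


def to_extended(text: str) -> str:
    """Rewrite U+0660..U+0669 as U+06F0..U+06F9; other characters are untouched."""
    return text.translate(TO_EXTENDED)


def to_arabic_indic(text: str) -> str:
    """Rewrite U+06F0..U+06F9 as U+0660..U+0669; other characters are untouched."""
    return text.translate(TO_ARABIC_INDIC)
```

This only changes code points; which glyph the reader actually sees for each one still depends on the font, as noted below.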

(Whether you can see any of those, and how clearly they are differentiated, may depend on your browser and the fonts installed on your machine as much as anything else. I can see the difference on 4 and 6 clearly; 5 looks much the same in both.)

Based on this information, if you are working with Arabic from the Middle East, use the U+066n series of digits; if you are working with Persian or Urdu, use the U+06Fn series. A Unicode application, however, should accept either set of code points as valid digits (though you might look askance at a sequence that mixes the two sets, or you might just leave well alone).
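In Python, accepting either set comes almost for free: `int()` accepts any Unicode decimal digit (general category Nd), including both series. A minimal sketch; `mixes_series` and `read_number` are hypothetical helpers, and the choice to merely report mixing rather than reject it mirrors the "leave well alone" option above:

```python
from typing import Tuple


def mixes_series(text: str) -> bool:
    """Return True if text contains digits from both U+066n and U+06Fn."""
    has_arabic_indic = any("\u0660" <= ch <= "\u0669" for ch in text)
    has_extended = any("\u06F0" <= ch <= "\u06F9" for ch in text)
    return has_arabic_indic and has_extended


def read_number(text: str) -> Tuple[int, bool]:
    """Parse digits from either series and report whether the two were mixed.

    int() converts every Unicode decimal digit itself, so no lookup
    table is needed; it raises ValueError for non-digit input.
    """
    return int(text), mixes_series(text)
```

A stricter application could raise an error when `mixes_series` returns True instead of just reporting it.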

