This page contains the definitive listing of all errata of record
since the publication of The Unicode Standard, Version 5.0 and
considered resolved by the release of Unicode Version 5.1. These
errata are listed by date in the table below. For prior errata
resolved in Unicode 5.0 and earlier, see
Errata Fixed in Unicode 5.0.
For errata still pending subsequent to the release of Unicode
5.1.0, see the list of current
Updates and Errata.
Date |
Summary |
2008-March-07 |
The representative glyph for U+1D81 in the Unicode 5.0 chart has an
extraneous line running from the lower right to upper left side
of the glyph. It is most visible at high resolutions. The
incorrect glyph is shown on the left, and a corrected glyph on
the right.
|
2007-November-20 |
The representative glyph for U+1E9A LATIN SMALL LETTER A WITH
RIGHT HALF RING in Unicode 2.0 has the ring well to the right.
The representative glyph in Unicode 3.0 and later incorrectly
had the right half ring over the base letter. Below are shown
the incorrect glyph on the left, and the corrected glyph on the
right:
|
2007-August-23 |
In the code charts for Unicode Version 5.0, the glyphs for U+0333
and U+0347 are incorrect. The glyph for U+0333 should be longer.
The glyph for U+0347 should be shorter. The glyphs were not
merely swapped: the correct glyph for U+0333 should be longer
than the incorrect glyph for U+0347. Below are shown incorrect
glyphs on the left and corrected glyphs on the right:
|
2007-July-30 |
In the code charts for Unicode Versions 5.0 and earlier,
the representative glyphs for U+0460 and U+047E are shown
with "broad omega" shaped glyphs. These are being corrected
to show "W"-shaped glyphs for the uppercase letters, matching
the shapes of their lowercase counterparts. The incorrect glyphs
are shown on the left; the corrected glyphs are shown on
the right.
|
2007-June-07 |
In the 5.0 code charts, the names for U+075E and U+075F are
correct, but the glyphs should be swapped. |
2007-April-19 |
In the code charts for Unicode Versions 5.0 and earlier,
the representative glyphs for U+0478 and U+0479 are shown
in an Old Church Slavonic (OCS) style typeface. The decision
to encode a monograph uk character for OCS has made that style
choice inappropriate for these characters. The incorrect glyphs
are shown on the left; the corrected glyphs are shown on
the right.
|
2007-April-12 |
In the code charts for Unicode Versions 5.0 and earlier, the
lower bar on the glyph for U+2626 ORTHODOX CROSS is slanted
downward in the wrong direction. The incorrect glyph is shown on
the left; the corrected glyph is shown on the right.
|
2007-March-14 |
In UAX #15, Unicode Normalization Forms, for Unicode 5.0, there is an erroneous statement in the last paragraph of Section 14, Detecting Normalization Forms. The text currently states:
"...that no string when decomposed with NFD expands to more than 3x in length (measured in code units)."
That text should be corrected to state:
"...that no string when normalized to NFC expands to more than 3x in length (measured in code units)."
|
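The corrected bound in the 2007-March-14 erratum (NFC rather than NFD) can be spot-checked with a short script. The following is a minimal sketch, assuming Python's standard unicodedata module, whose data reflects the Unicode version shipped with the interpreter rather than Version 5.0. The function name nfc_expansion and the sample characters are illustrative choices, and checking isolated code points does not prove the bound for arbitrary strings.

```python
import unicodedata

# Bytes per code unit for each encoding form (little-endian variants avoid a BOM).
UNIT_SIZE = {"utf-8": 1, "utf-16-le": 2, "utf-32-le": 4}


def nfc_expansion(text, encoding):
    """Return (code units after NFC) / (code units before NFC)."""
    size = UNIT_SIZE[encoding]
    before = len(text.encode(encoding)) // size
    after = len(unicodedata.normalize("NFC", text).encode(encoding)) // size
    return after / before


# Illustrative samples: two composition-excluded characters that stay
# decomposed under NFC, and one character that is already in NFC.
samples = ["\uFB2C", "\U0001D160", "\u00E9"]

for ch in samples:
    for enc in UNIT_SIZE:
        print(f"U+{ord(ch):04X} {enc}: {nfc_expansion(ch, enc):.1f}x")
```

Measuring in code units rather than code points matters here: the same character can expand by a different factor in UTF-8 than in UTF-16, which is why the sketch reports each encoding form separately.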
2007-February-14 |
In the code charts for Unicode Versions 5.0 and earlier, the
representative glyphs for U+047C and U+047D represent an
incorrect understanding of the nature of the character that was
encoded ("beautiful omega"). The incorrect glyphs are shown on
the left; the corrected glyphs are shown on the right.
|
2007-February-02 |
The sample code in Section 7 of UAX #14 does not handle leading
spaces correctly. Adding the following code before the loop
provides a fix:
// treat SP at start of input as if it followed WJ
if (cls == SP)
    cls = WJ;
|
2007-January-25 |
In the file DerivedCoreProperties.txt in the Version 5.0 Unicode Character Database, the stated rule in the comments for the generation of the Default_Ignorable_Code_Point property is incomplete. The rule should include all characters with the Variation_Selector property, so that the complete statement of the rule is:
Other_Default_Ignorable_Code_Point + Cf + Cc + Cs
+ Noncharacter_Code_Point + Variation_Selector - White_Space
- FFF9..FFFB (Annotation Characters)
The actual listing of characters in the data file with the Default_Ignorable_Code_Point
property is correct.
Note that the stated rule was further updated for Version 5.1
of the standard, so the correction in this erratum notice applies
only to the Version 5.0 data file. |
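The corrected Version 5.0 rule in the 2007-January-25 erratum is plain set arithmetic, which the sketch below spells out. It assumes the caller supplies each contributing property as a set of code points; the function name default_ignorable and the tiny sample sets are hypothetical and are not the full property listings from the Unicode Character Database.

```python
def default_ignorable(odi, cf, cc, cs, nonchar, variation_selectors, white_space):
    """Apply the corrected 5.0 rule to caller-supplied sets of code points.

    odi is Other_Default_Ignorable_Code_Point; cf, cc, and cs are the
    General_Category values Cf, Cc, and Cs.
    """
    # FFF9..FFFB (Annotation Characters) are explicitly excluded by the rule.
    annotation = set(range(0xFFF9, 0xFFFB + 1))
    included = odi | cf | cc | cs | nonchar | variation_selectors
    return included - white_space - annotation


# Illustrative excerpts only; real input would come from the UCD data files.
result = default_ignorable(
    odi={0x034F},                  # COMBINING GRAPHEME JOINER
    cf={0x00AD, 0xFFF9},           # SOFT HYPHEN, INTERLINEAR ANNOTATION ANCHOR
    cc={0x0000, 0x0009},           # NULL, CHARACTER TABULATION
    cs={0xD800},                   # first surrogate code point
    nonchar={0xFFFE},
    variation_selectors={0xFE00},  # VARIATION SELECTOR-1
    white_space={0x0009, 0x0020},
)
print(sorted(f"U+{cp:04X}" for cp in result))
```

In this toy input, U+0009 drops out because it is White_Space, U+FFF9 drops out as an annotation character, and U+FE00 is included only because of the Variation_Selector term that the erratum adds.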
2007-January-22 |
The code point U+00A0 was supposed to have the Sentence_Break property value Sp in the Unicode Character Database for Version 5.0, but that change was overlooked in the updating of SentenceBreakProperty.txt. This will be corrected in a subsequent version of the standard.
|
2006-September-11 |
In the code charts for Unicode Version 5.0, the representative glyph for
U+1031 was imaged on the wrong side
of the dotted circle. The incorrect glyph is shown on the left; the
corrected glyph is shown on the right.
|
2006-September-10 |
The Index.txt file in Version 5.0.0 of the Unicode
Character Database is not valid UTF-8. The following
substitutions will fix the file:
Replace byte 0x92 in line 74 with U+00FC [ü] LATIN SMALL LETTER U WITH DIAERESIS.
Replace byte 0xE1 in lines 854 and 1549 with a space. |
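The Index.txt substitutions described in the 2006-September-10 erratum can be applied mechanically. The following is a minimal sketch that locates lines that fail to decode as UTF-8 and then patches specific bytes on specific lines. The helper names invalid_utf8_lines and repair are hypothetical, and the sample bytes are synthetic stand-ins, not the contents of Index.txt.

```python
def invalid_utf8_lines(data):
    """Return the 1-based numbers of lines that do not decode as UTF-8."""
    bad = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(number)
    return bad


def repair(data, fixes):
    """Apply (line_number, bad_byte, replacement_bytes) fixes to data."""
    lines = data.split(b"\n")
    for number, bad_byte, replacement in fixes:
        lines[number - 1] = lines[number - 1].replace(bytes([bad_byte]), replacement)
    return b"\n".join(lines)


# Synthetic three-line sample containing the same two stray bytes.
sample = b"first line\nM\x92ller\nthird\xe1line\n"
fixes = [
    (2, 0x92, "\u00fc".encode("utf-8")),  # stray 0x92 becomes U+00FC
    (3, 0xE1, b" "),                      # stray 0xE1 becomes a space
]

print("invalid before:", invalid_utf8_lines(sample))
fixed = repair(sample, fixes)
print("invalid after:", invalid_utf8_lines(fixed))
```

For the actual file, the fixes list would name line 74 for byte 0x92 and lines 854 and 1549 for byte 0xE1, as stated in the erratum. The file should be read and written in binary mode so that the invalid bytes survive until they are replaced.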