Corrigendum #5: Normalization Idempotency
Corrigendum | Corrigendum #5: Normalization Idempotency
Effective Date | 2005-Feb-07 [102-C3, PRI #61, PRI #29]
Applicable Versions | 3.0.0 to 4.0.1
Fixed Version | 4.1.0 2005-March
Result Documented In | UAX #15
Background
The specification language in UAX #15: Unicode Normalization Forms
(citing Version 4.0) for forms NFC and NFKC is not logically self-consistent
in The Unicode Standard, Versions 3.0 through 4.0.1. Programs that depend
on such logical consistency could be subject to security problems until
fixed, although as yet no realistic scenarios are known that would present
such problems. The problem text occurs in Definition D2, which defines what
it means for a character to be blocked. This corrigendum provides a textual
fix for this problem.
The change will not have an impact on real data found in practice (with the possible
exception of test cases for the algorithm itself), because the affected sequences do not
constitute well-formed text in any known language.
For more background information, see
Public Review Issue #29, Normalization Issue.
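The idempotency property at stake can be spot-checked with a minimal sketch like the one below. It assumes Python's standard unicodedata module as the normalizer; the helper name is_idempotent and the sample sequence (a Hangul leading consonant, a combining grave accent, and a Hangul vowel, one of the kinds of sequence discussed in Public Review Issue #29) are illustrative choices, not part of the corrigendum.

```python
import unicodedata


def is_idempotent(text, form="NFC"):
    """Return True if normalizing twice gives the same result as normalizing once."""
    once = unicodedata.normalize(form, text)
    twice = unicodedata.normalize(form, once)
    return once == twice


# Illustrative sample (assumption): starter, non-starter (ccc 230), starter.
sample = "\u1100\u0300\u1161"

if __name__ == "__main__":
    for form in ("NFC", "NFKC"):
        print(form, is_idempotent(sample, form))
```

A check of this kind only exercises whichever normalizer is installed; it does not by itself show which wording of D2 that normalizer follows.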
Changes to the Text of UAX #15
Whenever this corrigendum is applied to a version of Unicode from Unicode
3.0.0 to Unicode 4.0.1, the text for definition D2 in
UAX #15 is changed by adding two words ("or higher"),
so that it has the following wording:
D2. In any character sequence beginning with a starter S, a character C is
blocked from S if and only if there is some character B between S and C, and either B
is a starter or it has the same or higher combining class as C.
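The effect of the wording change in D2 can be illustrated with a short sketch. It is not taken from UAX #15: the function name is_blocked and the corrected flag are hypothetical, and canonical combining classes are read from Python's unicodedata.combining. With corrected=False the test uses "the same combining class" only; with corrected=True it uses "the same or higher combining class".

```python
import unicodedata


def is_blocked(seq, c_index, corrected=True):
    """Check whether seq[c_index] (C) is blocked from seq[0] (S) under D2.

    seq[0] is assumed to be a starter (combining class 0).
    corrected=True applies the "same or higher" wording of D2;
    corrected=False applies the earlier "same" wording.
    """
    if unicodedata.combining(seq[0]) != 0:
        raise ValueError("seq[0] must be a starter")
    ccc_c = unicodedata.combining(seq[c_index])
    # Examine every character B strictly between S and C.
    for b in seq[1:c_index]:
        ccc_b = unicodedata.combining(b)
        if ccc_b == 0:
            return True  # B is a starter
        if corrected and ccc_b >= ccc_c:
            return True  # same or higher combining class
        if not corrected and ccc_b == ccc_c:
            return True  # same combining class only
    return False


# Illustrative sequence (assumption): U+1100, U+0300 (ccc 230), U+1161 (ccc 0).
seq = "\u1100\u0300\u1161"

if __name__ == "__main__":
    print(is_blocked(seq, 2, corrected=False))
    print(is_blocked(seq, 2, corrected=True))
```

For this sequence the two wordings disagree: the intervening mark is not a starter and does not share the combining class of the final character, so only the "or higher" clause blocks the final character from the first.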
Explanatory text on the implications of this corrigendum for
implementations can be found in
UAX #15: Unicode Normalization Forms in Section 3.3,
Guaranteeing Process Stability, and Section 20,
Corrigendum 5 Sequences.