Unicode 9.0 Web Bookmarks About this page This page contains hyperlinks to The Unicode Standard, Version 9.0. The Unicode 9.0.0 page lists the contents with links to each PDF file. Preface Why Unicode? What’s New? Support for Languages and Symbol Sets Property and Behavioral Updates Detailed Change Information Organization of This Standard Concepts, Architecture, Conformance, and Guidelines Character Block Descriptions Code Charts Appendices References and Index Glossary and Character Index Unicode Standard Annexes The Unicode Character Database Unicode Code Charts Unicode Technical Standards and Unicode Technical Reports Updates and Errata Acknowledgements 1 Introduction Figure 1-1. Wide ASCII 1.1 Coverage Standards Coverage New Characters 1.2 Design Goals Figure 1-2. Unicode Compared to the 2022 Framework 1.3 Text Handling Characters and Glyphs Text Elements 2 General Structure 2.1 Architectural Context Basic Text Processes Text Elements, Characters, and Text Processes Figure 2-1. Text Elements and Characters Text Processes and Encoding Character Identity 2.2 Unicode Design Principles Table 2-1. The 10 Unicode Design Principles Universality Efficiency Characters, Not Glyphs Figure 2-2. Characters Versus Glyphs Table 2-2. User-Perceived Characters with Multiple Code Points Figure 2-3. Unicode Character Code to Rendered Glyphs Semantics Plain Text Logical Order Figure 2-4. Bidirectional Ordering Figure 2-5. Writing Direction and Numbers Unification Figure 2-6. Typeface Variation for the Bone Character Dynamic Composition Figure 2-7. Dynamic Composition Equivalent Sequences Stability Convertibility 2.3 Compatibility Characters Usage Allocation Compatibility Variants Compatibility Decomposable Characters Compatibility Character Versus Compatibility Decomposable Character 2.4 Code Points and Characters Figure 2-8. Abstract and Encoded Characters Types of Code Points Table 2-3. 
Types of Code Points Control Codes Noncharacters Private Use Surrogates Restricted Interchange Code Point Semantics 2.5 Encoding Forms Non-overlap Figure 2-9. Overlap in Legacy Mixed-Width Encodings Figure 2-10. Boundaries and Interpretation Conformance Examples Figure 2-11. Unicode Encoding Forms UTF-32 Fixed Width Preferred Usage UTF-16 Optimized for BMP Supplementary Characters and Surrogates Preferred Usage Origin Collation UTF-8 Byte-Oriented Variable Width ASCII Transparency Preferred Usage Self-synchronizing Comparison of the Advantages of UTF-32, UTF-16, and UTF-8 UTF-32 Versus UTF-16 Characters Versus Code Points UTF-8 Binary Sorting 2.6 Encoding Schemes Byte Order Table 2-4. The Seven Unicode Encoding Schemes Encoding Scheme Versus Encoding Form Examples Figure 2-12. Unicode Encoding Schemes 2.7 Unicode Strings 2.8 Unicode Allocation Planes Basic Multilingual Plane Supplementary Multilingual Plane Supplementary Ideographic Plane Supplementary Special-purpose Plane Private Use Planes Allocation Areas and Character Blocks Allocation Areas Blocks Allocation Order Assignment of Code Points 2.9 Details of Allocation Figure 2-13. Unicode Allocation Plane 0 (BMP) Figure 2-14. Allocation on the BMP ASCII and Latin-1 Compatibility Area General Scripts Area Punctuation and Symbols Area Supplementary General Scripts Area CJK Miscellaneous Area CJKV Ideographs Area General Scripts Area (Asia and Africa) Hangul Area Surrogates Area Private Use Area Compatibility and Specials Area Plane 1 (SMP) Figure 2-15. Allocation on Plane 1 General Scripts Areas General Scripts Areas (RTL) Cuneiform and Hieroglyphic Area Ideographic Scripts Area Symbols Areas Plane 2 (SIP) Other Planes 2.10 Writing Direction Figure 2-16. Writing Directions Bidirectional Vertical Boustrophedon Other Historical Directionalities 2.11 Combining Characters Combining Characters Diacritics Symbol Diacritics Enclosing Combining Marks Figure 2-17. 
Combining Enclosing Marks for Symbols Script-Specific Combining Characters Sequence of Base Characters and Diacritics Figure 2-18. Sequence of Base Characters and Diacritics Ordering Indic Vowel Signs Figure 2-19. Reordered Indic Vowel Signs Properties Figure 2-20. Properties and Combining Character Sequences Multiple Combining Characters Figure 2-21. Stacking Sequences Table 2-5. Interaction of Combining Characters Table 2-6. Nondefault Stacking Ligated Multiple Base Characters Figure 2-22. Ligated Multiple Base Characters Exhibiting Nonspacing Marks in Isolation “Characters” and Grapheme Clusters 2.12 Equivalent Sequences Figure 2-23. Equivalent Sequences Normalization Figure 2-24. Canonical Ordering Decompositions Types of Decomposables Examples Figure 2-25. Types of Decomposables Non-decomposition of Certain Diacritics Overlaid and Attached Diacritics Other Diacritics Character Names and Decomposition Simulated Decomposition in Processing Security Issue 2.13 Special Characters Special Noncharacter Code Points Byte Order Mark (BOM) Unicode Signature Layout and Format Control Characters The Replacement Character Control Codes 2.14 Conforming to the Unicode Standard Characteristics of Conformant Implementations Unacceptable Behavior Acceptable Behavior Supported Subsets 3 Conformance 3.1 Versions of the Unicode Standard Stability Version Numbering Major and Minor Versions Update Version Scheduling of Versions Errata and Corrigenda Errata Corrigenda References to the Unicode Standard Precision in Version Citation References to Unicode Character Properties References to Unicode Algorithms 3.2 Conformance Requirements Code Points Unassigned to Abstract Characters Interpretation Modification Character Encoding Forms Character Encoding Schemes Bidirectional Text Normalization Forms Normative References Unicode Algorithms Default Casing Algorithms Unicode Standard Annexes 3.3 Semantics Definitions Character Identity and Semantics 3.4 Characters and Encoding Table 3-1. 
Named Unicode Algorithms 3.5 Properties Types of Properties Property Values Default Property Values Classification of Properties by Their Values Property Status Table 3-2. Normative Character Properties Table 3-3. Informative Character Properties Context Dependence Stability of Properties Simple and Derived Properties Property Aliases Private Use 3.6 Combination Combining Character Sequences Grapheme Clusters Application of Combining Marks Figure 3-1. Enclosing Marks Combining Marks and Korean Syllables 3.7 Decomposition Compatibility Decomposition Canonical Decomposition 3.8 Surrogates 3.9 Unicode Encoding Forms Table 3-4. Examples of Unicode Encoding Forms UTF-32 UTF-16 Table 3-5. UTF-16 Bit Distribution UTF-8 Table 3-6. UTF-8 Bit Distribution Table 3-7. Well-Formed UTF-8 Byte Sequences Encoding Form Conversion Constraints on Conversion Processes Best Practices for Using U+FFFD Table 3-8. Use of U+FFFD in UTF-8 Conversion 3.10 Unicode Encoding Schemes Table 3-9. Summary of UTF-16BE, UTF-16LE, and UTF-16 Table 3-10. Summary of UTF-32BE, UTF-32LE, and UTF-32 3.11 Normalization Forms Normalization Stability Combining Classes Specification of Unicode Normalization Forms Starters Table 3-11. Combining Marks and Starter Status Canonical Ordering Algorithm Table 3-12. Reorderable Pairs Canonical Composition Algorithm Definition of Normalization Forms 3.12 Conjoining Jamo Behavior Definitions Hangul Syllable Decomposition Table 3-13. 
Hangul Characters Used in Examples Common Constants Syllable Index Arithmetic Decomposition Mapping Full Canonical Decomposition Example Hangul Syllable Composition Arithmetic Primary Composite Mapping Example Hangul Syllable Name Generation Full Canonical Decomposition Jamo Short Name Mapping Name Concatenation Example Sample Code for Hangul Algorithms Common Constants Hangul Decomposition Hangul Composition Hangul Character Name Generation Additional Transformations for Hangul Jamo 3.13 Default Case Algorithms Tailoring Definitions Table 3-14. Context Specification for Casing Default Case Conversion Default Case Folding Default Case Detection Table 3-15. Case Detection Examples Default Caseless Matching 4 Character Properties Status and Attributes Consistency of Properties 4.1 Unicode Character Database Unihan Database Stability Aliases UCD in XML Online Availability 4.2 Case Definitions of Case and Casing Table 4-1. Relationship of Casing Definitions Table 4-2. Case Function Values for Strings Case Mapping Table 4-3. Sources for Case Mapping Information 4.3 Combining Classes Figure 4-1. Positions of Common Combining Marks Reordrant, Split, and Subjoined Combining Marks Reordrant Class Zero Combining Marks Table 4-4. Class Zero Combining Marks—Reordrant Table 4-5. Thai, Lao, and Other Logical Order Exceptions Split Class Zero Combining Marks Table 4-6. Class Zero Combining Marks—Split Subjoined Class Zero Combining Marks Table 4-7. Class Zero Combining Marks—Subjoined Strikethrough Class Zero Combining Marks Table 4-8. Class Zero Combining Marks—Strikethrough 4.4 Directionality 4.5 General Category Table 4-9. General Category 4.6 Numeric Value Decimal Digits Script-Specific Digits Ideographic Numeric Values Table 4-10. Primary Numeric Ideographs Table 4-11. 
Ideographs Used as Accounting Numbers 4.7 Bidi Mirrored Related Properties 4.8 Name Stability Character Name Syntax Names as Identifiers Character Name Matching Named Character Sequences Character Name Aliases Table 4-12. Types of Character Name Aliases Unicode Name Property Formal Definition of the Name Property Table 4-13. Name Derivation Rule Prefix Strings Name Uniqueness Interpretation of Field 1 of UnicodeData.txt Control Codes Code Point Labels Table 4-14. Construction of Code Point Labels Use of Character Names in APIs and User Interfaces Use in APIs User Interfaces 4.9 Unicode 1.0 Names 4.10 Letters, Alphabetic, and Ideographic Letters and Syllables Alphabetic Ideographic 4.11 Properties Related to Text Boundaries 4.12 Characters with Unusual Properties Table 4-15. Unusual Properties 5 Implementation Guidelines 5.1 Data Structures for Character Conversion Issues Multistage Tables Flat Tables. Ranges Two-Stage Tables Figure 5-1. Two-Stage Tables Optimized Two-Stage Table Multistage Table Tuning 5.2 Programming Languages and Data Types Unicode Data Types for C ANSI/ISO C wchar_t 5.3 Unknown and Missing Characters Reserved and Private-Use Character Codes Interpretable but Unrenderable Characters Default Ignorable Code Points Interacting with Downlevel Systems 5.4 Handling Surrogate Pairs in UTF-16 Strategies for Surrogate Pair Support 5.5 Handling Numbers 5.6 Normalization Alternative Spellings Normalization Figure 5-2. Normalization 5.7 Compression 5.8 Newline Guidelines Definitions Table 5-1. Hex Values for Acronyms Encoding Notation EBCDIC Newline Function Table 5-2. 
NLF Platform Correlations Line Separator and Paragraph Separator Recommendations Converting from Other Character Code Sets Interpreting Characters in Text Converting to Other Character Code Sets Input and Output Page Separator 5.9 Regular Expressions 5.10 Language Information in Plain Text Requirements for Language Tagging Language Tags and Han Unification Typical Scenarios 5.11 Editing and Selection Consistent Text Elements Cluster Boundaries Figure 5-3. Consistent Character Boundaries Stacked Boundaries Atomic Character Boundaries. Linear Boundaries Nonlinear Boundaries 5.12 Strategies for Handling Nonspacing Marks Rendering Other Processes Keyboard Input Figure 5-4. Dead Keys Versus Handwriting Sequence Truncation Figure 5-5. Truncating Grapheme Clusters 5.13 Rendering Nonspacing Marks Figure 5-6. Inside-Out Rule Fallback Rendering Figure 5-7. Fallback Rendering Bidirectional Positioning Figure 5-8. Bidirectional Placement Justification Figure 5-9. Justification Canonical Equivalence Table 5-3. Typing Order Differing from Canonical Order Table 5-4. Permuting Combining Class Weights Positioning Methods Positioning with Ligatures Figure 5-10. Positioning with Ligatures Positioning with Contextual Forms Figure 5-11. Positioning with Contextual Forms Positioning with Enhanced Kerning Figure 5-12. Positioning with Enhanced Kerning 5.14 Locating Text Element Boundaries 5.15 Identifiers 5.16 Sorting and Searching Culturally Expected Sorting and Searching Language-Insensitive Sorting Searching Sublinear Searching Figure 5-13. Sublinear Searching 5.17 Binary Order UTF-8 in UTF-16 Order UTF-16 in UTF-8 Order 5.18 Case Mappings Titlecasing Complications for Case Mapping Change in Length Greek iota subscript Context-dependent Case Mappings Locale-dependent Case Mappings Figure 5-14. Uppercase Mapping for Turkish I Figure 5-15. Lowercase Mapping for Turkish I Caseless Characters German sharp s Figure 5-16. 
Casing of German Sharp S Reversibility Caseless Matching Stability Normalization and Casing Table 5-5. Casing and Normalization in Strings 5.19 Mapping Compatibility Variants Confusables 5.20 Unicode Security Alternate Encodings Spoofing 5.21 Ignoring Characters in Processing Characters Ignored in Text Segmentation Characters Ignored in Line Breaking Characters Ignored in Cursive Joining Characters Ignored in Identifiers Characters Ignored in Searching and Sorting Characters Ignored for Display Normal Rendering Fallback Rendering Default Ignorable Code Point 5.22 Best Practice for U+FFFD Substitution 6 Writing Systems and Punctuation Scripts and Blocks Scripts and Writing Systems Punctuation 6.1 Writing Systems Alphabets Abjads Syllabaries Abugidas Figure 6-1. Overriding Inherent Vowels Logosyllabaries Typology of Scripts in the Unicode Standard Table 6-1. Typology of Scripts in the Unicode Standard Notational Systems 6.2 General Punctuation Use and Interpretation Rendering Writing Direction Figure 6-2. Forms of CJK Punctuation Layout Controls Encoding Characters with Multiple Semantic Values Blocks Devoted to Punctuation Format Control Characters Space Characters Table 6-2. Unicode Space Characters No-Break Space Narrow No-Break Space Dashes and Hyphens Table 6-3. Unicode Dash Characters Soft Hyphen Tilde. Dictionary Abbreviation Symbols Paired Punctuation Mirroring of Paired Punctuation. Quotation Marks and Brackets Language-Based Usage of Quotation Marks European Usage Figure 6-3. European Quotation Marks Glyph Variation in Curly Quotes Table 6-4. Models of Visual Relationship between Quote Glyphs East Asian Usage Table 6-5. East Asian Quotation Marks Glyph Variation in East Asian Usage. Figure 6-4. Asian Quotation Marks Table 6-6. 
Opening and Closing Forms Overloaded Character Codes Consequences for Semantics Apostrophes Letter Apostrophe Punctuation Apostrophe Other Punctuation Hyphenation Point Word Separator Middle Dot Fraction Slash Spacing Overscores and Underscores Doubled Punctuation Period or Full Stop Ellipsis Vertical Ellipsis Leader Dots Other Basic Latin Punctuation Marks Canonical Equivalence Issues for Greek Punctuation Bullets Paragraph Marks Numeric Separators. Obelus Commercial Minus At Sign Table 6-7. Names for the @ Archaic Punctuation and Editorial Marks Archaic Punctuation Editorial Marks New Testament Editorial Marks Ancient Greek Editorial Marks Figure 6-5. Examples of Ancient Greek Editorial Marks Figure 6-6. Use of Greek Paragraphos Double Oblique Hyphen Indic Punctuation Dandas Table 6-8. Unicode Danda Characters CJK Punctuation Figure 6-7. CJK Parentheses Wave Dash Sesame Dots Unknown or Unavailable Ideographs CJK Compatibility Forms Vertical Forms Styled Overscores and Underscores Small Form Variants Fullwidth and Halfwidth Variants 7 Europe-I 7.1 Latin Languages Diacritical Marks. Alternative Glyphs. Figure 7-1. Alternative Glyphs in Latin Variations in Diacritical Marks Table 7-1. Preferred Rendering of Cedilla versus Comma Below Latvian Cedilla Cedilla and Comma Below in Turkish and Romanian Exceptional Case Pairs Diacritics on i and j Figure 7-2. Diacritics on i and j Vietnamese Figure 7-3. Vietnamese Letters and Tone Marks Standards. 
Related Characters Letters of Basic Latin: U+0041–U+007A Letters of the Latin-1 Supplement: U+00C0–U+00FF Languages Ordinals Latin Extended-A: U+0100–U+017F Compatibility Digraphs Languages Latin Extended-B: U+0180–U+024F Arrangement Croatian Digraphs Matching Serbian Cyrillic Letters Pinyin Diacritic–Vowel Combinations Case Pairs Caseless Letters Glottal Stop IPA Extensions: U+0250–U+02AF Standards Unifications IPA Alternates Case Pairs Typographic Variants Affricate Digraph Ligatures Arrangement Phonetic Extensions: U+1D00–U+1DBF Typographic Features of the UPA. Other Phonetic Extensions Digraph for th Latin Extended Additional: U+1E00–U+1EFF Capital Sharp S Vietnamese Vowel Plus Tone Mark Combinations Latin Extended-C: U+2C60–U+2C7F Uyghur Claudian Letters Latin Extended-D: U+A720–U+A7FF Egyptological Transliteration Historic Mayan Letters European Medievalist Letters Insular and Celticist Letters Orthographic Letter Additions Sinological Dot Latvian Letters Ancient Roman Epigraphic Letters Latin Extended-E: U+AB30–U+AB6F Latin Ligatures: U+FB00–U+FB06 7.2 Greek Greek: U+0370–U+03FF Standards Polytonic Greek Nonspacing Marks Table 7-2. Nonspacing Marks Used with Greek Iota Variant Letterforms Figure 7-4. Variations in Greek Capital Letter Upsilon Representative Glyphs for Greek Phi Greek Letters as Symbols Symbols Versus Numbers Compatibility Punctuation Historic Letters Coptic-Unique Letters Related Characters Greek Extended: U+1F00–U+1FFF Spacing Diacritics Table 7-3. Greek Spacing and Nonspacing Pairs Ancient Greek Numbers: U+10140–U+1018F Acrophonic Numerals Other Numerical Symbols Symbol for Zero 7.3 Coptic Development of the Coptic Script Casing Font Styles Characters for Cryptogrammic Use Crossed Shei Supralineation Combining Diacritical Marks Punctuation Numerical Use of Letters Figure 7-5. 
Coptic Numerals 7.4 Cyrillic Structure Historic Letterforms Glagolitic Cyrillic: U+0400–U+04FF Standards Extended Cyrillic Abkhasian Palochka Broad Omega Digraph Onik and Monograph Uk Palatalization Combining Titlo Cyrillic Supplement: U+0500–U+052F Komi Kurdish Letters Cyrillic Extended-A: U+2DE0–U+2DFF Titlo Letters Figure 7-6. Combination of Titlo Letters Cyrillic Extended-B: U+A640–U+A69F Numeric Enclosing Signs Titlo Letters Old Abkhasian Letters 7.5 Glagolitic Glyph Forms Ordering Punctuation and Diacritics Numerical Use of Letters 7.6 Armenian Orthography User Community Punctuation Preferred Characters Ligatures 7.7 Georgian Script Forms Case Forms Figure 7-7. Georgian Scripts and Casing Mtavruli Style Punctuation Historic Punctuation 7.8 Modifier Letters Case and Modifier Letters General Category Blocks Character Names Spacing Modifier Letters: U+02B0–U+02FF Phonetic Usage Encoding Principles Superscript Letters Spacing Clones of Diacritics Rhotic Hook Tone Letters Figure 7-8. Tone Letters Modifier Tone Letters: U+A700–U+A71F 7.9 Combining Marks Sequence of Base Letters and Combining Marks Multiple Semantics Glyphic Variation Overlaid Diacritics Marks as Spacing Characters Spacing Clones of Diacritical Marks Relationship to ISO/IEC 8859-1 Diacritics Positioned Over Two Base Characters Figure 7-9. Double Diacritics Figure 7-10. Positioning of Double Diacritics Figure 7-11. Use of CGJ with Double Diacritics Diacritics Positioned Over Three or More Base Characters Subtending Marks Combining Marks with Ligatures Figure 7-12. Interaction of Combining Marks with Ligatures Combining Diacritical Marks: U+0300–U+036F Standards Underlining and Overlining Combining Diacritical Marks Extended: U+1AB0–U+1AFF Combining Parentheses Figure 7-13. Positioning of Combining Parentheses Combining Diacritical Marks Supplement: U+1DC0–U+1DFF Combining Marks for Symbols: U+20D0–U+20FF Figure 7-14. 
Use of Vertical Line Overlay for Negation Enclosing Marks Combining Half Marks: U+FE20–U+FE2F Figure 7-15. Double Diacritics and Half Marks Combining Marks in Other Blocks 8 Europe-II 8.1 Linear A Encoding Structure Character Names Directionality Numbers 8.2 Linear B Linear B Syllabary: U+10000–U+1007F Standards Linear B Ideograms: U+10080–U+100FF Aegean Numbers: U+10100–U+1013F 8.3 Cypriot Syllabary Table 8-1. Similar Characters in Linear B and Cypriot 8.4 Ancient Anatolian Alphabets Lycian: U+10280–U+1029F Carian: U+102A0–U+102DF Lydian: U+10920–U+1093F Lycian Carian Lydian 8.5 Old Italic Directionality. Punctuation. Numerals. Glyphs. Figure 8-1. Distribution of Old Italic 8.6 Runic The Runic Alphabet Direction Representative Glyphs Unifications Long-Branch and Short-Twig Staveless Runes Punctuation Marks Golden Numbers Encoding 8.7 Old Hungarian Structure Directionality Punctuation and Numbers 8.8 Gothic Diacritics. Numerals. Punctuation. 8.9 Elbasan Structure Accents and Other Marks Names Numerals and Punctuation 8.10 Caucasian Albanian Structure Abbreviations Numerals Punctuation 8.11 Old Permic Structure Combining Letters Combining Marks Table 8-2. Combining Marks Used in Old Permic Numerals Punctuation 8.12 Ogham Structure. Rendering. Forfeda (Supplementary Characters) 8.13 Shavian Structure. Collation 9 Middle East-I 9.1 Hebrew Hebrew: U+0590–U+05FF Directionality Cursive. Standards Vowels and Other Pronunciation Marks Shin and Sin Final (Contextual Variant) Letterforms Yiddish Digraphs Punctuation Cantillation Marks Positioning Meteg Atnah Hafukh and Qamats Qatan Holam Male and Holam Haser Puncta Extraordinaria Nun Hafukha Currency Symbol Alphabetic Presentation Forms: U+FB1D–U+FB4F Use of Wide Letters 9.2 Arabic Arabic: U+0600–U+06FF Figure 9-1. Directionality and Cursive Connection Directionality Standards Encoding Principles Punctuation The Non-joiner and the Joiner Figure 9-2. Using a Joiner Figure 9-3. Using a Non-joiner Figure 9-4. 
Combinations of Joiners and Non-joiners Tashkil Nonspacing Marks Figure 9-5. Placement of Harakat Arabic-Indic Digits Table 9-1. Arabic Digit Names Table 9-2. Glyph Variation in Eastern Arabic-Indic Digits Extended Arabic Letters Koranic Annotation Signs Additional Vowel Marks Honorifics Arabic Mathematical Symbols Date Separator Full Stop Currency Symbols Signs Spanning Numbers Figure 9-6. Arabic Year Sign Poetic Verse Sign Arabic Cursive Joining Minimum Rendering Requirements Joining Types Table 9-3. Primary Arabic Joining Types Table 9-4. Derived Arabic Joining Types Joining Rules Table 9-5. Arabic Glyph Types Arabic Ligatures Ligature Classes Table 9-6. Arabic Obligatory Ligature Joining Groups Ligature Rules Table 9-7. Arabic Ligature Notation Optional Features Arabic Joining Groups Dual-Joining Table 9-8. Dual-Joining Arabic Characters Right-Joining Table 9-9. Right-Joining Arabic Characters Letter heh Letter yeh Table 9-10. Forms of the Arabic Letter yeh Noon Ghunna Combining Hamza Above Table 9-11. Arabic Letters With Hamza Above Jawi Kurdish Arabic Supplement: U+0750–U+077F Marwari Arabic Extended-A: U+08A0–U+08FF Arabic Presentation Forms-A: U+FB50–U+FDFF Ornate Parentheses Nuktas Word Ligatures Arabic Presentation Forms-B: U+FE70–U+FEFF Spacing and Tatweel Forms of Arabic Diacritics Zero Width No-Break Space 9.3 Syriac Syriac: U+0700–U+074F Syriac Language Languages Using the Syriac Script. Shaping Directionality Syriac Type Styles Character Names Syriac Abbreviation Mark Figure 9-7. Syriac Abbreviation Figure 9-8. Use of SAM Ligatures and Combining Characters Diacritical Marks and Vowels Punctuation Digits Harklean Marks Dalath and Rish Semkath Vowel Marks Miscellaneous Diacritics. Table 9-12. Miscellaneous Syriac Diacritic Use Use of Characters of the Arabic Block Syriac Shaping Minimum Rendering Requirements Joining Types Table 9-13. Syriac Final Alaph Glyph Types Syriac Character Joining Groups Table 9-14. Dual-Joining Syriac Characters Table 9-15. 
Right-Joining Syriac Characters Table 9-16. Syriac Alaph Glyph Forms Ligature Classes Table 9-17. Syriac Ligatures 9.4 Samaritan Directionality Vowel Signs Consonant Modifiers Punctuation Table 9-18. Samaritan Performative Punctuation Marks 9.5 Mandaic Letter It Structure Punctuation Directionality Shaping and Layout Behavior Table 9-19. Dual-Joining Mandaic Characters Table 9-20. Right-Joining Mandaic Characters Line Breaking 10 Middle East-II 10.1 Old North Arabian Structure Ordering Numbers Punctuation 10.2 Old South Arabian Directionality Structure Segmentation Monograms Numbers Table 10-1. Old South Arabian Numeric Characters Table 10-2. Number Formation in Old South Arabian Character Names 10.3 Phoenician Directionality Punctuation Stylistic Variation Numerals Character Names 10.4 Imperial Aramaic Directionality Punctuation Numbers Table 10-3. Number Formation in Aramaic 10.5 Manichaean Directionality Structure Shaping Table 10-4. Dual-Joining Manichaean Letters Table 10-5. Right-Joining Manichaean Letters Table 10-6. Left-Joining Manichaean Letters Table 10-7. Non-Joining Manichaean Letters Table 10-8. Manichaean Ligatures Numbers Punctuation 10.6 Pahlavi and Parthian Inscriptional Parthian: U+10B40–U+10B5F Inscriptional Pahlavi: U+10B60–U+10B7F Directionality Shaping and Layout Behavior Table 10-9. Inscriptional Parthian Shaping Behavior Numbers Heterograms Psalter Pahlavi: U+10B80–U+10BAF Structure Numbers Punctuation 10.7 Avestan Directionality Shaping Behavior Table 10-10. Avestan Shaping Behavior Punctuation 10.8 Nabataean Structure Directionality Numerals Punctuation 10.9 Palmyrene Structure Directionality Numerals Symbols Punctuation 10.10 Hatran Structure Directionality Numerals Punctuation 11 Cuneiform and Hieroglyphs 11.1 Sumero-Akkadian Cuneiform: U+12000–U+123FF Early History of Cuneiform Geographic Range Table 11-1. 
Cuneiform Script Usage Sources and Coverage Simple Signs Complex and Compound Signs Mergers and Splits Fonts Glyph Variants Acquiring Independent Semantic Status Formatting Ordering Other Standards Cuneiform Numbers and Punctuation: U+12400–U+1247F Cuneiform Punctuation Cuneiform Numerals Early Dynastic Cuneiform: U+12480–U+1254F 11.2 Ugaritic Variant Glyphs Ordering. Character Names. 11.3 Old Persian Directionality Repertoire Numerals Variants 11.4 Egyptian Hieroglyphs Structure Directionality Rendering Table 11-2. Hieroglyphic Character Sequence Figure 11-1. Interpretation of Hieroglyphic Markup Hieratic Fonts Repertoire Character Names Sign Classification Enclosures Numerals 11.5 Meroitic Structure Directionality Shaping Punctuation Symbols Meroitic Cursive Numbers 11.6 Anatolian Hieroglyphs Structure Directionality Repertoire Annotations Punctuation Numbers Rendering 12 South and Central Asia-I 12.1 Devanagari Devanagari: U+0900–U+097F Standards Encoding Principles Principles of the Devanagari Script Rendering Devanagari Characters Consonant Letters Independent Vowel Letters Dependent Vowel Signs (Matras) Vowel Letters Table 12-1. Devanagari Vowel Letters Virama (Halant) Figure 12-1. Dead Consonants in Devanagari Consonant Conjuncts Figure 12-2. Conjunct Formations in Devanagari Explicit Virama (Halant) Figure 12-3. Preventing Conjunct Forms in Devanagari Explicit Half-Consonants Figure 12-4. Half-Consonants in Devanagari Figure 12-5. Independent Half-Forms in Devanagari Figure 12-6. Half-Consonants in Oriya Consonant Forms Figure 12-7. Consonant Forms in Devanagari and Oriya Rendering Devanagari Rules for Rendering Notation Dead Consonant Rule Consonant RA Rules Modifier Mark Rules Ligature Rules Memory Representation and Rendering Order Figure 12-8. Rendering Order in Devanagari Sample Half-Forms Table 12-2. Sample Devanagari Half-Forms Sample Ligatures Table 12-3. Sample Devanagari Ligatures Ligature Forms for Ra + Vocalic Liquids Table 12-4. 
RA + Vocalic Letter Ligature Forms Sample Half-Ligature Forms Table 12-5. Sample Devanagari Half-Ligature Forms Language-Specific Allographs Table 12-6. Marathi and Nepali Allographs Combining Marks Devanagari Digits, Punctuation, and Symbols Digits Punctuation Other Symbols Extensions in the Main Devanagari Block Sindhi Letters Konkani Bodo, Dogri, and Maithili Figure 12-9. Use of Apostrophe in Bodo, Dogri and Maithili Figure 12-10. Use of Avagraha in Dogri Kashmiri Letters Letters for Bihari Languages Table 12-7. Devanagari Vowels Used in Bihari Languages Prishthamatra Orthography Table 12-8. Prishthamatra Orthography Devanagari Extended: U+A8E0–U+A8FF Cantillation Marks for the Sāmaveda Nasalization Marks Editorial Marks Vedic Extensions: U+1CD0–U+1CFF Tone Marks Diacritics for the Visarga. Nasalization Marks Ardhavisarga 12.2 Bengali (Bangla) Virama (Hasant) Vowel Letters Table 12-9. Bengali Vowel Letters Table 12-10. Diphthong Vowel Letters in Kokborok Two-Part Vowel Signs Special Characters Historic Characters Characters for Assamese Table 12-11. Assamese Consonant-Vowel Combinations Rendering Behavior Consonant-Vowel Ligatures Table 12-12. Bengali Consonant-Vowel Combinations Figure 12-11. Requesting Bengali Consonant-Vowel Ligature Figure 12-12. Blocking Bengali Consonant-Vowel Ligature Khiya Khanda Ta. Figure 12-13. Bengali Syllable tta Ya-phalaa Interaction of Repha and Ya-phalaa Punctuation Truncation Table 12-13. Use of Apostrophe in Bangla 12.3 Gurmukhi Encoding Principles Vowel Letters Table 12-14. Gurmukhi Vowel Letters Tones Ordering Rendering Behavior Table 12-15. Gurmukhi Conjuncts Table 12-16. Additional Pairin and Addha Forms in Gurmukhi Table 12-17. Use of Joiners in Gurmukhi Other Symbols Punctuation 12.4 Gujarati Vowel Letters Table 12-18. Gujarati Vowel Letters Rendering Behavior Table 12-19. Gujarati Conjuncts Punctuation 12.5 Oriya (Odia) Special Characters Vowel Letters Table 12-20. Oriya Vowel Letters Rendering Behavior Table 12-21. 
Oriya Conjuncts Consonant Forms Vowels Table 12-22. Oriya Vowel Placement Oriya VA and WA. Punctuation and Symbols Table 12-23. Ligation for the Syllable om Fraction Characters 12.6 Tamil Tamil: U+0B80–U+0BFF Virama (Puḷḷi) Figure 12-14. Kssa Ligature in Tamil Rendering of the Tamil Script Tamil Vowels Independent Versus Dependent Vowels Left-Side Vowels Figure 12-15. Tamil Vowel Reordering Two-Part Vowels Figure 12-16. Tamil Two-Part Vowels Figure 12-17. Tamil Vowel Splitting and Reordering Figure 12-18. Vowel Reordering Around a Tamil Conjunct Tamil Ligatures Ligatures with Vowel i Figure 12-19. Tamil Ligatures with i Ligatures with Vowel u Table 12-24. Tamil Ligatures with u Figure 12-20. Spacing Forms of Tamil u Ligatures with ra Figure 12-21. Tamil Ligatures with ra Ligatures with aa in Traditional Tamil Orthography Figure 12-22. Traditional Tamil Ligatures with aa Figure 12-23. Traditional Tamil Ligatures with o Ligatures with ai in Traditional Tamil Orthography Figure 12-24. Traditional Tamil Ligatures with ai Figure 12-25. Vowel ai in Modern Tamil Tamil aytham Punctuation Numbers Use of Nukta Tamil Named Character Sequences Table 12-25. Tamil Vowels, Consonants, and Syllables 12.7 Telugu Vowel Letters Table 12-26. Telugu Vowel Letters Rendering Behavior Nakāra-Pollu Table 12-27. Rendering of Telugu na + virama Reph Special Characters Fractions Punctuation 12.8 Kannada Kannada: U+0C80–U+0CFF Principles of the Kannada Script Vowel Letters Table 12-28. Kannada Vowel Letters Consonant Conjuncts Special Characters Kannada Letter LLLA Figure 12-26. Indicating Retroflexion in Badaga Vowels Rendering Kannada Explicit Virama (Halant) Vowelless NA Table 12-29. Rendering of Kannada na + virama Consonant Clusters Involving RA Jihvamuliya and Upadhmaniya Modifier Mark Rules Avagraha Sign Punctuation 12.9 Malayalam Vowel Letters Table 12-30. Malayalam Vowel Letters Two-Part Vowels Historic Characters Malayalam Orthographic Reform Table 12-31. 
Malayalam Orthographic Reform Rendering Malayalam Candrakkala Table 12-32. Malayalam Conjuncts Table 12-33. Candrakkala Examples Explicit Candrakkala Requesting Traditional Ligatures Requesting Open Forms of Conjuncts Table 12-34. Use of Joiners in Malayalam Anusvara Dot Reph Chillu Forms Special Cases Involving rra Table 12-35. Malayalam /rara/ and /uua/ Table 12-36. Malayalam /nr/ and /nt/ Legacy Chillu Sequences Table 12-37. Atomic Encoding of Malayalam Chillus Malayalam Numbers and Punctuation Archaic Numbers Date Mark Punctuation 13 South and Central Asia-II 13.1 Thaana Directionality Vowels Table 13-1. Thaana Glyph Placement Numerals Punctuation Character Names and Arrangement 13.2 Sinhala Vowel Letters Table 13-2. Sinhala Vowel Letters Other Letters for Tamil. Punctuation Digits Sinhala Archaic Numbers: U+111E0–U+111FF 13.3 Newa Structure Vowels Virama and Conjuncts Murmured Resonant Consonants Table 13-3. Murmured Resonants in Nepal Bhasa Rendering Ligatures Digits Punctuation Other Signs 13.4 Tibetan General Principles of the Tibetan Script Figure 13-1. Tibetan Syllable Structure Consonants Vowels Coding Order Allographical Considerations Head Position “ra” Full-Form “ra” in Head Position. Subjoined Position “wa”, “ya”, and “ra” Halanta (Srog-Med). Line Breaking Considerations Tibetan Punctuation Svasti Signs Other Characters Tibetan Half-Numbers Tibetan Transliteration and Transcription of Other Languages Other Signs Traditional Text Formatting and Line Justification Figure 13-2. Justifying Tibetan Tseks Tibetan Shorthand Abbreviations (bskungs-yig) and Limitations of the Encoding 13.5 Mongolian History Directionality Encoding Principles Figure 13-3. Mongolian Glyph Convergence Cursive Joining Figure 13-4. Mongolian Consonant Ligation Figure 13-5. Mongolian Positional Forms Free Variation Selectors Figure 13-6. Mongolian Free Variation Selector Representative Glyphs Vowel Harmony Figure 13-7. 
Mongolian Gender Forms Narrow No-Break Space Mongolian Vowel Separator Figure 13-8. Mongolian Vowel Separator Baluda Numbers Punctuation Nirugu Syllable Boundary Marker 13.6 Limbu Consonants Vowels Vowel Length Glottalization Collating Order Glyph Placement Table 13-4. Positions of Limbu Combining Characters Punctuation Digits 13.7 Meetei Mayek Structure Vowel Letters Final Consonants Abbreviations Order Punctuation Digits 13.8 Mro Structure Character Names Digits Mro has a script-specific set of digits Punctuation 13.9 Warang Citi Structure Digits and Numbers Punctuation 13.10 Ol Chiki Structure Digits Punctuation Modifier Letters Glottalization Aspiration Ligatures 13.11 Chakma Independent Vowels Vowel Killer and Virama Chakma Fonts Punctuation Digits 13.12 Lepcha Structure Vowels Medials Retroflex Consonants Ordering of Syllable Components Table 13-5. Lepcha Syllabic Structure Rendering Digits Punctuation 13.13 Saurashtra Glyph Placement Digits Punctuation Saurashtra Consonant Sign Haaru 14 South and Central Asia-III 14.1 Brahmi Encoding Model Vowel Letters Table 14-1. Brahmi Vowel Letters Rendering Behavior Figure 14-1. Consonant Ligatures in Brahmi Vowel Modifiers Old Tamil Brahmi Bhattiprolu Brahmi Punctuation Numerals Table 14-2. Brahmi Positional Digits 14.2 Kharoshthi Kharoshthi: U+10A00–U+10A5F Figure 14-2. Geographical Extent of the Kharoshthi Script Directionality Diacritical Marks and Vowels Numerals Figure 14-3. Kharoshthi Number 1996 Punctuation Word Breaks, Line Breaks, and Hyphenation Sorting Rendering Kharoshthi Figure 14-4. Kharoshthi Rendering Example Combining Vowels Table 14-3. Kharoshthi Vowel Signs Combining Vowel Modifiers Table 14-4. Kharoshthi Vowel Modifiers Combining Consonant Modifiers Table 14-5. Kharoshthi Consonant Modifiers Virama Table 14-6. 
Examples of Kharoshthi Virama 14.3 Bhaiksuki Structure Rendering Virama and Conjuncts Various Signs Digits and Numbers Punctuation 14.4 Phags-pa History Basic Structure Syllable Division Candrabindu Figure 14-5. Phags-pa Syllable Om Alternate Letters Numbers Punctuation Positional Variants Table 14-7. Phags-pa Positional Forms of I, U, E, and O Mirrored Variants Table 14-8. Contextual Glyph Mirroring in Phags-pa Table 14-9. Phags-pa Standardized Variants Figure 14-6. Phags-pa Reversed Shaping Cursive Joining 14.5 Marchen Encoding Model Vowels and Consonants Other Signs Punctuation 14.6 Old Turkic Structure Directionality Punctuation 15 South and Central Asia-IV 15.1 Syloti Nagri Virama and Conjuncts Digits Punctuation Poetry Marks 15.2 Kaithi Standards Styles Rendering Behavior Vowel Letters Consonant Conjuncts Ruled Lines Nukta Punctuation Digits 15.3 Sharada Rendering Behavior Ruled Lines Virama Candrabindu and Avagraha Jihvamuliya and Upadhmaniya Punctuation Digits 15.4 Takri Vowel Letters Table 15-1. Takri Vowel Letters Consonant Conjuncts Nukta Headlines Punctuation Fractions 15.5 Siddham Nukta Vowels Virama and Conjuncts Figure 15-1. Siddham Consonant Cluster Head Marks Repetition Marks Section Signs Punctuation Table 15-2. Siddham Punctuation Characters 15.6 Mahajani Structure Digits Other Symbols Punctuation 15.7 Khojki Structure Punctuation Digits 15.8 Khudawadi Structure Vowel Letters Table 15-3. Khudawadi Vowel Letters Consonant Conjuncts Nasalization Nukta Table 15-4. Representation of Arabic Sounds in Khudawadi Punctuation Digits 15.9 Multani Structure Digits Punctuation 15.10 Tirhuta Structure Vowels Table 15-5. Tirhuta Vowel Letters Consonants Virama Nasalization Characters for Representing Sanskrit Nukta Punctuation Special Signs Numbers 15.11 Modi Structure Vowel Letters Table 15-6. Modi Vowel Letters Rendering Consonant Clusters Involving ra Figure 15-2. 
Modi Shaping for ra Punctuation and Word Boundaries Various Signs Numbers 15.12 Grantha Rendering Behavior Consonant Clusters Figure 15-3. Splitting Large Conjunct Stacks in Grantha Virama Table 15-7. Rendering of Explicit Virama Forms in Grantha Vowels Signs Cantillation Marks Table 15-8. Additional Svara Marks used in Grantha Punctuation Numbers 15.13 Ahom Structure Vowels Syllabic Structure Numerals Punctuation Variant Forms 15.14 Sora Sompeng Encoding Structure Character Names Punctuation Line Breaking 16 Southeast Asia 16.1 Thai Standards. Encoding Principles. Table 16-1. Glyph Positions in Thai Syllables Rendering of Thai Combining Marks Thai Punctuation Spacing Thai Transcription of Pali and Sanskrit Patani Malay 16.2 Lao Encoding Principles Punctuation Glyph Placement Table 16-2. Glyph Positions in Lao Syllables Additional Letters Rendering of Lao Combining Marks Lao Aspirated Nasals 16.3 Myanmar Myanmar: U+1000–U+109F Standards Encoding Principles Composite Characters Encoding Subranges Conjuncts Kinzi Medial Consonants Asat Contractions Great sa Tall aa Ordering of Syllable Components Table 16-3. Modern Burmese Syllabic Structure Spacing. Myanmar Extended-A: U+AA60–U+AA7F Khamti Shan Consonants Vowels Tones Table 16-4. Khamti Shan Tone Marks Digits Other Symbols Subjoined Characters Historical Khamti Shan Aiton and Phake Consonants Subjoined Consonants Vowels Ligatures Tones Myanmar Extended-B: U+A9E0–U+A9FF 16.4 Khmer Khmer: U+1780–U+17FF Principles of the Khmer Script Glottal Consonant Table 16-5. Independent Khmer Vowel Characters Subscript Consonants Subscript Independent Vowel Signs Consonant Registers Table 16-6. Two Registers of Khmer Consonants Encoding Principles Subscript Consonant Signs Table 16-7. Khmer Subscript Consonant Signs Dependent Vowel Signs Table 16-8. Khmer Composite Dependent Vowel Signs with Nikahit Independent Vowel Characters Subscript Independent Vowel Signs Table 16-9. 
Khmer Subscript Independent Vowel Signs Other Signs as Syllabic Components Ligatures Figure 16-1. Common Ligatures in Khmer Multiple Glyphs Figure 16-2. Common Multiple Forms in Khmer Characters Whose Use Is Discouraged Ordering of Syllable Components. Figure 16-3. Examples of Syllabic Order in Khmer Consonant Shifters Ligature Control Figure 16-4. Ligation in Muul Style in Khmer Spacing. Khmer Symbols: U+19E0–U+19FF Symbols 16.5 Tai Le Table 16-10. Tai Le Tone Marks Digits. Table 16-11. Myanmar Digits in Tai Le Punctuation. 16.6 New Tai Lue Structure Visual Order Two-Part Vowels Table 16-12. New Tai Lue Vowel Placement Final Consonants Tones Table 16-13. New Tai Lue Registers and Tones Digits 16.7 Tai Tham Consonants Independent Vowels Dependent Consonant Signs Dependent Vowel Signs Tone Marks Other Combining Marks Digits Punctuation Collating Order Line Breaking 16.8 Tai Viet Structure Visual Order Tone Classes and Tone Marks Final Consonants Symbols and Punctuation Table 16-14. Tai Viet Symbols and Punctuation Word Spacing Collating Order 16.9 Kayah Li Structure Vowels Tones Digits Punctuation 16.10 Cham Structure Independent Vowel Letters Consonants Ordering of Syllable Components Table 16-15. Cham Syllabic Structure Digits Punctuation Line Breaking 16.11 Pahawh Hmong Character Names Structure Figure 16-5. Pahawh Hmong Syllable Structure Vowels Consonants Combining Marks Punctuation and Other Symbols Digits and Numbers Logographs 16.12 Pau Cin Hau Structure Digits Punctuation 17 Indonesia and Oceania 17.1 Philippine Scripts Tagalog: U+1700–U+171F Hanunóo: U+1720–U+173F Buhid: U+1740–U+175F Tagbanwa: U+1760–U+177F Principles of the Philippine Scripts Consonant Letters. Independent Vowel Letters. Dependent Vowel Signs. Virama. Directionality. Rendering. Table 17-1. Hanunóo and Buhid Vowel Sign Combinations Punctuation. 17.2 Buginese Repertoire Structure Ligature Figure 17-1. Buginese Ligature Order Punctuation Numerals 17.3 Balinese Structure Table 17-2. 
Balinese Base Consonants and Conjunct Forms Table 17-3. Sasak Extensions for Balinese Behavior of ra Figure 17-2. Writing dharma in Balinese Behavior of ra repa Rendering Table 17-4. Balinese Consonant Clusters with u and u: Nukta Ordering Punctuation Hyphenation Musical Symbols Modre Symbols 17.4 Javanese Consonants Independent Vowels Dependent Vowels Figure 17-3. Representation of Javanese Two-Part Vowels Consonant Signs Rendering Digits Punctuation Reduplication Ordering of Syllable Components Line Breaking 17.5 Rejang Structure Rendering Ordering Digits Punctuation 17.6 Batak Structure Rendering Punctuation Line Breaking 17.7 Sundanese Sundanese: U+1B80–U+1BBF Structure Medials Final Consonants Combining Marks Historic Characters Additional Consonants Digits Punctuation Ordering Ordering of Syllable Components Table 17-5. Modern Sundanese Syllabic Structure Rendering Sundanese Supplement: U+1CC0–U+1CCF 18 East Asia 18.1 Han CJK Unified Ideographs Blocks Containing Han Ideographs Table 18-1. Blocks Containing Han Ideographs Table 18-2. Small Extensions to the URO IICore General Characteristics of Han Ideographs Table 18-3. Common Han Characters Terminology Distinguishing Han Character Usage Between Languages Figure 18-1. Han Spelling Figure 18-2. Semantic Context for Han Characters Simplified and Traditional Chinese Dialects and Early Forms of Chinese Sorting Han Ideographs. Character Glyphs Principles of Han Unification Three-Dimensional Conceptual Model Figure 18-3. Three-Dimensional Conceptual Model Unification Rules Figure 18-4. CJK Source Separation Table 18-4. Source Encoding for Sword Variants Figure 18-5. Not Cognates, Not Unified Abstract Shape Two-Level Classification Ideographic Component Structure Figure 18-6. Ideographic Component Structure Figure 18-7. The Most Superior Node of an Ideographic Component Ideograph Features Uniqueness or Unification Spatial Positioning Examples Table 18-5. Ideographs Not Unified Table 18-6. 
Ideographs Unified Han Ideograph Arrangement Table 18-7. Han Ideograph Arrangement Radical-Stroke Indices Mappings for Han Ideographs CJK Unified Ideographs Extension B: U+20000–U+2A6D6 CJK Unified Ideographs Extension C: U+2A700–U+2B734 CJK Unified Ideographs Extension D: U+2B740–U+2B81D CJK Unified Ideographs Extension E: U+2B820–U+2CEA1 CJK Compatibility Ideographs: U+F900–U+FAFF CJK Compatibility Supplement: U+2F800–U+2FA1D Kanbun: U+3190–U+319F Symbols Derived from Han Ideographs CJK and KangXi Radicals: U+2E80–U+2FD5 Standards. Semantics. CJK Additions from HKSCS and GB 18030 CJK Strokes: U+31C0–U+31EF 18.2 Ideographic Description Characters Applicability to Other Scripts Ideographic Description Sequences Figure 18-8. Using the Ideographic Description Characters Equivalence. Interaction with the Ideographic Variation Mark. Rendering. Character Boundaries. Standards. 18.3 Bopomofo Standards Mandarin Tone Marks Table 18-8. Mandarin Tone Marks Standard Mandarin Bopomofo Extended Bopomofo. Extended Bopomofo Tone Marks. Table 18-9. Minnan and Hakka Tone Marks Rendering of Bopomofo. 18.4 Hiragana and Katakana Hiragana: U+3040–U+309F Standards Combining Marks Iteration Marks Vertical Text Digraph Katakana: U+30A0–U+30FF Standards Punctuation-like Characters Vertical Text Digraph Katakana Phonetic Extensions: U+31F0–U+31FF Standards Kana Supplement U+1B000–U+1B0FF Figure 18-9. Japanese Historic Kana for e and ye 18.5 Halfwidth and Fullwidth Forms Unifications 18.6 Hangul Hangul Jamo: U+1100–U+11FF Hangul Jamo Extended-A: U+A960–U+A97F Hangul Jamo Extended-B: U+D7B0–U+D7FF Hangul Compatibility Jamo: U+3130–U+318F Standards Normalization Table 18-10. Separating Jamo Characters Hangul Syllables: U+AC00–U+D7A3 Standards Equivalence Hangul Syllable Composition Hangul Syllable Decomposition Hangul Syllable Name Hangul Syllable Representative Glyph Table 18-11. 
Line-Based Placement of Jungseong Collation 18.7 Yi Traditional Yi Script Standardized Yi Script Standards Naming Conventions and Order Yi Syllable Iteration Mark Punctuation Rendering Yi Radicals 18.8 Lisu Structure Tone Letters Table 18-12. Lisu Tone Letters Other Modifier Letters Digits and Separators Punctuation Table 18-13. Punctuation Adopted in Lisu Orthography Line Breaking Word Separation 18.9 Miao Encoding Principles Tone Marks Rendering of “wart” Ordering Digits Punctuation 18.10 Tangut Tangut: U+17000–U+187FF Structure Encoding Principles Character Names Punctuation Sources Sorting Stroke Order Tangut Components: U+18800–U+18AFF Repertoire Names Order Radical-Stroke Values 19 Africa 19.1 Ethiopic Ethiopic: U+1200–U+137F Basic and Extended Ethiopic. Encoding Principles. Variant Glyph Forms. Labialized Subseries. Table 19-1. Labialized Forms in Ethiopic -WAA Table 19-2. Labialized Forms in Ethiopic -WE Keyboard Input. Syllable Names. Encoding Order and Sorting. Word Separators. Section Mark Diacritical Marks. Numbers. Ethiopic Extensions 19.2 Osmanya Structure Ordering Character Names and Glyphs 19.3 Tifinagh History Source Standards Ordering Directionality Diacritical Marks. Contextual Shaping Figure 19-1. Tifinagh Contextual Shaping Bi-Consonants Figure 19-2. Tifinagh Consonant Joiner and Bi-consonants 19.4 N’Ko Character Names and Block Name Structure Diacritical Marks Table 19-3. N’Ko Diacritic Usage Table 19-4. N’Ko Tone Diacritics on Vowels Digits Ordinal Numbers Figure 19-3. Examples of N’Ko Ordinals Punctuation Ordering Rendering Table 19-5. N’Ko Letter Shaping 19.5 Vai Sources Basic Structure Historic Syllables Logograms Digits Punctuation Segmentation Ordering 19.6 Bamum Bamum: U+A6A0–U+A6FF Structure Diacritical Marks Punctuation Digits Bamum Supplement: U+16800–U+16A3F 19.7 Bassa Vah Structure Punctuation and Digits 19.8 Mende Kikakui Structure Directionality Numbers Table 19-6. 
Number Formation in Mende Kikakui 19.9 Adlam Structure Diacritical Marks Line Breaking Numbers Punctuation Cursive Joining 20 Americas 20.1 Cherokee Structure Casing Tones. Input Numbers. Punctuation. Standards. 20.2 Canadian Aboriginal Syllabics Canadian Aboriginal Syllabics: U+1400–U+167F Organization Arrangement Extensions Punctuation and Symbols Canadian Aboriginal Syllabics Extended: U+18B0–U+18FF 20.3 Osage Structure Casing Vowels Table 20-1. Combining Marks used in Osage Numbers and Punctuation 20.4 Deseret Letter Names and Shapes. Structure. Sorting. Typographic Conventions. Figure 20-1. Short Words Equivalent to Deseret Letter Names Phonetics. Table 20-2. IPA Transcription of Deseret 21 Notational Systems 21.1 Braille Example Usage Model. Imaging. Script 21.2 Western Musical Symbols Glyphs Symbols in Other Blocks Processing. Input Methods. Directionality. Figure 21-1. Examples of Specialized Music Layout Format Characters. Precomposed Note Characters. Figure 21-2. Precomposed Note Characters Alternative Noteheads. Figure 21-3. Alternative Noteheads Augmentation Dots and Articulation Symbols. Figure 21-4. Augmentation Dots and Articulation Symbols Ornamentation. Table 21-1. Examples of Ornamentation Gregorian Kievan 21.3 Byzantine Musical Symbols Processing. 21.4 Ancient Greek Musical Notation Unification Table 21-2. Representation of Ancient Greek Vocal and Instrumental Notation Naming Conventions Font Combining Marks 21.5 Duployan Structure Shorthand Format Controls: U+1BCA0–U+1BCAF 21.6 Sutton SignWriting Structure Repertoire Modifiers Punctuation 22 Symbols 22.1 Currency Symbols Unification Figure 22-1. Alternative Glyphs for Dollar Sign Fonts. Table 22-1. Currency Symbols Encoded in Other Blocks Lira Sign Dollar and Peso Yen and Yuan Euro Sign Indian Rupee Sign Turkish Lira Sign Ruble Sign Lari Sign Other Currency Symbols 22.2 Letterlike Symbols Letterlike Symbols: U+2100–U+214F Numero Sign Figure 22-2. 
Alternative Glyphs for Numero Sign Unit Symbols Compatibility Styles Standards Mathematical Alphanumeric Symbols: U+1D400–U+1D7FF Words Used as Variables. Mathematical Alphabets Basic Set of Alphanumeric Characters. Additional Characters. Dotless Characters Figure 22-3. Wide Mathematical Accents Semantic Distinctions. Figure 22-4. Style Variants and Semantic Distinctions in Mathematics Mathematical Alphabets. Table 22-2. Mathematical Alphanumeric Symbols Compatibility Decompositions. Fonts Used for Mathematical Alphabets Fraktur Math Italics Figure 22-5. Easily Confused Shapes for Mathematical Glyphs Hard-to-Distinguish Letters. Font Support for Combining Diacritics. Type Style for Script Characters. Double-Struck Characters. Arabic Mathematical Alphabetic Symbols: U+1EE00–U+1EEFF Shaping Large Operators Properties 22.3 Numerals Encoding Principles Decimal Digits Table 22-3. Script-Specific Decimal Digits Exceptions CJK Ideographs Used as Decimal Digits Figure 22-6. CJK Ideographic Numbers Other Digits Hexadecimal Digits Compatibility Digits Table 22-4. Compatibility Digits Parsing of Superscript and Subscript Digits Numeric Bullets Glyph Variants of Decimal Digits Figure 22-7. Regular and Old Style Digits Accounting Numbers Non-Decimal Radix Systems Ethiopic Numerals Cuneiform Numerals Other Ancient Numeral Systems Acrophonic Systems and Other Letter-based Numbers Roman Numerals Greek Numerals Coptic Epact Numbers: U+102E0–U+102FF Rumi Numeral Symbols: U+10E60–U+10E7E CJK Numerals CJK Ideographic Traditional Numerals Chinese Counting-Rod Numerals Suzhou-Style Numerals Fractions Figure 22-8. Alternate Forms of Vulgar Fractions Common Indic Number Forms: U+A830–U+A83F 22.4 Superscript and Subscript Symbols Superscripts and Subscripts: U+2070–U+209F Parsing of Superscript and Subscript Digits Standards Superscripts and Subscripts in Other Blocks 22.5 Mathematical Symbols Semantics. 
Mathematical Property Mathematical Operators: U+2200–U+22FF Standards Encoding Principles Unifications Disunifications Table 22-5. Mathematical Operators Disunified from Punctuation Greek-Derived Symbols N-ary Operators Invisible Operators Minus Sign Delimiters Bidirectional Layout Other Elements of Mathematical Notation Supplements to Mathematical Symbols and Arrows Standards. Supplemental Mathematical Operators: U+2A00–U+2AFF Miscellaneous Mathematical Symbols-A: U+27C0–U+27EF Mathematical Brackets. Long Division Fractional Slash and Other Diagonals Miscellaneous Mathematical Symbols-B: U+2980–U+29FF Wiggly Fence. Miscellaneous Symbols and Arrows: U+2B00–U+2B7F Arrows: U+2190–U+21FF Bidirectional Layout Standards Unifications Supplemental Arrows Long Arrows. Standardized Variants of Mathematical Symbols Change in Representative Glyphs for U+2278 and U+2279 22.6 Invisible Mathematical Operators Invisible Separator Invisible Multiplication Invisible Plus Invisible Function Application 22.7 Technical Symbols Control Pictures: U+2400–U+243F Code Points for Pictures for Control Codes Pictures for ASCII Space Standards Miscellaneous Technical: U+2300–U+23FF Keytop Labels. Floor and Ceiling Crops and Quine Corners Figure 22-9. Usage of Crops and Quine Corners Angle Brackets. APL Functional Symbols Symbol Pieces. Table 22-6. Use of Mathematical Symbol Pieces Horizontal Brackets Terminal Graphics Characters. Decimal Exponent Symbol Figure 22-10. Usage of the Decimal Exponent Symbol Dental Symbols. Metrical Symbols Electrotechnical Symbols User Interface Symbols Standards. Optical Character Recognition: U+2440–U+245F Standards 22.8 Geometrical Symbols Box Drawing and Block Elements Box Drawing Block Elements Standards Geometric Shapes: U+25A0–U+25FF Hatched Squares Lozenge Use in Mathematics Standards Geometric Shapes Extended: U+1F780–U+1F7FF Table 22-7. 
Geometric Shape Collections 22.9 Miscellaneous Symbols Rendering of Emoji Symbols Color Words in Unicode Character Names Miscellaneous Symbols and Pictographs Standards Weather Symbols Traffic Signs Dictionary and Map Symbols Plastic Bottle Material Code System. Recycling Symbol for Generic Materials. Universal Recycling Symbol. Paper Recycling Symbols. Gender Symbols Genealogical Symbols Game Symbols Animal Symbols Cultural Symbols Hand Symbols Emoji Modifiers Miscellaneous Symbols in Other Blocks Emoticons: U+1F600–U+1F64F Transport and Map Symbols: U+1F680–U+1F6FF Dingbats: U+2700–U+27BF Unifications and Additions. Ornamental Brackets. Ornamental Dingbats: U+1F650–U+1F67F Alchemical Symbols: U+1F700–U+1F77F Mahjong Tiles: U+1F000–U+1F02F Domino Tiles: U+1F030–U+1F09F Playing Cards: U+1F0A0–U+1F0FF Yijing Hexagram Symbols: U+4DC0–U+4DFF Tai Xuan Jing Symbols: U+1D300–U+1D356 Monograms Digrams Tetragrams Ancient Symbols: U+10190–U+101CF Phaistos Disc Symbols: U+101D0–U+101FF Directionality 22.10 Enclosed and Square Enclosed Symbols Square Symbols Source Standards Allocation Decomposition Casing Enclosed Alphanumerics: U+2460–U+24FF Enclosed CJK Letters and Months: U+3200–U+32FF CJK Compatibility: U+3300–U+33FF Japanese Era Names Table 22-8. Japanese Era Names Enclosed Alphanumeric Supplement: U+1F100–U+1F1FF Regional Indicator Symbols Enclosed Ideographic Supplement: U+1F200–U+1F2FF 23 Special Areas and Format Characters 23.1 Control Codes Representing Control Sequences Escape Sequences Specification of Control Code Semantics Table 23-1. Control Codes Specified in the Unicode Standard Newline Function 23.2 Layout Controls Line and Word Breaking No-Break Space Word Joiner Zero Width No-Break Space Zero Width Space Table 23-2. Letter Spacing Zero-Width Spaces and Joiner Characters Hyphenation. Line and Paragraph Separator Cursive Connection and Ligatures Joiner Non-joiner Cursive Connection Figure 23-1. Prevention of Joining Figure 23-2. 
Exhibition of Joining Glyphs in Isolation Examples. Figure 23-3. Effect of Intervening Joiners Transparency Joiner and Non-joiner in Indic Scripts Implementation Notes. Filtering Joiner and Non-joiner Combining Grapheme Joiner Blocking Reordering CGJ and Collation Rendering CGJ and Joiner Characters Bidirectional Ordering Controls Table 23-3. Bidirectional Ordering Controls Stateful Format Controls Table 23-4. Paired Stateful Controls Table 23-5. Paired Stateful Controls (Deprecated) 23.3 Deprecated Format Characters Symmetric Swapping Character Shaping Selectors Numeric Shape Selectors 23.4 Variation Selectors Variation Sequence CJK Compatibility Ideographs Representative Glyphs for Variants Mongolian 23.5 Private-Use Characters Properties. Normalization. Private Use Area: U+E000–U+F8FF Encoding Structure. Corporate Use Subarea. End-User Subarea. Allocation of Subareas. Supplementary Private Use Areas Encoding Structure. 23.6 Surrogates Area High-Surrogate Low-Surrogate Private-Use High-Surrogates 23.7 Noncharacters U+FFFF and U+10FFFF U+FFFE 23.8 Specials Byte Order Mark (BOM): U+FEFF Table 23-6. Unicode Encoding Scheme Signatures Table 23-7. U+FEFF Signature in Other Charsets Specials: U+FFF0–U+FFF8 Annotation Characters: U+FFF9–U+FFFB Figure 23-4. Annotation Characters Conformance Use in Plain Text Lexical Restrictions Formatting Input Collation Bidirectional Text Replacement Characters: U+FFFC–U+FFFD U+FFFC U+FFFD 23.9 Tag Characters Tag Characters: U+E0000–U+E007F Deprecated Use for Language Tagging Syntax for Embedding Tags Tag Identification. Tag Termination. Language Tags. Tag Scope and Nesting. Figure 23-5. Tag Characters Canceling Tag Values. Working with Language Tags Avoiding Language Tags. Higher-Level Protocols. Effect of Tags on Interpretation of Text. Display. Processing. Range Checking for Tag Characters. Editing and Modification. Dangers of Incomplete Support. 
Unicode Conformance Issues Formal Tag Syntax 24 About the Code Charts 24.1 Character Names List Images in the Code Charts and Character Lists Fonts Alternative Forms Orientation Special Characters and Code Points Combining Characters Dashed Box Convention Reserved Characters Noncharacters Deprecated Characters Character Names Informative Aliases Unicode 1.0 Names Jamo Short Names Normative Aliases Cross References Explicit Inequality Related Functions Related Names Transliteration Blind Cross References Information About Languages Case Mappings Decompositions Standardized Variation Sequences Subheads 24.2 CJK Ideographs CJK Unified Ideographs Table 24-1. IRG Sources Chart for the Main CJK Block Figure 24-1. CJK Chart Format for the Main CJK Block Charts for CJK Extensions Figure 24-2. CJK Chart Format for CJK Extension A Figure 24-3. CJK Chart Format for CJK Extension B Compatibility Ideographs Figure 24-4. CJK Chart Format for Compatibility Ideographs Figure 24-5. Annotations Identifying CJK Unified Ideographs 24.3 Hangul Syllables A Notational Conventions Code Points Character Names Character Blocks Sequences Rendering Figure A-1. Example of Rendering Properties and Property Values Miscellaneous Extended BNF Table A-1. Extended BNF Character Classes Table A-2. Character Class Examples Operators Table A-3. 
Operators B Unicode Publications and Resources B.1 The Unicode Consortium The Unicode Technical Committee Other Activities B.2 Unicode Publications B.3 Unicode Technical Standards UTS #6: A Standard Compression Scheme for Unicode UTS #10: Unicode Collation Algorithm UTS #18: Unicode Regular Expressions UTS #22: Character Mapping Markup Language (CharMapML) UTS #35: Unicode Locale Data Markup Language (LDML) UTS #37: Unicode Ideographic Variation Database UTS #39: Unicode Security Mechanisms UTS #46: Unicode IDNA Compatibility Processing B.4 Unicode Technical Reports UTR #16: UTF-EBCDIC UTR #17: Unicode Character Encoding Model UTR #23: The Unicode Character Property Model UTR #25: Unicode Support for Mathematics UTR #26: Compatibility Encoding Scheme for UTF-16: 8-Bit (CESU-8) UTR #33: Unicode Conformance Model UTR #36: Unicode Security Considerations UTR #50: Unicode Vertical Text Layout UTR #51: Unicode Emoji B.5 Unicode Technical Notes B.6 Other Unicode Online Resources Unicode Online Resources Unicode Website Unicode Anonymous FTP Site Charts Character Index Conferences E-mail Discussion List Emoji Emoji Charts FAQ (Frequently Asked Questions) Glossary Online Unicode Character Database Online Unihan Database Policies Unicode Common Locale Data Repository (CLDR) Updates and Errata Versions Where Is My Character? How to Contact the Unicode Consortium C Relationship to ISO/IEC 10646 C.1 History Table C-1. Timeline C.2 Encoding Forms in ISO/IEC 10646 UCS-4 UCS-2 Zero Extending Table C-2. Zero Extending C.3 UTF-8 and UTF-16 UTF-8 UTF-16 C.4 Synchronization of the Standards C.5 Identification of Features for Unicode C.6 Character Names C.7 Character Functional Specifications D Version History of the Standard Table D-1. Versions of Unicode and ISO/IEC 10646 Table D-2. Allocation of Code Points by Type (Versions 1.0.0 to 3.0) Table D-3. Allocation of Code Points by Type (Versions 3.1 to 5.1) Table D-4. Allocation of Code Points by Type (Versions 5.2 to 7.0) Table D-5. 
Allocation of Code Points by Type (Versions 8.0 to 9.0) E Han Unification History E.1 Development of the URO E.2 Ideographic Rapporteur Group E.3 CJK Sources Table E-1. G Source Documentation Table E-2. H Source Documentation Table E-3. M Source Documentation Table E-4. T Source Documentation Table E-5. J Source Documentation Table E-6. K Source Documentation Table E-7. KP Source Documentation Table E-8. V Source Documentation Table E-9. U Source Documentation Omission of Repertoire for Some Sources F Documentation of CJK Strokes Table F-1. CJK Strokes References R.1 Source Standards and Specifications R.2 Source Dictionaries for Han R.3 Other Script Sources R.4 Selected Resources: Technical R.5 Selected Resources: Other I Index A B C D E F G H I J K L M N O P Q R S T U V W X Y Z