Unicode 11.0 Core Specification Bookmarks

This page contains links to sections, tables, and figures of the core specification for The Unicode Standard, Version 11.0. See Unicode 11.0.0 for full context about the Unicode Standard.

Preface Why Unicode? What’s New? Organization of This Standard The Unicode Character Database Unicode Code Charts Unicode Standard Annexes Unicode Technical Standards and Unicode Technical Reports Updates and Errata Acknowledgements 1 Introduction Figure 1-1. Wide ASCII 1.1 Coverage Standards Coverage New Characters 1.2 Design Goals Figure 1-2. Unicode Compared to the 2022 Framework 1.3 Text Handling Characters and Glyphs Text Elements 2 General Structure 2.1 Architectural Context Basic Text Processes Text Elements, Characters, and Text Processes Figure 2-1. Text Elements and Characters Text Processes and Encoding 2.2 Unicode Design Principles Table 2-1. The 10 Unicode Design Principles Universality Efficiency Characters, Not Glyphs Figure 2-2. Characters Versus Glyphs Table 2-2. User-Perceived Characters with Multiple Code Points Figure 2-3. Unicode Character Code to Rendered Glyphs Semantics Plain Text Logical Order Figure 2-4. Bidirectional Ordering Figure 2-5. Writing Direction and Numbers Unification Figure 2-6. Typeface Variation for the Bone Character Dynamic Composition Figure 2-7. Dynamic Composition Stability Convertibility 2.3 Compatibility Characters Compatibility Variants Compatibility Decomposable Characters 2.4 Code Points and Characters Figure 2-8. Abstract and Encoded Characters Types of Code Points Table 2-3. Types of Code Points 2.5 Encoding Forms Figure 2-9. Overlap in Legacy Mixed-Width Encodings Figure 2-10. Boundaries and Interpretation Figure 2-11. Unicode Encoding Forms UTF-32 UTF-16 UTF-8 Comparison of the Advantages of UTF-32, UTF-16, and UTF-8 2.6 Encoding Schemes Table 2-4. The Seven Unicode Encoding Schemes Figure 2-12.
Unicode Encoding Schemes 2.7 Unicode Strings 2.8 Unicode Allocation Planes Allocation Areas and Blocks Assignment of Code Points 2.9 Details of Allocation Figure 2-13. Unicode Allocation Plane 0 (BMP) Figure 2-14. Allocation on the BMP Plane 1 (SMP) Figure 2-15. Allocation on Plane 1 Plane 2 (SIP) Other Planes 2.10 Writing Direction Figure 2-16. Writing Directions 2.11 Combining Characters Figure 2-17. Combining Enclosing Marks for Symbols Sequence of Base Characters and Diacritics Figure 2-18. Sequence of Base Characters and Diacritics Figure 2-19. Reordered Indic Vowel Signs Figure 2-20. Properties and Combining Character Sequences Multiple Combining Characters Figure 2-21. Stacking Sequences Table 2-5. Interaction of Combining Characters Table 2-6. Nondefault Stacking Ligated Multiple Base Characters Figure 2-22. Ligated Multiple Base Characters Exhibiting Nonspacing Marks in Isolation “Characters” and Grapheme Clusters 2.12 Equivalent Sequences Figure 2-23. Equivalent Sequences Normalization Figure 2-24. Canonical Ordering Decompositions Figure 2-25. 
Types of Decomposables Non-decomposition of Certain Diacritics 2.13 Special Characters Special Noncharacter Code Points Byte Order Mark (BOM) Layout and Format Control Characters The Replacement Character Control Codes 2.14 Conforming to the Unicode Standard Characteristics of Conformant Implementations Unacceptable Behavior Acceptable Behavior Supported Subsets 3 Conformance 3.1 Versions of the Unicode Standard Stability Version Numbering Errata and Corrigenda References to the Unicode Standard Precision in Version Citation References to Unicode Character Properties References to Unicode Algorithms 3.2 Conformance Requirements Code Points Unassigned to Abstract Characters Interpretation Modification Character Encoding Forms Character Encoding Schemes Bidirectional Text Normalization Forms Normative References Unicode Algorithms Default Casing Algorithms Unicode Standard Annexes 3.3 Semantics Definitions Character Identity and Semantics 3.4 Characters and Encoding Table 3-1. Named Unicode Algorithms 3.5 Properties Types of Properties Property Values Default Property Values Classification of Properties by Their Values Property Status Table 3-2. Normative Character Properties Table 3-3. Informative Character Properties Context Dependence Stability of Properties Simple and Derived Properties Property Aliases Private Use 3.6 Combination Combining Character Sequences Grapheme Clusters Application of Combining Marks Figure 3-1. Enclosing Marks 3.7 Decomposition Compatibility Decomposition Canonical Decomposition 3.8 Surrogates 3.9 Unicode Encoding Forms Table 3-4. Examples of Unicode Encoding Forms UTF-32 UTF-16 Table 3-5. UTF-16 Bit Distribution UTF-8 Table 3-6. UTF-8 Bit Distribution Table 3-7. Well-Formed UTF-8 Byte Sequences Encoding Form Conversion Constraints on Conversion Processes U+FFFD Substitution of Maximal Subparts Table 3-8. U+FFFD for Non-Shortest Form Sequences Table 3-9. U+FFFD for Ill-Formed Sequences for Surrogates Table 3-10. 
U+FFFD for Other Ill-Formed Sequences Table 3-11. U+FFFD for Truncated Sequences 3.10 Unicode Encoding Schemes Table 3-12. Summary of UTF-16BE, UTF-16LE, and UTF-16 Table 3-13. Summary of UTF-32BE, UTF-32LE, and UTF-32 3.11 Normalization Forms Normalization Stability Combining Classes Specification of Unicode Normalization Forms Starters Table 3-14. Combining Marks and Starter Status Canonical Ordering Algorithm Table 3-15. Reorderable Pairs Canonical Composition Algorithm Definition of Normalization Forms 3.12 Conjoining Jamo Behavior Definitions Hangul Syllable Decomposition Table 3-16. Hangul Characters Used in Examples Hangul Syllable Composition Hangul Syllable Name Generation Sample Code for Hangul Algorithms 3.13 Default Case Algorithms Definitions Table 3-17. Context Specification for Casing Default Case Conversion Default Case Folding Default Case Detection Table 3-18. Case Detection Examples Default Caseless Matching 4 Character Properties 4.1 Unicode Character Database 4.2 Case Definitions of Case and Casing Table 4-1. Relationship of Casing Definitions Table 4-2. Case Function Values for Strings Case Mapping Table 4-3. Sources for Case Mapping Information 4.3 Combining Classes Figure 4-1. Positions of Common Combining Marks Reordrant, Split, and Subjoined Combining Marks 4.4 Directionality 4.5 General Category Table 4-4. General Category 4.6 Numeric Value Ideographic Numeric Values Table 4-5. Primary Numeric Ideographs Table 4-6. Ideographs Used as Accounting Numbers 4.7 Bidi Mirrored 4.8 Name Table 4-7. Types of Character Name Aliases Unicode Name Property Table 4-8. Name Derivation Rule Prefix Strings Code Point Labels Table 4-9. Construction of Code Point Labels Use of Character Names in APIs and User Interfaces 4.9 Unicode 1.0 Names 4.10 Letters, Alphabetic, and Ideographic 4.11 Properties for Text Boundaries 4.12 Characters with Unusual Properties Table 4-10. 
Unusual Properties 5 Implementation Guidelines 5.1 Data Structures for Character Conversion Issues Multistage Tables Figure 5-1. Two-Stage Tables 5.2 Programming Languages and Data Types Unicode Data Types for C 5.3 Unknown and Missing Characters 5.4 Handling Surrogate Pairs in UTF-16 5.5 Handling Numbers 5.6 Normalization Figure 5-2. Normalization 5.7 Compression 5.8 Newline Guidelines Definitions Table 5-1. Hex Values for Acronyms Table 5-2. NLF Platform Correlations Line Separator and Paragraph Separator Recommendations 5.9 Regular Expressions 5.10 Language Information in Plain Text Requirements for Language Tagging Language Tags and Han Unification 5.11 Editing and Selection Consistent Text Elements Figure 5-3. Consistent Character Boundaries 5.12 Strategies for Handling Nonspacing Marks Keyboard Input Figure 5-4. Dead Keys Versus Handwriting Sequence Truncation Figure 5-5. Truncating Grapheme Clusters 5.13 Rendering Nonspacing Marks Figure 5-6. Inside-Out Rule Figure 5-7. Fallback Rendering Figure 5-8. Bidirectional Placement Figure 5-9. Justification Canonical Equivalence Table 5-3. Typing Order Differing from Canonical Order Table 5-4. Permuting Combining Class Weights Positioning Methods Figure 5-10. Positioning with Ligatures Figure 5-11. Positioning with Contextual Forms Figure 5-12. Positioning with Enhanced Kerning 5.14 Locating Text Element Boundaries 5.15 Identifiers 5.16 Sorting and Searching Culturally Expected Sorting and Searching Language-Insensitive Sorting Searching Sublinear Searching Figure 5-13. Sublinear Searching 5.17 Binary Order UTF-8 in UTF-16 Order UTF-16 in UTF-8 Order 5.18 Case Mappings Titlecasing Complications for Case Mapping Figure 5-14. Uppercase Mapping for Turkish I Figure 5-15. Lowercase Mapping for Turkish I Figure 5-16. Casing of German Sharp S Reversibility Caseless Matching Normalization and Casing Table 5-5. 
Casing and Normalization in Strings 5.19 Mapping Compatibility Variants 5.20 Unicode Security 5.21 Ignoring Characters in Processing Characters Ignored in Text Segmentation Characters Ignored in Line Breaking Characters Ignored in Cursive Joining Characters Ignored in Identifiers Characters Ignored in Searching and Sorting Characters Ignored for Display 5.22 U+FFFD Substitution in Conversion 6 Writing Systems and Punctuation 6.1 Writing Systems Figure 6-1. Overriding Inherent Vowels Table 6-1. Typology of Scripts in the Unicode Standard 6.2 General Punctuation Figure 6-2. Forms of CJK Punctuation Blocks Devoted to Punctuation Format Control Characters Space Characters Table 6-2. Unicode Space Characters Dashes and Hyphens Table 6-3. Unicode Dash Characters Paired Punctuation Language-Based Usage of Quotation Marks Figure 6-3. European Quotation Marks Table 6-4. Models of Visual Relationship between Quote Glyphs Table 6-5. East Asian Quotation Marks Figure 6-4. Asian Quotation Marks Table 6-6. Opening and Closing Forms Apostrophes Other Punctuation Table 6-7. Names for the @ Archaic Punctuation and Editorial Marks Figure 6-5. Examples of Ancient Greek Editorial Marks Figure 6-6. Use of Greek Paragraphos Indic Punctuation Table 6-8. Unicode Danda Characters CJK Punctuation Figure 6-7. CJK Parentheses Unknown or Unavailable Ideographs CJK Compatibility Forms 7 Europe-I 7.1 Latin Figure 7-1. Alternative Glyphs in Latin Table 7-1. Preferred Rendering of Cedilla versus Comma Below Figure 7-2. Diacritics on i and j Figure 7-3. 
Vietnamese Letters and Tone Marks Letters of Basic Latin: U+0041–U+007A Letters of the Latin-1 Supplement: U+00C0–U+00FF Latin Extended-A: U+0100–U+017F Latin Extended-B: U+0180–U+024F IPA Extensions: U+0250–U+02AF Phonetic Extensions: U+1D00–U+1DBF Latin Extended Additional: U+1E00–U+1EFF Latin Extended-C: U+2C60–U+2C7F Latin Extended-D: U+A720–U+A7FF Latin Extended-E: U+AB30–U+AB6F Latin Ligatures: U+FB00–U+FB06 7.2 Greek Greek: U+0370–U+03FF Table 7-2. Nonspacing Marks Used with Greek Figure 7-4. Variations in Greek Capital Letter Upsilon Greek Extended: U+1F00–U+1FFF Table 7-3. Greek Spacing and Nonspacing Pairs Ancient Greek Numbers: U+10140–U+1018F 7.3 Coptic Figure 7-5. Coptic Numerals 7.4 Cyrillic Cyrillic: U+0400–U+04FF Cyrillic Supplement: U+0500–U+052F Cyrillic Extended-A: U+2DE0–U+2DFF Figure 7-6. Combination of Titlo Letters Cyrillic Extended-B: U+A640–U+A69F Cyrillic Extended-C: U+1C80–U+1C8F 7.5 Glagolitic Glagolitic: U+2C00–U+2C5F Glagolitic Supplement: U+1E000–U+1E02F 7.6 Armenian 7.7 Georgian Georgian: U+10A0–U+10FF Georgian Extended: U+1C90–U+1CBF Georgian Supplement: U+2D00–U+2D2F Figure 7-7. Georgian Scripts and Casing 7.8 Modifier Letters Spacing Modifier Letters: U+02B0–U+02FF Figure 7-8. Tone Letters Modifier Tone Letters: U+A700–U+A71F 7.9 Combining Marks Figure 7-9. Double Diacritics Figure 7-10. Positioning of Double Diacritics Figure 7-11. Use of CGJ with Double Diacritics Figure 7-12. Interaction of Combining Marks with Ligatures Combining Diacritical Marks: U+0300–U+036F Combining Diacritical Marks Extended: U+1AB0–U+1AFF Figure 7-13. Positioning of Combining Parentheses Combining Diacritical Marks Supplement: U+1DC0–U+1DFF Table 7-4. Typicon Kavyka Symbols Combining Diacritical Marks for Symbols: U+20D0–U+20FF Figure 7-14. Use of Vertical Line Overlay for Negation Combining Half Marks: U+FE20–U+FE2F Figure 7-15. 
Double Diacritics and Half Marks Combining Marks in Other Blocks 8 Europe-II 8.1 Linear A 8.2 Linear B Linear B Syllabary: U+10000–U+1007F Linear B Ideograms: U+10080–U+100FF Aegean Numbers: U+10100–U+1013F 8.3 Cypriot Syllabary Table 8-1. Similar Characters in Linear B and Cypriot 8.4 Ancient Anatolian Alphabets Lycian: U+10280–U+1029F Carian: U+102A0–U+102DF Lydian: U+10920–U+1093F 8.5 Old Italic Figure 8-1. Distribution of Old Italic 8.6 Runic 8.7 Old Hungarian 8.8 Gothic 8.9 Elbasan 8.10 Caucasian Albanian 8.11 Old Permic Table 8-2. Combining Marks Used in Old Permic 8.12 Ogham 8.13 Shavian 9 Middle East-I 9.1 Hebrew Hebrew: U+0590–U+05FF Alphabetic Presentation Forms: U+FB1D–U+FB4F 9.2 Arabic Arabic: U+0600–U+06FF Figure 9-1. Directionality and Cursive Connection Figure 9-2. Using a Joiner Figure 9-3. Using a Non-joiner Figure 9-4. Combinations of Joiners and Non-joiners Figure 9-5. Placement of Harakat Table 9-1. Arabic Digit Names Table 9-2. Glyph Variation in Eastern Arabic-Indic Digits Figure 9-6. Arabic Year Sign Arabic Cursive Joining Table 9-3. Primary Arabic Joining Types Table 9-4. Derived Arabic Joining Types Table 9-5. Arabic Glyph Types Arabic Ligatures Table 9-6. Arabic Obligatory Ligature Joining Groups Table 9-7. Arabic Ligature Notation Arabic Joining Groups Table 9-8. Dual-Joining Arabic Characters Table 9-9. Right-Joining Arabic Characters Table 9-10. Forms of the Arabic Letter yeh Combining Hamza Table 9-11. Arabic Letters With Hamza Above Other Letters for Extended Arabic Arabic Supplement: U+0750–U+077F Arabic Extended-A: U+08A0–U+08FF Arabic Presentation Forms-A: U+FB50–U+FDFF Arabic Presentation Forms-B: U+FE70–U+FEFF 9.3 Syriac Syriac: U+0700–U+074F Figure 9-7. Syriac Abbreviation Figure 9-8. Use of SAM Table 9-12. Miscellaneous Syriac Diacritic Use Syriac Shaping Table 9-13. Syriac Final Alaph Glyph Types Table 9-14. Dual-Joining Syriac Characters Table 9-15. Right-Joining Syriac Characters Table 9-16. 
Syriac Alaph Glyph Forms Table 9-17. Syriac Ligatures Syriac Supplement: U+0860–U+086F 9.4 Samaritan Table 9-18. Samaritan Performative Punctuation Marks 9.5 Mandaic Table 9-19. Dual-Joining Mandaic Characters Table 9-20. Right-Joining Mandaic Characters 10 Middle East-II 10.1 Old North Arabian 10.2 Old South Arabian Table 10-1. Old South Arabian Numeric Characters Table 10-2. Number Formation in Old South Arabian 10.3 Phoenician 10.4 Imperial Aramaic Table 10-3. Number Formation in Aramaic 10.5 Manichaean Table 10-4. Dual-Joining Manichaean Letters Table 10-5. Right-Joining Manichaean Letters Table 10-6. Left-Joining Manichaean Letters Table 10-7. Non-Joining Manichaean Letters Table 10-8. Manichaean Ligatures 10.6 Pahlavi and Parthian Inscriptional Parthian: U+10B40–U+10B5F Inscriptional Pahlavi: U+10B60–U+10B7F Table 10-9. Inscriptional Parthian Shaping Behavior Psalter Pahlavi: U+10B80–U+10BAF 10.7 Avestan Table 10-10. Avestan Shaping Behavior 10.8 Nabataean 10.9 Palmyrene 10.10 Hatran 11 Cuneiform and Hieroglyphs 11.1 Sumero-Akkadian Cuneiform: U+12000–U+123FF Table 11-1. Cuneiform Script Usage Cuneiform Numbers and Punctuation: U+12400–U+1247F Early Dynastic Cuneiform: U+12480–U+1254F 11.2 Ugaritic 11.3 Old Persian 11.4 Egyptian Hieroglyphs Table 11-2. Hieroglyphic Character Sequence Figure 11-1. Interpretation of Hieroglyphic Markup 11.5 Meroitic 11.6 Anatolian Hieroglyphs 12 South and Central Asia-I 12.1 Devanagari Devanagari: U+0900–U+097F Principles of the Devanagari Script Table 12-1. Devanagari Vowel Letters Figure 12-1. Dead Consonants in Devanagari Table 12-2. Devanagari Atomic Consonants Figure 12-2. Conjunct Formations in Devanagari Figure 12-3. Multi-Consonant Conjuncts in Devanagari Table 12-3. Devanagari Consonant Conjuncts Figure 12-4. Preventing Conjunct Forms in Devanagari Figure 12-5. Half-Consonants in Devanagari Figure 12-6. Independent Half-Forms in Devanagari Figure 12-7. Half-Consonants in Oriya Figure 12-8. 
Consonant Forms in Devanagari and Oriya Rendering Devanagari Figure 12-9. Rendering Order in Devanagari Table 12-4. Sample Devanagari Half-Forms Table 12-5. Sample Devanagari Ligatures Table 12-6. RA + Vocalic Letter Ligature Forms Table 12-7. Sample Devanagari Half-Ligature Forms Table 12-8. Marathi and Nepali Allographs Devanagari Digits, Punctuation, and Symbols Extensions in the Main Devanagari Block Figure 12-10. Use of Apostrophe in Bodo, Dogri and Maithili Figure 12-11. Use of Avagraha in Dogri Table 12-9. Devanagari Vowels Used in Bihari Languages Table 12-10. Prishthamatra Orthography Devanagari Extended: U+A8E0–U+A8FF Vedic Extensions: U+1CD0–U+1CFF 12.2 Bengali (Bangla) Table 12-11. Bengali Vowel Letters Table 12-12. Diphthong Vowel Letters in Kokborok Table 12-13. Assamese Consonant-Vowel Combinations Table 12-14. Bengali Consonant-Vowel Combinations Figure 12-12. Requesting Bengali Consonant-Vowel Ligature Figure 12-13. Blocking Bengali Consonant-Vowel Ligature Figure 12-14. Bengali Syllable tta Table 12-15. Use of Apostrophe in Bangla 12.3 Gurmukhi Table 12-16. Gurmukhi Vowel Letters Table 12-17. Gurmukhi Conjuncts Table 12-18. Additional Pairin and Addha Forms in Gurmukhi Table 12-19. Use of Joiners in Gurmukhi 12.4 Gujarati Table 12-20. Gujarati Vowel Letters Table 12-21. Gujarati Conjuncts 12.5 Oriya (Odia) Table 12-22. Oriya Vowel Letters Table 12-23. Oriya Conjuncts Table 12-24. Oriya Vowel Placement Table 12-25. Ligation for the Syllable om 12.6 Tamil Tamil: U+0B80–U+0BFF Figure 12-15. Kssa Ligature in Tamil Tamil Vowels Figure 12-16. Tamil Vowel Reordering Figure 12-17. Tamil Two-Part Vowels Figure 12-18. Tamil Vowel Splitting and Reordering Figure 12-19. Vowel Reordering Around a Tamil Conjunct Tamil Ligatures Figure 12-20. Tamil Ligatures with i Table 12-26. Tamil Ligatures with u Figure 12-21. Spacing Forms of Tamil u Figure 12-22. Tamil Ligatures with ra Figure 12-23. Tamil Ligatures for shri Figure 12-24. 
Traditional Tamil Ligatures with aa Figure 12-25. Traditional Tamil Ligatures with o Figure 12-26. Traditional Tamil Ligatures with ai Figure 12-27. Vowel ai in Modern Tamil Tamil Named Character Sequences Table 12-27. Tamil Vowels, Consonants, and Syllables 12.7 Telugu Table 12-28. Telugu Vowel Letters Table 12-29. Rendering of Telugu na + virama 12.8 Kannada Kannada: U+0C80–U+0CFF Principles of the Kannada Script Table 12-30. Kannada Vowel Letters Figure 12-28. Indicating Retroflexion in Badaga Vowels Rendering Kannada Table 12-31. Rendering of Kannada na + virama 12.9 Malayalam Malayalam: U+0D00–U+0D7F Table 12-32. Malayalam Vowel Letters Malayalam Orthographic Reform Table 12-33. Malayalam Orthographic Reform Rendering Malayalam Table 12-34. Malayalam Conjuncts Table 12-35. Candrakkala Examples Table 12-36. Use of Joiners in Malayalam Table 12-37. Malayalam /rara/ and /uua/ Table 12-38. Malayalam /nr/ and /nt/ Table 12-39. Atomic Encoding of Malayalam Chillus Malayalam Numbers and Punctuation 13 South and Central Asia-II 13.1 Thaana Table 13-1. Thaana Glyph Placement 13.2 Sinhala Sinhala: U+0D80–U+0DFF Table 13-2. Sinhala Vowel Letters Sinhala Archaic Numbers: U+111E0–U+111FF 13.3 Newa Table 13-3. Murmured Resonants in Nepal Bhasa 13.4 Tibetan Figure 13-1. Tibetan Syllable Structure Figure 13-2. Justifying Tibetan Tseks 13.5 Mongolian Mongolian: U+1800–U+18AF Figure 13-3. Mongolian Glyph Convergence Figure 13-4. Mongolian Consonant Ligation Figure 13-5. Mongolian Positional Forms Figure 13-6. Mongolian Free Variation Selector Figure 13-7. Mongolian Gender Forms Figure 13-8. Mongolian Vowel Separator Mongolian Supplement: U+11660–U+1167F 13.6 Limbu Table 13-4. Positions of Limbu Combining Characters 13.7 Meetei Mayek Meetei Mayek: U+ABC0–U+ABFF Meetei Mayek Extensions: U+AAE0–U+AAF6 13.8 Mro 13.9 Warang Citi 13.10 Ol Chiki 13.11 Chakma 13.12 Lepcha Table 13-5. Lepcha Syllabic Structure 13.13 Saurashtra 13.14 Masaram Gondi Figure 13-9.
Masaram Gondi Consonant Clusters Figure 13-10. Rendering of ra in Masaram Gondi Table 13-6. Various Signs in Masaram Gondi 13.15 Gunjala Gondi Figure 13-11. Gunjala Gondi Conjunct Formation 14 South and Central Asia-III 14.1 Brahmi Table 14-1. Brahmi Vowel Letters Figure 14-1. Consonant Ligatures in Brahmi Table 14-2. Brahmi Positional Digits 14.2 Kharoshthi Kharoshthi: U+10A00–U+10A5F Figure 14-2. Geographical Extent of the Kharoshthi Script Figure 14-3. Kharoshthi Number 1996 Rendering Kharoshthi Figure 14-4. Kharoshthi Rendering Example Table 14-3. Kharoshthi Vowel Signs Table 14-4. Kharoshthi Vowel Modifiers Table 14-5. Kharoshthi Consonant Modifiers Table 14-6. Examples of Kharoshthi Virama Figure 14-5. Subjoined Forms of ya 14.3 Bhaiksuki 14.4 Phags-pa Figure 14-6. Phags-pa Syllable Om Table 14-7. Phags-pa Positional Forms of I, U, E, and O Table 14-8. Contextual Glyph Mirroring in Phags-pa Table 14-9. Phags-pa Standardized Variants Figure 14-7. Phags-pa Reversed Shaping 14.5 Marchen 14.6 Zanabazar Square Figure 14-8. Conjunct Stacking in Zanabazar Square 14.7 Soyombo 14.8 Old Turkic 14.9 Old Sogdian 14.10 Sogdian 15 South and Central Asia-IV 15.1 Syloti Nagri 15.2 Kaithi 15.3 Sharada 15.4 Takri Table 15-1. Takri Vowel Letters 15.5 Siddham Figure 15-1. Siddham Consonant Cluster Table 15-2. Siddham Punctuation Characters 15.6 Mahajani 15.7 Khojki 15.8 Khudawadi Table 15-3. Khudawadi Vowel Letters Table 15-4. Representation of Arabic Sounds in Khudawadi 15.9 Multani 15.10 Tirhuta Table 15-5. Tirhuta Vowel Letters 15.11 Modi Table 15-6. Modi Vowel Letters Figure 15-2. Modi Shaping for ra 15.12 Grantha Grantha: U+11300–U+1137F Rendering Grantha Figure 15-3. Splitting Large Conjunct Stacks in Grantha Table 15-7. Rendering of Explicit Virama Forms in Grantha Table 15-8. Additional Svara Marks used in Grantha 15.13 Ahom 15.14 Sora Sompeng 15.15 Dogra 16 Southeast Asia 16.1 Thai Table 16-1. Glyph Positions in Thai Syllables 16.2 Lao Table 16-2. 
Glyph Positions in Lao Syllables 16.3 Myanmar Myanmar: U+1000–U+109F Table 16-3. Modern Burmese Syllabic Structure Myanmar Extended-A: U+AA60–U+AA7F Khamti Shan Table 16-4. Khamti Shan Tone Marks Aiton and Phake Myanmar Extended-B: U+A9E0–U+A9FF 16.4 Khmer Khmer: U+1780–U+17FF Principles of the Khmer Script Table 16-5. Independent Khmer Vowel Characters Table 16-6. Two Registers of Khmer Consonants Table 16-7. Khmer Subscript Consonant Signs Table 16-8. Khmer Composite Dependent Vowel Signs with Nikahit Table 16-9. Khmer Subscript Independent Vowel Signs Figure 16-1. Common Ligatures in Khmer Figure 16-2. Common Multiple Forms in Khmer Figure 16-3. Examples of Syllabic Order in Khmer Figure 16-4. Ligation in Muul Style in Khmer Khmer Symbols: U+19E0–U+19FF 16.5 Tai Le Table 16-10. Tai Le Tone Marks Table 16-11. Myanmar Digits in Tai Le 16.6 New Tai Lue Table 16-12. New Tai Lue Vowel Placement Table 16-13. New Tai Lue Registers and Tones 16.7 Tai Tham 16.8 Tai Viet Table 16-14. Tai Viet Symbols and Punctuation 16.9 Kayah Li 16.10 Cham Table 16-15. Cham Syllabic Structure 16.11 Pahawh Hmong Figure 16-5. Pahawh Hmong Syllable Structure 16.12 Pau Cin Hau 16.13 Hanifi Rohingya 17 Indonesia and Oceania 17.1 Philippine Scripts Tagalog: U+1700–U+171F Hanunóo: U+1720–U+173F Buhid: U+1740–U+175F Tagbanwa: U+1760–U+177F Principles of the Philippine Scripts Table 17-1. Hanunóo and Buhid Vowel Sign Combinations 17.2 Buginese Figure 17-1. Buginese Ligature 17.3 Balinese Table 17-2. Balinese Base Consonants and Conjunct Forms Table 17-3. Sasak Extensions for Balinese Figure 17-2. Writing dharma in Balinese Table 17-4. Balinese Consonant Clusters with u and u: 17.4 Javanese Figure 17-3. Representation of Javanese Two-Part Vowels 17.5 Rejang 17.6 Batak 17.7 Sundanese Sundanese: U+1B80–U+1BBF Table 17-5. Modern Sundanese Syllabic Structure Sundanese Supplement: U+1CC0–U+1CCF 17.8 Makasar 18 East Asia 18.1 Han CJK Unified Ideographs Blocks Containing Han Ideographs Table 18-1. 
Blocks Containing Han Ideographs Table 18-2. Small Extensions to the URO General Characteristics of Han Ideographs Table 18-3. Common Han Characters Figure 18-1. Han Spelling Figure 18-2. Semantic Context for Han Characters Principles of Han Unification Figure 18-3. Three-Dimensional Conceptual Model Unification Rules Figure 18-4. CJK Source Separation Table 18-4. Source Encoding for Sword Variants Figure 18-5. Not Cognates, Not Unified Abstract Shape Figure 18-6. Ideographic Component Structure Figure 18-7. The Most Superior Node of an Ideographic Component Table 18-5. Ideographs Not Unified Table 18-6. Ideographs Unified Han Ideograph Arrangement Table 18-7. Han Ideograph Arrangement Radical-Stroke Indices Mappings for Han Ideographs CJK Unified Ideographs Extension B: U+20000–U+2A6D6 CJK Unified Ideographs Extension C: U+2A700–U+2B734 CJK Unified Ideographs Extension D: U+2B740–U+2B81D CJK Unified Ideographs Extension E: U+2B820–U+2CEA1 CJK Unified Ideographs Extension F: U+2CEB0–U+2EBE0 CJK Compatibility Ideographs: U+F900–U+FAFF CJK Compatibility Supplement: U+2F800–U+2FA1D Kanbun: U+3190–U+319F Symbols Derived from Han Ideographs CJK and KangXi Radicals: U+2E80–U+2FD5 CJK Additions from HKSCS and GB 18030 CJK Strokes: U+31C0–U+31EF 18.2 Ideographic Description Characters Figure 18-8. Examples of Ideographic Description Characters Figure 18-9. Using the Ideographic Description Characters 18.3 Bopomofo Table 18-8. Mandarin Tone Marks Table 18-9. Minnan and Hakka Tone Marks 18.4 Hiragana and Katakana Hiragana: U+3040–U+309F Katakana: U+30A0–U+30FF Katakana Phonetic Extensions: U+31F0–U+31FF Kana Supplement: U+1B000–U+1B0FF Kana Extended-A: U+1B100–U+1B12F Figure 18-10. Japanese Historic Kana for e and ye Figure 18-11. Hentaigana Distinct Parent Ideographs Figure 18-12. 
Other Hentaigana Examples 18.5 Halfwidth and Fullwidth Forms 18.6 Hangul Hangul Jamo: U+1100–U+11FF Hangul Jamo Extended-A: U+A960–U+A97F Hangul Jamo Extended-B: U+D7B0–U+D7FF Hangul Compatibility Jamo: U+3130–U+318F Table 18-10. Separating Jamo Characters Hangul Syllables: U+AC00–U+D7A3 Table 18-11. Line-Based Placement of Jungseong 18.7 Yi 18.8 Nüshu 18.9 Lisu Table 18-12. Lisu Tone Letters Table 18-13. Punctuation Adopted in Lisu Orthography 18.10 Miao 18.11 Tangut Tangut: U+17000–U+187FF Tangut Components: U+18800–U+18AFF 19 Africa 19.1 Ethiopic Ethiopic: U+1200–U+137F Table 19-1. Labialized Forms in Ethiopic -WAA Table 19-2. Labialized Forms in Ethiopic -WE Ethiopic Extensions 19.2 Osmanya 19.3 Tifinagh Figure 19-1. Tifinagh Contextual Shaping Figure 19-2. Tifinagh Consonant Joiner and Bi-consonants 19.4 N’Ko Table 19-3. N’Ko Diacritic Usage Table 19-4. N’Ko Tone Diacritics on Vowels Figure 19-3. Examples of N’Ko Ordinals Table 19-5. N’Ko Letter Shaping 19.5 Vai 19.6 Bamum Bamum: U+A6A0–U+A6FF Bamum Supplement: U+16800–U+16A3F 19.7 Bassa Vah 19.8 Mende Kikakui Table 19-6. Number Formation in Mende Kikakui 19.9 Adlam 19.10 Medefaidrin 20 Americas 20.1 Cherokee 20.2 Canadian Aboriginal Syllabics Canadian Aboriginal Syllabics: U+1400–U+167F Canadian Aboriginal Syllabics Extended: U+18B0–U+18FF 20.3 Osage Table 20-1. Combining Marks used in Osage 20.4 Deseret Figure 20-1. Short Words Equivalent to Deseret Letter Names Table 20-2. IPA Transcription of Deseret 21 Notational Systems 21.1 Braille 21.2 Western Musical Symbols Figure 21-1. Examples of Specialized Music Layout Figure 21-2. Precomposed Note Characters Figure 21-3. Alternative Noteheads Figure 21-4. Augmentation Dots and Articulation Symbols Table 21-1. Examples of Ornamentation 21.3 Byzantine Musical Symbols 21.4 Ancient Greek Musical Notation Table 21-2. 
Representation of Ancient Greek Vocal and Instrumental Notation 21.5 Duployan Duployan: U+1BC00–U+1BC9F Shorthand Format Controls: U+1BCA0–U+1BCAF 21.6 Sutton SignWriting Sutton SignWriting: U+1D800–U+1DAAF 22 Symbols 22.1 Currency Symbols Figure 22-1. Alternative Glyphs for Dollar Sign Currency Symbols: U+20A0–U+20CF Table 22-1. Currency Symbols Encoded in Other Blocks 22.2 Letterlike Symbols Letterlike Symbols: U+2100–U+214F Figure 22-2. Alternative Glyphs for Numero Sign Mathematical Alphanumeric Symbols: U+1D400–U+1D7FF Mathematical Alphabets Figure 22-3. Wide Mathematical Accents Figure 22-4. Style Variants and Semantic Distinctions in Mathematics Table 22-2. Mathematical Alphanumeric Symbols Fonts Used for Mathematical Alphabets Figure 22-5. Easily Confused Shapes for Mathematical Glyphs Arabic Mathematical Alphabetic Symbols: U+1EE00–U+1EEFF 22.3 Numerals Decimal Digits Table 22-3. Script-Specific Decimal Digits Figure 22-6. CJK Ideographic Numbers Other Digits Table 22-4. Compatibility Digits Figure 22-7. Regular and Old Style Digits Non-Decimal Radix Systems Acrophonic Systems and Other Letter-based Numbers Coptic Epact Numbers: U+102E0–U+102FF Rumi Numeral Symbols: U+10E60–U+10E7E Siyaq Numerical Notation Systems CJK Numerals Fractions Figure 22-8. Alternate Forms of Vulgar Fractions Common Indic Number Forms: U+A830–U+A83F 22.4 Superscript and Subscript Symbols Superscripts and Subscripts: U+2070–U+209F 22.5 Mathematical Symbols Mathematical Operators: U+2200–U+22FF Table 22-5. 
Mathematical Operators Disunified from Punctuation Supplements to Mathematical Symbols and Arrows Supplemental Mathematical Operators: U+2A00–U+2AFF Miscellaneous Mathematical Symbols-A: U+27C0–U+27EF Miscellaneous Mathematical Symbols-B: U+2980–U+29FF Miscellaneous Symbols and Arrows: U+2B00–U+2B7F Arrows: U+2190–U+21FF Supplemental Arrows Standardized Variants of Mathematical Symbols 22.6 Invisible Mathematical Operators 22.7 Technical Symbols Control Pictures: U+2400–U+243F Miscellaneous Technical: U+2300–U+23FF Figure 22-9. Usage of Crops and Quine Corners Table 22-6. Use of Mathematical Symbol Pieces Figure 22-10. Usage of the Decimal Exponent Symbol Optical Character Recognition: U+2440–U+245F 22.8 Geometrical Symbols Box Drawing and Block Elements Geometric Shapes: U+25A0–U+25FF Geometric Shapes Extended: U+1F780–U+1F7FF Table 22-7. Geometric Shape Collections 22.9 Miscellaneous Symbols Miscellaneous Symbols and Pictographs Emoticons: U+1F600–U+1F64F Transport and Map Symbols: U+1F680–U+1F6FF Dingbats: U+2700–U+27BF Ornamental Dingbats: U+1F650–U+1F67F Alchemical Symbols: U+1F700–U+1F77F Mahjong Tiles: U+1F000–U+1F02F Domino Tiles: U+1F030–U+1F09F Playing Cards: U+1F0A0–U+1F0FF Chess Symbols: U+1FA00–U+1FA6F Yijing Hexagram Symbols: U+4DC0–U+4DFF Tai Xuan Jing Symbols: U+1D300–U+1D356 Ancient Symbols: U+10190–U+101CF Phaistos Disc Symbols: U+101D0–U+101FF 22.10 Enclosed and Square Enclosed Alphanumerics: U+2460–U+24FF Enclosed CJK Letters and Months: U+3200–U+32FF CJK Compatibility: U+3300–U+33FF Table 22-8. Japanese Era Names Enclosed Alphanumeric Supplement: U+1F100–U+1F1FF Enclosed Ideographic Supplement: U+1F200–U+1F2FF 23 Special Areas and Format Characters 23.1 Control Codes Representing Control Sequences Specification of Control Code Semantics Table 23-1. Control Codes Specified in the Unicode Standard 23.2 Layout Controls Line and Word Breaking Table 23-2. Letter Spacing Cursive Connection and Ligatures Figure 23-1. Prevention of Joining Figure 23-2. 
Exhibition of Joining Glyphs in Isolation Figure 23-3. Effect of Intervening Joiners Combining Grapheme Joiner Bidirectional Ordering Controls Table 23-3. Bidirectional Ordering Controls Stateful Format Controls Table 23-4. Paired Stateful Controls Table 23-5. Paired Stateful Controls (Deprecated) 23.3 Deprecated Format Characters 23.4 Variation Selectors 23.5 Private-Use Characters Private Use Area: U+E000–U+F8FF Supplementary Private Use Areas 23.6 Surrogates Area 23.7 Noncharacters 23.8 Specials Byte Order Mark (BOM): U+FEFF Table 23-6. Unicode Encoding Scheme Signatures Table 23-7. U+FEFF Signature in Other Charsets Specials: U+FFF0–U+FFF8 Annotation Characters: U+FFF9–U+FFFB Figure 23-4. Annotation Characters Replacement Characters: U+FFFC–U+FFFD 23.9 Tag Characters Tag Characters: U+E0000–U+E007F Deprecated Use for Language Tagging Syntax for Embedding Tags Figure 23-5. Tag Characters Working with Language Tags Unicode Conformance Issues Formal Tag Syntax 24 About the Code Charts 24.1 Character Names List Images in the Code Charts and Character Lists Special Characters and Code Points Character Names Informative Aliases Normative Aliases Cross References Information About Languages Case Mappings Decompositions Standardized Variation Sequences Positional Forms Figure 24-1. Mongolian Positional Forms Block Headers Subheads 24.2 CJK Ideographs CJK Unified Ideographs Table 24-1. IRG Sources Figure 24-2. CJK Chart Format for the Main CJK Block Figure 24-3. CJK Chart Format for CJK Extension A Figure 24-4. CJK Chart Format for CJK Extension B Compatibility Ideographs Figure 24-5. CJK Chart Format for Compatibility Ideographs Figure 24-6. Annotations Identifying CJK Unified Ideographs 24.3 Hangul Syllables A Notational Conventions Code Points Character Names Character Blocks Sequences Rendering Figure A-1. Example of Rendering Properties and Property Values Miscellaneous Extended BNF Table A-1. Extended BNF Table A-2. Character Class Examples Operators Table A-3. 
Operators B Unicode Publications and Resources B.1 The Unicode Consortium The Unicode Technical Committee Other Activities B.2 Unicode Publications B.3 Other Unicode Online Resources Unicode Online Resources How to Contact the Unicode Consortium C Relationship to ISO/IEC 10646 C.1 History Table C-1. Timeline C.2 Encoding Forms in ISO/IEC 10646 Zero Extending Table C-2. Zero Extending C.3 UTF-8 and UTF-16 UTF-8 UTF-16 C.4 Synchronization of the Standards C.5 Identification of Features for Unicode C.6 Character Names C.7 Character Functional Specifications D Version History of the Standard Table D-1. Versions of Unicode and ISO/IEC 10646 Table D-2. Allocation of Code Points by Type (Versions 1.0.0 to 3.0) Table D-3. Allocation of Code Points by Type (Versions 3.1 to 5.1) Table D-4. Allocation of Code Points by Type (Versions 5.2 to 7.0) Table D-5. Allocation of Code Points by Type (Versions 8.0 to 11.0) E Han Unification History E.1 Development of the URO E.2 Ideographic Rapporteur Group E.3 CJK Sources F Documentation of CJK Strokes Table F-1. CJK Strokes I Index A B C D E F G H I J K L M N O P Q R S T U V W X Y Z