Unicode Statistics

This page provides various statistics regarding the Unicode Standard and related specifications.

Last updated: September 11, 2024

Character Counts

One of the most basic questions about the Unicode Standard is, "How many characters are encoded?" The answer to that question is surprisingly complicated, because there are so many different types of characters (and code points) involved in the architecture and maintenance of the universal character encoding.

Over the years, conventions have been developed for how to track the number of encoded characters of various types in the Unicode Standard. The counts were traditionally published in Appendix D, Version History of the Standard in each new version. That practice continued up to Unicode 12.0. Since then, to make this information more accessible, it has been restructured for presentation here. For an explanation of terminology related to code point types mentioned in these tables, see Section 2.4.1, Types of Code Points in the core specification. For information about some of the odder types of characters in Unicode, see also the Private-Use Characters and Noncharacters FAQ.
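To make the distinction between code point types concrete, the following Python sketch classifies every code point into the seven basic types of Section 2.4.1 using the standard library's `unicodedata` module. The mapping from General_Category values to types reflects our reading of Table 2-3 of the core specification and should be checked against it. The names `code_point_type` and `count_code_point_types` are hypothetical helpers, not part of any Unicode API. Because `unicodedata` is tied to the Unicode version bundled with the Python interpreter (reported by `unicodedata.unidata_version`), the Graphic, Format, Control and Reserved totals follow that version and are no substitute for the official counts on this page.

```python
import unicodedata
from collections import Counter

# Assumed mapping from General_Category to basic code point type
# (our reading of Table 2-3). Cn is handled separately because it
# covers both noncharacters and reserved code points; every other
# category (L*, M*, N*, P*, S*, Zs) falls through to "Graphic".
_TYPE_BY_CATEGORY = {
    "Cc": "Control",
    "Cf": "Format",
    "Zl": "Format",
    "Zp": "Format",
    "Co": "Private-use",
    "Cs": "Surrogate",
}


def is_noncharacter(cp: int) -> bool:
    """True for the 66 noncharacters: U+FDD0..U+FDEF plus the
    last two code points (xxFFFE, xxFFFF) of each of the 17 planes."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def code_point_type(cp: int) -> str:
    """Hypothetical helper: classify one code point into a basic type."""
    if is_noncharacter(cp):
        return "Noncharacter"
    category = unicodedata.category(chr(cp))
    if category == "Cn":
        return "Reserved"
    return _TYPE_BY_CATEGORY.get(category, "Graphic")


def count_code_point_types() -> Counter:
    """Tally the types over the whole codespace, U+0000..U+10FFFF."""
    return Counter(code_point_type(cp) for cp in range(0x110000))


if __name__ == "__main__":
    counts = count_code_point_types()
    print(f"Unicode data version used by Python: {unicodedata.unidata_version}")
    for name, n in counts.most_common():
        print(f"{name:>13}: {n}")
```

The Surrogate (2,048), Private-use (137,468) and Noncharacter (66) totals are fixed by the architecture of the codespace and do not change from version to version, which makes them convenient sanity checks for a sketch like this.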

To help in visualizing the growth of the Unicode Standard over time, the following simple charts show some important raw character counts by year.

Charts for Characters Added by Year

Total Characters by Year
Total CJK Characters by Year

Emoji Counts

Counting emoji in the Unicode Standard poses a special challenge, because the full definition of emoji includes many different kinds of character sequences that are presented to an end user as a single emoji glyph. An obvious example is a sequence of two regional indicator characters, which is interpreted and displayed as a single, distinct "flag emoji". Tables have been compiled enumerating all the different kinds of emoji for each version dating back to Version 3.0 of UTS #51, Unicode Emoji. Note that Emoji Version 3.0 is the earliest version with meaningful emoji counts. Emoji Version numbers prior to Version 11.0 were not tightly synchronized with versions of the Unicode Standard.
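As an illustration of why code point counts and emoji counts diverge, the following Python sketch builds the two-character sequence behind a flag emoji. Beyond what is stated above, it assumes only that the 26 regional indicator symbols occupy U+1F1E6 through U+1F1FF in alphabetical order. The helper names `flag_sequence` and `region_from_sequence` are hypothetical.

```python
import unicodedata

# Regional indicator symbols run from U+1F1E6 (letter A) to U+1F1FF (letter Z).
REGIONAL_INDICATOR_A = 0x1F1E6


def flag_sequence(region: str) -> str:
    """Hypothetical helper: map a two-letter region code to its
    pair of regional indicator characters."""
    if len(region) != 2 or not region.isascii() or not region.isalpha():
        raise ValueError(f"expected a two-letter region code, got {region!r}")
    return "".join(chr(REGIONAL_INDICATOR_A + ord(c) - ord("A")) for c in region.upper())


def region_from_sequence(seq: str) -> str:
    """Hypothetical helper: invert flag_sequence, mapping each
    regional indicator back to its ASCII letter."""
    letters = []
    for ch in seq:
        offset = ord(ch) - REGIONAL_INDICATOR_A
        if not 0 <= offset < 26:
            raise ValueError(f"U+{ord(ch):04X} is not a regional indicator")
        letters.append(chr(ord("A") + offset))
    return "".join(letters)


flag = flag_sequence("JP")
print([f"U+{ord(ch):04X}" for ch in flag])  # ['U+1F1EF', 'U+1F1F5']
print(len(flag))                             # 2 code points, 1 emoji
print(unicodedata.name(flag[0]))             # REGIONAL INDICATOR SYMBOL LETTER J
```

A single flag therefore contributes two code points to a character count but only one emoji to an emoji count. Not every pair of letters corresponds to a recognized flag, and whether a valid pair is actually drawn as a flag depends on the fonts available on the platform.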

For information about which emoji characters were part of the Unicode Standard earlier than Unicode 9.0, see: Emoji Versions

Number of Scripts

As the Unicode Standard has expanded over the years, the number of scripts it supports has also increased dramatically. The version-by-version additions are documented on the Supported Scripts page. For convenience, that table also tracks a running total of the number of scripts in the standard.

 

