About the Unicode Character Database
The Unicode Character Database (UCD) consists of a number of data files listing Unicode character properties and related data.
It also includes data files containing test data for conformance to several important Unicode algorithms.
Full documentation for the UCD can be found in
Unicode Standard Annex #44, Unicode Character Database.
All files for the most up-to-date version of the Unicode
Character Database can be found at:
https://www.unicode.org/Public/UCD/latest/.
Files in the UCD/latest/ subdirectories are unversioned: they do not contain any version
indicator in their file name. However, most of the data files contain a file header in a standard format, which indicates the Unicode version and the date of last revision of that file.
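As a rough illustration of reading that header, the sketch below pulls the version and revision date out of the leading comment lines of a data file. It assumes the header looks like `# Name-X.Y.Z.txt` followed by `# Date: YYYY-MM-DD, ...`; that shape, the function name `parse_ucd_header`, and the sample text are assumptions for illustration, so check them against the actual files and UAX #44 before relying on them.

```python
import re

# Assumed header shape (illustrative, not copied from a real file):
#   # Blocks-14.0.0.txt
#   # Date: 2021-01-22, 00:29:00 GMT
_NAME_RE = re.compile(r"^#\s*\w+-(?P<version>\d+\.\d+\.\d+)\.txt\s*$")
_DATE_RE = re.compile(r"^#\s*Date:\s*(?P<date>\d{4}-\d{2}-\d{2})")


def parse_ucd_header(text):
    """Return (version, date) found in the leading '#' lines, or None for each."""
    version = None
    date = None
    for line in text.splitlines():
        if not line.startswith("#"):
            break  # the header ends at the first non-comment line
        name_match = _NAME_RE.match(line)
        if name_match and version is None:
            version = name_match.group("version")
        date_match = _DATE_RE.match(line)
        if date_match and date is None:
            date = date_match.group("date")
    return version, date


# Illustrative sample text in the assumed format.
SAMPLE = (
    "# Blocks-14.0.0.txt\n"
    "# Date: 2021-01-22, 00:29:00 GMT\n"
    "#\n"
    "0000..007F; Basic Latin\n"
)

if __name__ == "__main__":
    print(parse_ucd_header(SAMPLE))
```

Files that lack a header in this shape simply yield `None` for the missing field, which matches the statement that most, not all, data files carry the header.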
The latest version of the Unicode Standard, which corresponds to the latest version of the UCD, can be found at:
https://www.unicode.org/versions/latest/.
Each version of the UCD is archived in its own versioned directory. For example, the UCD for Unicode 14.0 is available at:
https://www.unicode.org/Public/14.0.0/
The UCD for Unicode 13.0 is available at:
https://www.unicode.org/Public/13.0.0/
and so on for each earlier version of the standard.
For versions of the UCD earlier than Version 4.1, the structure of the archival directories differs somewhat.
For full details, see
Unicode Standard Annex #44, Unicode Character Database.
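The versioned directory pattern shown above can be sketched as a small helper. The function name `ucd_archive_url` is hypothetical, and the sketch assumes that every version from 4.1 onward follows the `Public/X.Y.Z/` layout seen in the 14.0.0 and 13.0.0 examples; it refuses earlier versions rather than guessing at their layout.

```python
BASE = "https://www.unicode.org/Public/"


def ucd_archive_url(version):
    """Build the archival directory URL for a version such as '14.0' or '14.0.0'."""
    parts = version.split(".")
    if not 2 <= len(parts) <= 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected a version like '14.0' or '14.0.0', got {version!r}")
    while len(parts) < 3:
        parts.append("0")  # '14.0' is treated as '14.0.0'
    major, minor = int(parts[0]), int(parts[1])
    if (major, minor) < (4, 1):
        # Directory structure differs before Version 4.1; see UAX #44.
        raise ValueError("archival layout differs for versions earlier than 4.1")
    return BASE + ".".join(parts) + "/"


if __name__ == "__main__":
    print(ucd_archive_url("14.0"))
    print(ucd_archive_url("13.0.0"))
```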
A comprehensive list of the exact data files that make up a
given version of the UCD can be found in the component lists
at Enumerated Versions of the Unicode Standard.
The contents of each version of the UCD are also available
in XML format. The XML files are in zipped format and
are stored in a subdirectory for each version. For example,
the XML version of UCD Version 14.0 can be found in:
https://www.unicode.org/Public/14.0.0/ucdxml/
Full documentation about the XML versions of the UCD can be
found in Unicode Standard Annex #42, Unicode Character Database in XML.
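A minimal sketch of reading one of the zipped XML files follows. It assumes the archive holds a single `.xml` member, that elements live in the namespace `http://www.unicode.org/ns/2003/ucd/1.0`, and that individual characters appear as `char` elements with `cp` (code point) and `na` (name) attributes. These details should be verified against UAX #42; the helper names and the tiny in-memory sample are made up for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Assumed namespace for the UCD in XML; verify against UAX #42.
UCD_NS = "http://www.unicode.org/ns/2003/ucd/1.0"


def iter_chars(zip_file):
    """Yield (code point, name) pairs from the first .xml member of a zip."""
    with zipfile.ZipFile(zip_file) as zf:
        xml_names = [n for n in zf.namelist() if n.endswith(".xml")]
        if not xml_names:
            raise ValueError("no .xml member found in archive")
        with zf.open(xml_names[0]) as fh:
            # Stream the document so large files need not fit in memory.
            for _event, elem in ET.iterparse(fh, events=("end",)):
                if elem.tag == f"{{{UCD_NS}}}char":
                    cp = elem.get("cp")
                    if cp is not None:  # skip entries without a single cp
                        yield int(cp, 16), elem.get("na", "")
                    elem.clear()


def make_sample_zip():
    """Build a tiny in-memory archive in the assumed format (illustrative only)."""
    xml = (
        f'<ucd xmlns="{UCD_NS}"><repertoire>'
        '<char cp="0041" na="LATIN CAPITAL LETTER A"/>'
        '<char cp="0042" na="LATIN CAPITAL LETTER B"/>'
        "</repertoire></ucd>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("sample.xml", xml)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for code_point, name in iter_chars(make_sample_zip()):
        print(f"U+{code_point:04X} {name}")
```

The same `iter_chars` function accepts a path to a downloaded archive, since `zipfile.ZipFile` takes either a file name or a file-like object.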
During periods when a preliminary (beta) version of
the standard is being released for public comment,
Public Beta files
are available. For more information about any ongoing public betas, see
the BETA notice
as well as Public Review
Issues.
All files and directories in the Unicode Character Database are
accessible via both HTTPS and FTP. For FTP access,
use an FTP client with anonymous login.
For example, to access the contents of
https://www.unicode.org/Public/UCD/latest/ by FTP,
point an FTP client to www.unicode.org as the host,
and /Public/UCD/latest as the path.
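The mapping from an HTTPS URL to an FTP host and path can be sketched with Python's standard `ftplib` and `urllib.parse`. The helper names `ftp_target` and `list_ucd_dir` are hypothetical, and the listing function assumes the server accepts anonymous login as described above; it is defined here but not called, since it needs network access.

```python
from ftplib import FTP
from urllib.parse import urlsplit


def ftp_target(https_url):
    """Split an HTTPS URL into the (host, path) pair to give an FTP client."""
    parts = urlsplit(https_url)
    path = parts.path.rstrip("/") or "/"
    return parts.hostname, path


def list_ucd_dir(https_url, timeout=30):
    """List the file names in the FTP directory that mirrors an HTTPS URL."""
    host, path = ftp_target(https_url)
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()  # no arguments means anonymous access
        ftp.cwd(path)
        return ftp.nlst()


if __name__ == "__main__":
    print(ftp_target("https://www.unicode.org/Public/UCD/latest/"))
```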