This document has been reviewed by W3C Members, by software developers, and by other W3C groups and interested parties, and is endorsed by the Director as a W3C Recommendation. It is a stable document and may be used as reference material or cited from another document. W3C's role in making the Recommendation is to draw attention to the specification and to promote its widespread deployment. This enhances the functionality and interoperability of the Web.
This third edition is based on Unicode
This document was produced by the
Comments should be sent to the
This document was produced by a group operating under the
Appendix
This document defines several sets of names, in which each name is assigned a Unicode character or sequence of characters. Each of these sets is expressed as a file of XML entity declarations.
First draft, derived from the MathML2 sources.
Second draft, incorporating comments from Karl Tomlinson, Ian Hickson and others.
Final Last Call draft, incorporating new comments from many and ensuring that the listings are fully up-to-date with W3C and Unicode development.
Proposed Recommendation form incorporating editorial changes from the Director's meeting at which the decision to advance the status was reached and a couple of tiny late corrections.
2nd edition version incorporating the Arabic Mathematical symbols block.
Notation and symbols have proved very important for human communication,
especially in scientific documents. Mathematics has
grown in part because its notation continually changes toward being succinct
and suggestive. There have been many new signs
However, in some environments it is more convenient to use the ASCII input mechanism provided by XML entity references. Many entity names are in common use, and this specification aims to provide standard mappings to Unicode for each of these names. It introduces no names that have not already been used in earlier specifications. Note that these names are short mnemonic names designed for input methods such as XML entity references, not the longer formal names that form part of the Unicode standard.
Specifically, the entity names in the sets
starting with the letters iso
were first standardized in SGML (mml
were first standardized in
MathML xhtml
were first standardized in HTML
This document is the result of years of employing entity names on the Web. There were
always a few named entities used for special characters in HTML, and many more names
used for MathML. This means that this document can be
viewed as an extension and final revision of Chapter 6 of the MathML 2.0
Since there are so many character entity names, and the files specifying them are resources that may be subject to frequent lookup, a template catalog file has also been provided. Users are strongly encouraged to design their implementations so that relevant entity name tables are cached locally, since the listings provided with this specification are not expected to need changing for a long time.
Historically the entity sets have been split into relatively small groups of related characters
however for any new document type that is being defined it is strongly recommended that the combined
To incorporate the
The public identifier should always be used verbatim. The system identifier should be changed to suit local requirements.
The entity set is available in two forms:
The information is also available in JSON format. The JSON arrays encode the entity names and mappings to Unicode and also a list of those entity references for which the HTML (but not XML) parser allows the trailing semicolon to be omitted. So &
may be used as well as &
when using HTML.
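As a rough illustration of the semicolon rule, the sketch below uses Python's standard `html.entities.html5` table instead of the JSON file distributed with this specification. That substitution is an assumption made to keep the example self-contained; the two are expected to agree for HTML names, but this is not verified here. In the Python table, a name that an HTML parser accepts without its trailing semicolon appears twice, once with the semicolon and once without. The helper name `semicolon_optional` is hypothetical.

```python
from html import unescape
from html.entities import html5


def semicolon_optional(name):
    """Return True if the entity `name` (given without '&' or ';')
    is listed both with and without a trailing semicolon."""
    return name in html5 and name + ";" in html5


# "amp" is one of the legacy names; "lang" requires the semicolon.
print(semicolon_optional("amp"))
print(semicolon_optional("lang"))

# html.unescape follows the same HTML rule when expanding references.
print(repr(unescape("&amp")), repr(unescape("&lang")), repr(unescape("&lang;")))
```

An XML parser makes no such allowance: there the semicolon is always required.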
An XSLT2 stylesheet is available which performs the reverse mapping, replacing Unicode characters by entity references.
This specification defines mappings to Unicode of many sets of names that have been defined by earlier specifications.
We present two tables listing all the sets combined, first in Unicode order and then in alphabetic order:
All in
All in
Tables documenting each of the entity sets follow. Each set has a link to the DTD entity declaration for the corresponding entity set, and also a link to an XSLT2 stylesheet that implements a reverse mapping from characters to entity names (this is, of course, only possible for entity names that map to a single Unicode code point).
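The reverse mapping can be sketched in a few lines. The example below is not the XSLT2 stylesheet mentioned above; it is a minimal Python approximation that takes its data from the standard `html.entities.html5` table (assumed here to match the combined HTML and MathML set). The names `build_reverse_map`, `REVERSE` and `to_entity_references` are hypothetical, and the tie-breaking rule (shortest name, then alphabetical) is an arbitrary choice for the illustration.

```python
from html.entities import html5


def build_reverse_map():
    """Map each single character to one entity name.

    Entities whose replacement text is longer than one code point are
    skipped, since a character-by-character reverse mapping cannot
    produce them."""
    reverse = {}
    for name, text in html5.items():
        if not name.endswith(";") or len(text) != 1:
            continue
        current = reverse.get(text)
        # Arbitrary but deterministic choice when several names share a character.
        if current is None or (len(name), name) < (len(current), current):
            reverse[text] = name
    return reverse


REVERSE = build_reverse_map()


def to_entity_references(text):
    """Replace non-ASCII characters that have a name by '&name;'."""
    return "".join(
        "&" + REVERSE[ch] if ord(ch) > 127 and ch in REVERSE else ch
        for ch in text
    )


print(to_entity_references("\u03b1 + \u03b2"))
```

ASCII characters are deliberately left untouched in this sketch, so markup-significant characters such as `&` and `<` would need separate handling in a real serializer.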
Certain characters are of particular relevance to scientific document production. The following tables display Unicode ranges containing the characters that are most used in mathematics.
Note that each of the tables linked from this section contains 256 images and may take a while to load if the images have not been cached locally.
Many of the entities defined by this specification relate to the mathematical alphanumeric characters contained in the letter-like symbols block of Unicode Plane 0, or in the Mathematical Alphanumeric Symbols block in Unicode Plane 1. The following tables list all these symbols, highlighting those that are not in Plane 1, and giving entity names where appropriate.
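The split between the two planes can be seen with a short check. The sketch below assumes that Python's `html.entities.html5` table agrees with the definitions in this specification for the two script capitals it looks up; the helper `plane` is a hypothetical name.

```python
from html.entities import html5


def plane(ch):
    """Return the Unicode plane number of a single character."""
    return ord(ch) >> 16


# Script capital H was already encoded among the letter-like symbols
# (Plane 0); script capital A lives in the Plane 1 alphanumerics block.
for name in ("Ascr;", "Hscr;"):
    ch = html5[name]
    print(f"&{name} U+{ord(ch):04X} plane {plane(ch)}")
```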
The majority of the entity definitions in this specification expand to a single Unicode character. The definitions that expand to a sequence of two or more characters are outlined in this section.
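One way to find the multi-character definitions mechanically is to filter an entity table by replacement-text length. The sketch below uses Python's `html.entities.html5` table as a stand-in for the entity files of this specification (an assumption); `multi_character_entities` is a hypothetical helper name.

```python
from html.entities import html5


def multi_character_entities():
    """Return the entities whose replacement text has two or more code points."""
    return {
        name: text
        for name, text in html5.items()
        if name.endswith(";") and len(text) > 1
    }


multi = multi_character_entities()
for name in ("NotEqualTilde;", "fjlig;"):
    codes = " ".join(f"U+{ord(c):04X}" for c in multi[name])
    print(f"&{name} expands to {codes}")
```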
In addition to the Unicode Characters so far listed, one may use the
combining characters base
character, with no
intervening markup or space, just as is the case for combining accents.
In principle, the negation characters may be applied to any Unicode
character, although fonts designed for mathematics typically have some
negated glyphs ready composed. A MathML renderer should be able to use
these pre-composed glyphs in these cases. A compound character code
either represents a UCS character that is already available, as in the
case of
Note that it is the policy of the W3C and of Unicode that if a single
character is already defined for what can be achieved with a combining
character, that character must be used instead of the decomposed form.
It is also intended that no new single characters will be introduced to represent
what can already be done with existing compositions.
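The preference for an existing single character corresponds to what Unicode Normalization Form C produces. The following sketch uses Python's `unicodedata` module to illustrate it; `prefer_precomposed` is a hypothetical helper name, and the example is an illustration of the policy, not a statement about how any particular MathML renderer behaves.

```python
import unicodedata


def prefer_precomposed(text):
    """Compose base + combining sequences where a single character exists (NFC)."""
    return unicodedata.normalize("NFC", text)


# EQUALS SIGN followed by COMBINING LONG SOLIDUS OVERLAY has a
# precomposed equivalent, NOT EQUAL TO.
negated_equals = prefer_precomposed("=\u0338")
print([f"U+{ord(c):04X}" for c in negated_equals])

# MINUS TILDE followed by the same overlay has no precomposed form,
# so the two-character sequence is kept as it is.
negated_minus_tilde = prefer_precomposed("\u2242\u0338")
print([f"U+{ord(c):04X}" for c in negated_minus_tilde])
```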
Unicode attempts to avoid having several character codes for simple
font variants. For a code point to be assigned there should be
more than a nuance in glyphs to be recorded. To record
variants worth noting there is a special character
Historically there has been much confusion and lack of agreement over variant forms for lower case epsilon.
This specification uses the definitions below. Note that the
name
The situation for phi is very similar to that of epsilon, although with the further complication that early versions of Unicode had the sample glyphs for U+03C6 and U+03D5 swapped from the current usage, and some older fonts still in use follow that older convention. The definitions used in this specification are as listed below.
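The distinction can be inspected programmatically. The sketch below looks up the plain and `var`-prefixed names in Python's `html.entities.html5` table, which is assumed here to agree with the definitions in this specification, and prints the formal Unicode name of each result. The helper `describe` is a hypothetical name.

```python
import unicodedata
from html.entities import html5


def describe(name):
    """Return (code point label, formal Unicode name) for a single-character entity."""
    ch = html5[name]
    return f"U+{ord(ch):04X}", unicodedata.name(ch)


for name in ("epsilon;", "varepsilon;", "phi;", "varphi;"):
    code, formal = describe(name)
    print(f"&{name:<12} {code}  {formal}")
```

Which glyph a reader actually sees for each code point still depends on the font, particularly for older fonts that follow the earlier phi convention.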
In addition to the combining and variant character combinations listed in the previous sections, the following table lists the remaining entity replacement texts that consist of more than one character.
Unicode does not have an fj character, although the other common f ligatures
such as fi (U+FB01) are contained in the Alphabetic Presentation Forms block.
The fj
;
modern typesetting engines should automatically use the fj ligature for this
combination if the font supplies such a ligature.
Unicode has a range of space characters (including all multiples of
1/18 em up to 6/18, except for 5/18 em) thus the
The entities
For reasons explained further in
Source files updated to Unicode 8.0, affecting the character tables, but with no changes to generated entity files or stylesheets.
Source files updated to Unicode 6.3, affecting the character tables, but with no changes to generated entity files or stylesheets.
Source files updated with Unicode 6.1 data on Arabic math alphabets (U+1EE??). Additional tables shown in Sections 3 and 4.
Section htmlmathml
set which is used in MathML and HTML. Also link to XSL and JSON formats for the HTML MathML set.
References updated:
Several example images improved, bringing them more in line with the Unicode reference images.
Various editorial improvements, including using Unicode U+1234 notation more consistently rather than displaying the internal IDs of the form U01234.
The combined entities file distributed with the 2009-11-17 draft introduced an error whereby, if two entity names differed only by case, only one was included. This has been corrected.
The combined entity set htmlmathml, corresponding to the entities usable in HTML and MathML, is now explicitly provided. The predefined set, corresponding to the entities predefined in XML, is now documented (it was previously used internally).
The entities
The entity
A sample
The html5-uppercase set is now documented.
The entities
The entity
The entities
The source files have all been updated to match Unicode 5.2.
The entity
The entity
The entity
The entities
The entities
The following entity definitions have changed at this draft:
Differences between the XHTML entity definitions described here and the entity set
described in the
U+27E8 and U+27E9; XHTML 1.0 used U+2329 and U+232A (which have canonical decomposition to U+3008 and U+3009).
The current drafts of
The differences between MathML 2 and the current entity definitions are listed below.
ISOPUB (and MathML 1) defined an fj ligature;
Unicode does not have a specific character and the entity was dropped from MathML2.
It is re-instated here for maximum compatibility with
U+03C6 GREEK SMALL LETTER PHI (the definition used in HTML4); MathML2 used U+03D5 GREEK PHI SYMBOL.
these have been changed to map to the symbol character
(to match other uses of the var prefix such as
U+0237; MathML 2 used U+006A (j) as there was no dotless j before Unicode 4.1.
U+23E2 and U+23E7; MathML 2 used U+FFFD (REPLACEMENT CHARACTER) as these characters were added at Unicode 5.0 specifically to support these entities.
As noted above, the definitions of these entities have been changed so that the definitions use characters that are in NFC normal form.
U+27C8 and U+27C9; MathML2 used U+005C U+2282 and U+2283 U+002F.
U+2267 U+0338 ; MathML2 used the erroneous definition U+2266 U+0338.
The following bracket symbols have been added to the Mathematical symbols block in Unicode versions between 3.1 and 5.1. MathML2 used similar characters intended for CJK punctuation.
U+27E8 and U+27E9; MathML2 used U+2329 and U+232A (which have canonical decomposition to U+3008 and U+3009).
U+27EA and U+27EB; MathML2 used U+300A and U+300B.
U+2772 and U+2773; MathML2 used U+3014 and U+3015.
U+27EC and U+27ED; MathML2 used U+3018 and U+3019.
U+27E6 and U+27E7; MathML2 used U+301A and U+301B.
U+23DE and U+23DF; MathML2 used U+FE37 and U+FE38.
U+23DC and U+23DD; MathML2 used U+FE35 and U+FE36.
U+27E6 and U+27E7; MathML2 used U+301A and U+301B.
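The practical effect of the canonical decomposition noted for U+2329 and U+232A can be checked with Python's `unicodedata` module. This is a minimal sketch; `is_nfc_stable` is a hypothetical helper name.

```python
import unicodedata


def is_nfc_stable(ch):
    """Return True if NFC normalization leaves the character unchanged."""
    return unicodedata.normalize("NFC", ch) == ch


# The older angle bracket is replaced by a CJK punctuation character
# under normalization, while the mathematical bracket survives intact.
for ch in ("\u2329", "\u27e8"):
    normalized = unicodedata.normalize("NFC", ch)
    print(f"U+{ord(ch):04X} -> U+{ord(normalized):04X} stable={is_nfc_stable(ch)}")
```

A document using the older code points can therefore change silently when it passes through a normalizing tool, which the mathematical bracket characters avoid.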
All data files used to construct the entity declarations, XSLT character maps, and
HTML tables referenced from this document are available from