Using Qualified Names (QNames) as Identifiers in XML Content

W3C

TAG Finding 17 March 2004

This version:
http://www.w3.org/2001/tag/doc/qnameids-2004-03-17
Latest version:
http://www.w3.org/2001/tag/doc/qnameids
Previous versions:
http://www.w3.org/2001/tag/doc/qnameids-2004-02-27 http://www.w3.org/2001/tag/doc/qnameids-2004-01-14 http://www.w3.org/2001/tag/doc/qnameids-2004-01-06
Editor:
Norman Walsh, Sun Microsystems, Inc. <Norman.Walsh@Sun.COM>

This document is also available in these non-normative formats: XML and 2004-02-27 Diff.


Abstract

The question that prompted this finding was "are QNames acceptable replacements for URIs as identifiers within specifications?" This finding documents the TAG's opinion on the use of QNames as identifiers.

Status of this Document

This document has been produced by the W3C Technical Architecture Group (TAG). This finding addresses TAG issue qnameAsId-18.

This is the 17 March 2004 revision of this finding. At their March face-to-face meeting, the TAG decided to accept this finding.

Additional TAG findings, both accepted and in draft state, may also be available. The TAG expects to incorporate this and other findings into a Web Architecture Document that will be published according to the process of the W3C Recommendation Track.

The terms MUST, SHOULD, and SHOULD NOT are used in this document in accordance with [RFC 2119].

Please send comments on this finding to the publicly archived TAG mailing list www-tag@w3.org (archive).

Table of Contents

1 Preface
2 QNames as Identifiers
3 QNames in XML Element and Attribute Names
4 QNames in Other XML Names
    4.1 QNames in Other Specifications
    4.2 Namespace Bindings
5 Architectural Observations
6 Architectural Statement
7 References

Appendix

A Use Case: XML Canonicalization (Non-Normative)


1 Preface

This TAG Finding documents a portion of the web architecture where conflicting requirements and design goals intersect. It is a simple matter of fact that specifications which have chosen one set of design criteria interoperate less well with specifications that have chosen a different set.

Given that there are existing specifications which exhibit incompatible designs and strong arguments in favor of each design, the TAG elects not to assert architectural principles that would be in direct conflict with some significant set of specifications.

It's possible that these issues could be addressed in the scope of some larger, more global redesign of, for example, XML, but no short-term solution presents itself.

2 QNames as Identifiers

This finding is concerned with the use of qualified names (QNames) as identifiers. Specifically, it considers the contexts in which a name containing a colon can be understood to be a QName.

A related TAG issue, rdfmsQnameUriMapping-6, concerns the mechanism by which one can (or cannot) construct a URI for a particular QName. We do not consider that issue in this finding.

3 QNames in XML Element and Attribute Names

Qualified names were introduced by [XML Namespaces]. They were defined for element and attribute names (only) and provide a mechanism for concisely identifying a {URI, local-name} pair. For example, in the following document:

<?xml version='1.0'?>
<doc xmlns:x="http://example.com/ns/foo">
<x:p/>
</doc>

The QName "x:p" is a concise, unambiguous name for the {URI, local-name} pair {"http://example.com/ns/foo", "p"}.

When used solely in element and attribute names, all QNames are identified by the XML processor and can logically be replaced by the URI/local-name pair they identify.
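As an illustration only, the following sketch uses Python's standard xml.etree.ElementTree module (our choice of tool, not something this finding prescribes) to show a namespace-aware parser replacing the QName "x:p" from the example above with the pair it identifies. ElementTree reports names in "{URI}local-name" form and discards the prefix.

```python
import xml.etree.ElementTree as ET

# The example document from this section.
DOC = """<?xml version='1.0'?>
<doc xmlns:x="http://example.com/ns/foo">
<x:p/>
</doc>"""

root = ET.fromstring(DOC)
child = root[0]  # the <x:p/> element

# The prefix "x" is gone; only the {URI, local-name} pair remains.
uri, local_name = child.tag[1:].split("}", 1)
print(child.tag)
print((uri, local_name))
```

Because the parser performs this replacement itself, no knowledge of the vocabulary is needed to interpret the element name.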

4 QNames in Other XML Names

At the request of the XML Schema Working Group, the XML Core Working Group is producing an erratum to [XML Namespaces] to clarify the meaning of colons in other contexts.

In particular, this erratum makes it clear that entity names, processing instruction targets, and notation names are not QNames and they may not include colons. Documents that do not satisfy this constraint are not namespace well-formed. Furthermore, the values of attributes of type ID, IDREF(S), ENTITY(IES), and NOTATION are also forbidden from containing colons. Documents that do not satisfy this constraint are not namespace valid.

A colon that introduces a namespace validity or namespace well-formedness error into a document does not introduce a QName. In other words, the term "identifier" in this finding is not related to XML identifiers of type ID since they cannot be QNames.

4.1 QNames in Other Specifications

Other specifications, starting with [XSLT], have taken QNames and employed them in contexts other than element and attribute names. Specifically, QNames have been used in attribute values and element content.

For example, in the following document, "x:p" is understood to be a QName even though it appears in an attribute value, not an element or attribute name.

<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:x="http://example.com/ns/foo"
                version="1.0">

<xsl:output method="html"/>

<xsl:template match="x:p">
  <p>
    <xsl:apply-templates/>
  </p>
</xsl:template>

</xsl:stylesheet>

In attribute values and element content, QNames are often used to identify a particular element type; such uses are, in principle, consistent with the way QNames were intended to be used. However, some specifications use QNames as shortcuts for unique identifiers derived from a {URI, local-name} pair that have no relationship to element or attribute types.

The [Functions and Operators] specification, for example, uses QNames to identify functions. This is motivated partly by backwards compatibility with XPath 1.0, but also by the fact that function names share some characteristics with element and attribute names. In particular, the names need to be globally unique so that name collisions don’t occur either between independently developed functions or different versions of the specification.

Using a QName as a shortcut for a {URI, local-name} pair is often convenient, but it carries a price. In order to identify QNames in content, a processor must understand the syntax and possibly the semantics of the content. The “x:p” in the preceding example can only be recognized as a QName by a processor that understands both XPath (in order to parse the attribute value) and XSLT (in order to know which attributes contain XPath expressions).
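The cost described above can be sketched in Python (an illustrative choice). A generic parser hands back the match attribute as an opaque string, so the application has to track the namespace bindings and interpret the value itself. The sketch assumes that the whole attribute value is a single QName, which holds for this trivial pattern only; a real processor would need a full XPath parser. The flat `bindings` dictionary is also a simplification that ignores scoping and redeclaration.

```python
import io
import xml.etree.ElementTree as ET

STYLESHEET = """<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:x="http://example.com/ns/foo"
                version="1.0">
<xsl:template match="x:p"><p><xsl:apply-templates/></p></xsl:template>
</xsl:stylesheet>"""

XSLT_NS = "http://www.w3.org/1999/XSL/Transform"

bindings = {}      # prefix -> namespace name (flat: ignores scoping)
match_values = []  # raw values of xsl:template/@match

events = ET.iterparse(io.StringIO(STYLESHEET), events=("start-ns", "start"))
for event, payload in events:
    if event == "start-ns":
        prefix, uri = payload
        bindings[prefix] = uri
    elif payload.tag == "{%s}template" % XSLT_NS:
        # XSLT-specific knowledge: this attribute holds a pattern.
        match_values.append(payload.get("match"))

raw = match_values[0]  # still the plain string "x:p"

# XPath-specific knowledge (greatly simplified): treat the value as one QName.
prefix, local_name = raw.split(":", 1)
resolved = (bindings[prefix], local_name)
print(raw, "->", resolved)
```

Both commented steps encode knowledge that a generic XML processor does not have.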

4.2 Namespace Bindings

Specifications for XML-based languages usually rely on the in-scope namespace bindings in the XML document to associate prefixes with namespace names.

Using the in-scope namespace bindings has the advantage that it theoretically allows a generic processor to interpret QNames in content without having to be aware of any application-specific mechanisms. The alternative, where every specification defines its own mechanism, would clearly lead to a badly fragmented web.

However, there is at least one application where a compelling argument has been made for requiring an alternative mechanism for defining namespace bindings. That application is [XPointer Framework]. It is an architectural principle of URIs that they be context-independent. It follows that the QNames that appear in an XPointer must not refer to in-scope namespaces as this would make transcription impossible in the general case.

We must therefore accept that there are some applications which use in-scope namespaces and some which use their own mechanisms. However, the namespace binding framework defined by [XML Namespaces] is well established and widely supported. The cost associated with defining and deploying an alternate mechanism is very large and should be avoided wherever possible.

Because there is some possibility of variation in the way namespace bindings are established, even if a QName can be identified in content, it may be difficult or impossible to determine what {URI, local-name} it represents. The mapping may depend on the context in which it occurs. Therefore, at the very least, it is important for specifications to identify the mapping algorithm that they have chosen.

Specifications that use QNames to represent {URI, local-name} pairs MUST describe the algorithm that is used to map between them.
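As a minimal sketch of what such an algorithm might look like, the following Python function maps a QName to a {URI, local-name} pair given a table of bindings. The function name, its parameters, and the `use_default_namespace` flag are hypothetical and exist only for illustration. The flag stands for one of the decisions a specification has to spell out: XSLT 1.0, for instance, does not apply the default namespace to unprefixed QNames in attribute values.

```python
from typing import Mapping, Optional, Tuple


def resolve_qname(
    qname: str,
    bindings: Mapping[str, str],
    use_default_namespace: bool,
) -> Tuple[Optional[str], str]:
    """Map a QName to a (URI, local-name) pair.

    `bindings` maps prefixes to namespace names; the empty-string key,
    if present, holds the default namespace.
    """
    prefix, colon, local_name = qname.partition(":")
    if not colon:
        # Unprefixed name: the specification must say which rule applies.
        uri = bindings.get("") if use_default_namespace else None
        return (uri or None, qname)
    if prefix not in bindings:
        raise KeyError("prefix %r is not bound" % prefix)
    return (bindings[prefix], local_name)


in_scope = {
    "": "http://example.org/ns/one",
    "t": "http://example.org/ns/two",
}

print(resolve_qname("t:para", in_scope, use_default_namespace=True))
print(resolve_qname("title", in_scope, use_default_namespace=True))
print(resolve_qname("title", in_scope, use_default_namespace=False))
```

The last two calls give different answers for the same name and the same bindings, which is why the choice cannot be left implicit.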

We observe also that there is an overlap in the lexical space of QNames and URIs.

Specifications that use QNames to represent {URI, local-name} pairs SHOULD NOT allow both forms in attribute values or element content where they would be indistinguishable.
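The overlap can be demonstrated with two deliberately simplified tests, sketched here in Python. The regular expressions are ASCII-only approximations written for this illustration: the real QName production admits many more characters, and the second test checks only for a leading URI scheme.

```python
import re

# Simplified, ASCII-only approximation of an NCName.
NCNAME = r"[A-Za-z_][A-Za-z0-9._-]*"
QNAME_RE = re.compile(r"(?:%s:)?%s" % (NCNAME, NCNAME))

# A URI scheme followed by a colon.
URI_SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:")


def looks_like_qname(s: str) -> bool:
    return QNAME_RE.fullmatch(s) is not None


def looks_like_absolute_uri(s: str) -> bool:
    return URI_SCHEME_RE.match(s) is not None


for candidate in ("x:p", "mailto:fred", "http://example.com/ns/foo", "p"):
    print(candidate, looks_like_qname(candidate), looks_like_absolute_uri(candidate))
```

Strings such as "x:p" and "mailto:fred" pass both tests, so a processor that accepts either form in the same position cannot tell from the lexical form alone which one was meant. An unprefixed name such as "p" is also a syntactically valid relative URI reference, which this sketch does not attempt to detect.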

5 Architectural Observations

The TAG makes the following observations:

6 Architectural Statement

In so far as the identification mechanism of the Web is the URI and QNames are not URIs, it is a mistake to use a QName for identification when a URI would serve.

That said, the TAG recognizes that there are sometimes pragmatic reasons for choosing short, lexical representations of more complex names and accepts that QNames are an established mechanism for doing so. Further, it must be observed that some things are identified by QNames: element and attribute names, types in W3C XML Schema, etc.

Where there is a compelling reason to use QNames instead of URIs for identification, it is imperative that specifications provide a mapping between QNames and URIs, if such a mapping is possible.

Finally, we observe that a whole class of interpretation problems can be avoided if the use of QNames can be restricted to contexts where their identification is natural and unambiguous (element and attribute names, simple content of type xs:QName, etc.) and we encourage developers to employ such restrictions wherever possible.

7 References

RFC 2119
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. IETF. March, 1997. (See http://www.ietf.org/rfc/rfc2119.txt.)
Canonical XML
John Boyer, editor. Canonical XML Version 1.0. World Wide Web Consortium, 2001. (See http://www.w3.org/TR/xml-c14n.)
XPointer Framework
Paul Grosso, Eve Maler, Jonathan Marsh, Norman Walsh, editors. XPointer Framework. World Wide Web Consortium, 2002. (See http://www.w3.org/TR/xptr-framework/.)
XML Datatypes
Paul V. Biron and Ashok Malhotra, editors. XML Schema Part 2: Datatypes. World Wide Web Consortium, 2000. (See http://www.w3.org/TR/xmlschema-2/.)
XML Namespaces
Tim Bray, Dave Hollander, Andrew Layman, editors. Namespaces in XML. World Wide Web Consortium, 1999. (See http://www.w3.org/TR/REC-xml-names/.)
XML-DSig Core
Donald Eastlake, Joseph Reagle, and David Solo, editors. XML-Signature Syntax and Processing. World Wide Web Consortium, 2002. (See http://www.w3.org/TR/xmldsig-core/.)
XSLT
James Clark, editor. XSL Transformations (XSLT) Version 1.0. World Wide Web Consortium, 1999. (See http://www.w3.org/TR/xslt.)
Functions and Operators
Ashok Malhotra, Jim Melton, and Norman Walsh, editors. XQuery 1.0 and XPath 2.0 Functions and Operators. World Wide Web Consortium, 2003. (See http://www.w3.org/TR/xpath-functions/.)

A Use Case: XML Canonicalization (Non-Normative)

Suppose you want to answer the question, “are two XML documents the same?” One way to attempt to answer this question is to define a canonical representation for an XML document and then compare the two canonical representations to see if they are the same. This use case provides a nice demonstration of the issues discussed in this finding.

Imagine for a moment that we aren’t concerned with QNames in content and consider these two documents:

Example: Document A
<doc xmlns="http://example.org/ns/one">
  <t:title xmlns:t="http://example.org/ns/one">
    Title
  </t:title>
  <div xmlns:t="http://example.org/ns/two"
       xmlns:u="http://example.org/ns/three">
    <title u:attr="1">
      Division Title
    </title>
    <t:para xmlns:u="http://example.org/ns/three"
            u:a1="one" u:a2='two'    u:a3='three'>Text</t:para>
  </div>
</doc>
Example: Document B
<r:doc xmlns:r="http://example.org/ns/one"
       xmlns:z="http://example.org/ns/four">
  <a:title xmlns:a="http://example.org/ns/one">
    Title
  </a:title>
  <r:div xmlns:u="http://example.org/ns/three"
         xmlns:a="http://example.org/ns/two">
    <r:title u:attr="1">
      Division Title
    </r:title>
    <a:para x:a1="one" x:a3='three' x:a2='two'
            xmlns:x="http://example.org/ns/three">Text</a:para>
  </r:div>
</r:doc>

Are they the same? At first glance, it probably isn’t clear if they’re the same or not. To make the problem easier, we can define some rules for canonicalization:

Under these rules, Document A would be transformed to:

Example: Canonical Document A
<a1:doc xmlns:a1="http://example.org/ns/one">
  <a1:title>
    Title
  </a1:title>
  <a1:div xmlns:a2="http://example.org/ns/three"
          xmlns:a3="http://example.org/ns/two">
    <a1:title a2:attr="1">
      Division Title
    </a1:title>
    <a3:para a2:a1="one" a2:a2="two" a2:a3="three">Text</a3:para>
  </a1:div>
</a1:doc>

And, in fact, so would Document B, demonstrating that they are the same.
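The comparison can be tried with an off-the-shelf tool. The sketch below uses the canonicalize function in Python's xml.etree.ElementTree module (Python 3.8 or later). That function implements C14N 2.0 rather than the illustrative rules of this appendix, so the prefixes it generates differ from those shown in Canonical Document A, but its rewrite_prefixes option is comparable in spirit.

```python
import xml.etree.ElementTree as ET

DOC_A = """<doc xmlns="http://example.org/ns/one">
  <t:title xmlns:t="http://example.org/ns/one">
    Title
  </t:title>
  <div xmlns:t="http://example.org/ns/two"
       xmlns:u="http://example.org/ns/three">
    <title u:attr="1">
      Division Title
    </title>
    <t:para xmlns:u="http://example.org/ns/three"
            u:a1="one" u:a2='two'    u:a3='three'>Text</t:para>
  </div>
</doc>"""

DOC_B = """<r:doc xmlns:r="http://example.org/ns/one"
       xmlns:z="http://example.org/ns/four">
  <a:title xmlns:a="http://example.org/ns/one">
    Title
  </a:title>
  <r:div xmlns:u="http://example.org/ns/three"
         xmlns:a="http://example.org/ns/two">
    <r:title u:attr="1">
      Division Title
    </r:title>
    <a:para x:a1="one" x:a3='three' x:a2='two'
            xmlns:x="http://example.org/ns/three">Text</a:para>
  </r:div>
</r:doc>"""

# Keeping the authors' prefixes leaves the two serializations different.
same_with_original_prefixes = ET.canonicalize(DOC_A) == ET.canonicalize(DOC_B)

# Replacing every prefix with a generated one removes that difference.
canonical_a = ET.canonicalize(DOC_A, rewrite_prefixes=True)
canonical_b = ET.canonicalize(DOC_B, rewrite_prefixes=True)
same_with_rewritten_prefixes = canonical_a == canonical_b

print(same_with_original_prefixes, same_with_rewritten_prefixes)
```

Only the prefix-rewriting form treats the two documents as the same, which is exactly the step that becomes unsafe once QNames appear in content.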

Now consider the case where QNames may occur in content. Suppose we begin with this document:

Example: Document C
<doc xmlns="http://example.org/ns/one">
  <t:title xmlns:t="http://example.org/ns/one">
    Title {t:title}
  </t:title>
  <div xmlns:t="http://example.org/ns/two"
       xmlns:u="http://example.org/ns/three">
    <title u:attr="1" t:attr="u:expr">
      Division Title
    </title>
    <t:para xmlns:u="http://example.org/ns/three"
            u:a1="one" u:a2='two'    u:a3='three'>Text</t:para>
  </div>
</doc>

By our canonicalization rules, this would become:

Example: Badly Canonical Document C
<a1:doc xmlns:a1="http://example.org/ns/one">
  <a1:title>
    Title {t:title}
  </a1:title>
  <a1:div xmlns:a2="http://example.org/ns/three"
          xmlns:a3="http://example.org/ns/two">
    <a1:title a2:attr="1" a3:attr="u:expr">
      Division Title
    </a1:title>
    <a3:para a2:a1="one" a2:a2="two" a2:a3="three">Text</a3:para>
  </a1:div>
</a1:doc>

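Unfortunately, we’ve destroyed the information content of the document: the prefixes “t” and “u” that appear in “{t:title}” and “u:expr” are no longer bound to any namespace.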
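The same loss can be reproduced with a general-purpose canonicalizer. The sketch below again assumes Python 3.8 or later and uses xml.etree.ElementTree.canonicalize, which implements C14N 2.0 rather than the rules of this appendix. With prefix rewriting alone, the prefix inside the attribute value is left dangling. Only when the caller names the attribute as QName-valued, via the qname_aware_attrs option, can the value be rewritten consistently.

```python
import xml.etree.ElementTree as ET

DOC_C = """<doc xmlns="http://example.org/ns/one">
  <t:title xmlns:t="http://example.org/ns/one">
    Title {t:title}
  </t:title>
  <div xmlns:t="http://example.org/ns/two"
       xmlns:u="http://example.org/ns/three">
    <title u:attr="1" t:attr="u:expr">
      Division Title
    </title>
    <t:para xmlns:u="http://example.org/ns/three"
            u:a1="one" u:a2='two'    u:a3='three'>Text</t:para>
  </div>
</doc>"""

# Application knowledge: this attribute's value is a QName.
QNAME_ATTR = "{http://example.org/ns/two}attr"

# Prefix rewriting with no knowledge of QNames in content.
naive = ET.canonicalize(DOC_C, rewrite_prefixes=True)

# Prefix rewriting, told which attribute carries a QName.
aware = ET.canonicalize(
    DOC_C, rewrite_prefixes=True, qname_aware_attrs={QNAME_ATTR}
)

print("u:expr" in naive, "xmlns:u=" in naive)  # value kept, binding gone
print("u:expr" in aware)                       # value rewritten
```

The text “{t:title}” is left untouched in both cases, since nothing tells the canonicalizer that it contains a QName. A generic tool can preserve QNames in content only where it has been given application-specific knowledge.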

If we wish to preserve the information content of the document, we must be much more conservative. It’s no longer appropriate to throw away duplicate namespace declarations (unless they declare the same prefix and URI) or change prefixes:

Example: Canonical Document C
<doc xmlns="http://example.org/ns/one">
  <t:title xmlns:t="http://example.org/ns/one">
    Title {t:title}
  </t:title>
  <div xmlns:t="http://example.org/ns/two"
       xmlns:u="http://example.org/ns/three">
    <title u:attr="1" t:attr="u:expr">
      Division Title
    </title>
    <t:para u:a1="one" u:a2="two" u:a3="three">Text</t:para>
  </div>
</doc>

Under these rules, differences in the canonical representations of two documents do not necessarily constitute real differences. (Comparison of different documents is not the only use case for canonicalization, however, so it is not without value.)

For a much more complete discussion of these issues, see [Canonical XML], where they are addressed in detail and a complete description of a reasonable “conservative” algorithm is given.