Useful Theories about URIs

(Ontologies of the Web)

Status: $Revision: 1.12 $ Please send comments to sandro@w3.org and cc: www-archive@w3.org. This is a work-in-progress.

Reading over it five months later (2003-05-30) I like some of it, but it misses bits on how model theory applies and it downplays the "information-providing service" angle which I now like a lot.

Part 1: Introduction (Background, Goals, and Approach)

1.1 Background

1.1.1. Architecture = Theory + Vision

To design a building, an architect needs: (1) knowledge about buildings and their use in general and (2) a vision for how the building is to be used. As design proceeds and decisions are made, the area of needed expertise becomes narrower, but the degree of needed expertise becomes greater. In guiding the evolution of the web, we have a similar dual need: we need to know about complex computerized information systems, and we need a vision of how the web is to be used. In the web architecture effort, each of these areas has its own potential for confusion and disagreement. This document focuses on the first area, how systems work, applied specifically and in depth to the narrow area of identifiers. Hopefully, this will allow more productive discussions about visions for the web.

It would be nice to proceed from a firm base of knowledge, for us to fully understand how identifiers function in complex computerized information systems. But the construction of software systems is still largely a technical art, not an engineering discipline. We are like builders who have invented the arch and even the flying buttress, but are still a long way from finite element analysis. Instead of a firm base of scientific knowledge, we use personal collections of theories and techniques. Even when we have formal results, we often follow our intuition, our personal metaphors, instead of doing the math.

It seems to me that much of the ongoing debate over URIs comes from jumping straight to vision, without being clear about our theories. We may not know how the web is all going to work, but we can at least be clearer about our assumptions and the mental models we use in our designs.

1.1.2. Relationships Between Theories

In history, there have been several famous shifts in understanding, where theories collided and deeply changed expert knowledge. They may be useful in illustrating how theories can relate to each other. (These are popular-culture versions of the stories. The more true, more detailed versions, as far as I know them, are even more interesting.)

The most obvious relation is the correction. Aristotle said that heavier things fall faster. Galileo said that Aristotle's theory was simply wrong. He proposed a simple and obvious demonstration (drop a large and a small iron ball from the Leaning Tower of Pisa), and eventually people came to see he was right. Aristotle's theory is now viewed as simply wrong.

A more subtle relation is refinement. Newton's laws of motion work well in human experience, but in extreme circumstances (such as when approaching the speed of light), Einstein theorized they would not hold. He proposed a new theory, which now appears to be more correct than Newton's, but for most practical applications the difference is too small to measure or care about. Newtonian mechanics is still very useful, since it is much simpler.

A third relation between two competing theories is that they are both in need of correction or refinement. There is a slightly less famous debate in physics about the nature of light: is it a wave or a particle phenomenon? This debate raged, as proponents of each side put forth evidence which clearly showed support for their model. Light diffracts and refracts as waves do, yet it has momentum and arrives one unit at a time like a particle. Eventually, the notion emerged that it was neither truly a wave nor truly a particle. The distinct human experiences of waves and particles simply do not match the microscopic behavior of light. Both of the older theories are still used, like Newtonian mechanics, for many applications, but a combining refinement is accepted as in fact more accurate.

One lesson here is that while people may need theories and metaphors to do their work, developing and selecting the appropriate ones is extraordinarily difficult and error-prone. Some theories, like Aristotle's idea of gravity, are obvious, accepted, and wrong. Others, like those of Galileo and Newton, have been demonstrated in numerous experiments to be good enough for most applications. Sometimes, as with the wave/particle debate, it turns out neither side is right.

A final, trivial, and annoying relationship between theories is that they are effectively the same but use different terminology. When the theories are formalized (perhaps expressed as mathematical or logical formulas), the superficial nature of the difference becomes clear, but when stated informally (dueling metaphors), there can be considerable needless argument.

1.2 Goals

This web page, then, is a survey and perhaps a clearinghouse for theories about how URIs function. The goal is to have each theory named, presented, and discussed, along with data about how they interoperate with each other. Ontologies, giving more precision to the vocabularies and conceptual relationships, may also be given, along with historical information and appropriate references. Feedback will be gratefully accepted and incorporated as time permits; please send e-mail to sandro@w3.org and also CC: www-archive@w3.org so that your comments will be available to others.

A secondary purpose here is to help discussions about what the web should evolve into. Should one or more of these theories be adopted as official? Could it be? At the very least, a blessed single theory will need to be explained and justified in terms of its relationships to each of these other contenders.

1.3 Approach

1.3.1 Naming

I have named the theories, where possible, by naming the class of things which (in the theory) each URI identifies.

1.3.2 Inclusion

Each of these theories appears, to me, to be consistent and adequate for real work. Sometimes it's hard to draw the lines between them, etc. Evidence of their use and utility should be gathered.

1.3.3 Metaphors and Ontologies

A metaphor strongly reflects an informal theory; an ontology is a kind of formal theory. Both the formal and the informal approaches seem valuable, so I'm hoping for each theory to be expressed with both metaphors and ontologies. Formal theory work can be very challenging, so I don't expect that to proceed unless it turns out to be necessary for something.

1.3.4. Terminology

I've tried to use the dominant terminology in the metaphor, as I see it.

1.3.5. Test and Use Cases

Ideally each of these theories would be described in terms of particular applications, suggesting how one should use them to approach some problem. That work isn't done.

Part 2: Context-Free Theories

These theories all include the idea that a URI string identifies something. The identified something is the same regardless of the context or situation in which the identification occurs. Mathematically, these theories involve an identification function which maps from each string (which happens to conform to URI syntax) to at most one thing identified by that string. The function has only one argument (the string), so context is not a factor in the mapping.
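As a minimal sketch of this idea, a context-free identification function can be modelled in Python as a partial function of a single string. The table contents and the names IDENTIFIED and identify are hypothetical, chosen only for illustration; each theory below would fill the table differently.

```python
from typing import Optional

# Hypothetical table: each URI string maps to at most one identified thing.
# The entries are invented for illustration only.
IDENTIFIED = {
    "http://example.org/report": "the report document",
    "mailto:someone@example.org": "a mailbox",
}


def identify(uri: str) -> Optional[str]:
    """Return the thing identified by `uri`, or None if it identifies nothing.

    The only argument is the string itself, so context cannot change the result.
    """
    return IDENTIFIED.get(uri)
```

Calling identify with the same string always gives the same answer, which is exactly the property the contextual theories in Part 3 give up.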

Since "URI" stands for "Uniform Resource Identifier" the identified-thing is often called a "resource". I try to avoid that term here, because it seems to cloud the debate more than it helps; many people have preconceived notions of what exactly a resource might be.

Many people consider these context-free theories the only legitimate ones, and this view seems to be supported by RFC 2396. However, the existence of multiple useful theories with different identification functions suggests none of these theories is adequate, and that URIs do not in fact function as context-free identifiers. Theories which take this alternative view are presented separately in Part 3.

Theories [@@@turn into comparison chart!]: Document | Location | View | Invocation Point | Information Source | Subject

Document

The web is a collection of maintainable virtual documents. A URI identifies a web page, which is very much like a mass-media publication, displaying text and graphics (like a brochure, magazine, or book), or sound (like a CD) or video (like a DVD). The big difference is that unlike traditional media, a web page can change. It's a bit like a book on your bookshelf being silently updated whenever the author chooses to update it. Sometimes you open your favorite book and the text you wanted is gone: there's just the number "404".

Tanenbaum/Steen book

TimBL has some web-ontological bits in log and doc. In particular, look at log:semantics. He seems to view it something like this.

(Mediated Shared Memory) Location

Some URIs are memory addresses. This is intended to be an extension to REST, making explicit the distinction between things in the problem-domain and areas in memory where information is stored about the things in the problem-domain. This distinction may be necessary for some metadata applications.

Clients cannot directly observe or affect the contents of any memory location; instead they must talk to a web server who can serialize the location's contents for them. POST involves handing some new content to that agent in a non-specific way; it might mean: add it somehow to this location, store it in another location (which you assign, which is why we didn't PUT), or just do something with this new knowledge. The "just do something with it" action gives up addressability, and may be bad practice.

Some memory locations are dedicated to information about a particular subject; when this is the case, the address of the location can be used as a kind of indirect identifier for the subject. This allows URIs to also be used to refer to problem-domain objects, but the indirection must be made explicit, such as with a primarySubject property.

When REST talks about "sending a representation" that means both (1) sending a serialization of the contents of the memory location and (2) sending a data structure which stands for the subject. REST makes no distinction between these two senses; MSM does. With the Shared Memory ontology, it's possible for a location to not have a subject.
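The Mediated Shared Memory picture can be sketched roughly as follows. This is only an illustration under assumptions of my own: the class names Location and Server, the method names, and the address scheme used by post are all hypothetical, and post shows just one of the possible readings of POST described above (the server assigns a new location).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Location:
    """A memory location: stored content plus an optional, explicit subject."""
    content: str = ""
    primary_subject: Optional[str] = None  # None: the location has no subject


@dataclass
class Server:
    """The mediating agent; clients are meant to go through get/put/post only."""
    locations: Dict[str, Location] = field(default_factory=dict)
    next_id: int = 0  # hypothetical counter used to assign new addresses

    def get(self, address: str) -> Optional[str]:
        # Serialize the contents of the location for the client.
        loc = self.locations.get(address)
        return None if loc is None else loc.content

    def put(self, address: str, content: str) -> None:
        # The client names the location whose contents are replaced.
        self.locations.setdefault(address, Location()).content = content

    def post(self, address: str, content: str) -> str:
        # One reading of POST: store the content in a location the server assigns.
        self.next_id += 1
        new_address = f"{address}/{self.next_id}"
        self.locations[new_address] = Location(content=content)
        return new_address
```

Keeping primary_subject as a separate field mirrors the point that the indirection from a location to its subject has to be stated explicitly rather than assumed.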

A URI's primary denotation is a cyc:InformationStore, I think.

Obviously this does not apply to all URIs -- we'll call the ones it does apply to "web addresses", maybe.

See How do you POST to a "document"?

View

A URI identifies a presentation of a dynamic selection from a database. You can alter the presentation, selection, and database contents independently.

I love this one.

POST is adding a record to some view. If that view is on the transaction log of the database, then POST becomes capable of delete/update on the database.
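A rough sketch of the View theory, assuming a database that is just a list of records: the View class, its field names, and the sample records in the usage below are hypothetical. The three parts (database contents, selection, presentation) are held separately so that each can be changed without touching the others.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, str]


@dataclass
class View:
    database: List[Record]                  # contents, shared and changeable
    select: Callable[[Record], bool]        # the dynamic selection
    present: Callable[[List[Record]], str]  # the presentation

    def get(self) -> str:
        # Re-run the selection on every request, so database changes show up.
        return self.present([r for r in self.database if self.select(r)])

    def post(self, record: Record) -> None:
        # POST adds a record to the view, here by appending to the database.
        self.database.append(record)
```

For example:

```python
db = [{"name": "a", "kind": "note"}, {"name": "b", "kind": "task"}]
notes = View(db,
             select=lambda r: r["kind"] == "note",
             present=lambda rs: ", ".join(r["name"] for r in rs))
notes.post({"name": "c", "kind": "note"})
print(notes.get())  # a, c
```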

(Partially-Parameterized Delegable) Invocation Point

A URI identifies a callable method on some object, with some parameters already bound. Other parameters may be added.

This seems the least abstract to me, the closest to "what really happens". It works with telnet:, mailto:, even javascript: URIs. It fits HTML form URL-encoding nicely.

You can't tell where the object/method identifier ends and the parameters begin; you just delegate.
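One way to picture an invocation point is as a partially applied function. In the sketch below, the search function and the example URI are hypothetical, and splitting the URI at the query string is an assumption made purely for illustration; as noted above, a real client cannot tell where the method identifier ends and simply delegates the whole URI.

```python
from functools import partial
from urllib.parse import parse_qsl, urlsplit


def search(site: str, q: str = "", page: str = "1") -> str:
    # Hypothetical method, standing in for whatever the server really does.
    return f"searching {site} for {q!r}, page {page}"


def invocation_point(uri: str):
    """Bind the parameters already present in the URI; leave the rest open."""
    parts = urlsplit(uri)
    bound = dict(parse_qsl(parts.query))  # parameters already bound in the URI
    return partial(search, parts.netloc, **bound)
```

Here invocation_point("http://example.org/search?q=uri") returns a callable with q already bound; calling it with page="2" adds a further parameter, much as submitting an HTML form adds fields to a URL.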

(Updateable) Information Source

The URI identifies someone or something (agent or artifact) which can tell you something. You can sometimes also tell it something.

Subject

The URI identifies something you want to know about when you read the web page. The location is where the information (the subject's "state" in OO terminology) is held.

This fails in the case of two web pages about one thing.

I once thought this was REST but now I don't know.

Part 3: Contextual Theories

See Four Uses of a URL for some discussion about why contextual theories are required.

Location-or-Subject

URIs mostly function as either location or subject identifiers. If you just see a URI in the wild, you can't know which it is. Particular contexts can and should tell you. This underlies my Disambiguating RDF Identifiers proposal.
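In contrast to a one-argument identification function, a contextual theory can be sketched as a function of two arguments. The context names, the table entries, and the function name identify_in_context are hypothetical and serve only to illustrate the shape of the mapping.

```python
from typing import Optional

# Hypothetical context labels.
LOCATION = "location"
SUBJECT = "subject"

# Hypothetical table keyed on (URI string, context).
TABLE = {
    ("http://example.org/tim", LOCATION): "the web page at that address",
    ("http://example.org/tim", SUBJECT): "the person the page describes",
}


def identify_in_context(uri: str, context: Optional[str] = None) -> Optional[str]:
    """Two-argument identification: the context selects location or subject."""
    if context is None:
        # A URI seen "in the wild" carries no indication of which is meant.
        return None
    return TABLE.get((uri, context))
```

The same string yields different things under different contexts, and nothing at all when no context is supplied.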

One sign of this being widely used is that in hypertext (e.g., the TAG home page) sometimes the link-text names the thing described ("W3C", "Tim Berners-Lee") and other times the link-text names a document ("issues list", "TAG charter", "IRC log"). In natural language this isn't a hard and fast distinction, and there's no need to indicate which kind of identification is being done. [original context]

Actually, this kind of works with subject -or- (any of the other context-free theories)

Part 4: Other Theories

I have grouped these separately, as second-class citizens, because I don't really understand them. In some cases they are probably the same as some theory above. (Feedback is welcome.)

Representational State Transfer (REST)

I just call it a resource and a representation. URIs identify resources. GET (or its semantic equivalent in non-HTTP protocols) can be used to obtain representations. A web page is a representation at time t, often composed of other representations.

-- Roy Fielding, 2003-01-14

[Descriptions, Ontologies, etc, welcome.]

Fielding Dissertation: CHAPTER 5: Representational State Transfer

REST Wiki

baker-draft

Then we have a mainstream view of REST.

Communications End-Points

Maybe URIs just identify communications end-points, like TCP sockets, but ... different.


This work is being done as part of the MIT/LCS DAML Project under the MIT/AFRL cooperative agreement number F30602-00-2-0593. This work is not on the W3C recommendation track and is not the product of a W3C working group or interest group.

Sandro Hawke
First: 2003/01/15; This: $Date: 2003/05/30 09:58:13 $