Copyright © 2007 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
This document defines a mechanism to selectively provide cross-site access to a web resource. Using either an HTTP header or an XML processing instruction (or both), resources can indicate they allow read access from specified hosts (optionally using patterns). When a pattern is used, one can also exclude certain hosts. For instance, allow read access from all subdomains of example.org (*.example.org) with the exception of public.example.org (public.example.org).
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This is the 18 June 2007 Working Draft of the "Enabling Read Access for Web Resources" document. This document is produced by a Task Force of the Web Application Formats (WAF) Working Group. The WAF Working Group is part of the Rich Web Clients Activity in the W3C Interaction Domain.
Please send comments to the WAF Working Group's public mailing list public-appformats@w3.org with [access-control] at the start of the subject line. Archives of this list are available. See also W3C mailing list and archive usage guidelines.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
The web has a rich set of resources that can be combined to build content, applications and feature-rich web sites. A contributor to this richness is web sites including references (e.g. a link or an image inclusion) to resources residing in other domains.
For security reasons, user agents such as web browsers implement a "same origin policy" that allows a document (e.g. some JavaScript) to read, process, or otherwise interrogate the contents of another resource if and only if the other resource resides in the same domain.
This restriction on "read" access to web resources is very strict and generally appropriate. However, there are scenarios where an application would like to "read" data from another resource on the web without these restrictions and in these scenarios the browser's default "security sandbox" has to be extended or eased. For example, a car reservation web site may want to request trip itinerary data from an affiliated airline reservation website to streamline making a car reservation. The easing of read access restrictions is particularly important to web browsers that implement the XMLHttpRequest object and VoiceXML 2.1 browsers using the data element.
To facilitate clear and controlled read access to resources, this specification defines a read access control mechanism that enables a web resource to permit access to its content from external domains when such access would otherwise be prohibited by a same origin policy.
User agents cannot conform to this specification without also conforming to a specification that uses the access control read policy.
As well as sections marked as non-normative, all diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.
In this specification, the words must, must not, should, should not and may are to be interpreted as described in RFC 2119. [RFC2119]
A conformant specification is one that implements all the requirements (the must and must not statements) listed in this specification that are applicable to specifications.
A conformant user agent is one that implements all the requirements listed in this specification that are applicable to user agents, while also being consistent with the requirements listed in the specifications that use the access control read policy.
User agents may optimize any algorithm given in this specification, so long as the end result is indistinguishable from the result that would be obtained by the specification's algorithms. (The algorithms in this specification are generally written with more concern for clarity than efficiency.)
The term ToASCII algorithm means that the ToASCII algorithm as described in RFC 3490 is applied with both the AllowUnassigned and UseSTD3ASCIIRules flags set. [RFC3490]
There is a case-insensitive match of strings s1 and s2 if after uppercasing both strings (by mapping a-z to A-Z) they are identical.
U+0009, U+000A, U+000D and U+0020 are space characters.
A space-separated list is a string of which the items are separated by one or more space characters (in any order). The string may also be prefixed or suffixed with zero or more of those characters.
To obtain the values from a space-separated list user agents must replace any sequence of space characters (in any order) with a single U+0020 character, dropping any leading or trailing U+0020 character, and then chopping the resulting string at each occurrence of a U+0020 character, dropping that character in the process.
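The list-splitting rule above can be sketched in a few lines of Python. This is a non-normative illustration; the function name split_space_separated is hypothetical, and returning an empty list for an empty or all-space input is an assumption made so that callers can detect that no value was obtained.

```python
import re
from typing import List

# The four space characters named by the specification:
# U+0009, U+000A, U+000D and U+0020.
_SPACE_RUN = re.compile(r"[\t\n\r ]+")


def split_space_separated(value: str) -> List[str]:
    """Split a space-separated list into its items (hypothetical helper)."""
    # Replace every run of space characters with a single U+0020.
    collapsed = _SPACE_RUN.sub(" ", value)
    # Drop any leading or trailing U+0020.
    collapsed = collapsed.strip(" ")
    # Chop at each remaining U+0020; an empty input yields no items.
    return collapsed.split(" ") if collapsed else []
```

For instance, a value containing tabs and line breaks between two access items yields exactly those two items.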
An XML MIME type is text/xml, application/xml or any MIME type ending in +xml.
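As a non-normative illustration, the XML MIME type test could be written as below. The function name is hypothetical. Stripping parameters such as charset and comparing case-insensitively are assumptions of this sketch rather than requirements stated in this specification.

```python
def is_xml_mime_type(content_type: str) -> bool:
    """Return True if the media type is an XML MIME type (hypothetical helper)."""
    # Assumption: ignore parameters (e.g. "; charset=utf-8") and letter case.
    essence = content_type.split(";", 1)[0].strip().lower()
    return essence in ("text/xml", "application/xml") or essence.endswith("+xml")
```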
The mechanism defined in this specification extends the "default browser security sandbox" to allow read access for cross-site resources. The extension opens a constrained hole in the browser's "default sandbox".
A user agent running inside a trusted corporate network and executing untrusted content should enforce a sandboxing policy by denying access (to untrusted content). However, it may be appropriate to relax this policy when the user agent is executing only trusted applications that require access to arbitrary resources on the local network. User agent vendors that allow this sandboxing policy to be configured are encouraged to provide guidance on the appropriate settings. It is critical that network administrators understand the security issues pertinent to their environment and configure their systems appropriately. In tandem, developers and web server administrators are to be aware of the dangers of trusting a user agent that can be configured to disable sandboxing.
User agents which implement this specification should take care not to expose other trusted data (cookies, HTTP header data) inappropriately.
User agents which implement this specification should also take care to properly normalize Unicode and to properly interpret IDNs to prevent URI spoofing attacks as outlined in the specification.
Application authors should be aware that content retrieved from another site is not itself trustable. Authors should take care not to expose themselves to cross-site scripting attacks, which can result from rendering or executing the retrieved content directly without validation.
Specifications using the mechanism defined in this specification need to define when the access control read policy applies to a retrieved resource. For instance, a specification could define that in case of cross-site requests this mechanism is put in place.
The policy described is only safe for HEAD and GET requests. Specifications should not use it for other HTTP methods without specifying extra safety measures. [RFC2616]
An access item is a domain containing a wildcard prefixed by a scheme and must match the following EBNF:
access-item ::= (scheme "://")? domain-pattern (":" port)? | "*"
domain-pattern ::= subdomain | "*." subdomain
scheme and port are used as defined in RFC 3986. subdomain is used as defined in RFC 1034. [RFC3986] [RFC1034]
In addition to matching the above EBNF the ToASCII algorithm must apply successfully (without errors) to each label component of the subdomain (if any) from the access item.
If the port or scheme is omitted a wildcard match is performed on them.
An access item of * matches anything. When * is used as part of domain-pattern it matches any number of label components before the subdomain.
Several examples of conforming access items:
*
*.example.org
https://*.example.org
https://example.org:8443
The following access items would make the user agent deny access to the resource:
https://*.*:80
*://example.org
http://example.org/
http://example.org/example
http://example.org:
http://example.org:*
The following access items are not identical:
http://example.org
http://example.org:80
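The access item grammar can be approximated with a regular expression, as in the following non-normative sketch. The function name is hypothetical. Two simplifications are assumed: the ToASCII check on each label is not performed, and the port is required to contain at least one digit, which is consistent with the example above that rejects http://example.org: with an empty port.

```python
import re

# RFC 1034 label: starts with a letter, ends with a letter or digit,
# with letters, digits and hyphens in between.
_LABEL = r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
_SUBDOMAIN = rf"{_LABEL}(?:\.{_LABEL})*"
# RFC 3986 scheme: a letter followed by letters, digits, "+", "-" or ".".
_SCHEME = r"[A-Za-z][A-Za-z0-9+.-]*"

_ACCESS_ITEM = re.compile(
    rf"(?:{_SCHEME}://)?"   # optional scheme
    rf"(?:\*\.)?"           # optional "*." wildcard prefix
    rf"{_SUBDOMAIN}"
    rf"(?::[0-9]+)?"        # optional port (assumed non-empty)
)


def is_access_item(text: str) -> bool:
    """Check text against the access-item syntax (hypothetical helper)."""
    return text == "*" or _ACCESS_ITEM.fullmatch(text) is not None
```

Under these assumptions, the conforming examples listed above are accepted and the non-conforming ones are rejected.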
Content-Access-Control header
Resources to which the access control read policy applies can have one or more Content-Access-Control headers defined which must match the following EBNF:
Content-Access-Control ::= "Content-Access-Control" ":" LWS? ruleset
ruleset ::= LWS? rule LWS? ("," LWS? rule LWS?)*
rule ::= rule-type (LWS pattern)+ (LWS "exclude" (LWS pattern)+)?
rule-type ::= "allow" | "deny"
pattern ::= "<" access-item ">"
As stated by RFC 2616, multiple Content-Access-Control headers may be combined. LWS is used as defined by RFC 2616. [RFC2616]
In case resources on a domain are not all under the control of a single person, "deny" rules can be used by authors to deny read access from external resources to the entire domain. Read access from other domains is disallowed by default, but individual resources on the domain could have <?access-control?> processing instructions specified which can allow access from other domains. Although individual files may contain such processing instructions, HTTP headers can be set across an entire server, making them far more effective. The "exclude" clause can be used to list exclusions to these "deny" rules.
"allow" rules can be used to allow read access from particular domains as long as those domains don't match any of the patterns listed in "exclude".
Content-Access-Control: allow <*.example.org> exclude <*.public.example.org>
Content-Access-Control: allow <webmaster.public.example.org>
This means that every subdomain of example.org can access the resource, including webmaster.public.example.org, but with the exclusion of all other subdomains of public.example.org.
Content-Access-Control: allow <example.org> <*.example.org>
This means that example.org and all its subdomains can access the resource.
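A header value following the ruleset grammar could be parsed roughly as shown below. This is a non-normative sketch: the function name and the tuple layout are hypothetical, keywords are compared case-sensitively as an assumption, and the access items themselves are not validated here.

```python
import re
from typing import List, Tuple

# (rule-type, match list, exclude list)
ParsedRule = Tuple[str, List[str], List[str]]

_PATTERN = re.compile(r"<([^<>\s]+)>")


def parse_ruleset(value: str) -> List[ParsedRule]:
    """Parse a Content-Access-Control header value (hypothetical helper).

    Raises ValueError if the value does not follow the ruleset grammar.
    """
    rules: List[ParsedRule] = []
    for raw_rule in value.split(","):
        tokens = raw_rule.split()  # splits on any run of whitespace
        if not tokens or tokens[0] not in ("allow", "deny"):
            raise ValueError(f"bad rule: {raw_rule!r}")
        match_list: List[str] = []
        exclude_list: List[str] = []
        in_exclude = False
        for token in tokens[1:]:
            # "exclude" is only valid once, after at least one pattern.
            if token == "exclude" and not in_exclude and match_list:
                in_exclude = True
                continue
            found = _PATTERN.fullmatch(token)
            if found is None:
                raise ValueError(f"bad pattern: {token!r}")
            (exclude_list if in_exclude else match_list).append(found.group(1))
        if not match_list or (in_exclude and not exclude_list):
            raise ValueError(f"missing pattern in rule: {raw_rule!r}")
        rules.append((tokens[0], match_list, exclude_list))
    return rules
```

Applied to the first example header above, the sketch yields one "allow" rule whose match list holds *.example.org and whose exclude list holds *.public.example.org.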
<?access-control?> processing instruction
XML resources may include an <?access-control?> processing instruction within the XML Prolog to indicate, in cases where the access control read policy applies, from which domains they can be fetched. [XML]
The processing instruction takes three pseudo-attributes which each take a space-separated list of access items. These pseudo-attributes are allow, deny and exclude. Either the allow or deny pseudo-attribute must be specified. allow and deny must not be specified at the same time. If an attribute is specified it must contain at least one access item.
An <?access-control?> processing instruction that is part of the XML Prolog must be parsed using the same syntax rules as described in the XML Stylesheet PI specification. <?access-control?> processing instructions outside the XML Prolog are ignored. [XMLSSPI]
The above means that the following examples would be non-conforming and would make the user agent deny access to the resource:
<?access-control?>
<?access-control x?>
<?access-control x=""?>
<?access-control allow=""?>
<?access-control allow="http://example.org" x=""?>
<?access-control allow="allow.example.org" deny="deny.example.org"?>
When a resource is requested to which the access control read policy is said to apply the user agent must then associate the following with that resource:
An unordered, initially empty, HTTP access control allow list of which each list item contains a match list and an exclude list.
An unordered, initially empty, HTTP access control deny list of which each list item contains a match list and an exclude list.
An unordered, initially empty, PI access control allow list of which each list item contains a match list and an exclude list.
An unordered, initially empty, PI access control deny list of which each list item contains a match list and an exclude list.
An allow access flag which is used in the algorithms to determine at certain points whether access will be granted. The flag has two values: "true" and "false". Its initial value is "false".
The match lists and exclude lists are unordered lists of access items. The match lists are guaranteed to be non-empty and the exclude lists can be empty.
After associating the aforementioned lists and when all HTTP headers have been received the user agent must run the following algorithm (unless stated otherwise):
Parse the Content-Access-Control headers. If any value does not conform to the required syntax, deny access to the resource and terminate the algorithm. If parsed successfully then for each rule run the following steps:
If rule-type is "allow" append a new list item to the HTTP access control allow list where the match list is constructed of each access item following "allow" and the exclude list of each access item following "exclude". If "exclude" is not present the exclude list will be empty.
If rule-type is "deny" append a new list item to the HTTP access control deny list where the match list is constructed of each access item following "deny" and the exclude list of each access item following "exclude". If "exclude" is not present the exclude list will be empty.
Then run the following steps for each list item (if any) in the HTTP access control deny list:
If there is no match for any access item from the match list against the requesting URI process the next list item. If there is no next list item go to the next step in the overall set of steps.
If the exclude list is non-empty and there is a match for any access item from the exclude list against the requesting URI process the next list item. If there is no next list item go to the next step in the overall set of steps.
Deny access to the resource and terminate the overall algorithm.
Run the following steps for each list item (if any) in the HTTP access control allow list:
If there is no match for any access item from the match list against the requesting URI process the next list item. If there is no next list item go to the next step in the overall set of steps.
If the exclude list is non-empty and there is a match for any access item from the exclude list against the requesting URI process the next list item. If there is no next list item go to the next step in the overall set of steps.
Set the allow access flag to "true" and go to the next step in the overall set of steps.
If the requested resource has an XML MIME type go to the next step. Otherwise, if the allow access flag is "false" deny access to the resource and terminate the overall algorithm. If the allow access flag is "true" user agents should grant access to the resource and must terminate the overall algorithm.
Parse the resource as an XML document using a streaming XML parser following the rules set forth in the XML specification up to and including the root element start tag. Then process the encountered <?access-control?> processing instructions (if any).
If there is either an XML parse error or failure to parse the processing instructions deny access to the resource and terminate the overall algorithm. Otherwise, run the following steps for each <?access-control?> processing instruction:
If the processing instruction has any other pseudo-attributes than deny, allow and exclude, has more than two pseudo-attributes or has both deny and allow specified terminate the overall algorithm and deny access to the resource.
Let temp match list be the result of parsing the allow or deny pseudo-attribute value, whichever is present. If any obtained value does not match the access item syntax or if no value was obtained terminate the overall algorithm and deny access to the resource.
If there is an exclude pseudo-attribute let temp exclude list be the result of parsing the exclude pseudo-attribute value. If any obtained value does not match the access item syntax or if no value was obtained terminate the overall algorithm and deny access to the resource. If there is no such pseudo-attribute let temp exclude list be empty.
If there is an allow pseudo-attribute append a new list item to the PI access control allow list where the match list is temp match list and the exclude list is temp exclude list.
Otherwise, there is a deny pseudo-attribute. Append a new list item to the PI access control deny list where the match list is temp match list and the exclude list is temp exclude list.
Then run the following steps for each list item (if any) in the PI access control deny list:
If there is no match for any access item from the match list against the requesting URI process the next list item. If there is no next list item go to the next step in the overall set of steps.
If the exclude list is non-empty and there is a match for any access item from the exclude list against the requesting URI process the next list item. If there is no next list item go to the next step in the overall set of steps.
Deny access to the resource and terminate the overall algorithm.
Then run the following steps for each list item (if any) in the PI access control allow list:
If there is no match for any access item from the match list against the requesting URI process the next list item. If there is no next list item go to the next step in the overall set of steps.
If the exclude list is non-empty and there is a match for any access item from the exclude list against the requesting URI process the next list item. If there is no next list item go to the next step in the overall set of steps.
Set the allow access flag to "true" and go to the next step in the overall set of steps.
If the allow access flag is "false" deny access to the resource. If the allow access flag is "true" user agents should grant access to the resource.
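The deny-then-allow evaluation used for both the HTTP lists and the PI lists can be sketched as follows. This is a non-normative illustration that assumes the lists have already been parsed and validated. All names are hypothetical, and the matches argument stands in for the requesting URI matching algorithm defined in this specification. Returning True here means access may be granted, which the specification phrases as "should grant".

```python
from typing import Callable, List, Tuple

Rule = Tuple[List[str], List[str]]    # (match list, exclude list)
Matcher = Callable[[str, str], bool]  # (requesting URI, access item) -> match?


def rule_applies(rule: Rule, origin: str, matches: Matcher) -> bool:
    """A list item applies if its match list matches and its exclude list does not."""
    match_list, exclude_list = rule
    if not any(matches(origin, item) for item in match_list):
        return False
    return not any(matches(origin, item) for item in exclude_list)


def run_lists(deny_list: List[Rule], allow_list: List[Rule], origin: str,
              matches: Matcher, allow_flag: bool = False) -> Tuple[bool, bool]:
    """One deny/allow pass. Returns (denied, allow access flag)."""
    if any(rule_applies(rule, origin, matches) for rule in deny_list):
        return True, allow_flag
    if any(rule_applies(rule, origin, matches) for rule in allow_list):
        allow_flag = True
    return False, allow_flag


def decide(http_deny: List[Rule], http_allow: List[Rule],
           pi_deny: List[Rule], pi_allow: List[Rule],
           origin: str, matches: Matcher, is_xml: bool) -> bool:
    """Overall decision: HTTP lists first, then PI lists for XML resources."""
    denied, flag = run_lists(http_deny, http_allow, origin, matches)
    if denied:
        return False
    if not is_xml:
        return flag
    # The allow access flag set by the HTTP lists carries over to the PI pass.
    denied, flag = run_lists(pi_deny, pi_allow, origin, matches, flag)
    return (not denied) and flag
```

Note that in this flow a matching deny rule in either pass ends the algorithm, so an HTTP "deny" cannot be overridden by a processing instruction.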
The requesting URI is the scheme followed by ://, followed by the domain without any trailing U+002E (.) (if any), followed by :, followed by the port (defaulting to the default port for the scheme) of the resource from which the request originated. If the resource does not have a host-based authority (data: URI scheme for instance) the requesting URI is "null".
Define the above in terms of "origin"? See HTML5...
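A non-normative sketch of deriving the requesting URI from the address of the requesting resource, using Python's standard urllib.parse module. The function name and the DEFAULT_PORTS table are hypothetical. Covering only http and https default ports, and raising an error for other schemes without an explicit port, are assumptions of this sketch.

```python
from urllib.parse import urlsplit

# Assumption: only these schemes have a known default port in this sketch.
DEFAULT_PORTS = {"http": 80, "https": 443}


def requesting_uri(url: str) -> str:
    """Build the requesting URI string for a resource URL (hypothetical helper)."""
    parts = urlsplit(url)
    host = parts.hostname  # None when there is no host-based authority
    if not host:
        return "null"
    if host.endswith("."):
        host = host[:-1]  # drop a trailing U+002E
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    if port is None:
        raise ValueError(f"no default port known for scheme {parts.scheme!r}")
    return f"{parts.scheme}://{host}:{port}"
```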
To determine whether a requesting URI and an access item match user agents must run the following algorithm:
Let origin be the requesting URI and item be the access item.
If item is a single U+002A (*) there is a match. Terminate this algorithm.
If origin is "null" there is no match. Terminate this algorithm.
If item has a scheme and it does not case-insensitively match the scheme from origin there is no match. Terminate this algorithm.
If either item or origin has a scheme remove it including the :// sequence following it.
If item has a port and it does not match the port from origin there is no match. Terminate this algorithm.
If either item or origin has a port remove it including the U+003A (:) preceding it.
Let origin list be origin split on the U+002E (.) character (dropping that character in the process) and item list be item split on the U+002E (.) character (dropping that character in the process). Ensure that the order is preserved.
Reverse the order of origin list and item list.
Now process the first list item of both origin list and item list using the following steps:
Let the item from origin list be origin label and the item from item list be item label.
If item label is a single U+002A (*) character move to the next step in the overall set of steps.
Apply the ToASCII algorithm to origin label and item label and store the result in those variables respectively.
If origin label does not case-insensitively match item label there is no match (terminate the overall algorithm).
Otherwise, apply this set of steps to the next list item of both origin list and item list. If only one of them has no next list item there is no match (terminate the overall algorithm). If both no longer have a next list item go to the next step in the overall set of steps.
There is a match. Terminate this algorithm.
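The matching steps above can be sketched as follows. This is a non-normative illustration with hypothetical names. It assumes origin is either "null" or a string of the form scheme://domain:port, and that item has already been validated as an access item. Python's built-in "idna" codec is used as an approximation of the ToASCII algorithm; it does not expose the AllowUnassigned and UseSTD3ASCIIRules flags, so its results may differ from the required behaviour in edge cases.

```python
def _to_ascii(label: str) -> str:
    # Approximation of ToASCII using Python's built-in IDNA codec.
    return label.encode("idna").decode("ascii")


def matches(origin: str, item: str) -> bool:
    """Does the requesting URI match the access item? (hypothetical helper)"""
    if item == "*":
        return True
    if origin == "null":
        return False

    # Compare and strip the scheme.
    origin_scheme, origin_rest = origin.split("://", 1)
    if "://" in item:
        item_scheme, item = item.split("://", 1)
        if item_scheme.upper() != origin_scheme.upper():
            return False

    # Compare and strip the port.
    origin_host, origin_port = origin_rest.rsplit(":", 1)
    if ":" in item:
        item, item_port = item.rsplit(":", 1)
        if item_port != origin_port:
            return False

    # Walk the labels from the right-most one.
    origin_labels = origin_host.split(".")[::-1]
    item_labels = item.split(".")[::-1]
    for index, item_label in enumerate(item_labels):
        if index >= len(origin_labels):
            return False  # origin ran out of labels first
        if item_label == "*":
            return True   # wildcard covers all remaining origin labels
        if _to_ascii(origin_labels[index]).upper() != _to_ascii(item_label).upper():
            return False
    # All item labels matched; origin must not have labels left over.
    return len(origin_labels) == len(item_labels)
```

In this sketch *.example.org matches a.example.org and a.b.example.org but not example.org itself, which is why the earlier header example lists both example.org and *.example.org.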
The editor would like to thank the following people for their contributions to this specification (ordered by first name):
Special thanks to Brad Porter, Matt Oshry and R. Auburn who helped editing earlier versions of this document.