Test Development FAQ

The Test Development FAQ is addressed to those who develop tests or organize testing efforts. It should also be useful to those who develop specifications or who run tests.

About this document

The FAQ provides introductory information about the purpose of testing, how to get started, and what the testing process involves. This FAQ primarily documents what is already considered good testing practice or the norm, but it also includes a number of advanced testing goals that have not yet been fully achieved by any Working Group.

This is a living document that is updated periodically, particularly in response to feedback from readers. You can provide feedback by emailing www-qa@w3.org (a publicly archived mailing list).

Last edited $Date: 2010/01/29 13:55:50 $

  1. Are test suites required by the W3C Process?
  2. Why is testing important?
  3. When should test development start?
  4. Who will develop the tests?
  5. Can we re-use tests developed by another Working Group?
  6. How do we decide what tests to develop?
  7. What should we do with test contributions we receive?
  8. What makes a good test?
  9. How many tests are enough?
  10. How should tests report their outcome?
  11. Do I really have to worry about all that legal stuff?
  12. How should I package and publish my tests?
  13. What about documentation?
  14. Should I automate test execution?
  15. Once I publish my tests, I'm done, right?
  16. How should I handle bugs in my test suite?
  17. Should test results be published?
  18. Should we implement a branding or certification program?

1. Are Test Suites required by the W3C Process?

As part of the transition from Candidate Recommendation to Proposed Recommendation, the W3C Process Document requires that the Working Group demonstrate that:

[...] each feature of the technical report has been implemented. Preferably, the Working Group SHOULD be able to demonstrate two interoperable implementations of each feature.

In most cases, the most practical way to demonstrate both that all the features were implemented and that they are implemented in an interoperable fashion is to show that there are test cases that cover most of the features of the specification, and that for each of these test cases, there are at least two implementations that pass it.

So, while the Process Document leaves some leeway (which is useful since not all specifications can make use of a test suite), if a Working Group is developing a technology that can be tested in a sensible fashion, the W3C Director is likely to require a test suite before allowing the specification to move to Proposed Recommendation.
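As a minimal sketch of the "at least two passing implementations per test case" check described above, the following Python snippet groups results by test. The implementation names, test identifiers, and the "pass"/"fail" strings are illustrative assumptions, not part of the W3C Process.

```python
from collections import defaultdict

def interoperable_tests(results):
    """Split test ids into those passed by >= 2 implementations and the rest.

    results: iterable of (implementation, test_id, outcome) tuples,
    where outcome is the string "pass" or "fail" (an assumed convention).
    """
    passes = defaultdict(set)
    all_tests = set()
    for implementation, test_id, outcome in results:
        all_tests.add(test_id)
        if outcome == "pass":
            passes[test_id].add(implementation)
    demonstrated = {t for t in all_tests if len(passes[t]) >= 2}
    return sorted(demonstrated), sorted(all_tests - demonstrated)

# Hypothetical results from two implementations
results = [
    ("impl-a", "t001", "pass"),
    ("impl-b", "t001", "pass"),
    ("impl-a", "t002", "pass"),
    ("impl-b", "t002", "fail"),
]
demonstrated, lacking = interoperable_tests(results)
print(demonstrated)  # ['t001']
print(lacking)       # ['t002']
```

Tests in the second list point to features for which interoperability has not yet been demonstrated.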

2. Why is testing important?

As the About W3C document explains:

In order for the Web to reach its full potential, the most fundamental Web technologies must be compatible with one another and allow any hardware and software used to access the Web to work together. W3C refers to this goal as “Web interoperability.” By publishing open (non-proprietary) standards for Web languages and protocols, W3C seeks to avoid market fragmentation and thus Web fragmentation.

Two implementations of a technology are said to be compatible if they both conform to the same specifications. Conformance to specifications is a necessary condition for interoperability, but it is not sufficient; the specifications must also promote interoperability (by clearly defining behaviors and protocols, for example).

In order to promote these goals, the W3C Process Document's Proposed Recommendation entrance criteria include the requirement to demonstrate two interoperable implementations of each feature in the specification (see Question 1 for how this relates to testing).

Two types of testing are particularly helpful:

Note that both forms of testing help to detect defects (ambiguities, lack of clarity, omissions, contradictions) in specifications and are therefore useful when conducted while the specification is being developed.

Because testing is the key to interoperability, Working Groups are increasingly interested in this subject.

This FAQ focuses primarily on conformance testing (a key to interoperability) although some of its recommendations are also applicable to other kinds of testing. (See the Software QA and Testing FAQ for much useful information, including a comprehensive classification of different types of testing.)

3. When should test development start?

Test planning should start very early; ideally at the same time as you start working on the specification. Defining a testing approach (what kinds of tests to develop and how they should operate) and thinking about testability are helpful even in the early stages of specification development.

During the planning phase, identify all the specifications to be tested. This may seem obvious but often specifications refer to or depend on other specifications. It is also important to understand and to limit the scope of what is to be tested; so, focus on what really needs testing and not on related or dependent technologies being utilized indirectly by implementations.

Typically, Working Groups develop their test suites when the specifications have reached a reasonable level of stability. However, it is important to start the test development process before the specification is frozen since this helps to identify problems (ambiguities, lack of clarity, omissions, contradictions) while there is still time to correct them. 

Another interesting approach, often referred to as Test-Driven Development, is to develop tests specifically to explore issues and problems in the specification. (The OWL Working Group found this approach helpful.) Note that this implies significantly more work, as you will need to keep the specification and the tests synchronized.

4. Who will develop the tests?

Most likely, it will be the members of your Working Group who contribute resources for test development. However, it is also worthwhile to approach third parties and ask if they are interested in developing tests. (For example, organizations that do not participate directly in your activities may want to contribute to your testing efforts if they have an interest in the effective deployment of the tested technology.)

Whichever approach you take, you will need to solicit and to manage contributions from others. This can require a considerable amount of organization and effort, particularly if you want to provide high-quality tests covering the full range of the specification. So, do take the time to create an informative and persuasive appeal for contributions.

Specify the format for developing the tests (including how tests are invoked and how they report their outcome) and any metadata to be supplied with the tests (including a description of the purpose of the test, a pointer to the portion of the specification that is tested, and the test's expected results).

For examples of guidelines, see

See also Test Suite Principles in the HTML4 Test Suite Documentation, which you may find instructive and useful.

Likewise, the Method for Writing Testable Conformance Requirements can be a useful approach to integrate testability within the specification itself.

Providing guidelines like these to your test developers will make it more likely that you will receive quality submissions. Obviously, the clearer your guidelines, the easier it will be for people to develop tests, and the greater the likelihood of tests being developed correctly and effectively.
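The metadata mentioned above (purpose, pointer to the specification, expected results) could be captured in a small record that is checked on submission. This is only a sketch: the class name, field names, and the example URL are hypothetical and not taken from any Working Group's guidelines.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CaseMetadata:
    """Metadata a contributor supplies with each test (hypothetical fields)."""
    test_id: str
    purpose: str          # what the test is meant to check
    spec_reference: str   # pointer to the portion of the specification tested
    expected_result: str  # what a conforming implementation should produce
    contributor: str = ""

    def missing_fields(self):
        """Return the names of required fields that were left empty."""
        required = ("test_id", "purpose", "spec_reference", "expected_result")
        return [name for name in required if not getattr(self, name).strip()]

meta = CaseMetadata(
    test_id="example-001",
    purpose="Checks that an empty attribute value is accepted",
    spec_reference="https://example.org/spec#sec-4.2",  # placeholder URL
    expected_result="",
)
print(meta.missing_fields())  # ['expected_result']
```

Rejecting submissions with missing fields early keeps incomplete tests out of the review queue.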

5. Can we re-use tests developed by another Working Group?

As the family of XML languages evolves, there is an increasing tendency to develop modular specifications (specifications that are intended to be re-used in a variety of technologies). For example, XSLT and XForms both use XPath as their expression language. This trend presents the opportunity (and also the challenge) of a more modular approach to test development. If your specification incorporates such a language module, you may be able to incorporate into your own test suite tests that were developed by the Working Group that defined that module.

Also, do consider this trend and plan for it if you are developing a specification that you already know will be re-used. The guidelines and practices outlined in this FAQ are likely to prove even more important when tests being developed are intended for incorporation into more than one test suite.

For a brief discussion of some of the issues involved in test re-use, see this presentation from the W3C's 2005 Technical Plenary.

6. How do we decide what tests to develop?

It is best to focus development efforts where they will be most effective and useful. Namely, where:

Do be proactive and guide test developers to give priority to testing the areas of the specification where coverage is most needed. Note that this implies the creation and maintenance of some kind of coverage map (more on this topic under Question 9). Also, proactive guidance will help to avoid duplication of effort.

If you do not guide test developers, you may receive tests for the areas of the specification that are most easily tested, but where the value of such tests is minimal (perhaps because implementers are more likely to test these areas themselves and to find and correct any problems).

Test development is of course an iterative process. As the CSS Test Suite Principles point out, "[...] experience with existing implementations is a great help. As implementations progress, new areas worthy of being tested will come to light, and the test suite should be updated regularly to track these developments."

7. What should we do with test contributions we receive?

The more successful you are in soliciting contributions the more important it is to create a process for managing them. All submissions should be reviewed to ensure that they are appropriate, correct, and of satisfactory quality. Keep track of who submitted each test and of the state that each test is in (for example, submitted, accepted, reviewed, returned for revision, or rejected).

For examples of test review guidelines see:

For a list of test review statuses, see the Web Content Accessibility Guidelines (WCAG) Working Group's HTML Test Suite Status.

Several Working Groups (for example, Voice Browser and XForms) have created test-case management systems to help with these tasks; you may want to adopt a similar approach.
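Tracking the state of each submission can be sketched as a small state machine. The state names come from the list above; the allowed transitions, class names, and contributor name are assumptions made for illustration and do not describe any Working Group's actual system.

```python
from enum import Enum

class Status(Enum):
    SUBMITTED = "submitted"
    REVIEWED = "reviewed"
    ACCEPTED = "accepted"
    RETURNED = "returned for revision"
    REJECTED = "rejected"

# Assumed workflow: which states may follow which
ALLOWED = {
    Status.SUBMITTED: {Status.REVIEWED},
    Status.REVIEWED: {Status.ACCEPTED, Status.RETURNED, Status.REJECTED},
    Status.RETURNED: {Status.SUBMITTED},
    Status.ACCEPTED: set(),
    Status.REJECTED: set(),
}

class Submission:
    """One contributed test, with its contributor and review history."""

    def __init__(self, test_id, contributor):
        self.test_id = test_id
        self.contributor = contributor
        self.status = Status.SUBMITTED
        self.history = [Status.SUBMITTED]

    def move_to(self, new_status):
        """Change state, refusing transitions the workflow does not allow."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(
                f"cannot go from {self.status.value} to {new_status.value}"
            )
        self.status = new_status
        self.history.append(new_status)

s = Submission("example-001", "alice")
s.move_to(Status.REVIEWED)
s.move_to(Status.ACCEPTED)
print(s.status.value)  # accepted
```

Keeping the history alongside the current state makes it possible to report how long tests spend in review.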

8. What makes a good test?

A good test is:

For a more detailed discussion of these and other test design principles see the HTML4 Test Suite Documentation.

9. How many tests are enough?

There is no simple answer to this question; it depends on your goals and on the available resources. What is most important is that you get the optimal coverage for the goals you have set with the resources you can apply.

Coverage measurement involves partitioning the specification in some manner and then measuring or estimating the extent to which each area of the specification has been tested. There are many ways to partition a specification: by feature, language elements, logical sections, testable assertions, or even paragraphs or pages. Once you determine how to partition the specification, you can define your coverage goals.

Coverage measurements can be expressed in both breadth and depth terms. Breadth measurements are relatively simple to derive, since they are expressed in terms of the ratio of tests to specification features. (For example, the percentage of language elements or testable assertions that are covered by at least one test.) Depth measurements are more subjective, since they require you to estimate how thoroughly each feature is tested. (For example, you might calculate or estimate that where features are covered by tests, the tests exercise approximately 30% of the functionality that could be tested.)
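A breadth measurement of the kind just described (the share of testable assertions covered by at least one test) can be sketched as follows. The assertion and test identifiers are made up for the example.

```python
def breadth_coverage(assertions, tests):
    """Return (ratio, uncovered) for a set of testable assertions.

    assertions: iterable of assertion ids from the specification
    tests: dict mapping test id -> set of assertion ids the test exercises
    """
    assertions = set(assertions)
    covered = set()
    for targets in tests.values():
        covered |= assertions & set(targets)
    uncovered = sorted(assertions - covered)
    ratio = len(covered) / len(assertions) if assertions else 0.0
    return ratio, uncovered

# Hypothetical data: four assertions, three tests
assertions = ["A1", "A2", "A3", "A4"]
tests = {"t001": {"A1"}, "t002": {"A1", "A2"}, "t003": {"A4"}}
ratio, uncovered = breadth_coverage(assertions, tests)
print(f"{ratio:.0%}")  # 75%
print(uncovered)       # ['A3']
```

The list of uncovered assertions is often more useful than the percentage, since it tells test developers where to work next.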

Note that it may be appropriate to define different coverage goals for different areas of the specification. In some areas, a breadth-first strategy (covering each feature at a minimal level, perhaps with only a single test, before testing any particular area more thoroughly) might be most appropriate; in other areas, it might be better to focus on features that are more difficult to implement, or on parts of the specification where incomplete or incompatible implementations are more likely.

Of course, if you do not measure your coverage, you cannot determine whether you allocated your resources effectively. Whether or not you define coverage goals in advance, it is always helpful to provide some kind of coverage report with your test suite. This report could simply be a mapping of tests to areas of the specification, or it could be more detailed and also provide counts and averages of the number of tests associated with different areas. Such reports can help the users of your test suite understand its strengths and weaknesses.
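A simple coverage report with counts and an average, as suggested above, might look like this sketch. The area names and the mapping of tests to areas are hypothetical.

```python
from collections import Counter

def coverage_report(areas, test_areas):
    """Count tests per specification area and compute the average.

    areas: ordered list of area names in the specification
    test_areas: dict mapping test id -> the area the test belongs to
    Tests mapped to an area not listed in `areas` are ignored.
    """
    counts = Counter(test_areas.values())
    rows = [(area, counts.get(area, 0)) for area in areas]
    average = sum(n for _, n in rows) / len(rows) if rows else 0.0
    return rows, average

areas = ["syntax", "processing", "errors"]
test_areas = {"t001": "syntax", "t002": "syntax", "t003": "errors"}
rows, average = coverage_report(areas, test_areas)
for area, count in rows:
    print(f"{area:12} {count}")
print(f"average tests per area: {average:.1f}")  # 1.0
```

Areas with a count of zero (here, "processing") show users of the test suite where it is weakest.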

Several Working Groups do provide such mappings—note that it is but a small step from this to an actual breadth metric. For example, the XForms Test Suite does not directly publish coverage numbers, but indicates that it has covered all the test assertions defined in the specification; the HTML 4.01 Test Suite shows which assertions have a matching test case; and the WCAG Working Group's Test Suite for WCAG 2.0 Sorted by Guideline indicates which areas of the Guidelines are covered by tests.

10. How should tests report their outcome?

All tests in the test suite should report their outcome in a consistent manner, making it easy for humans or computer programs to understand and to process them. The following test states, defined by EARL (the Evaluation And Report Language), have proved useful:

The more information (within reason) reported by failing tests, the more useful the tests. If your users know that one test out of one thousand fails, but they don't know what it was testing or why it failed, that is not very helpful. If they know what the test was testing, what behavior it was expecting from the implementation under test, and how the implementation failed to conform to these expectations, this makes it easier for users to find and fix the problem. The more useful your test suite is, the more it will be used.
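The following sketch shows one way a result record could carry the extra information a failing test should report. The outcome names are borrowed from the EARL 1.0 Schema vocabulary (passed, failed, cantTell, inapplicable, untested), which may differ from the EARL draft current when this FAQ was written; the record fields and report layout are assumptions.

```python
from dataclasses import dataclass
from enum import Enum

class Outcome(Enum):
    PASSED = "passed"
    FAILED = "failed"
    CANT_TELL = "cantTell"
    INAPPLICABLE = "inapplicable"
    UNTESTED = "untested"

@dataclass
class Result:
    """Outcome of one test run (hypothetical record layout)."""
    test_id: str
    outcome: Outcome
    purpose: str = ""   # what the test was testing
    expected: str = ""  # behavior expected from the implementation
    actual: str = ""    # what the implementation actually did

    def report(self):
        """One-line report; failures include purpose, expected, and actual."""
        line = f"{self.test_id}: {self.outcome.value}"
        if self.outcome is Outcome.FAILED:
            line += (
                f" | testing: {self.purpose}"
                f" | expected: {self.expected}"
                f" | got: {self.actual}"
            )
        return line

failing = Result(
    "example-002",
    Outcome.FAILED,
    purpose="default value of an omitted attribute",
    expected="left",
    actual="center",
)
print(Result("example-001", Outcome.PASSED).report())
print(failing.report())
```

Because every test reports through the same record, both people and programs can process the results consistently.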

See, for example, the CSS1 Test Suite, which describes clearly the output that can be expected from each test. The HTML 4 Test Suite uses a similar approach (which was in fact derived from that used in CSS 1).

Another useful approach is to provide reference files against which the actual test output can be compared. See the MathML 2.0 and SVG 1.1 test suites for examples of this technique.
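For tests whose output is textual, comparison against a reference file can be sketched with Python's standard difflib module. This assumes line-oriented text output and ignores trailing whitespace; test suites that compare rendered images need a different comparison.

```python
import difflib

def compare_to_reference(actual_text, reference_text):
    """Return (matches, diff_lines), ignoring trailing whitespace per line."""
    actual = [line.rstrip() for line in actual_text.splitlines()]
    reference = [line.rstrip() for line in reference_text.splitlines()]
    diff = list(
        difflib.unified_diff(reference, actual, "reference", "actual", lineterm="")
    )
    return (not diff), diff

# Hypothetical outputs: the second line differs
ok, diff = compare_to_reference("line one\nline 2\n", "line one\nline two\n")
print(ok)  # False
for line in diff:
    print(line)
```

Reporting the diff along with the verdict tells the tester how the output deviates from the reference, not merely that it does.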

11. Do I really have to worry about all that legal stuff?

Unfortunately, yes. Copyright, patent, and license issues can upset or delay the best-organized test development efforts (and have done so for several Working Groups). Your test suite will need to be distributed under a W3C-approved license. In addition to the two traditional W3C licenses (the Document License and the Software License), W3C has set up a dedicated double-licensing system for test suites.

This also means that contributions to the test suite will have to be provided under contribution licenses that do not contradict or inhibit the distribution license. See these Policies for Contribution of Test Cases to W3C and note the importance of the W3C's Patent Policy. The QA Handbook contains a brief discussion of the legal issues associated with test development and points out the serious consequences (such as delays in publication) that can result from neglecting these matters.

It is best to specify in your submission guidelines the licensing terms under which contributions will be distributed (see the DOM Conformance Test Suites Process Document for an example of how to do this).

With test suites designed and produced primarily by the Working Group member companies and organizations, it is crucial to ensure that licensing issues are reviewed by legal experts before actual test suite development begins, or at the very least in parallel with it. Otherwise, the release of your test suite may be delayed, possibly by several months, while licensing issues are being resolved.

12. How should I package and publish my tests?

While tests are useful, a test suite is much more useful. What's the difference?

Test runs should be deterministic (that is, for a particular implementation on a particular configuration different testers should obtain the same results). If you simply publish a random collection of tests (such as just a directory containing lots of files), it will be difficult for testers to identify or understand:

For these reasons, it is best to package the tests into a test suite and to explain how to determine which tests to run, how to run the tests, and how to interpret the results. A complete test suite will contain, in addition to tests, some or all of the following:

For example, the HTML4 Test Suite is provided as a complete package containing tests, test harness, and documentation.

13. What about documentation?

A complete test suite contains high-quality documentation that describes the following:

For example, see the CSS1 Test Suite, which provides clear instructions for setting up and running the tests, or the HTML4 Test Suite, which also provides information about the areas of the specification covered by the tests, together with a list of testable assertions. (Note that both of these test suites embed much of their documentation directly within the test suite harness, making it easily and immediately accessible to test suite users.)

Because of the way the W3C does its work, the people who execute a test suite are often the same as those who contributed to its development. Some Working Groups have therefore chosen to create a single document containing guidelines for test development and also instructions for test execution. Since this is potentially confusing for people who play only one of these roles, it is preferable to provide this information in two separate documents.

14. Should I automate test execution?

If at all possible, yes. Automated test runs are less prone to operator errors and more likely to be deterministic (that is, report the same results when run on similar configurations at different times). Automation is relatively easy when the browser can be used as the "driving force" (for examples of this approach, see the SVG, CSS1, and HTML4 test suites; see also the test harness for the mobile web).

If automation is impractical because it would require the construction of a test harness or framework code that runs on a variety of different platforms, you should at least provide sufficient metadata and documentation to enable others to construct a test harness or framework. See the discussion of Test Case Metadata in the W3C Wiki and note also that the XQuery Working Group's Test Task Force is defining XML-based test metadata. (A similar approach was used successfully in the DOM Conformance Test Suites, which provide both Java and ECMAScript bindings that were derived from language-neutral XML test descriptions.)
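To illustrate how language-neutral XML metadata lets others build their own harness, here is a sketch that reads test descriptions with Python's standard xml.etree.ElementTree module. The element and attribute names (suite, case, purpose, input, expected) are invented for this example and do not follow the XQuery or DOM test suite formats.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata file content
SAMPLE = """
<suite version="1.0">
  <case id="example-001">
    <purpose>Element content is parsed</purpose>
    <input>example-001.xml</input>
    <expected>example-001.out</expected>
  </case>
</suite>
"""

def read_test_descriptions(xml_text):
    """Return one dict per <case> element in the metadata document."""
    root = ET.fromstring(xml_text.strip())
    return [
        {
            "id": case.get("id"),
            "purpose": case.findtext("purpose", default=""),
            "input": case.findtext("input", default=""),
            "expected": case.findtext("expected", default=""),
        }
        for case in root.iter("case")
    ]

for description in read_test_descriptions(SAMPLE):
    print(description["id"], "->", description["expected"])
```

A harness on any platform can then iterate over these descriptions, run each input, and compare the result with the expected output.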

Some tests, such as those requiring human visual confirmation, are inherently difficult or impossible to automate completely. In these circumstances the process of running the tests should still be routinized as much as possible (for example, by providing a standard set of prompts for the tester to respond to, together with clear descriptions of what to expect and how to judge whether the implementation is correct). See the MUTAT developer information tool for one approach to this issue. For practical examples of such test suites, see the WCAG 2.0 Test Suite, as well as the SVG, CSS1, and HTML4 test suites referenced above.
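A standard set of prompts for manual tests could be as simple as the sketch below. The prompt wording, the answer keys, and the dictionary layout of a test are assumptions; the question function is passed in so that the same code can be driven by a person (the default, input) or by a script.

```python
def run_manual_test(test, ask=input):
    """Show instructions and expected result, then record the tester's verdict.

    test: dict with "id", "instructions", and "expected" keys (assumed layout)
    ask: function that takes a prompt and returns the tester's answer
    """
    print(f"Test {test['id']}: {test['instructions']}")
    print(f"Expected: {test['expected']}")
    answer = ask("Does the output match? [y/n/?] ").strip().lower()
    # Any answer other than y or n is recorded as "cannot tell"
    return {"y": "pass", "n": "fail"}.get(answer, "cannot tell")

manual_test = {
    "id": "example-003",
    "instructions": "Open example-003.html in the implementation under test.",
    "expected": "A green square with no red visible.",
}
# Scripted answer for demonstration; omit `ask` to prompt a human tester
verdict = run_manual_test(manual_test, ask=lambda prompt: "y")
print(verdict)  # pass
```

Asking every tester the same question in the same words makes manual results easier to compare across testers and runs.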

The easier it is to run your tests, the more widely they will be used.

15. Once I publish my tests, I'm done, right?

Sorry, no. Test suites must evolve over time to:

This means that you must plan for multiple releases of the test suite. Always use version numbers so people know which test suite they're using. Also, indicate the version or versions of the specification your test suite addresses.

For examples, see the three versions of the W3C SVG 1.1 Test Suite by the SVG Working Group, or the complete list of CSS Test Suites, maintained by the CSS Working Group. 

Another approach (used by the OWL, RDF, XML Core, and SOAP Working Groups) is to publish test suites as Technical Reports, so they are "naturally" versioned (using the previous, this, latest version links within each Technical Report). Even if you adopt this approach, it will still be helpful to your user base if you also publish a single table or list of all available releases, since following a series of links backwards can be time-consuming.

16. How should I handle bugs in my test suite?

The best way to handle bugs is not to introduce them in the first place.   ;)

Review all test submissions to ensure that they are appropriate and correct. In addition to reviewing individual tests, the test suite as a whole should be tested before publication. Run the test suite on several implementations if possible, to verify that the tests behave as expected (that is, to verify that they pass when the implementation is correct and fail when it is incorrect). If you supply a test harness, check that it works correctly. Review the documentation to make sure it is complete and understandable; even better, request that someone unfamiliar with the test suite review the documentation. Pay particular attention to any setup and configuration instructions, since these are often the most problematic for test suite users.

If your test suite is buggy or difficult to use, people won't trust it and won't use it.

No matter how thoroughly you test, some bugs will still slip through. Define a process to accept and respond to bug reports. An issue management system such as Bugzilla can help with this task. See the XML Query Working Group's example of how to use Bugzilla (W3C-member only link).

In response to bugs it might be necessary to:

Unless you want to define a "patch" process to allow partial updates to the test suite (this is probably more trouble than it's worth), the simplest way to handle bugs might be to publish, where appropriate, a list of errata or known issues with workarounds, together with a list of tests known to be incorrect (and which need not be run, or whose failure can be ignored). Periodically you should issue revisions of the test suite, in which the problems are corrected.
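A published list of tests known to be incorrect can be applied mechanically when interpreting results, as in this sketch. The test identifiers and the "pass"/"fail" strings are illustrative assumptions.

```python
def apply_known_issues(results, known_bad):
    """Separate real failures from failures of tests known to be incorrect.

    results: dict mapping test id -> "pass" or "fail"
    known_bad: set of test ids listed in the errata as incorrect
    Returns (failures, ignored), both sorted lists of test ids.
    """
    ignored = sorted(t for t in results if t in known_bad)
    failures = sorted(
        t for t, outcome in results.items()
        if outcome == "fail" and t not in known_bad
    )
    return failures, ignored

# Hypothetical run: t002 is listed in the errata as an incorrect test
results = {"t001": "pass", "t002": "fail", "t003": "fail"}
failures, ignored = apply_known_issues(results, known_bad={"t002"})
print(failures)  # ['t003']
print(ignored)   # ['t002']
```

Listing the ignored tests in the output keeps the exclusion visible, so readers of a report can see which results were set aside and why.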

17. Should test results be published?

While not required by W3C processes, providing a means for people to publish their test results can be beneficial. Publicity and competition provide strong incentives for developers to improve their implementations.

Some Working Groups have defined RDF formats for collecting and processing test results, and there are a number of XSLT style sheets that can be used to format results in an attractive way. For example:

Published test results are often called Implementation or Interoperability Reports. See these examples from SVG 1.1 and UAAG 1.0 and others listed in the Conformance and QA WG's Matrix of W3C Specifications (links to implementation reports can be found in the last column).

18. Should we implement a branding or certification program?

While you may not want to define and implement a fully fledged program with all of the legal and administrative overhead that this implies, a simple logo that can be displayed on a web page (such as the logo at the bottom of this page) may be useful. Note that whatever program you implement should probably involve self-certification (you do not want to be in the business of certifying implementations as conformant, since this is legally risky).

For discussions of the issues involved in certification programs see the Study of a W3C Certification Activity and the QA Handbook. For examples of successful logo programs, see the W3C XHTML Markup Validation Service and the Web Content Accessibility Guidelines (WCAG) Conformance Logos initiative.

Last modified $Date: 2010/01/29 13:55:50 $ by $Author: dom $