
Test suite and validation

Most standards produce test suites to demonstrate the features of the standard. These suites are also very useful for testing implementations. GPAC has a (very limited) test suite of its own to check that the code does not regress. However, when a test requires user interaction (clicks, for instance) or contains animation, validating it can get quite complex.

There are several ways to solve this problem. Erik Dahlström from Opera told me that they use additional JavaScript with specific APIs to do their regression testing. This is interesting, but I’m concerned by the time it takes to author this JavaScript. So I thought about another way.

Within GPAC, I’ve implemented a small plugin which enables two things:

  • to play a test, record the interactions (in an XML file) and take PNG snapshots upon specific events;
  • and then to replay the content, reproduce the interactions and compare the snapshots to indicate if the result is valid or not.

A playlist of test sequences can then be created and the validation can run automatically. It doesn’t take much time to record the interactions, which I call the validation script.
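To make the record/replay idea concrete, here is a minimal sketch in Python. It is not GPAC’s code: the XML layout of the validation script, the `replay` function, and the `fake_render` stand-in are all hypothetical, and snapshots are compared by hash rather than pixel by pixel.

```python
import hashlib
import xml.etree.ElementTree as ET

# Hypothetical validation script layout; GPAC's real file format may differ.
SCRIPT = """<validation>
  <event type="click" x="10" y="20" snapshot="after_click"/>
  <event type="keydown" key="Enter" snapshot="after_enter"/>
</validation>"""


def digest(png_bytes):
    """Fingerprint a snapshot so two snapshots can be compared cheaply."""
    return hashlib.sha256(png_bytes).hexdigest()


def replay(script_xml, render, reference):
    """Replay each recorded event and compare its snapshot to the reference.

    `render` is a hypothetical callback that applies one event to the player
    and returns the resulting snapshot as bytes.
    """
    results = {}
    for event in ET.fromstring(script_xml).iter("event"):
        snapshot = render(dict(event.attrib))
        name = event.get("snapshot")
        results[name] = digest(snapshot) == reference.get(name)
    return results


def fake_render(event):
    """Stand-in for a real player: returns fake 'PNG' bytes for the demo."""
    return ("frame after " + event["type"]).encode()


# Reference digests as they would have been stored during the recording pass.
reference = {
    "after_click": digest(b"frame after click"),
    "after_enter": digest(b"a different frame"),  # simulates a regression
}
results = replay(SCRIPT, fake_render, reference)
print(results)
```

A test in the playlist would pass only if every entry in `results` is true.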

You should see this soon on GPAC SVN.

XML and whitespaces

I thought handling white space in XML documents was simple. Now I know better. Let me share my understanding. There are two main levels of white space processing:

  • at the parser level;
  • at the application level either for manipulating text in the DOM or for rendering text.

Parsing

White space may appear almost anywhere, but for an application only the white space in attribute values and in text content matters. XML 1.0 (or 1.1, for that matter, it’s the same) indicates what should be done with white space:

An XML processor MUST always pass all characters in a document that are not markup through to the application. A validating XML processor MUST also inform the application which of these characters constitute white space appearing in element content.
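As an illustration of this rule, the sketch below uses Python’s `xml.dom.minidom`, a non-validating parser (my choice for the example, not something GPAC uses). The indentation between the tags reaches the application as ordinary text nodes.

```python
from xml.dom import minidom

# A document indented for readability; no DTD, so no validation happens.
doc = minidom.parseString("<root>\n  <child>text</child>\n</root>")
children = doc.documentElement.childNodes

# List every child of <root>: the white space used for indentation is
# passed through as text nodes around the <child> element.
for node in children:
    if node.nodeType == node.TEXT_NODE:
        print("text node:", repr(node.data))
    else:
        print("element:", node.nodeName)
```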

XML defines what is markup and what is not.

Markup takes the form of start-tags, end-tags, empty-element tags, entity references, character references, comments, CDATA section delimiters, document type declarations, processing instructions, XML declarations, text declarations, and any white space that is at the top level of the document entity (that is, outside the document element and not inside any other markup).

So attribute values are markup, as part of the start-tag construct, and are not affected by the above quote. White space in attributes is dealt with differently:

Before the value of an attribute is passed to the application or checked for validity, the XML processor MUST normalize the attribute value by applying the algorithm below, or by using some other method such that the value passed to the application is the same as that produced by the algorithm.
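A quick way to observe this normalization is with Python’s `xml.etree.ElementTree`, used here only as a convenient parser. The attribute names `literal` and `escaped` are made up for the example. With no DTD, both attributes are treated as CDATA, so each literal tab or newline becomes one space, while a white space character written as a character reference survives.

```python
import xml.etree.ElementTree as ET

# "literal" contains a real newline and a real tab inside the attribute value;
# "escaped" contains a newline written as the character reference &#10;.
elem = ET.fromstring('<a literal="one\n\ttwo" escaped="one&#10;two"/>')

literal = elem.get("literal")
escaped = elem.get("escaped")

# Per the XML normalization algorithm, each literal tab or newline is
# replaced by a single space, without collapsing, for a CDATA attribute.
print(repr(literal))
# A character reference is appended as the character it denotes.
print(repr(escaped))
```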

Additionally, XML defines the xml:space attribute, but it has no effect at the parsing level: it only signals to the application how white space should be handled.
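The small check below, again with Python’s `xml.etree.ElementTree` as an arbitrary example parser, suggests how this plays out in practice: the parser reports the xml:space attribute like any other attribute and hands over the text untouched.

```python
import xml.etree.ElementTree as ET

# The xml prefix is always bound to this namespace.
XML_NS = "http://www.w3.org/XML/1998/namespace"

elem = ET.fromstring('<t xml:space="default">  padded   text  </t>')

# The attribute is exposed to the application under its namespaced name...
space = elem.get("{%s}space" % XML_NS)
# ...but the character data is delivered exactly as written.
text = elem.text
print(space, repr(text))
```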

DOM Manipulations

The DOM receives characters from the parser for text and CDATA section nodes, in the form of a DOMString. In DOM3, the textContent attribute makes it possible to get or set the text content of a DOM subtree without scanning the whole tree. On a DOM node, this attribute returns the text content with all white space if no validation is performed, or without the white space that the associated grammar (DTD, Schema, RNG, …) does not specifically allow:

On getting, no serialization is performed, the returned string does not contain any markup. No whitespace normalization is performed and the returned string does not contain the white spaces in element content (see the attribute Text.isElementContentWhitespace).
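Python’s `xml.dom.minidom` does not provide textContent, so the sketch below emulates it with a hypothetical helper, `text_content`, that concatenates the data of text and CDATA section nodes. Since minidom does not validate, it cannot identify element-content white space, and everything is kept.

```python
from xml.dom import Node, minidom


def text_content(node):
    """Rough emulation of DOM3 textContent for a non-validating parser."""
    if node.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
        return node.data
    # Comments and processing instructions do not contribute any text.
    skipped = (Node.COMMENT_NODE, Node.PROCESSING_INSTRUCTION_NODE)
    return "".join(
        text_content(child)
        for child in node.childNodes
        if child.nodeType not in skipped
    )


doc = minidom.parseString(
    "<p>\n  Hello <b>big</b>\n  <![CDATA[<world>]]>\n</p>"
)
result = text_content(doc.documentElement)

# No markup in the result, and no white space normalization either.
print(repr(result))
```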

Rendering

The rendering algorithm should handle the white space. For example, SVG overrides the semantics of the XML-defined xml:space attribute. When the value of xml:space is “default”, SVG applies normalization (remove newlines, convert tabs into spaces, strip leading and trailing spaces, collapse runs of spaces) and renders the result without modifying the value in the DOM tree. When the value is “preserve”, only newlines and tabs are converted into spaces, and all spaces are displayed. In an HTML context, CSS defines the white-space property and the associated processing model.
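The SVG rules can be sketched as a small Python function. This follows my reading of the SVG 1.1 text, where in “default” mode newlines are removed rather than turned into spaces. The function name `svg_normalize` is made up, and treating a stray carriage return like a newline is an assumption. It works on a copy of the string, just as SVG leaves the DOM value alone.

```python
import re


def svg_normalize(text, xml_space="default"):
    """Return the string an SVG renderer would lay out (sketch of SVG 1.1)."""
    if xml_space == "preserve":
        # Only newlines and tabs are converted; every space is kept.
        return re.sub(r"[\r\n\t]", " ", text)
    text = re.sub(r"[\r\n]", "", text)   # 1. remove newline characters
    text = text.replace("\t", " ")       # 2. convert tabs into spaces
    text = text.strip(" ")               # 3. strip leading and trailing spaces
    return re.sub(r" {2,}", " ", text)   # 4. collapse runs of spaces


source = "  a\n\tb   c  "
default_result = svg_normalize(source)
preserve_result = svg_normalize(source, "preserve")
print(repr(default_result), repr(preserve_result))
```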

Thanks to Robin for his help in understanding all of this mess.