Automated Testing Script Generation

During a recent SVG Working Group discussion, we talked about how to improve the SVG test suite. Part of the problem in developing the test suite is the sheer number of tests to be produced (i.e. written, reviewed and approved) and then run against every implementation. The purpose of the testing is to check that every feature passes in at least two independent implementations: that is the criterion for exiting Candidate Recommendation. The SVG Tiny 1.2 test suite already contains 496 tests, and the SVG 1.1 2nd edition test suite has 526. The SVG 2 spec will probably have more. Having one person test all features is no longer realistic.

The CSS WG has been working on this issue for a long time. CSS 2.1 has 9685 tests! Together with a team from W3C, the WG set up a test suite server where one can browse the CSS Test Suite, test one's own viewer and report the results. All results are then aggregated. The W3C architecture allows for two types of tests:

  • ref tests (see Mozilla’s use of ref tests), where a test is provided together with an alternative way of producing the same result (visually).
  • JavaScript tests, where together with the test, one provides JavaScript calls to test that the DOM tree is in the expected state.

Both of these types require quite some work to produce either the reference image or the expected results, which can be tedious. For interactive cases, one would also have to write a script that tests all interaction paths, and for animation, one would have to author JavaScript that tests the changes at given points in time.

In GPAC, we experimented with a different approach. A test file (say file.svg) is loaded and played by the browser. During playback, all user interactions (mouse events, keyboard events, and so on) are recorded in a script file (say file.xvs). Some special, user-defined events trigger the generation of a snapshot by the browser. For instance, “CTRL+End” can be used to generate a snapshot ‘now’, and “CTRL+Next” to generate one at the next frame change (i.e. when an animation produces a different visual result). Later on, the script file (file.xvs) is loaded. It triggers the loading and playback of the test file (file.svg), and the player reproduces every event at the exact time, relative to the load, at which it was recorded in the first run. Snapshots are generated again and compared with the reference snapshots. If all snapshots match, the test passes; otherwise, it fails.
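The pass/fail rule described above can be sketched in a few lines of Python. This is only an illustration, not GPAC's code: the function names are hypothetical, and comparing the snapshot files byte for byte is an assumption (the player may well compare decoded pixels instead).

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 digest of a file's raw bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def mismatched_snapshots(pairs):
    """Return the (reference, replay) pairs whose files differ.

    Assumption: two snapshots are 'the same' when their files are
    byte-identical. A real player might compare pixels instead.
    """
    return [(ref, new) for ref, new in pairs
            if file_digest(ref) != file_digest(new)]


def verdict(pairs):
    """A test passes only if every replayed snapshot matches its reference."""
    return "pass" if not mismatched_snapshots(pairs) else "fail"
```

Returning the list of mismatched pairs, rather than only a boolean, makes it easy to report which snapshot caused a failure.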

An example of an XML Validation Script (XVS) used by GPAC is given here:

<TestValidationScript file="SVG_WG\SVG\profiles\1.1F2\test\svg\animate-elem-09-t.svg" >
<snapshot time="237" image="animate-elem-09-t-reference-000.png" />
<mousemove time="4749.000000" x="99" y="354" />
<mousemove time="4749.000000" x="103" y="343" />
<mousemove time="4749.000000" x="108" y="333" />
</TestValidationScript>
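To make the structure of such a script concrete, here is a minimal Python sketch that reads an XVS document into a time-ordered event list. It is not part of GPAC: `load_xvs` and `XVS_SAMPLE` are hypothetical names, the sample is a shortened version of the excerpt above, and the unit of the `time` attribute is not stated in this post, so the values are kept as plain numbers.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed sample modelled on the excerpt above.
XVS_SAMPLE = """<TestValidationScript file="animate-elem-09-t.svg">
  <snapshot time="237" image="animate-elem-09-t-reference-000.png" />
  <mousemove time="4749.000000" x="99" y="354" />
  <mousemove time="4749.000000" x="103" y="343" />
</TestValidationScript>"""


def load_xvs(text):
    """Return (content_file, events) parsed from an XVS document.

    Each event is a (time, kind, attributes) tuple, where 'kind' is the
    element name (e.g. 'snapshot', 'mousemove') and 'attributes' holds
    the remaining XML attributes as strings.
    """
    root = ET.fromstring(text)
    events = []
    for child in root:
        attrs = dict(child.attrib)
        time = float(attrs.pop("time"))
        events.append((time, child.tag, attrs))
    # sort() is stable, so events sharing a timestamp keep document order.
    events.sort(key=lambda event: event[0])
    return root.get("file"), events


content, events = load_xvs(XVS_SAMPLE)
for time, kind, attrs in events:
    print(time, kind, attrs)
```

A replay loop would then walk this list and dispatch each event once the elapsed time since load reaches its timestamp.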

GPAC also uses an XML Validation List (XVL) to test multiple files sequentially. An example of such a manifest is given below:

<TestSuiteValidationScript content-base="SVG_WG\SVG\profiles\1.1F2\test\svg" >
<Test scenario="animate-elem-02-t.xvs" content="animate-elem-02-t.svg" />
<Test scenario="animate-elem-03-t.xvs" content="animate-elem-03-t.svg" />
<Test scenario="animate-dom-01-f.xvs" content="animate-dom-01-f.svg" />
<Test scenario="animate-dom-02-f.xvs" content="animate-dom-02-f.svg" />
</TestSuiteValidationScript>
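As an illustration of how a harness might consume such a manifest, the following Python sketch lists the (scenario, content) pairs in order. The names `load_xvl` and `XVL_SAMPLE` are hypothetical, and resolving `content` against `content-base` is an assumption suggested by the attribute name; this post does not say how GPAC resolves either path.

```python
import xml.etree.ElementTree as ET
from pathlib import PureWindowsPath

# Shortened, well-formed sample modelled on the manifest above.
XVL_SAMPLE = r"""<TestSuiteValidationScript content-base="SVG_WG\SVG\profiles\1.1F2\test\svg">
  <Test scenario="animate-elem-02-t.xvs" content="animate-elem-02-t.svg" />
  <Test scenario="animate-dom-01-f.xvs" content="animate-dom-01-f.svg" />
</TestSuiteValidationScript>"""


def load_xvl(text):
    """Return a list of (scenario, content_path) pairs in document order.

    Assumption: 'content' is relative to 'content-base'. The scenario
    path is returned exactly as written in the manifest.
    """
    root = ET.fromstring(text)
    # The sample uses backslash separators, hence PureWindowsPath.
    base = PureWindowsPath(root.get("content-base", ""))
    return [(test.get("scenario"), str(base / test.get("content")))
            for test in root.findall("Test")]


for scenario, content in load_xvl(XVL_SAMPLE):
    print(scenario, "->", content)
```

Each pair can then be handed to the replay step one after the other, which matches the sequential testing described above.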

In the current implementation, the snapshot is an image, but this could be extended to a DOM snapshot. Similarly, we chose to record the events in XML form; we could instead use a JS file that the browser executes upon loading to generate the events. In GPAC, we still have some synchronization issues: snapshots are not generated at exactly the same time during recording and playback. This limitation could be fixed; for now, we render animations at a low frame rate to avoid it. You can check the source code of the GPAC module that does that here.

I think this method is interesting compared to the existing approaches because it eases the creation of tests: there is no need to generate an alternative image, and no need to write an associated JS file. It does require an implementation capable of producing the correct result. It may also raise security issues, because the browser has to be driven by an external script. But I’m not sure about that, since Opera uses a script-driven driver for its browser (see OperaWatir). This method was initially targeted at regression testing, where one assumes that at some point the (only) implementation rendered the test correctly. It could be extended to standardization activities, where one writes a test and should normally have at least one implementation giving the expected results (at least in most cases).
