Enhancing Automatic Tests for Graphics-Based Applications

This paper describes the problems of using image comparisons in automatic tests and reviews two image processing algorithms that can be integrated into existing automatic test tools.

By: David Birin


Problem description

Current automatic functional test tools do not provide solutions for working with graphics-based applications such as GIS, CAD, and computer games.

These applications are characterized by a graphical display that shows an image which changes dynamically according to user input.

Most testing activities on such applications come down to viewing these changes and verifying that the new image is a valid result of the operation. This is mostly done by manual testers. They perform an action and examine the resulting image, often using the rule of thumb “if it looks good – it’s o.k.” This rule can be translated into an add-in, which can be integrated with the existing functional testing tools.

In addition, the number of comparisons is an important factor. Take an animation renderer, for example. A bug was fixed in the system, and we now need to run regression tests against a previously rendered clip. The test case is a 3-minute clip at 30 frames per second. Simple math (3 × 60 × 30) shows that there will be 5400 comparisons; this task is almost impossible for a human tester, which makes it a great candidate for automation.

Current solution

Most functional testing tools provide functions for bitmap comparison. When the tests run, the visible image is compared with a reference image that was saved earlier. Some of these functions take a tolerance argument, which specifies the percentage of pixels that may differ between the two pictures. This approach has two major shortcomings. The first is that there is no way to define rules for the tolerance. For example, if every pixel in a picture moves one pixel to the right, the bitmap comparison can report that the new picture differs from the reference picture in up to 100% of its pixels, and the test therefore reports a failure. If, by the definition of the application, such a state is not considered a bug, the existing tools offer no way to express that. The second shortcoming concerns the time spent analyzing test results. When a bitmap comparison ends in failure, it is important to inspect why. Going over the picture pixel by pixel to find the pixels that changed is a Sisyphean and almost impossible job.
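The one-pixel shift problem can be illustrated with a minimal sketch of a tolerance-based bitmap comparison. This is not the API of any particular test tool; the function names (`diff_percent`, `bitmaps_match`, `shift_right`) and the representation of an image as a list of rows are assumptions made for illustration.

```python
def diff_percent(reference, actual):
    """Return the percentage of pixel positions whose values differ."""
    if len(reference) != len(actual) or any(
        len(r) != len(a) for r, a in zip(reference, actual)
    ):
        raise ValueError("images must have the same dimensions")
    total = sum(len(row) for row in reference)
    differing = sum(
        1
        for ref_row, act_row in zip(reference, actual)
        for ref_px, act_px in zip(ref_row, act_row)
        if ref_px != act_px
    )
    return 100.0 * differing / total


def bitmaps_match(reference, actual, tolerance_percent=0.0):
    """Pass if at most `tolerance_percent` of the pixels differ."""
    return diff_percent(reference, actual) <= tolerance_percent


def shift_right(image, fill=0):
    """Shift every row one pixel to the right, padding the left edge with `fill`."""
    return [[fill] + row[:-1] for row in image]


# Hypothetical 4x8 test image: one-pixel-wide vertical stripes (1, 0, 1, 0, ...).
reference = [[(x + 1) % 2 for x in range(8)] for _ in range(4)]
shifted = shift_right(reference)

print(diff_percent(reference, shifted))        # 100.0
print(bitmaps_match(reference, shifted, 5.0))  # False
```

The striped image is a worst case chosen on purpose: after the shift every pixel differs, so no reasonable tolerance value lets the test pass, even though the picture content is unchanged. For other images the reported difference will be smaller, but it remains unrelated to how severe the change is for the application.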

Proposed Solution

An add-in was developed to solve this problem. Such an add-in can be tailored to the client's needs, using different image processing algorithms that apply a set of success / failure rules predefined by the subject matter experts. Integrating such a tool into an existing set of automatic tests can broaden test coverage, increase test efficiency, and provide results with higher precision than before. The add-in drastically reduces the number of false positives (reported failures that are not actual failures) caused by differences in random number generation, operating system, or machine architecture.

The add-in is COM-based; COM add-ins are supported by all the major functional test tools.

Solution description

This section reviews two comparison algorithms. It is important to understand that there is no “magic solution”; the solution needs to be tailored to the client's needs.

First solution: The first step in this solution is to prepare reference pictures to use in the tests. Each reference picture is converted into a special three-colored picture, which represents the background, the shape, and the shape borders. During the test run a screenshot is saved. This screenshot is also converted into the three-colored representation and then compared to the reference picture. In the images displayed in appendix 1, a bug in the zoom mechanism is simulated. This bug displays pictures at 0.95 of their original zoom. The comparison ends in a failure; it can be rerun with a bigger tolerance to measure the error factor. The comparison also outputs other data, such as the number of border pixels that moved. This data is used later when estimating the error severity. Another important feature of the comparison is the ability to see the error. Our comparator colors the pixels that do not pass the defined rules with a different color, which helps in detecting and analyzing errors quickly.
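The following is a minimal sketch of the idea behind the first solution, not the add-in's actual implementation. It assumes that a pixel equal to `background_value` is background, that a shape pixel with a background pixel among its four direct neighbours is a border pixel, and that both images have the same size. The marker value `MISMATCH`, the function names, and the way changed border pixels are counted are all hypothetical.

```python
BACKGROUND, SHAPE, BORDER = 0, 1, 2
MISMATCH = 9  # hypothetical marker for pixels that break the comparison rule

NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def to_three_color(image, background_value=0):
    """Classify every pixel as BACKGROUND, SHAPE, or BORDER."""
    height, width = len(image), len(image[0])

    def is_background(y, x):
        # Assumption: anything outside the image counts as background.
        if not (0 <= y < height and 0 <= x < width):
            return True
        return image[y][x] == background_value

    result = []
    for y in range(height):
        row = []
        for x in range(width):
            if is_background(y, x):
                row.append(BACKGROUND)
            elif any(is_background(y + dy, x + dx) for dy, dx in NEIGHBOURS):
                row.append(BORDER)
            else:
                row.append(SHAPE)
        result.append(row)
    return result


def compare_three_color(reference, actual):
    """Compare two same-sized images in their three-color representation."""
    ref3, act3 = to_three_color(reference), to_three_color(actual)
    mismatches = 0
    border_changed = 0
    highlighted = []
    for ref_row, act_row in zip(ref3, act3):
        out_row = []
        for ref_px, act_px in zip(ref_row, act_row):
            if ref_px == act_px:
                out_row.append(act_px)
            else:
                mismatches += 1
                if BORDER in (ref_px, act_px):
                    border_changed += 1
                out_row.append(MISMATCH)  # make the error visible
        highlighted.append(out_row)
    total = len(ref3) * len(ref3[0])
    return {
        "mismatch_percent": 100.0 * mismatches / total,
        "border_pixels_changed": border_changed,
        "highlighted": highlighted,
    }


def square(size, first, last):
    """size x size image with a filled square on rows/columns first..last."""
    return [
        [1 if first <= y <= last and first <= x <= last else 0 for x in range(size)]
        for y in range(size)
    ]


# Hypothetical data: a 5x5 square rendered as a 3x3 square (shape drawn too small).
reference = square(7, 1, 5)
zoomed_out = square(7, 2, 4)
report = compare_three_color(reference, zoomed_out)
print(report["border_pixels_changed"])  # 24
for row in report["highlighted"]:
    print("".join(str(px) for px in row))
```

A test script could compare `mismatch_percent` against a tolerance, use `border_pixels_changed` as one input to a severity estimate, and save the `highlighted` grid as an image so that a tester sees at a glance where the shape deviates.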

Second solution: This solution uses a perception-based image comparison process [1], which determines when images are perceptually identical even though they contain imperceptible numerical differences. For example, some pictures look identical to the human eye although their color values differ (for instance, the image is slightly darker).

The algorithm analyzes the visible picture and the reference picture and, by using predefined mathematical formulas [2], it can disregard differences that fall below the eye's spatial frequency sensitivity, luminance sensitivity, and color sensitivity. As displayed in appendix 2, the three images are almost identical; the only difference between them is the gamma factor. Comparing these images using this solution (after deciding on the threshold for gamma bugs) gives the result “Images are perceptually indistinguishable”.
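The metric in [1] models several aspects of human vision and is considerably more involved than can be shown here. As a deliberately simplified stand-in, the sketch below checks only luminance: two pixels are treated as indistinguishable when their relative luminance differs by less than a Weber-style contrast threshold. The 2% default threshold, the 0.01 floor for dark pixels, the function names, and the use of linear RGB values in [0, 1] are assumptions for illustration, not values taken from [1] or [2].

```python
def luminance(rgb):
    """Relative luminance of a linear RGB triple in [0, 1] (Rec. 709 weights)."""
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def apply_gamma(image, gamma):
    """Simulate a gamma change by raising every channel to the power `gamma`."""
    return [[tuple(c ** gamma for c in px) for px in row] for row in image]


def perceptually_indistinguishable(reference, actual, contrast_threshold=0.02):
    """Luminance-only check; assumes both images have the same dimensions."""
    for ref_row, act_row in zip(reference, actual):
        for ref_px, act_px in zip(ref_row, act_row):
            l_ref, l_act = luminance(ref_px), luminance(act_px)
            # Weber-style contrast; the 0.01 floor (an arbitrary choice)
            # keeps very dark pixels from dominating the result.
            contrast = abs(l_ref - l_act) / max(l_ref, 0.01)
            if contrast > contrast_threshold:
                return False
    return True


# Hypothetical 2x4 gray ramp, plus a slight and a strong gamma change.
ramp = [[(v, v, v) for v in (0.2, 0.4, 0.6, 0.8)] for _ in range(2)]
slightly_darker = apply_gamma(ramp, 1.005)
much_darker = apply_gamma(ramp, 1.5)

print(perceptually_indistinguishable(ramp, slightly_darker))  # True
print(perceptually_indistinguishable(ramp, much_darker))      # False
```

A pixel-exact comparison would fail both altered images, because every pixel value changes. The threshold is where the subject matter experts' decision about acceptable gamma deviation would enter.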


While current automatic testing tools do not provide substantial support for testing graphics-based applications, there is a real need to overcome this barrier. As described in this article, this is possible by developing a custom solution for each problem using advanced image processing techniques. The solution can later be integrated into the widespread existing tools. The add-in described above extends the coverage of automatic tests into areas that were considered terra incognita. It also increases confidence that the results of automatic tests agree with those of manual tests.


[1] Yee H., A Perceptual Metric for Production Testing, Journal of Graphics Tools, 2004

[2] Yee H., Pattanaik S., Greenberg D.P., Spatiotemporal Sensitivity and Visual Attention for Efficient Rendering of Dynamic Environments, ACM Transactions on Graphics, 20(1), January 2001