APPLICATION NOTE: Software Test Tools Considered Harmful?
By Dr. Edward F. Miller, President, Software Research, Inc.
Testing of the GUI part of a product can sometimes be a very complex
task, and when the GUI is part of a client-server application in which
more than one machine is involved, it can be even more complicated. Some
test-tool suppliers would have you believe that you can never do tests
on GUIs without object-oriented (widget) level testing, while other
organizations would have you believe that only "true-time" (100%
realistic) recording and playback is needed. In practice, both are
required and useful, and each method has its own advantages and
limitations.
Since the mid-1980's, when SR introduced automated capture and replay
technology, first for MS-DOS and then for UNIX serial-port type
environments, the main technology for test capture and playback has been
"true-time" recording of test sessions. Keyboard and mouse activity
events are
recorded and played back in a way that maintains time and position based
faithfulness to exactly what the user entered.
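As an illustrative sketch (not CAPBAK's actual data format), a true-time recorder can be modeled as a list of events, each carrying the delay since the previous event; playback re-injects the events while honoring those delays:

```python
import time

# Hypothetical recording format: (delay_since_previous_event_in_seconds,
# event_kind, payload). A real capture tool hooks the windowing system;
# here the "recording" is simply a hand-written list.
RECORDING = [
    (0.00, "key", "h"),
    (0.12, "key", "i"),
    (0.30, "move", (200, 150)),   # mouse position in pixels
    (0.05, "click", "left"),
]

def replay(recording, dispatch, sleep=time.sleep):
    """Play back events in 'true time': preserve every recorded delay."""
    for delay, kind, payload in recording:
        sleep(delay)               # timing faithfulness
        dispatch(kind, payload)    # re-inject the event

# Collect the dispatched events; stub out real sleeping for demonstration.
played = []
replay(RECORDING, lambda kind, payload: played.append((kind, payload)),
       sleep=lambda s: None)
```

The `dispatch` callback stands in for injection of events into the X server or the Windows message queue; both position and timing are replayed exactly as captured.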
The preservation of "true time" provided for reliable playback because,
it was fairly assumed, the user would not record something unrealistic.
Hence, so long as the machine had enough capability, the playback would
not "fail to synchronize."
A failure to synchronize during playback causes the test recording to
abort and typically signals that the application has changed in a way
that the test has detected. The problem is that in some cases, when the
application hasn't changed, the failed test incorrectly implies that it
has. Too many such false-negative results tend to lead to the view that
the test suite is unreliable.
Done correctly, and with the correct kinds of playback synchronization
enhancements in place, true-time based testing can be extremely effective.
Generally, such tests are very precise; this means that a true-time
test will FAIL - i.e., correctly identify a difference - even if only
the smallest product change occurs.
With X Windows there are some special considerations for this, such as
how to relocate windows to their original positions. These matters are
discussed in more detail elsewhere, but it suffices to recognize that
most operational objections are easily overcome.
Object-Oriented ("OO") Tests
More recently, with the increased need for tests that execute without
fail on multiple platforms (see below), SR's response has been to
implement, on both UNIX and MSWindows platforms, a combination of
true-time capability with a different kind of test recording, based on
special techniques that record which visual image, or object, is used,
along with when and where it is used. Either can be used independently
of the other, or they can be combined to deliver the best features of
each.
Versions of such object-oriented tests (OO tests) play back more
reliably on multiple platforms because such tests are inherently less
sensitive to tiny changes in application window features: border width,
placement, color, etc. Such tests' advantage is their conceptual
simplicity and cross-platform portability.
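A minimal sketch of the difference, using invented object names and layouts: a widget-mode step stores the target object's name and the action, so the same step resolves correctly even after the layout changes, while a position-based step only matches the layout it was recorded against:

```python
# Two screen layouts of the same application, mapping object names to
# on-screen positions (both hypothetical).
layout_v1 = {"ok_button": (420, 310), "cancel_button": (500, 310)}
layout_v2 = {"ok_button": (40, 480), "cancel_button": (120, 480)}

oo_step = ("ok_button", "press")    # widget mode: position-independent
tt_step = ("click", (420, 310))     # true time: position-dependent

def play_oo(step, layout):
    """Resolve an object-based step against the current layout."""
    name, action = step
    if name not in layout:
        # A deleted feature is detected at playback time.
        raise LookupError(f"object {name!r} is no longer present")
    return (action, layout[name])   # the tool finds the current position
```

The same `oo_step` drives both layouts; the recorded `tt_step` coordinates are only correct for `layout_v1`.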
Where the Problems Lie
Almost all OO tests need a semi-invasive technique that may introduce
its own kind of problems to the testing process. To run in OO mode on a
UNIX platform, a capture/playback system must instrument the underlying
windowing library (typically the Xt library for Motif applications).
When you run your application tests this way, there is the risk that,
because the application is not really the same one you are going to ship
to customers, you can't be sure that you've tested the same software the
customer gets. In other words, there is a risk that such invasiveness -
done to simplify the testing process - will:
(a) cause errors that are present in the application to be missed by the
tests (false positive tests); and/or, (b) find errors that aren't really
present (false negative tests). Neither of these kinds of test outcome
helps build confidence in the application's quality if they occur often.
Typically, one must link the application to the special Xt library. On
some platforms this is done at run time, but on others this "dynamic
linking" is not available and a separate build is required. Separate
builds introduce another source of potential error.
Another concern is that some approaches, in which the user "programs"
the way the application is going to be exercised, make the testing pro-
cess as vulnerable to error as the programming process itself. Here's
why: the tester who programs tests does so to exercise what he believes
is supposed to have been implemented. If the tester's program misses a
required feature, the test will succeed incorrectly, and that feature
will never have been tried. Even worse, if the tester's program
incorrectly exercises a feature - for example, if it pushes a button
that is not actually visible on the screen - then the test will also
succeed, but it will have done so by "illegal" means.
This section analyzes these points (and a few more) in more detail,
addressing the advantages and disadvantages of each method.
Reproducibility
True-time (TT) tests have excellent reproducibility: they confirm
identical operation before and after a software product change, and
typically detect the smallest change. The user can be assured that
there is an EXACT match between what was recorded and what is played
back.
Realism
TT tests support realistic load generation. For example, if you are
using the background-mode X11 server, X11virtual, you know the load you
are imposing is just like a real user's load. It has the same keystroke
rates and inter-key delays, the exact same excursion activity, and the
exact same properties of clicks and drags.
Playback of input to something graphical, e.g. idraw, will be completely
accurate. If the application accepts mouse vectors, then true time is
NEEDED for specific positioning and speed of mouse movement, as both can
affect what is drawn.
Missing Feature Detection
Good protection if you delete a feature: the test will almost
certainly fail because the recorded test asks for something that is no
longer there to be played back.
Added Feature/Modified GUI Detection
Good protection if you add a feature, which generally implies a
change to the layout of the GUI. This will be detected as the test goes
to the previous location of a UI object and tries to invoke it.
No reprogramming is needed if you change the name of a button: TT tests
are position-dependent. The action will continue to work as it did, but
if required, the change in name can still be detected via screen
comparisons.
Highly Sensitive Detector
TT testing is highly sensitive to the smallest changes! Every change is
flagged as a test failure. This gives the tester, rather than the tool,
the ability to determine the significance of the change. However, this
can also be a disadvantage: tests fail for trivial reasons and, short of
re-recording the test or using OCR, there is little you can do about it
once the tester has determined the change is not significant.
If your window border offsets change, the whole suite
could fail, necessitating more work. This goes back to the requirement
that all tests need to be run from a known state, and with TT this
includes the state of the operating environment as well.
Running the same script on multiple platforms will not necessarily
work. Differences between the platforms, such as screen geometry,
window-system look and feel, and font support, can all affect the
ability of a TT test to run across different platforms.
Depends on Reproducible Behavior of System Under Test
If the system under test does not have the property that the same
initial state and the same inputs yield the same final state, then TT
testing may not work.
Flight Simulator is a good example: a recorded flight generally won't
reproduce with CAPBAK/X because the internal state is updated much
faster than the resolution of the playback process (1 msec). Roundoff
error in input-event timing causes slight state changes that eventually
accumulate, and the plane on which you recorded your flight almost
always crashes.
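The effect can be imitated with a toy model (the amplification factor and timings below are invented, not CAPBAK/X internals): quantizing event times to a 1 msec playback resolution introduces tiny input errors, and a state that amplifies its history turns them into a large final divergence:

```python
def fly(event_times):
    """Toy stand-in for a simulator whose state depends sensitively on
    exact input timing: each event folds its timestamp into the state,
    and the running state is amplified at every step."""
    state = 0.0
    for t in event_times:
        state = state * 1.5 + t   # invented amplification, for illustration
    return state

# "Recorded" flight: 50 events at sub-millisecond precision.
recorded = [0.001 * i + 0.0004 for i in range(50)]

# Playback quantized to the 1 msec resolution of the replay process.
replayed = [round(t, 3) for t in recorded]

# Each input error is under half a millisecond, yet the amplified final
# states diverge by far more than the input error itself.
drift = abs(fly(recorded) - fly(replayed))
```

Any real simulator is far more complex, but the mechanism is the same: per-event timing error bounded by the playback resolution, compounded by state feedback until the replayed flight no longer matches the recorded one.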
Widget Mode Advantages
A test on one application will exercise the same features
regardless of the platform. Widget mode tests are independent of screen
geometry, window manager features/look, and font availability.
Less Brittle Tests
The same test passes if all the buttons have the same names but are in
different locations. This extends the life of a test across
release-to-release changes in the screen layout.
Missing Feature Detection
There is good protection if you delete a feature. The test will
fail as it attempts to provide input to a button or other object that is
no longer present in the application.
Changed Feature Detection
There is good protection for a changed feature as the test will
provide the old inputs into the object. This should result in improper
outcomes for the test.
Application Change Insensitivity
OO tests are less sensitive to changes. Hence, they are more
"forgiving" of any noted regressions. This is particularly useful for
an application going through a series of user-interface modifications.
Test scripts are easy to modify, as they are not dependent on location
but simply on the object that needs to be exercised and the action
necessary on that object. This allows for easy-to-read scripts that
look like a series of function calls.
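Such a script might look like the following sketch, where the object names and helper methods are invented for illustration; each line names an object and the action to perform on it, with no coordinates anywhere:

```python
class Script:
    """Hypothetical widget-mode test script: a readable sequence of
    object-plus-action steps, recorded for later playback."""
    def __init__(self):
        self.steps = []
    def push(self, button):
        self.steps.append(("press", button))
    def type_into(self, field, text):
        self.steps.append(("type", field, text))
    def select(self, menu, item):
        self.steps.append(("select", menu, item))

# The script body reads like a series of function calls.
s = Script()
s.push("ok_button")
s.type_into("name_field", "E. Miller")
s.select("file_menu", "Save")
```

Renaming a button in the application requires editing only the name used in the script, not any positional data.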
Widget Mode Disadvantages
Impossible Tests Possible
OO tests can run applications in a way that a user cannot. The OO test
can push an invisible button, provide input to an object in an
iconified window, or provide input to a window obscured by other windows
or off the screen. All of these actions are unrealistic and do not
provide user-level testing.
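One way a harness can guard against such impossible actions is to check reachability before dispatching the input; the widget registry and its state flags below are hypothetical, standing in for queries a real tool would make of the toolkit:

```python
# Hypothetical widget states; a real tool would query the toolkit.
widgets = {
    "ok_button":    {"visible": True,  "iconified": False},
    "ghost_button": {"visible": False, "iconified": False},
}

def press(name, registry):
    """Dispatch a press only if a real user could actually reach it."""
    w = registry[name]
    if not w["visible"] or w["iconified"]:
        raise RuntimeError(f"{name} is not reachable by a real user")
    return f"pressed {name}"
```

With this guard, driving the invisible `ghost_button` fails the test instead of silently succeeding by "illegal" means.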
Extra Feature Detection
You have no protection if you add a button to a GUI. The test
won't know something was added as the change to the user interface is
not detected by the OO test.
Invasivity for Statically Linked Applications
OO testing is supported using a modified toolkit which is either
dynamically linked in at run-time or statically linked at compile time.
This means that the tested application is different from what will actu-
ally be shipped to the user. This change should not be significant, but
does add a risk to the released application.
Reprogramming will be needed if you change the name of an object such as
a button. These programming changes are straightforward, but must all
be made for the test to run correctly.
Unrealistic Load Generation
Playback in X11virtual will work but does not impose a 100% realistic
user-like load. Recordings or programmed scripts in OO mode do not
contain timing information, so the user interactions can at best be
approximated.
It is clear that no single GUI test method will ever suffice to handle
every situation, but it appears that the mixture of true-time and OO
modes gives the software tester the best possible chance to build tests
that:
- Are non-invasive if they have to be (using true-time recording);
- Are based on what the GUI really shows (using the ability to derive
tests from hands-on recording rather than programmatically);
- Provide for testing of graphics - where true-time mode is critical -
as well as testing of the GUIs that drive the graphics; and,
- Are reliable and flexible enough to provide real confidence increases
in delivered product quality.
Software testing is a complicated enough problem without burdening the
tester with the need to choose between two effective test modes.
Instead, it makes most sense to use both modes where they each make the
best contribution.
Note: A shortened version of this Applications Note appeared in the
February 1995 edition of TTN Online Edition.