APPLICATION NOTE: Software Test Tools Considered Harmful?
By Dr. Edward F. Miller, President, Software Research, Inc.
Testing of the GUI part of a product can sometimes be a very complex
task, and when the GUI is part of a client-server application in which
more than one machine is involved, it can be even more complicated. Some
test-tool suppliers would have you believe that you can never do tests
on GUIs without object-oriented (widget) level testing, while other
organizations would have you believe that only "true-time" (100%
realistic) recording and playback is needed. In practice, both are
required and useful, and each method has its own advantages and
limitations.
Since the mid-1980's, when SR introduced automated capture and replay
technology, first for MS-DOS and then for UNIX serial-port type
environments, the main technology for test capture and playback has been
"true-time" recording of test sessions. Keyboard and mouse activity
events are
recorded and played back in a way that maintains time and position based
faithfulness to exactly what the user entered.
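As an illustrative sketch (not CAPBAK's actual data format), a true-time recorder can be modeled as a list of events, each carrying the delay since the previous event; playback re-injects the events while honoring those delays:

```python
import time

# Hypothetical recording format: (delay_since_previous_event_in_seconds,
# event_kind, payload). A real capture tool hooks the windowing system;
# here the "recording" is simply a hand-written list.
RECORDING = [
    (0.00, "key", "h"),
    (0.12, "key", "i"),
    (0.30, "move", (200, 150)),   # mouse position in pixels
    (0.05, "click", "left"),
]

def replay(recording, dispatch, sleep=time.sleep):
    """Play back events in 'true time': preserve every recorded delay."""
    for delay, kind, payload in recording:
        sleep(delay)               # timing faithfulness
        dispatch(kind, payload)    # re-inject the event

# Collect the dispatched events; stub out real sleeping for demonstration.
played = []
replay(RECORDING, lambda kind, payload: played.append((kind, payload)),
       sleep=lambda s: None)
```

The `dispatch` callback stands in for injection of events into the X server or the Windows message queue; both position and timing are replayed exactly as captured.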
The preservation of "true time" provided for reliable playback because,
it was fairly assumed, the user would not record something unrealistic.
Hence, so long as the machine had enough capability, the playback would
not "fail to synchronize."
A failure to synchronize during playback causes the test recording to
abort and typically signals that the application has changed in a way
that the test has detected. The problem is that in some cases, when the
application hasn't changed, the failed test incorrectly implies that it
has. Too many such false-negative results tend to lead to the view that
the test suite is unreliable.
Done correctly, and with the correct kinds of playback synchronization
enhancements in place, true-time based testing can be extremely effective.
Generally, such tests are very precise; this means that a true-time
test will FAIL - i.e., correctly identify a difference - even if only
the smallest product change occurs.
With X Windows there are some special considerations for this, such as
how to relocate windows to their original positions. These matters are
discussed in more detail elsewhere, but it suffices to recognize that
most operational objections are easily overcome.
Object-Oriented ("OO") Tests
More recently, with the increased need for tests that execute without
fail on multiple platforms (see below), SR's response has been to
implement, on both UNIX and MSWindows platforms, a combination of
true-time capability with a different kind of test recording, based on
special techniques that record which visual image, or object, is used,
along with when and where it is used. Either can be used independently
of the other, or they can be combined to deliver the best features of
each.
Versions of such object-oriented tests (OO tests) play back more
reliably on multiple platforms because such tests are inherently less
sensitive to tiny changes in application window features: border width,
placement, color, etc. Such tests' advantage is their conceptual
simplicity and cross-platform portability.
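A minimal sketch of the difference, using invented object names and layouts: a widget-mode step stores the target object's name and the action, so the same step resolves correctly even after the layout changes, while a position-based step only matches the layout it was recorded against:

```python
# Two screen layouts of the same application, mapping object names to
# on-screen positions (both hypothetical).
layout_v1 = {"ok_button": (420, 310), "cancel_button": (500, 310)}
layout_v2 = {"ok_button": (40, 480), "cancel_button": (120, 480)}

oo_step = ("ok_button", "press")    # widget mode: position-independent
tt_step = ("click", (420, 310))     # true time: position-dependent

def play_oo(step, layout):
    """Resolve an object-based step against the current layout."""
    name, action = step
    if name not in layout:
        # A deleted feature is detected at playback time.
        raise LookupError(f"object {name!r} is no longer present")
    return (action, layout[name])   # the tool finds the current position
```

The same `oo_step` drives both layouts; the recorded `tt_step` coordinates are only correct for `layout_v1`.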
Where the Problems Lie
Almost all OO tests need a semi-invasive technique that may introduce
its own kind of problems to the testing process. To run in OO mode on a
UNIX platform, a capture/playback system must instrument the underlying
windowing library (typically the Xt library for Motif applications).
When you run your application tests this way, there is the risk that,
because the application is not really the same one you are going to ship
to customers, you can't be sure that you've tested the same software the
customer gets. In other words, there is a risk that such invasiveness -
done to simplify the testing process - will:
(a) cause errors that are present in the application to be missed by the
tests (false positive tests); and/or, (b) find errors that aren't really
present (false negative tests). Neither of these kinds of test outcome
helps build confidence in the application's quality if they occur often.
Typically, one must link the application to the special Xt library. On
some platforms this is done at run time, but on others this "dynamic
linking" is not available and a separate build is required. Separate
builds introduce another source of potential error.
Another concern is that some approaches, in which the user "programs"
the way the application is going to be exercised, make the testing pro-
cess as vulnerable to error as the programming process itself. Here's
why: the tester who programs tests does so to exercise what he believes
is supposed to have been implemented. If the tester's program misses a
required feature, the test will succeed incorrectly, and that feature
will never have been tried. Even worse, if the tester's program
incorrectly exercises a feature - for example, if it pushes a button
that is not actually visible on the screen - then the test will also
succeed, but it will have done so by "illegal" means.
This section analyzes these points (and a few more) in more detail,
addressing the advantages and disadvantages of each method.
Reproducibility
True-time (TT) tests have excellent reproducibility: they confirm
identical operation before and after a software product change, and
typically detect the smallest change. The user can be assured that
there is an EXACT match between what was recorded and what is played
back.
Realism
TT tests support realistic load generation. For example, if you are
using the background-mode X11 server, X11virtual, you know the load you
are imposing is just like a real user's load. It has the same keystroke
rates and inter-key delays, the exact same excursion activity, and the
exact same properties of clicks and drags.
Playback of input to something graphical, e.g. idraw, will be completely
accurate. If the application accepts mouse vectors, then true time is
NEEDED for specific positioning and speed of mouse movement, as both can
affect what is drawn.
Missing Feature Detection
Good protection if you delete a feature: the test will almost
certainly fail because the recorded test asks for something that is no
longer there to be played back.
Added Feature/Modified GUI Detection
Good protection if you add a feature, which generally implies a
change to the layout of the GUI. This will be detected as the test goes
to the previous location of a UI object and tries to invoke it.
No reprogramming is needed if you change the name of a button: TT tests
are position-dependent. The action will continue to work as it did, but
if required, the change in name can still be detected via screen
comparisons.
Highly Sensitive Detector
TT testing is highly sensitive to the smallest changes! Every change is
flagged as a test failure. This gives the tester, rather than the tool,
the ability to determine the significance of the change. However, this
can also be a disadvantage: tests fail for trivial reasons and, short of
re-recording the test or using OCR, there is little you can do about it
once the tester has determined the change is not significant.
If your window border offsets change, the whole suite
could fail, necessitating more work. This goes back to the requirement
that all tests need to be run from a known state, and with TT this
includes the state of the operating environment as well.
Running the same script on multiple platforms will not necessarily
work. Differences between the platforms, such as screen geometry,
window-system look and feel, and font support, can all affect the
ability of a TT test to run across different platforms.
Depends on Reproducible Behavior of System Under Test
If the system under test does not have the property that the same
initial state and the same inputs yield the same final state, then TT
testing may not work.
Flight Simulator is a good example: a recorded flight generally won't
reproduce with CAPBAK/X because the internal state is updated much
faster than the resolution of the playback process (1 msec). Roundoff
error in input-event timing causes slight state changes that eventually
accumulate, and the plane on which you recorded your flight almost
always crashes.
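The effect can be imitated with a toy model (the amplification factor and timings below are invented, not CAPBAK/X internals): quantizing event times to a 1 msec playback resolution introduces tiny input errors, and a state that amplifies its history turns them into a large final divergence:

```python
def fly(event_times):
    """Toy stand-in for a simulator whose state depends sensitively on
    exact input timing: each event folds its timestamp into the state,
    and the running state is amplified at every step."""
    state = 0.0
    for t in event_times:
        state = state * 1.5 + t   # invented amplification, for illustration
    return state

# "Recorded" flight: 50 events at sub-millisecond precision.
recorded = [0.001 * i + 0.0004 for i in range(50)]

# Playback quantized to the 1 msec resolution of the replay process.
replayed = [round(t, 3) for t in recorded]

# Each input error is under half a millisecond, yet the amplified final
# states diverge by far more than the input error itself.
drift = abs(fly(recorded) - fly(replayed))
```

Any real simulator is far more complex, but the mechanism is the same: per-event timing error bounded by the playback resolution, compounded by state feedback until the replayed flight no longer matches the recorded one.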
Widget Mode Advantages
A test on one application will exercise the same features
regardless of the platform. Widget mode tests are independent of screen
geometry, window manager features/look, and font availability.
Less Brittle Tests
The same test passes if all the buttons have the same names but are in
different locations. This extends the life of a test across
release-to-release changes in the screen layout.
Missing Feature Detection
There is good protection if you delete a feature. The test will
fail as it attempts to provide input to a button or other object that is
no longer present in the application.
Changed Feature Detection
There is good protection for a changed feature as the test will
provide the old inputs into the object. This should result in improper
outcomes for the test.
Application Change Insensitivity
OO tests are less sensitive to changes. Hence, they are more
"forgiving" of any noted regressions. This is particularly useful for
an application going through a series of user-interface modifications.
Test scripts are easy to modify, as they are not dependent on location
but simply on the object that needs to be exercised and the action
necessary on that object. This allows for easy-to-read scripts that
look like a series of function calls.
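Such a script might look like the following sketch, where the object names and helper methods are invented for illustration; each line names an object and the action to perform on it, with no coordinates anywhere:

```python
class Script:
    """Hypothetical widget-mode test script: a readable sequence of
    object-plus-action steps, recorded for later playback."""
    def __init__(self):
        self.steps = []
    def push(self, button):
        self.steps.append(("press", button))
    def type_into(self, field, text):
        self.steps.append(("type", field, text))
    def select(self, menu, item):
        self.steps.append(("select", menu, item))

# The script body reads like a series of function calls.
s = Script()
s.push("ok_button")
s.type_into("name_field", "E. Miller")
s.select("file_menu", "Save")
```

Renaming a button in the application requires editing only the name used in the script, not any positional data.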
Widget Mode Disadvantages
Impossible Tests Possible
OO tests can run applications in a way that a user cannot. The OO test
can push an invisible button, provide input to an object in an
iconified window, or provide input to a window obscured by other windows
or off the screen. All of these actions are unrealistic and do not
provide user-level testing.
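One way a harness can guard against such impossible actions is to check reachability before dispatching the input; the widget registry and its state flags below are hypothetical, standing in for queries a real tool would make of the toolkit:

```python
# Hypothetical widget states; a real tool would query the toolkit.
widgets = {
    "ok_button":    {"visible": True,  "iconified": False},
    "ghost_button": {"visible": False, "iconified": False},
}

def press(name, registry):
    """Dispatch a press only if a real user could actually reach it."""
    w = registry[name]
    if not w["visible"] or w["iconified"]:
        raise RuntimeError(f"{name} is not reachable by a real user")
    return f"pressed {name}"
```

With this guard, driving the invisible `ghost_button` fails the test instead of silently succeeding by "illegal" means.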
Extra Feature Detection
You have no protection if you add a button to a GUI. The test
won't know something was added as the change to the user interface is
not detected by the OO test.
Invasivity for Statically Linked Applications
OO testing is supported using a modified toolkit which is either
dynamically linked in at run-time or statically linked at compile time.
This means that the tested application is different from what will actu-
ally be shipped to the user. This change should not be significant, but
does add a risk to the released application.
Reprogramming will be needed if you change the name of an object such as
a button. These programming changes are straightforward, but must all
be made for the test to run correctly.
Unrealistic Load Generation
Playback in X11virtual will work but does not impose a 100% realistic
user-like load. Recordings or programmed scripts in OO mode do not
contain timing information, so the user interactions can at best be
approximated.
It is clear that no single GUI test method will ever suffice to handle
every situation, but it appears that the mixture of true-time and OO
modes gives the software tester the best possible chance to build tests
that:
- Are non-invasive if they have to be (using true-time recording);
- Are based on what the GUI really shows (using the ability to derive
tests from hands-on recording rather than programmatically);
- Provide for testing of graphics - where true-time mode is critical -
as well as testing of the GUIs that drive the graphics; and,
- Are reliable and flexible enough to provide real confidence increases
in delivered product quality.
Software testing is a complicated enough problem without burdening the
tester with the need to choose between two effective test modes.
Instead, it makes most sense to use both modes where they each make the
best contribution.
Note: A shortened version of this Applications Note appeared in the
February 1995 edition of TTN Online Edition.