APPLICATION NOTE: TCAT Application to AT&T Phone System Crash

Summary

On January 15, 1990, one of AT&T's #4ESS toll switching systems in New York City experienced an intermittent failure that caused a major service outage on the AT&T U.S. National Telephone Network [Reference 1].

The outage was caused by a software flaw that had escaped detection even by AT&T's extremely sophisticated software test methods. However, as described in [Reference 2], the error would have been revealed had complete C1 (branch) coverage been attained.

Error Description

As reported in ACM's Software Engineering Notes [Reference 2], the software defect was traced to an elementary programming error.

The offending C program text contained a construct of the following form:


/* C fragment to illustrate the AT&T defect */

do {
    switch (expression) {
    /* ... */
    case (value):
        if (logical) {
            /* sequence of statements */
            break;
        } else {
            /* another sequence of statements */
        }
        /* statements after if...else statement */
    }
    /* statements after switch statement */
} while (expression);

/* statements after do...while statement */

Programming Mistake Described

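The mistake is that the programmer thought the break statement applied to the if statement in the above passage. In C, however, break exits the innermost enclosing switch or loop (an if statement is never its target), so here it terminates the entire switch statement and control skips the statements after the if...else statement. The branch containing this break was clearly never exercised in testing. If it had been, the testers would have noticed the abnormal behavior and would have been able to correct it.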
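The control flow of the construct can be checked with a small reconstruction. In this sketch each placeholder statement from the fragment is replaced by a trace entry; the function names run_construct and trace_step, the use of the constant 1 for case (value), and the single pass through the do...while loop are illustrative assumptions, not AT&T's code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append one step name, followed by ';', to a NUL-terminated trace. */
static void trace_step(char *trace, size_t size, const char *step)
{
    size_t used = strlen(trace);
    int written = snprintf(trace + used, size - used, "%s;", step);
    assert(written > 0 && (size_t)written < size - used);  /* no truncation */
}

/* Hypothetical reconstruction of the construct: each placeholder statement
   becomes a trace entry. Returns a static buffer that the next call
   overwrites. */
const char *run_construct(int value, int logical)
{
    static char trace[256];
    int passes = 0;

    trace[0] = '\0';
    do {
        switch (value) {
        case 1:  /* stands in for case (value) */
            if (logical) {
                trace_step(trace, sizeof trace, "if-branch");
                break;  /* exits the switch, not merely the if */
            } else {
                trace_step(trace, sizeof trace, "else-branch");
            }
            trace_step(trace, sizeof trace, "after-if-else");
        }
        trace_step(trace, sizeof trace, "after-switch");
    } while (++passes < 1);  /* one pass keeps the trace short */
    trace_step(trace, sizeof trace, "after-do-while");
    return trace;
}
```

With logical false, run_construct(1, 0) returns "else-branch;after-if-else;after-switch;after-do-while;". With logical true, run_construct(1, 1) returns "if-branch;after-switch;after-do-while;": the statements after the if...else statement are skipped, which is the behavior the programmer did not intend.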

The only caveat to this statement is the following: the tests applied to the code may already produce output that reveals the error; however, if the testers do not examine that output and notice the error, then the deficiency lies not with the test coverage but with the examination of the test results.

In the case of a misplaced break statement, however, it is very likely that the error would have been detected once the branch containing it was exercised.
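C1 coverage requires every outcome of every decision to be exercised at least once. The sketch below shows how a branch coverage report would single out the untested path through the break. It assumes hypothetical per-branch probe counters inserted by hand (the names instrumented_fragment, covered_branches, report_uncovered and the BR_ constants are illustrative); it is not TCAT's actual instrumentation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical probes: one counter per outcome of each decision in the
   fragment. Illustrates the C1 (branch coverage) idea only. */
enum branch {
    BR_CASE_VALUE,  /* switch selects case (value) */
    BR_CASE_OTHER,  /* switch selects some other case */
    BR_IF_TRUE,     /* if (logical) is true: the path through the break */
    BR_IF_FALSE,    /* if (logical) is false: the else path */
    BR_COUNT
};

static const char *const branch_name[BR_COUNT] = {
    "case (value)", "other case", "if (logical) true", "if (logical) false"
};

static unsigned long hits[BR_COUNT];

/* The fragment with a probe on every branch outcome; the constant 1
   stands in for case (value). */
void instrumented_fragment(int value, int logical)
{
    switch (value) {
    case 1:
        hits[BR_CASE_VALUE]++;
        if (logical) {
            hits[BR_IF_TRUE]++;
            break;  /* leaves the switch, skipping the code below */
        } else {
            hits[BR_IF_FALSE]++;
        }
        /* statements after if...else statement */
        break;
    default:
        hits[BR_CASE_OTHER]++;
        break;
    }
    /* statements after switch statement */
}

void reset_probes(void)
{
    memset(hits, 0, sizeof hits);
}

unsigned long branch_hits(int b)
{
    assert(b >= 0 && b < BR_COUNT);
    return hits[b];
}

/* Number of branch outcomes exercised at least once. */
int covered_branches(void)
{
    int covered = 0;
    for (int b = 0; b < BR_COUNT; b++)
        if (hits[b] > 0)
            covered++;
    return covered;
}

/* Print every branch outcome the tests never reached, then the totals. */
void report_uncovered(FILE *out)
{
    for (int b = 0; b < BR_COUNT; b++)
        if (hits[b] == 0)
            fprintf(out, "not exercised: %s\n", branch_name[b]);
    fprintf(out, "C1 coverage: %d of %d branches\n",
            covered_branches(), (int)BR_COUNT);
}
```

With a test suite that only ever passes logical false, for example the calls (1, 0) and (2, 0), covered_branches() returns 3 and report_uncovered() lists "if (logical) true". One additional test with logical true brings coverage to 4 of 4 branches; in the real program that test would have executed the misplaced break.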

References

1. "Can We Trust Our Software?", Newsweek, 29 January 1990.

2. ACM SIGSOFT, Software Engineering Notes, Vol. 15, No. 2, Page 11ff, April 1990.