Over-reliance on testing considered harmful!

A few years back I bought an expensive smartphone, attracted by its “cool” features. Once I started using it, I realized that it was far too buggy. Let me give two examples of the problems I faced with the phone.

The phone came with supporting software on a CD. After I installed and started the software, it crashed with a Visual C++ runtime error: “R6025: Pure virtual function call”! As a C++ expert, I could easily understand the bug: it is incorrect to call a pure virtual function from a constructor, and such a call crashes the application. But the question is: why didn’t the testing team for that mobile phone software catch this bug before release?
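To illustrate the bug, here is a minimal C++ sketch of my own (the class names are hypothetical, not from the phone’s actual software). During a base-class constructor, virtual calls dispatch to the base class itself; if the function is pure virtual there, the program aborts at runtime, which MSVC reports as R6025:

    #include <iostream>

    class Device {
    public:
        Device() { setup(); }          // constructor delegates to a helper
        virtual ~Device() = default;
        virtual void init() = 0;       // pure virtual, no implementation
    private:
        // The indirect call defeats compile-time diagnostics: during
        // Device's constructor, the Phone part does not exist yet, so
        // the virtual dispatch lands on the pure virtual Device::init().
        void setup() { init(); }
    };

    class Phone : public Device {
    public:
        void init() override { std::cout << "Phone::init\n"; }
    };

    int main() {
        Phone p;   // aborts: "R6025: pure virtual function call" on MSVC
        return 0;
    }

A direct call to init() inside the constructor would typically be caught by the compiler or linker; it is the indirect call through a helper that slips through to runtime.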

A strange problem with my phone was that it “froze” or “hung” if I talked for “too long”. After a few freezes, I figured out that “too long” meant approximately half an hour, which is not really that long! By “froze” I mean the screen would go blank and the phone wouldn’t respond to any key presses, so I couldn’t restart it from the keypad. The only way I could fix the problem was to remove the battery and put it back! Later I also discovered a work-around: if I plugged the phone into its charger, it would immediately spring back to life! This discovery revealed an important aspect of the problem: since the phone recovered when an event occurred (inserting the charger, in this case), the freeze was likely caused by a software bug!
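I never saw the phone’s code, so the following C++ sketch is pure speculation on my part, but it shows one classic event-handling bug (a “lost wakeup”) that matches the symptom exactly: the event loop blocks forever on an event it already missed, and any later event, such as plugging in the charger, happens to wake it up again:

    #include <condition_variable>
    #include <mutex>
    #include <queue>
    #include <thread>

    std::mutex m;
    std::condition_variable cv;
    std::queue<int> events;

    void uiThread() {
        for (;;) {
            std::unique_lock<std::mutex> lock(m);
            // BUG: waiting without a predicate. If an event was queued
            // before we reached this line, its notification is already
            // lost, and the thread sleeps until the *next* event arrives,
            // however long that takes. The correct form is:
            //   cv.wait(lock, [] { return !events.empty(); });
            cv.wait(lock);
            while (!events.empty()) {
                events.pop();          // handle key press, charger, ...
            }
        }
    }

    void postEvent(int e) {            // called by keypad, charger, timers
        { std::lock_guard<std::mutex> lock(m); events.push(e); }
        cv.notify_one();
    }

    int main() {
        postEvent(1);             // event posted before the loop waits...
        std::thread t(uiThread);  // ...so the first wait blocks forever
        t.join();                 // deliberately hangs: the "freeze"
    }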

When I talked to my friends about this problem, most of them were not surprised: they said that bugs are very common in mobile phones. Reading about the strategies of mobile phone companies and talking to my friends working in those companies was enlightening: for mobile phone companies, time to market is the most important factor that gives competitive advantage and drives business success.

Mobile phone companies face cut-throat competition, and only those who deliver the “latest” and “coolest” functionality to the market first survive!

In this cut-throat competition, the main quality strategy these companies use is to test the software: if the tested features work, the product goes to market. Maybe there is a bit of exaggeration in this statement, but it more or less captures the essence of how these companies focus on functionality and use testing as their primary means of checking quality.

The situation is not so different in the rest of the software industry, where testing is the main approach to improving software quality. According to Boris Beizer, testing accounts for approximately half of total software development costs! Depending on the size and type of the software company, the ratio of developers to testers ranges from 5:1 to 1:1! It is safe to say that software companies over-rely on testing.

Let us take a holistic view of testing to understand why focusing on software testing is not the way to create high-quality software.

“Testing can show the presence of bugs, not their absence” – E.W. Dijkstra

Dijkstra’s statement appears to be a clever play on words, but if you think about it, it is insightful. What he means is that with testing you can only check whether the software has bugs. If you encounter no bugs while testing, it just means that testing did not uncover any bugs; you cannot conclude that there are no bugs in the software. For this reason, testing alone cannot establish that the software will work fine. Why? Simple: it is infeasible to test all the possibilities.
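A quick back-of-the-envelope calculation (my own numbers, chosen purely for illustration) shows why exhaustive testing is infeasible even for trivial code. A function taking just two 32-bit integers has 2^64 possible input combinations:

    #include <iostream>

    int main() {
        const double combinations = 18446744073709551616.0;  // 2^64 inputs
        const double tests_per_second = 1e9;   // optimistic throughput
        const double seconds_per_year = 3600.0 * 24 * 365;
        std::cout << "Years to test exhaustively: "
                  << combinations / tests_per_second / seconds_per_year
                  << '\n';                     // prints roughly 585 years
        return 0;
    }

Around 585 years at a billion tests per second, for a single two-argument function; real programs have state, threads, and environments on top of their inputs.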

This meaning is reflected in the US Food and Drug Administration’s guidance statement on validation of medical software: “Software testing by itself is not sufficient to establish confidence that the software is fit for its intended use”.

Yes, there has been considerable progress in software testing research, and sophisticated testing tools are available today, but Dijkstra’s statement and the FDA’s position on testing still hold. To get better clarity on the limitations of testing, let us discuss testing from two different perspectives.

Real-world software is complex, and its complexity is rising every year. For example, Windows 3.1 (released in 1992) was approximately 4 million LOC, while Windows XP (released in 2001) had grown to around 40 million LOC.

I don’t know the sizes of the latest releases of Linux or Windows, but you can make an educated guess. Many applications I know of are more than a million LOC, and their size is increasing every year. The humongous size of real-world code bases makes testing extremely difficult. Let me give a simple example.

A few years back, when I was writing code in Eclipse 3.1, it crashed, and I checked its stack trace. The stack trace revealed that the bug was a NullPointerException in the org.eclipse.jface.dialogs.DialogSettings.load method, in DialogSettings.java at line 278. The stack trace had 56 method calls in it, starting from the org.eclipse.equinox.launcher.Main.run method in Main.java at line 1236. If I were to fix this bug as a developer, it would be difficult just to reproduce the problem by creating a test case on my own. Further, it would be close to impossible to make sure that whatever fix I made was correct and that I had not broken any existing functionality. Obviously Eclipse is a huge code base, and there is a large testing infrastructure for Eclipse. Still, any developer who has worked in such large code bases will understand what I am saying: the complexity of testing in such code bases is overwhelming. In other words, the enormous complexity of the software renders testing ineffective, no matter what testing technique, method, or process you use.

From another perspective, the bugs found during testing are only the tip of the iceberg: for every bug found in testing, there are ten or more bugs lurking beneath, yet to be uncovered! In this way, software bugs are akin to icebergs: what we see above the water is only the tip, roughly an eighth to a tenth of the whole, and the rest is submerged. These “latent bugs” lurk in the darkness, only to be exposed later when changes are made to the software or when a user attempts some unusual functionality. Testing is effective for certain kinds of bugs, such as data-flow or functionality bugs; for other kinds, such as design bugs, it is largely ineffective.

To summarize, focusing extensively on testing is not the right strategy for ensuring high-quality software. Then what is the alternative? The answer is something that has been well-known for decades. What is it?

There is no “silver bullet” for creating high-quality software, but there is one solution that comes close: the humble manual review! It has long been known that manual review techniques such as peer reviews, inspections, and code and design walk-throughs are effective in creating high-quality software. There are many reasons why manual reviews are more effective than testing. First, manual reviews find not only actual bugs but also potential and latent bugs. They can also find hard-to-detect bugs, such as design bugs, early in the software development lifecycle (this is important because luminaries such as Capers Jones found that 25% to 64% of bugs in software are design bugs). Further, reviews don’t just find bugs; they also identify weak spots in the software where improvements are required, and hence help improve software quality.

Though manual reviews are used in the software industry, they haven’t received the focus and attention they deserve. If manual reviews are so useful, why aren’t they applied extensively? The main reason is that they are effort-intensive: most development projects, in a hurry to meet deadlines, skip manual reviews and depend on testing to catch the bugs. What can we do about that? Fortunately, static analyzers can address this limitation of manual reviews.

Starting from the days of simple “lint”-like pattern matchers, static analyzers have come a long way. Today, sophisticated static analyzers are available that use advanced techniques such as model checking, abstract interpretation, and program querying. These tools can find bugs that are hard to detect with testing or even manual reviews, and they scale well, finding bugs in millions of lines of code with ease. Though many of these tools are costly, the ROI (return on investment) is very good because they find hard-to-find bugs. Research shows that a considerable percentage of the bugs detectable by manual reviews can be detected automatically by static analyzers.
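To make this concrete, here is a small C++ function of my own invention with two defects that path-sensitive static analyzers (the Clang Static Analyzer, for example) routinely flag without ever running the code:

    #include <cstdio>

    int readHeader(const char* path) {
        FILE* f = fopen(path, "rb");   // may return NULL
        int header = 0;
        // BUG 1: f is dereferenced without a NULL check.
        if (fread(&header, sizeof header, 1, f) != 1) {
            return -1;                 // BUG 2: f leaks on this path
        }
        fclose(f);
        return header;
    }

A test would catch the first bug only if some test case happens to pass a non-existent file, and the leak might never surface during testing at all; a path-exploring analyzer reports both immediately.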

By integrating static analyzers into the software development lifecycle, bugs can be found earlier. If extensive manual review is then performed on the code, the review can focus on harder-to-find bugs. With extensive static analysis and manual reviews, testing will find only a few bugs (mostly interface- and context-related problems, which manual reviews and static analyzers do not address well).

Consider the two examples I mentioned at the beginning of this article. The call to a pure virtual function inside a constructor is a bug that is easily catchable by both static analysis tools and manual reviews. Testing can find this bug only if we create the right test case(s) and the bug manifests in the given context; otherwise, it won’t. For the problem in the event-handling code, manual reviews can help find it, since it is likely a design problem; static analyzers are not (yet) effective at finding such problems. Advanced testing techniques such as model-based testing (MBT) can help find such bugs, but MBT is not widely used in practice.

When manual reviews are combined with static analysis tools and made an integral part of the development lifecycle, they can help create high-quality software. Of course, testing has an important role to play, for example in acceptance testing. However, the right strategy for creating high-quality software is to focus extensively on manual reviews and static analysis in addition to testing the software. In summary: we have to find defects proactively, earlier in the lifecycle; because testing finds defects relatively late in the lifecycle and is reactive in nature, we should not rely on it too heavily to ensure high-quality software.
