How can I possibly test "all the stuff" every iteration?
It's never possible to test "all the stuff" because there are so many variables involved in running any piece of software anywhere that there's always another test that could be performed. What is possible, though, is to decide what is the important stuff to test given what we know about stakeholder concerns, risks to business value, time available, the software, and other relevant factors.
Firstly, does every single feature or area of the software really get changed every single iteration? The whole point of iterating during development is to introduce small but valuable changes and deliver them to users. Since the changes are small, most of the software is the same as it was in the last iteration and therefore does not need to undergo comprehensive testing again.
Secondly, you need to determine the level of risk introduced by these changes. What is being done to the software that introduces so much risk every iteration that someone believes it needs to be tested in its entirety? If something is being done that introduces that much risk, why is it being done, and why every iteration? Thinking about risk helps you to work out an appropriate amount of time and effort to spend reviewing the changes; sometimes that will mean not looking at them at all. Of course, changes sometimes have broader scope than anticipated, or they introduce unintended consequences in other parts of the software. While these are fair concerns, most changes made to the software don't impact the entire system, and you should not assume by default that they might.
Next, even if you're the only tester in your team, it doesn't mean you're the only person who can take on testing tasks. Perhaps you can suggest that someone else picks up the task of checking that bug fix, that you'd like to pair with someone to review the coverage of a test suite and see whether it can be extended to remove a day's manual effort at the end of each sprint, or that the team gets together to think about edge cases before coding the next feature so that more robust testing can be done during development. Similarly, some development tasks might not need to be verified by a dedicated tester; anyone who did not write the code could do it.
In the same vein, wise use of automation might take some of the work off your plate. If there are time-consuming, repetitive testing tasks that are mechanical and boring to do, they're likely to be done badly or not at all. Look for ways to subcontract that work to automation and free a human up to do something they're better suited to. While automation has limits and can't check everything (see Let's just automate the testing), it can be very helpful. You need to understand what automation can and can't do in your project, so you can make informed decisions about which features and paths you don't need to spend much time on. When someone asks you to repeatedly perform specific tests, you might use that as an opportunity to discuss spending some time on improving automation instead.
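As a rough illustration, a repetitive end-of-sprint check can often be captured as a small parameterized automated check. The sketch below assumes a pytest setup; the module, function and expected values are hypothetical stand-ins for whatever mechanical task is eating the time.

```python
# A minimal sketch of automating a repetitive end-of-sprint check with pytest.
# The module (myapp.pricing), function and expected values are hypothetical --
# substitute whatever mechanical check you currently repeat by hand.
import pytest

from myapp.pricing import calculate_discount  # hypothetical function under test


@pytest.mark.parametrize(
    "order_total, loyalty_years, expected_discount",
    [
        (100.00, 0, 0.00),     # new customer, no discount
        (100.00, 2, 5.00),     # small loyalty discount
        (1000.00, 5, 100.00),  # larger order, larger discount
    ],
)
def test_discount_rules(order_total, loyalty_years, expected_discount):
    """Each row replaces one manual check at the end of the sprint."""
    assert calculate_discount(order_total, loyalty_years) == pytest.approx(expected_discount)
```

Each new row costs seconds to add and run, which is exactly the kind of trade that frees human attention for the riskier, more interesting testing.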
Attempting to test "all the stuff", whether every iteration or otherwise, is a fool's errand. Instead, you need to understand the scope of what's changed, analyze the risks created by those changes and then design tests accordingly. Remember that other members of your team can probably assist with some testing tasks and automation used wisely will likely help you to reduce the amount of testing you need to do whenever the software changes.
Developers can't find bugs in their own code
Developers can and do find bugs when coding. They not only find them in the code they are writing now, but also in code they wrote earlier, in their colleagues' code, and in the code of third-party libraries and applications they are using.
Some people think developers can't find bugs because they don't realise that developers are often testing at lower layers than testers are. The developer might test something that is impossible to know about at any higher level; something obvious from white box analysis could be a hidden, random guessing game from the perspective of a black box tester.
Developers - with their intimate knowledge of their own code - have hunches and concerns that can point testers towards rich areas to investigate. This presents an awesome opportunity for testers to broaden their understanding of risk by collaborating with developers.
The developer's detailed knowledge of their code has a downside, though, in that they lack critical distance. Critical distance may be defined as the totality of differences between any two ways of relating to or thinking about the same thing. Software testing in general benefits from critical distance and testers can identify problems that the developer themself was unlikely to spot.
Another way that the developer might have trouble is when a bug arises from a misunderstanding or a mistaken assumption. These represent blind spots that affected the developer's choices while coding and are also likely to affect the decisions they make when crafting a test strategy. There are ways to get around these blind spots, but they are a difficult hurdle, particularly when we don't realise they are there. This is an instance where someone not as closely tied to the implementation choices, e.g. a tester, can often do better with testing.
Testing optimized toward exposing deep, complex problems takes uninterrupted effort and time. The typical time demands on a developer mean they spend it designing a solution, coding that solution, searching for a fix to a problem, or coding that fix. These demands compete with the time available for testing.
Developers find - and fix - bugs all the time during their coding work. They don't typically find all of the bugs in their code and are less likely to identify some kinds of bugs, but the same can be said of testers too. The combination of developer testing where it makes sense and testing performed by expert testers is likely to increase the chances of finding the bugs that matter.
We test to make sure it works
We test to learn and make sure we understand how it works, and for whom; and how it doesn't work, and for whom.
To get to an understanding of what "works" means for a particular piece of software, we start with testing to explore and mentally model it. As we learn more and come to an agreement with stakeholders about what it means for the software to be working, we must still be mindful that any certainty we've gained is limited to the types and variety of testing and experimenting we've performed. We can still be fooled: there may be dependencies on variables we are not even aware of, the software might appear to work for us but not for the customer (and vice versa), or we may have failed to consider particular types of user.
So, when we test the software, we might explore who is likely to be using it, the things they might use it for, and what they would hope to get from it. It's also worth finding out why we built this particular solution, what else we tried, and why the alternatives were rejected. Through collaboration we uncover what "it works" means for different users (and stakeholders) and situations, in relation to the purpose and intent of the product. All of these components of the software's context help us to test whether it gives the required value to the people who matter, and what risks it poses to those people.
With context in mind, we strive to understand what the software does well enough that we can tell the team making it what could be changed to better align with the intent. Our understanding should also enable us to tell the stakeholders enough that they can determine whether what it does is sufficient to be deliverable.
We also need to be interested in whether the product does things that we don't intend it to, or that users expressly do not want, and what the effects of those things might be. In this sense, we also consciously test to find out how it might not work.
No level of testing can provide certainty that the software works, so our understanding of perceived risks and threats to the value of the software drives our prioritization of what to test and to what degree. As we explore, our assessment of risk may well change in light of new information.
Testing is just to make sure the requirements are met
Requirements come in many different shapes and sizes. While checking (explicit) requirements can be important, testing has much more to offer than just demonstrating that the software fulfils such requirements.
Many companies consider written requirements to be their only requirements. This is a very narrow view: the majority of requirements are implicit and fuzzy. Requirements are also fallible because the people who write them are fallible - requirements are often incomplete, ambiguous and internally contradictory.
Testing is a way of discovering unstated requirements, which often only surface through a really thorough evaluation of the software. Testing can (and should) question whether the requirements would actually solve the problem the software is meant to solve, or whether there is a different way to solve it that is more appealing to the client. Testing can tease out ambiguities and clarify uncertainty, as well as discover and expose constraints.
There's also often an assumption that the requirements satisfy the needs of the stakeholders, and that the relevant set of stakeholders has been consulted and had their needs taken into account. Testing should ask whether we've met the requirement that the requirements make sense to the right set of people.
It's important to note that testing cannot ensure that requirements are met. Testing is sympathetically sceptical about the project. It seeks to help the stakeholders to build the best version of the thing they want within their constraints.
Moving away from viewing testing as just a way to verify that the software meets the explicit requirements allows for more space for testing to add value to the project. Testing is a way of evaluating the software and its requirements, as well as identifying additional requirements, discovering ambiguities and exposing constraints.
Why didn't you find those issues before we shipped?
There are many reasons why issues find their way into released software - and we expect this to happen.
We should first note that we have a collective responsibility - from management down - for what we deliver. A healthy organization will be able to use this question as a means to analyze and reflect on their methods, procedures and practices.
We may have missed issues before we shipped because:
- We didn't look, or
- We didn't look in the right places, or
- We looked in the right places but didn't provoke the issue, or
- We looked in the right places and provoked the issue, but didn't realise it, or
- We looked in the right places, provoked the issue and realised it, but didn't understand the impact
There are many possible reasons for the above:
- Our ideas of what to review didn't cover this
- Our risk assessment prioritised other areas
- Our budget didn't permit us to test all of the places we thought of
- Our ability to control the product behaviour is limited
- Our visibility of the product behaviour is limited
- Our access to environments in which to test is limited
And we could explain some of these reasons:
- We didn't understand the domain implications
- We didn't understand something important about how our customers would use the product
- We didn't understand the code to a relevant depth or breadth
- We didn't understand the requirements, implicit or explicit
- We made some assumptions that were invalid
- We didn't spend enough time with enough perspectives to think of this possibility
- We didn't place a high value on checking our work relative to building it
- We didn't place a high value on checking our work relative to shipping it
Mistakes happen because humans are fallible. However, we can learn with the aim of improving when we all share responsibility and avoid the blame game. So a better version of this question might be "Could we examine as a group why these issues were missed before we shipped the product, so that in future we can ship a better quality product to our customers? I would like you to be honest with me, and not hold anything back."
When is the best time to test?
The best time to test is, usually, now. You still need to think about what to test, how to test it, and why it might make sense to test it now.
Consider whether there's a good chance the effort will be worth the expense. Is there enough of something to be tested now? Is the information you get from the testing most useful right now? Is there actually an opportunity to test now?
Deciding what you could or should be testing now depends on many factors, including where the software is in its development lifecycle and how involved you can be as a tester in these various stages. Each phase of developing a product - from ideation to release - has some hypothesis that could benefit from being challenged, though.
Tasks like these could all be productive and cost-effective testing at different times:
- Test a new market for opportunities
- Suggest pros and cons of potential projects
- Build models of potential features
Are observability and monitoring part of testing?
To the extent that observability and monitoring influence testability, they are part of testing.
Observability is an attribute of a system: the ability to make observations of its behaviour and state. Observability is not a kind of testing, but it does affect our ability to test, since we utilize and exploit a system's observability when we test.
Observability can help to answer questions after a product is released and so can serve multiple needs including testing, customer support and business intelligence. Observability exposes how our systems actually work, by making it easier to learn how our users actually use the systems that we built and shipped.
Turning to monitoring, most tooling that you'll integrate for logging, monitoring and telemetry will have a facility for interrogating the data it produces. This can help to answer questions, and so be part of testing, but can also be used productively to explore that data looking for questions to ask. Armed with this new understanding we can enhance our telemetry, craft better dashboards and alerts, tweak the infrastructure, or change the product - this can be considered part of testing too.
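As a rough sketch of what that interrogation might look like, the snippet below reads structured (JSON-lines) events and surfaces the most-used features and unusually slow interactions. The file name, field names and thresholds are assumptions for illustration, not any particular product's schema.

```python
# A minimal sketch of interrogating telemetry data to look for questions to ask.
# Assumes the product emits JSON-lines events with "feature" and "duration_ms"
# fields -- these names, and the events.jsonl file, are hypothetical.
import json
from collections import Counter

feature_hits = Counter()
slow_events = []

with open("events.jsonl") as log:
    for line in log:
        event = json.loads(line)
        feature_hits[event["feature"]] += 1
        if event.get("duration_ms", 0) > 2000:
            slow_events.append(event)

# Which features do users actually visit most? (informs where to focus testing)
for feature, count in feature_hits.most_common(10):
    print(f"{feature}: {count} uses")

# Which interactions were unexpectedly slow? (each one is a question to investigate)
print(f"{len(slow_events)} events took longer than 2 seconds")
```

Each answer a query like this produces is less a verdict than a prompt for the next question to ask of the product, the telemetry, or the dashboards built on top of it.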
When combined with capabilities such as partial deployment and quick rollback, monitoring can provide risk reduction in a post-hoc way instead of as an expensive pre-emptive effort.
A system that has better observability has better testability, and we can use monitoring to suggest usage patterns that could inform future testing and to run experiments we can't replicate in our labs. The fact that monitoring can be used for testing, however, does not make it a part of testing.
Information about how the software is actually used - what parts are most visited, for how long, when, in what environments and by how many users - is invaluable to a development team, and this information (and potentially more) is provided by observability and monitoring. For testers, it can be extremely useful, helping them to design better tests and to build testing environments that better model actual production environments.
The TOAD acronym, introduced by Noah Sussman and built on by Chris McMahon, can be helpful in understanding the tight connections between Testing, Observability And DevOps.
The idea behind TOAD is that there is a common thread between all three concepts and, when you focus on them, you are able to better understand your application. DevOps facilitates the development process from desktop to production. Observability tells us what has happened on the system in great detail. Testing helps us to understand what the system actually does. (Taken from Chris McMahon's blog)
The interplay of testing, observability and DevOps makes sense. Let's use an example:
- We create well-designed automated tests to help us show our application works.
- We add them to a pipeline to detect changes throughout development.
- We observe whether the changes in the application's behaviour that the tests reveal are acceptable or whether they create new problems.
If monitoring is robust and thorough enough, it can reveal information about the system's functionality, reliability and other characteristics that are traditionally assessed during testing. Observability takes us further by helping us understand what is happening to our application(s) under test during our testing. This blurring of the lines is captured well by TOAD.
Observability and monitoring can both provide capabilities that help testing. Expressive logging and tracing capabilities have value for testing and overlap with it. Taking observability seriously (and thinking of it as a requirement of the software) allows you to understand and learn about the software you're testing, at a depth beyond UI "correctness" against modeled expectations. The ability to capture evidence of failure can help us move software engineering and the discipline of testing past shallow, binary "functioning to spec" checking towards continuous learning and exploration.
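As a closing sketch of what "expressive logging" might mean in practice, the code below wraps a risky operation so that a failure leaves behind explorable evidence (inputs, a correlation id, the exception) rather than a bare stack trace. The payment_gateway module and its charge() call are hypothetical placeholders.

```python
# A minimal sketch of expressive logging around a risky operation, so that a
# failure can be investigated rather than merely counted. Uses only the
# standard library; the gateway it calls is a hypothetical placeholder.
import logging
import uuid

from myapp import payment_gateway  # hypothetical module

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(name)s %(levelname)s %(message)s")
logger = logging.getLogger("checkout")


def charge_customer(customer_id: str, amount_cents: int) -> bool:
    correlation_id = uuid.uuid4().hex  # ties this attempt to traces and downstream logs
    logger.info("charge attempt customer=%s amount=%s correlation=%s",
                customer_id, amount_cents, correlation_id)
    try:
        result = payment_gateway.charge(customer_id, amount_cents)  # hypothetical call
        logger.info("charge result=%s correlation=%s", result, correlation_id)
        return result
    except Exception:
        # The captured context turns "it failed" into something a tester or
        # developer can explore: which customer, what amount, which trace.
        logger.exception("charge failed customer=%s amount=%s correlation=%s",
                         customer_id, amount_cents, correlation_id)
        raise
```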