Tripwire Automation

It’s possible to write too much test automation.

I get that it’s cool and fun to come in and slam out code that does work for you so you don’t have to. Getting to learn new tech to get the job done is a great feeling.

But, as with any code, each line that you write has to be maintained.

That’s a problem if you’re writing a ton of automation. If you spend all your time writing new automation, nobody maintains what already exists, and it starts to suffer from bit-rot.

Another problem comes up when multiple tests fail and it turns out they all failed for the same reason. You’ll have to spend time and money investigating what happened, which takes away from other fun stuff.

If writing lots of automation is the problem, then one solution is to write less of it. The question is: what automation do you write?

This leads me to a concept called Tripwire Testing.

I’m sure you know what a tripwire is. You see them in movies. The really cool ones are made out of infrared lasers and you need special glasses and canned air to see them. All they do is let you know when someone or something has crossed a threshold.

When the tripwire is tripped, then you go investigate. It could be a bad guy, or it could be a cat. Either way, you can respond appropriately once you know what caused it.

For software, that means that instead of having automation check everywhere in sequence (even if that’s really quick), you, the human, inspect the area whenever a qualifying event happens, such as a particular test failing.

Like a tripwire going off.

What Does This Look Like?

First, recall that the purpose of automation is to remove repetitive work. In this case, that means having to run tests manually.

Second, pulling this kind of automation off requires that you have a firm knowledge of your code base. Automation’s job is not to tell you things about the system that you don’t know how to find out yourself.

Tripwire automation has the following characteristics:

  • a certain type of failure should only show up once per group of tests run
  • every test written should give you unique info about the code under test
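One rough way to check the first characteristic is to group a run’s failures by cause and see whether any cause shows up more than once. The sketch below is only an illustration: `repeated_causes`, the test names, and the cause labels are all hypothetical, and it assumes you already have some way to label why each test failed.

```python
from collections import Counter


def repeated_causes(failures):
    """Return each failure cause that shows up more than once in a run.

    `failures` is a list of (test_name, cause) pairs, where `cause` is
    whatever label you use for the reason a test failed (an assumption
    of this sketch, not something a test runner gives you for free).
    """
    counts = Counter(cause for _test_name, cause in failures)
    return {cause: count for cause, count in counts.items() if count > 1}


# Hypothetical results from one test run.
failures = [
    ("test_create_order", "upstream 503"),
    ("test_cancel_order", "upstream 503"),
    ("test_refund_order", "upstream 503"),
    ("test_rounding", "assertion: 10.01 != 10.00"),
]

print(repeated_causes(failures))
```

Any cause this reports is a candidate for a single tripwire of its own, since several tests are telling you the same thing.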

Checking for Certain Failure Types

If you had 50 tripwires across a door, and someone walked through it and tripped them all, what would happen? What would happen if something smaller walked through and tripped only 5?

Wouldn’t it mean the same as if one tripwire was strategically placed so that anytime anything walked through the door, it would go off? You’d still have to check.

In the same way, if you have 50 tests that all fail for the same reason, you’d have to go check. Unlike a physical tripwire though, you’d have to investigate all the failures. What a colossal waste of time!

That says there’s a different problem. Maybe you have some bad data in a database, or an upstream webservice that’s misbehaving.

A second tripwire is needed in this case. Create a test that checks for that specific problem, then make the first set of tests dependent on this new test passing.

Cleaning Up

As part of adopting this style of automation, it’s important to take stock of what each test, and each suite of tests, is doing.

If you have a suite of tests, what is the whole thing trying to test? Does it look like there’s a test for every possible combination of parameters?

Why? Are that many tests required, or did the tests get written with thoroughness and due diligence in mind?

Whoever wrote them probably meant well, but again, there’s no sense having multiple tests that look for the same kind of bug.

What kind of bug is being looked for? Is it possible to pare down the tests to something more strategic?

Example

Here’s an example from a previous client. The developers on the team had a simple set of tests for the team’s webservices. All those tests did was make sure the services were up and that the ones that needed to talk to each other could.

We ran those tests first, and they took maybe a minute. If any of them failed, none of the hundreds of subsequent tests ran.

Now that’s what I call failing fast. Can you imagine how much time and frustration it would cost to run those hundreds of tests without knowing whether all the services were up, and then watch whole groups of them fail? It’d be very expensive to troubleshoot.
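A gate like that can be sketched in a few lines of Python. This is a minimal illustration and not the client’s actual setup: `run_stage`, `run_pipeline`, and the check names are hypothetical, and each check is assumed to be a function that returns `True` when it passes.

```python
def run_stage(checks):
    """Run (name, check) pairs; return the names of the checks that failed."""
    failed = []
    for name, check in checks:
        try:
            passed = bool(check())
        except Exception:
            # A check that raises counts as a failure, not a crash.
            passed = False
        if not passed:
            failed.append(name)
    return failed


def run_pipeline(smoke_checks, full_checks):
    """Run the quick smoke checks first; skip the full suite if any fail."""
    smoke_failures = run_stage(smoke_checks)
    if smoke_failures:
        return {"stage": "smoke", "failed": smoke_failures, "ran_full": False}
    return {"stage": "full", "failed": run_stage(full_checks), "ran_full": True}


# Hypothetical checks: one service is down, so the full suite never starts.
smoke = [("orders service is up", lambda: True),
         ("billing service is up", lambda: False)]
full = [("create order", lambda: True),
        ("refund order", lambda: True)]

report = run_pipeline(smoke, full)
print(report)
```

In this run the report stops at the smoke stage and names the one service that is down, which is the only thing you need to go look at.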
