Case Study: Insurance

Real-world examples of automancery at companies like yours

Fun Fact: Did you know that you can insure insurance?

Well, we didn’t. Not until we took on a project at Reinsurance Group of America, or RGA.

Life insurance can be expensive. Individual brokers often don’t have deep enough pockets to offer a high enough payout to their clients.

But RGA has deep pockets. Really deep ones. And for a monthly premium, they’ll offload some of that risk from those brokers so the brokers can offer higher quality packages to their clients. It’s a win-win.

However: that’s just the tip of the iceberg. All the regulation, screening, approvals, actuarial work, payments, and data sanitation are lurking under the surface, complex and unwieldy. And the poor testing team has to grind through manual tests, maintain constantly breaking automated scripts, or wait a while for test runs to complete. For them? It’s a lose-lose.

Until we got there.

Team #1: Refactoring and Cutting Automation Run Time in Half

On the Claims team, quite a bit of the testing was already automated. Around 200 tests ran every day, and they took about 90 minutes to complete. Not ideal, but the team lived with it.

The first challenge here was helping clean up the code. They made heavy use of Cucumber (which is an awesome tool that lets you write your tests in plain English). But inspecting the code at deeper levels revealed lots of code like this:

Given('I navigate to the Add Claims page') do
  page = AddClaims.new
  page.navigate_to
end

Given('I navigate to the Create User page') do
  page = CreateUser.new
  page.navigate_to
end

Given('I navigate to the Delete Record page') do
  page = DeleteRecord.new
  page.navigate_to
end

There’s a pattern here. A ton of code was duplicated just for the sake of getting to a particular page. That was one example of many.

Instead, we did something like this:

Given(/^I navigate to the (.*) page$/) do |page_name|
  # Treat the string coming in from page_name as a class name,
  # and call the navigate_to method built into that class.
  # End result: you go to that page. Neat, huh?

  page_name_no_spaces = page_name.gsub(' ', '')
  page = Object.const_get(page_name_no_spaces).new
  page.navigate_to
end

Going through and finding ways to parameterize the code was really helpful, because:

  • It made it much quicker to debug faulty tests
  • It was a whole lot less to remember
  • Less code needed to be written: if a new page was created, there was no need to write a new Cucumber step for it
  • The tests could stay in their current format and keep running just like before. They’d just run better.

But we didn’t stop there.

The team ran all their tests on a server called Jenkins. That’s not the name they gave it, that’s the name of the tool. Jenkins. It really is like a butler, doing whatever you ask it to so that you don’t have to.

The challenge with Jenkins was that the tests took 90 minutes to run. It wasn’t ideal. But it was fixable.

The Jenkins server had been set up to divide up those 200+ tests into smaller groups of around 20-30 tests each. Each group would get aimed at a dedicated computer, and then the tests in those groups would get run one after another, until everything was done.

That’s a problem, because if Group #3 finished its tests first, the computer it ran on would just sit there like a freeloading teenager, not contributing anything. So the whole suite took as long as the longest-running group.

Instead, it was rearchitected like this:

  • Get the entire list of tests
  • Get the entire list of computers that can run a test
  • Go into a loop and look for an available computer (or wait for one to become available), then take the topmost test off the stack to run on that computer.
  • Profit.
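The steps above can be sketched with a single shared queue and one worker thread per machine. This is a minimal sketch, not the actual Jenkins setup: the run_on helper and the machine names are hypothetical stand-ins for whatever dispatches a test to a real agent.

```ruby
require 'thread'

# Hypothetical stand-in for dispatching one test to one machine.
def run_on(machine, test)
  "#{test} ran on #{machine}"
end

def run_suite(tests, machines)
  # Load every test into one shared, thread-safe queue.
  queue = Queue.new
  tests.each { |t| queue << t }

  results = Queue.new

  # One worker per machine: each keeps pulling the next test off the
  # queue until the queue is empty, so no machine ever sits idle.
  workers = machines.map do |machine|
    Thread.new do
      loop do
        test = queue.pop(true) rescue break  # non-blocking pop; stop when empty
        results << run_on(machine, test)
      end
    end
  end

  workers.each(&:join)
  Array.new(results.size) { results.pop }
end
```

Because the machines pull work instead of being handed a fixed group, adding one more computer to the pool speeds things up with no rebalancing needed.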

The result: every computer ran whichever test it was given, then the next, and the next, until all the tests were done. And the entire suite would finish in half the time.

Now, this was before the team adopted any cloud architecture; these days they use AWS to spin up a server for every test. But back then, they could simply add another computer (even a company laptop) to the mix and get the tests to run even faster.

Very nice.

But we weren’t done there.

Team #2: Cleaning up 1000s of Lines of Code

Fun fact #2: sometimes RGA’s pockets aren’t deep enough, so they have a pool of “mini-RGAs” that they distribute risk to. This is called “retrocession,” and yes, it’s as complex as it sounds. The testing and test automation are equally complex.

On the International Retrocession Administration team, 800 to 1,000 individual tests were running all the time. Maintaining them was really tedious, since a lot of repeated code was spread throughout the tests. The code base was about 10x bigger than the Claims team’s.

This called for some automation to help out with the automation.

Over the span of a week, a tool was built that analyzed the code base and flagged places where the code looked very similar. Some manual inspection was still needed, but the result was a small database of likely duplicated code.
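The core of that duplicate finder can be sketched like this, assuming the tests live in a hash of file name to source text. The normalization (stripping whitespace, collapsing string literals) and the window size are illustrative; the real tool's similarity check was fuzzier, but the idea is the same.

```ruby
# Find snippets of `window` consecutive lines that appear, after light
# normalization, in more than one place across the code base.
def find_duplicates(sources, window = 3)
  seen = Hash.new { |h, k| h[k] = [] }
  sources.each do |file, code|
    # Normalize: strip indentation and collapse string literals, so
    # cosmetically different copies of the same code still match.
    lines = code.lines.map { |l| l.strip.gsub(/".*?"/, '"..."') }
    lines.each_cons(window).with_index do |snippet, i|
      seen[snippet.join("\n")] << [file, i + 1]
    end
  end
  # Keep only snippets that occur in more than one location.
  seen.select { |_, locations| locations.size > 1 }
end
```

Each surviving entry maps a snippet to the list of files and line numbers where it appears, which is exactly the “small database of likely duplicates” a human can then review.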

For example: there might be three different places where a certain piece of data was sent to the system, when only one needed to exist. That one was designated the “gold standard” for that operation.

Next, the automated tool went through all the tests and made a single change: calling that gold standard code instead. It’s a cosmetic change on the surface, but it routes the test through a totally different piece of code.

Once that was done, the changed test was run automatically. If it passed, that version was saved for future use and the redundant code it replaced was deleted. If not, the change was reverted and the next candidate was tried.
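The keep-or-revert loop itself is simple. Here is a minimal sketch: the candidate list, the string substitution, and the test runner passed in as a block are all hypothetical stand-ins for the real tool's machinery.

```ruby
# For each candidate (file, redundant call), swap in the gold standard
# call, re-run that file's test, and keep the change only if it passes.
def deduplicate(test_files, gold_standard, candidates, runner)
  kept = []
  candidates.each do |file, old_call|
    original = test_files[file]
    # Cosmetic change on the surface: redirect the redundant call
    # to the gold standard code.
    test_files[file] = original.gsub(old_call, gold_standard)

    if runner.call(test_files[file])
      kept << file                 # the change survived its own test
    else
      test_files[file] = original  # revert and try the next candidate
    end
  end
  kept
end
```

Because every change is verified by running the affected test before anything is deleted, the suite never silently loses coverage during the cleanup.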

In all, thousands of lines of code were cleared out, which is great for maintenance and a lot easier to keep track of in your head.

Bonus: We built a railgun

As a final small project while the contract wrapped up, the retrocession team needed a quick way to get data loaded into the system.

Usually, they would restore the entire database to a known state before running a test. That operation took around 5 minutes per test, and it’s why tests often still weren’t done by the time people came into the office the next day.

Instead, a Cucumber-based data engine was created that let people clearly see what data was going into the database prior to running a test.

Rather than taking 5 minutes to put unknown data into an agreed-upon “ok” state, we shot exactly the data a test needed straight into the database in 5 seconds.
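The core idea of that data engine can be sketched like this: a plain-text table (the same pipe-delimited format Cucumber uses in feature files) is parsed into rows and inserted directly, instead of restoring a whole database snapshot. The column names are illustrative, not RGA's actual schema, and the in-memory array stands in for the real database insert.

```ruby
# A Cucumber-style data table: anyone reading the test can see
# exactly what data will exist before it runs.
TABLE = <<~GHERKIN
  | policy_id | cedent     | amount |
  | P-1001    | Broker One | 500000 |
  | P-1002    | Broker Two | 750000 |
GHERKIN

# Parse the pipe-delimited table into one hash per row,
# keyed by the header row's column names.
def parse_rows(table)
  rows = table.lines.map { |l| l.split('|').map(&:strip).reject(&:empty?) }
  header = rows.shift
  rows.map { |cells| header.zip(cells).to_h }
end

# In the real engine each row would become an INSERT statement;
# here the rows just land in an in-memory "database".
def load_data(db, table)
  parse_rows(table).each { |row| db << row }
  db
end
```

The payoff is visibility as much as speed: the test data sits right next to the test, in a table a human can read, instead of hiding inside a database snapshot.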

That project was affectionately called “The Railgun.” Because sometimes, data has to be there, right on time. Pew pew.

Anything can be automated. Need some help? Let’s get some time to chat.