Off the beaten track

You learn a lot more when things don’t quite go to plan than when everything goes smoothly. So here’s a story I think we can all learn from.

James and I wanted to do a kind of performance test that involved creating a large number of elements in our system. So we decided that a quick way to do it would be to leverage the common methods in our existing GUI automation suite to create a heap of accounts through the interface. We drew up a plan and wrote the test script.

Why did we choose to use GUI automation? First, as mentioned, we already had existing code we could reuse, so we thought script creation would be reasonably fast. Also, James hadn’t used our test suite much before, so it was a good opportunity for him to become more familiar with it. In addition, we figured it might reveal some interesting GUI bugs while it ran, as a kind of by-product.

This test script was massive. Luckily, most of the creation methods were already written, so we just had to get the test to create what we wanted. I knew that GUI automation could be a bit fragile, so we got the script to track its progress in a text file; whenever it got interrupted, we could restart it and it would pick up from where it left off.
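
The script itself isn’t worth reproducing here, but the resume mechanism boiled down to something like the C# sketch below. The class and method names are illustrative only, not our actual framework code.

```csharp
// Minimal sketch of the "progress file" resume idea (illustrative names,
// not our actual framework code).
using System.Collections.Generic;
using System.IO;

public class ProgressTracker
{
    private readonly string _progressFile;
    private readonly HashSet<string> _completed;

    public ProgressTracker(string progressFile)
    {
        _progressFile = progressFile;
        // Each line in the file is the id of an account a previous run already created.
        _completed = File.Exists(progressFile)
            ? new HashSet<string>(File.ReadAllLines(progressFile))
            : new HashSet<string>();
    }

    public bool AlreadyDone(string accountId)
    {
        return _completed.Contains(accountId);
    }

    public void MarkDone(string accountId)
    {
        _completed.Add(accountId);
        // Append immediately so a crash loses at most the item in flight.
        File.AppendAllLines(_progressFile, new[] { accountId });
    }
}

// Usage: skip anything a previous run already finished.
// var tracker = new ProgressTracker("progress.txt");
// foreach (var id in accountsToCreate)
// {
//     if (tracker.AlreadyDone(id)) continue;
//     CreateAccountThroughGui(id);   // hypothetical call into the GUI suite
//     tracker.MarkDone(id);
// }
```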

We estimated it would take a couple of days to create the script, and about 50 hours to run it. How wrong we were.

The first day started out slow, as James battled with installing Visual Studio, WebAii and Gallio on his machine, which progressed as an epic series of fails until finally he was able to START writing code at the end of the day. Meanwhile, I managed to make a decent start on my part of the script. After a while we realised we would have to write some of the data directly to the database, so we asked a friendly developer for some help. Writing data this way would prove to be a bit of a time saver later.

Four days later, we were finally able to start running this script. It tripped over a few times, but it could run uninterrupted for an hour or two. Some observations were made:

TK: a little null reference exception never hurt anyone. off it goes again.

JM: As if nothing ever happened

TK: happy, carefree test script

TK: gamboling through the wild plains of GUI automation

TK: though null reference exceptions may block its path, does it falter? nay it continues, unhindered and free

a majestic beast

It was actually running quite smoothly on my machine, only occasionally failing to find page elements for some unknown reason. So we ran it on my machine for about half a day.

JM: Sweeping majestically across the plains and all that…

TK: like a gazelle, leaping through the DOM…

JM: Yeah!

JM: Definitely evoking gazelle-like feelings

TK: let no id=basicTab stand in its way

TK: it’s much nicer to think of it as a gazelle

JM: Than?

TK: rather than how i usually think of it – like a small child picking up the browser and punching it into submission

After a few hours we realised that our original estimates were way off. At this rate it was going to take about 250 hours to run. We didn’t have that kind of time, so we decided to split up the creation workload and run it in parallel across five different machines. Then the real fun began. While the script ran relatively smoothly on my machine, it didn’t go so well on the others, because my machine was much faster than the ones we had acquired. Pages took longer to load on the slower machines, and the automation kept falling over because it couldn’t find page elements in time. Our majestic script was not looking so gazelle-like anymore.

TK: off it stumbles, like a fumbling beast, tripping over its own legs and the occasional blade of grass

TK: running like an oafish monstrosity, crashing into trees and occasionally falling into holes that aren’t there

TK: oh see how on broken wings it flies, soaring erratically through slight zephyrs, smacking into small buildings

It would only run a few hours at a time if we were lucky, and it had to be manually restarted. After a few days, its progress was dismally short of its target.
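
Looking back, most of those failures were of the “couldn’t find the element in time” variety, so a configurable polling wait along the lines of the sketch below would at least have let the slower machines wait longer before giving up. This is illustrative only, not our actual WebAii code, and the config setting and page helper in the usage comment are hypothetical.

```csharp
// Sketch of a configurable polling wait (illustrative only, not our WebAii code).
using System;
using System.Threading;

public static class Wait
{
    // Polls 'condition' until it returns true or 'timeout' elapses.
    public static bool Until(Func<bool> condition, TimeSpan timeout, TimeSpan? pollInterval = null)
    {
        var interval = pollInterval ?? TimeSpan.FromMilliseconds(500);
        var deadline = DateTime.UtcNow + timeout;
        while (DateTime.UtcNow < deadline)
        {
            if (condition())
                return true;
            Thread.Sleep(interval);
        }
        return false;
    }
}

// Usage: read the timeout from per-machine configuration instead of hard-coding it,
// so a slow machine simply waits longer rather than failing with "element not found".
// var timeout = TimeSpan.FromSeconds(Settings.ElementTimeoutSeconds); // hypothetical setting
// if (!Wait.Until(() => PageContains("basicTab"), timeout))           // hypothetical helper
//     throw new TimeoutException("basicTab did not appear in time");
```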

We decided that we couldn’t spend much more time on this, so if it wasn’t done by the end of the day, we would just use what we had. We still wanted to get some nice peaks in the data, so we modified the progress files to focus only on the peak areas, divided it up among the working machines and set it on its way again.

With most of the test data in place, we made plans to start our performance test. The final piece of the puzzle was simulating user activity in order to trigger the functionality that would (we hoped) stress the system in interesting ways. We took the path of least resistance here, using a Ruby library that had been written as an experiment during a previous test. With a few small modifications, the Ruby script consumed the output from our main setup automation and simulated the actions we required. Again, we divided the work and ran several scripts in parallel to speed things up.

After running the test for a few days, we didn’t find any showstopper bugs, but we did find a few smaller ones. So was it worth doing? We didn’t necessarily get what we set out to achieve, but it produced many valuable outcomes:

  • It highlighted a few things about our test framework that could be improved upon, and motivated us to make those improvements.
  • It demonstrated a few shortcomings of GUI-level automation. Trying to get GUI-level automation to do anything for a long amount of time is prone to error from multiple sources: The test environment, the browser, the speed of the machine it’s running on, the sun, the moon, the way MbUnit happens to be feeling at the time, etc. For generating data, we would have been better off just directly injecting the data into the database. (A rough sketch of that approach follows this list.)
  • Just telling the development team that we were going to do this test, and exactly what we were planning to do, made them think about the issues it could run into, and they started working on improvements even before the test had been run. One developer who was assigned to look into a specific bottleneck ended up using elements of our data setup script to create his own targeted test for his changes.
  • The Ruby user action simulator ended up being a really useful general-purpose tool.
  • We discovered a heap of GUI-related bugs as a result of checking the progress of the GUI automation.
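
On the point above about injecting data directly: for the bulk of the data we went through the GUI, but a direct insert would have amounted to something like the sketch below. The table and column names are invented for illustration; they are not our actual schema.

```csharp
// Sketch of seeding accounts by inserting directly into the database
// (table and column names are invented, not our actual schema).
using System;
using System.Data.SqlClient;

public static class AccountSeeder
{
    public static void Seed(string connectionString, int count)
    {
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();
            for (var i = 0; i < count; i++)
            {
                using (var command = new SqlCommand(
                    "INSERT INTO Accounts (Name, CreatedOn) VALUES (@name, @createdOn)",
                    connection))
                {
                    command.Parameters.AddWithValue("@name", "perf-account-" + i);
                    command.Parameters.AddWithValue("@createdOn", DateTime.UtcNow);
                    command.ExecuteNonQuery();
                }
            }
        }
    }
}
```

The obvious caveat is that rows created this way only match GUI-created ones if the application doesn’t add defaulting or derived values on top of the raw input.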

What could we have done better?

  • When we started out, we only had a vague idea of how long it would take and how long we could afford to spend on the test (we needed to ship the software at some point too). In hindsight, perhaps we should have set a clear upper limit on the time we would spend battling the problems before admitting defeat and tackling them another way.
  • Our objectives and expectations evolved as we learnt about the system, the test framework and hundreds of other variables we hadn’t considered. This is a common problem in software projects, and perhaps if we had set clearer goals at the start we wouldn’t have drifted at times.
  • We could have involved other team members more by keeping them updated on the specifics of what we were actually implementing, giving them a chance to offer their feedback and opinions.

And hey, in the end the feature was deployed successfully and within a reasonable timeframe. So ultimately we achieved our mission, and learned some valuable things in the process. Success!

Written in collaboration with James Martin.

2 thoughts on “Off the beaten track”

  1. What a great retrospective. You guys really nailed the value of < 100% success in a couple of ways:

    - Even though your high level goal did not turn out completely as expected, you had smaller successes and got your testing done, in the end.
    - You've taken the time to evaluate your experience and use what you've learned to make more improvements.

    I used to have a hairdresser who was very fond of telling me, "take the best, leave the rest and get on with your life." Of course, she was talking about love and not software, but the same rule applies.

    Perhaps I'll take a lesson from you two and do a testing retrospective as well :)

  2. Hi Trish,

    A few comments based on my experience creating automation frameworks capable of running 1000s of tests.

    It demonstrated a few shortcomings of GUI-level automation.
    – Better to say, it demonstrated some textbook examples of scalability and robustness issues in the framework. Seems like mbUnit tests in a programmer’s way.

    Trying to get GUI-level automation to do anything for a long amount of time is prone to error
    – On the other hand, if your test suite can do that, then you are rewarded with a test harness helping to discover a vast number of issues (all of high severity – race condition bugs, endurance bugs) that can hardly be found by testing other ways.

    Trying to get GUI-level automation to do anything for a long amount of time is prone to error from multiple sources: The test environment, the browser, the speed of the machine it’s running on, the sun, the moon, the way MbUnit happens to be feeling at the time, etc.
    * Test Volume. Support run-time plug-in / plug-out; absolutely NO hard-coding of script names, data files, etc.
    * Test Environment. Use independent data model supporting various external data sources.
    * Browser. Do not mix up browser compatibility testing scripts with functional testing scripts.
    * Hardware/Software Performance. Use configurable time-out based synchronization.
    * The Sun / the Moon. Run the same comprehensive Automated Test Plan during the day and overnight to expose time-dependent issues. Some of them depend on configuration settings; others may indicate actual bugs.
    * The way the testing tool feels. That’s a hard one. If you have to massage the tool itself, it deserves to be thrown away. Otherwise you’ll spend too much time contributing to development of the tool instead of testing the product.

    For generating data, we would have been better off just directly injecting the data into the database.
    Yes and no :)
    If your application under test stores EXACTLY the same values you input through the GUI and nothing more, then it’s “Yes”.
    If the application creates records based on GUI input PLUS defaulting and configuration parameters (especially, if some of those parameters change right at run-time), then it’s “No” – because data injected directly won’t match those created through the GUI.

    Thank you,
    Albert Gareev
    automation-beyond dot com

    PS. I wanted to supply a few links expanding on the ideas in my replies but haven’t found a way to do that without waking up your watchdog :)
