The three big differences cloud computing brings with it are really just modern twists on old practices.
I don’t understand the hype around “testing cloud-based applications.”
Making the rounds of test-related conferences this past year, I’ve noticed that—because our industry isn’t immune to hype—cloud computing and how to test it have become a talking point. One conference organizer came right out and said they wanted a couple of talks on the subject for their next event. And if it is a sought-after conference topic, then you just know that mainstream testers are going to think they need to pay attention.
But they really don’t need to work themselves into a tizzy over it. Cloud or not, it’s still just an application that needs to be tested. All the same tricks and techniques that you would use if the application resided nice and safe and sound behind your corporate firewall still apply. In fact, after working on an application for 18 months that was entirely in the cloud, I can say that testing a cloud application is almost exactly the same as testing an earthbound application, except that you need to pay extra attention to three things: location, fault tolerance, and elasticity.
Being outside of the firewall is an important difference. Security is the first thing people think of when their application gets deployed into the cloud, and having tested a security product for almost six years I think that’s an appropriate focus. But security is only a small part of the ramifications of being on the other side of the firewall. Here is the beginning of a list of location-driven questions that will need to be answered—and those answers tested:
Can your internal systems reach it? Outgoing holes in the firewall will need to be made.
Can it reach your internal systems? Incoming holes in the firewall will need to be made, too.
What about communication to partner or other third-party systems? Does it need to be in a different form if you allow those communications from the cloud?
Is the machine stripped of unnecessary applications and locked down? You can’t rely on the firewall any longer to catch the easy things. These basics need to be addressed both through the cloud management tooling and on the instance itself in the cloud, not just one or the other.
What about intrusion detection?
What is the patch/update plan?
How do people authenticate onto the machine to do all this maintenance?
What is, and more importantly, isn’t encrypted when entering or leaving the cloud? And there better be a good reason for everything that isn’t.
Where are the backups of logs and databases going? And how are they getting there?
What are the costs for actions and events inside the cloud? Amazon’s AWS cloud, for instance, has different pricing internally and externally. Applications will need to be modified to maximize their use of the lowest-cost transport and storage options.
Can mail from your cloud instance reach its intended destination?
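Several of these location questions—can the cloud instance reach your internal systems, your partners, your mail relay—reduce to plain reachability checks that can be scripted and re-run after every firewall change. Here is a minimal sketch in Python; the hostnames and ports in `CHECKS` are hypothetical placeholders, not anything from the article, and a real suite would check application-level behavior, not just TCP connectivity.

```python
import socket

# Hypothetical endpoints -- substitute your own hosts and ports.
CHECKS = [
    ("internal-api.example.com", 443),      # cloud -> internal systems
    ("partner-gateway.example.com", 8443),  # cloud -> third-party systems
    ("smtp.example.com", 25),               # outbound mail delivery
]

def can_connect(host, port, timeout=5.0):
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failure, refusal, and timeout
        return False

def report(checks=CHECKS):
    """Run every check; return {(host, port): reachable?}."""
    return {(host, port): can_connect(host, port) for host, port in checks}
```

Run from inside the cloud instance, a script like this answers the "can it reach?" questions; run from inside the firewall against the instance, it answers the "can we reach it?" ones.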
But if you have been around classically deployed applications for a while, the odds are these questions are just variations of ones you would have had to ask anyway. So, initially, location is no big deal.
The next point of distinction is fault tolerance and how you react to it. In classic situations you can have all sorts of hardware-level heartbeats and monitors in place for things like I/O rate, bad sectors, and temperature. In the cloud you don’t have any of that. What you get instead is an email: the notice Amazon sends to customers when the physical hardware hosting their instance is having problems. And it is scary if you’re not expecting it. (Especially if the instance in question is your database server, no one in the company has the root password, and you’ve managed to corrupt the sudoers file. Don’t ask...)
It is important to think of your cloud resources as something that can go away at any time. And if my experience at a cloud-based company is any indication, they are indeed likely to go away more often than if you own the actual hardware. Thinking this way leads to three critical questions:
If an instance goes away unexpectedly, how does that affect the overall system?
Given advance warning, is there a plan for replacing an instance?
How much time do you need, at a minimum?
The answers, ideally, should be “negligible,” “yes,” and “about the time it takes to run a script.”
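The third answer is one you can measure rather than guess at. A minimal sketch, assuming you already have some executable that rebuilds an instance from scratch (the path below is a hypothetical placeholder): run it under a timer, periodically, and you have a tested, current answer to "how much time do you need, at a minimum?"

```python
import subprocess
import time

def time_replacement(script):
    """Run a provisioning script and measure how long replacement takes.

    `script` is whatever rebuilds an instance from a known-good state;
    here it is assumed to be any executable that exits 0 on success.
    Returns (succeeded?, elapsed seconds).
    """
    start = time.monotonic()
    result = subprocess.run([script], capture_output=True, text=True)
    elapsed = time.monotonic() - start
    return result.returncode == 0, elapsed
```

Treating the drill itself as a test—asserting both that the script succeeds and that the elapsed time stays inside your recovery budget—keeps the replacement plan from quietly rotting.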
The DevOps community has grown significantly over the last year and has developed techniques to solve these problems. At the time I left, our system could (pretty) gracefully recover from any part of it going down through a combination of round-robining through a cluster of app servers and database hot-backups. The weakest link in the system was the load balancer, but even it could be re-created through a combination of a custom image and cfengine—meaning that it could be rebuilt and running in about five minutes. Regular instance creation and migration became almost routine with the right tooling and processes.
Of course, getting that system in place took a lot of work. And it took even more testing, because it really did need to work. Being able to auto-provision a server is not just a goal for cloud applications, though. The same steps and techniques apply to provisioning a physical piece of hardware, it’s just that in the cloud there’s an additional level of scale.
Scaling, or more correctly elasticity, is the only thing that is really different when testing cloud-based applications. Much of the early marketing for cloud-based computing touted being able to scale up and (more importantly) scale down your computing resources as load demanded, which the cloud enables since it’s all just software. Hardware elasticity is in theory also possible, but the costs in terms of hardware and human capital can be prohibitive.
From an implementation perspective, this is really just the fault-tolerance problem wrapped in a fully automated skin.
Elasticity testing is designed to ensure that those fully automated scripts function correctly: the triggering conditions for the scripts, and the scaling actions themselves, both up and down. Those triggering conditions used to be pretty simple things to monitor, like CPU, memory, and number of web processes. Now they are just as likely to include some aspect of business metrics. Improperly reacting to any of these metrics can cost the company money, either directly (another instance running) or indirectly (a slow site turns away customers), so their collection and interpretation absolutely need to be tested.
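One reason that testing matters is that the decision logic itself is plain code you can exercise exhaustively. Here is a minimal sketch of a triggering function that combines a system metric (CPU) with a business metric (a hypothetical order-queue depth); every threshold, name, and bound below is an illustrative assumption, not anything prescribed by a real autoscaling product.

```python
def scaling_decision(cpu_pct, queue_depth, instances,
                     cpu_high=80.0, cpu_low=20.0, queue_high=1000,
                     min_instances=2, max_instances=10):
    """Return +1 (scale up), -1 (scale down), or 0 (hold steady).

    Both a system metric (CPU) and a business metric (queue depth)
    feed the decision, and both directions are bounded so a runaway
    trigger can't spin up instances -- or tear them down -- forever.
    """
    if (cpu_pct > cpu_high or queue_depth > queue_high) and instances < max_instances:
        return +1
    if cpu_pct < cpu_low and queue_depth < queue_high // 2 and instances > min_instances:
        return -1
    return 0
```

Because each misfire has a direct cost, the interesting test cases live at the boundaries: both metrics just under and just over their thresholds, and the behavior when the instance count is already at its minimum or maximum.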
Unlike location and fault tolerance, though, elasticity testing is unique to the cloud. But even so, its uniqueness comes down to scaling being a software procurement event rather than a hardware one.
Cloud computing is certainly hype-worthy. AWS and similar services have dramatically lowered the cost of creating a start-up and have reshaped how organizations think about their computing resources. When it comes to testing, though, I don’t think the hype it is starting to receive is warranted. The three big differences cloud computing brings with it—location, fault tolerance, and even elasticity—are really just modern twists on old practices: not completely new things we don’t understand, just modifications to how we have always tested.
Adam Goucher has been testing professionally for over 12 years at a range of organizations from national banks to startups. A large part of that time has been spent augmenting his exploratory testing with automation. He is the maintainer of Selenium IDE and consults on automation through his company, Element 34.