Friday, December 15, 2006

Mock testing examples and resources

Mock testing is a very controversial topic in the area of unit testing. Some people swear by it, others swear at it. As always, the truth is somewhere in the middle. But first of all, let's ask Wikipedia about mock objects. Here's what it says:

"Mock objects are simulated objects that mimic the behavior of real objects in controlled ways. A computer programmer typically creates a mock object to test the behavior of some other object, in much the same way that an automobile designer uses a crash test dummy to test the behavior of an automobile during an accident."

This is interesting, because it talks about accidents, which in software development speak would be errors and exceptions. And indeed, I think one of the main uses of mock objects is to simulate errors and exceptions that would otherwise be very hard to reproduce.

Let's get some terminology clarified: when people say they use mock objects in their testing, in most cases they actually mean stubs, not mocks. The difference is expanded upon with his usual brilliance by Martin Fowler in his article "Mocks aren't stubs". I'll let you read that article and draw your own conclusions. Here are some of mine: stubs are used to return canned data to your methods or functions under test, so that you can make some assertions on how your program reacts to that data (here, I use "program" as shorthand for "method or function under test", not for executable or binary.) Mocks, on the other hand, are used to specify certain expectations about how the methods of the mocked object are called by your program: how many times, with how many arguments, etc.

In my experience, stubs are more useful than mocks when it comes to unit testing. You should still use a mock library or framework even when you want to use stubs, because these libraries make it very easy to instantiate and work with stubs -- as we'll see in some of the examples I'll present.
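To make the distinction concrete, here is a minimal hand-rolled sketch (the class and function names are invented for illustration, not taken from any project mentioned below): the stub feeds canned data to the code under test, while the mock records the interaction so the test can assert on it.

# A stub: returns canned data, so the test can assert on how the
# code under test reacts to that data.
class StubHttpClient:
    def get(self, url):
        return "404 Not Found"

# A hand-rolled mock: records calls, so the test can assert on how
# the code under test interacted with its collaborator.
class MockHttpClient:
    def __init__(self):
        self.calls = []
    def get(self, url):
        self.calls.append(url)
        return "200 OK"

def fetch_status(client, url):
    # hypothetical "program" under test
    return client.get(url)

# stub-style assertion: about the program's reaction to the canned data
assert fetch_status(StubHttpClient(), "http://example.com") == "404 Not Found"

# mock-style assertion: about the interaction itself
mock = MockHttpClient()
fetch_status(mock, "http://example.com")
assert mock.calls == ["http://example.com"]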

I said that mock testing is a controversial topic. If you care to follow the exchange of comments I had with Bruce Leggett on this topic, you'll see that his objections to mocking are very valid. His main point is that if you mock an object and the interface or behavior of that object changes, your unit tests which use the mock will pass happily, when in fact your application will fail.

I thought some more about Bruce's objections, and I think I can come up with a better rule of thumb now than I could when I replied to him. Here it is: use mocking at the I/O boundaries of your application and mock the interactions of your application with external resources that are not always under your control.

When I say "I/O boundaries", I mean mostly databases and network resources such as Web servers, XML-RPC servers, etc. The data that these resources produce is consumed by your application, and it often contains some randomness that makes it very hard for your unit tests to assert things about it. In this case, you can use a stub instead of the real external resource and you can return canned data from the stub. This gives you some control over the data that is consumed by your program and allows you to make more meaningful assertions about how your program reacts to that data.

These external resources are also often unreachable due to various error conditions which again are not always under your control, and which are usually hard to reproduce. In this case, you can mock the external resource and simulate any errors or exceptions you want, and see how your program reacts to them in your unit tests. This relates to the "crash test dummy" concept from the Wikipedia article.

In most cases, the external resources that your application needs are accessed via stable 3rd party libraries or APIs whose interfaces change rarely. For example, in Python you can use standard library modules such as urllib or xmlrpclib to interact with Web servers or XML-RPC servers, or 3rd party modules such as cx_Oracle or MySQLdb to interact with various databases. These modules, whether part of the Python stdlib or 3rd party, have well-defined interfaces that rarely if ever change. So you have a fairly high degree of confidence that their behavior won't change out from under you on short notice, and this makes them good candidates for mocking.

I agree with Bruce that you shouldn't go overboard with mocking objects that you create in your own application. There's a good chance the behavior/interface of those objects will change, and you'll have the situation where the unit tests which use mock versions of these objects will pass, when in fact the application as a whole will fail. This is also a good example of why unit tests are not sufficient; you need to exercise your application as a whole via functional/integration/system testing (here's a good concrete example why). In fact, even the most enthusiastic proponents of mock testing do not fail to mention the need for testing at higher levels than unit testing.

Enough theory, let's see some examples. All of them use Dave Kirby's python-mock module. There are many other mock libraries and modules for Python, with the newest addition being Ian Bicking's minimock module, which you should definitely check out if you use doctest in your unit tests.

The first example is courtesy of Michał, who recently added some mock testing to the Cheesecake unit tests. This is how cheesecake_index.py uses urllib.urlretrieve to retrieve a package in order to investigate it:

try:
    downloaded_filename, headers = urlretrieve(self.url, self.sandbox_pkg_file)
except IOError, e:
    self.log.error("Error downloading package %s from URL %s" % (self.package, self.url))
    self.raise_exception(str(e))
if headers.gettype() in ["text/html"]:
    f = open(downloaded_filename)
    if re.search("404 Not Found", "".join(f.readlines())):
        f.close()
        self.raise_exception("Got '404 Not Found' error while trying to download package ... exiting")
    f.close()

To test this functionality, we used to have a unit test that actually grabbed a tar.gz file from a Web server. This was obviously sub-optimal, because it required the Web server to be up and running, and it couldn't reproduce certain errors/exceptions to see if we handle them correctly in our code. Michał wrote a mocked version of urlretrieve:

def mocked_urlretrieve(url, filename):
    if url in VALID_URLS:
        shutil.copy(os.path.join(DATA_PATH, "nose-0.8.3.tar.gz"), filename)
        headers = Mock({'gettype': 'application/x-gzip'})
    elif url == 'connection_refused':
        raise IOError("[Errno socket error] (111, 'Connection refused')")
    else:
        response_content = '''
HTML_INCLUDING_404_NOT_FOUND_ERROR
'''
        dump_str_to_file(response_content, filename)
        headers = Mock({'gettype': 'text/html'})

    return filename, headers
(see the _helper_cheesecake.py module for the exact HTML string returned, since Blogger refuses to include it because of its tags)

The Mock class from python-mock is used here to instantiate and mock the headers object returned by urlretrieve. When you do:
headers = Mock({'gettype': 'text/html'})
you get an object which has all its methods stubbed out and returning None, with the exception of the one method you specified, gettype, which in this case will return the string 'text/html'.

This is the big advantage of using a library such as python-mock: you don't have to manually stub out all the methods of the object you want to mock; instead, you simply instantiate that object via the Mock class and let the library handle everything for you. If you don't specify anything in the Mock constructor, all the methods of the mocked object will return None. In our case, since cheesecake_index.py only calls headers.gettype(), that is the only method we were interested in, so we specified it in the dictionary passed to the Mock constructor, along with its return value.
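In other words (a small sketch, assuming python-mock is importable as mock, as in its distribution; the call-recording side of its API is not shown here):

from mock import Mock  # Dave Kirby's python-mock

headers = Mock({'gettype': 'text/html'})

print headers.gettype()    # 'text/html' -- the canned return value we specified
print headers.getheader()  # None -- any method we did not specify returns None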

The mocked_urlretrieve function inspects its first argument, url, and, based on its value, either copies a tar.gz file into a target location (indicated by filename) for further inspection, or raises an IOError exception, or returns an HTML document with a '404 Not Found' error. This illustrates the usefulness of mocking: it avoids going to an external resource (a Web server in this case) to retrieve a file, and instead it copies it from the file system to another location on the file system; it simulates an exception that would otherwise be hard to reproduce consistently; and it returns an error which also would be hard to reproduce. Now all that remains is to exercise this mocking functionality in some unit tests, and this is exactly what test_index_url_download.py does, by exercising 3 test cases: valid URL, invalid URL (404 error) and unreachable server. Just to exemplify, here's how the "Connection refused" exception is tested:

try:
    self.cheesecake = Cheesecake(url='connection_refused',
                                 sandbox=default_temp_directory, logfile=logfile)
    assert False, "Should throw a CheesecakeError."
except CheesecakeError, e:
    print str(e)
    msg = "Error: [Errno socket error] (111, 'Connection refused')\n"
    msg += "Detailed info available in log file %s" % logfile
    assert str(e) == msg

You might have a question at this point: how did we make our application aware of the mocked version of urlretrieve? In Java, where the mock object techniques originated, this is usually done by what is called "dependency injection". This simply means that the mocked object is passed to the object under test (OUT) either via the OUT's constructor, or via a setter method of the OUT's. In Python, this is absolutely unnecessary, because of one honking great idea called namespaces. Here's how Michał did it:
import cheesecake.cheesecake_index as cheesecake_index
from _helper_cheesecake import mocked_urlretrieve
cheesecake_index.urlretrieve = mocked_urlretrieve
What happens here is that the urlretrieve name used inside the cheesecake_index module is simply reassigned and pointed to the mocked_urlretrieve function. Very simple and elegant. This way, the OUT, in our case the cheesecake_index module, is completely unchanged and blissfully unaware of any mocked version of urlretrieve. It is only in the unit tests that we reassign urlretrieve to its mocked version. Further proof, if you needed one, of Python's vast superiority over Java :-)
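One thing to watch out for with this technique is that the reassignment is global for the duration of the test run, so it's a good idea to restore the original name when the test is done. Here is one possible way to organize that (a sketch only; the actual Cheesecake test module may be structured differently):

import unittest

import cheesecake.cheesecake_index as cheesecake_index
from _helper_cheesecake import mocked_urlretrieve

class UrlretrieveMockingTest(unittest.TestCase):
    def setUp(self):
        # remember the real function, then point the name used inside
        # cheesecake_index at the mocked version for this test
        self.real_urlretrieve = cheesecake_index.urlretrieve
        cheesecake_index.urlretrieve = mocked_urlretrieve

    def tearDown(self):
        # restore the real function so other tests are not affected
        cheesecake_index.urlretrieve = self.real_urlretrieve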

The second example is courtesy of Karen Mishler from ARINC. She used the python-mock module to mock an interaction with an external XML-RPC server that produces avionics data. In this case, the module that gets mocked is xmlrpclib (I changed around some names of servers and methods and I got rid of some information which is not important for this example):

fakeResults = {
    "Request": ('|returncode|0|/returncode|',
                '|machineid|fakeServer:81:4080|/machineid|'),
    "Results": ('|returncode|0|/returncode|',
                '|origin|ABC|/origin|\n|destination|DEF|/destination|\n'),
}

mockServer = Mock(fakeResults)
xmlrpclib = Mock({"Server": mockServer})

(I replaced the XML tag brackets with | because Blogger had issues with the tags....Beta software indeed)

Karen mocked the Server object used by xmlrpclib to return a handle to the XML-RPC server. When the application calls xmlrpclib.Server, it will get back the mockServer object. When the application then calls the Request or Results methods on this object, it will get back the canned data specified in the fakeResults dictionary. This completely avoids the network traffic to and from the real XML-RPC server, and allows the application to consume specific data about which the unit tests can make more meaningful assertions.
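To give an idea of what the code under test might look like (this is my own illustrative sketch, not Karen's actual code; the URL and the way the results are used are made up), something along these lines would receive the canned tuples from fakeResults when run against the mocked xmlrpclib defined above:

def fetch_results(server_url):
    # xmlrpclib here is the Mock instance created above, so Server()
    # hands back mockServer instead of opening a network connection
    server = xmlrpclib.Server(server_url)
    server.Request()          # returns the canned "Request" tuple
    return server.Results()   # returns the canned "Results" tuple

results = fetch_results("http://fakeServer:81")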

The third example doesn't use mocking per se, but instead illustrates a pattern sometimes called "Fake Object"; that is, replacing an object that your application depends on with a more lightweight and faster version to be used during testing. A good example is using an in-memory database instead of a file system-based database. This is usually done to speed up the unit tests and thus have more frequent continuous integration runs.

The MailOnnaStick application that Titus and I presented at our PyCon06 tutorial uses Durus as the back-end for storing mail message indexes. In the normal functionality of the application, we store the data on the file system using the FileStorage functionality in Durus (see the db.py module). However, Durus also provides MemoryStorage, which we decided to use for our unit tests via the mockdb.py module. In this case, mockdb is actually a misnomer, since we're not actually mocking or stubbing out methods of the FileStorage version, but instead we're reimplementing that functionality using the faster MemoryStorage. You can see how we use mockdb in our unit tests by looking at the test_index.py unit test module. Python namespaces come to the rescue again, since we don't have to make index.py, the consumer of the database functionality, aware of any mocking-related changes, except inside the unit test. In the test_index.py unit test, we reassign the index.db name to mockdb:
from mos import index, mockdb
index.db = mockdb
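For a rough idea of what the two flavors look like side by side, here is a sketch based on the Durus FileStorage/MemoryStorage and Connection classes; the real db.py and mockdb.py modules differ in the details:

# db.py (sketch) -- production storage, backed by a file on disk
from durus.connection import Connection
from durus.file_storage import FileStorage

def get_connection(path="mos.durus"):
    return Connection(FileStorage(path))

# mockdb.py (sketch) -- same interface, backed by memory for fast tests
from durus.connection import Connection
from durus.memory_storage import MemoryStorage

def get_connection(path=None):
    # the path is ignored; nothing touches the file system
    return Connection(MemoryStorage())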
Speaking of patterns, I found very thorough explanations of unit testing patterns at the xUnit Patterns Web site. Sometimes the explanations are too thorough, if I may say so -- too much hair splitting going on -- but overall it's a good resource if you're interested in the more subtle nuances of Test Stubs, Test Doubles, Mock Objects, Test Spies, etc.

Mock testing is used pretty heavily in Behavior-Driven Development (BDD), which I keep hearing about lately. I haven't looked too much into BDD so far, but from the little I've read about it, it seems to me that it's "just" syntactic sugar on top of the normal Test-Driven Development process. BDD does emphasize good naming for the unit tests, which, if done to the letter, turns the list of unit tests into a specification for the behavior of the application under test (hence the B in BDD). I think this can be achieved by properly naming your unit tests, without necessarily resorting to tools such as RSpec. But I may be wrong, and maybe BDD is a pretty radical departure from TDD -- I don't know yet. It's worth checking out in any case.

I'll finish by listing some Web sites and articles related to mock testing. Enjoy!

Mind maps and testing

Jonathan Kohl, whose blog posts are always very insightful, writes about using mind maps to visualize software testing mnemonics (FCC CUTS VIDS; each letter represents an area of functionality within a product where testing efforts can be applied.) He finds that a mind map goes beyond the linearity of a list of mnemonics and gives testers a home base from which they can venture out into the product and explore/test new areas. Jonathan's findings match my experiences in using mind maps.

Thursday, December 14, 2006

"The Problem with JUnit" article

Simon Peter Chappell posted a blog entry on "The Problem with JUnit". The title is a bit misleading, since Simon doesn't really have a problem with JUnit per se. His concern is that this tool/framework is so ubiquitous in the Java world that people new to unit testing think that by simply using it they're done: they're "agile", they're practicing TDD.

Simon's point is that JUnit is just a tool, and as such it cannot magically make you write good unit tests. This matches my experience: writing unit tests is hard. It's less important what tool or framework you use; what matters is that you cover as many scenarios as possible in your unit tests. What's more, unit tests are definitely necessary, but also definitely not sufficient for a sound testing strategy. You also need comprehensive automated functional and integration tests, and even (gasp) GUI tests. Just keep in mind Jason Huggins's FDA-approved testing pyramid.

Simon talks about how JUnit beginners are comfortable with "happy path" scenarios, but are often clueless about testing exceptions and other "sad path" conditions. This might partly be due to the different mindset that developers and testers have. When you write tests, you need to put your tester hat on and try breaking your software, as well as making sure it does what it's supposed to do.
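To illustrate with a small, made-up Python example, a "sad path" test is often just a matter of asserting that the right exception is raised (parse_config is a hypothetical function under test):

import unittest

def parse_config(text):
    # hypothetical function under test
    if not text:
        raise ValueError("empty configuration")
    return dict(line.split("=", 1) for line in text.splitlines())

class ParseConfigTest(unittest.TestCase):
    def test_happy_path(self):
        self.assertEqual(parse_config("color=blue"), {"color": "blue"})

    def test_sad_path_empty_input(self):
        # try to break the code, not just confirm that it works
        self.assertRaises(ValueError, parse_config, "")

if __name__ == "__main__":
    unittest.main()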

In the Python testing world, we are fortunate to have a multitude of unit test tools, from the standard library unittest and doctest to tools and frameworks such as py.test, nose, Testoob, testosterone, and many others (see the Unit Testing Tools section of the PTTT for more details). There is no one tool that rules them all, as JUnit does in the Java world, and I think this is a good thing, since it allows people to look at different ways to write their unit tests, each with their own strengths and weaknesses. But tools are not enough, as Simon points out, and what we need are more articles/tutorials/howtos on techniques and strategies for writing good tests, be they unit, functional, etc. I'm personally looking forward to reading Roy Osherove's book "The Art of Unit Testing" when it's ready. You may also be interested in some of my articles on testing and other topics. And the MailOnnaStick tutorial wiki might give you some ideas too.

Switched to Blogger Beta

I apologize if your RSS feed reader is suddenly swamped with posts from my blog. It's hopefully a one-time thing due to my having switched my blog to Blogger Beta.

Wednesday, December 13, 2006

Hungry for cheesecake?

If you are, search for "cheesecake" using Google Code Search. If you do, you'll get a unit test from the Cheesecake project as the very first result. Clearly, Google have their act together! :-)

Tuesday, December 05, 2006

"Scrum and XP From the Trenches" report

This just in via the InfoQ blog: a report (PDF) written by Henrik Kniberg with the intriguing title "Scrum and XP From the Trenches". Haven't read all of it yet, but the quote from the report included at the end of the InfoQ blog post caught my attention:

"I've probably given you the impression that we have testers in all Scrum teams, that we have a huge acceptance test team for each product, that we release after each sprint, etc., etc. Well, we don't. We've sometimes managed to do this stuff, and we've seen that it works when we do. But we are still far from an acceptable quality assurance process, and we still have a lot to learn there."

Testing is hard. But testing can also be fun!

Friday, December 01, 2006

"Performance Testing with JUnitPerf" article

Andrew Glover, who has been publishing a series of articles on code quality on IBM developerWorks, talks about "Performance Testing with JUnitPerf". The idea is to decorate your unit tests with timing constraints, so that they also become performance tests. If you want to do the same in Python, I happen to know about pyUnitPerf, the Python port of JUnitPerf. Here is a blog post/tutorial I wrote a while ago on pyUnitPerf.
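If you just want a feel for the underlying idea without the library, here is a hand-rolled sketch (this is not the actual pyUnitPerf API, just the concept of failing a test that blows its time budget; the code under test is a stand-in):

import time
import unittest

class TimedSortTest(unittest.TestCase):
    max_elapsed = 0.5  # seconds -- the performance budget for this test

    def test_sort_within_budget(self):
        start = time.time()
        sorted(range(100000))  # stand-in for the real code under test
        elapsed = time.time() - start
        self.assert_(elapsed <= self.max_elapsed,
                     "took %.3f seconds, budget was %.3f" % (elapsed, self.max_elapsed))

if __name__ == "__main__":
    unittest.main()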

PyCon news

I was very glad to see that the 3 proposals I submitted to PyCon07 were accepted: a "Testing Tools in Python" tutorial presented jointly with Titus, a "Testing Tools Panel" that I will moderate, and a talk on the Pybots project. The complete list of accepted talks and panels is here.

Here are the brief description and the outline for the Testing Tools tutorial that Titus and I will present. We will actually cover much more than just testing tools -- we'll also talk about test and development techniques and strategies. It should be as good as or better than the one we gave last year, which attracted a lot of people.

The Testing Tools Panel has a Wiki page. If you're interested in attending, please consider adding questions or topics of interest to you. If there is enough interest, I'm thinking about also organizing a BoF session on Testing Tools and Techniques, since the panel's duration will be only 45 minutes.

Finally, my Pybots talk will consist of an overview of the Pybots project: I will talk about the setup of the Pybots buildbot farm, about the issues that the Pybots farm has helped uncover, and also about lessons learned in building, sustaining and growing an open-source community project.

The program for PyCon07 looks very solid, with a lot of interesting talks and tutorials. I'm very much looking forward to the 4 days I'll spend in beautiful Addison, TX :-)
