Posterous theme by Cory Watilo

Reference counting and the probe effect

Recently I've had the pleasure of writing a Python extension in C. After putting it to use (it's used on our automated tests with Robot), people reported it consumed all available memory. I had only conducted some small scale testing, so I can't say I was too surprised to hear nightly runs failed spectacularly.

My tests were of the "let's verify this is happening as intended" persuasion. For instance, "let's create this ssh connection and give this couple of commands, then verify the expected results". However, my extension was used in various very different ways, like "let's create these two ssh connections and run this bunch of commands on the first - those that succeed must also be given on the second, the others should be reported. Repeat multiple times in a single test suite".

Unexpected use of your code can lead to the most enlightening revelations about the system you work with; in this case, Python's reference counting. As you may or may not know, Python has two ways of managing memory, a decision which many see as a major design flaw:

  • reference counting - releasing memory when the reference counter reaches zero, ie. no references to an object exist
  • garbage collection - optional, on-by-default, a supplement to take care of cyclic references

In my case, the likely candidates for this ill behaviour were rather few. While many interesting program events can cause stuff to happen internally (function calls, returning from a function, the first execution in a file, raising an exception, deep recursion, etc.), I was given hints: deep recursion was a no-no, and the tests involved many yet short-lived iterations - two candidates of which my test cases had neither.

My extension also has this thing; it uses callbacks to refer to Python functions that are called on (for the moment) two occassions:

  1. A file is opened - its contents are inspected to decide if it's valid input and/or should be logged
  2. The results from certain commands are filtered and/or used by other Python functions

Shortly after giving up trying to make sense of valgrind's output, I thought I'd just take a look into the extension:

  • Putting debug statements in the dealloc functions of Python Type objects. This tells me when are they getting called, if at all. They will be called when a Py_DECREF() call on my Python object decrements its refcount to zero.
  • Next, I had to identify the objects that were suspect to be leaking (i.e. objects of which the reference counts were not reaching zero when I expected them to) and write a simple routine that dumps their refcounts. This routine can be  inside the C module or in Python - depending upon how the code is organized. In C you can find a PyObject's refcount by its member ob_refcnt. In Python you pass the object to sys.getrefcount() function. Calling this refcount dumping routine from various places in my code and monitoring how the reference count of these objects varies can help identify the "hotspots".
  • RTFM. Understanding the meaning of borrowed reference and new reference was critical. Soon, I would find the places where I was supposed to call Py_DECREF(), but didn't..

Turns out the memory leak was the combined effect of two things. The first one has to do with Python's C API, which applies the concept of ownership on a reference. Depending on the API function you call, you may or may not own the reference to the object you get. When you own an object, you must not forget to Py_DECREF() it or give it to something that will steal it. Likewise, when you do not own an object, you must Py_INCREF() it to keep it and possibly to use it during your function. The reference counts on some of my objects were astronomical - this is a strong indicator your error lies in a loop structure. This was rather easy to track and Py_DECREF() objects correctly.

The second one had to do with the way I structured my objects: some Python objects I was tracking (mostly functions) were kept as values in a dictionary, registered as callback handlers. I also registered all such objects upon instantiation in a pure Python class that was to be shared among several processes. This had two side-effects. One, it increases the reference count of the parent object - turns out that this object, immediately after allocation had a high refcount:

>>> object = MyClass()
>>> print(sys.getrefcount(object))
15

This came as a bit of a surprise, as I wasn't expecting an object with no more than ten callbacks to have so many references. The fact of the matter was, that I was not inspecting my objects like I did above. Instead, I was running small tests while my extension was instrumented like I mentioned above; in many cases, the code actually looked more like this:

for index in range(all):
    sys.getrefcount(dict.values()[index]) 

Ding! The probing code creates a list of values, incrementing the reference count of the objects I'm tracking. Additionally, centralized registering of all the objects mysteriously increased the reference count - without bearing in mind the references Python keeps in a frame's .f_locals or .f_globals. I still haven't figured out if sys.getrefcount() keeps a temporary reference and whether it reports it as well. I also tried to print some extra object information in the loop, right before printing the reference count - if your tp_repr function has a bug, it will be reflected on the subsequent call to sys.getrefcount(). A lot of faults being revealed in only three lines, don't you think so too? :-) It might not be necessary to change this behaviour as it was mostly debugging code, but taking those extra references into account helps.

This was my first real world experience with a reference counting memory manager. Everything I'd done before involved either manually managed or garbage collected languages/implementations, and I got a really good deal out of this experience. I suggest everyone to try and write their own Python extensions, however simple, as it can be so very revealing to a lot of interesting concepts.