isinstance() considered harmful

By Kragen Javier Sitaker. Last substantive update around 2002-03-01, this paragraph added 2016-11-30.

This is mostly in reference to Python programs, and it may be more or less true with reference to programs in other languages.

Python is an object-oriented programming language, which is a buzzword, and therefore can mean a wide variety of different things. What it actually means, in Python's case, is that you can implement your objects any way you please; it's the interface they support that determines whether or not they'll work with existing code --- that is, the methods they implement and the semantics of those methods.

For example, you can write the first version of a data type in pure Python; you can then rewrite it as a Python extension that creates CObjects, and expect that it will work everywhere the Python version did, as long as it's implemented correctly. This is a useful property; it often results in being able to apply old code to completely new problems.

For this promise to hold, the code your object works with must refrain from peeking behind the interface the object supposedly supports. There are probably some times that this is not desirable; transparent persistence and object migration systems can probably work better most of the time by not respecting published interfaces.

Some languages, like Java and C++, offer explicit interface support. Python is not among them. It offers implied interfaces in places where other languages would use explicit interfaces. This has a variety of effects, good and bad.

In Python, what classes your object is derived from is not a part of your object's interface.

Every use of isinstance is a violation of this promise, large or small. Whenever isinstance is used, control flow forks; one type of object goes down one code path, and other types of object go down the other --- even if they implement the same interface!

Bjarne Stroustrup often cited concerns like these when defending C++'s decision not to provide isinstance. (Now, of course, with RTTI in the C++ standard, C++ does provide isinstance.)

Sometimes, of course, violating this promise is worth the payoffs --- isinstance, like goto, is not pure evil. But it is a trap for new programmers. Beware! Don't use isinstance unless you know what you're doing. It can make your code non-extensible and break it in strange ways down the line.

Reasons for using isinstance

Isinstance is used for a variety of reasons:

to determine whether an object supports a particular interface
to keep people from writing possibly incorrect programs, or to indicate possible bugs (type-checking)
to test the implementation of an object to see if certain optimizations can be applied
to write regression tests with knowledge of the implementation
to define things that should really be methods outside of the class they operate on
for no apparent reason whatsoever

To determine whether an object supports an interface

Using isinstance() to determine whether an object supports a particular interface is always a bad idea. You are, in essence, including inheritance from a particular class in the interface; if anyone wants to implement that interface, they must derive their class from the class you specify, which means they'll inherit its bugs, they'll probably have to understand its invariants, and, in Python, they'll have to be careful not to collide with names it defines. (This is especially bad if one of those names is __getattr__ or __setattr__, but that's another problem.)

It's not just overly conservative; it's also overly liberal. Someone can override methods in the interface with broken methods --- in Python, they can even override them with non-callable objects. An object's derivation from a particular class doesn't guarantee that it implements all the protocol that class does. (Still, breaking protocols your base classes implement is almost always a bad idea.)

Type-checking to find bugs

Using isinstance() for type-checking to find bugs is a special case of the above. The bug you might catch is that the object being checked doesn't implement the interface the rest of your code expects.

Here's a perfect example from distutils.cmd.Command.__init__:

        if not isinstance(dist, Distribution):
            raise TypeError, "dist must be a Distribution instance"

In fact, all Command requires of 'dist' is that it have attributes verbose, dry_run, get_command_obj, reinitialize_command, and run_command, with the appropriate semantics. There's no reason that it should have to be derived from Distribution, which is a monster class nearly a thousand lines long.

Determining whether optimizations are applicable

Using isinstance() to determine whether or not a particular abstraction-violating optimization is applicable is a reasonable thing to do, but it is often better to package that optimization into a method and test for its existence:

    try: optmeth = obj.optmeth
    except AttributeError:
         do_it_the_slow_way(obj, stuff)
         return
    optmeth(stuff)

You can also use hasattr() for this test.

This allows your abstraction-violating optimization to work without violating abstraction --- simply by providing a more intimate interface into your objects.

There's an example of this in the standard library in test_coercion.py:

    class CoerceNumber:
        def __init__(self, arg):
            self.arg = arg
        def __coerce__(self, other):
            if isinstance(other, CoerceNumber):
                return self.arg, other.arg
            else:
                return (self.arg, other)

The __coerce__ method would be better written as:

    def __coerce__(self, other):
        num = self.arg
        try: return num, other.arg
        except AttributeError: return num, other

As it's written, if you had instances of two textual copies of the CoerceNumber class, you wouldn't even be able to add them together in Python 1.5, and in Python 2, the interpreter would have to call both of their __coerce__ methods to add them together. (Something like this is actually not uncommon; see the comments below about reload.)

Probably the best way to write this in Python 2.x --- although it's marginally slower --- is as follows:

     def __coerce__(self, other): return self.arg, other

Of course, this probably doesn't matter; it isn't very likely that someone will try to inherit from CoerceNumber or add a CoerceNumber instance to an instance of some other similar type, since it is, after all, only a test case.

(__coerce__ is kind of a tough routine to write, in general, though, and seems like one place where dispatching on the type of other arguments might be the least of all possible evils.)

More instances of this are scattered through the UserDict, UserList, and UserString modules.

Adding methods to someone else's classes

Sometimes one person wants to write a piece of code that accesses some data in someone else's class. So they check to see if the argument they're being applied to is of the correct type, and then proceed to mess with it in ways its public interface doesn't allow.

Generally, this is asking for trouble down the road, but it can be useful for short-term hacks, or for code that is unlikely to change.

Sometimes people even do it to their own classes, and that's just bad code. Here's an example from distutils.command.build_ext:

        # The python library is always needed on Windows.  For MSVC, this
        # is redundant, since the library is mentioned in a pragma in
        # config.h that MSVC groks.  The other Windows compilers all seem
        # to need it mentioned explicitly, though, so that's what we do.
        # Append '_d' to the python import library on debug builds.
        from distutils.msvccompiler import MSVCCompiler
        if sys.platform == "win32" and \
           not isinstance(self.compiler, MSVCCompiler):
            template = "python%d%d"
            if self.debug:
                template = template + '_d'
            pythonlib = (template %
                   (sys.hexversion >> 24, (sys.hexversion >> 16) & 0xff))
            # don't extend ext.libraries, it may be shared with other
            # extensions, it is a reference to the original list
            return ext.libraries + [pythonlib]

This stuff should probably live in a method of the compiler object, so you can say

    return self.compiler.get_libraries(ext.libraries)

but possibly the best solution is to say

    if sys.platform == "win32" and not self.compiler.ismsvc():

or even

    if sys.platform == "win32" and not hasattr(self.compiler, 'msvc'):

Writing regression tests

If you implement an __add__ method that returns a new instance of the same class, and you decide to change it to return an instance of a different class, you should probably modify your regression tests to cover the new class. So it's perfectly reasonable for them to fail in this case. However, they should probably compare __class__ rather than using isinstance().

For no apparent reason

I've seen isinstance used to check whether something was None or a real object, and I've seen it used in places where it apparently had no real effect.

reload

Python's reload() function reloads modules of code into the running interpreter, re-executing their contents. This generally results in functions and classes in those modules being replaced with fresh versions, often identical fresh versions. References to those functions and classes elsewhere are not replaced, however, so old objects retain their old classes.

The result is that the following code prints 0 twice, assuming that everything works perfectly:

    import somemod
    x = somemod.someclass()
    reload(somemod)
    print isinstance(x, somemod.someclass)
    y = somemod.someclass()
    print isinstance(y, x.__class__)

This means that any code whose correctness depends on being able to tell that x is an instance of somemod.someclass will fail in the presence of reloading, and any code whose performance depends on it will perform poorly in the presence of reloading.

Prevalence of isinstance in the standard library

It appears to be recognized that isinstance is usually a bad idea; rather like Euclid's parallel postulate, the proof of this is in the frequency of its use.

In Python 2.1.1, there are 98586 lines in 447 .py files in the standard library; 68 of those lines mention 'isinstance'.

19 of those 68 are in the test suite, which contains 19364 of the total lines; seven of these (10% of the total standard-library use!) are verifying that isinstance() itself works.

24 of these are in User*.py, which uses isinstance to see if it's safe to bypass public interfaces and use other objects' data members directly. They should use try-except or hasattr instead.