Python and Threads vs Microthreads(Tasklets) vs Greenlets
Lately I’ve been noticing a lot of questions on forums asking what exactly are the differences between the Threads, Microthread and Greenthread concurrency models. Questions like
- How do they differ in implementation?
- Do the number microthreads/greenlets have an upper limit as regular threads normally do?
- What the the benefits/disadvantages of one over the other? Which one should I use?
Out of this group of concurrency models, let’s begin with the most commonly known: threads. Threads in Python are implemented using regular Posix threads. That is, each thread in Python maps to a system-level thread, and the system kernel is aware of and responsible for maintaining these threads. This includes pre-empting the thread when it is running, scheduling the thread’s next time-slot, as well as handling context-switches (swapping the thread’s state from the CPU registers with another one etc.)
A characteristic of threads, is that the more you have running, the more tasks the kernel’s scheduler must juggle at the same time. When you have too many threads, performance will take a hit because the slice of execution time each thread receives becomes comparable to the amount of time it takes to actually switch between threads - the switching overhead becomes a major bottleneck. Using threads, you want to keep the number running to a reasonable amount of 100 or less (this is one of the use cases of running a ThreadPool).
Because each Python thread is mapped to and managed by the kernel, there is the added overhead of switching to from userspace (where most user programs spend their time in) to kernel space and back whenever a context switch occurs. Again this is a relatively expensive operation, and one that contributes to the problem of running too many threads at once.
Even though threads are considered light weight, there are much lighter options as we will see.
The Stackless Python (a compatible fork of Python with heavy modifications to the core of the Python interpreter) project introduced Microthreads (amongst other new features) under the name Tasklets.
The project’s approach to threading is to hide threads from the kernel, and handle all scheduling and context-switching within the Python interpreter itself. Historically, this has been useful for VM’s or interpreters adding support to an OS that did not natively support threads. (This approach was used by Java 1.1 for the Solaris OS which had this limitation). Even on OS’es that do support threads natively, this approach has a few advantages. Namely, the cost of overhead from switching between threads is drastically reduced - no longer does the execution need to switch from userspace to the kernel and back. Stackless handles both scheduling and the context-switching of its microthreads in the interpreter.
Despite some performance benefits, Stackless has remained a separate project from the mainline code. There are several reasons for this (which you can read about here), one of them being that the changes made were non-trivial and broke several Python extensions. Unless your code will run on machines where you have control of the interpreter being used, you will probably want to stay with the reference implementation of Python and avoid microthreads. However, there is a way to get some of the benefits of Stackless Python with the regular Python interpreter - keep reading…
Whereas Microthreads required major changes to the Python interpreter, Greenlets are a subset of Microthreads, and can be installed via a Python extension (Stackless is too complex to become an extension). The concept of greenlets are actually extracted from the Stackless project, and bring along similar advantages such as managing threads in the interpreter as opposed to the kernel. There is one major difference however - a greenlet does not have implicit scheduling.
The lack of a scheduler means you have complete control of when one greenlet switches to another. This is known as a cooperative concurrency model since each greenlet must voluntarily give up its execution in order to continuously switch amongst each other . As with microthreads, there is very low overhead when switching (as well as creating) and because of this, you can have a very high number of greenlets relative to threads.
A benefit of the cooperative threading model is its deterministic nature. You know exactly where one greenlet will exit and another will begin. This allows you to avoid the use of locks on shared data structures to handle race conditions.
If you think about it, you may notice that greenlets are pseudo concurrent (even in an interpreter without a GIL) - this means there can never be more than one greenlet running and it is simply a way of expressing flow in a program. Greenlet behaviour could be emulated using a huge if/else block structure and some loops, but of course this approach would not be very elegant.
One additional note: What happens if your greenlet hits a blocking method call before it can yield to another greenlet? Your program execution will grind to a halt until the blocking call returns. There are some very nice libraries that let you get around this. If you have blocking I/O (socket and file) calls you will want to consider GEvent which provides greenlets but also monkey-patches I/O calls to become non-blocking. See more here.
In the last section, I mentioned that Greenlets are pseudo-concurrent, meaning only one is really running at a given time. So this means that Greenlets are at a disadvantage compared to microthreads and threads right? Well in theory yes, but in Python no. The issue with Python and concurrency is the infamous GIL (global interpreter lock), which effectively disables more than one thread/microthread from running at a time. When running your code in the Python interpreter, you cannot take advantage of a multi-processor system. So that puts Greenlets on even ground with Microthreads and Regular threads. Now the question becomes more of do you need high performance while performing a large number of context-switches with lots of threads? Are you ready to be very hands-on and control exactly when your greenlets give up execution? Greenlets are the solution for you.
If on the other hand you are more interested in having the system schedule your threads for you (don’t forget you will need locks!) while you run a few tens of threads - and you do not have any high performance requirements, then consider threads. Even though it may be tempting to go with greenlets because they are more lightweight, there are many cases where it just doesn’t make sense - I’m using plain threading at my current job since threads and pre-emptive switching model the problem domain well - and I don’t need very high performance.
Of course Stackless is also an option if you are willing to install a custom version of the interpreter. With this you get the benefits of Greenlets as well as implicit scheduling.
There is one more potential solution that I haven’t mentioned in this article. If you have a multicore processor, the only way to take advantage of it in Python is to use processes. Python exposes a process API that is almost the same as threads API - however that’s where the similarities end. Each process is launched with its own Python interpreter, meaning you avoid the GIL issue. To make max use of your system, you will want the same number of processes as you have CPU cores. Note that because your code is running in different processes, they do not have access to each other’s variables, and so some sort of communication method will need to be devised (which has its own performance drawbacks).
There is no one-size fits all option for concurrency in Python. Carefully weigh the benefits of each and choose what’s right for you.