The evil of threads
For some time now, I’ve been convinced of the inherent evil of threads, and over the last few years I’ve gone to some lengths to minimize their use in my projects. For example, my dnsjnio package uses java.nio to run many thousands of concurrent DNS queries on just a couple of threads. The package has been pretty stable for the last couple of years, and I know that it is in reasonably heavy use at several sites, at least.
So, I was very surprised to discover it has a nasty race condition!
A very helpful user alerted me to this. I’m still in the discovery phase, so I’m afraid there’s no fix yet. However, the problem appears to be a race condition between a timer thread, which looks after query timeouts, and the select thread, which handles the I/O. Timers get a separate thread because so many queries may be outstanding at any one time, and the select thread is already too busy handling I/O to also keep the timeouts in order. If a timeout fires immediately after data has been received, but before that data has been passed on to the user, the connection can be closed twice, leading to a NullPointerException (oops!).
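Whatever the real fix turns out to be, one defensive measure against this kind of double close is to make closing idempotent, so that whichever thread gets there second simply does nothing. The sketch below is illustrative only: QueryConnection is a hypothetical class, not part of dnsjnio, and it assumes the double close is the only harmful effect of the race. It does not decide whether the response or the timeout should win; it only stops the losing thread from tearing down state that is already gone.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical per-query connection wrapper; NOT a real dnsjnio class.
final class QueryConnection {
    // Flips from false to true exactly once, however many threads race on close().
    private final AtomicBoolean closed = new AtomicBoolean(false);
    // Counts how many times the underlying resource was actually released.
    private final AtomicInteger releases = new AtomicInteger();

    /**
     * Safe to call from both the timer thread and the select thread.
     * compareAndSet lets exactly one caller win; any later caller gets false
     * and does nothing, so the resource is never released twice.
     */
    boolean close() {
        if (!closed.compareAndSet(false, true)) {
            return false; // already closed by the other thread
        }
        // Real code would close the channel and clear per-query state here.
        releases.incrementAndGet();
        return true;
    }

    int releaseCount() {
        return releases.get();
    }
}
```

The caller that receives true owns the clean-up (delivering the response or the timeout to the user); a caller that receives false should assume the query has already been dealt with.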
Of course, some sort of lightweight Actors implementation would have been ideal here. In Erlang, I would have a process maintain the timeout queue, and have the select() loop ask it for the next timeout before each pass. And with a message-passing system, it would be simple to detect that the close had already been processed, since each process handles its messages sequentially.
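Without actors, a plain-Java way to get the same sequential guarantee is to let the select thread own the timeout queue itself and pass the time until the next deadline to Selector.select(long), where a timeout of 0 means "block until I/O arrives". This is a swapped-in technique, not the Erlang design described above and not necessarily how dnsjnio will be fixed; it also puts the timeout bookkeeping back on the select thread, which the original design deliberately avoided for load reasons. TimeoutQueue and its methods are hypothetical names, and cancelled entries are dropped lazily rather than searched for in the heap.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical timeout queue, touched ONLY by the select thread, so it needs no locks.
final class TimeoutQueue {
    private static final class Entry {
        final long deadline;
        final int queryId;
        Entry(long deadline, int queryId) { this.deadline = deadline; this.queryId = queryId; }
    }

    // Earliest deadline first.
    private final PriorityQueue<Entry> heap =
        new PriorityQueue<>((a, b) -> Long.compare(a.deadline, b.deadline));
    // Current deadline per live query; heap entries that don't match are stale.
    private final Map<Integer, Long> live = new HashMap<>();

    void schedule(int queryId, long deadline) {
        live.put(queryId, deadline);
        heap.add(new Entry(deadline, queryId));
    }

    /** Called when a response arrives: the query can no longer time out. */
    void cancel(int queryId) {
        live.remove(queryId);
    }

    /** Removes and returns every live query whose deadline is at or before now. */
    List<Integer> expire(long now) {
        List<Integer> expired = new ArrayList<>();
        while (!heap.isEmpty() && heap.peek().deadline <= now) {
            Entry e = heap.poll();
            Long current = live.get(e.queryId);
            if (current != null && current.longValue() == e.deadline) {
                live.remove(e.queryId);
                expired.add(e.queryId);
            }
        }
        return expired;
    }

    /** Value for Selector.select(long): ms until the next live deadline, or 0 if none. */
    long selectTimeout(long now) {
        while (!heap.isEmpty()) {
            Entry e = heap.peek();
            Long current = live.get(e.queryId);
            if (current != null && current.longValue() == e.deadline) {
                return Math.max(1, e.deadline - now); // never 0, which would block forever
            }
            heap.poll(); // discard an entry left behind by cancel()
        }
        return 0;
    }
}
```

Each pass of the select loop would call expire(now) and report those timeouts, then call selector.select(queue.selectTimeout(now)), then handle ready keys, calling cancel(id) for every response. Because responses and expiries are handled one after the other on the same thread, a timeout cannot fire between a response being read and its delivery.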
Away from the details of this particular issue, I think there’s a broader point to make: even a relatively simple, heavily used project that was designed to avoid the evils of threading is still likely to contain some nasty threading bugs.
I will be coding future projects to avoid threading altogether. Well, that’s maybe a bit extreme. But I’d certainly want to design all my threads around blocking queues, with no other means of interaction.
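A minimal sketch of what "blocking queues, with no other means of interaction" might look like in Java, applied to the race above: one owner thread holds all per-query state, and the timer and I/O threads can only post messages to its inbox. QueryOwner, Kind and Message are hypothetical names for illustration; only BlockingQueue and LinkedBlockingQueue are real java.util.concurrent types.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical owner of all per-query state.
final class QueryOwner implements Runnable {
    enum Kind { START, RESPONSE, TIMEOUT, STOP }

    static final class Message {
        final Kind kind;
        final int queryId;
        Message(Kind kind, int queryId) { this.kind = kind; this.queryId = queryId; }
    }

    // The only shared object: any thread may add(); only the owner thread take()s.
    final BlockingQueue<Message> inbox = new LinkedBlockingQueue<>();

    // Confined to the owner thread, so it needs no synchronization.
    private final Set<Integer> outstanding = new HashSet<>();
    private int completed;
    private int ignored;

    @Override
    public void run() {
        try {
            while (true) {
                Message m = inbox.take(); // blocks until another thread posts something
                if (m.kind == Kind.STOP) {
                    return;
                }
                handle(m);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt and exit
        }
    }

    // Messages are handled one at a time, so a RESPONSE and a TIMEOUT for the same
    // query cannot interleave: whichever arrives second finds the query already gone.
    private void handle(Message m) {
        switch (m.kind) {
            case START:
                outstanding.add(m.queryId);
                break;
            case RESPONSE:
            case TIMEOUT:
                if (outstanding.remove(m.queryId)) {
                    completed++; // real code: deliver the result or timeout, close once
                } else {
                    ignored++;   // late duplicate: nothing left to close
                }
                break;
            default:
                break;
        }
    }

    // Read these only after the owner thread has finished (e.g. after join()).
    int completed() { return completed; }
    int ignored() { return ignored; }
}
```

In use, the owner would run on its own thread (new Thread(owner).start()), and the timer and select threads would call owner.inbox.add(...) instead of touching connection state directly.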
My favoured design for my next project involves a collaboration of many communicating sequential processes, one on each core. Each process may be written in whatever language best suits its job: C for speed-critical areas, and the JVM or Erlang for most others. Of course, once you design your system like this, it’s then easy to move some components onto the network.
Some might invoke the generic tenth rule and say that I may as well use Erlang for all of the nodes. I would agree, if there weren’t so little communication required between the nodes in this project, and if my need for speed in certain areas weren’t so acute.
I shall also be looking for other ways to learn from the functional languages - for example, enforcing immutability (perhaps with something like this).
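The tool that "something like this" refers to isn't named here, so the following is just the plain-Java convention for an immutable value class, not that tool. DnsAnswer is a hypothetical class: all fields are final, the list is defensively copied and exposed read-only, and "changes" produce a new object. Final fields also mean a fully constructed instance can be handed to another thread, for example through a BlockingQueue, without extra locking.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical immutable value class; once constructed it can never change.
final class DnsAnswer {
    private final String name;
    private final List<String> addresses;

    DnsAnswer(String name, List<String> addresses) {
        this.name = name;
        // Defensive copy: later changes to the caller's list can't leak in.
        this.addresses = Collections.unmodifiableList(new ArrayList<>(addresses));
    }

    String name() {
        return name;
    }

    // Read-only view: add()/remove() throw UnsupportedOperationException.
    List<String> addresses() {
        return addresses;
    }

    /** "Modification" returns a new object and leaves this one untouched. */
    DnsAnswer withAddress(String address) {
        List<String> copy = new ArrayList<>(addresses);
        copy.add(address);
        return new DnsAnswer(name, copy);
    }
}
```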
If anyone is particularly interested in this, I’d recommend this paper, which I was pointed to recently. I agree with almost everything in it, especially the recommendation to avoid C wherever possible.
Anyway, enough rambling - I’d better get back to fixing my nasty threading bug!
Tags: Algorithms, DNS, Java, Applications, Erlang

