This article is about some specific things I learned about Apache ZooKeeper. Apache ZooKeeper is useful if you are writing a distributed application and need coordination between processes for things like configuration, group membership, leader election, locking and the like.
ZooKeeper has quite good documentation, and I have provided pointers where applicable, but maybe writing from my own experience can help shorten someone else’s learning curve.
Connection loss and session expiration
A central issue you need to think about when implementing ZooKeeper is how to deal with connection loss and session expiration. Since ZooKeeper is the central coordination service, once an application loses its connection with ZooKeeper it is left in an uncertain situation.
To survive temporary network hickups, or the death of the particular ZooKeeper server you are connected to, ZooKeeper has the notion of a session timeout. If you are able to reconnect to one of the servers in the ZooKeeper ensemble within this timeout, the ephemeral nodes and watches created by the application will not be lost. Various reasons for connection problems are described in the Troubleshooting page on the wiki.
ZooKeeper reports the loss of connection to the application as soon as it happens, through a Disconnected event sent to all watchers. Once the connection is reestablished, it reports a SyncConnected event. Even if the connection would only be lost for a very small amount of time, you will get these two events.
If an application only succeeds to reconnect to a ZooKeeper server after the session timeout passed, it will get an Expired event. The important thing to note here is that this event is produced by the server. If the ZooKeeper client cannot connect to the server for any amount of time, even if it is much longer than the session timeout (like hours), your application will not get any event besides the initial Disconnected event. At first I did not understand why the ZooKeeper client cannot produce this event locally, if it has not been able to reestablish the connection within the session timeout. Turns out there is a reason for it (same here).
How to deal with connection loss is pretty much a decision to be made for each application. If the application is just some sort of ‘worker’ it could pause until the connection is back. If the application is elected as leader, can it continue to assume it is the leader when it is not longer connect to ZooKeeper? See this post on active & passive leaders and also this one for some guidelines. If the application is a server, should it continue to accept client requests knowing it might be partitioned from other servers or not have the latest configuration information from ZooKeeper? These are the sorts of things you need to decide on. Here is another post suggesting some best practices. Once you delve into the mailing list archives you will find plenty of discussions related to dealing with connection loss.
Session expiration is less common than connection loss and harder to recover from, as you need to handle reconnection yourself by creating a new ZooKeeper handle. An easy solution is to simply restart the application at this point.
Dealing with ConnectionLoss exceptions
When connection loss occurs, any running Zookeeper operation (like getChildren, get/setData, create, delete) will throw a ConnectionLoss exception. Somehow you have to deal with these. One way is to retry the operation until it succeeds (= until the connection is back). For example in the lock recipe found in the ZooKeeper source tree, they use this pattern, see the ProtocolSupport.retryOperation method. This technique is also present in the ZKClient library.
When an operation throws a ConnectionLoss exception, you are not sure if it succeeded or not. The connection might have failed before or after it got through to the server. So you have to consider this when retrying the operation. A create might fail with a NodeExists exception, a delete might fail with a NoNode exception.
Suppose you implement a ZooKeeper-based lock using the creation of ephemeral sequence nodes. If you retry the operation when it fails, you could have created two ephemeral nodes, but you will not know the name of the first. You can solve this by looping over the znodes and checking the Stat.ephemeralOwner field. If you would not retry the operation, and e.g. simply exit the lock-taking procedure, you might have left an epehemeral node and hence make it impossible for others to obtain the lock.
The ErrorHandling page on the wiki contains some more detail on this topic.
Storing multiple properties in the data of one node
Given ZooKeeper’s tree model, it might be tempting to store an entity as a znode with subnodes for each of the properties of the entity. This allows for fine-grained reading, writing, and watching.
But with this approach you cannot create the entity atomically. As a consequence, clients can see it in a partially-created, partially-updated or partially-deleted state. If you want to watch the complete state of the entity, you would need to install watchers on all of the individual nodes. So at second sight, this becomes quite complex.
Therefore, I found it easier to store certain configuration entities in their entirety in the data of one node. In my case, I stored it as JSON.
Here is a mail about this topic.
If you are in a situation where storing it all in one znode is not desirable, another technique is the one of the ready znode described in the ZooKeeper paper.
The single event thread
Each ZooKeeper handle (= ZooKeeper object instance) has one thread for dispatching the events to all the watchers. So watchers are called in sequence. If you perform some time consuming action in a watcher, all the other watchers will have to wait before they get a chance to process their event.
Thus: do not do anything time-consuming in a watcher, such as IO, waiting on a lock, etc. One case to watch out for is not to do something which in itself might again wait for a ZooKeeper event. I made for example this mistake trying to take a ZooKeeper-based lock (of which the implementation details where in another class) in a watcher callback. In this particular situation, it was easy enough to push work on a queue that is processed by a different thread.
If you use multiple ZooKeeper handles in your application, each has its own event dispatching thread, so you only have to worry about this for all the users on one particular ZooKeeper handle. It seems common practice to have just one ZooKeeper handle though.
The event thread is also described in the Programmer’s Guide.
Registering the same watcher object multiple times for the same event
If you register the same watcher object (= the exact same instance) for the same event multiple times, it will be called only once. Thus if you do:
Watcher w = …
The watcher w will be called only once when the children change. See the Programmer’s Guide where it mentions “A watch object, or function/context pair, will only be triggered once for a given notification.”.
I found this very handy in the following situation. Suppose you install a watch, it gets triggered, and in the Watcher callback you reinstall the same watch. Since watchers are one-time triggers this is a common thing to do. Now it could be that reinstalling the watch fails due to a ConnectionLoss exception. As mentioned earlier, if you get a ConnectionLoss exception, you do not know if the operation succeeded, so you do not know if installing the watch succeeded. And since you are in a watcher callback, doing something blocking like retrying the operation until it succeeds is not a good idea.
Therefore, in such case you can simply install the watcher everytime you get a SyncConnected event, without worrying that the watcher will then be called twice when an event occurs.
Many ZooKeeper’s API methods throw InterruptedException. This is a checked exception, so you are forced to do something with it. This might seem annoying at first, but InterruptedException is actually a very useful exception: when a method throws InterruptedException, it tells you there is a possibility to interrupt it.
What is the best way to handle InterruptedException? Well, there is a nice article about this by Brian Goetz. Basically, you should make sure not to stop the interruption of the thread. Either by throwing on the InterruptedException, or if this is not possible, by re-enabling the interrupted flag (call Thread.currentThread().interrupt).
To locally test your application against a ZooKeeper ensemble, for example to test how it reacts to session moves, be sure to check out zkconf: it enables to set up an ensemble in seconds. Do not forget to use odd numbers of servers (see ZooKeeper admin guide).
As developer, be sure to also read the ZooKeeper Administrator’s Guide. There is useful information in there such as the “four letter word” commands you can use to query some information and metrics. Also check out zktop.