Monday, October 20, 2014

Spikes in RabbitMQ + NodeJS latency test

Measuring things is a fun way to confront the assumptions we put everywhere. In software development, products tend to rest on dozens of assumptions: that the network will be fast enough, that REST calls or database transactions will be short enough, that the success rate of external API calls will stay within a reasonable range. Knowing the limits of those assumptions is always worth at least one beer.

The other day I was looking at how long it takes my message queue, RabbitMQ, to pass messages from producer to consumer. RabbitMQ has loads of options: queues, topics, durability, acknowledgements and clustering, to name a few, and it's not obvious how each of them affects latency. I wrote two small NodeJS processes, a producer and a consumer, based on the samples from the RabbitMQ tutorials, to measure how long a message takes to fly from one process to the other. The plan was to run them many times with various configurations and collect the results in one big latency table.
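The measurement idea, stamping each message with its send time and subtracting that on arrival, might look roughly like the sketch below. It assumes the amqplib (amqp.node) client that the RabbitMQ JavaScript tutorials use; the function names, the queue name and the JSON message layout are hypothetical choices for illustration, not the original test code. The options mirror the basic scenario: a non-durable queue, non-persistent messages, no prefetch limit and auto-ack (`noAck`).

```javascript
const { performance } = require("perf_hooks");

// Wall-clock milliseconds with sub-millisecond precision. Producer and consumer
// run on the same host, so their timestamps come from one system clock.
function nowMs() {
  return performance.timeOrigin + performance.now();
}

// Hypothetical producer: publish `count` tiny, non-persistent messages as fast
// as possible. `channel` is assumed to be an amqplib channel (promise API).
async function runProducer(channel, queue, count) {
  await channel.assertQueue(queue, { durable: false });
  for (let seq = 0; seq < count; seq++) {
    // Stamp the send time into the body; this JSON layout is hypothetical.
    const body = Buffer.from(JSON.stringify({ seq, sentAt: nowMs() }));
    // The boolean result (write-buffer backpressure) is ignored on purpose:
    // the goal is to publish as fast as possible.
    channel.sendToQueue(queue, body, { persistent: false });
  }
}

// Hypothetical consumer: no prefetch limit, auto-ack, report latency per message.
async function runConsumer(channel, queue, onLatency) {
  await channel.assertQueue(queue, { durable: false });
  await channel.prefetch(0); // 0 = no limit on unacknowledged messages
  await channel.consume(
    queue,
    (msg) => {
      if (msg === null) return; // the broker cancelled this consumer
      const { seq, sentAt } = JSON.parse(msg.content.toString());
      onLatency(seq, nowMs() - sentAt);
    },
    { noAck: true }
  );
}
```

With a real broker, the channel would come from something like `const conn = await require("amqplib").connect("amqp://localhost"); const ch = await conn.createChannel();`, with the consumer started in one process before the producer runs in another. Because the producer loop never waits, it may hand messages to the client faster than they reach the socket, so part of a measured latency could be client-side buffering rather than time spent in RabbitMQ.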

Unfortunately, the most basic scenario turned out to be a puzzle. The producer publishes small (a few bytes) non-durable messages directly into a queue, as fast as it can. The consumer has an unlimited prefetchCount and uses auto-ack to eat from that queue. All processes run on a single host, so networking is negligible (I assume!). For four runs of 500 messages each, latency can be visualised as follows:

[Figure: per-message latency for four runs of 500 messages each]

While most messages spend 1-3 ms in transit, a few take 10-50 ms. I'm not yet sure where these spikes come from. There are a few suspects: RabbitMQ itself, NodeJS, the AMQP library, a garbage collector (both Erlang and NodeJS have one), a fault in my JS test code, or the environment.
The environment seems unlikely, because every result looks similar to the one above, and I ran a dozen of them on a few different machines. GC seems unlikely too: with so little data in flight, memory should hardly notice anything.
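To pin the spikes down with numbers, and to check the GC hypothesis rather than only argue it, one option is to summarise each run and cross-check slow messages against GC pauses. The sketch below is hedged: the 10 ms spike threshold, the function names and the nearest-rank percentile method are my choices for illustration. `PerformanceObserver` with the `"gc"` entry type is part of Node's `perf_hooks` module, but it only sees V8 collections in the process that registers it, not Erlang's GC inside RabbitMQ.

```javascript
const { PerformanceObserver, performance } = require("perf_hooks");

// Nearest-rank percentile (p in 0..100) of a non-empty array of numbers.
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Summarise one run: median, 99th percentile, maximum, and how many messages
// reach the (hypothetical) spike threshold.
function summarise(latenciesMs, spikeThresholdMs = 10) {
  return {
    p50: percentile(latenciesMs, 50),
    p99: percentile(latenciesMs, 99),
    max: Math.max(...latenciesMs),
    spikes: latenciesMs.filter((ms) => ms >= spikeThresholdMs).length,
  };
}

// Does the interval [sentAt, receivedAt] overlap any GC pause
// [startTime, startTime + duration]? All values are wall-clock milliseconds.
function overlapsGc(sentAt, receivedAt, gcPauses) {
  return gcPauses.some(
    (gc) => gc.startTime < receivedAt && gc.startTime + gc.duration > sentAt
  );
}

// Record V8 GC pauses of the current process into `gcPauses`. Entry start
// times are relative to performance.timeOrigin, so they are shifted onto the
// same wall-clock basis as performance.timeOrigin + performance.now().
function watchGc(gcPauses) {
  const observer = new PerformanceObserver((list) => {
    for (const entry of list.getEntries()) {
      gcPauses.push({
        startTime: performance.timeOrigin + entry.startTime,
        duration: entry.duration,
      });
    }
  });
  observer.observe({ entryTypes: ["gc"] });
  return observer; // call observer.disconnect() when the run is over
}
```

In practice `watchGc` would need to run in both the producer and the consumer, since a pause in either one can delay a message after it has been stamped. Spikes that do not overlap any recorded pause would point away from the NodeJS collector, although they say nothing about Erlang's.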

Next I'm going to run similar tests with a producer/consumer pair written in another language and see what comes out of it. But maybe you, my reader, have the answers?