Speed up your code with micro-batching

When doing any IO, whether over the network or against disk, there are always at least two options.

  1. Execute each IO operation by itself.
  2. Execute all the IO operations together.

Each of these two options comes with tradeoffs.

If you execute each IO operation by itself, the latency of each operation will be smaller, but the overall throughput will be lower. Each individual operation completes faster, but completing all of them takes longer.

On the other hand, if you execute all the operations together, the latency will be higher, but the overall throughput will be higher. You will wait longer for the result of any single operation, but all the operations will finish earlier.
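To make the two extremes concrete, here is a minimal Python sketch. The `fetch` and `fetch_many` functions are hypothetical stand-ins for any IO primitive that supports both single and batched requests:

```python
# Hypothetical IO primitives, stand-ins for any real client library.
def fetch(key):
    """One round trip; returns the value for a single key (stubbed here)."""
    return f"value-of-{key}"

def fetch_many(keys):
    """One round trip; returns the values for all the keys (stubbed here)."""
    return [f"value-of-{key}" for key in keys]

keys = [f"user:{i}" for i in range(10_000)]

# Option 1: one request per operation.
# Each result arrives as fast as possible, but it costs 10,000 round trips.
results = [fetch(key) for key in keys]

# Option 2: everything in a single request.
# One round trip in total, but the caller waits for all 10,000 keys at once.
results = fetch_many(keys)
```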

Of course, these two options are just the extremes of a range of possibilities.

You could create IO requests with any number of elements, between 1 and, well, all you've got.

A useful approach is to group requests together in small batches. This approach is colloquially referred to as micro-batching.

Micro-batching means putting multiple IO operations in a single request: not ALL of them, but not just one either.

Micro-batching allows for very interesting performance tradeoffs: a latency similar to that of a single IO operation AND a throughput closer to that of batching all the requests together.
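A minimal sketch of this middle ground, reusing the hypothetical `fetch_many` from above:

```python
def fetch_micro_batched(keys, batch_size=100):
    """Split the keys into micro-batches and send one request per batch."""
    results = []
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        results.extend(fetch_many(batch))  # one round trip per micro-batch
    return results

# 10,000 keys become 100 round trips of 100 operations each.
results = fetch_micro_batched(keys, batch_size=100)
```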

Picking the correct size - or how to quantify micro

When approaching micro-batching, the obvious question is: "How big should the batches actually be?"

And, of course, there is no single answer: it depends. It depends on the system you are building and on the SLAs imposed on it.

A good approach to determining the actual batch size is to base it on the latency of the IO operations themselves.

Each IO system has some latency, which is the sum of:

  1. Getting the request data to the IO subsystem - for network IO, this is the time a packet takes to travel from the client to the server. For disk IO, it is the time it takes to go from the system call to the disk driver.
  2. Actually executing the request - for instance, the time the server takes to compute the response, or the time the disk takes to read the data.
  3. Getting the response back from the IO subsystem.

So we have the communication latency twice (points 1 and 3), plus the execution latency (point 2).

The communication latency does not really change much between micro-batching and executing a single request.

The execution latency increases proportionally with the size of the micro-batch.
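These two observations can be folded into a rough latency model (a simplification that ignores effects such as serialization cost growing with batch size):

```python
def request_latency(batch_size, comm_latency, exec_latency_per_op):
    """Two communication legs (request out, response back) plus an
    execution time that grows linearly with the batch size."""
    return 2 * comm_latency + batch_size * exec_latency_per_op
```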

Under the assumption that the communication latency is much higher than the per-operation execution latency (a reasonable assumption in cloud applications), a first approximation is to pick a micro-batch size such that the execution latency is roughly equal to the communication latency, or at least in the same ballpark.

This keeps the increase in total latency modest while allowing a large number of requests to be executed together.
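As a worked example, with made-up numbers plugged into the model above: if one communication leg takes 500 µs and each operation takes 5 µs to execute, matching the two suggests a micro-batch of about 100 operations:

```python
comm_latency_us = 500  # one network leg (made-up number)
exec_per_op_us = 5     # execution time per operation (made-up number)

# Size the batch so that execution latency matches communication latency.
batch_size = comm_latency_us // exec_per_op_us  # -> 100 operations

single_op = request_latency(1, comm_latency_us, exec_per_op_us)             # 1005 us
micro_batch = request_latency(batch_size, comm_latency_us, exec_per_op_us)  # 1500 us
# Total latency grows by roughly 50%, while each round trip now carries
# 100 operations instead of 1: close to 100x the throughput.
```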

Other constraints need to be considered.

For instance, the IO subsystem may not support an arbitrary number of batched requests: you may find that the ideal size for your specific use case is roughly 1000, but the subsystem only supports batches of up to 100 requests.
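In that case, the batch size is simply clamped to what the subsystem accepts (example numbers again):

```python
ideal_batch_size = 1000    # what the latency model suggests
max_supported_batch = 100  # hard limit imposed by the IO subsystem

batch_size = min(ideal_batch_size, max_supported_batch)  # -> 100
```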

Using micro-batching increases, albeit only slightly, the overall complexity of the system. Sending a new request for each IO operation is of course simpler, and so is sending all the requests together in a single call.

Micro-batching does not really help when the communication latency is similar to the execution latency.

Avoid waiting for more requests to fill your micro-batch. If you don't have enough IO operations to execute, it is preferable to send a micro-batch that is not completely full than to wait artificially.
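One common way to implement this is a dispatch loop that flushes a batch either when it is full or when a short timeout expires, whichever comes first. A minimal sketch, assuming the hypothetical `fetch_many` from above and a thread-safe queue of incoming keys:

```python
import queue
import time

def dispatch_loop(pending: queue.Queue, batch_size=100, max_wait_s=0.001):
    """Group incoming operations into micro-batches, flushing when the
    batch is full OR when max_wait_s has elapsed, whichever comes first."""
    while True:
        batch = [pending.get()]  # block until at least one operation exists
        deadline = time.monotonic() + max_wait_s
        while len(batch) < batch_size:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break  # deadline hit: send a partially full batch
            try:
                batch.append(pending.get(timeout=remaining))
            except queue.Empty:
                break  # nothing else arrived in time
        fetch_many(batch)  # one round trip for the whole micro-batch
```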

Overall, micro-batching is a great way to improve the throughput, and the overall performance, of any IO-bound system, while keeping latency close to that of a single operation.