Split a slice or array in a defined number of chunks in golang - but any language really
This is a story of an algorithm that is extremely useful, and it seems super simple to implement. Until you don't try to implement it yourself, and the reality hits you in the face.
Suppose you got a slice or array n
elements, and you want to split it into m
chunks.
So, for instance, we got a slice of 9 elements, and we would like to split it into 3 chunks.
Easy, right?
9 / 3 = 3, hence you go and create 3 slices, and populate them with the elements.
[1, 2, 3, 4, 5, 6, 7, 8, 9] => [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
Simple, what if the slice is of 8 elements?
[1, 2, 3, 4, 5, 6, 7, 8] => [[1, 2, 3], [4, 5, 6], [7, 8]]
Still ok! Of course, we cannot expect to always have numbers that divide perfectly.
But how should we split it?
We cannot just divide the length of the slice by the number of chunks, and hope it works.
8/3 = 2.666 that get truncate to 2. But we should use 3.
Ok, then ceil(8/3) = 3
. Where ceil
is the function that returns the smaller integer greater or equal than its input.
Ok it works for 8 and 3. But what about dividing an array of length 128 into 12 chunks?
ceil(128/12) = 11
but in this way the last chunk will be very different from all the others. The first 11 chunks will contain 11 elements each, while the last chunk will contain only (128 - (11 * 11)) = 7 elements.
And this problem get worse and worse as we increase the number of chunks.
Ideally we would like to have all the chunks of the same size. If this is not possible, the difference in size between the biggest and the smallest chunk should be 1.
The trick for solving this problem, is to use multiplication and division to our advantage.
A naive approach will do something like:
var result [][]int
n := len(array) / numberOfChunks
for i := 0; i < numberOfChunks; i++ {
min := i * n
max := (i + 1) * n
result = append(result, array[min:max])
}
But this won't work, failing in the traps highlighted above.
The solution it is to avoid cache the size of the chunks, and decide, at each iteration, how long the chunk is going to be.
var result [][]int
for i := 0; i < numberOfChunks; i++ {
min := (i * len(array) / numberOfChunks)
max := ((i + 1) * len(array)) / numberOfChunks
result = append(result, array[min:max])
}
Simplifying, the reason for this behavior, is that division is more accurate with bigger numbers.
Hence, the result of i*len(array) / numberOfChunk
is more accurate than the result of i*n
even if n := len(array) / numberOfChunk
Overall, the second code will yield chunks always of similar size, with maximum one element of difference between the biggest and smallest element.
The core of the algorithm is the snippet above, but I created a tiny github gist, with the more complete function taking care of all edge conditions and multiple tests.