Existing machine learning frameworks and libraries represent batching, and other properties of data, only implicitly. Your machine learning data is a large `N`-dimensional array, which may have a shape like `100 × 50 × 256 × 256`.

Typically, this might represent that you have (say) a batch of 100 samples, where each sample is a 50-long sequence of 256×256 images. This is great for performance, but array operations often become much more cumbersome as a result. Especially if you manipulate dimensions at runtime as an optimisation, debugging models can become extremely fiddly, with a proliferation of `X × Y × Z` arrays and no information about where they came from.

Flux introduces a new approach where the batch dimension is represented explicitly as part of the data. For example:

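The code example appears to have been lost here; a sketch of what it presumably looked like, with hypothetical values (the display format follows the nested example shown later in the post):

```julia
julia> xs = Batch([[1, 2, 3], [4, 5, 6]])
2-element Batch of Vector{Int64}:
 [1,2,3]
 [4,5,6]
```
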
Batches are represented the way we *think* about them; as a list of data points. We can do all the usual array operations with them, including getting the first with `xs[1]`, iterating over them and so on. The trick is that under the hood, the data is batched into a single array:

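The example is missing here too; given the `rawbatch` function mentioned later in the post, and the left-most stacking behaviour described at the end, it presumably looked something like:

```julia
julia> rawbatch(xs)
2×3 Array{Int64,2}:
 1  2  3
 4  5  6
```
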
When we put a `Batch` object into a model, the model is ultimately working with a single array, which means there's no performance overhead and we get the full benefit of standard batching.

Turning a set of vectors into a matrix is fairly easy anyway, so what's the big deal? Well, it gets more interesting as we start working with more complex data. Say we were working with 4×4 images:

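The image example appears to be missing; a hypothetical sketch of a batch of two 4×4 images, assuming `Batch` stacks along the left-most dimension as described below:

```julia
julia> images = Batch([rand(4, 4), rand(4, 4)])  # two 4×4 "images"

julia> size(rawbatch(images))  # stacked into a single 2×4×4 array
(2, 4, 4)
```
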
`flatten` is simple enough over a single data point, but flattening a batched data set is more complex, and you end up needing arcane array operations like `mapslices`. A `Batch` can just handle this for you for free, and more importantly it ensures that your operations are *correct* – that you haven't mixed up your batch and data dimensions, or used the wrong array op, and so on.

The `Seq` type is identical to `Batch` so far; but where it gets interesting is that you can actually nest these types:

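(The definitions of `x1` and `x2` are not shown above; judging from the output below, they would be `Seq` objects over one-hot vectors, e.g.:)

```julia
julia> x1 = Seq([[0,1,0,0], [1,0,0,0], [0,0,0,1]])

julia> x2 = Seq([[0,0,1,0], [0,0,0,1], [0,0,1,0]])
```
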
```julia
julia> xs = Batch([x1, x2])
2-element Batch of Seq of Vector{Int64}:
 [[0,1,0,0],[1,0,0,0],[0,0,0,1]]
 [[0,0,1,0],[0,0,0,1],[0,0,1,0]]
```

Again, this represents itself intuitively as a list-of-lists-of-lists, but `rawbatch` shows that the real underlying value is an `Array{Int64,3}` of shape `2×3×4`.

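For instance, a sketch of how this might be checked at the REPL (assuming the `rawbatch` behaviour described above):

```julia
julia> size(rawbatch(xs))  # batch × sequence length × vector length
(2, 3, 4)
```
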
The design of batching is still a fairly early work in progress, though it's used in a few places in the system. For example, all Flux models expect to be given `Batch` objects, which are unwrapped into raw arrays for the computation. Models will convert their arguments if necessary, so it's convenient to call a model with a single data point like `f([1,2,3])`.

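So, assuming some model `f` over length-3 vectors (a hypothetical name, following the text above), these two calls would be equivalent:

```julia
julia> f(Batch([[1, 2, 3]]))  # explicit batch of one

julia> f([1, 2, 3])           # converted into a Batch automatically
```
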
Right now, the `Batch` or `Seq` types always stack along the left-most dimension. In future, this will be customisable, and Flux will provide implementations of common functions that are generic across the batch dimension. This brings the following benefits:

* Code can be written in a batch-agnostic way, i.e. as if working with a single data point, with batching happening independently.
* Automatic batching can be done with correctness assured, reducing programmer errors when manipulating dimensions.
* Optimisations, like switching batch dimensions, can be expressed by the programmer with compiler support; fewer code changes are required and optimisations are guaranteed not to break the model.

This also opens the door for more automatic optimisations, e.g. having the compiler explore the search space of possible batching combinations.