diff --git a/latest/apis/batching.html b/latest/apis/batching.html
index 9d985349..711162b9 100644
--- a/latest/apis/batching.html
+++ b/latest/apis/batching.html
@@ -298,17 +298,12 @@ Right now, the
+

+Here's a more detailed illustration of how it might look for code to be "generic across batching". Take for example a weight matrix `W` times a vector `x`, as used in logistic regression or a simple neural network:

+
+   W    *   x  =>   y
+(10×28) * (28) => (10)
+

+If we want to work with a batch of 50 `x`s, one option is to stack the data into a matrix of size `28 × 50`.

+
+   W    *    x    =>    y
+(10×28) * (28×50) => (10×50)
+
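
+As a quick sanity check, here's how these shapes look in plain Julia, with random data standing in for real weights and inputs:

+
+W = rand(10, 28)   # weight matrix
+x = rand(28)       # a single input vector
+size(W * x)        # (10,)
+
+X = rand(28, 50)   # a batch of 50 inputs, stacked as columns
+size(W * X)        # (10, 50)
+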

+This works, but we may find that it's slow or doesn't fit well with the rest of the model, which batches on the first dimension. For that reason we may instead want to put the data in a `50 × 28` matrix and alter the code as follows:

+
+   x    *    W'   =>    y
+(50×28) * (28×10) => (50×10)
+

+to make the shapes work out. This code change is not ideal; in more complex cases it can become fiddly and error-prone, and it ties the code to a particular implementation strategy, making it less reusable.
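
+As a quick sketch in plain Julia (again with random data), we can check that the two layouts compute exactly the same values, just transposed:

+
+W  = rand(10, 28)
+X  = rand(28, 50)   # 50 inputs stacked as columns
+Xr = Matrix(X')     # the same 50 inputs stacked as rows (50×28)
+
+Yr = Xr * W'        # (50×10)
+Yr ≈ (W * X)'       # true: same result, transposed layout
+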

+

+There's an alternative. We keep the same code, but represent the batched `x`s as either a `Batch{Vector,1}` or a `Batch{Vector,2}`, depending on how the data is stacked. Then we can simply overload `*` as follows:

+
+*(W::Matrix, x::Batch{Vector,1}) = x * W'
+*(W::Matrix, x::Batch{Vector,2}) = W * x
+
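
+The overloads above assume a Batch type; a minimal, hypothetical stand-in (not Flux's actual implementation) might look like this, carrying the stacking dimension in the type so that dispatch stays static:

+
+# Hypothetical wrapper: D records the dimension samples are stacked
+# along, so dispatch can pick the right multiplication at compile time.
+struct Batch{T,D}
+    data::Matrix{Float64}
+end
+
+Base.:*(W::Matrix, x::Batch{Vector,1}) = x.data * W'   # rows are samples
+Base.:*(W::Matrix, x::Batch{Vector,2}) = W * x.data    # columns are samples
+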

+This means that we can always write `W*x`, and the code is reusable in a larger network regardless of the overall batching approach. Moreover, Julia's type system ensures there's no runtime cost to doing this, and we can compile the code appropriately for backends like TensorFlow as well.
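
+For instance, with the hypothetical wrapper sketched above, both layouts go through the very same `W*x` call:

+
+W  = rand(10, 28)
+x1 = Batch{Vector,1}(rand(50, 28))   # batched along rows
+x2 = Batch{Vector,2}(rand(28, 50))   # batched along columns
+
+size(W * x1)   # (50, 10)
+size(W * x2)   # (10, 50)
+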