Existing machine learning frameworks and libraries represent batching, and other properties of data, only implicitly. Your machine learning data is a large `N`-dimensional array, which may have a shape like `100 × 50 × 256 × 256`.

Typically, this might represent that you have (say) a batch of 100 samples, where each sample is a 50-long sequence of 256×256 images. This is great for performance, but array operations often become much more cumbersome as a result. Especially if you manipulate dimensions at runtime as an optimisation, debugging models can become extremely fiddly, with a proliferation of `X × Y × Z` arrays and no information about where they came from.

Flux introduces a new approach where the batch dimension is represented explicitly as part of the data. For example:

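The code example appears to have been lost here; a sketch of what it presumably looked like, with hypothetical values (the display format follows the nested example shown later in the post):

```julia
julia> xs = Batch([[1, 2, 3], [4, 5, 6]])
2-element Batch of Vector{Int64}:
 [1,2,3]
 [4,5,6]
```
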
Batches are represented the way we *think* about them; as a list of data points. We can do all the usual array operations with them, including getting the first with `xs[1]`, iterating over them and so on. The trick is that under the hood, the data is batched into a single array:

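The example is missing here too; given the `rawbatch` function mentioned later in the post, and the left-most stacking behaviour described at the end, it presumably looked something like:

```julia
julia> rawbatch(xs)
2×3 Array{Int64,2}:
 1  2  3
 4  5  6
```
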
When we put a `Batch` object into a model, the model is ultimately working with a single array, which means there's no performance overhead and we get the full benefit of standard batching.

Turning a set of vectors into a matrix is fairly easy anyway, so what's the big deal? Well, it gets more interesting as we start working with more complex data. Say we were working with 4×4 images:

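The image example appears to be missing; a hypothetical sketch of a batch of two 4×4 images, assuming `Batch` stacks along the left-most dimension as described below:

```julia
julia> images = Batch([rand(4, 4), rand(4, 4)])  # two 4×4 "images"

julia> size(rawbatch(images))  # stacked into a single 2×4×4 array
(2, 4, 4)
```
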
`flatten` is simple enough over a single data point, but flattening a batched data set is more complex, and you end up needing arcane array operations like `mapslices`. A `Batch` can just handle this for you for free, and more importantly it ensures that your operations are *correct* – that you haven't mixed up your batch and data dimensions, or used the wrong array op, and so on.

The `Seq` type is identical to `Batch` so far; but where it gets interesting is that you can actually nest these types:

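(The definitions of `x1` and `x2` are not shown above; judging from the output below, they would be `Seq` objects over one-hot vectors, e.g.:)

```julia
julia> x1 = Seq([[0,1,0,0], [1,0,0,0], [0,0,0,1]])

julia> x2 = Seq([[0,0,1,0], [0,0,0,1], [0,0,1,0]])
```
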
```julia
julia> xs = Batch([x1, x2])
2-element Batch of Seq of Vector{Int64}:
 [[0,1,0,0],[1,0,0,0],[0,0,0,1]]
 [[0,0,1,0],[0,0,0,1],[0,0,1,0]]
```

Again, this represents itself intuitively as a list-of-lists-of-lists, but `rawbatch` shows that the real underlying value is an `Array{Int64,3}` of shape `2×3×4`.

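For instance, a sketch of how this might be checked at the REPL (assuming the `rawbatch` behaviour described above):

```julia
julia> size(rawbatch(xs))  # batch × sequence length × vector length
(2, 3, 4)
```
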
The design of batching is still a fairly early work in progress, though it's used in a few places in the system. For example, all Flux models expect to be given `Batch` objects, which are unwrapped into raw arrays for the computation. Models will convert their arguments if necessary, so it's convenient to call a model with a single data point like `f([1,2,3])`.

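So, assuming some model `f` over length-3 vectors (a hypothetical name, following the text above), these two calls would be equivalent:

```julia
julia> f(Batch([[1, 2, 3]]))  # explicit batch of one

julia> f([1, 2, 3])           # converted into a Batch automatically
```
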
Right now, the `Batch` or `Seq` types always stack along the left-most dimension. In future, this will be customisable, and Flux will provide implementations of common functions that are generic across the batch dimension. This brings the following benefits:

* Code can be written in a batch-agnostic way, i.e. as if working with a single data point, with batching happening independently.
* Automatic batching can be done with correctness assured, reducing programmer errors when manipulating dimensions.
* Optimisations, like switching batch dimensions, can be expressed by the programmer with compiler support; fewer code changes are required and optimisations are guaranteed not to break the model.

This also opens the door for more automatic optimisations, e.g. having the compiler explore the search space of possible batching combinations.