diff --git a/docs/make.jl b/docs/make.jl
index e67de41c..f72237bc 100644
--- a/docs/make.jl
+++ b/docs/make.jl
@@ -22,10 +22,12 @@ makedocs(modules=[Flux, NNlib],
          "DataLoader" => "data/dataloader.md"],
          "Training Models" =>
            ["Optimisers" => "training/optimisers.md",
+            "Loss Functions" => "training/loss_functions.md",
             "Training" => "training/training.md"],
          "GPU Support" => "gpu.md",
          "Saving & Loading" => "saving.md",
          "The Julia Ecosystem" => "ecosystem.md",
+         "Utility Functions" => "utilities.md",
          "Performance Tips" => "performance.md",
          "Community" => "community.md"],
         )
diff --git a/docs/src/training/loss_functions.md b/docs/src/training/loss_functions.md
new file mode 100644
index 00000000..ed002a41
--- /dev/null
+++ b/docs/src/training/loss_functions.md
@@ -0,0 +1,13 @@
+# Loss Functions
+
+Flux provides the following basic loss (or cost) functions.
+
+```@docs
+Flux.mse
+Flux.crossentropy
+Flux.logitcrossentropy
+Flux.binarycrossentropy
+Flux.logitbinarycrossentropy
+Flux.normalise
+```
+
diff --git a/docs/src/training/training.md b/docs/src/training/training.md
index 903b8197..1fe10783 100644
--- a/docs/src/training/training.md
+++ b/docs/src/training/training.md
@@ -15,7 +15,7 @@ Flux.Optimise.train!
 ```
 
 There are plenty of examples in the [model zoo](https://github.com/FluxML/model-zoo).
 
-## Loss Functions
+## Loss
 
 The objective function must return a number representing how far the model is from its target – the *loss* of the model. The `loss` function that we defined in [basics](../models/basics.md) will work as an objective. We can also define an objective in terms of some model:
@@ -32,6 +32,7 @@ Flux.train!(loss, ps, data, opt)
 ```
 
 The objective will almost always be defined in terms of some *cost function* that measures the distance of the prediction `m(x)` from the target `y`. Flux has several of these built in, like `mse` for mean squared error or `crossentropy` for cross entropy loss, but you can calculate it however you want.
+For a list of all built-in loss functions, check out the [reference](loss_functions.md).
 
 At first glance it may seem strange that the model that we want to train is not part of the input arguments of `Flux.train!` too. However the target of the optimizer is not the model itself, but the objective function that represents the departure between modelled and observed data. In other words, the model is implicitly defined in the objective function, and there is no need to give it explicitly. Passing the objective function instead of the model and a cost function separately provides more flexibility, and the possibility of optimizing the calculations.
 
diff --git a/docs/src/utilities.md b/docs/src/utilities.md
new file mode 100644
index 00000000..d788e69f
--- /dev/null
+++ b/docs/src/utilities.md
@@ -0,0 +1,43 @@
+# Utility Functions
+
+Flux contains some utility functions for working with data; these functions
+help create inputs for your models or batch your dataset.
+Other functions can be used to initialize your layers or to control how often
+callback functions run.
+
+## Working with Data
+
+```@docs
+Flux.unsqueeze
+Flux.stack
+Flux.unstack
+Flux.chunk
+Flux.frequencies
+Flux.batch
+Flux.batchseq
+Base.rpad(v::AbstractVector, n::Integer, p)
+```
+
+## Layer Initialization
+
+These are primarily useful if you are planning to write your own layers.
+Flux initializes convolutional layers and recurrent cells with `glorot_uniform`
+by default.
+To change the default on an applicable layer, pass the desired function with the
+`init` keyword. For example:
+```jldoctest; setup = :(using Flux)
+julia> conv = Conv((3, 3), 1 => 8, relu; init=Flux.glorot_normal)
+Conv((3, 3), 1=>8, relu)
+```
+
+```@docs
+Flux.glorot_uniform
+Flux.glorot_normal
+```
+
+## Callback Helpers
+
+```@docs
+Flux.throttle
+```
+
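For context while reviewing: below is a minimal sketch of how the loss functions documented in `loss_functions.md` and the `Flux.throttle` helper plug into `Flux.train!`. The model, array shapes, and hyperparameters are hypothetical, chosen purely for illustration, and assume the Flux version this patch targets.

```julia
using Flux

# A toy model and a single-batch dataset (hypothetical shapes).
m = Dense(10, 1)
x, y = rand(Float32, 10, 32), rand(Float32, 1, 32)

# The objective wraps a built-in cost function, here mean squared error.
loss(x, y) = Flux.mse(m(x), y)

ps = Flux.params(m)
opt = Descent(0.1)

# One pass over the data; Flux.throttle limits the logging callback
# to at most one call every ten seconds.
Flux.train!(loss, ps, [(x, y)], opt,
            cb = Flux.throttle(() -> @show(loss(x, y)), 10))
```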
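Similarly, a quick sketch of the data helpers documented in `utilities.md`; the sample arrays are made up for illustration.

```julia
using Flux

xs = [rand(Float32, 3) for _ in 1:6]   # six length-3 samples

Flux.batch(xs)                  # 3×6 matrix: samples stacked along a new batch dimension
Flux.stack(xs, 2)               # 3×6 matrix: like batch, but the dimension is explicit
Flux.chunk(xs, 2)               # two groups of three samples each
Flux.unsqueeze(xs[1], 2)        # 3×1 matrix: singleton dimension inserted at position 2
Flux.frequencies([:a, :b, :b])  # Dict(:a => 1, :b => 2)
```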