is a first-class feature in Flux, and recurrent models are easy to build and use. Recurrence is often illustrated as a cycle or self-dependency in the graph; it can also be thought of as a hidden output of the network that is fed back in as an input. For example, for a sequence of inputs
<code>x1, x2, x3 ...</code>
we produce predictions as follows:
</p>
<pre><code class="language-julia">y1 = f(W, x1) # `f` is the model, `W` represents the parameters
y2 = f(W, x2)
y3 = f(W, x3)
...</code></pre>
<p>
Each evaluation is independent and the prediction made for a given input will always be the same. That makes a lot of sense for, say, MNIST images, but less sense when predicting a sequence. For that case we introduce the hidden state:
</p>
<pre><code class="language-julia">y1, s = f(W, x1, s)
y2, s = f(W, x2, s)
y3, s = f(W, x3, s)
...</code></pre>
<p>
The state
<code>s</code>
allows the prediction to depend not only on the current input
<code>x</code>
but also on the history of past inputs.
</p>
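<p>
To make this concrete, here is a toy
<code>f</code>
(entirely made up for illustration) whose state is a running sum of the inputs, so each prediction depends on the whole input history:
</p>
<pre><code class="language-julia"># A toy stateful `f`: the state `s` accumulates past inputs,
# so the prediction `y` depends on the history, not just on `x`.
function f(W, x, s)
  s = s + x   # fold the new input into the state
  y = W * s   # the prediction sees every past input via `s`
  y, s
end

s = 0.0                 # initial state
y1, s = f(0.5, 1.0, s)  # y1 = 0.5
y2, s = f(0.5, 2.0, s)  # y2 = 1.5, since it depends on x1 as well as x2</code></pre>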
<p>
The simplest recurrent network looks as follows in Flux, and it should be familiar if you've seen the equations defining an RNN before:
</p>
<pre><code class="language-julia">@net type Recurrent
  Wxy; Wyy; by
  y
  function (x)
    y = tanh( x * Wxy + y{-1} * Wyy + by )
  end
end</code></pre>
<p>
The only difference from a regular feed-forward layer is that we create a variable
<code>y</code>
which is defined as depending on itself. The
<code>y{-1}</code>
syntax means "take the value of
<code>y</code>
from the previous run of the network".
</p>
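<p>
In plain Julia the same recurrence can be written out by hand. The following sketch (weight shapes and inputs invented for illustration) shows how each step's output becomes the next step's
<code>y{-1}</code>:
</p>
<pre><code class="language-julia"># One step of the recurrence, with hypothetical weights
# Wxy (20×30), Wyy (30×30) and bias by (1×30).
rnn_step(Wxy, Wyy, by, x, yprev) = tanh.(x * Wxy .+ yprev * Wyy .+ by)

Wxy, Wyy, by = randn(20, 30), randn(30, 30), randn(1, 30)

function run_rnn(xs)
  y = zeros(1, 30)  # stands in for y{-1} at the very first step
  ys = []
  for x in xs
    y = rnn_step(Wxy, Wyy, by, x, y)  # this y is the next step's y{-1}
    push!(ys, y)
  end
  ys
end

ys = run_rnn([randn(1, 20) for _ in 1:5])  # five toy inputs, five outputs</code></pre>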
<p>
Using recurrent layers is straightforward, and no different from using feed-forward ones via
<code>Chain</code>
etc. For example:
</p>
<pre><code class="language-julia">model = Chain(
  Affine(784, 20), σ,
  Recurrent(20, 30),
  Recurrent(30, 15))</code></pre>
<p>
Before using the model we need to unroll it. This happens with the
<code>unroll</code>
function:
</p>
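<pre><code class="language-julia">model = unroll(model, 20)  # N = 20 steps</code></pre>
<p>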
This call creates an unrolled, feed-forward version of the model which accepts N (= 20) inputs and generates N predictions at a time. Essentially, the model is replicated N times and Flux ties the hidden outputs
<code>y</code>
to hidden inputs.
</p>
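<p>
In terms of the
<code>f(W, x, s)</code>
notation above, the unrolled model behaves like the following hand-written loop (a conceptual sketch, not Flux's actual implementation):
</p>
<pre><code class="language-julia"># Unrolling for N steps: replicate the model once per input and
# tie each replica's hidden output to the next replica's hidden input.
function unrolled(f, W, xs, s)
  ys = []
  for x in xs
    y, s = f(W, x, s)  # the state carries across replicas
    push!(ys, y)
  end
  ys  # N inputs in, N predictions out
end</code></pre>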
<p>
Here's a more complex recurrent layer, an LSTM, and again it should be familiar if you've seen the