docs updates
parent 366efa92ab
commit fedee95b14
@ -10,12 +10,11 @@ makedocs(modules=[Flux],
      "Models" =>
        ["Basics" => "models/basics.md",
         "Recurrence" => "models/recurrence.md",
-        "Layers" => "models/layers.md"],
+        "Layer Reference" => "models/layers.md"],
      "Contributing & Help" => "contributing.md"])

deploydocs(
   repo = "github.com/FluxML/Flux.jl.git",
   # modules = [Flux],
   target = "build",
   osname = "linux",
   julia = "0.6",

@ -1,4 +1,4 @@
-# Contributing
+# Contributing & Help

If you need help, please ask on the [Julia forum](https://discourse.julialang.org/), the [slack](https://discourse.julialang.org/t/announcing-a-julia-slack/4866) (channel #machine-learning), or Flux's [Gitter](https://gitter.im/FluxML/Lobby).

@ -1,3 +1,5 @@
# Model-Building Basics

+## Taking Gradients

Consider a simple linear regression, which tries to predict an output array `y` from an input `x`. (It's a good idea to follow this example in the Julia REPL.)

@ -31,14 +33,14 @@ back!(l)
```julia
grad(W)

-W.data .-= grad(W)
+W.data .-= 0.1grad(W)

loss(x, y) # ~ 2.5
```

The loss has decreased a little, meaning that our prediction is closer to the target `y`. If we have some data we can already try [training the model](../training/training.html).

-All deep learning in Flux, however complex, is a simple generalisation of this example. Of course, not all models look like this – they might have millions of parameters or complex control flow, and Flux provides ways to manage this complexity. Let's see what that looks like.
+All deep learning in Flux, however complex, is a simple generalisation of this example. Of course, models can *look* very different – they might have millions of parameters or complex control flow, and there are ways to manage this complexity. Let's see what that looks like.

## Building Layers

@ -0,0 +1,114 @@
## Recurrent Cells

In the simple feedforward case, our model is a simple function `f` from various inputs `xᵢ` to predictions `yᵢ`. (For example, each `x` might be an MNIST digit and each `y` a digit label.) Each prediction is completely independent of any others, and using the same `x` will always produce the same `y`.

```julia
y₁ = f(x₁)
y₂ = f(x₂)
y₃ = f(x₃)
# ...
```

Recurrent networks introduce a *hidden state* that gets carried over each time we run the model. The model now takes the old `h` as an input, and produces a new `h` as output, each time we run it.

```julia
h = # ... initial state ...
y₁, h = f(x₁, h)
y₂, h = f(x₂, h)
y₃, h = f(x₃, h)
# ...
```

Information stored in `h` is preserved for the next prediction, allowing it to function as a kind of memory. This also means that the prediction made for a given `x` depends on all the inputs previously fed into the model.

(This might be important if, for example, each `x` represents one word of a sentence; the model's interpretation of the word "bank" should change if the previous input was "river" rather than "investment".)

Flux's RNN support closely follows this mathematical perspective. The most basic RNN is as close as possible to a standard `Dense` layer, and the output and hidden state are the same. By convention, the hidden state is the first input and output.

```julia
Wxh = randn(5, 10)
Whh = randn(5, 5)
b = randn(5)

function rnn(h, x)
  h = tanh.(Wxh * x .+ Whh * h .+ b)
  return h, h
end

x = rand(10) # dummy data
h = rand(5)  # initial hidden state

h, y = rnn(h, x)
```

If you run the last line a few times, you'll notice the output `y` changing slightly even though the input `x` is the same.
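
For example, a quick sketch reusing `rnn`, `h`, and `x` from the snippet above:

```julia
h, y = rnn(h, x) # first call
h, y = rnn(h, x) # same x, but h has changed, so y comes out different
h, y = rnn(h, x) # and it changes again on the next call
```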

We sometimes refer to functions like `rnn` above, which explicitly manage state, as recurrent *cells*. There are various recurrent cells available, which are documented in the [layer reference](layers.html). The hand-written example above can be replaced with:

```julia
using Flux

m = Flux.RNNCell(10, 5)

x = rand(10) # dummy data
h = rand(5)  # initial hidden state

h, y = m(h, x)
```

## Stateful Models

For the most part, we don't want to manage hidden states ourselves, but to treat our models as being stateful. Flux provides the `Recur` wrapper to do this.

```julia
x = rand(10)
h = rand(5)

m = Flux.Recur(rnn, h)

y = m(x)
```

The `Recur` wrapper stores the state between runs in the `m.state` field.
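
For example, a small sketch continuing from the code above:

```julia
y = m(x)  # runs the wrapped `rnn` cell with the stored state
m.state   # the updated hidden state that the next call will start from
```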

If you use the `RNN(10, 5)` constructor – as opposed to `RNNCell` – you'll see that it's simply a wrapped cell.

```julia
julia> RNN(10, 5)
Recur(RNNCell(Dense(15, 5)))
```

## Sequences

Often we want to work with sequences of inputs, rather than individual `x`s.

```julia
seq = [rand(10) for i = 1:10]
```

With `Recur`, applying our model to each element of a sequence is trivial:

```julia
map(m, seq) # returns a list of 5-element vectors
```

To make this a bit more convenient, Flux has the `Seq` type. This is just a list, but tagged so that we know it's meant to be used as a sequence of data points.

```julia
seq = Seq([rand(10) for i = 1:10])
m(seq) # returns a new Seq of length 10
```

When we apply the model `m` to a `Seq`, it gets mapped over every item in the sequence in order. This is just like the code above, but often more convenient.

## Truncating Gradients

By default, calculating the gradients in a recurrent layer involves the entire history. For example, if we call the model on 100 inputs, calling `back!` will calculate the gradient for those 100 calls. If we then run the model on another 10 inputs, we have to calculate 110 gradients – this accumulates and quickly becomes expensive.

To avoid this we can *truncate* the gradient calculation, forgetting the history.

```julia
truncate!(m)
```

Calling `truncate!` wipes the slate clean, so we can call the model with more inputs without building up an expensive gradient computation.
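
For instance, a rough sketch of the usual pattern between sequences (`seqs` and `loss` are assumed helpers, not part of the example above):

```julia
for s in seqs   # assumed: a collection of input sequences
  l = loss(s)   # assumed: sums the model's error over one sequence
  back!(l)      # gradients only reach back as far as the last truncation
  # ... update the parameters here ...
  truncate!(m)  # forget the history before starting the next sequence
end
```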
@ -9,7 +9,7 @@ on a given input.
    m = Chain(Dense(10, 5), Dense(5, 2))
    x = rand(10)
-    m(x) = m[2](m[1](x))
+    m(x) == m[2](m[1](x))

`Chain` also supports indexing and slicing, e.g. `m[2]` or `m[1:end-1]`.
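
For instance, a rough sketch reusing `m` and `x` from above (slicing is assumed to return a smaller, still-callable `Chain`):

    m[1:end-1]    # just the first layer, wrapped as a Chain
    m[1:end-1](x) # the 5-element output of the first layer alone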
"""
@ -42,6 +42,9 @@ end
Creates a traditional `Dense` layer with parameters `W` and `b`.

    y = σ.(W * x .+ b)

The input `x` must be a vector of length `in`, or a batch of vectors represented
as an `in × N` matrix. The output `y` will be a vector or batch of length `out`.
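
For example, a small sketch with sizes consistent with the `Chain` example above:

    d = Dense(10, 5)
    d(rand(10))    # 5-element output vector
    d(rand(10, 2)) # two inputs batched as a 10 × 2 matrix give a 5 × 2 output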
"""
struct Dense{F,S,T}
  σ::F