docs updates
parent 366efa92ab
commit fedee95b14

@@ -10,12 +10,11 @@ makedocs(modules=[Flux],
        "Models" =>
          ["Basics" => "models/basics.md",
           "Recurrence" => "models/recurrence.md",
-          "Layers" => "models/layers.md"],
+          "Layer Reference" => "models/layers.md"],
        "Contributing & Help" => "contributing.md"])

deploydocs(
  repo = "github.com/FluxML/Flux.jl.git",
- # modules = [Flux],
  target = "build",
  osname = "linux",
  julia = "0.6",

@@ -1,4 +1,4 @@
-# Contributing
+# Contributing & Help

If you need help, please ask on the [Julia forum](https://discourse.julialang.org/), the [Slack](https://discourse.julialang.org/t/announcing-a-julia-slack/4866) (channel #machine-learning), or Flux's [Gitter](https://gitter.im/FluxML/Lobby).

@@ -1,3 +1,5 @@
+# Model-Building Basics
+
## Taking Gradients

Consider a simple linear regression, which tries to predict an output array `y` from an input `x`. (It's a good idea to follow this example in the Julia REPL.)

@@ -31,14 +33,14 @@ back!(l)
```julia
grad(W)

-W.data .-= grad(W)
+W.data .-= 0.1grad(W)

loss(x, y) # ~ 2.5
```

The loss has decreased a little, meaning that our prediction for `x` is closer to the target `y`. If we have some data we can already try [training the model](../training/training.html).

-All deep learning in Flux, however complex, is a simple generalisation of this example. Of course, not all models look like this – they might have millions of parameters or complex control flow, and Flux provides ways to manage this complexity. Let's see what that looks like.
+All deep learning in Flux, however complex, is a simple generalisation of this example. Of course, models can *look* very different – they might have millions of parameters or complex control flow, and there are ways to manage this complexity. Let's see what that looks like.

## Building Layers

@@ -0,0 +1,114 @@
## Recurrent Cells

In the simple feedforward case, our model `m` is a simple function from various inputs `xᵢ` to predictions `yᵢ`. (For example, each `x` might be an MNIST digit and each `y` a digit label.) Each prediction is completely independent of any others, and using the same `x` will always produce the same `y`.

```julia
y₁ = f(x₁)
y₂ = f(x₂)
y₃ = f(x₃)
# ...
```

Recurrent networks introduce a *hidden state* that gets carried over each time we run the model. The model now takes the old `h` as an input, and produces a new `h` as output, each time we run it.

```julia
h = # ... initial state ...
y₁, h = f(x₁, h)
y₂, h = f(x₂, h)
y₃, h = f(x₃, h)
# ...
```

Information stored in `h` is preserved for the next prediction, allowing it to function as a kind of memory. This also means that the prediction made for a given `x` depends on all the inputs previously fed into the model.

(This might be important if, for example, each `x` represents one word of a sentence; the model's interpretation of the word "bank" should change if the previous input was "river" rather than "investment".)

Flux's RNN support closely follows this mathematical perspective. The most basic RNN is as close as possible to a standard `Dense` layer, and the output and hidden state are the same. By convention, the hidden state is the first input and output.

```julia
Wxh = randn(5, 10)
Whh = randn(5, 5)
b = randn(5)

function rnn(h, x)
  h = tanh.(Wxh * x .+ Whh * h .+ b)
  return h, h
end

x = rand(10) # dummy data
h = rand(5)  # initial hidden state

h, y = rnn(h, x)
```

If you run the last line a few times, you'll notice the output `y` changing slightly even though the input `x` is the same.

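For instance, here is a quick way to see it (a small sketch reusing the `rnn`, `h` and `x` defined above):

```julia
h, y1 = rnn(h, x)
h, y2 = rnn(h, x) # same x, but h was updated in between
y1 == y2          # false, because the hidden state changed between the two calls
```
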
We sometimes refer to functions like `rnn` above, which explicitly manage state, as recurrent *cells*. There are various recurrent cells available, which are documented in the [layer reference](layers.html). The hand-written example above can be replaced with:

```julia
using Flux

m = Flux.RNNCell(10, 5)

x = rand(10) # dummy data
h = rand(5)  # initial hidden state

h, y = m(h, x)
```

## Stateful Models

For the most part, we don't want to manage hidden states ourselves, but to treat our models as being stateful. Flux provides the `Recur` wrapper to do this.

```julia
x = rand(10)
h = rand(5)

m = Flux.Recur(rnn, h) # wrap the cell together with its initial state
y = m(x)               # the hidden state is handled internally
```

The `Recur` wrapper stores the state between runs in the `m.state` field.

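You can check this directly (a small sketch reusing the `m` and `x` defined above):

```julia
s = copy(m.state) # snapshot the current hidden state
m(x)              # run the model once more
m.state == s      # false, because the call updated the stored state
```
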
If you use the `RNN(10, 5)` constructor – as opposed to `RNNCell` – you'll see that it's simply a wrapped cell.

```julia
julia> RNN(10, 5)
Recur(RNNCell(Dense(15, 5)))
```

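Because it is already wrapped in `Recur`, a model built this way is used statefully, just like `m` above (a small sketch; `m2` is a hypothetical name):

```julia
m2 = RNN(10, 5)
y = m2(rand(10)) # the hidden state is managed internally, as with Recur
```
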
## Sequences

Often we want to work with sequences of inputs, rather than individual `x`s.

```julia
seq = [rand(10) for i = 1:10]
```

With `Recur`, applying our model to each element of a sequence is trivial:

```julia
map(m, seq) # returns a list of 5-element vectors
```

To make this a bit more convenient, Flux has the `Seq` type. This is just a list, but tagged so that we know it's meant to be used as a sequence of data points.

```julia
seq = Seq([rand(10) for i = 1:10])
m(seq) # returns a new Seq of length 10
```

When we apply the model `m` to a `Seq`, it gets mapped over every item in the sequence in order. This is just like the code above, but often more convenient.

## Truncating Gradients

By default, calculating the gradients in a recurrent layer involves the entire history. For example, if we call the model on 100 inputs, calling `back!` will calculate the gradient for those 100 calls. If we then calculate another 10 inputs we have to calculate 110 gradients – this accumulates and quickly becomes expensive.

To avoid this we can *truncate* the gradient calculation, forgetting the history.

```julia
truncate!(m)
```

Calling `truncate!` wipes the slate clean, so we can call the model with more inputs without building up an expensive gradient computation.

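In practice you would typically truncate between sequences (a hedged sketch; `seqs` is a hypothetical collection of `Seq` inputs, and the loss/`back!` step is elided):

```julia
for s in seqs
  m(s)          # run the model over one sequence
  # ... compute a loss and call back! here when training ...
  truncate!(m)  # forget this sequence's history before starting the next one
end
```
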
@@ -9,7 +9,7 @@ on a given input.

    m = Chain(Dense(10, 5), Dense(5, 2))
    x = rand(10)
-   m(x) = m[2](m[1](x))
+   m(x) == m[2](m[1](x))

`Chain` also supports indexing and slicing, e.g. `m[2]` or `m[1:end-1]`.
"""

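For example, the slicing behaviour mentioned in the docstring means a slice of a `Chain` is itself callable (a small sketch reusing the `m` and `x` from the docstring above):

```julia
m[1:end-1](x) == m[1](x) # dropping the last layer leaves a smaller, callable Chain
```
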
@@ -42,6 +42,9 @@ end
Creates a traditional `Dense` layer with parameters `W` and `b`.

    y = σ.(W * x .+ b)

+
+The input `x` must be a vector of length `in`, or a batch of vectors represented
+as an `in × N` matrix. The output `y` will be a vector or batch of length `out`.
"""
struct Dense{F,S,T}
  σ::F

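The shape rule described in the new docstring text can be checked directly (a small sketch; the layer and sizes are hypothetical):

```julia
d = Dense(10, 5)
d(rand(10))     # a length-5 vector
d(rand(10, 64)) # a 5×64 matrix: one output column per input column in the batch
```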