1028: Common questions answered in docs r=CarloLucibello a=dhairyagandhi96

cc @MikeInnes 

1070: Prevent breakage due to new `active` field in normalise layers r=CarloLucibello a=ianshmean

Prevents breakage where the normalise structs, such as `BatchNorm`, have been directly defined but missing the new `active` field

cc. @darsnack 

Co-authored-by: Dhairya Gandhi <dhairya@juliacopmuting.com>
Co-authored-by: Dhairya Gandhi <dhairya@juliacomputing.com>
Co-authored-by: Ian <i.r.butterworth@gmail.com>
This commit is contained in:
bors[bot] 2020-03-04 00:10:39 +00:00 committed by GitHub
commit 94ba1e8ede
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
6 changed files with 81 additions and 3 deletions

View File

@ -1,6 +1,6 @@
name = "Flux"
uuid = "587475ba-b771-5e3f-ad9e-33799f191a9c"
version = "0.10.2"
version = "0.10.3"
[deps]
AbstractTrees = "1520ce14-60c1-5f80-bbc7-55ef81b5835c"

View File

@ -8,6 +8,7 @@ makedocs(modules=[Flux, NNlib],
"Recurrence" => "models/recurrence.md",
"Regularisation" => "models/regularisation.md",
"Model Reference" => "models/layers.md",
"Advanced Model Building" => "models/advanced.md",
"NNlib" => "models/nnlib.md"],
"Handling Data" =>
["One-Hot Encoding" => "data/onehot.md",

View File

@ -0,0 +1,61 @@
# Advanced Model Building and Customisation
Here we will try and describe usage of some more advanced features that Flux provides to give more control over model building.
## Customising Parameter Collection for a Model
Taking reference from our example `Affine` layer from the [basics](basics.md#Building-Layers-1).
By default all the fields in the `Affine` type are collected as its parameters, however, in some cases it may be desired to hold other metadata in our "layers" that may not be needed for training, and are hence supposed to be ignored while the parameters are collected. With Flux, it is possible to mark the fields of our layers that are trainable in two ways.
The first way of achieving this is through overloading the `trainable` function.
```julia-repl
julia> @functor Affine
julia> a = Affine(rand(3,3), rand(3))
Affine{Array{Float64,2},Array{Float64,1}}([0.66722 0.774872 0.249809; 0.843321 0.403843 0.429232; 0.683525 0.662455 0.065297], [0.42394, 0.0170927, 0.544955])
julia> Flux.params(a) # default behavior
Params([[0.66722 0.774872 0.249809; 0.843321 0.403843 0.429232; 0.683525 0.662455 0.065297], [0.42394, 0.0170927, 0.544955]])
julia> Flux.trainable(a::Affine) = (a.W, a.b,)
julia> Flux.params(a)
Params([[0.66722 0.774872 0.249809; 0.843321 0.403843 0.429232; 0.683525 0.662455 0.065297]])
```
Only the fields returned by `trainable` will be collected as trainable parameters of the layer when calling `Flux.params`.
Another way of achieving this is through the `@functor` macro directly. Here, we can mark the fields we are interested in by grouping them in the second argument:
```julia
Flux.@functor Affine (W,)
```
However, doing this requires the `struct` to have a corresponding constructor that accepts those parameters.
## Freezing Layer Parameters
When it is desired to not include all the model parameters (for e.g. transfer learning), we can simply not pass in those layers into our call to `params`.
Consider the simple multi-layer model where we want to omit optimising the first two `Dense` layers. This setup would look something like so:
```julia
m = Chain(
Dense(784, 64, σ),
Dense(64, 32),
Dense(32, 10), softmax)
ps = Flux.params(m[3:end])
```
`ps` now holds a reference to only the parameters of the layers passed to it.
During training, now the gradients would only be applied to the last `Dense` layer (and the `softmax` layer, but that is stateless so doesn't have any parameters), so only that would have its parameters changed.
`Flux.params` also takes multiple inputs to make it easy to collect parameters from heterogenous models with a single call. A simple demonstration would be if we wanted to omit optimising the second `Dense` layer in the previous example. It would look something like this:
```julia
Flux.params(m[1], m[3:end])
```

View File

@ -69,8 +69,8 @@ b = rand(2)
predict(x) = W*x .+ b
function loss(x, y)
= predict(x)
sum((y .- ).^2)
ŷ = predict(x)
sum((y .- ŷ).^2)
end
x, y = rand(5), rand(2) # Dummy data
@ -220,6 +220,8 @@ Flux.@functor Affine
This enables a useful extra set of functionality for our `Affine` layer, such as [collecting its parameters](../training/optimisers.md) or [moving it to the GPU](../gpu.md).
For some more helpful tricks, including parameter freezing, please checkout the [advanced usage guide](advacned.md).
## Utility functions
Flux provides some utility functions to help you generate models in an automated fashion.

View File

@ -41,6 +41,8 @@ The model to be trained must have a set of tracked parameters that are used to c
Such an object contains a reference to the model's parameters, not a copy, such that after their training, the model behaves according to their updated values.
Handling all the parameters on a layer by layer basis is explained in the [Layer Helpers](../models/basics.md) section. Also, for freezing model parameters, see the [Advanced Usage Guide](../models/advanced.md).
## Datasets
The `data` argument provides a collection of data to train with (usually a set of inputs `x` and target outputs `y`). For example, here's a dummy data set with only one data point:

View File

@ -40,6 +40,9 @@ mutable struct Dropout{F,D}
active::Union{Bool, Nothing}
end
# TODO: deprecate in v0.11
Dropout(p, dims) = Dropout(p, dims, nothing)
function Dropout(p; dims = :)
@assert 0 p 1
Dropout{typeof(p),typeof(dims)}(p, dims, nothing)
@ -157,6 +160,9 @@ mutable struct BatchNorm{F,V,W,N}
active::Union{Bool, Nothing}
end
# TODO: deprecate in v0.11
BatchNorm(λ, β, γ, μ, σ², ϵ, momentum) = BatchNorm(λ, β, γ, μ, σ², ϵ, momentum, nothing)
BatchNorm(chs::Integer, λ = identity;
initβ = (i) -> zeros(Float32, i), initγ = (i) -> ones(Float32, i), ϵ = 1f-5, momentum = 0.1f0) =
BatchNorm(λ, initβ(chs), initγ(chs),
@ -251,6 +257,9 @@ mutable struct InstanceNorm{F,V,W,N}
active::Union{Bool, Nothing}
end
# TODO: deprecate in v0.11
InstanceNorm(λ, β, γ, μ, σ², ϵ, momentum) = InstanceNorm(λ, β, γ, μ, σ², ϵ, momentum, nothing)
InstanceNorm(chs::Integer, λ = identity;
initβ = (i) -> zeros(Float32, i), initγ = (i) -> ones(Float32, i), ϵ = 1f-5, momentum = 0.1f0) =
InstanceNorm(λ, initβ(chs), initγ(chs),
@ -342,6 +351,9 @@ mutable struct GroupNorm{F,V,W,N,T}
active::Union{Bool, Nothing}
end
# TODO: deprecate in v0.11
GroupNorm(G, λ, β, γ, μ, σ², ϵ, momentum) = GroupNorm(G, λ, β, γ, μ, σ², ϵ, momentum, nothing)
GroupNorm(chs::Integer, G::Integer, λ = identity;
initβ = (i) -> zeros(Float32, i), initγ = (i) -> ones(Float32, i), ϵ = 1f-5, momentum = 0.1f0) =
GroupNorm(G, λ, initβ(chs), initγ(chs),