docs mostly fixed
This commit is contained in:
parent
ddf06af0b9
commit
de2049450b
@@ -1,5 +1,5 @@
 # Community

-All Flux users are welcome to join our community on the [Julia forum](https://discourse.julialang.org/), the [slack](https://discourse.julialang.org/t/announcing-a-julia-slack/4866) (channel #machine-learning), or Flux's [Gitter](https://gitter.im/FluxML/Lobby). If you have questions or issues we'll try to help you out.
+All Flux users are welcome to join our community on the [Julia forum](https://discourse.julialang.org/), or the [slack](https://discourse.julialang.org/t/announcing-a-julia-slack/4866) (channel #machine-learning). If you have questions or issues we'll try to help you out.

 If you're interested in hacking on Flux, the [source code](https://github.com/FluxML/Flux.jl) is open and easy to understand -- it's all just the same Julia code you work with normally. You might be interested in our [intro issues](https://github.com/FluxML/Flux.jl/issues?q=is%3Aopen+is%3Aissue+label%3A%22help+wanted%22) to get started.
@@ -1,14 +1,6 @@
 # GPU Support

-## Installation
-
-To get GPU support for NVIDIA graphics cards, you need to install `CuArrays.jl`
-
-**Steps needed**
-
-1. Install [NVIDIA toolkit](https://developer.nvidia.com/cuda-downloads)
-2. Install [NVIDIA cuDNN library](https://developer.nvidia.com/cudnn)
-3. In Julia's terminal run `]add CuArrays`
+NVIDIA GPU support should work out of the box on systems with CUDA and CUDNN installed. For more details see the [CuArrays](https://github.com/JuliaGPU/CuArrays.jl) readme.

 ## GPU Usage
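As a quick sketch of what usage looks like once CuArrays is set up (a hedged example, not part of the diff; it assumes a working CUDA installation):

```julia
using Flux, CuArrays

m = Dense(10, 5) |> gpu        # move the layer's parameters to the GPU
x = rand(Float32, 10) |> gpu   # inputs must live on the GPU as well
y = m(x)                       # the forward pass now runs on the device
```

On a CPU-only machine the same code runs unchanged, since Flux's `gpu` falls back to the identity when no functional CUDA device is found.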
@@ -59,7 +59,6 @@ swish
 These layers don't affect the structure of the network but may improve training times or reduce overfitting.

 ```@docs
-Flux.testmode!
 BatchNorm
 Dropout
 AlphaDropout
@@ -101,26 +101,4 @@ m = Chain(LSTM(10, 15), Dense(15, 5))
 m.(seq)
 ```
-
-## Truncating Gradients
-
-By default, calculating the gradients in a recurrent layer involves its entire history. For example, if we call the model on 100 inputs, we'll have to calculate the gradient for those 100 calls. If we then calculate another 10 inputs we have to calculate 110 gradients – this accumulates and quickly becomes expensive.
-
-To avoid this we can *truncate* the gradient calculation, forgetting the history.
-
-```julia
-truncate!(m)
-```
-
-Calling `truncate!` wipes the slate clean, so we can call the model with more inputs without building up an expensive gradient computation.
-
-`truncate!` makes sense when you are working with multiple chunks of a large sequence, but we may also want to work with a set of independent sequences. In this case the hidden state should be completely reset to its original value, throwing away any accumulated information. `reset!` does this for you.
-
-In general, when training with recurrent layers in your model, you'll want to call `reset!` or `truncate!` for each loss calculation:
-
-```julia
-function loss(x,y)
-  l = Flux.mse(m(x), y)
-  Flux.reset!(m)
-  return l
-end
-```
+Finally, we can reset the hidden state of the cell back to its initial value using `reset!(m)`.
@@ -15,6 +15,8 @@ loss(x, y) = crossentropy(softmax(m(x)), y)
 We can regularise this by taking the (L2) norm of the parameters, `m.W` and `m.b`.

 ```julia
+using LinearAlgebra
+
 penalty() = norm(m.W) + norm(m.b)
 loss(x, y) = crossentropy(softmax(m(x)), y) + penalty()
 ```
@@ -48,15 +50,17 @@ loss(rand(28^2), rand(10))
 One can also easily add per-layer regularisation via the `activations` function:

 ```julia
+julia> using Flux: activations
+
 julia> c = Chain(Dense(10,5,σ),Dense(5,2),softmax)
-Chain(Dense(10, 5, NNlib.σ), Dense(5, 2), NNlib.softmax)
+Chain(Dense(10, 5, σ), Dense(5, 2), softmax)

 julia> activations(c, rand(10))
 3-element Array{Any,1}:
-param([0.71068, 0.831145, 0.751219, 0.227116, 0.553074])
-param([0.0330606, -0.456104])
-param([0.61991, 0.38009])
+ Float32[0.84682214, 0.6704139, 0.42177814, 0.257832, 0.36255655]
+ Float32[0.1501253, 0.073269576]
+ Float32[0.5192045, 0.48079553]

 julia> sum(norm, ans)
-2.639678767773633 (tracked)
+2.1166067f0
 ```
@@ -204,7 +204,6 @@ A 'ResNet'-type skip-connection with identity shortcut would simply be
 SkipConnection(layer, (a,b) -> a + b)
 ```
 """
-
 struct SkipConnection
   layers
   connection  # user can pass arbitrary connections here, such as (a,b) -> a + b
@@ -22,8 +22,6 @@ A Dropout layer. For each input, either sets that input to `0` (with probability
 `p`) or scales it by `1/(1-p)`. The `dims` argument specifies the unbroadcasted
 dimensions, i.e. `dims=1` does dropout along columns and `dims=2` along rows. This is
 used as a regularisation, i.e. it reduces overfitting during training. See also [`dropout`](@ref).
-
-Does nothing to the input once in [`testmode!`](@ref).
 """
 mutable struct Dropout{F,D}
   p::F
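The `dims` behaviour described in this docstring can be sketched as follows (a hedged example, not part of the diff; it assumes the `Dropout(p; dims)` constructor matching the struct above):

```julia
using Flux

d = Dropout(0.5; dims = 1)   # as documented: `dims = 1` does dropout along columns
x = ones(Float32, 3, 4)
# During training, each kept entry is rescaled by 1/(1 - p), so the
# expected value of d(x) matches x; dropped entries become 0.
y = d(x)
```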
@@ -297,7 +295,6 @@ m = Chain(Conv((3,3), 1=>32, leakyrelu;pad = 1),

 Link : https://arxiv.org/pdf/1803.08494.pdf
 """
-
 mutable struct GroupNorm{F,V,W,N,T}
   G::T  # number of groups
   λ::F  # activation function
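To illustrate the fields above, a hedged usage sketch (assuming a `GroupNorm(channels, groups)` constructor; the sizes are arbitrary):

```julia
using Flux

# 32 channels split into 4 groups of 8; the activation λ defaults to identity
m = Chain(Conv((3,3), 1=>32, leakyrelu; pad = 1),
          GroupNorm(32, 4))
x = rand(Float32, 28, 28, 1, 1)   # WHCN image batch
y = m(x)                          # each group is normalised with its own statistics
```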