Create performance tips docs section (#615)
* Create performance_tips.jl
* Rename performance_tips.jl to performance_tips.md
* add perf tips
* Update docs/src/performance_tips.md (Co-Authored-By: oxinabox <oxinabox@ucc.asn.au>)
* Update docs/src/performance_tips.md (Co-Authored-By: oxinabox <oxinabox@ucc.asn.au>)
* Update make.jl
* Update and rename performance_tips.md to performance.md
* spelling
* Update docs/src/performance.md (Co-Authored-By: oxinabox <oxinabox@ucc.asn.au>)
* Update docs/src/performance.md (Co-Authored-By: oxinabox <oxinabox@ucc.asn.au>)
* Update performance.md
* Update performance.md
* Update docs/src/performance.md (Co-Authored-By: oxinabox <oxinabox@ucc.asn.au>)
* Update docs/src/performance.md (Co-Authored-By: oxinabox <oxinabox@ucc.asn.au>)
parent 78876a14b3
commit ebf50f4e1c
docs/make.jl
@@ -19,6 +19,7 @@ makedocs(modules=[Flux, NNlib],
          "One-Hot Encoding" => "data/onehot.md",
          "GPU Support" => "gpu.md",
          "Saving & Loading" => "saving.md",
+         "Performance Tips" => "performance.md",
          "Internals" =>
            ["Backpropagation" => "internals/tracker.md"],
          "Community" => "community.md"])
docs/src/performance.md (new file)
@@ -0,0 +1,76 @@
# Performance Tips

All the usual [Julia performance tips](https://docs.julialang.org/en/v1/manual/performance-tips/) apply.
As always, [profiling your code](https://docs.julialang.org/en/v1/manual/profile/#Profiling-1) is generally a useful way of finding bottlenecks.
Below follow some Flux-specific tips and reminders.
## Don't use more precision than you need

Flux works great with all kinds of number types.
But often you do not need to be working with, say, `Float64` (let alone `BigFloat`).
Switching to `Float32` can give you a significant speedup,
not because the operations themselves are faster, but because memory usage is halved,
which makes allocations cheaper and reduces overall memory use.
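For example, a minimal sketch of working in `Float32` (the array sizes here are arbitrary):

```julia
# Generate data directly as Float32 (100 features, 10 observations).
xs = rand(Float32, 100, 10)

# Convert existing Float64 data element-wise.
ys64 = rand(100, 10)       # Float64 by default
ys32 = Float32.(ys64)      # same values, half the memory

# Float32 literals are written with the `f0` suffix.
learning_rate = 0.01f0
```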
## Make sure your custom activation functions preserve the type of their inputs

Not only should your activation functions be [type-stable](https://docs.julialang.org/en/v1/manual/performance-tips/#Write-%22type-stable%22-functions-1),
they should also preserve the type of their inputs.

A very artificial example would be an activation function like
```julia
my_tanh(x) = Float64(tanh(x))
```
which will make performance on `Float32` input orders of magnitude slower than the plain `tanh` would be,
because it forces the dense layers into slow mixed-type multiplication.
So if you change your data, say, from `Float64` to `Float32` (which should give a speedup: see above),
you will instead see a large slowdown.
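A quick way to check whether a function preserves its input type is to inspect its return type (a sketch, using `my_tanh` from above):

```julia
x = 1f0                    # a Float32 input
my_tanh(x) isa Float64     # true: the Float32 input was promoted
tanh(x)    isa Float32     # true: the builtin preserves the input type
```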
This can occur sneakily, because type promotion can be triggered simply by interacting with numeric literals.
E.g. the following will run into the same problem as above:
```julia
leaky_tanh(x) = 0.01x + tanh(x)
```
While one could change the activation function (e.g. to use `0.01f0x`) to avoid this whenever the input type changes,
the idiomatic (and safe) way is to use `oftype`:
```julia
leaky_tanh(x) = oftype(x/1, 0.01)*x + tanh(x)
```

Here `oftype(x/1, 0.01)` converts the literal `0.01` to the floating-point type of `x` (dividing by `1` first ensures a float type even if `x` is an integer), so no promotion occurs.
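As a quick sanity check of this definition:

```julia
leaky_tanh(1f0) isa Float32   # true: Float32 input stays Float32
leaky_tanh(1.0) isa Float64   # true: Float64 input stays Float64
```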
## Evaluate batches as Matrices of features, rather than sequences of Vector features

While it can sometimes be tempting to process your observations (feature vectors) one at a time, e.g.
```julia
function loss_total(xs::AbstractVector{<:Vector}, ys::AbstractVector{<:Vector})
    sum(zip(xs, ys)) do (x, y_target)
        y_pred = model(x)  # evaluate the model
        return loss(y_pred, y_target)
    end
end
```
it is much faster to concatenate them into a matrix,
as this will hit BLAS matrix-matrix multiplication, which is much faster than the equivalent sequence of matrix-vector multiplications,
even though it means allocating new memory to store the observations contiguously.
```julia
x_batch = reduce(hcat, xs)
y_batch = reduce(hcat, ys)
...
function loss_total(x_batch::Matrix, y_batch::Matrix)
    y_preds = model(x_batch)
    sum(loss.(y_preds, y_batch))
end
```
When doing this kind of concatenation, use `reduce(hcat, xs)` rather than `hcat(xs...)`.
This will avoid the splatting penalty, and will hit the optimised `reduce` method.
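A rough way to see the difference (a sketch; assumes the BenchmarkTools package is installed):

```julia
using BenchmarkTools

xs = [rand(Float32, 100) for _ in 1:1000]

@btime hcat($xs...)       # splats 1000 arguments: slow to compile and to run
@btime reduce(hcat, $xs)  # hits the specialised `reduce(hcat, ...)` method
```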