Tell people sto stop using global models

2020-03-14 18:09:06 +00:00
1 changed files with 53 additions and 0 deletions
--- a/docs/src/performance.md
+++ b/docs/src/performance.md
@ -4,6 +4,59 @@ All the usual [Julia performance tips apply](https://docs.julialang.org/en/v1/ma
 As always [profiling your code](https://docs.julialang.org/en/v1/manual/profile/#Profiling-1) is generally a useful way of finding bottlenecks.
 Below follow some Flux specific tips/reminders.

+## Don't write loss functions that use a non-constant globally declared model.
+This is a special case of one of the most important [Julia Performance Tips](https://docs.julialang.org/en/v1/manual/performance-tips/#Avoid-global-variables-1).
+Non-constant global variables are slow.
+We repeat it here as it is a common mistake.
+
+This advice is appliable also to writing callbacks, and more generally to all Julia code.
+
+### Don't write:
+```julia
+data = ...
+m = Chain(Dense(784, 32, σ), Dense(32, 10), softmax)
+loss(x, y) = Flux.mse(m(x), y)
+
+Flux.train!(loss, Flux.params(m), data, Descent(0.1))
+```
+In this bad code, the model `m` is a non-constant global.
+It is being used inside the function `loss`, which is one of the most performance critical parts of this code.
+It will be slow, as the compiler can't rely on `m` always being the same type -- it is a mutable global, it could change at any time.
+
+### Correct alternatives:
+#### Mark the model `const`
+```julia
+data = ...
+const m = Chain(Dense(784, 32, σ), Dense(32, 10), softmax)
+loss(x, y) = Flux.mse(m(x), y)
+
+Flux.train!(loss, Flux.params(m), data, Descent(0.1))
+```
+Similarly anything else that is a non-constant global that is used in functions should also be made constant
+
+#### Put everything in a main function:
+For more flexibility, you could even make this take `m` as a argument -- it doesn't matter of `m` was originally declared as a non-const global once it has been passed in as a argument because it then becomes a local variable.
+```julia
+function main(data)
+    m = Chain(Dense(784, 32, σ), Dense(32, 10), softmax)
+    loss(x, y) = Flux.mse(m(x), y)
+
+    Flux.train!(loss, Flux.params(m), data, Descent(0.1))
+end
+```
+
+#### Make the loss function actually close over `m`.
+Closures can be very useful.
+
+```julia
+data = ...
+m = Chain(Dense(784, 32, σ), Dense(32, 10), softmax)
+get_loss_function(mdl) = (x, y) -> Flux.mse(mdl(x), y)
+
+Flux.train!(get_loss_function(m), Flux.params(m), data, Descent(0.1))
+```
+This example is particularly applicable to callbacks.
+
 ## Don't use more precision than you need

 Flux works great with all kinds of number types.