# Performance Tips

All the usual [Julia performance tips](https://docs.julialang.org/en/v1/manual/performance-tips/) apply. As always, [profiling your code](https://docs.julialang.org/en/v1/manual/profile/#Profiling-1) is generally a useful way of finding bottlenecks.
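For example, a quick profiling session with Julia's built-in `Profile` standard library might look like the sketch below, where `work` is a stand-in for whatever model code you want to measure:

```julia
using Profile

work() = sum(abs2, randn(1000, 1000))   # stand-in for your forward/backward pass

Profile.clear()
@profile for _ in 1:100
    work()
end
Profile.print()   # flat report of where time was spent
```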
Below follow some Flux-specific tips and reminders.

## Don't use more precision than you need.

Flux works great with all kinds of number types. But often you do not need to be working with, say, `Float64` (let alone `BigFloat`). Switching to `Float32` can give you a significant speed-up, not because the operations themselves are faster, but because the memory usage is halved, which makes allocations much cheaper. And you use less memory.
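For example, here is a minimal sketch (sizes made up for illustration) of keeping a computation in single precision end to end by constructing arrays as `Float32` from the start:

```julia
# Mixing Float64 data with Float32 weights silently promotes everything
# back to Float64, so keep data and parameters in Float32 throughout.
W = randn(Float32, 64, 100)   # weights created as Float32
b = zeros(Float32, 64)
x = rand(Float32, 100)        # e.g. convert existing data with Float32.(raw_x)

y = tanh.(W * x .+ b)         # stays Float32 all the way through
```

## Make sure your custom activation functions preserve the type of their inputs

Not only should your activation functions be type-stable, they should also preserve the type of their inputs. A deliberately artificial example is an activation like

```julia
my_tanh(x) = Float64(tanh(x))   # forces Float32 inputs up to Float64: avoid
```

which makes performance on `Float32` input far slower than plain `tanh`, because it forces slow mixed-type multiplication in the dense layers. Promotion can also happen sneakily via numeric literals; the following runs into the same problem, since the literal `0.01` is a `Float64`:

```julia
leaky_tanh(x) = 0.01x + tanh(x)   # 0.01 promotes a Float32 x to Float64
```

One could write the literal as `0.01f0`, but the idiomatic (and safe) fix is `oftype`:

```julia
leaky_tanh(x) = oftype(x/1, 0.01) * x + tanh(x)   # literal takes x's float type
```

## Evaluate batches as Matrices of features, rather than sequences of Vector features

It can sometimes be tempting to process your observations (feature vectors) one at a time, e.g. with a per-observation loss along these lines (a sketch; `model` and `loss` stand for your own model and per-observation loss):

```julia
function loss_total(xs::AbstractVector{<:Vector}, ys::AbstractVector{<:Vector})
    sum(zip(xs, ys)) do (x, y_target)
        y_pred = model(x)              # evaluate the model on one observation
        return loss(y_pred, y_target)
    end
end
```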
It is much faster to concatenate them into a matrix, as this will hit BLAS matrix-matrix multiplication, which is much faster than the equivalent sequence of matrix-vector multiplications, even though it means allocating new memory to store them contiguously.

```julia
x_batch = reduce(hcat, xs)
y_batch = reduce(hcat, ys)
...
function loss_total(x_batch::Matrix, y_batch::Matrix)
    y_preds = model(x_batch)           # one batched forward pass through the model
    sum(loss.(y_preds, y_batch))
end
```

When doing this kind of concatenation, use `reduce(hcat, xs)` rather than `hcat(xs...)`. This avoids the splatting penalty and hits the optimised `reduce` method.
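As a quick illustration of the difference (sizes made up):

```julia
xs = [rand(Float32, 100) for _ in 1:10_000]

x_batch = reduce(hcat, xs)    # hits the optimised reduce(hcat, ...) method
# x_batch = hcat(xs...)       # splats 10_000 arguments: much slower
```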