diff --git a/previews/PR1110/community/index.html b/previews/PR1110/community/index.html index 5ab4fc7d..8350f4cd 100644 --- a/previews/PR1110/community/index.html +++ b/previews/PR1110/community/index.html @@ -6,4 +6,4 @@ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) ga('create', 'UA-36890222-9', 'auto'); ga('send', 'pageview', {'page': location.pathname + location.search + location.hash}); -

Community

All Flux users are welcome to join our community on the Julia forum, or on Slack (channel #machine-learning). If you have questions or issues, we'll try to help you out.

If you're interested in hacking on Flux, the source code is open and easy to understand – it's all just the same Julia code you work with normally. You might be interested in our intro issues to get started.

+

Community

All Flux users are welcome to join our community on the Julia forum, or on Slack (channel #machine-learning). If you have questions or issues, we'll try to help you out.

If you're interested in hacking on Flux, the source code is open and easy to understand – it's all just the same Julia code you work with normally. You might be interested in our intro issues to get started.

diff --git a/previews/PR1110/data/dataloader/index.html b/previews/PR1110/data/dataloader/index.html index 144dc2b5..ce27eee4 100644 --- a/previews/PR1110/data/dataloader/index.html +++ b/previews/PR1110/data/dataloader/index.html @@ -29,4 +29,4 @@ end # train for 10 epochs using IterTools: ncycle -Flux.train!(loss, ps, ncycle(train_loader, 10), opt)source +Flux.train!(loss, ps, ncycle(train_loader, 10), opt)source diff --git a/previews/PR1110/data/onehot/index.html b/previews/PR1110/data/onehot/index.html index 8383a107..2fc85adf 100644 --- a/previews/PR1110/data/onehot/index.html +++ b/previews/PR1110/data/onehot/index.html @@ -37,4 +37,4 @@ julia> onecold(ans, [:a, :b, :c]) 3-element Array{Symbol,1}: :b :a - :b

Note that these operations returned OneHotVector and OneHotMatrix rather than Arrays. OneHotVectors behave like normal vectors but avoid any unnecessary cost compared to using an integer index directly. For example, multiplying a matrix with a one-hot vector simply slices out the relevant column of the matrix under the hood.

+ :b

Note that these operations returned OneHotVector and OneHotMatrix rather than Arrays. OneHotVectors behave like normal vectors but avoid any unnecessary cost compared to using an integer index directly. For example, multiplying a matrix with a one-hot vector simply slices out the relevant column of the matrix under the hood.
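
As a quick sketch of that last point (the matrix W here is made up for illustration):

using Flux

W = rand(2, 3)
v = Flux.onehot(:b, [:a, :b, :c])   # 3-element OneHotVector, hot at index 2

W * v == W[:, 2]                    # true – the product just selects the second column
Flux.onecold(v, [:a, :b, :c])       # :b – onecold is the inverse of onehot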

diff --git a/previews/PR1110/ecosystem/index.html b/previews/PR1110/ecosystem/index.html index be780635..c4f1d6d7 100644 --- a/previews/PR1110/ecosystem/index.html +++ b/previews/PR1110/ecosystem/index.html @@ -6,4 +6,4 @@ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) ga('create', 'UA-36890222-9', 'auto'); ga('send', 'pageview', {'page': location.pathname + location.search + location.hash}); -

The Julia Ecosystem

One of the main strengths of Julia lies in its ecosystem of packages, which together provide a rich and consistent user experience.

This is a non-exhaustive list of Julia packages that complement Flux nicely in typical machine learning and deep learning workflows:

This tight integration among Julia packages is shown in some of the examples in the model-zoo repository.

+

The Julia Ecosystem

One of the main strengths of Julia lies in its ecosystem of packages, which together provide a rich and consistent user experience.

This is a non-exhaustive list of Julia packages that complement Flux nicely in typical machine learning and deep learning workflows:

This tight integration among Julia packages is shown in some of the examples in the model-zoo repository.

diff --git a/previews/PR1110/gpu/index.html b/previews/PR1110/gpu/index.html index 5c4daf9e..14f1d29f 100644 --- a/previews/PR1110/gpu/index.html +++ b/previews/PR1110/gpu/index.html @@ -47,4 +47,4 @@ julia> x |> cpu 10-element Array{Float32,1}: 0.235164 ⋮ - 0.192538 + 0.192538 diff --git a/previews/PR1110/index.html b/previews/PR1110/index.html index 2c47d70f..7b7056db 100644 --- a/previews/PR1110/index.html +++ b/previews/PR1110/index.html @@ -6,4 +6,4 @@ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) ga('create', 'UA-36890222-9', 'auto'); ga('send', 'pageview', {'page': location.pathname + location.search + location.hash}); -

Flux: The Julia Machine Learning Library

Flux is a library for machine learning. It comes "batteries-included" with many useful tools built in, but also lets you use the full power of the Julia language where you need it. We follow a few key principles:

  • Doing the obvious thing. Flux has relatively few explicit APIs for features like regularisation or embeddings. Instead, writing down the mathematical form will work – and be fast.
  • You could have written Flux. All of it, from LSTMs to GPU kernels, is straightforward Julia code. When in doubt, it’s well worth looking at the source. If you need something different, you can easily roll your own.
  • Play nicely with others. Flux works well with Julia libraries from data frames and images to differential equation solvers, so you can easily build complex data processing pipelines that integrate Flux models.

Installation

Download Julia 1.0 or later, if you haven't already. You can add Flux using Julia's package manager, by typing ] add Flux at the Julia prompt.

If you have CUDA, you can also run ] add CuArrays to get GPU support; see here for more details.

Learning Flux

There are several different ways to learn Flux. If you just want to get started writing models, the model zoo gives good starting points for many common ones. This documentation provides a reference to all of Flux's APIs, as well as a from-scratch introduction to Flux's take on models and how they work. Once you understand these docs, congratulations, you also understand Flux's source code, which is intended to be concise, legible and a good reference for more advanced concepts.

+

Flux: The Julia Machine Learning Library

Flux is a library for machine learning. It comes "batteries-included" with many useful tools built in, but also lets you use the full power of the Julia language where you need it. We follow a few key principles:

  • Doing the obvious thing. Flux has relatively few explicit APIs for features like regularisation or embeddings. Instead, writing down the mathematical form will work – and be fast.
  • You could have written Flux. All of it, from LSTMs to GPU kernels, is straightforward Julia code. When in doubt, it’s well worth looking at the source. If you need something different, you can easily roll your own.
  • Play nicely with others. Flux works well with Julia libraries from data frames and images to differential equation solvers, so you can easily build complex data processing pipelines that integrate Flux models.

Installation

Download Julia 1.0 or later, if you haven't already. You can add Flux using Julia's package manager, by typing ] add Flux at the Julia prompt.

If you have CUDA, you can also run ] add CuArrays to get GPU support; see here for more details.
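
If you prefer the functional Pkg API over the ] REPL mode, the equivalent commands are:

using Pkg
Pkg.add("Flux")       # same as `] add Flux`
Pkg.add("CuArrays")   # optional, for GPU support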

Learning Flux

There are several different ways to learn Flux. If you just want to get started writing models, the model zoo gives good starting points for many common ones. This documentation provides a reference to all of Flux's APIs, as well as a from-scratch introduction to Flux's take on models and how they work. Once you understand these docs, congratulations, you also understand Flux's source code, which is intended to be concise, legible and a good reference for more advanced concepts.

diff --git a/previews/PR1110/models/advanced/index.html b/previews/PR1110/models/advanced/index.html index de0210f9..53252105 100644 --- a/previews/PR1110/models/advanced/index.html +++ b/previews/PR1110/models/advanced/index.html @@ -24,4 +24,4 @@ Params([[0.66722 0.774872 0.249809; 0.843321 0.403843 0.429232; 0.683525 0.66245 ) ps = Flux.params(m[3:end])

The Zygote.Params object ps now holds a reference to only the parameters of the layers passed to it.

During training, gradients will only be computed for (and applied to) the last Dense layer, so only that layer will have its parameters changed.
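
A minimal end-to-end sketch of this idea; the model, data and loss below are made up for illustration:

using Flux

m = Chain(Dense(10, 5, σ), Dense(5, 5, σ), Dense(5, 2), softmax)
ps = Flux.params(m[3:end])             # only the last Dense layer has trainable parameters here

x, y = rand(10), Flux.onehot(1, 1:2)
gs = gradient(() -> Flux.crossentropy(m(x), y), ps)
Flux.Optimise.update!(ADAM(), ps, gs)  # only m[3]'s weight and bias are updated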

Flux.params also takes multiple inputs, making it easy to collect parameters from heterogeneous models with a single call. For example, to omit optimising the second Dense layer in the previous example:

Flux.params(m[1], m[3:end])

Sometimes finer-grained control is needed. We can freeze a specific parameter of a specific layer that has already been collected into a Params object ps by simply deleting it from ps:

ps = params(m)
-delete!(ps, m[2].b) 
+delete!(ps, m[2].b) diff --git a/previews/PR1110/models/basics/index.html b/previews/PR1110/models/basics/index.html index cb159009..7263335a 100644 --- a/previews/PR1110/models/basics/index.html +++ b/previews/PR1110/models/basics/index.html @@ -110,4 +110,4 @@ model2(rand(10)) # => 2-element vector

This quickly starts to m(rand(10))

Likewise, Chain will happily work with any Julia function.

m = Chain(x -> x^2, x -> x+1)
 
-m(5) # => 26

Layer helpers

Flux provides a set of helpers for custom layers, which you can enable by calling

Flux.@functor Affine

This enables a useful extra set of functionality for our Affine layer, such as collecting its parameters or moving it to the GPU.

For some more helpful tricks, including parameter freezing, please check out the advanced usage guide.

Utility functions

Flux provides some utility functions to help you generate models in an automated fashion.

outdims enables you to calculate the spatial output dimensions of layers like Conv when applied to input images of a given size. Currently limited to the following layers:

Missing docstring.

Missing docstring for outdims. Check Documenter's build log for details.

+m(5) # => 26

Layer helpers

Flux provides a set of helpers for custom layers, which you can enable by calling

Flux.@functor Affine

This enables a useful extra set of functionality for our Affine layer, such as collecting its parameters or moving it to the GPU.
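
For reference, a sketch of how this fits together for a hand-written layer, assuming an Affine definition along the lines of the one shown earlier on this page:

using Flux

struct Affine
  W
  b
end

Affine(in::Integer, out::Integer) = Affine(randn(out, in), randn(out))

(m::Affine)(x) = m.W * x .+ m.b

Flux.@functor Affine

a = Affine(10, 5)
Flux.params(a)   # now collects a.W and a.b
# gpu(a)         # and the whole layer can be moved to the GPU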

For some more helpful tricks, including parameter freezing, please check out the advanced usage guide.

Utility functions

Flux provides some utility functions to help you generate models in an automated fashion.

outdims enables you to calculate the spatial output dimensions of layers like Conv when applied to input images of a given size. Currently limited to the following layers:

Missing docstring.

Missing docstring for outdims. Check Documenter's build log for details.
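
Since the docstring did not render in this build, the sketch below shows how outdims is typically called; the exact signature (assumed here to be outdims(layer, input_size), with input_size covering just the spatial dimensions) should be checked against the source.

using Flux

c = Conv((3, 3), 3 => 16)
Flux.outdims(c, (10, 10))   # expected (8, 8): a 3×3 kernel with no padding shrinks each side by 2

d = Dense(10, 5)
Flux.outdims(d, (10,))      # expected (5,)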

diff --git a/previews/PR1110/models/layers/index.html b/previews/PR1110/models/layers/index.html index cb4d98b1..adc5ce1a 100644 --- a/previews/PR1110/models/layers/index.html +++ b/previews/PR1110/models/layers/index.html @@ -50,4 +50,4 @@ size(sm(x)) == (5, 5, 11, 10)

$chs$ is the number of channels, the channel dimension of your input. For an array of N dimensions, the (N-1)th index is the channel dimension.

$G$ is the number of groups along which the statistics would be computed. The number of channels must be an integer multiple of the number of groups.

Use testmode! during inference.

Example:

m = Chain(Conv((3, 3), 1 => 32, leakyrelu; pad = 1),
          GroupNorm(32, 16)) # 32 channels, 16 groups (G = 16), thus 2 channels per group used

Link : https://arxiv.org/pdf/1803.08494.pdf

source

Testmode

Many normalisation layers behave differently under training and inference (testing). By default, Flux will automatically determine when a layer evaluation is part of training or inference. Still, depending on your use case, it may be helpful to manually specify when these layers should be treated as being trained or not. For this, Flux provides testmode!. When called on a model (e.g. a layer or chain of layers), this function will place the model into the mode specified.
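
For example, a small sketch of switching a model with dropout in and out of test mode:

using Flux

m = Chain(Dense(10, 5), Dropout(0.4), Dense(5, 2))

testmode!(m)         # force inference behaviour: dropout is disabled
y = m(rand(10))

trainmode!(m)        # force training behaviour again
testmode!(m, :auto)  # or hand control back to Flux's automatic detection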

Flux.testmode!Function
testmode!(m, mode = true)

Set a layer or model's test mode (see below). Using :auto mode will treat any gradient computation as training.

Note: if you manually set a model into test mode, you need to manually place it back into train mode during the training phase.

Possible values include:

  • false for training
  • true for testing
  • :auto or nothing for Flux to detect the mode automatically
source
Flux.trainmode!Function
trainmode!(m, mode = true)

Set a layer or model's train mode (see below). Symmetric to testmode! (i.e. trainmode!(m, mode) == testmode!(m, !mode)).

Note: if you manually set a model into train mode, you need to manually place it back into test mode during the testing phase.

Possible values include:

  • true for training
  • false for testing
  • :auto or nothing for Flux to detect the mode automatically
source

Cost Functions

Flux.maeFunction
mae(ŷ, y)

Return the mean absolute error sum(abs.(ŷ .- y)) / length(y).

source
Flux.mseFunction
mse(ŷ, y)

Return the mean squared error sum((ŷ .- y).^2) / length(y).

source
Flux.msleFunction
msle(ŷ, y; ϵ=eps(eltype(ŷ)))

Returns the mean of the squared logarithmic errors sum((log.(ŷ .+ ϵ) .- log.(y .+ ϵ)).^2) / length(y). The ϵ term provides numerical stability.

This loss penalizes an under-predicted estimate more heavily than an over-predicted one.

source
Flux.huber_lossFunction
huber_loss(ŷ, y; δ=1.0)

Computes the mean of the Huber loss between the prediction ŷ and the true values y. By default, δ is set to 1.0.

                | 0.5 * |ŷ - y|^2,            for |ŷ - y| <= δ
   Huber loss = |
-               | δ * (|ŷ - y| - 0.5 * δ),    otherwise

Huber Loss.

source
Flux.crossentropyFunction
crossentropy(ŷ, y; weight=1)

Return the crossentropy computed as -sum(y .* log.(ŷ) .* weight) / size(y, 2).

See also logitcrossentropy, binarycrossentropy.

source
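
A short usage sketch, with made-up predictions and one-hot targets:

using Flux

ŷ = softmax(randn(Float32, 3, 5))            # probabilities for 3 classes over a batch of 5
y = Flux.onehotbatch([1, 2, 3, 1, 2], 1:3)   # ground-truth labels

Flux.crossentropy(ŷ, y)
Flux.logitcrossentropy(randn(Float32, 3, 5), y)   # same idea, but applied to raw logits
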
Flux.logitcrossentropyFunction
logitcrossentropy(ŷ, y; weight=1)

Return the crossentropy computed after a softmax operation:

-sum(y .* logsoftmax(ŷ) .* weight) / size(y, 2)

See also crossentropy, binarycrossentropy.

source
Flux.binarycrossentropyFunction
binarycrossentropy(ŷ, y; ϵ=eps(ŷ))

Return -y*log(ŷ + ϵ) - (1-y)*log(1-ŷ + ϵ). The ϵ term provides numerical stability.

Typically, the prediction is given by the output of a sigmoid activation.

source
Flux.logitbinarycrossentropyFunction
logitbinarycrossentropy(ŷ, y)

logitbinarycrossentropy(ŷ, y) is mathematically equivalent to binarycrossentropy(σ(ŷ), y) but it is more numerically stable.

See also binarycrossentropy, sigmoid, logsigmoid.

source
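
Both functions are applied elementwise; a small sketch comparing the two on made-up logits:

using Flux

logits = randn(Float32, 4)
y = Float32[0, 1, 0, 1]

sum(Flux.binarycrossentropy.(σ.(logits), y))     # applies σ explicitly
sum(Flux.logitbinarycrossentropy.(logits, y))    # equivalent, but more numerically stable
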
Flux.kldivergenceFunction
kldivergence(ŷ, y)

The KL divergence is a measure of how much one probability distribution differs from another. It is always non-negative, and zero only when both distributions are equal everywhere.

KL Divergence.

source
Flux.poissonFunction
poisson(ŷ, y)

The Poisson loss function measures how the predicted distribution diverges from the expected distribution. Returns sum(ŷ .- y .* log.(ŷ)) / size(y, 2).

Poisson Loss.

source
Flux.hingeFunction
hinge(ŷ, y)

Measures the loss given the prediction and true labels y (containing 1 or -1). Returns sum((max.(0, 1 .- ŷ .* y))) / size(y, 2)

Hinge Loss See also squared_hinge.

source
Flux.squared_hingeFunction
squared_hinge(ŷ, y)

Computes the squared hinge loss given the prediction ŷ and true labels y (containing 1 or -1). Returns sum((max.(0, 1 .- ŷ .* y)).^2) / size(y, 2).

See also hinge.

source
Flux.dice_coeff_lossFunction
dice_coeff_loss(ŷ, y; smooth=1)

Loss function used in image segmentation. Calculates the loss based on the Dice coefficient, similar to the F1 score. Returns 1 - 2*sum(|ŷ .* y| + smooth) / (sum(ŷ.^2) + sum(y.^2) + smooth).

V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation

source
Flux.tversky_lossFunction
tversky_loss(ŷ, y; β=0.7)

Used with imbalanced data to give more weight to false negatives. A larger β weighs recall higher than precision (by placing more emphasis on false negatives). Returns 1 - sum(|y .* ŷ| + 1) / (sum(y .* ŷ + β*(1 .- y) .* ŷ + (1 - β)*y .* (1 .- ŷ)) + 1).

Tversky loss function for image segmentation using 3D fully convolutional deep networks

source
+ | δ*(|ŷ - y| - 0.5*δ), otherwise

Huber Loss.

source
Flux.crossentropyFunction
crossentropy(ŷ, y; weight=1)

Return the crossentropy computed as -sum(y .* log.(ŷ) .* weight) / size(y, 2).

See also logitcrossentropy, binarycrossentropy.

source
Flux.logitcrossentropyFunction
logitcrossentropy(ŷ, y; weight=1)

Return the crossentropy computed after a softmax operation:

-sum(y .* logsoftmax(ŷ) .* weight) / size(y, 2)

See also crossentropy, binarycrossentropy.

source
Flux.binarycrossentropyFunction
binarycrossentropy(ŷ, y; ϵ=eps(ŷ))

Return -y*log(ŷ + ϵ) - (1-y)*log(1-ŷ + ϵ). The ϵ term provides numerical stability.

Typically, the prediction is given by the output of a sigmoid activation.

source
Flux.logitbinarycrossentropyFunction
logitbinarycrossentropy(ŷ, y)

logitbinarycrossentropy(ŷ, y) is mathematically equivalent to binarycrossentropy(σ(ŷ), y) but it is more numerically stable.

See also binarycrossentropy, sigmoid, logsigmoid.

source
Flux.kldivergenceFunction
kldivergence(ŷ, y)

The KL divergence is a measure of how much one probability distribution differs from another. It is always non-negative, and zero only when both distributions are equal everywhere.

KL Divergence.

source
Flux.poissonFunction
poisson(ŷ, y)

The Poisson loss function measures how the predicted distribution diverges from the expected distribution. Returns sum(ŷ .- y .* log.(ŷ)) / size(y, 2).

Poisson Loss.

source
Flux.hingeFunction
hinge(ŷ, y)

Measures the loss given the prediction and true labels y (containing 1 or -1). Returns sum((max.(0, 1 .- ŷ .* y))) / size(y, 2)

Hinge Loss See also squared_hinge.

source
Flux.squared_hingeFunction
squared_hinge(ŷ, y)

Computes the squared hinge loss given the prediction ŷ and true labels y (containing 1 or -1). Returns sum((max.(0, 1 .- ŷ .* y)).^2) / size(y, 2).

See also hinge.

source
Flux.dice_coeff_lossFunction
dice_coeff_loss(ŷ, y; smooth=1)

Loss function used in image segmentation. Calculates the loss based on the Dice coefficient, similar to the F1 score. Returns 1 - 2*sum(|ŷ .* y| + smooth) / (sum(ŷ.^2) + sum(y.^2) + smooth).

V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation

source
Flux.tversky_lossFunction
tversky_loss(ŷ, y; β=0.7)

Used with imbalanced data to give more weight to false negatives. A larger β weighs recall higher than precision (by placing more emphasis on false negatives). Returns 1 - sum(|y .* ŷ| + 1) / (sum(y .* ŷ + β*(1 .- y) .* ŷ + (1 - β)*y .* (1 .- ŷ)) + 1).

Tversky loss function for image segmentation using 3D fully convolutional deep networks

source
diff --git a/previews/PR1110/models/nnlib/index.html b/previews/PR1110/models/nnlib/index.html index 3775d554..0b58a134 100644 --- a/previews/PR1110/models/nnlib/index.html +++ b/previews/PR1110/models/nnlib/index.html @@ -28,4 +28,4 @@ a = randomly sampled from uniform distribution U(l, u)

Randomized batched_adjoint(A)

Equivalent to applying transpose or adjoint to each matrix A[:,:,k].

These exist to control how batched_mul behaves, as it operates on such matrix slices of an array with ndims(A) == 3.

BatchedTranspose{T, N, S} <: AbstractBatchedMatrix{T, N}
 BatchedAdjoint{T, N, S}

Lazy wrappers analogous to Transpose and Adjoint, returned by batched_transpose

NNlib.batched_transposeFunction
batched_transpose(A::AbstractArray{T,3})
 batched_adjoint(A)

Equivalent to applying transpose or adjoint to each matrix A[:,:,k].

These exist to control how batched_mul behaves, as it operates on such matrix slices of an array with ndims(A) == 3.

BatchedTranspose{T, N, S} <: AbstractBatchedMatrix{T, N}
-BatchedAdjoint{T, N, S}

Lazy wrappers analogous to Transpose and Adjoint, returned by batched_transpose

+BatchedAdjoint{T, N, S}

Lazy wrappers analogous to Transpose and Adjoint, returned by batched_transpose
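
A small sketch of how these interact with batched_mul:

using NNlib

A = rand(3, 4, 10)    # ten 3×4 matrices
B = rand(3, 5, 10)    # ten 3×5 matrices

C = batched_mul(batched_transpose(A), B)   # transpose(A[:,:,k]) * B[:,:,k] for each k
size(C) == (4, 5, 10)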

diff --git a/previews/PR1110/models/recurrence/index.html b/previews/PR1110/models/recurrence/index.html index 4c2f2ded..fdaaa4f8 100644 --- a/previews/PR1110/models/recurrence/index.html +++ b/previews/PR1110/models/recurrence/index.html @@ -39,4 +39,4 @@ m = Flux.Recur(rnn, h) y = m(x)

The Recur wrapper stores the state between runs in the m.state field.

If you use the RNN(10, 5) constructor – as opposed to RNNCell – you'll see that it's simply a wrapped cell.

julia> RNN(10, 5)
 Recur(RNNCell(10, 5, tanh))

Sequences

Often we want to work with sequences of inputs, rather than individual xs.

seq = [rand(10) for i = 1:10]

With Recur, applying our model to each element of a sequence is trivial:

m.(seq) # returns a list of 5-element vectors

This works even when we've chained recurrent layers into a larger model.

m = Chain(LSTM(10, 15), Dense(15, 5))
-m.(seq)

Finally, we can reset the hidden state of the cell back to its initial value using reset!(m).

+m.(seq)

Finally, we can reset the hidden state of the cell back to its initial value using reset!(m).
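
One common pattern, sketched here with a made-up loss and data, is to reset the state at the start of each sequence:

using Flux

m = Chain(LSTM(10, 15), Dense(15, 5))

function loss(seq, targets)
  Flux.reset!(m)                    # start every sequence from the initial hidden state
  sum(Flux.mse(m(x), y) for (x, y) in zip(seq, targets))
end

seq     = [rand(10) for _ = 1:10]
targets = [rand(5)  for _ = 1:10]
loss(seq, targets)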

diff --git a/previews/PR1110/models/regularisation/index.html b/previews/PR1110/models/regularisation/index.html index 3b49c5d3..d64c5aee 100644 --- a/previews/PR1110/models/regularisation/index.html +++ b/previews/PR1110/models/regularisation/index.html @@ -36,4 +36,4 @@ julia> activations(c, rand(10)) Float32[0.5192045, 0.48079553] julia> sum(norm, ans) -2.1166067f0 +2.1166067f0 diff --git a/previews/PR1110/performance/index.html b/previews/PR1110/performance/index.html index 84ffe14e..ee643a11 100644 --- a/previews/PR1110/performance/index.html +++ b/previews/PR1110/performance/index.html @@ -17,4 +17,4 @@ y_batch = reduce(hcat, ys) function loss_total(x_batch::Matrix, y_batch::Matrix) y_preds = model(x_batch) sum(loss.(y_preds, y_batch)) -end

When doing this kind of concatenation, use reduce(hcat, xs) rather than hcat(xs...). This avoids the splatting penalty and hits the optimised reduce method.

+end

When doing this kind of concatenation, use reduce(hcat, xs) rather than hcat(xs...). This avoids the splatting penalty and hits the optimised reduce method.
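
For example, with a thousand made-up feature vectors:

xs = [rand(Float32, 10) for _ = 1:1000]

x_batch = reduce(hcat, xs)    # 10×1000 Matrix{Float32}, via the specialised reduce method
# hcat(xs...)                 # avoid: splats 1000 arguments into one call
size(x_batch) == (10, 1000)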

diff --git a/previews/PR1110/saving/index.html b/previews/PR1110/saving/index.html index 8eaca004..5dbbaaed 100644 --- a/previews/PR1110/saving/index.html +++ b/previews/PR1110/saving/index.html @@ -47,4 +47,4 @@ evalcb = throttle(30) do # Show loss @save "model-checkpoint.bson" model end

This will update the "model-checkpoint.bson" file every thirty seconds.

You can get more advanced by saving a series of models throughout training, for example

@save "model-$(now()).bson" model

will produce a series of models like "model-2018-03-06T02:57:10.41.bson". You could also store the current test set loss, so that it's easy to (for example) revert to an older copy of the model if it starts to overfit.

@save "model-$(now()).bson" model loss = testloss()

You can even store optimiser state alongside the model, to resume training exactly where you left off.

opt = ADAM()
-@save "model-$(now()).bson" model opt
+@save "model-$(now()).bson" model opt diff --git a/previews/PR1110/search/index.html b/previews/PR1110/search/index.html index 5ce84700..a37f2088 100644 --- a/previews/PR1110/search/index.html +++ b/previews/PR1110/search/index.html @@ -6,4 +6,4 @@ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) ga('create', 'UA-36890222-9', 'auto'); ga('send', 'pageview', {'page': location.pathname + location.search + location.hash}); -

Loading search...

    +

    Loading search...

      diff --git a/previews/PR1110/training/optimisers/index.html b/previews/PR1110/training/optimisers/index.html index 43cb751a..07aeeafd 100644 --- a/previews/PR1110/training/optimisers/index.html +++ b/previews/PR1110/training/optimisers/index.html @@ -81,4 +81,4 @@ for t = 1:10^5 end loss(rand(10)) # around 0.9

      In this manner it is possible to compose optimisers for some added flexibility.
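
For instance, weight decay can be combined with ADAM like this (loss, ps and data stand in for your own setup):

using Flux

opt = Optimiser(WeightDecay(1e-4), ADAM(0.001))   # WeightDecay adjusts each gradient, then ADAM takes its step
Flux.train!(loss, ps, data, opt)                  # used like any other optimiser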

      Decays

      Similar to optimisers, Flux also defines some simple decays that can be used in conjunction with other optimisers, or standalone.

      Flux.Optimise.ExpDecayType
      ExpDecay(eta, decay, decay_step, clip)

Discount the learning rate eta by the multiplicative factor decay every decay_step steps, until a minimum of clip is reached.

      Parameters

      • Learning Rate (eta): Defaults to 0.001.
      • decay: Factor by which the learning rate is discounted. Defaults to 0.1.
      • decay_step: Schedules decay operations by setting number of steps between two decay operations. Defaults to 1000.
      • clip: Minimum value of learning rate. Defaults to 1e-4.

      Example

      To apply exponential decay to an optimiser:

      Optimiser(ExpDecay(..), Opt(..))
      -opt = Optimiser(ExpDecay(), ADAM())
      source
      Flux.Optimise.InvDecayType
      InvDecay(γ)

      Applies inverse time decay to an optimiser, i.e., the effective step size at iteration n is eta / (1 + γ * n) where eta is the initial step size. The wrapped optimiser's step size is not modified.

      Parameters

      • gamma (γ): Defaults to 0.001

      Example

      Optimiser(InvDecay(..), Opt(..))
      source
      Flux.Optimise.WeightDecayType
      WeightDecay(wd)

      Decays the weight by wd

      Parameters

      • weight decay (wd): 0
      source
      +opt = Optimiser(ExpDecay(), ADAM())source
      Flux.Optimise.InvDecayType
      InvDecay(γ)

      Applies inverse time decay to an optimiser, i.e., the effective step size at iteration n is eta / (1 + γ * n) where eta is the initial step size. The wrapped optimiser's step size is not modified.

      Parameters

      • gamma (γ): Defaults to 0.001

      Example

      Optimiser(InvDecay(..), Opt(..))
      source
      Flux.Optimise.WeightDecayType
      WeightDecay(wd)

      Decays the weight by wd

      Parameters

      • weight decay (wd): 0
      source
      diff --git a/previews/PR1110/training/training/index.html b/previews/PR1110/training/training/index.html index ec4d058b..7c5973fb 100644 --- a/previews/PR1110/training/training/index.html +++ b/previews/PR1110/training/training/index.html @@ -51,4 +51,4 @@ end

      You could simplify this further, for example by hard-coding in the loss function.

      +end

      You could simplify this further, for example by hard-coding in the loss function.