diff --git a/previews/PR1152/community/index.html b/previews/PR1152/community/index.html index c8afbf68..4fe8a611 100644 --- a/previews/PR1152/community/index.html +++ b/previews/PR1152/community/index.html @@ -6,4 +6,4 @@ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) ga('create', 'UA-36890222-9', 'auto'); ga('send', 'pageview', {'page': location.pathname + location.search + location.hash}); -

Community

All Flux users are welcome to join our community on the Julia forum, or on Slack (channel #machine-learning). If you have questions or issues, we'll try to help you out.

If you're interested in hacking on Flux, the source code is open and easy to understand – it's all just the same Julia code you work with normally. You might be interested in our intro issues to get started.

diff --git a/previews/PR1152/data/dataloader/index.html b/previews/PR1152/data/dataloader/index.html index 8e000186..759779f7 100644 --- a/previews/PR1152/data/dataloader/index.html +++ b/previews/PR1152/data/dataloader/index.html @@ -36,4 +36,4 @@ end # train for 10 epochs using IterTools: ncycle -Flux.train!(loss, ps, ncycle(train_loader, 10), opt)source +Flux.train!(loss, ps, ncycle(train_loader, 10), opt)source diff --git a/previews/PR1152/data/onehot/index.html b/previews/PR1152/data/onehot/index.html index c5eb5495..c22c62a5 100644 --- a/previews/PR1152/data/onehot/index.html +++ b/previews/PR1152/data/onehot/index.html @@ -55,4 +55,4 @@ julia> onecold(ans, [:a, :b, :c]) 3×3 Flux.OneHotMatrix{Array{Flux.OneHotVector,1}}: 0 1 0 1 0 1 - 0 0 0source + 0 0 0source diff --git a/previews/PR1152/datasets/index.html b/previews/PR1152/datasets/index.html index 429f1309..47af4917 100644 --- a/previews/PR1152/datasets/index.html +++ b/previews/PR1152/datasets/index.html @@ -26,4 +26,4 @@ julia> labels[1] images(:test)

Load the MNIST images.

Each image is a 28×28 array of Gray colour values (see Colors.jl).

Return the 60,000 training images by default; pass :test to retrieve the 10,000 test images.

source
Flux.Data.MNIST.labelsMethod
labels()
 labels(:test)

Load the labels corresponding to each of the images returned from images(). Each label is a number from 0-9.

Return the 60,000 training labels by default; pass :test to retrieve the 10,000 test labels.

source
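For example, loading both the images and the labels (a minimal sketch; the return values are as described in the docstrings above):

using Flux.Data: MNIST

train_imgs   = MNIST.images()         # 60,000 images, each a 28×28 array of Gray values
train_labels = MNIST.labels()         # 60,000 labels, integers in 0:9
test_imgs    = MNIST.images(:test)    # 10,000 test images
test_labels  = MNIST.labels(:test)    # 10,000 test labels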
Flux.Data.FashionMNIST.imagesMethod
images()
 images(:test)

Load the Fashion-MNIST images.

Each image is a 28×28 array of Gray colour values (see Colors.jl).

Return the 60,000 training images by default; pass :test to retrieve the 10,000 test images.

source
Flux.Data.FashionMNIST.labelsMethod
labels()
-labels(:test)

Load the labels corresponding to each of the images returned from images(). Each label is a number from 0-9.

Return the 60,000 training labels by default; pass :test to retrieve the 10,000 test labels.

source
Flux.Data.CMUDict.phonesMethod
phones()

Return a Vector containing the phones used in the CMU Pronouncing Dictionary.

source
Flux.Data.CMUDict.symbolsMethod
symbols()

Return a Vector containing the symbols used in the CMU Pronouncing Dictionary. A symbol is a phone with optional auxiliary symbols, indicating, for example, the amount of stress on the phone.

source
Flux.Data.CMUDict.rawdictMethod
rawdict()

Return the unfiltered CMU Pronouncing Dictionary.

source
Flux.Data.CMUDict.cmudictMethod
cmudict()

Return a filtered CMU Pronouncing Dictionary.

It is filtered so each word contains only ASCII characters and a combination of word characters (as determined by the regex engine using \w), '-' and '.'.

source
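A brief usage sketch (the element types of the returned collections are an assumption here):

using Flux.Data: CMUDict

d = CMUDict.cmudict()    # filtered dictionary, mapping each word to its phones
CMUDict.phones()         # Vector of phones used in the dictionary
CMUDict.symbols()        # Vector of symbols, i.e. phones plus auxiliary markers such as stress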
Flux.Data.Sentiment.trainMethod
train()

Return the train split of the Stanford Sentiment Treebank. The data is in treebank format.

source
Flux.Data.Sentiment.testMethod
test()

Return the test split of the Stanford Sentiment Treebank. The data is in treebank format.

source
Flux.Data.Sentiment.devMethod
dev()

Return the dev split of the Stanford Sentiment Treebank. The data is in treebank format.

source
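Loading all three splits looks like this (a sketch; each split is in treebank format, as noted above):

using Flux.Data: Sentiment

train_trees = Sentiment.train()   # training split
dev_trees   = Sentiment.dev()     # dev split
test_trees  = Sentiment.test()    # test split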
+labels(:test)

diff --git a/previews/PR1152/ecosystem/index.html b/previews/PR1152/ecosystem/index.html index 879d4aca..b2361914 100644 --- a/previews/PR1152/ecosystem/index.html +++ b/previews/PR1152/ecosystem/index.html @@ -6,4 +6,4 @@ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) ga('create', 'UA-36890222-9', 'auto'); ga('send', 'pageview', {'page': location.pathname + location.search + location.hash}); -

The Julia Ecosystem

One of Julia's main strengths lies in its ecosystem of packages, which together provide a rich and consistent user experience.

The following is a non-exhaustive list of Julia packages that complement Flux nicely in typical machine learning and deep learning workflows:

This tight integration among Julia packages is shown in some of the examples in the model-zoo repository.

diff --git a/previews/PR1152/gpu/index.html b/previews/PR1152/gpu/index.html index b48269d5..fde61302 100644 --- a/previews/PR1152/gpu/index.html +++ b/previews/PR1152/gpu/index.html @@ -47,4 +47,4 @@ julia> x |> cpu 10-element Array{Float32,1}: 0.235164 ⋮ - 0.192538 + 0.192538 diff --git a/previews/PR1152/index.html b/previews/PR1152/index.html index 8b769a12..f9ad903d 100644 --- a/previews/PR1152/index.html +++ b/previews/PR1152/index.html @@ -6,4 +6,4 @@ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) ga('create', 'UA-36890222-9', 'auto'); ga('send', 'pageview', {'page': location.pathname + location.search + location.hash}); -

Flux: The Julia Machine Learning Library

Flux is a library for machine learning. It comes "batteries-included" with many useful tools built in, but also lets you use the full power of the Julia language where you need it. We follow a few key principles:

  • Doing the obvious thing. Flux has relatively few explicit APIs for features like regularisation or embeddings. Instead, writing down the mathematical form will work – and be fast.
  • You could have written Flux. All of it, from LSTMs to GPU kernels, is straightforward Julia code. When in doubt, it’s well worth looking at the source. If you need something different, you can easily roll your own.
  • Play nicely with others. Flux works well with Julia libraries from data frames and images to differential equation solvers, so you can easily build complex data processing pipelines that integrate Flux models.

Installation

Download Julia 1.0 or later, if you haven't already. You can add Flux using Julia's package manager, by typing ] add Flux at the Julia prompt.

If you have CUDA you can also run ] add CuArrays to get GPU support; see here for more details.
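For example, from the Julia REPL (pressing ] switches to the package-manager prompt):

pkg> add Flux
pkg> add CuArrays    # optional, for GPU support

or, equivalently, from a script:

using Pkg
Pkg.add("Flux")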

Learning Flux

There are several different ways to learn Flux. If you just want to get started writing models, the model zoo gives good starting points for many common ones. This documentation provides a reference to all of Flux's APIs, as well as a from-scratch introduction to Flux's take on models and how they work. Once you understand these docs, congratulations, you also understand Flux's source code, which is intended to be concise, legible and a good reference for more advanced concepts.

diff --git a/previews/PR1152/models/advanced/index.html b/previews/PR1152/models/advanced/index.html index d5e1dd91..a351b8ae 100644 --- a/previews/PR1152/models/advanced/index.html +++ b/previews/PR1152/models/advanced/index.html @@ -24,4 +24,4 @@ Params([[0.66722 0.774872 0.249809; 0.843321 0.403843 0.429232; 0.683525 0.66245 ) ps = Flux.params(m[3:end])

The Zygote.Params object ps now holds a reference to only the parameters of the layers passed to it.

During training, gradients will only be computed for (and applied to) the last Dense layer, so only that layer's parameters will be changed.

Flux.params also takes multiple inputs to make it easy to collect parameters from heterogeneous models with a single call. For instance, if we wanted to omit optimising the second Dense layer in the previous example, it would look something like this:

Flux.params(m[1], m[3:end])

Sometimes finer-grained control is needed. We can freeze a specific parameter of a specific layer that has already been added to a Params object ps by simply deleting it from ps:

ps = params(m)
-delete!(ps, m[2].b) 
+delete!(ps, m[2].b) diff --git a/previews/PR1152/models/basics/index.html b/previews/PR1152/models/basics/index.html index 49b0cf64..302f369d 100644 --- a/previews/PR1152/models/basics/index.html +++ b/previews/PR1152/models/basics/index.html @@ -115,4 +115,4 @@ outdims(m, (10, 10)) == (6, 6)source
outdims(l::Conv, isize::Tuple)

Calculate the output dimensions given the input dimensions isize. Batch size and channel size are ignored as per NNlib.jl.

m = Conv((3, 3), 3 => 16)
 outdims(m, (10, 10)) == (8, 8)
-outdims(m, (10, 10, 1, 3)) == (8, 8)
source
+outdims(m, (10, 10, 1, 3)) == (8, 8)source diff --git a/previews/PR1152/models/layers/index.html b/previews/PR1152/models/layers/index.html index 93d34332..87e652e2 100644 --- a/previews/PR1152/models/layers/index.html +++ b/previews/PR1152/models/layers/index.html @@ -95,4 +95,4 @@ Huber loss = | 3-element Array{Float64,1}: 1.4243970973475661 0.35231664672364094 - 0.8616703662235443source
Flux.kldivergenceFunction
kldivergence(ŷ, y)

Return the Kullback-Leibler divergence between the given probability distributions.

KL divergence is a measure of how much one probability distribution differs from another. It is always non-negative, and zero only when both distributions are equal everywhere.

source
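For example (assuming, as with Flux's other losses, that ŷ and y hold probability distributions column-wise, one sample per column):

ŷ = softmax(rand(Float32, 5, 3))   # predicted distributions for 3 samples
y = softmax(rand(Float32, 5, 3))   # target distributions
Flux.kldivergence(ŷ, y)            # non-negative; zero only if ŷ == y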
Flux.poissonFunction
poisson(ŷ, y)

Return how much the predicted distribution diverges from the expected Poisson distribution y; calculated as sum(ŷ .- y .* log.(ŷ)) / size(y, 2).

More information..

source
Flux.hingeFunction
hinge(ŷ, y)

Return the hinge loss given the prediction and true labels y (containing 1 or -1); calculated as sum(max.(0, 1 .- ŷ .* y)) / size(y, 2).

See also: squared_hinge

source
Flux.squared_hingeFunction
squared_hinge(ŷ, y)

Return the squared hinge loss given the prediction and true labels y (containing 1 or -1); calculated as sum((max.(0, 1 .- ŷ .* y)).^2) / size(y, 2).

See also: hinge

source
Flux.dice_coeff_lossFunction
dice_coeff_loss(ŷ, y; smooth=1)

Return a loss based on the dice coefficient. Used in the V-Net image segmentation architecture. Similar to the F1_score. Calculated as: 1 - 2*sum(|ŷ .* y| + smooth) / (sum(ŷ.^2) + sum(y.^2) + smooth)

source
Flux.tversky_lossFunction
tversky_loss(ŷ, y; β=0.7)

Return the Tversky loss. Used with imbalanced data to give more weight to false negatives. A larger β weighs recall higher than precision (by placing more emphasis on false negatives). Calculated as: 1 - sum(|y .* ŷ| + 1) / (sum(y .* ŷ + β*(1 .- y) .* ŷ + (1 - β)*y .* (1 .- ŷ)) + 1)

source
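A rough usage sketch for both segmentation losses (the shapes are illustrative; ŷ would typically be a model's output and y a binary mask, stored one sample per column):

ŷ = rand(Float32, 128, 4)          # predictions for 4 samples
y = rand(Bool, 128, 4)             # ground-truth masks
Flux.dice_coeff_loss(ŷ, y)         # scalar loss
Flux.tversky_loss(ŷ, y; β=0.7)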
+ 0.8616703662235443source
diff --git a/previews/PR1152/models/nnlib/index.html b/previews/PR1152/models/nnlib/index.html index 2ede29e6..018b4734 100644 --- a/previews/PR1152/models/nnlib/index.html +++ b/previews/PR1152/models/nnlib/index.html @@ -28,4 +28,4 @@ a = randomly sampled from uniform distribution U(l, u)

batched_adjoint(A)

Equivalent to applying transpose or adjoint to each matrix A[:,:,k].

These exist to control how batched_mul behaves, as it operates on such matrix slices of an array with ndims(A)==3.

BatchedTranspose{T, N, S} <: AbstractBatchedMatrix{T, N}
 BatchedAdjoint{T, N, S}

Lazy wrappers analogous to Transpose and Adjoint, returned by batched_transpose

NNlib.batched_transposeFunction
batched_transpose(A::AbstractArray{T,3})
 batched_adjoint(A)

Equivalent to applying transpose or adjoint to each matrix A[:,:,k].

These exist to control how batched_mul behaves, as it operates on such matrix slices of an array with ndims(A)==3.

BatchedTranspose{T, N, S} <: AbstractBatchedMatrix{T, N}
-BatchedAdjoint{T, N, S}

Lazy wrappers analogous to Transpose and Adjoint, returned by batched_transpose

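For instance (a small sketch using NNlib directly; the sizes are illustrative):

using NNlib: batched_transpose, batched_mul

A  = rand(2, 3, 4)
At = batched_transpose(A)    # lazy wrapper, acts like a 3×2×4 array
B  = rand(2, 5, 4)
C  = batched_mul(At, B)      # 3×5×4: transpose(A[:,:,k]) * B[:,:,k] for each k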
+BatchedAdjoint{T, N, S}

diff --git a/previews/PR1152/models/recurrence/index.html b/previews/PR1152/models/recurrence/index.html index 8b4d9178..35c6a0e0 100644 --- a/previews/PR1152/models/recurrence/index.html +++ b/previews/PR1152/models/recurrence/index.html @@ -39,4 +39,4 @@ m = Flux.Recur(rnn, h) y = m(x)

The Recur wrapper stores the state between runs in the m.state field.

If you use the RNN(10, 5) constructor – as opposed to RNNCell – you'll see that it's simply a wrapped cell.

julia> RNN(10, 5)
 Recur(RNNCell(10, 5, tanh))

Sequences

Often we want to work with sequences of inputs, rather than individual xs.

seq = [rand(10) for i = 1:10]

With Recur, applying our model to each element of a sequence is trivial:

m.(seq) # returns a list of 5-element vectors

This works even when we've chained recurrent layers into a larger model.

m = Chain(LSTM(10, 15), Dense(15, 5))
-m.(seq)

Finally, we can reset the hidden state of the cell back to its initial value using reset!(m).

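For example:

Flux.reset!(m)   # hidden state goes back to its initial value
m.(seq)          # process the sequence again from a fresh state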
+m.(seq)

diff --git a/previews/PR1152/models/regularisation/index.html b/previews/PR1152/models/regularisation/index.html index 5009da7d..2dc42d42 100644 --- a/previews/PR1152/models/regularisation/index.html +++ b/previews/PR1152/models/regularisation/index.html @@ -36,4 +36,4 @@ julia> activations(c, rand(10)) Float32[0.5192045, 0.48079553] julia> sum(norm, ans) -2.1166067f0
Flux.activationsFunction
activations(c::Chain, input)

Calculate the forward results of each layer in the Chain c, using input as the model input.

source
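A quick sketch of its use, mirroring the example in this section:

c = Chain(Dense(10, 5, σ), Dense(5, 2), softmax)
Flux.activations(c, rand(10))   # the output of each layer in turn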
+2.1166067f0
diff --git a/previews/PR1152/performance/index.html b/previews/PR1152/performance/index.html index f317c5fd..f272f89e 100644 --- a/previews/PR1152/performance/index.html +++ b/previews/PR1152/performance/index.html @@ -17,4 +17,4 @@ y_batch = reduce(hcat, ys) function loss_total(x_batch::Matrix, y_batch::Matrix) y_preds = model(x_batch) sum(loss.(y_preds, y_batch)) -end

When doing this kind of concatenation, use reduce(hcat, xs) rather than hcat(xs...). This avoids the splatting penalty and hits the optimised reduce method.

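For example (illustrative sizes):

xs = [rand(Float32, 10) for _ in 1:1000]
x_batch = reduce(hcat, xs)    # 10×1000 Matrix, without the splatting cost of hcat(xs...)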
+end

diff --git a/previews/PR1152/saving/index.html b/previews/PR1152/saving/index.html index 10f055ac..92c64851 100644 --- a/previews/PR1152/saving/index.html +++ b/previews/PR1152/saving/index.html @@ -47,4 +47,4 @@ evalcb = throttle(30) do # Show loss @save "model-checkpoint.bson" model end

This will update the "model-checkpoint.bson" file every thirty seconds.

You can get more advanced by saving a series of models throughout training, for example

@save "model-$(now()).bson" model

will produce a series of models like "model-2018-03-06T02:57:10.41.bson". You could also store the current test set loss, so that it's easy to (for example) revert to an older copy of the model if it starts to overfit.

@save "model-$(now()).bson" model loss = testloss()

You can even store optimiser state alongside the model, to resume training exactly where you left off.

opt = ADAM()
-@save "model-$(now()).bson" model opt
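To pick training back up later, the saved objects can be restored with BSON's @load macro (a sketch; substitute the actual file name you saved):

using BSON: @load
@load "model-checkpoint.bson" model   # add opt here as well if it was saved alongside the model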
+@save "model-$(now()).bson" model opt diff --git a/previews/PR1152/search/index.html b/previews/PR1152/search/index.html index 37b16c01..dbdac208 100644 --- a/previews/PR1152/search/index.html +++ b/previews/PR1152/search/index.html @@ -6,4 +6,4 @@ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) ga('create', 'UA-36890222-9', 'auto'); ga('send', 'pageview', {'page': location.pathname + location.search + location.hash}); -

Loading search...

      diff --git a/previews/PR1152/training/optimisers/index.html b/previews/PR1152/training/optimisers/index.html index 6513977e..e7fd53ce 100644 --- a/previews/PR1152/training/optimisers/index.html +++ b/previews/PR1152/training/optimisers/index.html @@ -88,4 +88,4 @@ end loss(rand(10)) # around 0.9

      In this manner it is possible to compose optimisers for some added flexibility.

      Decays

      Similar to optimisers, Flux also defines some simple decays that can be used in conjunction with other optimisers, or standalone.

      Flux.Optimise.ExpDecayType
      ExpDecay(η = 0.001, decay = 0.1, decay_step = 1000, clip = 1e-4)

Discount the learning rate η by the factor decay every decay_step steps, until a minimum of clip is reached.

      Parameters

      • Learning rate (η): Amount by which gradients are discounted before updating the weights.
      • decay: Factor by which the learning rate is discounted.
      • decay_step: Schedule decay operations by setting the number of steps between two decay operations.
      • clip: Minimum value of learning rate.

      Examples

      To apply exponential decay to an optimiser:

      Optimiser(ExpDecay(..), Opt(..))
       
      -opt = Optimiser(ExpDecay(), ADAM())
      source
      Flux.Optimise.InvDecayType
      InvDecay(γ = 0.001)

Apply inverse time decay to an optimiser, so that the effective step size at iteration n is η / (1 + γ * n), where η is the initial step size. The wrapped optimiser's step size is not modified.

      Examples

      Optimiser(InvDecay(..), Opt(..))
      source
      Flux.Optimise.WeightDecayType
      WeightDecay(wd = 0)

      Decay weights by wd.

      Parameters

      • Weight decay (wd)
      source
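For example, WeightDecay is typically composed with another optimiser, just like the decay schedules above (the 1e-4 value is only illustrative):

opt = Optimiser(WeightDecay(1e-4), ADAM())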
      +opt = Optimiser(ExpDecay(), ADAM())source
      diff --git a/previews/PR1152/training/training/index.html b/previews/PR1152/training/training/index.html index 5abe1fda..b25f88cb 100644 --- a/previews/PR1152/training/training/index.html +++ b/previews/PR1152/training/training/index.html @@ -55,4 +55,4 @@ end

      You could simplify this further, for example by hard-coding in the loss function.

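A hypothetical sketch of such a loop, with the loss written inline (names like model, data and opt stand in for whatever the surrounding example defines):

ps = Flux.params(model)
for (x, y) in data
    gs = gradient(ps) do
        Flux.mse(model(x), y)        # loss hard-coded in the loop
    end
    Flux.Optimise.update!(opt, ps, gs)
end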
      +end

      diff --git a/previews/PR1152/utilities/index.html b/previews/PR1152/utilities/index.html index c368ccae..ffb8588a 100644 --- a/previews/PR1152/utilities/index.html +++ b/previews/PR1152/utilities/index.html @@ -96,4 +96,4 @@ julia> θ ...

      The second return value re allows you to reconstruct the original network after making modifications to the weight vector (for example, with a hypernetwork).

      julia> re(θ .* 2)
       Chain(Dense(10, 5, σ), Dense(5, 2), softmax)
      source

      Callback Helpers

      Flux.throttleFunction
      throttle(f, timeout; leading=true, trailing=false)

Return a function that, when invoked, will be triggered at most once during timeout seconds.

      Normally, the throttled function will run as much as it can, without ever going more than once per wait duration; but if you'd like to disable the execution on the leading edge, pass leading=false. To enable execution on the trailing edge, pass trailing=true.

      source
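For example (a small sketch, in the same spirit as the checkpointing callback on the saving page):

evalcb = Flux.throttle(30) do
    @info "evaluating..."    # runs at most once every 30 seconds, however often the callback fires
end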
      Flux.Optimise.stopFunction
      stop()

      Call Flux.stop() in a callback to indicate when a callback condition is met. This will trigger the train loop to stop and exit.

      Examples

      cb = function ()
         accuracy() > 0.9 && Flux.stop()
      -end
      source
      +endsource