From dc6cd29bc9b42e38231f152871caffbdf06c950e Mon Sep 17 00:00:00 2001 From: zeptodoctor <44736852+zeptodoctor@users.noreply.github.com> Date: Tue, 12 May 2020 15:14:25 +0000 Subject: [PATCH] build based on a84e08c --- dev/community/index.html | 2 +- dev/data/dataloader/index.html | 2 +- dev/data/onehot/index.html | 6 ++--- dev/datasets/index.html | 12 +++++----- dev/ecosystem/index.html | 2 +- dev/gpu/index.html | 2 +- dev/index.html | 2 +- dev/models/advanced/index.html | 2 +- dev/models/basics/index.html | 6 ++--- dev/models/layers/index.html | 34 ++++++++++++++-------------- dev/models/nnlib/index.html | 2 +- dev/models/recurrence/index.html | 2 +- dev/models/regularisation/index.html | 2 +- dev/performance/index.html | 2 +- dev/saving/index.html | 2 +- dev/search/index.html | 2 +- dev/training/optimisers/index.html | 30 ++++++++++++------------ dev/training/training/index.html | 6 ++--- dev/utilities/index.html | 24 ++++++++++---------- 19 files changed, 71 insertions(+), 71 deletions(-) diff --git a/dev/community/index.html b/dev/community/index.html index 1632b5d2..37583e39 100644 --- a/dev/community/index.html +++ b/dev/community/index.html @@ -6,4 +6,4 @@ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) ga('create', 'UA-36890222-9', 'auto'); ga('send', 'pageview', {'page': location.pathname + location.search + location.hash}); -

Community

All Flux users are welcome to join our community on the Julia forum, or the slack (channel #machine-learning). If you have questions or issues we'll try to help you out.

If you're interested in hacking on Flux, the source code is open and easy to understand – it's all just the same Julia code you work with normally. You might be interested in our intro issues to get started.

+

Community

All Flux users are welcome to join our community on the Julia forum, or the slack (channel #machine-learning). If you have questions or issues we'll try to help you out.

If you're interested in hacking on Flux, the source code is open and easy to understand – it's all just the same Julia code you work with normally. You might be interested in our intro issues to get started.

diff --git a/dev/data/dataloader/index.html b/dev/data/dataloader/index.html index 2b63c850..93fe4def 100644 --- a/dev/data/dataloader/index.html +++ b/dev/data/dataloader/index.html @@ -29,4 +29,4 @@ end # train for 10 epochs using IterTools: ncycle -Flux.train!(loss, ps, ncycle(train_loader, 10), opt)source +Flux.train!(loss, ps, ncycle(train_loader, 10), opt)source diff --git a/dev/data/onehot/index.html b/dev/data/onehot/index.html index cececb72..469c5a85 100644 --- a/dev/data/onehot/index.html +++ b/dev/data/onehot/index.html @@ -35,11 +35,11 @@ julia> Flux.onehot(:c, [:a, :b, :c]) 3-element Flux.OneHotVector: 0 0 - 1source
Flux.onecoldFunction
onecold(y[, labels = 1:length(y)])

Inverse operation of onehot.

Examples

julia> Flux.onecold([true, false, false], [:a, :b, :c])
+ 1
source
Flux.onecoldFunction
onecold(y[, labels = 1:length(y)])

Inverse operation of onehot.

Examples

julia> Flux.onecold([true, false, false], [:a, :b, :c])
 :a
 
 julia> Flux.onecold([0.3, 0.2, 0.5], [:a, :b, :c])
-:c
source

Batches

onehotbatch creates a batch (matrix) of one-hot vectors, and onecold treats matrices as batches.

julia> using Flux: onehotbatch
+:c
source
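The relationship between onehot and onecold can be sketched in plain Julia. This is only an illustration under stated assumptions, not Flux's implementation: the names `my_onehot` and `my_onecold` are hypothetical, and the sketch returns an ordinary Bool vector rather than a Flux.OneHotVector.

```julia
# Hypothetical helpers for illustration only; Flux's own versions use dedicated types.
# Mark the position of label `l` within `labels`.
my_onehot(l, labels) = [l == x for x in labels]

# Pick the label at the position of the largest entry.
my_onecold(y, labels = 1:length(y)) = labels[argmax(y)]

v = my_onehot(:c, [:a, :b, :c])            # [false, false, true]
my_onecold(v, [:a, :b, :c])                # :c, recovering the original label
my_onecold([0.3, 0.2, 0.5], [:a, :b, :c])  # :c, the label with the highest score
```

Because onecold only looks for the largest entry, it works equally well on one-hot vectors and on a vector of class scores.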

Batches

onehotbatch creates a batch (matrix) of one-hot vectors, and onecold treats matrices as batches.

julia> using Flux: onehotbatch
 
 julia> onehotbatch([:b, :a, :b], [:a, :b, :c])
 3×3 Flux.OneHotMatrix:
@@ -55,4 +55,4 @@ julia> onecold(ans, [:a, :b, :c])
 3×3 Flux.OneHotMatrix{Array{Flux.OneHotVector,1}}:
  0  1  0
  1  0  1
- 0  0  0
source + 0 0 0source diff --git a/dev/datasets/index.html b/dev/datasets/index.html index 29f2264c..2e2f24d8 100644 --- a/dev/datasets/index.html +++ b/dev/datasets/index.html @@ -16,14 +16,14 @@ julia> features[:, 1] 5.1 3.5 1.4 - 0.2source
Flux.Data.Iris.labelsMethod
labels()

Get the labels of the iris dataset, a 150 element array of strings listing the species of each example.

julia> labels = Flux.Data.Iris.labels();
+ 0.2
source
Flux.Data.Iris.labelsMethod
labels()

Get the labels of the iris dataset, a 150 element array of strings listing the species of each example.

julia> labels = Flux.Data.Iris.labels();
 
 julia> summary(labels)
 "150-element Array{String,1}"
 
 julia> labels[1]
-"Iris-setosa"
source
Flux.Data.MNIST.imagesMethod
images()
-images(:test)

Load the MNIST images.

Each image is a 28×28 array of Gray colour values (see Colors.jl).

Return the 60,000 training images by default; pass :test to retrieve the 10,000 test images.

source
Flux.Data.MNIST.labelsMethod
labels()
-labels(:test)

Load the labels corresponding to each of the images returned from images(). Each label is a number from 0-9.

Return the 60,000 training labels by default; pass :test to retrieve the 10,000 test labels.

source
Flux.Data.FashionMNIST.imagesMethod
images()
-images(:test)

Load the Fashion-MNIST images.

Each image is a 28×28 array of Gray colour values (see Colors.jl).

Return the 60,000 training images by default; pass :test to retrieve the 10,000 test images.

source
Flux.Data.FashionMNIST.labelsMethod
labels()
-labels(:test)

Load the labels corresponding to each of the images returned from images(). Each label is a number from 0-9.

Return the 60,000 training labels by default; pass :test to retrieve the 10,000 test labels.

source
Flux.Data.CMUDict.phonesMethod
phones()

Return a Vector containing the phones used in the CMU Pronouncing Dictionary.

source
Flux.Data.CMUDict.symbolsMethod
symbols()

Return a Vector containing the symbols used in the CMU Pronouncing Dictionary. A symbol is a phone with optional auxiliary symbols, indicating for example the amount of stress on the phone.

source
Flux.Data.CMUDict.rawdictMethod
rawdict()

Return the unfiltered CMU Pronouncing Dictionary.

source
Flux.Data.CMUDict.cmudictMethod
cmudict()

Return a filtered CMU Pronouncing Dictionary.

It is filtered so each word contains only ASCII characters and a combination of word characters (as determined by the regex engine using \w), '-' and '.'.

source
Flux.Data.Sentiment.trainMethod
train()

Return the train split of the Stanford Sentiment Treebank. The data is in treebank format.

source
Flux.Data.Sentiment.testMethod
test()

Return the test split of the Stanford Sentiment Treebank. The data is in treebank format.

source
Flux.Data.Sentiment.devMethod
dev()

Return the dev split of the Stanford Sentiment Treebank. The data is in treebank format.

source
+"Iris-setosa"
source
Flux.Data.MNIST.imagesMethod
images()
+images(:test)

Load the MNIST images.

Each image is a 28×28 array of Gray colour values (see Colors.jl).

Return the 60,000 training images by default; pass :test to retrieve the 10,000 test images.

source
Flux.Data.MNIST.labelsMethod
labels()
+labels(:test)

Load the labels corresponding to each of the images returned from images(). Each label is a number from 0-9.

Return the 60,000 training labels by default; pass :test to retrieve the 10,000 test labels.

source
Flux.Data.FashionMNIST.imagesMethod
images()
+images(:test)

Load the Fashion-MNIST images.

Each image is a 28×28 array of Gray colour values (see Colors.jl).

Return the 60,000 training images by default; pass :test to retrieve the 10,000 test images.

source
Flux.Data.FashionMNIST.labelsMethod
labels()
+labels(:test)

Load the labels corresponding to each of the images returned from images(). Each label is a number from 0-9.

Return the 60,000 training labels by default; pass :test to retrieve the 10,000 test labels.

source
Flux.Data.CMUDict.phonesMethod
phones()

Return a Vector containing the phones used in the CMU Pronouncing Dictionary.

source
Flux.Data.CMUDict.symbolsMethod
symbols()

Return a Vector containing the symbols used in the CMU Pronouncing Dictionary. A symbol is a phone with optional auxiliary symbols, indicating for example the amount of stress on the phone.

source
Flux.Data.CMUDict.rawdictMethod
rawdict()

Return the unfiltered CMU Pronouncing Dictionary.

source
Flux.Data.CMUDict.cmudictMethod
cmudict()

Return a filtered CMU Pronouncing Dictionary.

It is filtered so each word contains only ASCII characters and a combination of word characters (as determined by the regex engine using \w), '-' and '.'.

source
Flux.Data.Sentiment.trainMethod
train()

Return the train split of the Stanford Sentiment Treebank. The data is in treebank format.

source
Flux.Data.Sentiment.testMethod
test()

Return the test split of the Stanford Sentiment Treebank. The data is in treebank format.

source
Flux.Data.Sentiment.devMethod
dev()

Return the dev split of the Stanford Sentiment Treebank. The data is in treebank format.

source
diff --git a/dev/ecosystem/index.html b/dev/ecosystem/index.html index c0d63d0a..b6956b99 100644 --- a/dev/ecosystem/index.html +++ b/dev/ecosystem/index.html @@ -6,4 +6,4 @@ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) ga('create', 'UA-36890222-9', 'auto'); ga('send', 'pageview', {'page': location.pathname + location.search + location.hash}); -

The Julia Ecosystem

One of the main strengths of Julia lies in an ecosystem of packages globally providing a rich and consistent user experience.

This is a non-exhaustive list of Julia packages, nicely complementing Flux in typical machine learning and deep learning workflows:

This tight integration among Julia packages is shown in some of the examples in the model-zoo repository.

+

The Julia Ecosystem

One of the main strengths of Julia lies in an ecosystem of packages globally providing a rich and consistent user experience.

This is a non-exhaustive list of Julia packages, nicely complementing Flux in typical machine learning and deep learning workflows:

This tight integration among Julia packages is shown in some of the examples in the model-zoo repository.

diff --git a/dev/gpu/index.html b/dev/gpu/index.html index fa11e41f..85715590 100644 --- a/dev/gpu/index.html +++ b/dev/gpu/index.html @@ -47,4 +47,4 @@ julia> x |> cpu 10-element Array{Float32,1}: 0.235164 ⋮ - 0.192538 + 0.192538 diff --git a/dev/index.html b/dev/index.html index fc7c7a02..34d9e229 100644 --- a/dev/index.html +++ b/dev/index.html @@ -6,4 +6,4 @@ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) ga('create', 'UA-36890222-9', 'auto'); ga('send', 'pageview', {'page': location.pathname + location.search + location.hash}); -

Flux: The Julia Machine Learning Library

Flux is a library for machine learning. It comes "batteries-included" with many useful tools built in, but also lets you use the full power of the Julia language where you need it. We follow a few key principles:

  • Doing the obvious thing. Flux has relatively few explicit APIs for features like regularisation or embeddings. Instead, writing down the mathematical form will work – and be fast.
  • You could have written Flux. All of it, from LSTMs to GPU kernels, is straightforward Julia code. When in doubt, it’s well worth looking at the source. If you need something different, you can easily roll your own.
  • Play nicely with others. Flux works well with Julia libraries from data frames and images to differential equation solvers, so you can easily build complex data processing pipelines that integrate Flux models.

Installation

Download Julia 1.0 or later, if you haven't already. You can add Flux using Julia's package manager, by typing ] add Flux at the Julia prompt.

If you have CUDA you can also run ] add CuArrays to get GPU support; see here for more details.

Learning Flux

There are several different ways to learn Flux. If you just want to get started writing models, the model zoo gives good starting points for many common ones. This documentation provides a reference to all of Flux's APIs, as well as a from-scratch introduction to Flux's take on models and how they work. Once you understand these docs, congratulations, you also understand Flux's source code, which is intended to be concise, legible and a good reference for more advanced concepts.

+

Flux: The Julia Machine Learning Library

Flux is a library for machine learning. It comes "batteries-included" with many useful tools built in, but also lets you use the full power of the Julia language where you need it. We follow a few key principles:

  • Doing the obvious thing. Flux has relatively few explicit APIs for features like regularisation or embeddings. Instead, writing down the mathematical form will work – and be fast.
  • You could have written Flux. All of it, from LSTMs to GPU kernels, is straightforward Julia code. When in doubt, it’s well worth looking at the source. If you need something different, you can easily roll your own.
  • Play nicely with others. Flux works well with Julia libraries from data frames and images to differential equation solvers, so you can easily build complex data processing pipelines that integrate Flux models.

Installation

Download Julia 1.0 or later, if you haven't already. You can add Flux using Julia's package manager, by typing ] add Flux at the Julia prompt.

If you have CUDA you can also run ] add CuArrays to get GPU support; see here for more details.

Learning Flux

There are several different ways to learn Flux. If you just want to get started writing models, the model zoo gives good starting points for many common ones. This documentation provides a reference to all of Flux's APIs, as well as a from-scratch introduction to Flux's take on models and how they work. Once you understand these docs, congratulations, you also understand Flux's source code, which is intended to be concise, legible and a good reference for more advanced concepts.

diff --git a/dev/models/advanced/index.html b/dev/models/advanced/index.html index cdac8905..a2579b73 100644 --- a/dev/models/advanced/index.html +++ b/dev/models/advanced/index.html @@ -24,4 +24,4 @@ Params([[0.66722 0.774872 0.249809; 0.843321 0.403843 0.429232; 0.683525 0.66245 ) ps = Flux.params(m[3:end])

The Zygote.Params object ps now holds a reference to only the parameters of the layers passed to it.

During training, the gradients will only be computed for (and applied to) the last Dense layer, therefore only that would have its parameters changed.

Flux.params also takes multiple inputs to make it easy to collect parameters from heterogeneous models with a single call. A simple demonstration would be if we wanted to omit optimising the second Dense layer in the previous example. It would look something like this:

Flux.params(m[1], m[3:end])

Sometimes, a more fine-tuned control is needed. We can freeze a specific parameter of a specific layer which already entered a Params object ps, by simply deleting it from ps:

ps = params(m)
-delete!(ps, m[2].b) 
+delete!(ps, m[2].b) diff --git a/dev/models/basics/index.html b/dev/models/basics/index.html index b7448ba7..4858f238 100644 --- a/dev/models/basics/index.html +++ b/dev/models/basics/index.html @@ -109,8 +109,8 @@ model2(rand(10)) # => 2-element vector

This quickly starts to m(rand(10))

Likewise, Chain will happily work with any Julia function.

m = Chain(x -> x^2, x -> x+1)
 
 m(5) # => 26
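To see why Chain works with arbitrary functions, it helps to think of it as left-to-right function composition. The sketch below is a hypothetical stand-in written in plain Julia (the name `my_chain` is not part of Flux), not Flux's actual implementation.

```julia
# Hypothetical stand-in for Chain: feed the input through each function in order.
my_chain(fs...) = x -> foldl((acc, f) -> f(acc), fs; init = x)

m = my_chain(x -> x^2, x -> x + 1)
m(5)  # (5^2) + 1 = 26
```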

Layer helpers

Flux provides a set of helpers for custom layers, which you can enable by calling

Flux.@functor Affine

This enables a useful extra set of functionality for our Affine layer, such as collecting its parameters or moving it to the GPU.

For some more helpful tricks, including parameter freezing, please check out the advanced usage guide.

Utility functions

Flux provides some utility functions to help you generate models in an automated fashion.

outdims enables you to calculate the spatial output dimensions of layers like Conv when applied to input images of a given size. Currently limited to the following layers:

Flux.outdimsFunction
outdims(c::Chain, isize)

Calculate the output dimensions given the input dimensions, isize.

m = Chain(Conv((3, 3), 3 => 16), Conv((3, 3), 16 => 32))
-outdims(m, (10, 10)) == (6, 6)
source
outdims(l::Dense, isize)

Calculate the output dimensions given the input dimensions, isize.

m = Dense(10, 5)
+outdims(m, (10, 10)) == (6, 6)
source
outdims(l::Dense, isize)

Calculate the output dimensions given the input dimensions, isize.

m = Dense(10, 5)
 outdims(m, (5, 2)) == (5,)
-outdims(m, (10,)) == (5,)
source
outdims(l::Conv, isize::Tuple)

Calculate the output dimensions given the input dimensions isize. Batch size and channel size are ignored as per NNlib.jl.

m = Conv((3, 3), 3 => 16)
+outdims(m, (10,)) == (5,)
source
outdims(l::Conv, isize::Tuple)

Calculate the output dimensions given the input dimensions isize. Batch size and channel size are ignored as per NNlib.jl.

m = Conv((3, 3), 3 => 16)
 outdims(m, (10, 10)) == (8, 8)
-outdims(m, (10, 10, 1, 3)) == (8, 8)
source
+outdims(m, (10, 10, 1, 3)) == (8, 8)source diff --git a/dev/models/layers/index.html b/dev/models/layers/index.html index c602f18a..ded7ec64 100644 --- a/dev/models/layers/index.html +++ b/dev/models/layers/index.html @@ -16,7 +16,7 @@ julia> m = Chain(Dense(10, 5), Dense(5, 2)); julia> x = rand(10); julia> m(x) == m[2](m[1](x)) -truesource
Flux.DenseType
Dense(in::Integer, out::Integer, σ = identity)

Create a traditional Dense layer with parameters W and b.

y = σ.(W * x .+ b)

The input x must be a vector of length in, or a batch of vectors represented as an in × N matrix. The output y will be a vector or batch of length out.

Examples

```jldoctest; setup = :(using Random; Random.seed!(0))
julia> d = Dense(5, 2)
Dense(5, 2)

julia> d(rand(5))
2-element Array{Float32,1}:
 -0.16210233
  0.12311903
```

source

Convolution and Pooling Layers

These layers are used to build convolutional neural networks (CNNs).

Flux.ConvType
Conv(filter, in => out, σ = identity; init = glorot_uniform,
+true
source
Flux.DenseType
Dense(in::Integer, out::Integer, σ = identity)

Create a traditional Dense layer with parameters W and b.

y = σ.(W * x .+ b)

The input x must be a vector of length in, or a batch of vectors represented as an in × N matrix. The output y will be a vector or batch of length out.

Examples

```jldoctest; setup = :(using Random; Random.seed!(0))
julia> d = Dense(5, 2)
Dense(5, 2)

julia> d(rand(5))
2-element Array{Float32,1}:
 -0.16210233
  0.12311903
```

source
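The formula y = σ.(W * x .+ b) can be tried out without Flux. The following is a minimal sketch with hand-picked, hypothetical weights; `dense` and the local `relu` are illustrative names defined here, not Flux's own objects.

```julia
# Sketch of the Dense forward pass, y = σ.(W * x .+ b), using made-up values.
W = [1.0 0.0 -1.0; 0.5 0.5 0.5]   # out × in = 2 × 3
b = [0.0, 1.0]
relu(z) = max(z, zero(z))          # defined locally to keep the sketch self-contained

dense(x) = relu.(W * x .+ b)

dense([1.0, 2.0, 3.0])   # single input of length 3: returns [0.0, 4.0]
dense(ones(3, 4))        # batch of 4 inputs (3 × 4 matrix): returns a 2 × 4 matrix
```

The same expression handles both cases because `.+` broadcasts the bias vector across the columns of a batch.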

Convolution and Pooling Layers

These layers are used to build convolutional neural networks (CNNs).

Flux.ConvType
Conv(filter, in => out, σ = identity; init = glorot_uniform,
      stride = 1, pad = 0, dilation = 1)
 
 filter = (2,2)
@@ -25,30 +25,30 @@ out = 16
 Conv((2, 2), 1=>16, relu)

Standard convolutional layer. filter should be a tuple like (2, 2). in and out specify the number of input and output channels respectively.

Data should be stored in WHCN order (width, height, # channels, batch size). In other words, a 100×100 RGB image would be a 100×100×3×1 array, and a batch of 50 would be a 100×100×3×50 array.

Accepts keyword arguments weight and bias to set the corresponding fields. Setting bias to Flux.Zeros() will switch bias off for the layer.

Takes the keyword arguments pad, stride and dilation. Use pad=SamePad() to apply padding so that outputsize == inputsize / stride.

Examples

Apply a Conv layer to a 1-channel input using a 2×2 window filter size, giving us a 16-channel output. Output is activated with ReLU.

filter = (2,2)
 in = 1
 out = 16
-Conv(filter, in => out, relu)
source
Flux.MaxPoolType
MaxPool(k; pad = 0, stride = k)

Max pooling layer. k is the size of the window for each dimension of the input.

Use pad=SamePad() to apply padding so that outputsize == inputsize / stride.

source
Flux.GlobalMaxPoolType
GlobalMaxPool()

Global max pooling layer.

Transforms (w,h,c,b)-shaped input into (1,1,c,b)-shaped output, by performing max pooling on the complete (w,h)-shaped feature maps.

source
Flux.MeanPoolType
MeanPool(k; pad = 0, stride = k)

Mean pooling layer. k is the size of the window for each dimension of the input.

Use pad=SamePad() to apply padding so that outputsize == inputsize / stride.

source
Flux.GlobalMeanPoolType
GlobalMeanPool()

Global mean pooling layer.

Transforms (w,h,c,b)-shaped input into (1,1,c,b)-shaped output, by performing mean pooling on the complete (w,h)-shaped feature maps.

source
Flux.DepthwiseConvType
DepthwiseConv(filter::Tuple, in=>out)
+Conv(filter, in => out, relu)
source
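How stride, pad and dilation affect the spatial output size can be checked with the usual convolution arithmetic. The helper below is a sketch under the assumption that the standard formula applies with symmetric padding; `conv_out` is a hypothetical name and is not taken from Flux's source.

```julia
# Output length along one spatial dimension for input length n and filter length k.
# Assumes `pad` is applied on both sides of that dimension.
conv_out(n, k; stride = 1, pad = 0, dilation = 1) =
    fld(n + 2pad - dilation * (k - 1) - 1, stride) + 1

conv_out(10, 3)                 # 8: a 3×3 filter shrinks a 10×10 input to 8×8
conv_out(conv_out(10, 3), 3)    # 6: two such layers in sequence
conv_out(10, 3, pad = 1)        # 10: padding of 1 preserves the size for a 3×3 filter
conv_out(10, 2, stride = 2)     # 5: stride 2 halves the size
```

The first two results agree with the outdims examples given elsewhere in these docs.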
Flux.MaxPoolType
MaxPool(k; pad = 0, stride = k)

Max pooling layer. k is the size of the window for each dimension of the input.

Use pad=SamePad() to apply padding so that outputsize == inputsize / stride.

source
Flux.GlobalMaxPoolType
GlobalMaxPool()

Global max pooling layer.

Transforms (w,h,c,b)-shaped input into (1,1,c,b)-shaped output, by performing max pooling on the complete (w,h)-shaped feature maps.

source
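The shape transformation performed by global pooling can be reproduced with a plain reduction over the two spatial dimensions. This is a sketch for intuition only; `global_max` and `global_mean` are hypothetical names, not Flux layers.

```julia
using Statistics

# Reduce over the first two (spatial) dimensions, keeping them as size-1 axes.
global_max(x)  = maximum(x, dims = (1, 2))
global_mean(x) = mean(x, dims = (1, 2))

x = rand(Float32, 5, 4, 3, 2)   # (w, h, c, b)
size(global_max(x))             # (1, 1, 3, 2)
size(global_mean(x))            # (1, 1, 3, 2)
```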
Flux.MeanPoolType
MeanPool(k; pad = 0, stride = k)

Mean pooling layer. k is the size of the window for each dimension of the input.

Use pad=SamePad() to apply padding so that outputsize == inputsize / stride.

source
Flux.GlobalMeanPoolType
GlobalMeanPool()

Global mean pooling layer.

Transforms (w,h,c,b)-shaped input into (1,1,c,b)-shaped output, by performing mean pooling on the complete (w,h)-shaped feature maps.

source
Flux.DepthwiseConvType
DepthwiseConv(filter::Tuple, in=>out)
 DepthwiseConv(filter::Tuple, in=>out, activation)
 DepthwiseConv(filter, in => out, σ = identity; init = glorot_uniform,
-              stride = 1, pad = 0, dilation = 1)

Depthwise convolutional layer. filter should be a tuple like (2, 2). in and out specify the number of input and output channels respectively. Note that out must be an integer multiple of in.

Data should be stored in WHCN order (width, height, # channels, batch size). In other words, a 100×100 RGB image would be a 100×100×3×1 array, and a batch of 50 would be a 100×100×3×50 array.

Accepts keyword arguments weight and bias to set the corresponding fields. Setting bias to Flux.Zeros() will switch bias off for the layer.

Takes the keyword arguments pad, stride and dilation. Use pad=SamePad() to apply padding so that outputsize == inputsize / stride.

source
Flux.ConvTransposeType
ConvTranspose(filter, in=>out)
+              stride = 1, pad = 0, dilation = 1)

Depthwise convolutional layer. filter should be a tuple like (2, 2). in and out specify the number of input and output channels respectively. Note that out must be an integer multiple of in.

Data should be stored in WHCN order (width, height, # channels, batch size). In other words, a 100×100 RGB image would be a 100×100×3×1 array, and a batch of 50 would be a 100×100×3×50 array.

Accepts keyword arguments weight and bias to set the corresponding fields. Setting bias to Flux.Zeros() will switch bias off for the layer.

Takes the keyword arguments pad, stride and dilation. Use pad=SamePad() to apply padding so that outputsize == inputsize / stride.

source
Flux.ConvTransposeType
ConvTranspose(filter, in=>out)
 ConvTranspose(filter, in=>out, activation)
 ConvTranspose(filter, in => out, σ = identity; init = glorot_uniform,
-              stride = 1, pad = 0, dilation = 1)

Standard convolutional transpose layer. filter should be a tuple like (2, 2). in and out specify the number of input and output channels respectively.

Data should be stored in WHCN order (width, height, # channels, batch size). In other words, a 100×100 RGB image would be a 100×100×3×1 array, and a batch of 50 would be a 100×100×3×50 array.

Accepts keyword arguments weight and bias to set the corresponding fields. Setting bias to Flux.Zeros() will switch bias off for the layer.

Takes the keyword arguments pad, stride and dilation. Use pad=SamePad() to apply padding so that outputsize == stride * inputsize - stride + 1.

source
Flux.CrossCorType
CrossCor(filter, in=>out)
+              stride = 1, pad = 0, dilation = 1)

Standard convolutional transpose layer. filter should be a tuple like (2, 2). in and out specify the number of input and output channels respectively.

Data should be stored in WHCN order (width, height, # channels, batch size). In other words, a 100×100 RGB image would be a 100×100×3×1 array, and a batch of 50 would be a 100×100×3×50 array.

Accepts keyword arguments weight and bias to set the corresponding fields. Setting bias to Flux.Zeros() will switch bias off for the layer.

Takes the keyword arguments pad, stride and dilation. Use pad=SamePad() to apply padding so that outputsize == stride * inputsize - stride + 1.

source
Flux.CrossCorType
CrossCor(filter, in=>out)
 CrossCor(filter, in=>out, activation)
 CrossCor(filter, in => out, σ = identity; init = glorot_uniform,
          stride = 1, pad = 0, dilation = 1)

Standard cross convolutional layer. filter should be a tuple like (2, 2). in and out specify the number of input and output channels respectively.

Data should be stored in WHCN order (width, height, # channels, batch size). In other words, a 100×100 RGB image would be a 100×100×3×1 array, and a batch of 50 would be a 100×100×3×50 array.

Accepts keyword arguments weight and bias to set the corresponding fields. Setting bias to Flux.Zeros() will switch bias off for the layer.

Takes the keyword arguments pad, stride and dilation. Use pad=SamePad() to apply padding so that outputsize == inputsize / stride.

Examples

Apply a CrossCor layer to a 1-channel input using a 2×2 window filter size, giving us a 16-channel output. Output is activated with ReLU.

filter = (2,2)
 in = 1
 out = 16
-CrossCor((2, 2), 1=>16, relu)
source
Flux.flattenFunction
flatten(x::AbstractArray)

Transform (w, h, c, b)-shaped input into (w × h × c, b)-shaped output by linearizing all values for each element in the batch.

source

Recurrent Layers

Much like the core layers above, but can be used to process sequence data (as well as other kinds of structured data).

Flux.RNNFunction
RNN(in::Integer, out::Integer, σ = tanh)

The most basic recurrent layer; essentially acts as a Dense layer, but with the output fed back into the input each time step.

source
Flux.LSTMFunction
LSTM(in::Integer, out::Integer)

Long Short Term Memory recurrent layer. Behaves like an RNN but generally exhibits a longer memory span over sequences.

See this article for a good overview of the internals.

source
Flux.GRUFunction
GRU(in::Integer, out::Integer)

Gated Recurrent Unit layer. Behaves like an RNN but generally exhibits a longer memory span over sequences.

See this article for a good overview of the internals.

source
Flux.RecurType
Recur(cell)

Recur takes a recurrent cell and makes it stateful, managing the hidden state in the background. cell should be a model of the form:

h, y = cell(h, x...)

For example, here's a recurrent network that keeps a running total of its inputs:

accum(h, x) = (h + x, x)
+CrossCor((2, 2), 1=>16, relu)
source
Flux.flattenFunction
flatten(x::AbstractArray)

Transform (w, h, c, b)-shaped input into (w × h × c, b)-shaped output by linearizing all values for each element in the batch.

source
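In plain Julia, this kind of flattening amounts to a single reshape that keeps the last (batch) dimension and merges the rest. The sketch below uses the hypothetical name `my_flatten` and is meant as an illustration, not as the library's definition.

```julia
# Merge all dimensions except the last one into a single axis.
my_flatten(x::AbstractArray) = reshape(x, :, size(x)[end])

x = rand(5, 5, 4, 10)    # (w, h, c, b)
size(my_flatten(x))      # (100, 10), since 5 * 5 * 4 = 100
```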

Recurrent Layers

Much like the core layers above, but can be used to process sequence data (as well as other kinds of structured data).

Flux.RNNFunction
RNN(in::Integer, out::Integer, σ = tanh)

The most basic recurrent layer; essentially acts as a Dense layer, but with the output fed back into the input each time step.

source
Flux.LSTMFunction
LSTM(in::Integer, out::Integer)

Long Short Term Memory recurrent layer. Behaves like an RNN but generally exhibits a longer memory span over sequences.

See this article for a good overview of the internals.

source
Flux.GRUFunction
GRU(in::Integer, out::Integer)

Gated Recurrent Unit layer. Behaves like an RNN but generally exhibits a longer memory span over sequences.

See this article for a good overview of the internals.

source
Flux.RecurType
Recur(cell)

Recur takes a recurrent cell and makes it stateful, managing the hidden state in the background. cell should be a model of the form:

h, y = cell(h, x...)

For example, here's a recurrent network that keeps a running total of its inputs:

accum(h, x) = (h + x, x)
 rnn = Flux.Recur(accum, 0)
 rnn(2)      # 2
 rnn(3)      # 3
 rnn.state   # 5
 rnn.(1:10)  # apply to a sequence
-rnn.state   # 60
source
Flux.reset!Function
reset!(rnn)

Reset the hidden state of a recurrent layer back to its original value.

Assuming you have a Recur layer rnn, this is roughly equivalent to:

rnn.state = hidden(rnn.cell)
source

Other General Purpose Layers

These are marginally more obscure than the Basic Layers. In contrast to the layers described in the other sections, they are not readily grouped around a particular purpose (e.g. CNNs or RNNs).

Flux.MaxoutType
Maxout(over)

The Maxout layer has a number of internal layers which all receive the same input. It returns the elementwise maximum of the internal layers' outputs.

Maxout over linear dense layers satisfies the universal approximation theorem.

source
Flux.SkipConnectionType
SkipConnection(layer, connection)

Create a skip connection which consists of a layer or Chain of consecutive layers and a shortcut connection linking the block's input to the output through a user-supplied 2-argument callable. The first argument to the callable will be propagated through the given layer while the second is the unchanged, "skipped" input.

The simplest "ResNet"-type connection is just SkipConnection(layer, +), and requires the output of the layers to be the same shape as the input. Here is a more complicated example:

m = Conv((3,3), 4=>7, pad=(1,1))
+rnn.state   # 60
source
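The way Recur hides the state can be sketched with a small mutable struct. This is a hypothetical re-implementation for illustration (the type `MyRecur` is not part of Flux) and it omits everything Flux adds on top, such as resetting and parameter collection.

```julia
# A stateful wrapper around a cell of the form `h, y = cell(h, x)`.
mutable struct MyRecur
    cell
    state
end

function (m::MyRecur)(x)
    h, y = m.cell(m.state, x)  # the cell returns the new state and the output
    m.state = h                # remember the new state for the next call
    return y
end

accum(h, x) = (h + x, x)
rnn = MyRecur(accum, 0)
rnn(2)       # returns 2
rnn(3)       # returns 3
rnn.state    # 5
```

Applying the wrapper to the inputs 1 through 10 one after another would then raise the state from 5 to 60, matching the example above.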
Flux.reset!Function
reset!(rnn)

Reset the hidden state of a recurrent layer back to its original value.

Assuming you have a Recur layer rnn, this is roughly equivalent to:

rnn.state = hidden(rnn.cell)
source

Other General Purpose Layers

These are marginally more obscure than the Basic Layers. In contrast to the layers described in the other sections, they are not readily grouped around a particular purpose (e.g. CNNs or RNNs).

Flux.MaxoutType
Maxout(over)

The Maxout layer has a number of internal layers which all receive the same input. It returns the elementwise maximum of the internal layers' outputs.

Maxout over linear dense layers satisfies the universal approximation theorem.

source
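The core of Maxout, taking the elementwise maximum over the outputs of several inner layers, fits in one line of plain Julia. In this sketch `my_maxout` is a hypothetical function and the inner "layers" are ordinary closures rather than Flux layers.

```julia
# Apply every inner function to the same input, then take the elementwise maximum.
my_maxout(fs, x) = max.(map(f -> f(x), fs)...)

fs = (x -> 2 .* x, x -> x .+ 1)   # two made-up inner layers
my_maxout(fs, [0.0, 3.0])         # [1.0, 6.0]: max(0, 1) and max(6, 4)
```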
Flux.SkipConnectionType
SkipConnection(layer, connection)

Create a skip connection which consists of a layer or Chain of consecutive layers and a shortcut connection linking the block's input to the output through a user-supplied 2-argument callable. The first argument to the callable will be propagated through the given layer while the second is the unchanged, "skipped" input.

The simplest "ResNet"-type connection is just SkipConnection(layer, +), and requires the output of the layers to be the same shape as the input. Here is a more complicated example:

m = Conv((3,3), 4=>7, pad=(1,1))
 x = ones(5,5,4,10);
 size(m(x)) == (5, 5, 7, 10)
 
 sm = SkipConnection(m, (mx, x) -> cat(mx, x, dims=3))
-size(sm(x)) == (5, 5, 11, 10)
source

Normalisation & Regularisation

These layers don't affect the structure of the network but may improve training times or reduce overfitting.

Flux.normaliseFunction
normalise(x; dims=1)

Normalise x to mean 0 and standard deviation 1 across the dimensions given by dims. Defaults to normalising over columns.

julia> a = reshape(collect(1:9), 3, 3)
+size(sm(x)) == (5, 5, 11, 10)
source
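The mechanics of a skip connection are simple enough to sketch directly. The struct `MySkip` and the function `double` below are hypothetical and only mimic the calling convention described above, where the connection receives the layer's output first and the untouched input second.

```julia
# Minimal skip connection: combine layer(x) with the original x.
struct MySkip
    layer
    connection
end
(s::MySkip)(x) = s.connection(s.layer(x), x)

double(x) = 2 .* x                 # stand-in for a real layer

res = MySkip(double, +)            # "ResNet"-style: output shape must match input
res([1.0, 2.0])                    # [3.0, 6.0]

stacked = MySkip(double, (mx, x) -> cat(mx, x, dims = 3))
size(stacked(ones(5, 5, 4, 10)))   # (5, 5, 8, 10): channels are concatenated
```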

Normalisation & Regularisation

These layers don't affect the structure of the network but may improve training times or reduce overfitting.

Flux.normaliseFunction
normalise(x; dims=1)

Normalise x to mean 0 and standard deviation 1 across the dimensions given by dims. Defaults to normalising over columns.

julia> a = reshape(collect(1:9), 3, 3)
 3×3 Array{Int64,2}:
  1  4  7
  2  5  8
@@ -64,35 +64,35 @@ julia> Flux.normalise(a, dims=2)
 3×3 Array{Float64,2}:
  -1.22474  0.0  1.22474
  -1.22474  0.0  1.22474
- -1.22474  0.0  1.22474
source
Flux.BatchNormType
BatchNorm(channels::Integer, σ = identity;
+ -1.22474  0.0  1.22474
source
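As a rough illustration of what normalise computes, the sketch below reproduces the example above in plain Julia using only the Statistics standard library. The name normalise_sketch is hypothetical, and the uncorrected (population) standard deviation is an assumption chosen because it matches the ±1.22474 values shown; this is not a copy of Flux's source.

```julia
using Statistics

# Hypothetical helper for illustration; not Flux's own implementation.
function normalise_sketch(x::AbstractArray; dims=1)
    μ = mean(x, dims=dims)
    σ = std(x, dims=dims, corrected=false)  # assumption: population standard deviation
    return (x .- μ) ./ σ
end

a = reshape(collect(1:9), 3, 3)
n = normalise_sketch(a, dims=2)  # each row now has mean 0 and standard deviation 1
```

With dims=2 the statistics are taken along each row, so the row [1, 4, 7] has mean 4 and population standard deviation sqrt(6), giving roughly [-1.22474, 0.0, 1.22474].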
Flux.BatchNormType
BatchNorm(channels::Integer, σ = identity;
           initβ = zeros, initγ = ones,
           ϵ = 1e-8, momentum = .1)

Batch Normalization layer. channels should be the size of the channel dimension in your data (see below).

Given an array with N dimensions, call the N-1th the channel dimension. (For a batch of feature vectors this is just the data dimension, for WHCN images it's the usual channel dimension.)

BatchNorm computes the mean and variance for each W×H×1×N slice and shifts them to have a new mean and variance (corresponding to the learnable, per-channel bias and scale parameters).

Use testmode! during inference.

Examples

m = Chain(
   Dense(28^2, 64),
   BatchNorm(64, relu),
   Dense(64, 10),
   BatchNorm(10),
-  softmax)
source
Flux.dropoutFunction
dropout(x, p; dims = :)

The dropout function. For each input, either sets that input to 0 (with probability p) or scales it by 1 / (1 - p). dims specifies the unbroadcasted dimensions, e.g. dims=1 applies dropout along columns and dims=2 along rows. This is used as a regularisation, i.e. it reduces overfitting during training.

See also the Dropout layer.

source
Flux.DropoutType
Dropout(p, dims = :)

Dropout layer. In the forward pass, apply the Flux.dropout function on the input.

Does nothing to the input once Flux.testmode! is true.

source
Flux.AlphaDropoutType
AlphaDropout(p)

A dropout layer. Used in Self-Normalizing Neural Networks. The AlphaDropout layer ensures that mean and variance of activations remain the same as before.

Does nothing to the input once testmode! is true.

source
Flux.LayerNormType
LayerNorm(h::Integer)

A normalisation layer designed to be used with recurrent hidden states of size h. Normalises the mean and standard deviation of each input before applying a per-neuron gain/bias.

source
Flux.InstanceNormType
InstanceNorm(channels::Integer, σ = identity;
+  softmax)
source
Flux.dropoutFunction
dropout(x, p; dims = :)

The dropout function. For each input, either sets that input to 0 (with probability p) or scales it by 1 / (1 - p). dims specifies the unbroadcasted dimensions, e.g. dims=1 applies dropout along columns and dims=2 along rows. This is used as a regularisation, i.e. it reduces overfitting during training.

See also the Dropout layer.

source
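The zero-or-rescale behaviour described for dropout can be sketched in plain Julia. This is a minimal illustration covering only the dims = : case; dropout_sketch and its explicit rng argument are hypothetical and are not part of Flux's API.

```julia
using Random

# Hypothetical re-implementation for illustration (dims = : case only).
function dropout_sketch(rng::AbstractRNG, x::AbstractArray, p::Real)
    keep = rand(rng, Float64, size(x)) .> p  # each entry is kept with probability 1 - p
    return x .* keep ./ (1 - p)              # survivors are rescaled by 1 / (1 - p)
end

x = ones(4, 5)
y = dropout_sketch(MersenneTwister(0), x, 0.5)
# every entry of y is either 0.0 (dropped) or 2.0 (kept and rescaled)
```

Rescaling the surviving entries keeps the expected value of each activation unchanged, which is why nothing needs to be rescaled at test time.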
Flux.DropoutType
Dropout(p, dims = :)

Dropout layer. In the forward pass, apply the Flux.dropout function on the input.

Does nothing to the input once Flux.testmode! is true.

source
Flux.AlphaDropoutType
AlphaDropout(p)

A dropout layer. Used in Self-Normalizing Neural Networks. The AlphaDropout layer ensures that mean and variance of activations remain the same as before.

Does nothing to the input once testmode! is true.

source
Flux.LayerNormType
LayerNorm(h::Integer)

A normalisation layer designed to be used with recurrent hidden states of size h. Normalises the mean and standard deviation of each input before applying a per-neuron gain/bias.

source
Flux.InstanceNormType
InstanceNorm(channels::Integer, σ = identity;
              initβ = zeros, initγ = ones,
              ϵ = 1e-8, momentum = .1)

Instance Normalization layer. channels should be the size of the channel dimension in your data (see below).

Given an array with N dimensions, call the N-1th the channel dimension. (For a batch of feature vectors this is just the data dimension, for WHCN images it's the usual channel dimension.)

InstanceNorm computes the mean and variance for each W×H×1×1 slice and shifts them to have a new mean and variance (corresponding to the learnable, per-channel bias and scale parameters).

Use testmode! during inference.

Examples

m = Chain(
   Dense(28^2, 64),
   InstanceNorm(64, relu),
   Dense(64, 10),
   InstanceNorm(10),
-  softmax)
source
Flux.GroupNormType
GroupNorm(chs::Integer, G::Integer, λ = identity;
+  softmax)
source
Flux.GroupNormType
GroupNorm(chs::Integer, G::Integer, λ = identity;
           initβ = (i) -> zeros(Float32, i), initγ = (i) -> ones(Float32, i),
           ϵ = 1f-5, momentum = 0.1f0)

Group Normalization layer. This layer can outperform Batch Normalization and Instance Normalization.

chs is the number of channels, the channel dimension of your input. For an array of N dimensions, the N-1th index is the channel dimension.

G is the number of groups along which the statistics are computed. The number of channels must be an integer multiple of the number of groups.

Use testmode! during inference.

Examples

m = Chain(Conv((3,3), 1=>32, leakyrelu;pad = 1),
           GroupNorm(32,16))
-          # 32 channels, 16 groups (G = 16), thus 2 channels per group used
source

Testmode

Many normalisation layers behave differently under training and inference (testing). By default, Flux will automatically determine when a layer evaluation is part of training or inference. Still, depending on your use case, it may be helpful to manually specify when these layers should be treated as being trained or not. For this, Flux provides Flux.testmode!. When called on a model (e.g. a layer or chain of layers), this function will place the model into the mode specified.

Flux.testmode!Function
testmode!(m, mode = true)

Set a layer or model's test mode (see below). Using :auto mode will treat any gradient computation as training.

Note: if you manually set a model into test mode, you need to manually place it back into train mode during training phase.

Possible values include:

  • false for training
  • true for testing
  • :auto or nothing for Flux to detect the mode automatically
source
Flux.trainmode!Function
trainmode!(m, mode = true)

Set a layer or model's train mode (see below). Symmetric to testmode! (i.e. trainmode!(m, mode) == testmode!(m, !mode)).

Note: if you manually set a model into train mode, you need to manually place it into test mode during testing phase.

Possible values include:

  • true for training
  • false for testing
  • :auto or nothing for Flux to detect the mode automatically
source

Cost Functions

Flux.maeFunction
mae(ŷ, y)

Return the mean of absolute error; calculated as sum(abs.(ŷ .- y)) / length(y).

source
Flux.mseFunction
mse(ŷ, y)

Return the mean squared error between ŷ and y; calculated as sum((ŷ .- y).^2) / length(y).

Examples

julia> Flux.mse([0, 2], [1, 1])
-1//1
source
Flux.msleFunction
msle(ŷ, y; ϵ=eps(eltype(ŷ)))

Return the mean of the squared logarithmic errors; calculated as sum((log.(ŷ .+ ϵ) .- log.(y .+ ϵ)).^2) / length(y). The ϵ term provides numerical stability.

Penalizes an under-predicted estimate more than an over-predicted estimate.

source
Flux.huber_lossFunction
huber_loss(ŷ, y; δ=1.0)

Return the mean of the Huber loss given the prediction ŷ and true values y.

             | 0.5 * |ŷ - y|^2,          for |ŷ - y| <= δ
+          # 32 channels, 16 groups (G = 16), thus 2 channels per group used
source

Testmode

Many normalisation layers behave differently under training and inference (testing). By default, Flux will automatically determine when a layer evaluation is part of training or inference. Still, depending on your use case, it may be helpful to manually specify when these layers should be treated as being trained or not. For this, Flux provides Flux.testmode!. When called on a model (e.g. a layer or chain of layers), this function will place the model into the mode specified.

Flux.testmode!Function
testmode!(m, mode = true)

Set a layer or model's test mode (see below). Using :auto mode will treat any gradient computation as training.

Note: if you manually set a model into test mode, you need to manually place it back into train mode during training phase.

Possible values include:

  • false for training
  • true for testing
  • :auto or nothing for Flux to detect the mode automatically
source
Flux.trainmode!Function
trainmode!(m, mode = true)

Set a layer or model's train mode (see below). Symmetric to testmode! (i.e. trainmode!(m, mode) == testmode!(m, !mode)).

Note: if you manually set a model into train mode, you need to manually place it into test mode during testing phase.

Possible values include:

  • true for training
  • false for testing
  • :auto or nothing for Flux to detect the mode automatically
source

Cost Functions

Flux.maeFunction
mae(ŷ, y)

Return the mean of absolute error; calculated as sum(abs.(ŷ .- y)) / length(y).

source
Flux.mseFunction
mse(ŷ, y)

Return the mean squared error between ŷ and y; calculated as sum((ŷ .- y).^2) / length(y).

Examples

julia> Flux.mse([0, 2], [1, 1])
+1//1
source
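The formulas quoted for mae and mse can be written out directly in plain Julia. The names mae_sketch and mse_sketch are hypothetical stand-ins for illustration. Because they divide with /, they return a Float64 rather than the Rational shown in the example above.

```julia
# Hypothetical stand-ins that spell out the documented formulas.
mae_sketch(ŷ, y) = sum(abs.(ŷ .- y)) / length(y)
mse_sketch(ŷ, y) = sum((ŷ .- y) .^ 2) / length(y)

ŷ = [0, 2]
y = [1, 1]
abs_err = mae_sketch(ŷ, y)  # (|0 - 1| + |2 - 1|) / 2 = 1.0
sq_err  = mse_sketch(ŷ, y)  # ((0 - 1)^2 + (2 - 1)^2) / 2 = 1.0
```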
Flux.msleFunction
msle(ŷ, y; ϵ=eps(eltype(ŷ)))

Return the mean of the squared logarithmic errors; calculated as sum((log.(ŷ .+ ϵ) .- log.(y .+ ϵ)).^2) / length(y). The ϵ term provides numerical stability.

Penalizes an under-predicted estimate more than an over-predicted estimate.

source
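To see why msle treats under- and over-prediction differently, the sketch below applies the quoted formula to two predictions that miss the target by the same absolute amount. msle_sketch is a hypothetical name and the numbers are purely illustrative.

```julia
# Hypothetical stand-in that spells out the documented formula.
msle_sketch(ŷ, y; ϵ=eps(eltype(ŷ))) =
    sum((log.(ŷ .+ ϵ) .- log.(y .+ ϵ)) .^ 2) / length(y)

target = [100.0]
under = msle_sketch([50.0], target)   # prediction 50 below the target
over  = msle_sketch([150.0], target)  # prediction 50 above the target
# under ≈ log(0.5)^2 ≈ 0.48, over ≈ log(1.5)^2 ≈ 0.16
```

Because the error is measured on a log scale, halving the target costs more than overshooting it by the same absolute amount.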
Flux.huber_lossFunction
huber_loss(ŷ, y; δ=1.0)

Return the mean of the Huber loss given the prediction ŷ and true values y.

             | 0.5 * |ŷ - y|^2,          for |ŷ - y| <= δ
 Huber loss = |
-             |  δ * (|ŷ - y| - 0.5 * δ), otherwise
source
Flux.crossentropyFunction
crossentropy(ŷ, y; weight = nothing)

Return the cross entropy between the given probability distributions; calculated as -sum(y .* log.(ŷ) .* weight) / size(y, 2).

weight can be Nothing, a Number or an AbstractVector. weight=nothing acts like weight=1 but is faster.

See also: Flux.logitcrossentropy, Flux.binarycrossentropy, Flux.logitbinarycrossentropy

Examples

julia> Flux.crossentropy(softmax([-1.1491, 0.8619, 0.3127]), [1, 1, 0])
-3.085467254747739
source
Flux.logitcrossentropyFunction
logitcrossentropy(ŷ, y; weight = 1)

Return the crossentropy computed after a Flux.logsoftmax operation; calculated as -sum(y .* logsoftmax(ŷ) .* weight) / size(y, 2).

logitcrossentropy(ŷ, y) is mathematically equivalent to Flux.crossentropy(softmax(ŷ), y) but it is more numerically stable.

See also: Flux.crossentropy, Flux.binarycrossentropy, Flux.logitbinarycrossentropy

Examples

julia> Flux.logitcrossentropy([-1.1491, 0.8619, 0.3127], [1, 1, 0])
-3.085467254747738
source
Flux.binarycrossentropyFunction
binarycrossentropy(ŷ, y; ϵ=eps(ŷ))

Return $-y*\log(ŷ + ϵ) - (1-y)*\log(1-ŷ + ϵ)$. The ϵ term provides numerical stability.

Typically, the prediction is given by the output of a sigmoid activation.

See also: Flux.crossentropy, Flux.logitcrossentropy, Flux.logitbinarycrossentropy

Examples

julia> Flux.binarycrossentropy.(σ.([-1.1491, 0.8619, 0.3127]), [1, 1, 0])
+             |  δ * (|ŷ - y| - 0.5 * δ), otherwise
source
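The piecewise definition can be sketched in plain Julia as follows. The sketch assumes the standard Huber definition, in which the branch for small errors is quadratic (0.5 times the squared error) and the branch for large errors is linear. huber_sketch is a hypothetical name, not Flux's implementation.

```julia
# Hypothetical illustration of the standard Huber loss, averaged over all elements.
function huber_sketch(ŷ, y; δ=1.0)
    a = abs.(ŷ .- y)
    quadratic = 0.5 .* a .^ 2          # used where |ŷ - y| <= δ
    linear = δ .* (a .- 0.5 * δ)       # used otherwise
    return sum(ifelse.(a .<= δ, quadratic, linear)) / length(y)
end

h = huber_sketch([0.5, 3.0], [0.0, 0.0])
# errors 0.5 and 3.0 give 0.125 and 2.5, so h = (0.125 + 2.5) / 2 = 1.3125
```

The linear branch is what makes the loss less sensitive to outliers than mse: the error of 3.0 contributes 2.5 rather than 4.5.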
Flux.crossentropyFunction
crossentropy(ŷ, y; weight = nothing)

Return the cross entropy between the given probability distributions; calculated as -sum(y .* log.(ŷ) .* weight) / size(y, 2).

weight can be Nothing, a Number or an AbstractVector. weight=nothing acts like weight=1 but is faster.

See also: Flux.logitcrossentropy, Flux.binarycrossentropy, Flux.logitbinarycrossentropy

Examples

julia> Flux.crossentropy(softmax([-1.1491, 0.8619, 0.3127]), [1, 1, 0])
+3.085467254747739
source
Flux.logitcrossentropyFunction
logitcrossentropy(ŷ, y; weight = 1)

Return the crossentropy computed after a Flux.logsoftmax operation; calculated as -sum(y .* logsoftmax(ŷ) .* weight) / size(y, 2).

logitcrossentropy(ŷ, y) is mathematically equivalent to Flux.crossentropy(softmax(ŷ), y) but it is more numerically stable.

See also: Flux.crossentropy, Flux.binarycrossentropy, Flux.logitbinarycrossentropy

Examples

julia> Flux.logitcrossentropy([-1.1491, 0.8619, 0.3127], [1, 1, 0])
+3.085467254747738
source
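The two examples above give the same value up to rounding because taking the log of a softmax is the same as applying logsoftmax to the raw scores. The plain-Julia sketch below checks this on the same inputs. All *_sketch names are hypothetical and the weight argument is left out for brevity.

```julia
# Hypothetical stand-ins; the maximum is subtracted first for numerical stability.
softmax_sketch(z) = exp.(z .- maximum(z)) ./ sum(exp.(z .- maximum(z)))
logsoftmax_sketch(z) = z .- maximum(z) .- log(sum(exp.(z .- maximum(z))))

crossentropy_sketch(ŷ, y) = -sum(y .* log.(ŷ)) / size(y, 2)
logitcrossentropy_sketch(z, y) = -sum(y .* logsoftmax_sketch(z)) / size(y, 2)

z = [-1.1491, 0.8619, 0.3127]  # raw scores (logits)
y = [1, 1, 0]

ce_from_probs  = crossentropy_sketch(softmax_sketch(z), y)
ce_from_logits = logitcrossentropy_sketch(z, y)
```

Both results are approximately 3.0855, in line with the documented outputs. The logit form avoids taking the log of a probability that may have underflowed to zero.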
Flux.binarycrossentropyFunction
binarycrossentropy(ŷ, y; ϵ=eps(ŷ))

Return $-y*\log(ŷ + ϵ) - (1-y)*\log(1-ŷ + ϵ)$. The ϵ term provides numerical stability.

Typically, the prediction is given by the output of a sigmoid activation.

See also: Flux.crossentropy, Flux.logitcrossentropy, Flux.logitbinarycrossentropy

Examples

julia> Flux.binarycrossentropy.(σ.([-1.1491, 0.8619, 0.3127]), [1, 1, 0])
 3-element Array{Float64,1}:
  1.424397097347566
  0.35231664672364077
- 0.8616703662235441
source
Flux.logitbinarycrossentropyFunction
logitbinarycrossentropy(ŷ, y)

logitbinarycrossentropy(ŷ, y) is mathematically equivalent to Flux.binarycrossentropy(σ(ŷ), y) but it is more numerically stable.

See also: Flux.crossentropy, Flux.logitcrossentropy, Flux.binarycrossentropy

Examples

julia> Flux.logitbinarycrossentropy.([-1.1491, 0.8619, 0.3127], [1, 1, 0])
+ 0.8616703662235441
source
Flux.logitbinarycrossentropyFunction
logitbinarycrossentropy(ŷ, y)

logitbinarycrossentropy(ŷ, y) is mathematically equivalent to Flux.binarycrossentropy(σ(ŷ), y) but it is more numerically stable.

See also: Flux.crossentropy, Flux.logitcrossentropy, Flux.binarycrossentropy

Examples

julia> Flux.logitbinarycrossentropy.([-1.1491, 0.8619, 0.3127], [1, 1, 0])
 3-element Array{Float64,1}:
  1.4243970973475661
  0.35231664672364094
- 0.8616703662235443
source
Flux.kldivergenceFunction
kldivergence(ŷ, y)

Return the Kullback-Leibler divergence between the given probability distributions.

KL divergence is a measure of how much one probability distribution is different from the other. It is always non-negative and zero only when both the distributions are equal everywhere.

source
Flux.poissonFunction
poisson(ŷ, y)

Return how much the predicted distribution diverges from the expected Poisson distribution y; calculated as sum(ŷ .- y .* log.(ŷ)) / size(y, 2).

More information..

source
Flux.hingeFunction
hinge(ŷ, y)

Return the hinge loss given the prediction and true labels y (containing 1 or -1); calculated as sum(max.(0, 1 .- ŷ .* y)) / size(y, 2).

See also: squared_hinge

source
Flux.squared_hingeFunction
squared_hinge(ŷ, y)

Return the squared hinge loss given the prediction and true labels y (containing 1 or -1); calculated as sum((max.(0, 1 .- ŷ .* y)).^2) / size(y, 2).

See also: hinge

source
Flux.dice_coeff_lossFunction
dice_coeff_loss(ŷ, y; smooth=1)

Return a loss based on the dice coefficient. Used in the V-Net image segmentation architecture. Similar to the F1_score. Calculated as: 1 - 2*sum(|ŷ .* y| + smooth) / (sum(ŷ.^2) + sum(y.^2) + smooth)

source
Flux.tversky_lossFunction
tversky_loss(ŷ, y; β=0.7)

Return the Tversky loss. Used with imbalanced data to give more weight to false negatives. A larger β weighs recall higher than precision (by placing more emphasis on false negatives). Calculated as: 1 - sum(|y .* ŷ| + 1) / (sum(y .* ŷ + β*(1 .- y) .* ŷ + (1 - β)*y .* (1 .- ŷ)) + 1)

source
+ 0.8616703662235443
source
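The agreement between the binarycrossentropy and logitbinarycrossentropy examples can be reproduced in plain Julia. In the sketch below, σ_sketch, bce_sketch and logitbce_sketch are hypothetical names. The rearranged logit form is one common numerically stable way to write the loss and is an assumption here, not necessarily the expression Flux uses internally.

```julia
# Hypothetical stand-ins for illustration.
σ_sketch(z) = 1 / (1 + exp(-z))

# Formula as documented, applied to a probability ŷ.
bce_sketch(ŷ, y; ϵ=eps(ŷ)) = -y * log(ŷ + ϵ) - (1 - y) * log(1 - ŷ + ϵ)

# Same quantity computed from the logit z, rearranged so exp never overflows.
logitbce_sketch(z, y) = max(z, 0) - z * y + log1p(exp(-abs(z)))

z = [-1.1491, 0.8619, 0.3127]
y = [1, 1, 0]

from_probs  = bce_sketch.(σ_sketch.(z), y)
from_logits = logitbce_sketch.(z, y)
```

Both vectors are approximately [1.4244, 0.3523, 0.8617], matching the documented outputs.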
Flux.kldivergenceFunction
kldivergence(ŷ, y)

Return the Kullback-Leibler divergence between the given probability distributions.

KL divergence is a measure of how much one probability distribution is different from the other. It is always non-negative and zero only when both the distributions are equal everywhere.

source
Flux.poissonFunction
poisson(ŷ, y)

Return how much the predicted distribution diverges from the expected Poisson distribution y; calculated as sum(ŷ .- y .* log.(ŷ)) / size(y, 2).

More information..

source
Flux.hingeFunction
hinge(ŷ, y)

Return the hinge loss given the prediction and true labels y (containing 1 or -1); calculated as sum(max.(0, 1 .- ŷ .* y)) / size(y, 2).

See also: squared_hinge

source
Flux.squared_hingeFunction
squared_hinge(ŷ, y)

Return the squared hinge loss given the prediction and true labels y (containing 1 or -1); calculated as sum((max.(0, 1 .- ŷ .* y)).^2) / size(y, 2).

See also: hinge

source
Flux.dice_coeff_lossFunction
dice_coeff_loss(ŷ, y; smooth=1)

Return a loss based on the dice coefficient. Used in the V-Net image segmentation architecture. Similar to the F1_score. Calculated as: 1 - 2*sum(|ŷ .* y| + smooth) / (sum(ŷ.^2) + sum(y.^2) + smooth)

source
Flux.tversky_lossFunction
tversky_loss(ŷ, y; β=0.7)

Return the Tversky loss. Used with imbalanced data to give more weight to false negatives. A larger β weighs recall higher than precision (by placing more emphasis on false negatives). Calculated as: 1 - sum(|y .* ŷ| + 1) / (sum(y .* ŷ + β*(1 .- y) .* ŷ + (1 - β)*y .* (1 .- ŷ)) + 1)

source
diff --git a/dev/models/nnlib/index.html b/dev/models/nnlib/index.html index 7f3d0d9f..4186fec8 100644 --- a/dev/models/nnlib/index.html +++ b/dev/models/nnlib/index.html @@ -28,4 +28,4 @@ a = randomly sampled from uniform distribution U(l, u)

Randomized batched_adjoint(A)

Equivalent to applying transpose or adjoint to each matrix A[:,:,k].

These exist to control how batched_mul behaves, as it operates on such matrix slices of an array with ndims(A)==3.

BatchedTranspose{T, N, S} <: AbstractBatchedMatrix{T, N}
 BatchedAdjoint{T, N, S}

Lazy wrappers analogous to Transpose and Adjoint, returned by batched_transpose

NNlib.batched_transposeFunction
batched_transpose(A::AbstractArray{T,3})
 batched_adjoint(A)

Equivalent to applying transpose or adjoint to each matrix A[:,:,k].

These exist to control how batched_mul behaves, as it operates on such matrix slices of an array with ndims(A)==3.

BatchedTranspose{T, N, S} <: AbstractBatchedMatrix{T, N}
-BatchedAdjoint{T, N, S}

Lazy wrappers analogous to Transpose and Adjoint, returned by batched_transpose

+BatchedAdjoint{T, N, S}

Lazy wrappers analogous to Transpose and Adjoint, returned by batched_transpose

diff --git a/dev/models/recurrence/index.html b/dev/models/recurrence/index.html index 4818a777..2f66d8b4 100644 --- a/dev/models/recurrence/index.html +++ b/dev/models/recurrence/index.html @@ -39,4 +39,4 @@ m = Flux.Recur(rnn, h) y = m(x)

The Recur wrapper stores the state between runs in the m.state field.

If you use the RNN(10, 5) constructor – as opposed to RNNCell – you'll see that it's simply a wrapped cell.

julia> RNN(10, 5)
 Recur(RNNCell(10, 5, tanh))

Sequences

Often we want to work with sequences of inputs, rather than individual xs.

seq = [rand(10) for i = 1:10]

With Recur, applying our model to each element of a sequence is trivial:

m.(seq) # returns a list of 5-element vectors

This works even when we've chained recurrent layers into a larger model.

m = Chain(LSTM(10, 15), Dense(15, 5))
-m.(seq)

Finally, we can reset the hidden state of the cell back to its initial value using reset!(m).

+m.(seq)

Finally, we can reset the hidden state of the cell back to its initial value using reset!(m).

diff --git a/dev/models/regularisation/index.html b/dev/models/regularisation/index.html index e4f1b335..810948ab 100644 --- a/dev/models/regularisation/index.html +++ b/dev/models/regularisation/index.html @@ -36,4 +36,4 @@ julia> activations(c, rand(10)) Float32[0.5192045, 0.48079553] julia> sum(norm, ans) -2.1166067f0
Flux.activationsFunction
activations(c::Chain, input)

Calculate the forward results of each layer in Chain c with input as model input.

source
+2.1166067f0
Flux.activationsFunction
activations(c::Chain, input)

Calculate the forward results of each layer in Chain c with input as model input.

source
diff --git a/dev/performance/index.html b/dev/performance/index.html index 5b268162..850558ed 100644 --- a/dev/performance/index.html +++ b/dev/performance/index.html @@ -17,4 +17,4 @@ y_batch = reduce(hcat, ys) function loss_total(x_batch::Matrix, y_batch::Matrix) y_preds = model(x_batch) sum(loss.(y_preds, y_batch)) -end

When doing this kind of concatenation use reduce(hcat, xs) rather than hcat(xs...). This will avoid the splatting penalty, and will hit the optimised reduce method.

+end

When doing this kind of concatenation use reduce(hcat, xs) rather than hcat(xs...). This will avoid the splatting penalty, and will hit the optimised reduce method.
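A small plain-Julia check of the advice above: both calls build the same matrix, and the difference lies only in how the call is dispatched. The data here is made up for illustration.

```julia
xs = [rand(3) for _ in 1:4]  # four sample vectors of length 3

batched = reduce(hcat, xs)   # no splatting; suited to long collections
splatted = hcat(xs...)       # passes every vector as a separate argument
```

Both results are 3×4 matrices with identical contents. The splatted form compiles a call whose argument count grows with the length of xs, which is where the penalty comes from.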

diff --git a/dev/saving/index.html b/dev/saving/index.html index 34af1dde..28fa6233 100644 --- a/dev/saving/index.html +++ b/dev/saving/index.html @@ -47,4 +47,4 @@ evalcb = throttle(30) do # Show loss @save "model-checkpoint.bson" model end

This will update the "model-checkpoint.bson" file every thirty seconds.

You can get more advanced by saving a series of models throughout training, for example

@save "model-$(now()).bson" model

will produce a series of models like "model-2018-03-06T02:57:10.41.bson". You could also store the current test set loss, so that it's easy to (for example) revert to an older copy of the model if it starts to overfit.

@save "model-$(now()).bson" model loss = testloss()

You can even store optimiser state alongside the model, to resume training exactly where you left off.

opt = ADAM()
-@save "model-$(now()).bson" model opt
+@save "model-$(now()).bson" model opt diff --git a/dev/search/index.html b/dev/search/index.html index bfb7390d..8728f90d 100644 --- a/dev/search/index.html +++ b/dev/search/index.html @@ -6,4 +6,4 @@ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) ga('create', 'UA-36890222-9', 'auto'); ga('send', 'pageview', {'page': location.pathname + location.search + location.hash}); -

Loading search...

    +

    Loading search...

      diff --git a/dev/training/optimisers/index.html b/dev/training/optimisers/index.html index feefed1d..46e7b911 100644 --- a/dev/training/optimisers/index.html +++ b/dev/training/optimisers/index.html @@ -27,8 +27,8 @@ end

      Running this will alter the parameters W and

      An optimiser update! accepts a parameter and a gradient, and updates the parameter according to the chosen rule. We can also pass opt to our training loop, which will update all parameters of the model in a loop. However, we can now easily replace Descent with a more advanced optimiser such as ADAM.

      Optimiser Reference

      All optimisers return an object that, when passed to train!, will update the parameters passed to it.

      Flux.Optimise.update!Function
      update!(x, x̄)

      Update the array x according to x .-= x̄.

      source
      update!(opt, p, g)
      -update!(opt, ps::Params, gs)

      Perform an update step of the parameters ps (or the single parameter p) according to optimizer opt and the gradients gs (the gradient g).

      As a result, the parameters are mutated and the optimizer's internal state may change.

      source
      Flux.Optimise.DescentType
      Descent(η = 0.1)

      Classic gradient descent optimiser with learning rate η. For each parameter p and its gradient δp, this runs p -= η*δp

      Parameters

      • Learning rate (η): Amount by which gradients are discounted before updating the weights.

      Examples

      opt = Descent()
      +end

      An optimiser update! accepts a parameter and a gradient, and updates the parameter according to the chosen rule. We can also pass opt to our training loop, which will update all parameters of the model in a loop. However, we can now easily replace Descent with a more advanced optimiser such as ADAM.

      Optimiser Reference

      All optimisers return an object that, when passed to train!, will update the parameters passed to it.

      Flux.Optimise.update!Function
      update!(x, x̄)

      Update the array x according to x .-= x̄.

      source
      update!(opt, p, g)
      +update!(opt, ps::Params, gs)

      Perform an update step of the parameters ps (or the single parameter p) according to optimizer opt and the gradients gs (the gradient g).

      As a result, the parameters are mutated and the optimizer's internal state may change.

      source
      Flux.Optimise.DescentType
      Descent(η = 0.1)

      Classic gradient descent optimiser with learning rate η. For each parameter p and its gradient δp, this runs p -= η*δp

      Parameters

      • Learning rate (η): Amount by which gradients are discounted before updating the weights.

      Examples

      opt = Descent()
       
       opt = Descent(0.3)
       
      @@ -38,29 +38,29 @@ gs = gradient(ps) do
           loss(x, y)
       end
       
      -Flux.Optimise.update!(opt, ps, gs)
      source
      Flux.Optimise.MomentumType
      Momentum(η = 0.01, ρ = 0.9)

      Gradient descent optimizer with learning rate η and momentum ρ.

      Parameters

      • Learning rate (η): Amount by which gradients are discounted before updating the weights.
      • Momentum (ρ): Controls the acceleration of gradient descent in the prominent direction, in effect dampening oscillations.

      Examples

      opt = Momentum()
      +Flux.Optimise.update!(opt, ps, gs)
      source
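The rule p -= η*δp quoted for Descent can be tried out without Flux. In the sketch below, descent_step! is a hypothetical helper and the quadratic objective is chosen only because its gradient is easy to write by hand.

```julia
# Hypothetical helper implementing the documented rule p -= η*δp in place.
descent_step!(p, δp; η=0.1) = (p .-= η .* δp; p)

# Minimise f(w) = sum(w.^2), whose gradient is 2w.
w = [1.0, -2.0]
for _ in 1:50
    descent_step!(w, 2 .* w)
end
# each step multiplies w by 1 - 2η = 0.8, so w shrinks towards zero
```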
      Flux.Optimise.MomentumType
      Momentum(η = 0.01, ρ = 0.9)

      Gradient descent optimizer with learning rate η and momentum ρ.

      Parameters

      • Learning rate (η): Amount by which gradients are discounted before updating the weights.
      • Momentum (ρ): Controls the acceleration of gradient descent in the prominent direction, in effect dampening oscillations.

      Examples

      opt = Momentum()
       
      -opt = Momentum(0.01, 0.99)
      source
      Flux.Optimise.NesterovType
      Nesterov(η = 0.001, ρ = 0.9)

      Gradient descent optimizer with learning rate η and Nesterov momentum ρ.

      Parameters

      • Learning rate (η): Amount by which gradients are discounted before updating the weights.
      • Nesterov momentum (ρ): Controls the acceleration of gradient descent in the prominent direction, in effect dampening oscillations.

      Examples

      opt = Nesterov()
      +opt = Momentum(0.01, 0.99)
      source
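For intuition about the ρ parameter, here is a sketch of the classical momentum update in plain Julia. momentum_step! is a hypothetical helper and the velocity formulation shown is an assumption based on the textbook rule; Flux's internal bookkeeping may differ in detail.

```julia
# Hypothetical helper: classical momentum with velocity v, updated in place.
function momentum_step!(p, v, δp; η=0.01, ρ=0.9)
    @. v = ρ * v - η * δp  # decayed history of past steps minus the scaled gradient
    @. p += v              # move the parameters along the velocity
    return p
end

# Minimise f(w) = sum(w.^2), whose gradient is 2w.
w = [1.0]
v = zeros(1)
for _ in 1:200
    momentum_step!(w, v, 2 .* w)
end
# w ends up close to zero
```

With ρ = 0 the velocity carries no history and the update reduces to plain gradient descent with learning rate η.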
      Flux.Optimise.NesterovType
      Nesterov(η = 0.001, ρ = 0.9)

      Gradient descent optimizer with learning rate η and Nesterov momentum ρ.

      Parameters

      • Learning rate (η): Amount by which gradients are discounted before updating the weights.
      • Nesterov momentum (ρ): Controls the acceleration of gradient descent in the prominent direction, in effect dampening oscillations.

      Examples

      opt = Nesterov()
       
      -opt = Nesterov(0.003, 0.95)
      source
      Flux.Optimise.RMSPropType
      RMSProp(η = 0.001, ρ = 0.9)

      Optimizer using the RMSProp algorithm. Often a good choice for recurrent networks. Parameters other than learning rate generally don't need tuning.

      Parameters

      • Learning rate (η): Amount by which gradients are discounted before updating the weights.
      • Momentum (ρ): Controls the acceleration of gradient descent in the prominent direction, in effect dampening oscillations.

      Examples

      opt = RMSProp()
      +opt = Nesterov(0.003, 0.95)
      source
      Flux.Optimise.RMSPropType
      RMSProp(η = 0.001, ρ = 0.9)

      Optimizer using the RMSProp algorithm. Often a good choice for recurrent networks. Parameters other than learning rate generally don't need tuning.

      Parameters

      • Learning rate (η): Amount by which gradients are discounted before updating the weights.
      • Momentum (ρ): Controls the acceleration of gradient descent in the prominent direction, in effect dampening oscillations.

      Examples

      opt = RMSProp()
       
      -opt = RMSProp(0.002, 0.95)
      source
      Flux.Optimise.ADAMType
      ADAM(η = 0.001, β::Tuple = (0.9, 0.999))

      ADAM optimiser.

      Parameters

      • Learning rate (η): Amount by which gradients are discounted before updating the weights.
      • Decay of momentums (β::Tuple): Exponential decay for the first (β1) and the second (β2) momentum estimate.

      Examples

      opt = ADAM()
      +opt = RMSProp(0.002, 0.95)
      source
      Flux.Optimise.ADAMType
      ADAM(η = 0.001, β::Tuple = (0.9, 0.999))

      ADAM optimiser.

      Parameters

      • Learning rate (η): Amount by which gradients are discounted before updating the weights.
      • Decay of momentums (β::Tuple): Exponential decay for the first (β1) and the second (β2) momentum estimate.

      Examples

      opt = ADAM()
       
      -opt = ADAM(0.001, (0.9, 0.8))
      source
      Flux.Optimise.RADAMType
      RADAM(η = 0.001, β::Tuple = (0.9, 0.999))

      Rectified ADAM optimizer.

      Parameters

      • Learning rate (η): Amount by which gradients are discounted before updating the weights.
      • Decay of momentums (β::Tuple): Exponential decay for the first (β1) and the second (β2) momentum estimate.

      Examples

      opt = RADAM()
      +opt = ADAM(0.001, (0.9, 0.8))
      source
      Flux.Optimise.RADAMType
      RADAM(η = 0.001, β::Tuple = (0.9, 0.999))

      Rectified ADAM optimizer.

      Parameters

      • Learning rate (η): Amount by which gradients are discounted before updating the weights.
      • Decay of momentums (β::Tuple): Exponential decay for the first (β1) and the second (β2) momentum estimate.

      Examples

      opt = RADAM()
       
      -opt = RADAM(0.001, (0.9, 0.8))
      source
      Flux.Optimise.AdaMaxType
      AdaMax(η = 0.001, β::Tuple = (0.9, 0.999))

      AdaMax is a variant of ADAM based on the ∞-norm.

      Parameters

      • Learning rate (η): Amount by which gradients are discounted before updating the weights.
      • Decay of momentums (β::Tuple): Exponential decay for the first (β1) and the second (β2) momentum estimate.

      Examples

      opt = AdaMax()
      +opt = RADAM(0.001, (0.9, 0.8))
      source
      Flux.Optimise.AdaMaxType
      AdaMax(η = 0.001, β::Tuple = (0.9, 0.999))

      AdaMax is a variant of ADAM based on the ∞-norm.

      Parameters

      • Learning rate (η): Amount by which gradients are discounted before updating the weights.
      • Decay of momentums (β::Tuple): Exponential decay for the first (β1) and the second (β2) momentum estimate.

      Examples

      opt = AdaMax()
       
      -opt = AdaMax(0.001, (0.9, 0.995))
      source
      Flux.Optimise.ADAGradType
      ADAGrad(η = 0.1)

      ADAGrad optimizer. It has parameter specific learning rates based on how frequently it is updated. Parameters don't need tuning.

      Parameters

      • Learning rate (η): Amount by which gradients are discounted before updating the weights.

      Examples

      opt = ADAGrad()
      +opt = AdaMax(0.001, (0.9, 0.995))
      source
      Flux.Optimise.ADAGradType
      ADAGrad(η = 0.1)

      ADAGrad optimizer. It has parameter specific learning rates based on how frequently it is updated. Parameters don't need tuning.

      Parameters

      • Learning rate (η): Amount by which gradients are discounted before updating the weights.

      Examples

      opt = ADAGrad()
       
      -opt = ADAGrad(0.001)
      source
      Flux.Optimise.ADADeltaType
      ADADelta(ρ = 0.9)

      ADADelta is a version of ADAGrad adapting its learning rate based on a window of past gradient updates. Parameters don't need tuning.

      Parameters

      • Rho (ρ): Factor by which the gradient is decayed at each time step.

      Examples

      opt = ADADelta()
      +opt = ADAGrad(0.001)
      source
      Flux.Optimise.ADADeltaType
      ADADelta(ρ = 0.9)

      ADADelta is a version of ADAGrad adapting its learning rate based on a window of past gradient updates. Parameters don't need tuning.

      Parameters

      • Rho (ρ): Factor by which the gradient is decayed at each time step.

      Examples

      opt = ADADelta()
       
      -opt = ADADelta(0.89)
      source
      Flux.Optimise.AMSGradType
      AMSGrad(η = 0.001, β::Tuple = (0.9, 0.999))

      The AMSGrad version of the ADAM optimiser. Parameters don't need tuning.

      Parameters

      • Learning rate (η): Amount by which gradients are discounted before updating the weights.
      • Decay of momentums (β::Tuple): Exponential decay for the first (β1) and the second (β2) momentum estimate.

      Examples

      opt = AMSGrad()
      +opt = ADADelta(0.89)
      source
      Flux.Optimise.AMSGradType
      AMSGrad(η = 0.001, β::Tuple = (0.9, 0.999))

      The AMSGrad version of the ADAM optimiser. Parameters don't need tuning.

      Parameters

      • Learning rate (η): Amount by which gradients are discounted before updating the weights.
      • Decay of momentums (β::Tuple): Exponential decay for the first (β1) and the second (β2) momentum estimate.

      Examples

      opt = AMSGrad()
       
      -opt = AMSGrad(0.001, (0.89, 0.995))
      source
      Flux.Optimise.NADAMType
      NADAM(η = 0.001, β::Tuple = (0.9, 0.999))

      NADAM is a Nesterov variant of ADAM. Parameters don't need tuning.

      Parameters

      • Learning rate (η): Amount by which gradients are discounted before updating the weights.
      • Decay of momentums (β::Tuple): Exponential decay for the first (β1) and the second (β2) momentum estimate.

      Examples

      opt = NADAM()
      +opt = AMSGrad(0.001, (0.89, 0.995))
      source
      Flux.Optimise.NADAMType
      NADAM(η = 0.001, β::Tuple = (0.9, 0.999))

      NADAM is a Nesterov variant of ADAM. Parameters don't need tuning.

      Parameters

      • Learning rate (η): Amount by which gradients are discounted before updating the weights.
      • Decay of momentums (β::Tuple): Exponential decay for the first (β1) and the second (β2) momentum estimate.

      Examples

      opt = NADAM()
       
      -opt = NADAM(0.002, (0.89, 0.995))
      source
      Flux.Optimise.ADAMWFunction
      ADAMW(η = 0.001, β::Tuple = (0.9, 0.999), decay = 0)

      ADAMW is a variant of ADAM fixing (as in repairing) its weight decay regularization.

      Parameters

      • Learning rate (η): Amount by which gradients are discounted before updating the weights.
      • Decay of momentums (β::Tuple): Exponential decay for the first (β1) and the second (β2) momentum estimate.
      • decay: Decay applied to weights during optimisation.

      Examples

      opt = ADAMW()
      +opt = NADAM(0.002, (0.89, 0.995))
      source
      Flux.Optimise.ADAMWFunction
      ADAMW(η = 0.001, β::Tuple = (0.9, 0.999), decay = 0)

      ADAMW is a variant of ADAM fixing (as in repairing) its weight decay regularization.

      Parameters

      • Learning rate (η): Amount by which gradients are discounted before updating the weights.
      • Decay of momentums (β::Tuple): Exponential decay for the first (β1) and the second (β2) momentum estimate.
      • decay: Decay applied to weights during optimisation.

      Examples

      opt = ADAMW()
       
      -opt = ADAMW(0.001, (0.89, 0.995), 0.1)
      source

      Optimiser Interface

      Flux's optimisers are built around a struct that holds all the optimiser parameters along with a definition of how to apply the update rule associated with it. We do this via the apply! function which takes the optimiser as the first argument followed by the parameter and its corresponding gradient.

      In this manner Flux also allows one to create custom optimisers to be used seamlessly. Let's work this with a simple example.

      mutable struct Momentum
      +opt = ADAMW(0.001, (0.89, 0.995), 0.1)
      source

      Optimiser Interface

      Flux's optimisers are built around a struct that holds all the optimiser parameters along with a definition of how to apply the update rule associated with it. We do this via the apply! function which takes the optimiser as the first argument followed by the parameter and its corresponding gradient.

      In this manner Flux also allows one to create custom optimisers to be used seamlessly. Let's work this with a simple example.

      mutable struct Momentum
         eta
         rho
         velocity
      @@ -88,4 +88,4 @@ end
       
       loss(rand(10)) # around 0.9

      In this manner it is possible to compose optimisers for some added flexibility.

      Decays

      Similar to optimisers, Flux also defines some simple decays that can be used in conjunction with other optimisers, or standalone.

      Flux.Optimise.ExpDecayType
      ExpDecay(η = 0.001, decay = 0.1, decay_step = 1000, clip = 1e-4)

      Discount the learning rate η by the factor decay every decay_step steps till a minimum of clip.

      Parameters

      • Learning rate (η): Amount by which gradients are discounted before updating the weights.
      • decay: Factor by which the learning rate is discounted.
      • decay_step: Schedule decay operations by setting the number of steps between two decay operations.
      • clip: Minimum value of learning rate.

      Examples

      To apply exponential decay to an optimiser:

      Optimiser(ExpDecay(..), Opt(..))
       
      -opt = Optimiser(ExpDecay(), ADAM())
      source
      Flux.Optimise.InvDecayType
      InvDecay(γ = 0.001)

      Apply inverse time decay to an optimiser, so that the effective step size at iteration n is eta / (1 + γ * n) where eta is the initial step size. The wrapped optimiser's step size is not modified.

      Examples

      Optimiser(InvDecay(..), Opt(..))
      source
      Flux.Optimise.WeightDecayType
      WeightDecay(wd = 0)

      Decay weights by wd.

      Parameters

      • Weight decay (wd)
      source
      +opt = Optimiser(ExpDecay(), ADAM())
      source
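The schedule described for ExpDecay can be sketched in plain Julia as follows. This assumes the learning rate is multiplied by `decay` once per completed block of `decay_step` steps and never drops below `clip`. The helper `decayed_rate` is hypothetical, and Flux's own bookkeeping of when a decay fires may differ in detail.

```julia
# Hypothetical helper (not part of Flux): learning rate after `step` updates,
# using the defaults from the ExpDecay signature above.
function decayed_rate(step; η = 0.001, decay = 0.1, decay_step = 1000, clip = 1e-4)
    n = step ÷ decay_step            # number of decay operations so far
    return max(η * decay^n, clip)    # never go below the minimum rate
end

rates = [decayed_rate(s) for s in (0, 999, 1000, 5000)]
```

With the default values the rate reaches the `clip` floor after the first decay, so later steps leave it unchanged.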
      Flux.Optimise.InvDecayType
      InvDecay(γ = 0.001)

      Apply inverse time decay to an optimiser, so that the effective step size at iteration n is eta / (1 + γ * n) where eta is the initial step size. The wrapped optimiser's step size is not modified.

      Examples

      Optimiser(InvDecay(..), Opt(..))
      source
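The formula eta / (1 + γ * n) from the InvDecay description can be evaluated directly. The sketch below is plain Julia with a hypothetical helper name `effective_step`, and the initial step size of 0.1 is an arbitrary value chosen for illustration.

```julia
# Hypothetical helper (not part of Flux): effective step size at iteration n
# under inverse time decay, following the formula in the docstring.
effective_step(eta, γ, n) = eta / (1 + γ * n)

eta = 0.1      # assumed initial step size, for illustration only
γ   = 0.001    # default from the InvDecay signature
steps = [effective_step(eta, γ, n) for n in (0, 1000, 9000)]
```

At n = 1000 the denominator is 2, so the effective step is half the initial one; at n = 9000 it is one tenth.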
      Flux.Optimise.WeightDecayType
      WeightDecay(wd = 0)

      Decay weights by wd.

      Parameters

      • Weight decay (wd)
      source
      diff --git a/dev/training/training/index.html b/dev/training/training/index.html
      index 70e014df..f69d4bb2 100644
      --- a/dev/training/training/index.html
      +++ b/dev/training/training/index.html
      @@ -6,7 +6,7 @@ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
       ga('create', 'UA-36890222-9', 'auto');
       ga('send', 'pageview', {'page': location.pathname + location.search + location.hash});
      -

      Training

      To actually train a model we need four things:

      • An objective function that evaluates how well a model is doing given some input data.
      • The trainable parameters of the model.
      • A collection of data points that will be provided to the objective function.
      • An optimiser that will update the model parameters appropriately.

      With these we can call train!:

      Flux.Optimise.train!Function
      train!(loss, params, data, opt; cb)

      For each datapoint d in data compute the gradient of loss(d...) through backpropagation and call the optimizer opt.

      In case datapoints d are of numeric array type, assume no splatting is needed and compute the gradient of loss(d).

      A callback is given with the keyword argument cb. For example, this will print "training" every 10 seconds (using Flux.throttle):

      train!(loss, params, data, opt, cb = throttle(() -> println("training"), 10))

      The callback can call Flux.stop to interrupt the training loop.

      Multiple optimisers and callbacks can be passed to opt and cb as arrays.

      source

      There are plenty of examples in the model zoo.

      Loss Functions

      The objective function must return a number representing how far the model is from its target – the loss of the model. The loss function that we defined in basics will work as an objective. We can also define an objective in terms of some model:

      m = Chain(
      +

      Training

      To actually train a model we need four things:

      • An objective function that evaluates how well a model is doing given some input data.
      • The trainable parameters of the model.
      • A collection of data points that will be provided to the objective function.
      • An optimiser that will update the model parameters appropriately.

      With these we can call train!:

      Flux.Optimise.train!Function
      train!(loss, params, data, opt; cb)

      For each datapoint d in data compute the gradient of loss(d...) through backpropagation and call the optimizer opt.

      In case datapoints d are of numeric array type, assume no splatting is needed and compute the gradient of loss(d).

      A callback is given with the keyword argument cb. For example, this will print "training" every 10 seconds (using Flux.throttle):

      train!(loss, params, data, opt, cb = throttle(() -> println("training"), 10))

      The callback can call Flux.stop to interrupt the training loop.

      Multiple optimisers and callbacks can be passed to opt and cb as arrays.

      source

      There are plenty of examples in the model zoo.

      Loss Functions

      The objective function must return a number representing how far the model is from its target – the loss of the model. The loss function that we defined in basics will work as an objective. We can also define an objective in terms of some model:

      m = Chain(
         Dense(784, 32, σ),
         Dense(32, 10), softmax)
       
      @@ -36,7 +36,7 @@ julia> @epochs 2 Flux.train!(...)
       [ Info: Epoch 1
       hello
       [ Info: Epoch 2
      -hello
      source

      Callbacks

      train! takes an additional argument, cb, that's used for callbacks so that you can observe the training process. For example:

      train!(objective, ps, data, opt, cb = () -> println("training"))

      Callbacks are called for every batch of training data. You can slow this down using Flux.throttle(f, timeout) which prevents f from being called more than once every timeout seconds.

      A more typical callback might look like this:

      test_x, test_y = # ... create single batch of test data ...
      +hello
      source

      Callbacks

      train! takes an additional argument, cb, that's used for callbacks so that you can observe the training process. For example:

      train!(objective, ps, data, opt, cb = () -> println("training"))

      Callbacks are called for every batch of training data. You can slow this down using Flux.throttle(f, timeout) which prevents f from being called more than once every timeout seconds.

      A more typical callback might look like this:

      test_x, test_y = # ... create single batch of test data ...
       evalcb() = @show(loss(test_x, test_y))
       
       Flux.train!(objective, ps, data, opt,
      @@ -55,4 +55,4 @@ end

      You could simplify this further, for example by hard-coding in the loss function.

      +end

      You could simplify this further, for example by hard-coding in the loss function.

      diff --git a/dev/utilities/index.html b/dev/utilities/index.html
      index 63a91dfb..cc323673 100644
      --- a/dev/utilities/index.html
      +++ b/dev/utilities/index.html
      @@ -24,7 +24,7 @@ julia> Flux.unsqueeze([1 2; 3 4], 2)
       [:, :, 2] =
        2
      - 4
      source
      Flux.stackFunction
      stack(xs, dim)

      Concatenate the given Array of Arrays xs into a single Array along the given dimension dim.

      Examples

      julia> xs = [[1, 2], [3, 4], [5, 6]]
      + 4
      source
      Flux.stackFunction
      stack(xs, dim)

      Concatenate the given Array of Arrays xs into a single Array along the given dimension dim.

      Examples

      julia> xs = [[1, 2], [3, 4], [5, 6]]
       3-element Array{Array{Int64,1},1}:
        [1, 2]
        [3, 4]
      @@ -40,12 +40,12 @@ julia> cat(xs, dims=1)
       3-element Array{Array{Int64,1},1}:
        [1, 2]
        [3, 4]
      - [5, 6]
      source
      Flux.unstackFunction
      unstack(xs, dim)

      Unroll the given xs into an Array of Arrays along the given dimension dim.

      Examples

      julia> Flux.unstack([1 3 5 7; 2 4 6 8], 2)
      + [5, 6]
      source
      Flux.unstackFunction
      unstack(xs, dim)

      Unroll the given xs into an Array of Arrays along the given dimension dim.

      Examples

      julia> Flux.unstack([1 3 5 7; 2 4 6 8], 2)
       4-element Array{Array{Int64,1},1}:
        [1, 2]
        [3, 4]
        [5, 6]
      - [7, 8]
      source
      Flux.chunkFunction
      chunk(xs, n)

      Split xs into n parts.

      Examples

      julia> Flux.chunk(1:10, 3)
      + [7, 8]
      source
      Flux.chunkFunction
      chunk(xs, n)

      Split xs into n parts.

      Examples

      julia> Flux.chunk(1:10, 3)
       3-element Array{UnitRange{Int64},1}:
        1:4
        5:8
      @@ -55,18 +55,18 @@ julia> Flux.chunk(collect(1:10), 3)
       3-element Array{SubArray{Int64,1,Array{Int64,1},Tuple{UnitRange{Int64}},true},1}:
        [1, 2, 3, 4]
        [5, 6, 7, 8]
      - [9, 10]
      source
      Flux.frequenciesFunction
      frequencies(xs)

      Count the number of times that each element of xs appears.

      Examples

      julia> Flux.frequencies(['a','b','b'])
      + [9, 10]
      source
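One way to get the splitting behaviour shown for chunk is to partition the input into pieces of length ceil(length / n). The sketch below uses only Base Julia; `chunk_sketch` is a hypothetical name and is not claimed to be Flux's definition.

```julia
# Hypothetical helper (not part of Flux): split xs into at most n parts,
# each of length ceil(length(xs) / n) except possibly the last.
chunk_sketch(xs, n) = collect(Iterators.partition(xs, ceil(Int, length(xs) / n)))

parts = chunk_sketch(1:10, 3)
```

For `1:10` and `n = 3` the piece length is 4, so the last part holds the two leftover elements, matching the documented output.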
      Flux.frequenciesFunction
      frequencies(xs)

      Count the number of times that each element of xs appears.

      Examples

      julia> Flux.frequencies(['a','b','b'])
       Dict{Char,Int64} with 2 entries:
         'a' => 1
      -  'b' => 2
      source
      Flux.batchFunction
      batch(xs)

      Batch the arrays in xs into a single array.

      Examples

      julia> Flux.batch([[1,2,3],[4,5,6]])
      +  'b' => 2
      source
      Flux.batchFunction
      batch(xs)

      Batch the arrays in xs into a single array.

      Examples

      julia> Flux.batch([[1,2,3],[4,5,6]])
       3×2 Array{Int64,2}:
        1  4
        2  5
      - 3  6
      source
      Flux.batchseqFunction
      batchseq(seqs, pad)

      Take a list of N sequences, and turn them into a single sequence where each item is a batch of N. Short sequences will be padded by pad.

      Examples

      julia> Flux.batchseq([[1, 2, 3], [4, 5]], 0)
      + 3  6
      source
      Flux.batchseqFunction
      batchseq(seqs, pad)

      Take a list of N sequences, and turn them into a single sequence where each item is a batch of N. Short sequences will be padded by pad.

      Examples

      julia> Flux.batchseq([[1, 2, 3], [4, 5]], 0)
       3-element Array{Array{Int64,1},1}:
        [1, 4]
        [2, 5]
      - [3, 0]
      source
      Base.rpadMethod

      Return the given sequence padded with p up to a maximum length of n.

      Examples

      julia> rpad([1, 2], 4, 0)
      + [3, 0]
      source
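The transformation performed by batchseq (pad every sequence to the longest length, then group the items at each position) can be sketched in plain Julia. `batchseq_sketch` is a hypothetical name used for illustration only.

```julia
# Hypothetical helper (not part of Flux): turn N sequences into one sequence
# of batches, padding short sequences with `pad`.
function batchseq_sketch(seqs, pad)
    n = maximum(length, seqs)                                   # longest sequence
    padded = [vcat(s, fill(pad, n - length(s))) for s in seqs]  # right-pad each one
    return [[p[i] for p in padded] for i in 1:n]                # item i from every sequence
end

batches = batchseq_sketch([[1, 2, 3], [4, 5]], 0)
```

Each element of the result holds the items found at one position across all input sequences, so the third batch contains the padding value for the shorter sequence.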
      Base.rpadMethod

      Return the given sequence padded with p up to a maximum length of n.

      Examples

      julia> rpad([1, 2], 4, 0)
       4-element Array{Int64,1}:
        1
        2
      @@ -77,15 +77,15 @@ julia> rpad([1, 2, 3], 2, 0)
       3-element Array{Int64,1}:
        1
        2
      - 3
      source

      Layer Initialization

      These are primarily useful if you are planning to write your own layers. Flux initializes convolutional layers and recurrent cells with glorot_uniform by default. To change the default on an applicable layer, pass the desired function with the init keyword. For example:

      julia> conv = Conv((3, 3), 1 => 8, relu; init=Flux.glorot_normal)
      + 3
      source

      Layer Initialization

      These are primarily useful if you are planning to write your own layers. Flux initializes convolutional layers and recurrent cells with glorot_uniform by default. To change the default on an applicable layer, pass the desired function with the init keyword. For example:

      julia> conv = Conv((3, 3), 1 => 8, relu; init=Flux.glorot_normal)
       Conv((3, 3), 1=>8, relu)
      Flux.glorot_uniformFunction
      glorot_uniform(dims...)

      Return an Array of size dims containing random variables taken from a uniform distribution in the interval $[-x, x]$, where x = sqrt(24 / sum(dims)) / 2.

      Examples

      julia> Flux.glorot_uniform(2, 3)
       2×3 Array{Float32,2}:
        0.601094  -0.57414   -0.814925
      - 0.900868   0.805994   0.057514
      source
      Flux.glorot_normalFunction
      glorot_normal(dims...)

      Return an Array of size dims containing random variables taken from a normal distribution with mean 0 and standard deviation sqrt(2 / sum(dims)).

      Examples

      julia> Flux.glorot_normal(3, 2)
      + 0.900868   0.805994   0.057514
      source
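The bound x = sqrt(24 / sum(dims)) / 2 given for glorot_uniform simplifies to sqrt(6 / sum(dims)). The plain-Julia sketch below computes that bound and draws a sample inside it; `glorot_bound` and `glorot_uniform_sketch` are hypothetical names, and the sampling line is an assumption about one reasonable way to draw from the interval, not Flux's source.

```julia
# Hypothetical helpers (not part of Flux).
# Half-width of the sampling interval, as stated in the docstring.
glorot_bound(dims...) = sqrt(24 / sum(dims)) / 2

# Draw a Float32 array of size dims with entries in [-x, x].
function glorot_uniform_sketch(dims...)
    x = Float32(glorot_bound(dims...))
    return (rand(Float32, dims...) .- 0.5f0) .* (2 * x)
end

x = glorot_bound(2, 3)          # equals sqrt(6 / 5)
W = glorot_uniform_sketch(2, 3)
```

Larger layers give a smaller bound, which keeps the scale of the initial weights in check as the sum of the dimensions grows.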
      Flux.glorot_normalFunction
      glorot_normal(dims...)

      Return an Array of size dims containing random variables taken from a normal distribution with mean 0 and standard deviation sqrt(2 / sum(dims)).

      Examples

      julia> Flux.glorot_normal(3, 2)
       3×2 Array{Float32,2}:
         0.429505  -0.0852891
         0.523935   0.371009
      - -0.223261   0.188052
      source

      Model Abstraction

      Flux.destructureFunction
      destructure(m)

      Flatten a model's parameters into a single weight vector.

      julia> m = Chain(Dense(10, 5, σ), Dense(5, 2), softmax)
      + -0.223261   0.188052
      source

      Model Abstraction

      Flux.destructureFunction
      destructure(m)

      Flatten a model's parameters into a single weight vector.

      julia> m = Chain(Dense(10, 5, σ), Dense(5, 2), softmax)
       Chain(Dense(10, 5, σ), Dense(5, 2), softmax)
       
       julia> θ, re = destructure(m);
      @@ -94,6 +94,6 @@ julia> θ
       67-element Array{Float32,1}:
       -0.1407104
       ...

      The second return value re allows you to reconstruct the original network after making modifications to the weight vector (for example, with a hypernetwork).

      julia> re(θ .* 2)
      -Chain(Dense(10, 5, σ), Dense(5, 2), softmax)
      source

      Callback Helpers

      Flux.throttleFunction
      throttle(f, timeout; leading=true, trailing=false)

      Return a function that when invoked, will only be triggered at most once during timeout seconds.

      Normally, the throttled function runs as often as it can without ever running more than once per timeout period. To disable execution on the leading edge, pass leading=false; to enable execution on the trailing edge, pass trailing=true.

      source
      Flux.Optimise.stopFunction
      stop()

      Call Flux.stop() in a callback to indicate when a callback condition is met. This will trigger the train loop to stop and exit.

      Examples

      cb = function ()
      +Chain(Dense(10, 5, σ), Dense(5, 2), softmax)
      source

      Callback Helpers

      Flux.throttleFunction
      throttle(f, timeout; leading=true, trailing=false)

      Return a function that when invoked, will only be triggered at most once during timeout seconds.

      Normally, the throttled function runs as often as it can without ever running more than once per timeout period. To disable execution on the leading edge, pass leading=false; to enable execution on the trailing edge, pass trailing=true.

      source
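The leading-edge behaviour of throttle can be sketched with a closure that remembers when it last ran. This is a simplified plain-Julia illustration under the assumption that only leading-edge execution is wanted; `throttle_sketch` is a hypothetical name and it omits the `leading` and `trailing` options.

```julia
# Hypothetical helper (not part of Flux): run f at most once per `timeout`
# seconds, executing on the leading edge only.
function throttle_sketch(f, timeout)
    last_call = -Inf                       # time of the last actual execution
    return function (args...)
        t_now = time()
        if t_now - last_call >= timeout
            last_call = t_now
            return f(args...)              # allowed: enough time has passed
        end
        return nothing                     # suppressed: called again too soon
    end
end

counter = Ref(0)
tick = throttle_sketch(() -> counter[] += 1, 60)
tick(); tick(); tick()                     # only the first call gets through
```

This is why a throttled callback passed to train! does not fire on every batch: calls made within the timeout window are dropped.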
      Flux.Optimise.stopFunction
      stop()

      Call Flux.stop() in a callback to indicate when a callback condition is met. This will trigger the train loop to stop and exit.

      Examples

      cb = function ()
         accuracy() > 0.9 && Flux.stop()
      -end
      source
      +end
      source