Compare commits

...

356 Commits

Author SHA1 Message Date
bors[bot] 7035ee9bea
Merge #1238
1238: Fix inline code block r=dhairyagandhi96 a=harryscholes

### PR Checklist

- [ ] Tests are added
- [ ] Entry in NEWS.md
- [x] Documentation, if applicable
- [ ] Final review from `@MikeInnes` or `@dhairyagandhi96` (for API changes).


Co-authored-by: harryscholes <harryscholes@gmail.com>
2020-06-19 08:28:41 +00:00
harryscholes 57efd7fead Fix inline code block 2020-06-19 09:24:44 +01:00
bors[bot] 19b45b49d3
Merge #1221
1221: DataLoader with NamedTuple r=CarloLucibello a=cossio

Just a couple of small changes, so that `DataLoader` can be created with a `NamedTuple` of tensors instead of `Tuple`. This way the tensors can be referred to by name. For example

```
train_loader = DataLoader((images = Xtrain, labels = Ytrain), batchsize=16)
batch = first(train_loader)
y = model(batch.images)
logitcrossentropy(y, batch.labels)
```

If we only use tuples, then in datasets with multiple tensors one has to be careful about the order in which the tensors are fed into the `DataLoader` constructor and be consistent with this elsewhere. With `NamedTuples` one just have to be consistent about the names used, which I think is a minor improvement.

CC @CarloLucibello 

### PR Checklist

- [x] Tests are added
- [x] Entry in NEWS.md
- [x] Documentation, if applicable

I don't think this qualifies as an API change. It's just a minor feature addition. So final review probably not required.

- [ ] Final review from `@MikeInnes` or `@dhairyagandhi96` (for API changes).


Co-authored-by: cossio <j.cossio.diaz@gmail.com>
Co-authored-by: cossio <cossio@users.noreply.github.com>
2020-06-16 17:21:28 +00:00
bors[bot] 254e4a7058
Merge #1231
1231: use `ntuple` in conv r=MikeInnes a=MikeInnes

This is the right abstraction over `map`, and in particular is a bit easier to compile away in some cases. 

As this is a trivial change from Flux's perspective it's not easy to test here, but there are downstream tests in XLA.jl.

Co-authored-by: Mike J Innes <mike.j.innes@gmail.com>
2020-06-16 13:04:20 +00:00
Mike J Innes 9f931dd7fa use `ntuple` in conv 2020-06-16 14:02:24 +01:00
cossio 9078f85096 revert selectdim
selectdim can lead to type instability, see https://discourse.julialang.org/t/why-selectdim-is-type-instable/25271/5
2020-06-16 13:32:27 +02:00
cossio 1dbaf32810 DataLoader type inference tests 2020-06-16 13:32:27 +02:00
cossio cb34bb848b simplify _getobs 2020-06-16 13:32:27 +02:00
cossio 75692161a7 Apply suggestions from code review
accept suggested changes

Co-authored-by: Carlo Lucibello <carlo.lucibello@gmail.com>
2020-06-16 13:32:27 +02:00
cossio 909a55ac10 news and docs 2020-06-16 13:32:27 +02:00
cossio 02ee6ba426 DataLoader with NamedTuple 2020-06-16 13:31:29 +02:00
bors[bot] 97406507fd
Merge #1218
1218: Require weight and bias to be AbstractArrays r=CarloLucibello a=oxinabox

closes #1199
While in theory someone could be using Dense with weights and biases that are not abstract arrays, I would be surprised.
So allowing it is just leaving a food-gun laying around.
If it is common then we can instead close #1199 by adding a special constructor for `Number` subtypes that error if they are not integers, or something a long those lines.

### PR Checklist

- [x] Tests are added
- [x] Entry in NEWS.md

I think this is a bug-fix thus the following are not required:

- [ ] Documentation, if applicable
- [ ] Final review from `@MikeInnes` or `@dhairyagandhi96` (for API changes).


Co-authored-by: Lyndon White <lyndon.white@invenialabs.co.uk>
Co-authored-by: Lyndon White <oxinabox@ucc.asn.au>
2020-06-15 15:21:21 +00:00
Lyndon White e61787c1c8
Update test/layers/basic.jl 2020-06-12 13:58:10 +01:00
Lyndon White 601f842eaf
bonus test 2020-06-11 23:17:40 +01:00
bors[bot] 99ec30c8c2
Merge #1220
1220: CompatHelper: bump compat for "Adapt" to "2.0" r=CarloLucibello a=github-actions[bot]

This pull request changes the compat entry for the `Adapt` package from `1` to `1, 2.0`.

This keeps the compat entries for earlier versions.

Note: I have not tested your package with this new compat entry. It is your responsibility to make sure that your package tests pass before you merge this pull request.

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2020-06-11 09:54:46 +00:00
github-actions[bot] fbfc973011 CompatHelper: bump compat for "Adapt" to "2.0" 2020-06-11 00:18:47 +00:00
Lyndon White a1623aca76
move into 0.11 news 2020-06-10 12:39:00 +01:00
Lyndon White 15c7354c4e
Make release as DEV 2020-06-10 12:38:33 +01:00
Lyndon White 97b0aa4d36 bump version 2020-06-10 12:14:47 +01:00
Lyndon White cf90517a8a update news.md 2020-06-10 12:14:19 +01:00
Lyndon White df84628c29 Require weight and bias to be AbstractArrays 2020-06-10 12:06:57 +01:00
bors[bot] e1f80d4627
Merge #1213
1213: Fixing indentation in train! docstring r=CarloLucibello a=natema

One code block is not correctly displayed in the doc of [Flux.Optimise.train!
](https://fluxml.ai/Flux.jl/stable/training/training/#Flux.Optimise.train!). 
Based on the previous code block, I guess it's an indentation problem.


Co-authored-by: natema <natema@users.noreply.github.com>
2020-06-08 18:29:46 +00:00
bors[bot] a7bbd3d35b
Merge #1152
1152: extend dataloader r=CarloLucibello a=CarloLucibello

cfr discussion in #1149. Currently DataLoader interface supports

1. `for x in DataLoader(X)`
2. `for (x, y) in DataLoader(X, Y)`

This PR adds

3. `for (x,) in DataLoader((X,))`
4. `for (x, y) in DataLoader((X, Y))`

Edit:
the constructor in 2. is removed in this PR

Co-authored-by: CarloLucibello <carlo.lucibello@gmail.com>
2020-06-08 18:01:06 +00:00
CarloLucibello 0cf46432cf cleanup 2020-06-08 19:59:34 +02:00
natema 70bbf18180
Fixing indentation in train! docstring
One code block is not correctly displayed in the doc of [Flux.Optimise.train!
](https://fluxml.ai/Flux.jl/stable/training/training/#Flux.Optimise.train!). 
Based on the previous code block, I guess it's an indentation problem.
2020-06-07 15:44:04 +02:00
bors[bot] d9b07475b0
Merge #1129
1129: Added dropgrad in huber_loss r=CarloLucibello a=HenriDeh

Workaround to prevent `iterate(::nothing)` when working with CuArrays. See issue #1128

Co-authored-by: HenriDeh <47037088+HenriDeh@users.noreply.github.com>
2020-06-06 17:21:19 +00:00
bors[bot] 9ebbe8cb4c
Merge #1141
1141: Speedup matmul of CuMatrix and OneHotMatrix r=CarloLucibello a=AStupidBear

This solves #189.

```julia
julia> using Flux


julia> using Flux: CuArrays

julia> A = zeros(300, 10000) |> gpu;

julia> B = Flux.onehotbatch(rand(1:10000, 256), 1:10000) |> gpu;

julia> A * B; CuArrays.@time A * B;
┌ Warning: Performing scalar operations on GPU arrays: This is very slow, consider disallowing these operations with `allowscalar(false)`
└ @ GPUArrays ~/shared/.julia/packages/GPUArrays/OXvxB/src/host/indexing.jl:43
  0.002824 seconds (951 CPU allocations: 38.156 KiB) (2 GPU allocations: 301.000 KiB, 2.32% gc time of which 46.42% spent allocating)

julia> import Base: *

julia> A::AbstractMatrix * B::Flux.OneHotMatrix = @inbounds A[:, map(x->x.ix, B.data)]
* (generic function with 522 methods)

julia> A * B; CuArrays.@time A * B;
  0.000343 seconds (169 CPU allocations: 5.000 KiB) (2 GPU allocations: 301.000 KiB, 15.53% gc time of which 65.97% spent allocating)
```

Co-authored-by: Yao Lu <luyaocns@gmail.com>
2020-06-06 17:00:01 +00:00
CarloLucibello b1f226eb34 add news 2020-06-06 18:15:04 +02:00
CarloLucibello a643cb6758 extend dataloader 2020-06-06 18:02:03 +02:00
bors[bot] 792a1c54f8
Merge #1211
1211: Fixing syntax in onehot docstring r=CarloLucibello a=natema

`otherwise, it will error` -> `otherwise, it will raise an error`


Co-authored-by: natema <natema@users.noreply.github.com>
2020-06-06 15:02:40 +00:00
natema 8f6aed5770
Fixing syntax in onehot docstring
`otherwise, it will error` -> `otherwise, it will raise an error`
2020-06-05 18:20:50 +02:00
bors[bot] 22d5e318e5
Merge #1192
1192: Improve `restructure` performance r=dhairyagandhi96 a=MikeInnes

A small change, but it significantly improves the performance on the following test case:

```julia
julia> VERSION
v"1.5.0-DEV.876"

julia> using Flux, DiffEqFlux, BenchmarkTools

julia> using Flux: mse

julia> fastdense = FastDense(784, 32, tanh);

julia> p = initial_params(fastdense);

julia> dense = Dense(784, 32, tanh);

julia> p,re = Flux.destructure(dense);

julia> x = rand(Float32, 784, 10);

julia> y = rand(Float32, 32, 10);

julia> @btime gradient((x,p) -> mse(fastdense(x, p), y), x, p);
  505.530 μs (87 allocations: 240.73 KiB)

julia> @btime gradient((x,p) -> mse(re(p)(x), y), x, p);
  107.796 μs (139 allocations: 340.94 KiB)
```

Co-authored-by: Mike J Innes <mike.j.innes@gmail.com>
2020-06-05 14:53:11 +00:00
bors[bot] 71ebd51e45
Merge #1208
1208: Fixing output format for `onehot` r=dhairyagandhi96 a=natema

Currently `Flux.OneHotVector` is displayed as a binary vector (0/1) rather than a boolean one (true/false). This is also shown in successive examples in the same page. 
I fixed the `onehot(:b, [:a, :b, :c])` and `onehot(:c, [:a, :b, :c])` outputs in the first example of the page accordingly.


Co-authored-by: natema <natema@users.noreply.github.com>
2020-06-05 09:17:12 +00:00
bors[bot] b5a73f8532
Merge #1207
1207: Fixing typo in docs r=dhairyagandhi96 a=natema

`what ever` -> `whatever`


Co-authored-by: natema <natema@users.noreply.github.com>
2020-06-05 09:00:06 +00:00
natema 48d6f2d0c0
Fixing output format for `onehot`
`Flux.OneHotVector` is displayed as a binary vector (0/1) rather than a boolean (true/false) one, as is also shown in successive examples in the same page, so I fixed the `onehot(:b, [:a, :b, :c])` and `onehot(:c, [:a, :b, :c])` output as given by the current Julia version 1.4.2.
2020-06-03 17:03:08 +02:00
natema 2c4b1e521e
Fixing typo in docs
`what ever` -> `whatever`
2020-06-02 19:20:41 +02:00
bors[bot] ca1b1b2c7c
Merge #1206
1206: Fixing ambiguous remark in Preserve inputs' types r=dhairyagandhi96 a=natema

This PR is based on the [discussion in the forum](https://discourse.julialang.org/t/not-clear-what-0-01f0x-is-in-the-flux-docs/40553?u=mathematics) on the ambiguity of `0.01f0x` in the line
> While one could change the activation function (e.g. to use `0.01f0x`)

Co-authored-by: natema <natema@users.noreply.github.com>
2020-06-02 17:09:58 +00:00
natema a24f46b606
Fixing ambiguous remark in Preserve inputs' types
This PR is based on the [discussion in the forum](https://discourse.julialang.org/t/not-clear-what-0-01f0x-is-in-the-flux-docs/40553?u=mathematics) on the ambiguity of `0.01f0x` in the line
> While one could change the activation function (e.g. to use `0.01f0x`)
2020-06-02 18:48:07 +02:00
Mike J Innes 089ec0832c improved restructure adjoint 2020-05-27 12:28:22 +01:00
bors[bot] ddd0f4e747
Merge #1191
1191: Pull Request Template r=MikeInnes a=MikeInnes

Hopefully makes it a little clearer what the requirements are, which will lead to easier review, and encourage things like NEWS.md that we want to be better in sync.

cc @dhairyagandhi96 and @CarloLucibello for thoughts.

Co-authored-by: Mike J Innes <mike.j.innes@gmail.com>
2020-05-27 11:15:26 +00:00
Mike J Innes e10818bbad
Update pull_request_template.md 2020-05-27 12:12:13 +01:00
Mike J Innes 8c3a80c940
Create pull_request_template.md 2020-05-26 12:52:28 +01:00
bors[bot] 85c39e2309
Merge #1190
1190: Correcting advanced.md r=dhairyagandhi96 a=Sleort

To make the example consistent, it should be 
```
julia> Flux.trainable(a::Affine) = (a.W,)
```
not
```
julia> Flux.trainable(a::Affine) = (a.W, a.b)
```

Co-authored-by: Troels Arnfred Bojesen <tr-ab@online.no>
2020-05-25 14:47:42 +00:00
Troels Arnfred Bojesen 17bb00a3fa
Correcting advanced.md
To make the example consistent, it should be 
```
julia> Flux.trainable(a::Affine) = (a.W,)
```
not
```
julia> Flux.trainable(a::Affine) = (a.W, a.b)
```
2020-05-25 23:33:09 +09:00
bors[bot] bd152ca099
Merge #1177
1177: Align ExpDecay implementation with documentation r=dhairyagandhi96 a=DrChainsaw

Fix for #1176 



Co-authored-by: DrChainsaw <Christian.kyril.skarby@gmail.com>
2020-05-21 14:33:20 +00:00
bors[bot] f343172daf
Merge #1185
1185: Add some news r=dhairyagandhi96 a=dhairyagandhi96

cc @CarloLucibello please add to this list as well

Co-authored-by: Dhairya Gandhi <dhairya@juliacopmuting.com>
2020-05-21 12:46:39 +00:00
bors[bot] 472e1fbf5e
Merge #957
957: Add some gradient checking tests on GPUs r=dhairyagandhi96 a=dhairyagandhi96

Good to add generic tests for tracking gradients through the various layers on the GPU.

Co-authored-by: Dhairya Gandhi <dhairya@juliacopmuting.com>
Co-authored-by: Dhairya Gandhi <dhairya@juliacomputing.com>
2020-05-21 12:25:53 +00:00
Dhairya Gandhi 0801064d50 add comment on broken layers 2020-05-20 00:11:38 +05:30
Dhairya Gandhi c4409fa6d1 clearing failures 2020-05-19 23:54:18 +05:30
bors[bot] 87ba651add
Merge #1165
1165: Fix docstring of logitcrossentropy r=dhairyagandhi96 a=cossio

Since `y` is a logit, there is no log (see the diff).

Co-authored-by: cossio <cossio@users.noreply.github.com>
2020-05-19 11:07:15 +00:00
Dhairya Gandhi 55430e207d add news 2020-05-19 16:34:28 +05:30
bors[bot] 0b10f1a8df
Merge #1184
1184: Add some functions to docs r=dhairyagandhi96 a=dhairyagandhi96



Co-authored-by: Dhairya Gandhi <dhairya@juliacopmuting.com>
2020-05-18 21:10:46 +00:00
DrChainsaw 9a24ee0bd7 Change intendation to 2 spaces 2020-05-18 21:52:40 +02:00
Dhairya Gandhi bdfe567519 add some layers to docs 2020-05-18 23:53:11 +05:30
bors[bot] b6a5dd7152
Merge #1133
1133: add ClipValue and ClipNorm r=CarloLucibello a=AStupidBear



Co-authored-by: Yao Lu <luyaocns@gmail.com>
2020-05-15 17:15:07 +00:00
Yao Lu 007586858c fix export merge conflict 2020-05-14 17:13:35 +08:00
Dhairya Gandhi fab53e0a01
Merge pull request #1179 from FluxML/compathelper/new_version/2020-05-13-00-13-17-919-1190174363
CompatHelper: add new compat entry for "Functors" at version "0.1"
2020-05-13 11:27:40 +05:30
github-actions[bot] 3fa9e91c41 CompatHelper: add new compat entry for "Functors" at version "0.1" 2020-05-13 00:13:46 +00:00
DrChainsaw e8433d0abe Align ExpDecay implementation with documentation 2020-05-12 22:50:17 +02:00
bors[bot] de39d1095b
Merge #1175
1175: xlogy broadcast adjoint r=MikeInnes a=MikeInnes

This is helpful for performance, since it avoids having to differentiate `xlogy` itself inside of a map.

Co-authored-by: Mike J Innes <mike.j.innes@gmail.com>
2020-05-12 17:10:58 +00:00
Mike J Innes f5a8900ffb xlogy broadcast adjoint 2020-05-12 17:29:35 +01:00
Mike J Innes bd43201f37
fix logitcrossentropy doc string 2020-05-12 16:18:29 +01:00
bors[bot] a84e08cf28
Merge #1174
1174: Functors r=MikeInnes a=MikeInnes

Just splits out the implementation to the [Functors](https://github.com/FluxML/Functors.jl) package, so the same traits can be used elsewhere (e.g. Optimisers.jl) without depending on all of Flux.

Co-authored-by: Mike J Innes <mike.j.innes@gmail.com>
2020-05-12 14:39:08 +00:00
Mike J Innes 22d29c9bfd released functors.jl 2020-05-12 15:33:14 +01:00
Dhairya Gandhi 36d3a9ce99
Merge pull request #1162 from aminya/patch-5
Update CompatHelper.yml
2020-05-10 14:21:14 +05:30
Yao Lu 5a9eb7411a cpu 2020-05-10 14:39:48 +08:00
Yao Lu 888f286c51 use @inbounds 2020-05-09 19:40:46 +08:00
Yao Lu 63cb70dd23 remove importing CuMatrix 2020-05-09 19:13:52 +08:00
Yao Lu 30648910c8 transfer onehot indices back to cpu 2020-05-09 19:10:46 +08:00
Yao Lu d1ad8db625 add to docs 2020-05-09 16:40:26 +08:00
bors[bot] d89ee6cdba
Merge #1167
1167: Update basics.md r=dhairyagandhi96 a=mipals

Removing superfluous ```using Flux```

Co-authored-by: Mikkel Paltorp Schmitt <mikkel.paltorp@gmail.com>
2020-05-08 11:38:22 +00:00
bors[bot] 0287abbf66
Merge #1166
1166: Fix crossentropy when some probabilities are zero r=dhairyagandhi96 a=cossio

Use a function `xlogy(x,y) = x * log(y)` that has the correct limit at `x=0`.

Before this PR:

```julia
julia> Flux.crossentropy([0.1,0.0,0.9], [0.1,0.0,0.9])
NaN
```

After this PR:

```julia
julia> Flux.crossentropy([0.1,0.0,0.9], [0.1,0.0,0.9])
0.3250829733914482
```

Co-authored-by: cossio <j.cossio.diaz@gmail.com>
2020-05-08 11:14:31 +00:00
cossio 17f54e4c6f bump version 2020-05-08 12:57:34 +02:00
cossio feb72d400a NaN 2020-05-07 12:44:32 +02:00
cossio 86d6555269 cufunc 2020-05-07 09:58:33 +02:00
Mikkel Paltorp Schmitt 40efa9df49
Update basics.md
Removing superfluous ```using Flux```
2020-05-06 13:41:56 +02:00
cossio 8314200c51 generic 2020-05-05 19:23:05 +02:00
cossio 06c1e20372 add tests 2020-05-05 19:05:04 +02:00
cossio 480473a81b xlogy 2020-05-05 18:33:50 +02:00
cossio 9e1fd883d5
Fix docstring of logitbinarycrossentropy and logitcrossentropy 2020-05-05 16:29:29 +02:00
Amin Yahyaabadi 70f76fd6db
Update CompatHelper.yml 2020-05-05 07:11:22 -05:00
bors[bot] c444226db5
Merge #1160
1160: Build docs on Julia 1.3 r=dhairyagandhi96 a=dhairyagandhi96

This causes red CI otherwise

Co-authored-by: Dhairya Gandhi <dhairya@juliacopmuting.com>
2020-05-04 12:59:25 +00:00
Dhairya Gandhi f2c66579ec yaml syntax fix 2020-05-04 18:01:33 +05:30
Dhairya Gandhi fc464f5ef8 build docs on Julia 1.3 2020-05-04 17:54:04 +05:30
bors[bot] 1e2476b3c2
Merge #1156
1156: Add correct overload for apply! in docs r=dhairyagandhi96 a=dhairyagandhi96

Maybe we should considering adding a `const` name that is better than `apply!` (or rename `apply!`) and export it, so folks can just overload `descriptive_apply_my_optimiser_rule!` rather than have to go to the sub-project `Flux.Optimise`?

Co-authored-by: Dhairya Gandhi <dhairya@juliacopmuting.com>
2020-05-04 06:01:23 +00:00
Dhairya Gandhi d6a1ccd354 add correct overload for apply in docs 2020-05-03 16:56:39 +05:30
bors[bot] 5d9acc7e73
Merge #873
873: Make bias optional r=MikeInnes a=dhairyagandhi96

Addresses #868 



Co-authored-by: Dhairya Gandhi <dhairya@juliacopmuting.com>
2020-05-01 13:28:15 +00:00
Mike J Innes 8f877f2dbf quick fix 2020-05-01 14:22:46 +01:00
Dhairya Gandhi 29215fa5d7 comment on possible future deprecations 2020-04-29 16:17:44 +05:30
Dhairya Gandhi 534809ae78 move zeros to its own file 2020-04-29 16:15:35 +05:30
Dhairya Gandhi 5086c0f4f0 merge conflicts 2020-04-29 16:11:39 +05:30
Yao Lu 114f63a214 norm(Δ) 2020-04-26 17:28:07 +08:00
Yao Lu eb6898ea19 speedup matmul of CuMatrix and OneHotMatrix 2020-04-25 23:22:46 +08:00
Yao Lu 7d6f711c6f Merge branch 'master' into clip 2020-04-25 22:18:58 +08:00
bors[bot] 9237cdaf5b
Merge #901
901: Add option for "Same" padding to conv and pooling layers r=dhairyagandhi96 a=DrChainsaw

Fixes #813 

This adds the possibility to set "pad=SamePad()" to automatically calculate the amount of padding to apply so that outputsize==inputsize (assuming stide == 1).

Comments on API more than welcome. I considered the following options:

* Call the type just Same and export it, but I was afraid to cause name collisions due to a too generic name
* Call the type Same and not export it
* Dispatch on type instead of instance (so that one can type pad=Same instead of pad=Same())
* Supply a method instead of a type, giving a similar API as above. 

Happy to change to any of the above or to anything else.

I don't think that same padding is common for pooling layers, but I added it just for the sake of consistency. It is a separate commit so it can easily be removed if not wanted.

Co-authored-by: DrChainsaw <Christian.kyril.skarby@gmail.com>
2020-04-25 04:39:18 +00:00
DrChainsaw 4e4f6d9d1f Change next version entry to 0.10.5 2020-04-24 22:07:57 +02:00
DrChainsaw deff98812a Add v0.11.0 entry and added samepadding option 2020-04-24 21:59:02 +02:00
DrChainsaw 1544f84bb9 Fix merge conflicts 2020-04-24 21:56:26 +02:00
Yao Lu 58a72ec879 Merge branch 'master' of https://github.com/FluxML/Flux.jl into clip 2020-04-22 01:29:13 +08:00
Yao Lu c4f5e83697 resolve conflict 2020-04-22 01:24:13 +08:00
Yao Lu 1dfec7f38b add test 2020-04-22 01:22:34 +08:00
Yao Lu def19b058e simplify docstrings 2020-04-21 10:56:38 +08:00
Yao Lu cc1dcd5590 rm requires 2020-04-20 20:02:29 +08:00
Yao Lu 68b84bba36 add LinearAlgebra 2020-04-20 19:54:44 +08:00
Yao Lu ba0fca5a19 remove onehot 2020-04-20 19:45:15 +08:00
Yao Lu b33c4b49be add ClipValue and ClipNorm 2020-04-20 19:41:10 +08:00
Yao Lu 427c55af92 speedup matmul of CuMatrix and OneHotMatrix 2020-04-20 19:11:57 +08:00
HenriDeh ac94754281
Update stateless.jl 2020-04-18 13:23:11 +02:00
bors[bot] cdada06472
Merge #1131
1131: Update glorot_normal doc r=dhairyagandhi96 a=AdarshKumar712

Just a minute correction in glorot_normal function doc.

Co-authored-by: Adarsh Kumar <45385384+AdarshKumar712@users.noreply.github.com>
2020-04-18 00:58:49 +00:00
Adarsh Kumar d53deb9132
Update glorot_normal doc 2020-04-18 03:19:32 +05:30
HenriDeh 1f2643c95c
Add dropgrad in huber_loss
Workaround for issue #1128
2020-04-17 13:34:04 +02:00
bors[bot] d49d121a65
Merge #1127
1127: Removed deprecated SGD exports r=dhairyagandhi96 a=bhvieira

Closes #1121 

Co-authored-by: Bruno Hebling Vieira <bruno.hebling.vieira@usp.br>
2020-04-16 13:28:00 +00:00
Bruno Hebling Vieira 2c9881bca6 Merge branch 'master' into removeSGD 2020-04-16 09:56:38 -03:00
Bruno Hebling Vieira db99e41959 Removed SGD exports 2020-04-16 09:50:41 -03:00
Mike J Innes a35335db00 update for functors.jl change 2020-04-14 15:21:45 +01:00
Mike J Innes 6eda279190 split out functor 2020-04-14 13:58:52 +01:00
bors[bot] 32e2435729
Merge #1123
1123: Fix doc indent r=dhairyagandhi96 a=matsueushi

Fix [docs for `update!`](https://fluxml.ai/Flux.jl/stable/training/optimisers/#Flux.Optimise.update!).

Co-authored-by: matsueushi <matsueushi@gmail.com>
2020-04-14 04:20:30 +00:00
matsueushi be92618473 Fix doc indent 2020-04-14 00:12:06 -04:00
bors[bot] 7a32a703f0
Merge #853
853: Improve docs r=CarloLucibello a=janEbert

If you disagree with any of the changes, please tell me what to reverse or fix.
I am unsure about the docstrings I added to `src/utils.jl` for `unsqueeze` and
the `[un]stack` functions so please give those a more detailed look.

Update Documenter.jl version for new features, fix deprecation warnings in
`docs/make.jl` and import Flux for all doctests.
Add missing docstrings to `src/utils.jl`, `src/layers/stateless.jl` and `src/data/`; add
these and other missing functions to Markdown docs.

Improve docstrings by...
   - fixing typos,
   - removing trailing or double whitespaces,
   - using `jldoctest` blocks where applicable,
   - fixing, updating or correctly setting up existing doctests,
   - improving consistency (for example, always use "# Examples" instead
     of other variants),
   - removing empty lines between docstrings and functions,
   - instead of mentioning keywords, put them into the docstring,
   - adding some missing but useful keywords,
   - adding references (`@ref`),
   - using LaTeX math where applicable, and
   - linking papers.

Debatable stuff that is untouched:
   - BE/AE s/z irregularities (e.g. "normalise" versus "normalize") since
     most papers use the AE version while the Flux source code was
     written with BE spelling.
   - Names of normalization functions are capitalized
     ("Batch Normalization" instead of "batch normalization").
   - Default values in argument lists have spaces around the equals sign (`arg = x` instead of `arg=x`).

Co-authored-by: janEbert <janpublicebert@posteo.net>
2020-04-06 13:47:42 +00:00
bors[bot] a9f8250b43
Merge #1110
1110: fix tests and new version r=CarloLucibello a=CarloLucibello

Add to set the Boston Housing dataset tests as broken due to as SSL certificate expiration problem wich is not our fault

Co-authored-by: Carlo Lucibello <carlo.lucibello@gmail.com>
2020-04-06 13:27:58 +00:00
janEbert 684570660a Update doctest version guard (1.2 -> 1.4)
And add the same to docs/make.jl
2020-04-06 13:53:36 +02:00
janEbert 0e9bc82626 Loss -> Loss Functions 2020-04-06 13:52:27 +02:00
Carlo Lucibello c54d71ce56 update travis 2020-04-06 13:20:28 +02:00
Carlo Lucibello d6cb9f055d fix housing download 2020-04-06 11:08:20 +02:00
Carlo Lucibello f9e9710446 update travis and bound julia version 2020-04-06 09:35:34 +02:00
Carlo Lucibello 18ea480388 fix tests and new version 2020-04-06 09:26:38 +02:00
janEbert 2a65a30399 Fix doctests in runtests.jl 2020-04-05 13:58:27 +02:00
janEbert 8d2d15aa70 Remove links to OneHot{Vector,Matrix}
Since they aren't documented, we only get a 404 link.
2020-04-04 23:06:56 +02:00
janEbert 73d631f5cd Fix and improve docs
Add missing docstrings, improve existing ones, fix links to functions
or files.
2020-04-04 23:00:34 +02:00
janEbert 2ce5f6d9bf Further docstring improvements in src/
Some had to be re-done after the rebase
2020-04-04 22:59:45 +02:00
janEbert 64ce32ddcf Fix problems due to rebase 2020-04-04 22:55:14 +02:00
janEbert e16c24a9b8 General minuscule improvements 2020-04-04 19:43:28 +02:00
janEbert a614983e0b Improve parameter lists in optimisers.jl 2020-04-04 18:40:20 +02:00
janEbert aaa0a82b74 Slight modifications in `recurrent` docstrings 2020-04-04 18:21:10 +02:00
janEbert 3b913cd501 Fix rebase changes
- Remove `Flux.testmode!` reference (the function no longer exists).
- Change TrackedArray to Array in doctest (Tracker -> Zygote).
2020-04-04 18:21:10 +02:00
janEbert ff9198b939 Add datasets to docs
All the relevant functions. Perhaps discuss a consistent API, describe
it in the docs and then only document the modules.
2020-04-04 18:19:20 +02:00
janEbert 740a59d0a6 Add missing docstrings to `src/data`. 2020-04-04 18:16:46 +02:00
janEbert ba80c2e8ab Improve whitespaces in docs 2020-04-04 18:16:46 +02:00
janEbert ab86e350f2 Improve docstrings
Improvements like...
   - fixing typos,
   - removing trailing and double whitespaces,
   - using `jldoctest` blocks where applicable,
   - fixing, updating or correctly setting up existing doctests,
   - improving consistency (for example, always use "# Examples" instead
     of other variants),
   - removing empty lines between docstrings and functions,
   - instead of mentioning keywords, put them into the docstring,
   - adding some missing but useful keywords,
   - adding references (`@ref`),
   - using LaTeX math where applicable, and
   - linking papers.

Debatable stuff that is untouched:
   - BE/AE s/z irregularities ("normalise" versus "normalize") since
     most papers use the AE version while the Flux source code was
     written with BE spelling.
   - Names of normalization functions are capitalized
     ("Batch Normalization" instead of "batch normalization").
2020-04-04 18:16:46 +02:00
janEbert c76b7315ac Add loss and utility functions to docs 2020-04-04 17:39:19 +02:00
janEbert c222e1b124 Add missing docstrings to `src/utils.jl`
Not sure about the `stack`, `unstack` and `unsqueeze` functions.
2020-04-04 17:38:25 +02:00
janEbert 2f955a33cd `src/layers/stateless.jl`: Add missing docstrings 2020-04-04 17:36:23 +02:00
janEbert 9b68423e64 Import (`using`) Flux for all doctests 2020-04-04 17:22:08 +02:00
janEbert 1bf8dc2d5b Update Documenter version and fix warnings
0.23.2 -> 0.23.3
2020-04-04 17:22:08 +02:00
bors[bot] 6b37ce3986
Merge #1098
1098: Allow CuArrays v2.x r=dhairyagandhi96 a=ararslan



Co-authored-by: Tim Besard <tim.besard@gmail.com>
Co-authored-by: Alex Arslan <ararslan@comcast.net>
Co-authored-by: Dhairya Gandhi <dhairya@juliacopmuting.com>
2020-03-26 09:43:21 +00:00
Dhairya Gandhi 6939e03fc6 bump CuArrays version 2020-03-26 14:03:55 +05:30
Dhairya Gandhi 119a66a7cd Merge remote-tracking branch 'origin/tb/cuarraystyle' into aa/cuarrays 2020-03-26 13:42:06 +05:30
Alex Arslan e85a5d8573
Update CUDAdrv for Tim's bug fix 2020-03-25 15:23:07 -07:00
Alex Arslan 49ba121159
Update Manifest.toml 2020-03-25 12:48:29 -07:00
Alex Arslan 347f53adf6
Allow CuArrays v2.x 2020-03-25 10:58:39 -07:00
bors[bot] 240ab1147f
Merge #1096
1096: fix doc typos r=dhairyagandhi96 a=wenjie-p



Co-authored-by: yuebanyishenqiu <thisispwj@outlook.com>
2020-03-22 06:26:11 +00:00
yuebanyishenqiu 1511778267 fix typos 2020-03-22 09:41:29 +08:00
bors[bot] 1605a01039
Merge #1083
1083: Fix typo in the docstrings of AlphaDropout r=CarloLucibello a=AzamatB



Co-authored-by: AzamatB <aberdysh@gmail.com>
2020-03-14 09:56:05 +00:00
AzamatB 85a9493722
Fix typo in the docstrings of AlphaDropout 2020-03-14 15:42:00 +06:00
bors[bot] 5e09113586
Merge #1080
1080: CompatHelper: bump compat for "Colors" to "0.12" r=dhairyagandhi96 a=github-actions[bot]

This pull request changes the compat entry for the `Colors` package from `0.8, 0.9, 0.10, 0.11` to `0.8, 0.9, 0.10, 0.11, 0.12`.

This keeps the compat entries for earlier versions.

Note: I have not tested your package with this new compat entry. It is your responsibility to make sure that your package tests pass before you merge this pull request.

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2020-03-14 01:47:45 +00:00
github-actions[bot] bca74213ee CompatHelper: bump compat for "Colors" to "0.12" 2020-03-14 00:12:33 +00:00
bors[bot] 8930021b47
Merge #1078
1078: CompatHelper: bump compat for "CodecZlib" to "0.7" r=CarloLucibello a=github-actions[bot]

This pull request changes the compat entry for the `CodecZlib` package from `0.5, 0.6` to `0.5, 0.6, 0.7`.

This keeps the compat entries for earlier versions.

Note: I have not tested your package with this new compat entry. It is your responsibility to make sure that your package tests pass before you merge this pull request.

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2020-03-13 11:49:02 +00:00
github-actions[bot] 69e96ed1c1 CompatHelper: bump compat for "CodecZlib" to "0.7" 2020-03-13 00:13:04 +00:00
bors[bot] a874bef6f9
Merge #1076
1076: fix typo in the Dropout docs r=dhairyagandhi96 a=AzamatB



Co-authored-by: AzamatB <aberdysh@gmail.com>
2020-03-10 09:40:28 +00:00
AzamatB f0d866b2fd
fix typo in the Dropout docs 2020-03-10 12:44:19 +06:00
bors[bot] d4cf1436df
Merge #950
950: added GlobalMaxPool, GlobalMeanPool, and flatten layers r=CarloLucibello a=gartangh



Co-authored-by: Garben Tanghe <garben.tanghe@gmail.com>
2020-03-08 14:27:10 +00:00
Garben Tanghe fc3af681ec updated documentation 2020-03-08 14:22:09 +01:00
Garben Tanghe 746e3310f1 removed Flatten struct
updated documentation
2020-03-08 14:22:03 +01:00
Garben Tanghe 82e16a5b29 split up Flatten layer to use the flatten function 2020-03-08 14:21:59 +01:00
Garben Tanghe 3e14bd878c added GlobalMaxPool, GlobalMeanPool, and Flatten layers 2020-03-08 14:18:48 +01:00
Dhairya Gandhi d8e44fcc1c correct broadcasting for addition 2020-03-04 18:22:45 +05:30
Dhairya Gandhi 7e308e77fd rm unneccesary fns 2020-03-04 17:57:16 +05:30
Dhairya Gandhi 5a4f1932a6 closes #1071 2020-03-04 17:22:45 +05:30
bors[bot] df3f904f7c
Merge #1072
1072: update freeze docs r=CarloLucibello a=CarloLucibello



Co-authored-by: CarloLucibello <carlo.lucibello@gmail.com>
2020-03-04 03:47:45 +00:00
CarloLucibello 12106ff4cc update freeze docs 2020-03-04 04:45:41 +01:00
bors[bot] 94ba1e8ede
Merge #1028 #1070
1028: Common questions answered in docs r=CarloLucibello a=dhairyagandhi96

cc @MikeInnes 

1070: Prevent breakage due to new `active` field in normalise layers r=CarloLucibello a=ianshmean

Prevents breakage where the normalise structs, such as `BatchNorm`, have been directly defined but missing the new `active` field

cc. @darsnack 

Co-authored-by: Dhairya Gandhi <dhairya@juliacopmuting.com>
Co-authored-by: Dhairya Gandhi <dhairya@juliacomputing.com>
Co-authored-by: Ian <i.r.butterworth@gmail.com>
2020-03-04 00:10:39 +00:00
bors[bot] af23a5756c
Merge #1053
1053: Added Some Loss functions with some doc improvements r=CarloLucibello a=AdarshKumar712

Added the following loss functions with tests:
1. mae
2. mean squared logarithmic error
3. huber loss
4. squared hinge loss
5. dice coeff loss
6. tversky loss 

Also added some documentation improvements for few other functions. 

Co-authored-by: Adarsh Kumar <45385384+AdarshKumar712@users.noreply.github.com>
2020-03-03 23:56:21 +00:00
Ian 61f66e3dcd remove unnecessary helper for AlphaDropout 2020-03-03 13:20:02 -05:00
Ian 078ad7dd50 bump version to 0.10.3 2020-03-03 13:05:23 -05:00
Ian d63fcf2cb4 add depreciation reminder 2020-03-03 13:05:03 -05:00
Ian d9ea5fba76 add `active` helpers for other normalise layers 2020-03-03 11:55:39 -05:00
Ian 0def352383 Prevent breakage due to new `active` field in BatchNorm 2020-03-03 11:49:34 -05:00
bors[bot] 19a034b215
Merge #1069
1069: Updated activation functions in NNlib doc r=dhairyagandhi96 a=AdarshKumar712



Co-authored-by: Adarshkumar712 <Adarshkumar712.ak@gmail.com>
2020-03-03 12:39:03 +00:00
Adarshkumar712 d0e8a9ff71 Updated activation functions in NNlib doc 2020-03-03 22:07:05 +05:30
Adarsh Kumar 6e5c18bddf
Updated loss functions 2020-03-03 16:02:57 +05:30
bors[bot] 4acc907723
Merge #1065
1065: update documenter r=CarloLucibello a=CarloLucibello



Co-authored-by: CarloLucibello <carlo.lucibello@gmail.com>
2020-03-03 07:20:03 +00:00
bors[bot] df73b8b8fb
Merge #1064
1064: Include cuda/cuda.jl during precompilation? r=CarloLucibello a=ianshmean

Loading `cuda/cuda.jl` at run-time during `__init__()` seems to be causing issues with PackageCompiler. (see error at bottom).

I'm wondering the cost of loading `cuda/cuda.jl` is negligible enough to just do it in all cases and get it precompiled. Setting `Flux.use_cuda[]` would stil be used  for switching cuda on or off. 

Load time in 1.3.1 on my mac (without cuda):

This PR:
```
julia> @time using Flux
[ Info: Precompiling Flux [587475ba-b771-5e3f-ad9e-33799f191a9c]
[ Info: CUDAdrv.jl failed to initialize, GPU functionality unavailable (set JULIA_CUDA_SILENT or JULIA_CUDA_VERBOSE to silence or expand this message)
 37.313982 seconds (56.30 M allocations: 2.822 GiB, 2.52% gc time)
...
julia> @time using Flux
[ Info: CUDAdrv.jl failed to initialize, GPU functionality unavailable (set JULIA_CUDA_SILENT or JULIA_CUDA_VERBOSE to silence or expand this message)
 22.111054 seconds (52.93 M allocations: 2.663 GiB, 3.99% gc time)
```
Master:
```
julia> @time using Flux
[ Info: Precompiling Flux [587475ba-b771-5e3f-ad9e-33799f191a9c]
[ Info: CUDAdrv.jl failed to initialize, GPU functionality unavailable (set JULIA_CUDA_SILENT or JULIA_CUDA_VERBOSE to silence or expand this message)
 35.750143 seconds (53.73 M allocations: 2.698 GiB, 2.51% gc time)
...
julia> @time using Flux
[ Info: CUDAdrv.jl failed to initialize, GPU functionality unavailable (set JULIA_CUDA_SILENT or JULIA_CUDA_VERBOSE to silence or expand this message)
 26.267999 seconds (52.92 M allocations: 2.660 GiB, 3.67% gc time)
```


I didn't make `include("cuda/cuda.jl")` dependent  on `CuArrays.functional()` because I guess there could be a case where, say, a user doesn't have cuda installed, loads Flux, installs cuda, reloads Flux.. where the 2nd time the package isn't re-precompiled.

The PackageCompiler error, which doesn't happen every time. It just seems that the runtime loading of cuda.jl  may be introducing dep tracking issues (?)
```
┌ Warning: Package Zygote does not have InteractiveUtils in its dependencies:
│ - If you have Zygote checked out for development and have
│   added InteractiveUtils as a dependency but haven't updated your primary
│   environment's manifest file, try `Pkg.resolve()`.
│ - Otherwise you may need to report an issue with Zygote
└ Loading InteractiveUtils into Zygote from project dependency, future warnings for Zygote are suppressed.
fatal: error thrown and no exception handler available.
#<null>
require at ./loading.jl:905
_jl_invoke at /home/ian/Documents/julia-kf-31156/src/gf.c:2161 [inlined]
jl_apply_generic at /home/ian/Documents/julia-kf-31156/src/gf.c:2328
jl_apply at /home/ian/Documents/julia-kf-31156/src/julia.h:1695 [inlined]
call_require at /home/ian/Documents/julia-kf-31156/src/toplevel.c:399 [inlined]
eval_import_path at /home/ian/Documents/julia-kf-31156/src/toplevel.c:436
eval_import_from at /home/ian/Documents/julia-kf-31156/src/toplevel.c:557
jl_toplevel_eval_flex at /home/ian/Documents/julia-kf-31156/src/toplevel.c:646
jl_eval_module_expr at /home/ian/Documents/julia-kf-31156/src/toplevel.c:181
jl_toplevel_eval_flex at /home/ian/Documents/julia-kf-31156/src/toplevel.c:640
jl_parse_eval_all at /home/ian/Documents/julia-kf-31156/src/ast.c:907
jl_load_rewrite at /home/ian/Documents/julia-kf-31156/src/toplevel.c:872
include at ./Base.jl:380
include at ./Base.jl:368 [inlined]
include at /home/ian/.julia/packages/Flux/p8ZLv/src/Flux.jl:1 [inlined]
__init__ at /home/ian/.julia/packages/Flux/p8ZLv/src/Flux.jl:56
jfptr___init___22072 at /home/ian/Documents/MyPackage.jl/dev/compilation/MyPackageSysImage.so (unknown line)
_jl_invoke at /home/ian/Documents/julia-kf-31156/src/gf.c:2161 [inlined]
jl_apply_generic at /home/ian/Documents/julia-kf-31156/src/gf.c:2328
jl_apply at /home/ian/Documents/julia-kf-31156/src/julia.h:1695 [inlined]
jl_module_run_initializer at /home/ian/Documents/julia-kf-31156/src/toplevel.c:74
_julia_init at /home/ian/Documents/julia-kf-31156/src/init.c:788
unknown function (ip: 0x5594b1667f)
__libc_start_main at /lib/aarch64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x5594b16733)
unknown function (ip: 0x5594b16733)
```

Co-authored-by: Ian <i.r.butterworth@gmail.com>
2020-03-03 07:07:54 +00:00
CarloLucibello af99ca27ee docs update 2020-03-03 07:52:20 +01:00
Adarsh Kumar 92e09e204d
Test argument consistency with ŷ and y 2020-03-02 20:33:12 +05:30
Adarsh Kumar 2f05094068
Added consistency with ŷ and unicode chars 2020-03-02 20:00:47 +05:30
CarloLucibello f5da4d0c70 remove docs manifest 2020-03-02 15:10:08 +01:00
CarloLucibello ffea8b616d fix docs 2020-03-02 15:08:37 +01:00
CarloLucibello e51070bf79 update documenter 2020-03-02 15:08:37 +01:00
bors[bot] ddab979ea9
Merge #1066
1066: fix travis for documentation build r=CarloLucibello a=johnnychen94

The previous build doesn't trigger the documentation stage because the matrix doesn't get expanded for the sole job.

Not very clear how Travis reads the config but this change fixes the issue.

😕 weird that it doesn't allow failures on nightly here... The one in my fork works as expected. https://github.com/johnnychen94/Flux.jl/runs/479502998

cc: @CarloLucibello

Co-authored-by: Johnny Chen <johnnychen94@hotmail.com>
2020-03-02 12:29:20 +00:00
Johnny Chen f30267e037
bring back test on custom Manifest.toml 2020-03-02 20:14:43 +08:00
Johnny Chen 224ec728ac
fix travis for documentation build 2020-03-02 20:05:56 +08:00
Adarsh Kumar 5565250c28
Updated test for tversky 2020-03-02 13:46:33 +05:30
Adarsh Kumar 89d07c07ec
Added Loss functions to docs 2020-03-02 13:33:44 +05:30
Adarsh Kumar f9e31a020c
Updated huber_loss with other minute changes 2020-03-02 13:25:23 +05:30
Dhairya Gandhi cbb9a2a929
Merge branch 'master' into dg/params_docs 2020-03-02 12:45:30 +05:30
Dhairya Gandhi bb5350591f cleanup 2020-03-02 12:42:33 +05:30
Dhairya Gandhi 27949693f3 refactor 2020-03-02 12:40:19 +05:30
bors[bot] be38146ee9
Merge #1061
1061: fix a few typos in docstrings r=CarloLucibello a=visr



Co-authored-by: Martijn Visser <mgvisser@gmail.com>
2020-03-02 01:03:58 +00:00
bors[bot] 6575fb8f48
Merge #1057
1057: add Julia ecosystem doc section r=CarloLucibello a=CarloLucibello

Partially fixing #251,  related to the discussion in #1051 .

Not exactly a poem that I wrote here, maybe someone could suggest a better rephrasing. 
Suggestion for additional packages to add to the list also welcome

Co-authored-by: CarloLucibello <carlo.lucibello@gmail.com>
2020-03-02 00:52:22 +00:00
Ian 7555e488c6 tweaks 2020-03-01 19:40:03 -05:00
Ian 9b2f4919ee includ cuda/cuda.jl during precompile, even if cuda isn't detected 2020-03-01 19:33:23 -05:00
bors[bot] 3cf131b8de
Merge #1062
1062: docstring ensure signature code formatting r=CarloLucibello a=visr

by using a four space indent instead of two

Fixes issues seen here:

![image](https://user-images.githubusercontent.com/4471859/75627427-54aa6600-5bd0-11ea-93d3-92901d44db59.png)

Where the type signature has no code formatting, and a code block is introduced that throws off the rest of the formatting.

Co-authored-by: Martijn Visser <mgvisser@gmail.com>
2020-03-01 22:28:10 +00:00
bors[bot] 069d228693
Merge #1044
1044: Add testmode! back for normalization layers r=CarloLucibello a=darsnack

Fixed #909 

I added `testmode!(m, mode)` back to Flux as per v0.9. Now the `mode` can be `false`, `true`, or `:auto`/`nothing` with the default being `:auto` for newly constructed layers. In `:auto` mode, the `istraining()` functions added in v0.10 are used to determine whether we are evaluating within an AD trace or not.

Also plan on adding a doc section in an additional commit.

Co-authored-by: Kyle Daruwalla <daruwalla@wisc.edu>
2020-03-01 19:14:07 +00:00
Kyle Daruwalla e49d9c4537 Debump version 2020-03-01 13:11:07 -06:00
Kyle Daruwalla 88cad1c5e7 Bump minor version to v0.10.3 2020-03-01 12:50:49 -06:00
Kyle Daruwalla 23f791e32b Add "during X phase" phrasing to testmode!/trainmode! docstring. 2020-03-01 12:49:30 -06:00
Kyle Daruwalla 35e460b044 Fixed broken @ref in docstring 2020-03-01 12:44:36 -06:00
Kyle Daruwalla 4cebf36361
Merge branch 'master' into feature/istraining 2020-03-01 12:32:15 -06:00
Kyle Daruwalla c001d0f3c5 Added trainmode! and updated docs with warning 2020-03-01 12:30:41 -06:00
Martijn Visser d67a2e40b3 remove stray code block start from docstring 2020-03-01 15:20:40 +01:00
Martijn Visser f4365dab94 fix docstring example indentation as well 2020-03-01 15:19:22 +01:00
Martijn Visser 32e0aa9fcb docstring ensure signature code formatting
by using a four space indent instead of two
2020-03-01 15:15:39 +01:00
Martijn Visser 6076847a45 fix a few typos in docstrings 2020-03-01 15:07:12 +01:00
Adarsh Kumar 08dabce57e
Updated loss function docs 2020-03-01 12:00:11 +05:30
Adarsh Kumar 57c1b67d08
Merge branch 'master' into patch-1 2020-03-01 11:49:33 +05:30
Kyle Daruwalla 568ecb1c97 Removed trainmode from tests 2020-02-29 16:25:18 -06:00
Kyle Daruwalla 5cbd2cecf2 Changed testmode! to return model 2020-02-29 16:09:59 -06:00
bors[bot] 77a7606dad
Merge #1051
1051: add DataLoader r=CarloLucibello a=CarloLucibello

Fix #450 

This adds a DataLoader type, largely adapted from the Knet one, therefore pinging @denizyuret to check if he is cool with this. Unfortunately, I cannot "unsee" his implementation, and in any case any reasonable alternative implementation will be pretty much similar I guess. 

This is an initial implementation to get things going, possibly in the future we will also want a distributed and out-of-memory option as the one implemented by @staticfloat here
https://github.com/FluxML/Metalhead.jl/blob/sf/training/training/ImageNet/dataset.jl



Co-authored-by: CarloLucibello <carlo.lucibello@gmail.com>
2020-02-29 19:27:27 +00:00
CarloLucibello a1efc434c2 fix typo 2020-02-29 19:40:44 +01:00
CarloLucibello a72258ea2a fix rebase 2020-02-29 18:55:49 +01:00
CarloLucibello 97141e8c98 improve docstring 2020-02-29 18:51:00 +01:00
CarloLucibello 487002878e restrict train! special casing 2020-02-29 18:51:00 +01:00
CarloLucibello b6c79b38b4 add DataLoader
special case train! for the unsupervised data iterator
2020-02-29 18:50:59 +01:00
bors[bot] 37af9fb15c
Merge #1023
1023: Feature: Added Boston Housing Dataset r=CarloLucibello a=pranjaldatta

[Boston Housing Dataset](https://archive.ics.uci.edu/ml/machine-learning-databases/housing/) is one of the most common datasets that are used by beginners. It is as popular as other datasets like Iris etc. Hence, it feels only natural that this dataset is a part of Flux.

Added src/data/housing.jl: code for downloading and loading the dataset
Edited src/data/Data.jl: To include and export housing.jl
Edited test/data.jl: Added test for the dataset.

*All tests in test/data.jl are passing*

Co-authored-by: pranjaldatta <pranjaldatta99@gmail.com>
Co-authored-by: Pranjal  Datta <pranjaldatta99@gmail.com>
2020-02-29 15:54:34 +00:00
CarloLucibello 4f693e02cb add model zoo reference 2020-02-29 13:50:23 +01:00
CarloLucibello 4109f2e0d7 cleanup 2020-02-29 13:45:17 +01:00
CarloLucibello 169ed6eb25 add ecosystem 2020-02-29 13:43:03 +01:00
bors[bot] 81a55a0c9e
Merge #1041
1041: add NNlib docs + misc docs improvements r=CarloLucibello a=CarloLucibello

Partially addressing https://github.com/FluxML/NNlib.jl/issues/137.

Also, I'm leaving out the `σ`  activation and using its alias `sigmoid`, since `σ` conveys little information and it is also used to denote a generic activation in the Dense layer. I think we should deprecate `σ` in NNlib, has this been discussed already?

In an ideal world, before merging this, we should get NNlib to either unexport or add docs to its undocumented exports  

Co-authored-by: Carlo Lucibello <carlo.lucibello@gmail.com>
2020-02-29 10:24:39 +00:00
Carlo Lucibello 425fcdbe69 NNlib docs + misc docs improvements 2020-02-29 11:14:48 +01:00
bors[bot] 2dd23574c0
Merge #998
998: test restructure on the GPU r=CarloLucibello a=ChrisRackauckas

Requires https://github.com/FluxML/Zygote.jl/pull/474 to pass

Co-authored-by: Chris Rackauckas <accounts@chrisrackauckas.com>
2020-02-29 09:08:11 +00:00
Adarsh Kumar 8afed01345
Apply suggestions from code review
Co-Authored-By: David Lung <lungd@users.noreply.github.com>
2020-02-27 23:23:53 +05:30
Dhairya Gandhi 35f6998be7 pkg up 2020-02-27 22:19:06 +05:30
Adarsh Kumar 9dce623214
Updated Msle loss 2020-02-27 16:26:17 +05:30
Dhairya Gandhi a121742f9c pkg up 2020-02-27 13:56:05 +05:30
Adarsh Kumar 3d8965230f
Added tests for dice and Tversky loss 2020-02-27 02:29:39 +05:30
Adarsh Kumar 980ce72914
Added tversky and dice loss 2020-02-27 02:00:28 +05:30
bors[bot] 531d3d4d8b
Merge #1052
1052: update docs and export update! r=dhairyagandhi96 a=CarloLucibello

Fix #951 

Co-authored-by: CarloLucibello <carlo.lucibello@gmail.com>
2020-02-26 19:33:53 +00:00
CarloLucibello 759fe9df2f update docs and export update! 2020-02-26 20:27:39 +01:00
Dhairya Gandhi 20e78e274e docs fix 2020-02-26 22:41:45 +05:30
Dhairya Gandhi cf82393ae8 type signatures 2020-02-26 22:36:25 +05:30
Dhairya Gandhi cd931793ef more docs and constructors 2020-02-26 22:29:14 +05:30
Dhairya Gandhi 58211e31bd docs improve 2020-02-26 22:22:11 +05:30
Dhairya Gandhi f889d0c4d4 add kwarg constructors 2020-02-26 22:19:17 +05:30
Pranjal Datta 90bb3205f4
Merge pull request #2 from pranjaldatta/housing_added
added newlines  at end of file
2020-02-26 15:08:37 +05:30
pranjaldatta 569021a9f1 added newlines at end of file 2020-02-26 15:05:23 +05:30
Kyle Daruwalla ba5259a269 Added docs on testmode! 2020-02-25 13:53:49 -06:00
bors[bot] 55616afc11
Merge #960
960: Added utility function outdims to compute output dimensions of a layer r=dhairyagandhi96 a=darsnack

Based on Slack chatter, I added a utility function, `outdims`, that computes the output dimensions for given input dimensions.

Example
```julia
layer = Conv((3, 3), 3 => 16)
outdims(layer, (10, 10)) # returns (8, 8)
```

Co-authored-by: Kyle Daruwalla <daruwalla@wisc.edu>
2020-02-25 17:40:05 +00:00
Tim Besard 4ed7d984db Adapt to CuArrays ArrayStyle changes. 2020-02-25 14:09:03 +01:00
Dhairya Gandhi 7e58766467
Merge pull request #1047 from MotJuMi/master
Edit description of convolutional layer
2020-02-25 15:39:23 +05:30
Bulat Suleymanov db4eaf254b
Edit description of convolutional layer 2020-02-24 13:16:51 +05:00
Dhairya Gandhi 34ceed5c1f
Merge pull request #1046 from ianshmean/patch-1
Bump Colors compat to include 0.10, 0.11
2020-02-24 10:41:49 +05:30
Ian Butterworth 6ced7e1ecf
expand Colors compat 2020-02-23 13:42:11 -05:00
Kyle Daruwalla 924b8f49ec Updated to place function definitions in the appropriate places. 2020-02-21 15:10:28 -06:00
Kyle Daruwalla 7c12af065a Added testmode! functionality back to normalization layers. 2020-02-21 14:35:10 -06:00
Kyle Daruwalla f5b9cf659c Updated docs to specify exactly what layers support outdims 2020-02-20 23:38:56 -06:00
Dhairya Gandhi 88b0c65d72
Merge pull request #1035 from matsueushi/remove_get_macro
Remove get! macro
2020-02-20 12:58:16 +05:30
Dhairya Gandhi 8f7a0bb264
Merge pull request #1030 from JuliaTagBot/master
Install TagBot as a GitHub Action
2020-02-19 21:47:31 +05:30
Dhairya Gandhi a38af748e5
Merge pull request #1037 from heliosdrm/heliosdrm-patch-1
update compat to Juno 0.8
2020-02-19 21:46:33 +05:30
bors[bot] e4a84c120f
Merge #1021
1021: nograd for onecold, onehot, onehotbatch r=MikeInnes a=CarloLucibello

fixes #1020 

Co-authored-by: CarloLucibello <carlo.lucibello@gmail.com>
2020-02-17 14:12:48 +00:00
Helios De Rosario 9bb388d953
update Juno compat 2020-02-16 18:29:18 +01:00
Helios De Rosario 6f0710d364
Merge pull request #1 from FluxML/master
update to origin
2020-02-16 18:27:35 +01:00
Dhairya Gandhi 26631e1361 test_broken AlphaDropout 2020-02-16 21:22:37 +05:30
Viral B. Shah 0b8d1574bf
Merge pull request #984 from aminya/CompatHelper
Adding CompatHelper
2020-02-16 09:44:09 -05:00
matsueushi 6ea7b95384 Remove unused using 2020-02-15 20:06:15 -05:00
Dhairya Gandhi d5ed9a4478
Update docs/src/models/basics.md
Co-Authored-By: Carlo Lucibello <carlo.lucibello@gmail.com>
2020-02-12 11:26:11 +05:30
Dhairya Gandhi ee6d950696
Update docs/src/models/basics.md
Co-Authored-By: Carlo Lucibello <carlo.lucibello@gmail.com>
2020-02-12 11:25:50 +05:30
bors[bot] fe85a38d78 Merge #1032
1032: Remove outdated reference to truncate! r=dhairyagandhi96 a=mcognetta



Co-authored-by: Marco <mcognetta@users.noreply.github.com>
2020-02-10 08:30:15 +00:00
Marco ae0455517a Remove outdated reference to truncate! 2020-02-10 00:03:11 -08:00
Kyle Daruwalla c37fc3cfa6 Recommitting to trigger build 2020-02-09 19:45:04 -06:00
Julia TagBot d7b20d1a78 Install TagBot as a GitHub Action 2020-02-08 20:02:52 +07:00
Dhairya Gandhi 37d58e16dd common questions answered in docs 2020-02-08 16:33:18 +05:30
Pranjal Datta d1522deee4
Merge pull request #1 from pranjaldatta/housing_added
Feature: Added Boston Housing Dataset
2020-02-07 04:01:00 +05:30
pranjaldatta 197a1a70c0 added BostonHousing dataset and testing 2020-02-07 03:47:19 +05:30
CarloLucibello 6499344af3 nograd for onecold, onehot, onehotbatch 2020-02-06 15:41:46 +01:00
Adarsh Kumar 659ba074d1
Updated test for msle 2020-02-06 01:21:51 +05:30
Adarsh Kumar 7710bb0b4b
Removed spurious promotions 2020-02-06 01:06:41 +05:30
Adarsh Kumar b5184553d4
Error correction in mae 2020-02-05 23:32:55 +05:30
Adarsh Kumar 44a977b7a4
Added tests for new loss functions 2020-02-05 23:20:06 +05:30
Adarsh Kumar 643086c8db
Updated squared_hinge 2020-02-05 22:40:07 +05:30
Adarsh Kumar 7ac647a7ac
Added loss functions 2020-02-05 22:29:15 +05:30
bors[bot] 60043fa2aa
Merge #1013
1013: Adapt to GPUArrays/CuArrays changes r=dhairyagandhi96 a=maleadt

Changes in response to a29df67184 and https://github.com/JuliaGPU/CuArrays.jl/pull/576. I suppose the next CuArrays release will need to be breaking because of this.

Maybe the `crossentropy` signature needs to be adjusted to support integer vectors, but I'll leave that decision up to Flux developers. This at least is the quick fix to get the tests passing again.

Co-authored-by: Tim Besard <tim.besard@gmail.com>
2020-02-03 16:29:48 +00:00
Dhairya Gandhi ddc2c20e68
Merge pull request #994 from FluxML/ox/doccustomtraining
Add custom training loops to docs
2020-02-01 11:13:54 +05:30
Dhairya Gandhi bc20103ea6 no-op copy 2020-01-31 13:23:33 +05:30
Tim Besard e2c2ec5575 Don't invoke GPU crossentropy with integers.
Broadcasting log on integers does not work.
2020-01-31 08:22:54 +01:00
Tim Besard e66a7f130f Don't compare CPU with GPU arrays. 2020-01-31 08:22:21 +01:00
Dhairya Gandhi b9fbee1ff0 ::typeof(op) -> op 2020-01-31 12:24:36 +05:30
Dhairya Gandhi 620cffc45c
Merge pull request #1008 from FluxML/tb/cuindex
Remove unused imports.
2020-01-29 18:52:53 +05:30
Tim Besard d88f63adb4 Remove unused imports. 2020-01-29 12:15:41 +01:00
Chris Rackauckas 9803826a36 test restructure on the GPU
Requires https://github.com/FluxML/Zygote.jl/pull/474
2020-01-20 13:53:28 -05:00
Dhairya Gandhi 29ab410794 test gradients are allocated on the gpu 2020-01-17 15:52:26 +05:30
Lyndon White 7797e31b44
Add custom training loops to docs 2020-01-16 21:57:59 +00:00
bors[bot] d1edd9b16d
Merge #680
680: Added new loss functions. r=thebhatman a=thebhatman

I have added the KL Divergence Loss function, Poisson loss function, Logcosh loss, and Hinge loss function.

Co-authored-by: Manjunath Bhat <manjunathbhat9920@gmail.com>
Co-authored-by: thebhatman <manjunathbhat9920@gmail.com>
2020-01-13 15:46:25 +00:00
Manjunath Bhat 747e01ea02
Test to check for spurious promotions 2020-01-13 18:33:30 +05:30
aminya f00b532556 Adding CompatHelper 2020-01-06 03:17:25 +03:30
Dhairya Gandhi b1e68813a8 cpu -> test_throws 2019-12-20 23:02:44 +05:30
Dhairya Gandhi efa2cbfd0e checkin Manifest#master 2019-12-11 14:13:41 +05:30
Kyle Daruwalla 2f854bdfc0 Recommitting to trigger new build 2019-12-10 09:57:08 -06:00
Dhairya Gandhi a72ca2b05d fix args 2019-12-09 23:18:01 +05:30
Dhairya Gandhi 894c075b6d rm Zeros setindex 2019-12-09 21:40:58 +05:30
Dhairya Gandhi f39e184814 rm Zeros warning 2019-12-09 21:07:30 +05:30
Manjunath Bhat 8a93be8c6c
Change loss to cost 2019-12-09 20:39:46 +05:30
Kyle Daruwalla 04991d3261 Added entry to docs for outdims 2019-12-07 14:06:11 -06:00
Kyle Daruwalla 0cdd11c0dc Added tests for varying padding, stride, and dilation with outdims. 2019-12-07 14:05:50 -06:00
Kyle Daruwalla a64378b112 Switched to using NNlib for conv.jl outdims. 2019-12-07 13:21:26 -06:00
Kyle Daruwalla 6265b1fa39 Added tests for outdims 2019-12-05 22:54:25 -06:00
Kyle Daruwalla 31dda0ce6c Updated with all basic and conv layers outdims 2019-12-05 21:57:10 -06:00
Dhairya Gandhi 9b6155c77d
Merge branch 'master' into dg/gradtests 2019-12-05 18:17:47 +05:30
Dhairya Gandhi 76dc8ea9d4 formatting fixes 2019-12-05 18:14:04 +05:30
Dhairya Gandhi 717ad9328d add some grad tests on GPU 2019-12-05 18:12:23 +05:30
DrChainsaw 755536bf5e Merge remote-tracking branch 'upstream/master' into samepad 2019-12-04 23:45:03 +01:00
Kyle Daruwalla b4ed16ad9c Added outdims for some basic layers 2019-12-03 22:48:48 -06:00
Kyle Daruwalla 9279d79e63
Merge pull request #1 from FluxML/master
Updating to upstream master
2019-12-03 21:09:35 -06:00
Dhairya Gandhi ec872bb579 test that bias has no grads with Zeros 2019-11-27 19:45:04 +05:30
Dhairya Gandhi 245563077b cleaner API 2019-11-27 19:40:58 +05:30
Dhairya Gandhi eb41715d26 define manual rules 2019-11-19 13:30:33 +05:30
Dhairya Gandhi e89b8eba77 fixes 2019-11-13 01:12:26 +05:30
DrChainsaw 453ecd1f24 Merge remote-tracking branch 'upstream/master' into samepad 2019-11-08 18:49:47 +01:00
Dhairya Gandhi a4a987f0b0 hook into bcasting 2019-11-07 16:53:41 +05:30
Dhairya Gandhi 7c90fb469d use array to define Zeros 2019-10-23 20:02:15 +05:30
Dhairya Gandhi 4a183aeaf0 make Zeros a dimensionlesss number 2019-10-22 16:11:27 +05:30
DrChainsaw 530d4edb67 Fix for reading comprehension error (dim is not always 2 * (N-2)) Fix for ambiguous method sig 2019-10-20 16:03:01 +02:00
DrChainsaw 411ce5dbd8 Add SamePad for pooling layers 2019-10-20 13:43:39 +02:00
DrChainsaw fc123d6279 Add SamePad for conv layers 2019-10-20 13:43:23 +02:00
thebhatman d591b2b59e Removed colon and capitalised 2019-10-09 21:36:40 +05:30
thebhatman 96a23c295c Changes to docs 2019-10-09 14:53:03 +05:30
Dhairya Gandhi c85bad4427 replace weight with filter 2019-10-08 20:26:09 +05:30
Dhairya Gandhi 49ea43e711 ZeroType => Zeros 2019-10-08 20:02:04 +05:30
Dhairya Gandhi 95c5845e99 document bias switch 2019-10-08 17:54:01 +05:30
Dhairya Gandhi b596faaffa tests bias switch 2019-10-08 17:18:39 +05:30
Dhairya Gandhi 040697fb2b add bias and weight kwarg 2019-10-08 17:18:19 +05:30
Dhairya Gandhi f3904b4e04 add ZeroType back 2019-10-08 17:17:36 +05:30
Dhairya Gandhi a1e826b888 fixes 2019-10-06 05:10:56 +05:30
Dhairya Gandhi 214f71f492 add N 2019-10-06 04:55:33 +05:30
Dhairya Gandhi 2ae3ad3b31 doc fixes 2019-10-06 04:46:13 +05:30
Dhairya Gandhi d00f833c17 rm ZeroType 2019-10-06 04:44:50 +05:30
Dhairya Gandhi e97d61f257 fixes 2019-10-06 04:42:26 +05:30
Dhairya Gandhi 48a305bd21 ditto remaining layers 2019-10-06 04:41:06 +05:30
Dhairya Gandhi 55ef7c1aba add weight and bias kwargs 2019-10-06 04:25:23 +05:30
thebhatman ec886c8ce8 Added docstring for hinge loss 2019-10-03 21:13:09 +05:30
Dhairya Gandhi 1fe321781b add to docs 2019-10-01 21:29:18 +05:30
Dhairya Gandhi dced8c04e5 use ZeroType 2019-10-01 21:25:07 +05:30
Manjunath Bhat 2b30319a55
Merge branch 'master' into patch-6 2019-09-30 21:05:02 +05:30
thebhatman ec35e9cbaa Loss functions docs added in layers.md 2019-09-30 21:02:13 +05:30
thebhatman 6e289ef939 Merge branch 'patch-6' of https://github.com/thebhatman/Flux.jl into patch-6 2019-09-30 20:55:44 +05:30
Dhairya Gandhi a801fcb9e7 docstrings 2019-09-27 12:07:55 +05:30
Dhairya Gandhi 9f2ac8fdef ditto remaining conv layers 2019-09-27 12:04:27 +05:30
Dhairya Gandhi 5ea6a33f44 make bias optional 2019-09-27 11:48:12 +05:30
thebhatman 710084ffbf Loss functions added to docs 2019-04-05 23:50:16 +05:30
thebhatman b84ab7ac95 Removed logcosh 2019-04-05 03:16:54 +05:30
thebhatman 4efcc69ba5 logcosh averaged 2019-03-26 23:23:02 +05:30
Manjunath Bhat 930adb122d
Avoided promotion to Float64 in hinge. 2019-03-25 23:43:06 +05:30
thebhatman 6f078857be Added reference links to loss functions 2019-03-26 03:15:28 +05:30
thebhatman c4d12e57fe Loss function names in lowercase 2019-03-26 03:09:48 +05:30
Manjunath Bhat 57a52e3375
Error of recurrent decimals fixed. 2019-03-12 02:58:32 +05:30
Manjunath Bhat 61386c04f8
Tests added. 2019-03-12 02:36:37 +05:30
Manjunath Bhat 633f0df01f
Added new loss functions. 2019-03-12 02:31:42 +05:30
55 changed files with 2817 additions and 880 deletions

12
.github/pull_request_template.md vendored Normal file
View File

@ -0,0 +1,12 @@
[Please delete this text and describe your change here.
For bugfixes, please detail the bug and include a test case which your patch fixes.
If you are adding a new feature, please clearly describe the design, its rationale, the possible alternatives considered.
It is easiest to merge new features when there is clear precedent in other systems; we need to know we're taking
the right direction since it can be hard to change later.]
### PR Checklist
- [ ] Tests are added
- [ ] Entry in NEWS.md
- [ ] Documentation, if applicable
- [ ] Final review from `@MikeInnes` or `@dhairyagandhi96` (for API changes).

16
.github/workflows/CompatHelper.yml vendored Normal file
View File

@ -0,0 +1,16 @@
name: CompatHelper
on:
schedule:
- cron: '00 00 * * *'
jobs:
CompatHelper:
runs-on: ubuntu-latest
steps:
- name: Pkg.add("CompatHelper")
run: julia -e 'using Pkg; Pkg.add("CompatHelper")'
- name: CompatHelper.main()
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: julia -e 'using CompatHelper; CompatHelper.main()'

11
.github/workflows/TagBot.yml vendored Normal file
View File

@ -0,0 +1,11 @@
name: TagBot
on:
schedule:
- cron: 0 * * * *
jobs:
TagBot:
runs-on: ubuntu-latest
steps:
- uses: JuliaRegistries/TagBot@v1
with:
token: ${{ secrets.GITHUB_TOKEN }}

View File

@ -7,11 +7,11 @@ os:
julia:
- 1.3
- 1
- nightly
matrix:
allow_failures:
- julia: nightly
notifications:
email: false
jobs:
include:
@ -24,6 +24,9 @@ jobs:
- julia --project=docs/ docs/make.jl
after_success: skip
allow_failures:
- julia: nightly
## uncomment the following lines to override the default test script
script:
- julia --color=yes -e 'using Pkg; Pkg.activate(); Pkg.instantiate(); Pkg.test()'

View File

@ -8,65 +8,77 @@ version = "0.5.0"
[[AbstractTrees]]
deps = ["Markdown"]
git-tree-sha1 = "8201f932428d25a2e2903300764515754847d87d"
git-tree-sha1 = "33e450545eaf7699da1a6e755f9ea65f14077a45"
uuid = "1520ce14-60c1-5f80-bbc7-55ef81b5835c"
version = "0.3.0"
version = "0.3.3"
[[Adapt]]
deps = ["LinearAlgebra"]
git-tree-sha1 = "82dab828020b872fa9efd3abec1152b075bc7cbf"
git-tree-sha1 = "fd04049c7dd78cfef0b06cdc1f0f181467655712"
uuid = "79e6a3ab-5dfb-504d-930d-738a2a938a0e"
version = "1.0.0"
version = "1.1.0"
[[ArrayLayouts]]
deps = ["FillArrays", "LinearAlgebra"]
git-tree-sha1 = "a504dca2ac7eda8761c8f7c1ed52427a1be75a3c"
uuid = "4c555306-a7a7-4459-81d9-ec55ddd5c99a"
version = "0.2.6"
[[Base64]]
uuid = "2a0f44e3-6c83-55bd-87e4-b1978d98bd5f"
[[BinaryProvider]]
deps = ["Libdl", "SHA"]
git-tree-sha1 = "5b08ed6036d9d3f0ee6369410b830f8873d4024c"
deps = ["Libdl", "Logging", "SHA"]
git-tree-sha1 = "ecdec412a9abc8db54c0efc5548c64dfce072058"
uuid = "b99e7846-7c00-51b0-8f62-c81ae34c0232"
version = "0.5.8"
version = "0.5.10"
[[CEnum]]
git-tree-sha1 = "62847acab40e6855a9b5905ccb99c2b5cf6b3ebb"
git-tree-sha1 = "1b77a77c3b28e0b3f413f7567c9bb8dd9bdccd14"
uuid = "fa961155-64e5-5f13-b03f-caf6b980ea82"
version = "0.2.0"
version = "0.3.0"
[[CUDAapi]]
deps = ["Libdl", "Logging"]
git-tree-sha1 = "56a813440ac98a1aa64672ab460a1512552211a7"
git-tree-sha1 = "831b825d10104bd29e28f6da93312a976830717b"
uuid = "3895d2a7-ec45-59b8-82bb-cfc6a382f9b3"
version = "2.1.0"
version = "4.0.0"
[[CUDAdrv]]
deps = ["CEnum", "CUDAapi", "Printf"]
git-tree-sha1 = "1fce616fa0806c67c133eb1d2f68f0f1a7504665"
git-tree-sha1 = "f56bbf18c86bcff7a961a32a4947a5abb2963a29"
uuid = "c5f51814-7f29-56b8-a69c-e4d8f6be1fde"
version = "5.0.1"
version = "6.3.0"
[[CUDAnative]]
deps = ["Adapt", "CEnum", "CUDAapi", "CUDAdrv", "DataStructures", "InteractiveUtils", "LLVM", "Libdl", "Printf", "TimerOutputs"]
git-tree-sha1 = "6e11d5c2c91fc623952e94c4fb73f9c4db74795a"
deps = ["Adapt", "BinaryProvider", "CEnum", "CUDAapi", "CUDAdrv", "ExprTools", "GPUCompiler", "LLVM", "Libdl", "Pkg", "Printf"]
git-tree-sha1 = "ac86db2b05fdfec96b011e25a504ffe7476e8a68"
uuid = "be33ccc6-a3ff-5ff2-a52e-74243cff1e17"
version = "2.7.0"
version = "3.1.0"
[[CodeTracking]]
deps = ["InteractiveUtils", "UUIDs"]
git-tree-sha1 = "cab4da992adc0a64f63fa30d2db2fd8bec40cab4"
uuid = "da1fd8a2-8d9e-5ec2-8556-3022fb5608a2"
version = "0.5.11"
[[CodecZlib]]
deps = ["BinaryProvider", "Libdl", "TranscodingStreams"]
git-tree-sha1 = "05916673a2627dd91b4969ff8ba6941bc85a960e"
deps = ["TranscodingStreams", "Zlib_jll"]
git-tree-sha1 = "ded953804d019afa9a3f98981d99b33e3db7b6da"
uuid = "944b1d66-785c-5afd-91f1-9de20f533193"
version = "0.6.0"
version = "0.7.0"
[[ColorTypes]]
deps = ["FixedPointNumbers", "Random"]
git-tree-sha1 = "7b62b728a5f3dd6ee3b23910303ccf27e82fad5e"
git-tree-sha1 = "c73d9cfc2a9d8433dc77f5bff4bddf46b1d78c20"
uuid = "3da002f7-5984-5a60-b8a6-cbb66c0b333f"
version = "0.8.1"
version = "0.10.3"
[[Colors]]
deps = ["ColorTypes", "FixedPointNumbers", "InteractiveUtils", "Printf", "Reexport"]
git-tree-sha1 = "c9c1845d6bf22e34738bee65c357a69f416ed5d1"
deps = ["ColorTypes", "FixedPointNumbers", "InteractiveUtils", "Reexport"]
git-tree-sha1 = "1e9bba7984e78aa8cdeea7f9f7cc984ad4e4b1c7"
uuid = "5ae59095-9a9b-59fe-a467-6f913c188581"
version = "0.9.6"
version = "0.12.2"
[[CommonSubexpressions]]
deps = ["Test"]
@ -74,22 +86,34 @@ git-tree-sha1 = "efdaf19ab11c7889334ca247ff4c9f7c322817b0"
uuid = "bbf7d656-a473-5ed7-a52c-81e309532950"
version = "0.2.0"
[[CompilerSupportLibraries_jll]]
deps = ["Libdl", "Pkg"]
git-tree-sha1 = "7c4f882c41faa72118841185afc58a2eb00ef612"
uuid = "e66e0078-7015-5450-92f7-15fbd957f2ae"
version = "0.3.3+0"
[[Cthulhu]]
deps = ["CodeTracking", "InteractiveUtils", "REPL", "UUIDs", "Unicode"]
git-tree-sha1 = "f3643e78353199d3097821e806348bd83f364155"
uuid = "f68482b8-f384-11e8-15f7-abe071a5a75f"
version = "1.1.1"
[[CuArrays]]
deps = ["AbstractFFTs", "Adapt", "CEnum", "CUDAapi", "CUDAdrv", "CUDAnative", "DataStructures", "GPUArrays", "Libdl", "LinearAlgebra", "MacroTools", "NNlib", "Printf", "Random", "Requires", "SparseArrays", "TimerOutputs"]
git-tree-sha1 = "51fbe053dea29ed2513e02d38380007310cf4c4b"
deps = ["AbstractFFTs", "Adapt", "CEnum", "CUDAapi", "CUDAdrv", "CUDAnative", "DataStructures", "GPUArrays", "Libdl", "LinearAlgebra", "MacroTools", "NNlib", "Pkg", "Printf", "Random", "Reexport", "Requires", "SparseArrays", "Statistics", "TimerOutputs"]
git-tree-sha1 = "1582b74d2322df7dd94549d4ac9d095e0f20e884"
uuid = "3a865a2d-5b23-5a0f-bc46-62713ec82fae"
version = "1.6.0"
version = "2.2.1"
[[DataAPI]]
git-tree-sha1 = "674b67f344687a88310213ddfa8a2b3c76cc4252"
git-tree-sha1 = "176e23402d80e7743fc26c19c681bfb11246af32"
uuid = "9a962f9c-6df0-11e9-0e5d-c546b8b5ee8a"
version = "1.1.0"
version = "1.3.0"
[[DataStructures]]
deps = ["InteractiveUtils", "OrderedCollections"]
git-tree-sha1 = "f784254f428fb8fd7ac15982e5862a38a44523d3"
git-tree-sha1 = "af6d9c86e191c917c2276fbede1137e8ea20157f"
uuid = "864edb3b-99cc-5e75-8d2d-829cb0a9cfe8"
version = "0.17.7"
version = "0.17.17"
[[Dates]]
deps = ["Printf"]
@ -107,78 +131,82 @@ version = "1.0.2"
[[DiffRules]]
deps = ["NaNMath", "Random", "SpecialFunctions"]
git-tree-sha1 = "10dca52cf6d4a62d82528262921daf63b99704a2"
git-tree-sha1 = "eb0c34204c8410888844ada5359ac8b96292cfd1"
uuid = "b552c78f-8df3-52c6-915a-8e097449b14b"
version = "1.0.0"
version = "1.0.1"
[[Distributed]]
deps = ["Random", "Serialization", "Sockets"]
uuid = "8ba89e20-285c-5b6f-9357-94700520ee1b"
[[FFTW]]
deps = ["AbstractFFTs", "FFTW_jll", "IntelOpenMP_jll", "Libdl", "LinearAlgebra", "MKL_jll", "Reexport"]
git-tree-sha1 = "109d82fa4b00429f9afcce873e9f746f11f018d3"
uuid = "7a1cc6ca-52ef-59f5-83cd-3a7055c09341"
version = "1.2.0"
[[FFTW_jll]]
deps = ["Libdl", "Pkg"]
git-tree-sha1 = "05674f209a6e3387dd103a945b0113eeb64b1a58"
uuid = "f5851436-0d7a-5f13-b9de-f02708fd171a"
version = "3.3.9+3"
[[ExprTools]]
git-tree-sha1 = "6f0517056812fd6aa3af23d4b70d5325a2ae4e95"
uuid = "e2ba6199-217a-4e67-a87a-7c52f15ade04"
version = "0.1.1"
[[FillArrays]]
deps = ["LinearAlgebra", "Random", "SparseArrays"]
git-tree-sha1 = "fec413d4fc547992eb62a5c544cedb6d7853c1f5"
git-tree-sha1 = "44f561e293987ffc84272cd3d2b14b0b93123d63"
uuid = "1a297f60-69ca-5386-bcde-b61e274b549b"
version = "0.8.4"
version = "0.8.10"
[[FixedPointNumbers]]
git-tree-sha1 = "d14a6fa5890ea3a7e5dcab6811114f132fec2b4b"
git-tree-sha1 = "3ba9ea634d4c8b289d590403b4a06f8e227a6238"
uuid = "53c48c17-4a7d-5ca2-90c5-79b7896eea93"
version = "0.6.1"
version = "0.8.0"
[[ForwardDiff]]
deps = ["CommonSubexpressions", "DiffResults", "DiffRules", "NaNMath", "Random", "SpecialFunctions", "StaticArrays"]
git-tree-sha1 = "840700059391d36e2498d89c2e82c08f261f2a2a"
git-tree-sha1 = "869540e4367122fbffaace383a5bdc34d6e5e5ac"
uuid = "f6369f11-7733-5829-9624-2563aa707210"
version = "0.10.8"
version = "0.10.10"
[[Functors]]
deps = ["MacroTools"]
git-tree-sha1 = "f40adc6422f548176bb4351ebd29e4abf773040a"
uuid = "d9f16b24-f501-4c13-a1f2-28368ffc5196"
version = "0.1.0"
[[Future]]
deps = ["Random"]
uuid = "9fa8497b-333b-5362-9e8d-4d0656e87820"
[[GPUArrays]]
deps = ["AbstractFFTs", "Adapt", "LinearAlgebra", "Printf", "Random", "Serialization"]
git-tree-sha1 = "e756da6cee76a5f1436a05827fa8fdf3badc577f"
git-tree-sha1 = "d887693eb1bd5e1fd573262a978745481895ec7d"
uuid = "0c68f7d7-f131-5f86-a1c3-88cf8149b2d7"
version = "2.0.1"
version = "3.4.1"
[[GPUCompiler]]
deps = ["Cthulhu", "DataStructures", "InteractiveUtils", "LLVM", "Libdl", "TimerOutputs"]
git-tree-sha1 = "5275aa268ecd09640b32560e1eae90c78816e4d1"
uuid = "61eb1bfa-7361-4325-ad38-22787b887f55"
version = "0.2.0"
[[IRTools]]
deps = ["InteractiveUtils", "MacroTools", "Test"]
git-tree-sha1 = "72421971e60917b8cd7737f9577c4f0f87eab306"
git-tree-sha1 = "90ee39f9beaaa186e4968417ea2b8ed5673c91c0"
uuid = "7869d1d1-7146-5819-86e3-90919afe41df"
version = "0.3.0"
[[IntelOpenMP_jll]]
deps = ["Libdl", "Pkg"]
git-tree-sha1 = "fb8e1c7a5594ba56f9011310790e03b5384998d6"
uuid = "1d5cc7b8-4909-519e-a0f8-d0f5ad9712d0"
version = "2018.0.3+0"
version = "0.3.3"
[[InteractiveUtils]]
deps = ["Markdown"]
uuid = "b77e0a4c-d291-57a0-90e8-8db25a27a240"
[[Juno]]
deps = ["Base64", "Logging", "Media", "Profile", "Test"]
git-tree-sha1 = "30d94657a422d09cb97b6f86f04f750fa9c50df8"
deps = ["Base64", "Logging", "Media", "Profile"]
git-tree-sha1 = "a686b0cf235fa3e491b79b4783c2d2382292b436"
uuid = "e5e0dc1b-0480-54bc-9374-aad01c23163d"
version = "0.7.2"
version = "0.8.2"
[[LLVM]]
deps = ["CEnum", "Libdl", "Printf", "Unicode"]
git-tree-sha1 = "1d08d7e4250f452f6cb20e4574daaebfdbee0ff7"
git-tree-sha1 = "dd3f584c3dbefe39b2a8fbafa1a3b77e31e21255"
uuid = "929cbde3-209d-540e-8aea-75f648917ca0"
version = "1.3.3"
version = "1.5.1"
[[LibGit2]]
deps = ["Printf"]
uuid = "76f85450-5226-5b5a-8eaa-529ad045b433"
[[Libdl]]
@ -191,17 +219,11 @@ uuid = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
[[Logging]]
uuid = "56ddb016-857b-54e1-b83d-db4d58db5568"
[[MKL_jll]]
deps = ["Libdl", "Pkg"]
git-tree-sha1 = "61069ae718b8ab1e325bbfb4e5268902e7ea08e3"
uuid = "856f044c-d86e-5d09-b602-aeab76dc8ba7"
version = "2019.0.117+0"
[[MacroTools]]
deps = ["DataStructures", "Markdown", "Random"]
git-tree-sha1 = "e2fc7a55bb2224e203bbd8b59f72b91323233458"
deps = ["Markdown", "Random"]
git-tree-sha1 = "f7d2e3f654af75f01ec49be82c231c382214223a"
uuid = "1914dd2f-81c6-5fcd-8719-6d5c9610ff09"
version = "0.5.3"
version = "0.5.5"
[[Markdown]]
deps = ["Base64"]
@ -224,9 +246,9 @@ uuid = "a63ad114-7e13-5084-954f-fe012c677804"
[[NNlib]]
deps = ["BinaryProvider", "Libdl", "LinearAlgebra", "Requires", "Statistics"]
git-tree-sha1 = "135c0de4794d5e214b06f1fb4787af4a72896e61"
git-tree-sha1 = "d9f196d911f55aeaff11b11f681b135980783824"
uuid = "872c559c-99b0-510c-b3b7-b6c96a88d5cd"
version = "0.6.2"
version = "0.6.6"
[[NaNMath]]
git-tree-sha1 = "928b8ca9b2791081dc71a51c55347c27c618760f"
@ -234,16 +256,15 @@ uuid = "77ba4419-2d1f-58cd-9bb1-8ffee604a2e3"
version = "0.3.3"
[[OpenSpecFun_jll]]
deps = ["Libdl", "Pkg"]
git-tree-sha1 = "65f672edebf3f4e613ddf37db9dcbd7a407e5e90"
deps = ["CompilerSupportLibraries_jll", "Libdl", "Pkg"]
git-tree-sha1 = "d51c416559217d974a1113522d5919235ae67a87"
uuid = "efe28fd5-8261-553b-a9e1-b2916fc3738e"
version = "0.5.3+1"
version = "0.5.3+3"
[[OrderedCollections]]
deps = ["Random", "Serialization", "Test"]
git-tree-sha1 = "c4c13474d23c60d20a67b217f1d7f22a40edf8f1"
git-tree-sha1 = "12ce190210d278e12644bcadf5b21cbdcf225cd3"
uuid = "bac558e1-5e72-5ebc-8fee-abe8a469f55d"
version = "1.1.0"
version = "1.2.0"
[[Pkg]]
deps = ["Dates", "LibGit2", "Libdl", "Logging", "Markdown", "Printf", "REPL", "Random", "SHA", "UUIDs"]
@ -273,9 +294,9 @@ version = "0.2.0"
[[Requires]]
deps = ["UUIDs"]
git-tree-sha1 = "999513b7dea8ac17359ed50ae8ea089e4464e35e"
git-tree-sha1 = "d37400976e98018ee840e0ca4f9d20baa231dc6b"
uuid = "ae029012-a4dd-5104-9daa-d747884805df"
version = "1.0.0"
version = "1.0.1"
[[SHA]]
uuid = "ea8e919c-243c-51af-8825-aaa63cd721ce"
@ -298,15 +319,15 @@ uuid = "2f01184e-e22b-5df5-ae63-d93ebab69eaf"
[[SpecialFunctions]]
deps = ["OpenSpecFun_jll"]
git-tree-sha1 = "268052ee908b2c086cc0011f528694f02f3e2408"
git-tree-sha1 = "d8d8b8a9f4119829410ecd706da4cc8594a1e020"
uuid = "276daf66-3868-5448-9aa4-cd146d93841b"
version = "0.9.0"
version = "0.10.3"
[[StaticArrays]]
deps = ["LinearAlgebra", "Random", "Statistics"]
git-tree-sha1 = "5a3bcb6233adabde68ebc97be66e95dcb787424c"
git-tree-sha1 = "5c06c0aeb81bef54aed4b3f446847905eb6cbda0"
uuid = "90137ffa-7385-5640-81b9-e52037218182"
version = "0.12.1"
version = "0.12.3"
[[Statistics]]
deps = ["LinearAlgebra", "SparseArrays"]
@ -314,9 +335,9 @@ uuid = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
[[StatsBase]]
deps = ["DataAPI", "DataStructures", "LinearAlgebra", "Missings", "Printf", "Random", "SortingAlgorithms", "SparseArrays", "Statistics"]
git-tree-sha1 = "c53e809e63fe5cf5de13632090bc3520649c9950"
git-tree-sha1 = "a6102b1f364befdb05746f386b67c6b7e3262c45"
uuid = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"
version = "0.32.0"
version = "0.33.0"
[[Test]]
deps = ["Distributed", "InteractiveUtils", "Logging", "Random"]
@ -324,9 +345,9 @@ uuid = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
[[TimerOutputs]]
deps = ["Printf"]
git-tree-sha1 = "311765af81bbb48d7bad01fb016d9c328c6ede03"
git-tree-sha1 = "f458ca23ff80e46a630922c555d838303e4b9603"
uuid = "a759f4b9-e2f1-59dc-863e-4aeb61b1ea8f"
version = "0.5.3"
version = "0.5.6"
[[TranscodingStreams]]
deps = ["Random", "Test"]
@ -343,21 +364,21 @@ uuid = "4ec0a83e-493e-50e2-b9ac-8f72acf5a8f5"
[[ZipFile]]
deps = ["Libdl", "Printf", "Zlib_jll"]
git-tree-sha1 = "5de8320a46812da1a8ca98b16a8a4546d44efa62"
git-tree-sha1 = "254975fef2fc526583bb9b7c9420fe66ffe09f2f"
uuid = "a5390f91-8eb1-5f08-bee0-b1d1ffed6cea"
version = "0.9.0"
version = "0.9.2"
[[Zlib_jll]]
deps = ["Libdl", "Pkg"]
git-tree-sha1 = "5618a43055eb09377edca21d19d0e99bce24a9c3"
git-tree-sha1 = "a2e0d558f6031002e380a90613b199e37a8565bf"
uuid = "83775a58-1f1d-513f-b197-d71354ab007a"
version = "1.2.11+7"
version = "1.2.11+10"
[[Zygote]]
deps = ["DiffRules", "FFTW", "FillArrays", "ForwardDiff", "IRTools", "InteractiveUtils", "LinearAlgebra", "MacroTools", "NNlib", "NaNMath", "Random", "Requires", "SpecialFunctions", "Statistics", "ZygoteRules"]
git-tree-sha1 = "74382bcc4c1e8075e14554da67d75565f8fb7827"
deps = ["AbstractFFTs", "ArrayLayouts", "DiffRules", "FillArrays", "ForwardDiff", "Future", "IRTools", "InteractiveUtils", "LinearAlgebra", "MacroTools", "NNlib", "NaNMath", "Random", "Requires", "SpecialFunctions", "Statistics", "ZygoteRules"]
git-tree-sha1 = "707ceea58e2bd0ff3077ab13a92f8355181d3ee4"
uuid = "e88e6eb3-aa80-5325-afca-941959d7151f"
version = "0.4.5"
version = "0.4.20"
[[ZygoteRules]]
deps = ["MacroTools"]

16
NEWS.md
View File

@ -1,3 +1,19 @@
# v0.11
* Change to `DataLoader`'s constructor [https://github.com/FluxML/Flux.jl/pull/1152]
* Use `DataLoader` with `NamedTuple`s, so that tensors can be accessed by name [https://github.com/FluxML/Flux.jl/pull/1221].
* Error if Dense layers weights and biases are not arrays [https://github.com/FluxML/Flux.jl/pull/1218].
# v0.10.5
* Add option for [same padding](https://github.com/FluxML/Flux.jl/pull/901) to conv and pooling layers by setting `pad=SamePad()`.
* Added option to set `bias` to [Flux.Zeros](https://github.com/FluxML/Flux.jl/pull/873) to eliminating `bias` from being trained.
* Added `GlobalMaxPool` and `GlobalMeanPool` [layers](https://github.com/FluxML/Flux.jl/pull/950) for performing global pooling operations.
* Added `ClipValue` and `ClipNorm` in this [pr](https://github.com/FluxML/Flux.jl/pull/1133) to `Flux.Optimise` to provide a cleaner API for gradient clipping.
* Added new kwarg-only [constructors](https://github.com/FluxML/Flux.jl/pull/873) for the various convolutional layers.
* Documented the convolutional layer constructors accepting `weight` and `bias` keyword arguments to supply custom arrays for those fields.
* Testing suite improvements now test for gradients of all layers along with GPU support.
* Functors have now moved to [Functors.jl](https://github.com/FluxML/Flux.jl/pull/1174) to allow for their use outside of Flux.
* Added [helper functions](https://github.com/FluxML/Flux.jl/pull/873) `Flux.convfilter` and `Flux.depthwiseconvfilter` to construct weight arrays for convolutions outside of layer constructors so as to not have to depend on the default layers for custom implementations.
# v0.10.0
* The default AD engine has switched from [Tracker to Zygote.jl](https://github.com/FluxML/Flux.jl/pull/669)
- The dependency on Tracker.jl has been removed.

View File

@ -1,6 +1,6 @@
name = "Flux"
uuid = "587475ba-b771-5e3f-ad9e-33799f191a9c"
version = "0.10.1"
version = "0.11.0-DEV"
[deps]
AbstractTrees = "1520ce14-60c1-5f80-bbc7-55ef81b5835c"
@ -9,7 +9,9 @@ CodecZlib = "944b1d66-785c-5afd-91f1-9de20f533193"
Colors = "5ae59095-9a9b-59fe-a467-6f913c188581"
CuArrays = "3a865a2d-5b23-5a0f-bc46-62713ec82fae"
DelimitedFiles = "8bb1440f-4735-579b-a4ab-409b98df4dab"
Functors = "d9f16b24-f501-4c13-a1f2-28368ffc5196"
Juno = "e5e0dc1b-0480-54bc-9374-aad01c23163d"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
MacroTools = "1914dd2f-81c6-5fcd-8719-6d5c9610ff09"
NNlib = "872c559c-99b0-510c-b3b7-b6c96a88d5cd"
Pkg = "44cfe95a-1eb2-52ea-b672-e2afdf69b78f"
@ -25,22 +27,25 @@ Zygote = "e88e6eb3-aa80-5325-afca-941959d7151f"
[compat]
AbstractTrees = "0.2, 0.3"
Adapt = "1"
CodecZlib = "0.5, 0.6"
Colors = "0.8, 0.9"
CuArrays = "1.6"
Juno = "0.5, 0.6, 0.7"
Adapt = "1, 2.0"
CodecZlib = "0.5, 0.6, 0.7"
Colors = "0.8, 0.9, 0.10, 0.11, 0.12"
CuArrays = "2"
Functors = "0.1"
Juno = "0.5, 0.6, 0.7, 0.8"
MacroTools = "0.3, 0.4, 0.5"
NNlib = "0.6"
Reexport = "0.2"
StatsBase = "0"
ZipFile = "0.7, 0.8, 0.9"
Zygote = "0.4"
julia = "1"
Zygote = "0.4.13"
julia = "1.3"
[extras]
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
IterTools = "c8e1da08-722c-5040-9ed9-7db0dc04731e"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
[targets]
test = ["Test", "Documenter"]
test = ["Test", "Documenter", "IterTools", "LinearAlgebra"]

View File

@ -1,89 +0,0 @@
# This file is machine-generated - editing it directly is not advised
[[Base64]]
uuid = "2a0f44e3-6c83-55bd-87e4-b1978d98bd5f"
[[Dates]]
deps = ["Printf"]
uuid = "ade2ca70-3891-5945-98fb-dc099432e06a"
[[Distributed]]
deps = ["Random", "Serialization", "Sockets"]
uuid = "8ba89e20-285c-5b6f-9357-94700520ee1b"
[[DocStringExtensions]]
deps = ["LibGit2", "Markdown", "Pkg", "Test"]
git-tree-sha1 = "0513f1a8991e9d83255e0140aace0d0fc4486600"
uuid = "ffbed154-4ef7-542d-bbb7-c09d3a79fcae"
version = "0.8.0"
[[Documenter]]
deps = ["Base64", "DocStringExtensions", "InteractiveUtils", "JSON", "LibGit2", "Logging", "Markdown", "REPL", "Test", "Unicode"]
git-tree-sha1 = "c61d6eedbc3c4323c08b64af12d29c8ee0fcbb5f"
uuid = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
version = "0.23.2"
[[InteractiveUtils]]
deps = ["Markdown"]
uuid = "b77e0a4c-d291-57a0-90e8-8db25a27a240"
[[JSON]]
deps = ["Dates", "Mmap", "Parsers", "Unicode"]
git-tree-sha1 = "b34d7cef7b337321e97d22242c3c2b91f476748e"
uuid = "682c06a0-de6a-54ab-a142-c8b1cf79cde6"
version = "0.21.0"
[[LibGit2]]
uuid = "76f85450-5226-5b5a-8eaa-529ad045b433"
[[Logging]]
uuid = "56ddb016-857b-54e1-b83d-db4d58db5568"
[[Markdown]]
deps = ["Base64"]
uuid = "d6f4376e-aef5-505a-96c1-9c027394607a"
[[Mmap]]
uuid = "a63ad114-7e13-5084-954f-fe012c677804"
[[Parsers]]
deps = ["Dates", "Test"]
git-tree-sha1 = "db2b35dedab3c0e46dc15996d170af07a5ab91c9"
uuid = "69de0a69-1ddd-5017-9359-2bf0b02dc9f0"
version = "0.3.6"
[[Pkg]]
deps = ["Dates", "LibGit2", "Markdown", "Printf", "REPL", "Random", "SHA", "UUIDs"]
uuid = "44cfe95a-1eb2-52ea-b672-e2afdf69b78f"
[[Printf]]
deps = ["Unicode"]
uuid = "de0858da-6303-5e67-8744-51eddeeeb8d7"
[[REPL]]
deps = ["InteractiveUtils", "Markdown", "Sockets"]
uuid = "3fa0cd96-eef1-5676-8a61-b3b8758bbffb"
[[Random]]
deps = ["Serialization"]
uuid = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
[[SHA]]
uuid = "ea8e919c-243c-51af-8825-aaa63cd721ce"
[[Serialization]]
uuid = "9e88b42a-f829-5b0c-bbe9-9e923198166b"
[[Sockets]]
uuid = "6462fe0b-24de-5631-8697-dd941f90decc"
[[Test]]
deps = ["Distributed", "InteractiveUtils", "Logging", "Random"]
uuid = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
[[UUIDs]]
deps = ["Random", "SHA"]
uuid = "cf7118a7-6976-5b1a-9a39-7adc72f591a4"
[[Unicode]]
uuid = "4ec0a83e-493e-50e2-b9ac-8f72acf5a8f5"

View File

@ -1,2 +1,6 @@
[deps]
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
NNlib = "872c559c-99b0-510c-b3b7-b6c96a88d5cd"
[compat]
Documenter = "0.24"

View File

@ -1,29 +1,36 @@
using Pkg;
Pkg.activate(joinpath(@__DIR__, "..")); Pkg.instantiate()
Pkg.activate(); Pkg.instantiate()
pushfirst!(LOAD_PATH, joinpath(@__DIR__, ".."))
using Documenter, Flux, NNlib
DocMeta.setdocmeta!(Flux, :DocTestSetup, :(using Flux); recursive=true)
makedocs(modules=[Flux, NNlib],
doctest = VERSION >= v"1.4",
sitename = "Flux",
pages = ["Home" => "index.md",
"Building Models" =>
["Basics" => "models/basics.md",
"Recurrence" => "models/recurrence.md",
"Regularisation" => "models/regularisation.md",
"Model Reference" => "models/layers.md"],
"Model Reference" => "models/layers.md",
"Advanced Model Building" => "models/advanced.md",
"NNlib" => "models/nnlib.md"],
"Handling Data" =>
["One-Hot Encoding" => "data/onehot.md",
"DataLoader" => "data/dataloader.md"],
"Training Models" =>
["Optimisers" => "training/optimisers.md",
"Training" => "training/training.md"],
"One-Hot Encoding" => "data/onehot.md",
"GPU Support" => "gpu.md",
"Saving & Loading" => "saving.md",
"The Julia Ecosystem" => "ecosystem.md",
"Utility Functions" => "utilities.md",
"Performance Tips" => "performance.md",
"Datasets" => "datasets.md",
"Community" => "community.md"],
format = Documenter.HTML(assets = ["assets/flux.css"],
analytics = "UA-36890222-9",
prettyurls = haskey(ENV, "CI")))
format = Documenter.HTML(
analytics = "UA-36890222-9",
assets = ["assets/flux.css"],
prettyurls = get(ENV, "CI", nothing) == "true"),
)
deploydocs(repo = "github.com/FluxML/Flux.jl.git")
deploydocs(repo = "github.com/FluxML/Flux.jl.git",
target = "build",
push_preview = true)

View File

@ -0,0 +1,6 @@
# DataLoader
Flux provides the `DataLoader` type in the `Flux.Data` module to handle iteration over mini-batches of data.
```@docs
Flux.Data.DataLoader
```

View File

@ -7,15 +7,15 @@ julia> using Flux: onehot, onecold
julia> onehot(:b, [:a, :b, :c])
3-element Flux.OneHotVector:
false
true
false
0
1
0
julia> onehot(:c, [:a, :b, :c])
3-element Flux.OneHotVector:
false
false
true
0
0
1
```
The inverse is `onecold` (which can take a general probability distribution, as well as just booleans).
@ -31,6 +31,11 @@ julia> onecold([0.3, 0.2, 0.5], [:a, :b, :c])
:c
```
```@docs
Flux.onehot
Flux.onecold
```
## Batches
`onehotbatch` creates a batch (matrix) of one-hot vectors, and `onecold` treats matrices as batches.
@ -52,3 +57,7 @@ julia> onecold(ans, [:a, :b, :c])
```
Note that these operations returned `OneHotVector` and `OneHotMatrix` rather than `Array`s. `OneHotVector`s behave like normal vectors but avoid any unnecessary cost compared to using an integer index directly. For example, multiplying a matrix with a one-hot vector simply slices out the relevant row of the matrix under the hood.
```@docs
Flux.onehotbatch
```

20
docs/src/datasets.md Normal file
View File

@ -0,0 +1,20 @@
# Datasets
Flux includes several standard machine learning datasets.
```@docs
Flux.Data.Iris.features()
Flux.Data.Iris.labels()
Flux.Data.MNIST.images()
Flux.Data.MNIST.labels()
Flux.Data.FashionMNIST.images()
Flux.Data.FashionMNIST.labels()
Flux.Data.CMUDict.phones()
Flux.Data.CMUDict.symbols()
Flux.Data.CMUDict.rawdict()
Flux.Data.CMUDict.cmudict()
Flux.Data.Sentiment.train()
Flux.Data.Sentiment.test()
Flux.Data.Sentiment.dev()
```

21
docs/src/ecosystem.md Normal file
View File

@ -0,0 +1,21 @@
# The Julia Ecosystem
One of the main strengths of Julia lies in an ecosystem of packages
globally providing a rich and consistent user experience.
This is a non-exhaustive list of Julia packages, nicely complementing `Flux` in typical
machine learning and deep learning workflows:
- [ArgParse.jl](https://github.com/carlobaldassi/ArgParse.jl): package for parsing command-line arguments to Julia programs.
- [Augmentor.jl](https://github.com/Evizero/Augmentor.jl): a fast image augmentation library in Julia for machine learning.
- [BSON.jl](https://github.com/JuliaIO/BSON.jl): package for working with the Binary JSON serialisation format
- [DataFrames.jl](https://github.com/joshday/OnlineStats.jl): in-memory tabular data in Julia
- [DrWatson.jl](https://github.com/JuliaDynamics/DrWatson.jl): a scientific project assistant software
- [MLDatasets.jl](https://github.com/JuliaML/MLDatasets.jl): utility package for accessing common machine learning datasets
- [OnlineStats.jl](https://github.com/joshday/OnlineStats.jl): single-pass algorithms for statistics
- [Parameters.jl](https://github.com/mauro3/Parameters.jl): types with default field values, keyword constructors and (un-)pack macros
- [ProgressMeters.jl](https://github.com/timholy/ProgressMeter.jl): progress meters for long-running computations
- [TensorBoardLogger.jl](https://github.com/PhilipVinc/TensorBoardLogger.jl): easy peasy logging to [tensorboard](https://www.tensorflow.org/tensorboard) in Julia
This tight integration among Julia pakages is shown in some of the examples in the [model-zoo](https://github.com/FluxML/model-zoo) repository.

View File

@ -30,7 +30,7 @@ If you define a structured model, like a `Dense` layer or `Chain`, you just need
```julia
d = Dense(10, 5, σ)
d = fmap(cu, d)
d.W # Tracked CuArray
d.W # CuArray
d(cu(rand(10))) # CuArray output
m = Chain(Dense(10, 5, σ), Dense(5, 2), softmax)
@ -53,7 +53,7 @@ julia> x = rand(10) |> gpu
0.511655
julia> m(x)
Tracked 5-element CuArray{Float32,1}:
5-element CuArray{Float32,1}:
-0.30535
-0.618002

View File

@ -0,0 +1,73 @@
# Advanced Model Building and Customisation
Here we will try and describe usage of some more advanced features that Flux provides to give more control over model building.
## Customising Parameter Collection for a Model
Taking reference from our example `Affine` layer from the [basics](basics.md#Building-Layers-1).
By default all the fields in the `Affine` type are collected as its parameters, however, in some cases it may be desired to hold other metadata in our "layers" that may not be needed for training, and are hence supposed to be ignored while the parameters are collected. With Flux, it is possible to mark the fields of our layers that are trainable in two ways.
The first way of achieving this is through overloading the `trainable` function.
```julia-repl
julia> @functor Affine
julia> a = Affine(rand(3,3), rand(3))
Affine{Array{Float64,2},Array{Float64,1}}([0.66722 0.774872 0.249809; 0.843321 0.403843 0.429232; 0.683525 0.662455 0.065297], [0.42394, 0.0170927, 0.544955])
julia> Flux.params(a) # default behavior
Params([[0.66722 0.774872 0.249809; 0.843321 0.403843 0.429232; 0.683525 0.662455 0.065297], [0.42394, 0.0170927, 0.544955]])
julia> Flux.trainable(a::Affine) = (a.W,)
julia> Flux.params(a)
Params([[0.66722 0.774872 0.249809; 0.843321 0.403843 0.429232; 0.683525 0.662455 0.065297]])
```
Only the fields returned by `trainable` will be collected as trainable parameters of the layer when calling `Flux.params`.
Another way of achieving this is through the `@functor` macro directly. Here, we can mark the fields we are interested in by grouping them in the second argument:
```julia
Flux.@functor Affine (W,)
```
However, doing this requires the `struct` to have a corresponding constructor that accepts those parameters.
## Freezing Layer Parameters
When it is desired to not include all the model parameters (for e.g. transfer learning), we can simply not pass in those layers into our call to `params`.
Consider a simple multi-layer perceptron model where we want to avoid optimising the first two `Dense` layers. We can obtain
this using the slicing features `Chain` provides:
```julia
m = Chain(
Dense(784, 64, relu),
Dense(64, 64, relu),
Dense(32, 10)
)
ps = Flux.params(m[3:end])
```
The `Zygote.Params` object `ps` now holds a reference to only the parameters of the layers passed to it.
During training, the gradients will only be computed for (and applied to) the last `Dense` layer, therefore only that would have its parameters changed.
`Flux.params` also takes multiple inputs to make it easy to collect parameters from heterogenous models with a single call. A simple demonstration would be if we wanted to omit optimising the second `Dense` layer in the previous example. It would look something like this:
```julia
Flux.params(m[1], m[3:end])
```
Sometimes, a more fine-tuned control is needed.
We can freeze a specific parameter of a specific layer which already entered a `Params` object `ps`,
by simply deleting it from `ps`:
```julia
ps = params(m)
delete!(ps, m[2].b)
```

View File

@ -32,8 +32,6 @@ julia> gradient(f, [2, 1], [2, 0])
But machine learning models can have *hundreds* of parameters! To handle this, Flux lets you work with collections of parameters, via `params`. You can get the gradient of all parameters used in a program without explicitly passing them in.
```jldoctest basics
julia> using Flux
julia> x = [2, 1];
julia> y = [2, 0];
@ -69,8 +67,8 @@ b = rand(2)
predict(x) = W*x .+ b
function loss(x, y)
= predict(x)
sum((y .- ).^2)
ŷ = predict(x)
sum((y .- ŷ).^2)
end
x, y = rand(5), rand(2) # Dummy data
@ -219,3 +217,26 @@ Flux.@functor Affine
```
This enables a useful extra set of functionality for our `Affine` layer, such as [collecting its parameters](../training/optimisers.md) or [moving it to the GPU](../gpu.md).
For some more helpful tricks, including parameter freezing, please checkout the [advanced usage guide](advanced.md).
## Utility functions
Flux provides some utility functions to help you generate models in an automated fashion.
`outdims` enables you to calculate the spatial output dimensions of layers like `Conv` when applied to input images of a given size.
Currently limited to the following layers:
- `Chain`
- `Dense`
- `Conv`
- `Diagonal`
- `Maxout`
- `ConvTranspose`
- `DepthwiseConv`
- `CrossCor`
- `MaxPool`
- `MeanPool`
```@docs
Flux.outdims
```

View File

@ -14,10 +14,17 @@ These layers are used to build convolutional neural networks (CNNs).
```@docs
Conv
MaxPool
GlobalMaxPool
MeanPool
GlobalMeanPool
DepthwiseConv
ConvTranspose
CrossCor
SamePad
flatten
Flux.Zeros
Flux.convfilter
Flux.depthwiseconvfilter
```
## Recurrent Layers
@ -29,6 +36,7 @@ RNN
LSTM
GRU
Flux.Recur
Flux.reset!
```
## Other General Purpose Layers
@ -40,28 +48,45 @@ Maxout
SkipConnection
```
## Activation Functions
Non-linearities that go between layers of your model. Most of these functions are defined in [NNlib](https://github.com/FluxML/NNlib.jl) but are available by default in Flux.
Note that, unless otherwise stated, activation functions operate on scalars. To apply them to an array you can call `σ.(xs)`, `relu.(xs)` and so on.
```@docs
σ
relu
leakyrelu
elu
swish
```
## Normalisation & Regularisation
These layers don't affect the structure of the network but may improve training times or reduce overfitting.
```@docs
Flux.normalise
BatchNorm
Flux.dropout
Dropout
AlphaDropout
LayerNorm
InstanceNorm
GroupNorm
```
### Testmode
Many normalisation layers behave differently under training and inference (testing). By default, Flux will automatically determine when a layer evaluation is part of training or inference. Still, depending on your use case, it may be helpful to manually specify when these layers should be treated as being trained or not. For this, Flux provides `Flux.testmode!`. When called on a model (e.g. a layer or chain of layers), this function will place the model into the mode specified.
```@docs
Flux.testmode!
trainmode!
```
## Cost Functions
```@docs
Flux.mae
Flux.mse
Flux.msle
Flux.huber_loss
Flux.crossentropy
Flux.logitcrossentropy
Flux.binarycrossentropy
Flux.logitbinarycrossentropy
Flux.kldivergence
Flux.poisson
Flux.hinge
Flux.squared_hinge
Flux.dice_coeff_loss
Flux.tversky_loss
```

61
docs/src/models/nnlib.md Normal file
View File

@ -0,0 +1,61 @@
# NNlib
Flux re-exports all of the functions exported by the [NNlib](https://github.com/FluxML/NNlib.jl) package.
## Activation Functions
Non-linearities that go between layers of your model. Note that, unless otherwise stated, activation functions operate on scalars. To apply them to an array you can call `σ.(xs)`, `relu.(xs)` and so on.
```@docs
NNlib.celu
NNlib.elu
NNlib.gelu
NNlib.hardsigmoid
NNlib.hardtanh
NNlib.leakyrelu
NNlib.lisht
NNlib.logcosh
NNlib.logsigmoid
NNlib.mish
NNlib.relu
NNlib.relu6
NNlib.rrelu
NNlib.selu
NNlib.sigmoid
NNlib.softplus
NNlib.softshrink
NNlib.softsign
NNlib.swish
NNlib.tanhshrink
NNlib.trelu
```
## Softmax
```@docs
NNlib.softmax
NNlib.logsoftmax
```
## Pooling
```@docs
NNlib.maxpool
NNlib.meanpool
```
## Convolution
```@docs
NNlib.conv
NNlib.depthwiseconv
```
## Batched Operations
```@docs
NNlib.batched_mul
NNlib.batched_mul!
NNlib.batched_adjoint
NNlib.batched_transpose
```

View File

@ -31,7 +31,7 @@ julia> params(m)
param([0.0, 0.0, 0.0, 0.0, 0.0])
julia> sum(norm, params(m))
26.01749952921026 (tracked)
26.01749952921026
```
Here's a larger example with a multi-layer perceptron.
@ -52,7 +52,7 @@ One can also easily add per-layer regularisation via the `activations` function:
```julia
julia> using Flux: activations
julia> c = Chain(Dense(10,5,σ),Dense(5,2),softmax)
julia> c = Chain(Dense(10, 5, σ), Dense(5, 2), softmax)
Chain(Dense(10, 5, σ), Dense(5, 2), softmax)
julia> activations(c, rand(10))
@ -64,3 +64,7 @@ julia> activations(c, rand(10))
julia> sum(norm, ans)
2.1166067f0
```
```@docs
Flux.activations
```

View File

@ -4,7 +4,7 @@ All the usual [Julia performance tips apply](https://docs.julialang.org/en/v1/ma
As always [profiling your code](https://docs.julialang.org/en/v1/manual/profile/#Profiling-1) is generally a useful way of finding bottlenecks.
Below follow some Flux specific tips/reminders.
## Don't use more precision than you need.
## Don't use more precision than you need
Flux works great with all kinds of number types.
But often you do not need to be working with say `Float64` (let alone `BigFloat`).
@ -14,7 +14,8 @@ Which means allocations occur much faster.
And you use less memory.
## Make sure your activation and loss functions preserve the type of their inputs
## Preserve inputs' types
Not only should your activation and loss functions be [type-stable](https://docs.julialang.org/en/v1/manual/performance-tips/#Write-%22type-stable%22-functions-1),
they should also preserve the type of their inputs.
@ -29,31 +30,29 @@ because it results in having to use slow mixed type multiplication in the dense
Similar situations can occur in the loss function during backpropagation.
Which means if you change your data say from `Float64` to `Float32` (which should give a speedup: see above),
you will see a large slow-down
you will see a large slow-down.
This can occur sneakily, because you can cause type-promotion by interacting with a numeric literals.
E.g. the following will have run into the same problem as above:
```
leaky_tanh(x) = 0.01x + tanh(x)
leaky_tanh(x) = 0.01*x + tanh(x)
```
While one could change your activation function (e.g. to use `0.01f0x`) to avoid this when ever your inputs change,
the idiomatic (and safe way) is to use `oftype`.
While one could change the activation function (e.g. to use `0.01f0*x`), the idiomatic (and safe way) to avoid type casts whenever inputs changes is to use `oftype`:
```
leaky_tanh(x) = oftype(x/1, 0.01)x + tanh(x)
leaky_tanh(x) = oftype(x/1, 0.01)*x + tanh(x)
```
## Evaluate batches as Matrices of features, rather than sequences of Vector features
## Evaluate batches as Matrices of features
While it can sometimes be tempting to process your observations (feature vectors) one at a time
e.g.
```julia
function loss_total(xs::AbstractVector{<:Vector}, ys::AbstractVector{<:Vector})
sum(zip(xs, ys)) do (x, y_target)
y_pred = model(x) # evaluate the model
y_pred = model(x) # evaluate the model
return loss(y_pred, y_target)
end
end

View File

@ -21,7 +21,7 @@ grads = gradient(() -> loss(x, y), θ)
We want to update each parameter, using the gradient, in order to improve (reduce) the loss. Here's one way to do that:
```julia
using Flux: update!
using Flux.Optimise: update!
η = 0.1 # Learning Rate
for p in (W, b)
@ -46,11 +46,13 @@ An optimiser `update!` accepts a parameter and a gradient, and updates the param
All optimisers return an object that, when passed to `train!`, will update the parameters passed to it.
```@docs
Flux.Optimise.update!
Descent
Momentum
Nesterov
RMSProp
ADAM
RADAM
AdaMax
ADAGrad
ADADelta
@ -61,7 +63,7 @@ ADAMW
## Optimiser Interface
Flux's optimsers are built around a `struct` that holds all the optimiser parameters along with a definition of how to apply the update rule associated with it. We do this via the `apply!` function which takes the optimiser as the first argument followed by the parameter and its corresponding gradient.
Flux's optimisers are built around a `struct` that holds all the optimiser parameters along with a definition of how to apply the update rule associated with it. We do this via the `apply!` function which takes the optimiser as the first argument followed by the parameter and its corresponding gradient.
In this manner Flux also allows one to create custom optimisers to be used seamlessly. Let's work this with a simple example.
@ -78,7 +80,7 @@ Momentum(eta::Real, rho::Real) = Momentum(eta, rho, IdDict())
The `Momentum` type will act as our optimiser in this case. Notice that we have added all the parameters as fields, along with the velocity which we will use as our state dictionary. Each parameter in our models will get an entry in there. We can now define the rule applied when this optimiser is invoked.
```julia
function apply!(o::Momentum, x, Δ)
function Flux.Optimise.apply!(o::Momentum, x, Δ)
η, ρ = o.eta, o.rho
v = get!(o.velocity, x, zero(x))::typeof(x)
@. v = ρ * v - η * Δ
@ -99,15 +101,15 @@ Flux internally calls on this function via the `update!` function. It shares the
## Composing Optimisers
Flux defines a special kind of optimiser called simply as `Optimiser` which takes in a arbitrary optimisers as input. Its behaviour is similar to the usual optimisers, but differs in that it acts by calling the optimisers listed in it sequentially. Each optimiser produces a modified gradient
Flux defines a special kind of optimiser simply called `Optimiser` which takes in arbitrary optimisers as input. Its behaviour is similar to the usual optimisers, but differs in that it acts by calling the optimisers listed in it sequentially. Each optimiser produces a modified gradient
that will be fed into the next, and the resultant update will be applied to the parameter as usual. A classic use case is where adding decays is desirable. Flux defines some basic decays including `ExpDecay`, `InvDecay` etc.
```julia
opt = Optimiser(ExpDecay(0.001, 0.1, 1000, 1e-4), Descent())
```
Here we apply exponential decay to the `Descent` optimser. The defaults of `ExpDecay` say that its learning rate will be decayed every 1000 steps.
It is then applied like any optimser.
Here we apply exponential decay to the `Descent` optimiser. The defaults of `ExpDecay` say that its learning rate will be decayed every 1000 steps.
It is then applied like any optimiser.
```julia
w = randn(10, 10)
@ -138,3 +140,16 @@ ExpDecay
InvDecay
WeightDecay
```
## Gradient Clipping
Gradient clipping is useful for training recurrent neural networks, which have a tendency to suffer from the exploding gradient problem. An example usage is
```julia
opt = Optimiser(ClipValue(1e-3), ADAM(1e-3))
```
```@docs
ClipValue
ClipNorm
```

View File

@ -7,10 +7,10 @@ To actually train a model we need four things:
* A collection of data points that will be provided to the objective function.
* An [optimiser](optimisers.md) that will update the model parameters appropriately.
With these we can call `Flux.train!`:
With these we can call `train!`:
```julia
Flux.train!(objective, params, data, opt)
```@docs
Flux.Optimise.train!
```
There are plenty of examples in the [model zoo](https://github.com/FluxML/model-zoo).
@ -32,6 +32,7 @@ Flux.train!(loss, ps, data, opt)
```
The objective will almost always be defined in terms of some *cost function* that measures the distance of the prediction `m(x)` from the target `y`. Flux has several of these built in, like `mse` for mean squared error or `crossentropy` for cross entropy loss, but you can calculate it however you want.
For a list of all built-in loss functions, check out the [layer reference](../models/layers.md).
At first glance it may seem strange that the model that we want to train is not part of the input arguments of `Flux.train!` too. However the target of the optimizer is not the model itself, but the objective function that represents the departure between modelled and observed data. In other words, the model is implicitly defined in the objective function, and there is no need to give it explicitly. Passing the objective function instead of the model and a cost function separately provides more flexibility, and the possibility of optimizing the calculations.
@ -41,6 +42,8 @@ The model to be trained must have a set of tracked parameters that are used to c
Such an object contains a reference to the model's parameters, not a copy, such that after their training, the model behaves according to their updated values.
Handling all the parameters on a layer by layer basis is explained in the [Layer Helpers](../models/basics.md) section. Also, for freezing model parameters, see the [Advanced Usage Guide](../models/advanced.md).
## Datasets
The `data` argument provides a collection of data to train with (usually a set of inputs `x` and target outputs `y`). For example, here's a dummy data set with only one data point:
@ -56,7 +59,8 @@ data = [(x, y)]
```julia
data = [(x, y), (x, y), (x, y)]
# Or equivalently
data = Iterators.repeated((x, y), 3)
using IterTools: ncycle
data = ncycle([(x, y)], 3)
```
It's common to load the `x`s and `y`s separately. In this case you can use `zip`:
@ -67,6 +71,14 @@ ys = [rand( 10), rand( 10), rand( 10)]
data = zip(xs, ys)
```
Training data can be conveniently partitioned for mini-batch training using the [`Flux.Data.DataLoader`](@ref) type:
```julia
X = rand(28, 28, 60000)
Y = rand(0:9, 60000)
data = DataLoader(X, Y, batchsize=128)
```
Note that, by default, `train!` only loops over the data once (a single "epoch").
A convenient way to run multiple epochs from the REPL is provided by `@epochs`.
@ -83,6 +95,10 @@ julia> @epochs 2 Flux.train!(...)
# Train for two epochs
```
```@docs
Flux.@epochs
```
## Callbacks
`train!` takes an additional argument, `cb`, that's used for callbacks so that you can observe the training process. For example:
@ -110,3 +126,30 @@ cb = function ()
accuracy() > 0.9 && Flux.stop()
end
```
## Custom Training loops
The `Flux.train!` function can be very convenient, especially for simple problems.
Its also very flexible with the use of callbacks.
But for some problems its much cleaner to write your own custom training loop.
An example follows that works similar to the default `Flux.train` but with no callbacks.
You don't need callbacks if you just code the calls to your functions directly into the loop.
E.g. in the places marked with comments.
```julia
function my_custom_train!(loss, ps, data, opt)
ps = Params(ps)
for d in data
gs = gradient(ps) do
training_loss = loss(d...)
# Insert whatever code you want here that needs Training loss, e.g. logging
return training_loss
end
# insert what ever code you want here that needs gradient
# E.g. logging with TensorBoardLogger.jl as histogram so you can see if it is becoming huge
update!(opt, ps, gs)
# Here you might like to check validation set accuracy, and break out to do early stopping
end
end
```
You could simplify this further, for example by hard-coding in the loss function.

49
docs/src/utilities.md Normal file
View File

@ -0,0 +1,49 @@
# Utility Functions
Flux contains some utility functions for working with data; these functions
help create inputs for your models or batch your dataset.
Other functions can be used to initialize your layers or to regularly execute
callback functions.
## Working with Data
```@docs
Flux.unsqueeze
Flux.stack
Flux.unstack
Flux.chunk
Flux.frequencies
Flux.batch
Flux.batchseq
Base.rpad(v::AbstractVector, n::Integer, p)
```
## Layer Initialization
These are primarily useful if you are planning to write your own layers.
Flux initializes convolutional layers and recurrent cells with `glorot_uniform`
by default.
To change the default on an applicable layer, pass the desired function with the
`init` keyword. For example:
```jldoctest; setup = :(using Flux)
julia> conv = Conv((3, 3), 1 => 8, relu; init=Flux.glorot_normal)
Conv((3, 3), 1=>8, relu)
```
```@docs
Flux.glorot_uniform
Flux.glorot_normal
```
## Model Abstraction
```@docs
Flux.destructure
```
## Callback Helpers
```@docs
Flux.throttle
Flux.stop
```

View File

@ -3,28 +3,33 @@ module Flux
# Zero Flux Given
using Base: tail
using Zygote, MacroTools, Juno, Reexport, Statistics, Random
using Statistics, Random, LinearAlgebra
using Zygote, MacroTools, Juno, Reexport
using MacroTools: @forward
@reexport using NNlib
using Zygote: Params, @adjoint, gradient, pullback, @nograd
export gradient
export Chain, Dense, Maxout, RNN, LSTM, GRU, Conv, CrossCor, ConvTranspose, MaxPool, MeanPool,
export Chain, Dense, Maxout, RNN, LSTM, GRU, SamePad, Conv, CrossCor, ConvTranspose,
GlobalMaxPool, GlobalMeanPool, MaxPool, MeanPool, flatten,
DepthwiseConv, Dropout, AlphaDropout, LayerNorm, BatchNorm, InstanceNorm, GroupNorm,
SkipConnection, params, fmap, cpu, gpu, f32, f64
SkipConnection, params, fmap, cpu, gpu, f32, f64, testmode!, trainmode!
include("optimise/Optimise.jl")
using .Optimise
using .Optimise: @epochs
export SGD, Descent, ADAM, Momentum, Nesterov, RMSProp,
export Descent, ADAM, Momentum, Nesterov, RMSProp,
ADAGrad, AdaMax, ADADelta, AMSGrad, NADAM,
ADAMW, RADAM, InvDecay, ExpDecay, WeightDecay
ADAMW, RADAM, InvDecay, ExpDecay, WeightDecay,
ClipValue, ClipNorm
using CuArrays
const use_cuda = Ref(false)
include("utils.jl")
include("zeros.jl")
include("onehot.jl")
include("functor.jl")
@ -38,24 +43,13 @@ include("data/Data.jl")
include("deprecations.jl")
include("cuda/cuda.jl")
function __init__()
precompiling = ccall(:jl_generating_output, Cint, ()) != 0
# we don't want to include the CUDA module when precompiling,
# or we could end up replacing it at run time (triggering a warning)
precompiling && return
if !CuArrays.functional()
# nothing to do here, and either CuArrays or one of its dependencies will have warned
else
use_cuda[] = true
# FIXME: this functionality should be conditional at run time by checking `use_cuda`
# (or even better, get moved to CuArrays.jl as much as possible)
if CuArrays.has_cudnn()
include(joinpath(@__DIR__, "cuda/cuda.jl"))
else
@warn "CuArrays.jl did not find libcudnn. Some functionality will not be available."
use_cuda[] = CuArrays.functional() # Can be overridden after load with `Flux.use_cuda[] = false`
if CuArrays.functional()
if !CuArrays.has_cudnn()
@warn "CuArrays.jl found cuda, but did not find libcudnn. Some functionality will not be available."
end
end
end

View File

@ -1,6 +1,5 @@
import ..Flux: Flux, relu
using CuArrays.CUDAnative
using CuArrays: @cuindex, cudims
CuRNN{T} = Flux.RNNCell{<:Union{typeof(tanh),typeof(relu)},<:CuArray{T,2},<:CuArray{T,1}}
CuGRU{T} = Flux.GRUCell{<:CuArray{T,2},<:CuArray{T,1}}

View File

@ -3,6 +3,9 @@ module Data
import ..Flux
import SHA
using Random: shuffle!
using Base: @propagate_inbounds
export CMUDict, cmudict
deps(path...) = joinpath(@__DIR__, "..", "..", "deps", path...)
@ -26,6 +29,9 @@ function __init__()
mkpath(deps())
end
include("dataloader.jl")
export DataLoader
include("mnist.jl")
export MNIST
@ -42,4 +48,9 @@ using .Sentiment
include("iris.jl")
export Iris
include("housing.jl")
export Housing
@deprecate DataLoader(x...; kws...) DataLoader(x; kws...)
end

View File

@ -24,18 +24,35 @@ function load()
end
end
"""
phones()
Return a `Vector` containing the phones used in the CMU Pronouncing Dictionary.
"""
function phones()
load()
Symbol.(first.(split.(split(read(deps("cmudict", "cmudict.phones"),String),
"\n", keepempty = false), "\t")))
end
"""
symbols()
Return a `Vector` containing the symbols used in the CMU Pronouncing Dictionary.
A symbol is a phone with optional auxiliary symbols, indicating for example the
amount of stress on the phone.
"""
function symbols()
load()
Symbol.(split(read(deps("cmudict", "cmudict.symbols"),String),
"\n", keepempty = false))
end
"""
rawdict()
Return the unfiltered CMU Pronouncing Dictionary.
"""
function rawdict()
load()
Dict(String(xs[1]) => Symbol.(xs[2:end]) for xs in
@ -44,6 +61,14 @@ end
validword(s) = isascii(s) && occursin(r"^[\w\-\.]+$", s)
"""
cmudict()
Return a filtered CMU Pronouncing Dictionary.
It is filtered so each word contains only ASCII characters and a combination of
word characters (as determined by the regex engine using `\\w`), '-' and '.'.
"""
cmudict() = filter(p -> validword(p.first), rawdict())
alphabet() = ['A':'Z'..., '0':'9'..., '_', '-', '.']

110
src/data/dataloader.jl Normal file
View File

@ -0,0 +1,110 @@
# Adapted from Knet's src/data.jl (author: Deniz Yuret)
struct DataLoader{D}
data::D
batchsize::Int
nobs::Int
partial::Bool
imax::Int
indices::Vector{Int}
shuffle::Bool
end
"""
DataLoader(data; batchsize=1, shuffle=false, partial=true)
An object that iterates over mini-batches of `data`, each mini-batch containing `batchsize` observations
(except possibly the last one).
Takes as input a single data tensor, or a tuple (or a named tuple) of tensors.
The last dimension in each tensor is considered to be the observation dimension.
If `shuffle=true`, shuffles the observations each time iterations are re-started.
If `partial=false`, drops the last mini-batch if it is smaller than the batchsize.
The original data is preserved in the `data` field of the DataLoader.
Usage example:
Xtrain = rand(10, 100)
train_loader = DataLoader(Xtrain, batchsize=2)
# iterate over 50 mini-batches of size 2
for x in train_loader
@assert size(x) == (10, 2)
...
end
train_loader.data # original dataset
# similar, but yielding tuples
train_loader = DataLoader((Xtrain,), batchsize=2)
for (x,) in train_loader
@assert size(x) == (10, 2)
...
end
Xtrain = rand(10, 100)
Ytrain = rand(100)
train_loader = DataLoader((Xtrain, Ytrain), batchsize=2, shuffle=true)
for epoch in 1:100
for (x, y) in train_loader
@assert size(x) == (10, 2)
@assert size(y) == (2,)
...
end
end
# train for 10 epochs
using IterTools: ncycle
Flux.train!(loss, ps, ncycle(train_loader, 10), opt)
# can use NamedTuple to name tensors
train_loader = DataLoader((images=Xtrain, labels=Ytrain), batchsize=2, shuffle=true)
for datum in train_loader
@assert size(datum.images) == (10, 2)
@assert size(datum.labels) == (2,)
end
"""
function DataLoader(data; batchsize=1, shuffle=false, partial=true)
batchsize > 0 || throw(ArgumentError("Need positive batchsize"))
n = _nobs(data)
if n < batchsize
@warn "Number of observations less than batchsize, decreasing the batchsize to $n"
batchsize = n
end
imax = partial ? n : n - batchsize + 1
DataLoader(data, batchsize, n, partial, imax, [1:n;], shuffle)
end
@propagate_inbounds function Base.iterate(d::DataLoader, i=0) # returns data in d.indices[i+1:i+batchsize]
i >= d.imax && return nothing
if d.shuffle && i == 0
shuffle!(d.indices)
end
nexti = min(i + d.batchsize, d.nobs)
ids = d.indices[i+1:nexti]
batch = _getobs(d.data, ids)
return (batch, nexti)
end
function Base.length(d::DataLoader)
n = d.nobs / d.batchsize
d.partial ? ceil(Int,n) : floor(Int,n)
end
_nobs(data::AbstractArray) = size(data)[end]
function _nobs(data::Union{Tuple, NamedTuple})
length(data) > 0 || throw(ArgumentError("Need at least one data input"))
n = _nobs(data[1])
if !all(x -> _nobs(x) == n, Base.tail(data))
throw(DimensionMismatch("All data should contain same number of observations"))
end
return n
end
_getobs(data::AbstractArray, i) = data[ntuple(i -> Colon(), Val(ndims(data) - 1))..., i]
_getobs(data::Union{Tuple, NamedTuple}, i) = map(Base.Fix2(_getobs, i), data)
Base.eltype(::DataLoader{D}) where D = D

View File

@ -33,9 +33,10 @@ const TESTLABELS = joinpath(dir, "t10k-labels-idx1-ubyte")
Load the Fashion-MNIST images.
Each image is a 28×28 array of `Gray` colour values (see Colors.jl).
Each image is a 28×28 array of `Gray` colour values
(see [Colors.jl](https://github.com/JuliaGraphics/Colors.jl)).
Returns the 60,000 training images by default; pass `:test` to retreive the
Return the 60,000 training images by default; pass `:test` to retrieve the
10,000 test images.
"""
function images(set = :train)
@ -49,10 +50,10 @@ end
labels()
labels(:test)
Load the labels corresponding to each of the images returned from `images()`.
Load the labels corresponding to each of the images returned from [`images()`](@ref).
Each label is a number from 0-9.
Returns the 60,000 training labels by default; pass `:test` to retreive the
Return the 60,000 training labels by default; pass `:test` to retrieve the
10,000 test labels.
"""
function labels(set = :train)

136
src/data/housing.jl Normal file
View File

@ -0,0 +1,136 @@
"""
1. Title: Boston Housing Data
2. Sources:
(a) Origin: This dataset was taken from the StatLib library which is
maintained at Carnegie Mellon University.
(b) Creator: Harrison, D. and Rubinfeld, D.L. 'Hedonic prices and the
demand for clean air', J. Environ. Economics & Management,
vol.5, 81-102, 1978.
(c) Date: July 7, 1993
3. Number of Instances: 506
4. Number of Attributes: 13 continuous attributes (including "class"
attribute "MEDV"), 1 binary-valued attribute.
5. Attribute Information:
1. CRIM per capita crime rate by town
2. ZN proportion of residential land zoned for lots over
25,000 sq.ft.
3. INDUS proportion of non-retail business acres per town
4. CHAS Charles River dummy variable (= 1 if tract bounds
river; 0 otherwise)
5. NOX nitric oxides concentration (parts per 10 million)
6. RM average number of rooms per dwelling
7. AGE proportion of owner-occupied units built prior to 1940
8. DIS weighted distances to five Boston employment centres
9. RAD index of accessibility to radial highways
10. TAX full-value property-tax rate per 10,000 dollars
11. PTRATIO pupil-teacher ratio by town
12. B 1000(Bk - 0.63)^2 where Bk is the proportion of blacks
by town
13. LSTAT % lower status of the population
14. MEDV Median value of owner-occupied homes in 1000's of dollars
Downloaded From: https://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data
"""
module Housing
using DelimitedFiles
using ..Data: deps, download_and_verify
#Uncomment if package exists
#const cache_prefix = "https://cache.julialang.org/"
const cache_prefix = ""
function load()
isfile(deps("housing.data")) && return
@info "Downloading the Boston housing Dataset"
download_and_verify("$(cache_prefix)http://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data",
deps("housing.data"),
"baadf72995725d76efe787b664e1f083388c79ba21ef9a7990d87f774184735a")
#@info "Download complete. Working on the files"
path = deps()
isfile(deps("housing.data")) && touch(joinpath(path, "tempfile.data"))
open(joinpath(path, "tempfile.data"), "a") do fout
open(deps("housing.data"), "r") do fin
for line in eachline(fin)
line = replace(lstrip(line), r" +" => s",")
println(fout, line)
end
end
end
mv(joinpath(path, "tempfile.data"), deps("housing.data"), force=true)
end
"""
Gets the targets for the Boston housing dataset, a 506 element array listing the targets for each example
```jldoctest
julia> using Flux
julia> target = Flux.Data.Housing.targets()
julia> summary(target)
506×1 Array{Float64,2}
julia> target[1]
24.0
"""
function targets()
load()
housing = readdlm(deps("housing.data"), ',')
reshape(Vector{Float64}(housing[1:end,end]), (506, 1))
end
"""
Gets the names of the features provided in the dataset
"""
function feature_names()
["crim","zn","indus","chas","nox","rm","age","dis","rad","tax","ptratio","b","lstat"]
end
"""
Gets the features of the Boston Housing Dataset. This is a 506x13 Matrix of Float64 datatypes.
The values are in the order ["crim","zn","indus","chas","nox","rm","age","dis","rad","tax","ptratio","b","lstat"].
It has 506 examples.
```jldoctest
julia> using Flux
julia> features = Flux.Data.Housing.features()
julia> summary(features)
506×13 Array{Float64,2}
julia> features[1, :]
13-element Array{Float64,1}:
0.00632
18.0
2.31
0.0
0.538
296.0
15.3
396.9
4.98
"""
function features()
load()
housing = readdlm(deps("housing.data"), ',')
Matrix{Float64}(housing[1:end, 1:13])
end
end

View File

@ -2,13 +2,12 @@
Fisher's classic iris dataset.
Measurements from 3 different species of iris: setosa, versicolor and
virginica. There are 50 examples of each species.
virginica. There are 50 examples of each species.
There are 4 measurements for each example: sepal length, sepal width, petal
length and petal width. The measurements are in centimeters.
There are 4 measurements for each example: sepal length, sepal width,
petal length and petal width. The measurements are in centimeters.
The module retrieves the data from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/iris).
"""
module Iris
@ -28,15 +27,12 @@ function load()
end
"""
labels()
Get the labels of the iris dataset, a 150 element array of strings listing the
species of each example.
```jldoctest
julia> using Flux
```jldoctest; setup = :(Flux.Data.Iris.load())
julia> labels = Flux.Data.Iris.labels();
julia> summary(labels)
@ -53,16 +49,13 @@ function labels()
end
"""
features()
Get the features of the iris dataset. This is a 4x150 matrix of Float64
elements. It has a row for each feature (sepal length, sepal width,
Get the features of the iris dataset. This is a 4x150 matrix of Float64
elements. It has a row for each feature (sepal length, sepal width,
petal length, petal width) and a column for each example.
```jldoctest
julia> using Flux
```jldoctest; setup = :(Flux.Data.Iris.load())
julia> features = Flux.Data.Iris.features();
julia> summary(features)

View File

@ -83,9 +83,10 @@ getfeatures(io::IO, index::Integer) = vec(getimage(io, index))
Load the MNIST images.
Each image is a 28×28 array of `Gray` colour values (see Colors.jl).
Each image is a 28×28 array of `Gray` colour values
(see [Colors.jl](https://github.com/JuliaGraphics/Colors.jl)).
Returns the 60,000 training images by default; pass `:test` to retreive the
Return the 60,000 training images by default; pass `:test` to retrieve the
10,000 test images.
"""
function images(set = :train)
@ -99,10 +100,10 @@ end
labels()
labels(:test)
Load the labels corresponding to each of the images returned from `images()`.
Load the labels corresponding to each of the images returned from [`images()`](@ref).
Each label is a number from 0-9.
Returns the 60,000 training labels by default; pass `:test` to retreive the
Return the 60,000 training labels by default; pass `:test` to retrieve the
10,000 test labels.
"""
function labels(set = :train)

View File

@ -1,3 +1,4 @@
"Stanford Sentiment Treebank dataset."
module Sentiment
using ZipFile
@ -39,8 +40,28 @@ function gettrees(name)
return parsetree.(ss)
end
"""
train()
Return the train split of the Stanford Sentiment Treebank.
The data is in [treebank](https://en.wikipedia.org/wiki/Treebank) format.
"""
train() = gettrees("train")
"""
test()
Return the test split of the Stanford Sentiment Treebank.
The data is in [treebank](https://en.wikipedia.org/wiki/Treebank) format.
"""
test() = gettrees("test")
"""
dev()
Return the dev split of the Stanford Sentiment Treebank.
The data is in [treebank](https://en.wikipedia.org/wiki/Treebank) format.
"""
dev() = gettrees("dev")
end

View File

@ -1,44 +1,41 @@
import Adapt: adapt, adapt_storage
using Zygote: IdSet
functor(x) = (), _ -> x
functor(x::Tuple) = x, y -> y
functor(x::NamedTuple) = x, y -> y
functor(x::AbstractArray) = x, y -> y
functor(x::AbstractArray{<:Number}) = (), _ -> x
function makefunctor(m::Module, T, fs = fieldnames(T))
@eval m begin
Flux.functor(x::$T) = ($([:($f=x.$f) for f in fs]...),), y -> $T(y...)
end
end
function functorm(T, fs = nothing)
fs == nothing || isexpr(fs, :tuple) || error("@functor T (a, b)")
fs = fs == nothing ? [] : [:($(map(QuoteNode, fs.args)...),)]
:(makefunctor(@__MODULE__, $(esc(T)), $(fs...)))
end
macro functor(args...)
functorm(args...)
end
isleaf(x) = functor(x)[1] === ()
function fmap1(f, x)
func, re = functor(x)
re(map(f, func))
end
function fmap(f, x; cache = IdDict())
haskey(cache, x) && return cache[x]
cache[x] = isleaf(x) ? f(x) : fmap1(x -> fmap(f, x, cache = cache), x)
end
import Functors: @functor, functor, fmap
trainable(m) = functor(m)[1]
"""
testmode!(m, mode = true)
Set a layer or model's test mode (see below).
Using `:auto` mode will treat any gradient computation as training.
_Note_: if you manually set a model into test mode, you need to manually place
it back into train mode during training phase.
Possible values include:
- `false` for training
- `true` for testing
- `:auto` or `nothing` for Flux to detect the mode automatically
"""
testmode!(m, mode = true) = m
"""
trainmode!(m, mode = true)
Set a layer of model's train mode (see below).
Symmetric to [`testmode!`](@ref) (i.e. `trainmode!(m, mode) == testmode!(m, !mode)`).
_Note_: if you manually set a model into train mode, you need to manually place
it into test mode during testing phase.
Possible values include:
- `true` for training
- `false` for testing
- `:auto` or `nothing` for Flux to detect the mode automatically
"""
trainmode!(m, mode = true) = mode isa Bool ? testmode!(m, !mode) : testmode!(m, mode)
params!(p::Params, x::AbstractArray{<:Number}, seen = IdSet()) = push!(p, x)
function params!(p::Params, x, seen = IdSet())

View File

@ -4,17 +4,23 @@
Chain multiple layers / functions together, so that they are called in sequence
on a given input.
```julia
m = Chain(x -> x^2, x -> x+1)
m(5) == 26
m = Chain(Dense(10, 5), Dense(5, 2))
x = rand(10)
m(x) == m[2](m[1](x))
```
`Chain` also supports indexing and slicing, e.g. `m[2]` or `m[1:end-1]`.
`m[1:3](x)` will calculate the output of the first three layers.
# Examples
```jldoctest
julia> m = Chain(x -> x^2, x -> x+1);
julia> m(5) == 26
true
julia> m = Chain(Dense(10, 5), Dense(5, 2));
julia> x = rand(10);
julia> m(x) == m[2](m[1](x))
true
```
"""
struct Chain{T<:Tuple}
layers::T
@ -24,7 +30,7 @@ end
@forward Chain.layers Base.getindex, Base.length, Base.first, Base.last,
Base.iterate, Base.lastindex
functor(c::Chain) = c.layers, ls -> Chain(ls...)
functor(::Type{<:Chain}, c) = c.layers, ls -> Chain(ls...)
applychain(::Tuple{}, x) = x
applychain(fs::Tuple, x) = applychain(tail(fs), first(fs)(x))
@ -33,12 +39,25 @@ applychain(fs::Tuple, x) = applychain(tail(fs), first(fs)(x))
Base.getindex(c::Chain, i::AbstractArray) = Chain(c.layers[i]...)
testmode!(m::Chain, mode = true) = (map(x -> testmode!(x, mode), m.layers); m)
function Base.show(io::IO, c::Chain)
print(io, "Chain(")
join(io, c.layers, ", ")
print(io, ")")
end
"""
outdims(c::Chain, isize)
Calculate the output dimensions given the input dimensions, `isize`.
```julia
m = Chain(Conv((3, 3), 3 => 16), Conv((3, 3), 16 => 32))
outdims(m, (10, 10)) == (6, 6)
```
"""
outdims(c::Chain, isize) = foldl(, map(l -> (x -> outdims(l, x)), c.layers))(isize)
# This is a temporary and naive implementation
# it might be replaced in the future for better performance
@ -47,6 +66,7 @@ end
# only slightly changed to better handle interaction with Zygote @dsweber2
"""
activations(c::Chain, input)
Calculate the forward results of each layers in Chain `c` with `input` as model input.
"""
function activations(c::Chain, input)
@ -65,24 +85,24 @@ extraChain(::Tuple{}, x) = ()
"""
Dense(in::Integer, out::Integer, σ = identity)
Creates a traditional `Dense` layer with parameters `W` and `b`.
Create a traditional `Dense` layer with parameters `W` and `b`.
y = σ.(W * x .+ b)
The input `x` must be a vector of length `in`, or a batch of vectors represented
as an `in × N` matrix. The out `y` will be a vector or batch of length `out`.
```julia
# Examples
```jldoctest; setup = :(using Random; Random.seed!(0))
julia> d = Dense(5, 2)
Dense(5, 2)
julia> d(rand(5))
Tracked 2-element Array{Float64,1}:
0.00257447
-0.00449443
```
2-element Array{Float32,1}:
-0.16210233
0.12311903```
"""
struct Dense{F,S,T}
struct Dense{F,S<:AbstractArray,T<:AbstractArray}
W::S
b::T
σ::F
@ -116,10 +136,23 @@ end
(a::Dense{<:Any,W})(x::AbstractArray{<:AbstractFloat}) where {T <: Union{Float32,Float64}, W <: AbstractArray{T}} =
a(T.(x))
"""
outdims(l::Dense, isize)
Calculate the output dimensions given the input dimensions, `isize`.
```julia
m = Dense(10, 5)
outdims(m, (5, 2)) == (5,)
outdims(m, (10,)) == (5,)
```
"""
outdims(l::Dense, isize) = (size(l.W)[1],)
"""
Diagonal(in::Integer)
Creates an element-wise linear transformation layer with learnable
Create an element-wise linear transformation layer with learnable
vectors `α` and `β`:
y = α .* x .+ β
@ -145,22 +178,16 @@ function Base.show(io::IO, l::Diagonal)
print(io, "Diagonal(", length(l.α), ")")
end
outdims(l::Diagonal, isize) = (length(l.α),)
"""
Maxout(over)
`Maxout` is a neural network layer, which has a number of internal layers,
which all have the same input, and the maxout returns the elementwise maximium
of the internal layers' outputs.
The [Maxout](https://arxiv.org/pdf/1302.4389.pdf) layer has a number of
internal layers which all receive the same input. It returns the elementwise
maximum of the internal layers' outputs.
Maxout over linear dense layers satisfies the univeral approximation theorem.
Reference:
Ian J. Goodfellow, David Warde-Farley, Mehdi Mirza, Aaron Courville, and Yoshua Bengio.
2013. Maxout networks.
In Proceedings of the 30th International Conference on International Conference on Machine Learning - Volume 28 (ICML'13),
Sanjoy Dasgupta and David McAllester (Eds.), Vol. 28. JMLR.org III-1319-III-1327.
https://arxiv.org/pdf/1302.4389.pdf
"""
struct Maxout{FS<:Tuple}
over::FS
@ -169,17 +196,18 @@ end
"""
Maxout(f, n_alts)
Constructs a Maxout layer over `n_alts` instances of the layer given by `f`.
The function takes no arguement and should return some callable layer.
Conventionally this is a linear dense layer.
Construct a Maxout layer over `n_alts` instances of the layer given by `f`.
The function takes no arguments and should return some callable layer.
Conventionally, this is a linear dense layer.
For example the following example which
will construct a `Maxout` layer over 4 internal dense linear layers,
each identical in structure (784 inputs, 128 outputs).
# Examples
This constructs a `Maxout` layer over 4 internal dense linear layers, each
identical in structure (784 inputs, 128 outputs):
```julia
insize = 784
outsize = 128
Maxout(()->Dense(insize, outsize), 4)
insize = 784
outsize = 128
Maxout(()->Dense(insize, outsize), 4)
```
"""
function Maxout(f, n_alts)
@ -193,17 +221,21 @@ function (mo::Maxout)(input::AbstractArray)
mapreduce(f -> f(input), (acc, out) -> max.(acc, out), mo.over)
end
outdims(l::Maxout, isize) = outdims(first(l.over), isize)
"""
SkipConnection(layers, connection)
SkipConnection(layer, connection)
Creates a Skip Connection, of a layer or `Chain` of consecutive layers
plus a shortcut connection. The connection function will combine the result of the layers
with the original input, to give the final output.
Create a skip connection which consists of a layer or `Chain` of consecutive
layers and a shortcut connection linking the block's input to the output
through a user-supplied 2-argument callable. The first argument to the callable
will be propagated through the given `layer` while the second is the unchanged,
"skipped" input.
The simplest 'ResNet'-type connection is just `SkipConnection(layer, +)`,
The simplest "ResNet"-type connection is just `SkipConnection(layer, +)`,
and requires the output of the layers to be the same shape as the input.
Here is a more complicated example:
```
```julia
m = Conv((3,3), 4=>7, pad=(1,1))
x = ones(5,5,4,10);
size(m(x)) == (5, 5, 7, 10)

View File

@ -1,27 +1,66 @@
using NNlib: conv, ∇conv_data, depthwiseconv
using NNlib: conv, ∇conv_data, depthwiseconv, output_size
# pad dims of x with dims of y until ndims(x) == ndims(y)
_paddims(x::Tuple, y::Tuple) = (x..., y[(end - (length(y) - length(x) - 1)):end]...)
_convtransoutdims(isize, ksize, ssize, dsize, pad) = (isize .- 1).*ssize .+ 1 .+ (ksize .- 1).*dsize .- (pad[1:2:end] .+ pad[2:2:end])
expand(N, i::Tuple) = i
expand(N, i::Integer) = ntuple(_ -> i, N)
"""
Conv(size, in=>out)
Conv(size, in=>out, relu)
SamePad
Standard convolutional layer. `size` should be a tuple like `(2, 2)`.
`in` and `out` specify the number of input and output channels respectively.
Padding for convolutional layers will be calculated so that outputshape == inputshape when stride = 1.
Example: Applying Conv layer to a 1-channel input using a 2x2 window size,
giving us a 16-channel output. Output is activated with ReLU.
For stride > 1 the output shape depends on the type of convolution layer.
"""
struct SamePad end
size = (2,2)
calc_padding(pad, k::NTuple{N,T}, dilation, stride) where {T,N}= expand(Val(2*N), pad)
function calc_padding(::SamePad, k::NTuple{N,T}, dilation, stride) where {N,T}
#Ref: "A guide to convolution arithmetic for deep learning" https://arxiv.org/pdf/1603.07285
# Effective kernel size, including dilation
k_eff = @. k + (k - 1) * (dilation - 1)
# How much total padding needs to be applied?
pad_amt = @. k_eff - 1
# In case amount of padding is odd we need to apply different amounts to each side.
return Tuple(mapfoldl(i -> [ceil(Int, i/2), floor(Int, i/2)], vcat, pad_amt))
end
"""
Conv(filter, in => out, σ = identity; init = glorot_uniform,
stride = 1, pad = 0, dilation = 1)
filter = (2,2)
in = 1
out = 16
Conv((2, 2), 1=>16, relu)
Data should be stored in WHCN order (width, height, # channels, # batches).
Standard convolutional layer. `filter` should be a tuple like `(2, 2)`.
`in` and `out` specify the number of input and output channels respectively.
Data should be stored in WHCN order (width, height, # channels, batch size).
In other words, a 100×100 RGB image would be a `100×100×3×1` array,
and a batch of 50 would be a `100×100×3×50` array.
Accepts keyword arguments `weight` and `bias` to set the corresponding fields.
Setting `bias` to `Flux.Zeros()` will switch bias off for the layer.
Takes the keyword arguments `pad`, `stride` and `dilation`.
Use `pad=SamePad()` to apply padding so that outputsize == inputsize / stride.
# Examples
Apply a `Conv` layer to a 1-channel input using a 2×2 window filter size, giving us a
16-channel output. Output is activated with ReLU.
```julia
filter = (2,2)
in = 1
out = 16
Conv(filter, in => out, relu)
```
"""
struct Conv{N,M,F,A,V}
σ::F
@ -32,25 +71,68 @@ struct Conv{N,M,F,A,V}
dilation::NTuple{N,Int}
end
function Conv(w::AbstractArray{T,N}, b::AbstractVector{T}, σ = identity;
"""
Conv(weight::AbstractArray, bias::AbstractArray)
Conv(weight::AbstractArray, bias::AbstractArray, activation)
Constructs the convolutional layer with user defined weight and bias arrays.
Setting `bias` to `Flux.Zeros()` would switch `bias` off for the layer.
Takes the keyword arguments `pad`, `stride` and `dilation`.
There is also a keyword-only constuctor available for all convoultional
layers.
```julia
weight = rand(Float32, 3, 3, 5)
bias = zeros(Float32, 5)
Conv(weight = weight,
bias = bias,
σ = sigmoid)
```
"""
function Conv(w::AbstractArray{T,N}, b::Union{Zeros, AbstractVector{T}}, σ = identity;
stride = 1, pad = 0, dilation = 1) where {T,N}
stride = expand(Val(N-2), stride)
pad = expand(Val(2*(N-2)), pad)
dilation = expand(Val(N-2), dilation)
pad = calc_padding(pad, size(w)[1:N-2], dilation, stride)
return Conv(σ, w, b, stride, pad, dilation)
end
Conv(k::NTuple{N,Integer}, ch::Pair{<:Integer,<:Integer}, σ = identity;
init = glorot_uniform, stride = 1, pad = 0, dilation = 1) where N =
Conv(init(k..., ch...), zeros(ch[2]), σ,
stride = stride, pad = pad, dilation = dilation)
function Conv(;weight::AbstractArray{T,N}, bias::Union{Zeros, AbstractVector{T}},
activation = identity, stride = 1, pad = 0, dilation = 1) where {T,N}
Conv(weight, bias, activation, stride = stride, pad = pad, dilation = dilation)
end
"""
convfilter(filter::Tuple, in=>out)
Constructs a standard convolutional weight matrix with given `filter` and
channels from `in` to `out`.
Accepts the keyword `init` (default: `glorot_uniform`) to control the sampling
distribution.
See also: [`depthwiseconvfilter`](@ref)
"""
convfilter(filter::NTuple{N,Integer}, ch::Pair{<:Integer,<:Integer};
init = glorot_uniform) where N = init(filter..., ch...)
function Conv(k::NTuple{N,Integer}, ch::Pair{<:Integer,<:Integer}, σ = identity;
init = glorot_uniform, stride = 1, pad = 0, dilation = 1,
weight = convfilter(k, ch, init = init), bias = zeros(ch[2])) where N
Conv(weight, bias, σ,
stride = stride, pad = pad, dilation = dilation)
end
@functor Conv
function (c::Conv)(x::AbstractArray)
# TODO: breaks gpu broadcast :(
# ndims(x) == ndims(c.weight)-1 && return squeezebatch(c(reshape(x, size(x)..., 1)))
σ, b = c.σ, reshape(c.bias, map(_->1, c.stride)..., :, 1)
σ, b = c.σ, reshape(c.bias, ntuple(_->1, length(c.stride))..., :, 1)
cdims = DenseConvDims(x, c.weight; stride=c.stride, padding=c.pad, dilation=c.dilation)
σ.(conv(x, c.weight, cdims) .+ b)
end
@ -69,16 +151,38 @@ end
a(T.(x))
"""
ConvTranspose(size, in=>out)
ConvTranspose(size, in=>out, relu)
outdims(l::Conv, isize::Tuple)
Standard convolutional transpose layer. `size` should be a tuple like `(2, 2)`.
Calculate the output dimensions given the input dimensions `isize`.
Batch size and channel size are ignored as per [NNlib.jl](https://github.com/FluxML/NNlib.jl).
```julia
m = Conv((3, 3), 3 => 16)
outdims(m, (10, 10)) == (8, 8)
outdims(m, (10, 10, 1, 3)) == (8, 8)
```
"""
outdims(l::Conv, isize) =
output_size(DenseConvDims(_paddims(isize, size(l.weight)), size(l.weight); stride = l.stride, padding = l.pad, dilation = l.dilation))
"""
ConvTranspose(filter, in=>out)
ConvTranspose(filter, in=>out, activation)
ConvTranspose(filter, in => out, σ = identity; init = glorot_uniform,
stride = 1, pad = 0, dilation = 1)
Standard convolutional transpose layer. `filter` should be a tuple like `(2, 2)`.
`in` and `out` specify the number of input and output channels respectively.
Data should be stored in WHCN order. In other words, a 100×100 RGB image would
be a `100×100×3` array, and a batch of 50 would be a `100×100×3×50` array.
Data should be stored in WHCN order (width, height, # channels, batch size).
In other words, a 100×100 RGB image would be a `100×100×3×1` array,
and a batch of 50 would be a `100×100×3×50` array.
Accepts keyword arguments `weight` and `bias` to set the corresponding fields.
Setting `bias` to `Flux.Zeros()` will switch bias off for the layer.
Takes the keyword arguments `pad`, `stride` and `dilation`.
Use `pad=SamePad()` to apply padding so that outputsize == stride * inputsize - stride + 1.
"""
struct ConvTranspose{N,M,F,A,V}
σ::F
@ -89,18 +193,39 @@ struct ConvTranspose{N,M,F,A,V}
dilation::NTuple{N,Int}
end
function ConvTranspose(w::AbstractArray{T,N}, b::AbstractVector{T}, σ = identity;
stride = 1, pad = 0, dilation = 1) where {T,N}
"""
ConvTranspose(weight::AbstractArray, bias::AbstractArray)
ConvTranspose(weight::AbstractArray, bias::AbstractArray, activation)
Constructs the convolutional transpose layer with user defined weight and bias arrays.
forward pass.
Setting `bias` to `Flux.Zeros()` would switch `bias` off for the layer.
Takes the keyword arguments `pad`, `stride` and `dilation`.
For keyword-only constuctor, see also [`Conv`](@ref)
"""
function ConvTranspose(w::AbstractArray{T,N}, b::Union{Zeros, AbstractVector{T}}, σ = identity;
stride = 1, pad = 0, dilation = 1) where {T,N}
stride = expand(Val(N-2), stride)
pad = expand(Val(2*(N-2)), pad)
dilation = expand(Val(N-2), dilation)
pad = calc_padding(pad, size(w)[1:N-2], dilation, stride)
return ConvTranspose(σ, w, b, stride, pad, dilation)
end
ConvTranspose(k::NTuple{N,Integer}, ch::Pair{<:Integer,<:Integer}, σ = identity;
init = glorot_uniform, stride = 1, pad = 0, dilation = 1) where N =
ConvTranspose(init(k..., reverse(ch)...), zeros(ch[2]), σ,
function ConvTranspose(;weight::AbstractArray{T,N}, bias::Union{Zeros, AbstractVector{T}},
activation = identity, stride = 1, pad = 0, dilation = 1) where {T,N}
ConvTranspose(weight, bias, activation, stride = stride, pad = pad, dilation = dilation)
end
function ConvTranspose(k::NTuple{N,Integer}, ch::Pair{<:Integer,<:Integer}, σ = identity;
init = glorot_uniform, stride = 1, pad = 0, dilation = 1,
weight = convfilter(k, reverse(ch), init = init), bias = zeros(ch[2])) where N
ConvTranspose(weight, bias, σ,
stride = stride, pad = pad, dilation = dilation)
end
@functor ConvTranspose
@ -112,9 +237,9 @@ function conv_transpose_dims(c::ConvTranspose, x::AbstractArray)
batch_size = size(x)[end]
# Create DenseConvDims() that looks like the corresponding conv()
return DenseConvDims((I..., C_in, batch_size), size(c.weight);
stride=c.stride,
padding=c.pad,
dilation=c.dilation,
stride=c.stride,
padding=c.pad,
dilation=c.dilation,
)
end
@ -125,7 +250,7 @@ function (c::ConvTranspose)(x::AbstractArray)
# ndims(x) == ndims(c.weight)-1 && return squeezebatch(c(reshape(x, size(x)..., 1)))
σ, b = c.σ, reshape(c.bias, map(_->1, c.stride)..., :, 1)
cdims = conv_transpose_dims(c, x)
return σ.(∇conv_data(x, c.weight, cdims) .+ b)
σ.(∇conv_data(x, c.weight, cdims) .+ b)
end
function Base.show(io::IO, l::ConvTranspose)
@ -140,18 +265,28 @@ end
(a::ConvTranspose{<:Any,<:Any,W})(x::AbstractArray{<:Real}) where {T <: Union{Float32,Float64}, W <: AbstractArray{T}} =
a(T.(x))
"""
DepthwiseConv(size, in=>out)
DepthwiseConv(size, in=>out, relu)
Depthwise convolutional layer. `size` should be a tuple like `(2, 2)`.
outdims(l::ConvTranspose{N}, isize) where N = _convtransoutdims(isize[1:2], size(l.weight)[1:N], l.stride, l.dilation, l.pad)
"""
DepthwiseConv(filter::Tuple, in=>out)
DepthwiseConv(filter::Tuple, in=>out, activation)
DepthwiseConv(filter, in => out, σ = identity; init = glorot_uniform,
stride = 1, pad = 0, dilation = 1)
Depthwise convolutional layer. `filter` should be a tuple like `(2, 2)`.
`in` and `out` specify the number of input and output channels respectively.
Note that `out` must be an integer multiple of `in`.
Data should be stored in WHCN order. In other words, a 100×100 RGB image would
be a `100×100×3` array, and a batch of 50 would be a `100×100×3×50` array.
Data should be stored in WHCN order (width, height, # channels, batch size).
In other words, a 100×100 RGB image would be a `100×100×3×1` array,
and a batch of 50 would be a `100×100×3×50` array.
Accepts keyword arguments `weight` and `bias` to set the corresponding fields.
Setting `bias` to `Flux.Zeros()` will switch bias off for the layer.
Takes the keyword arguments `pad`, `stride` and `dilation`.
Use `pad=SamePad()` to apply padding so that outputsize == inputsize / stride.
"""
struct DepthwiseConv{N,M,F,A,V}
σ::F
@ -162,20 +297,54 @@ struct DepthwiseConv{N,M,F,A,V}
dilation::NTuple{N,Int}
end
function DepthwiseConv(w::AbstractArray{T,N}, b::AbstractVector{T}, σ = identity;
stride = 1, pad = 0, dilation = 1) where {T,N}
"""
DepthwiseConv(weight::AbstractArray, bias::AbstractArray)
DepthwiseConv(weight::AbstractArray, bias::AbstractArray, activation)
Constructs the `DepthwiseConv` layer with user defined weight and bias arrays.
forward pass.
Setting `bias` to `Flux.Zeros()` would switch `bias` off for the layer.
Takes the keyword arguments `pad`, `stride` and `dilation`.
For keyword-only constuctor, see also [`Conv`](@ref)
"""
function DepthwiseConv(w::AbstractArray{T,N}, b::Union{Zeros, AbstractVector{T}}, σ = identity;
stride = 1, pad = 0, dilation = 1) where {T,N}
stride = expand(Val(N-2), stride)
pad = expand(Val(2*(N-2)), pad)
dilation = expand(Val(N-2), dilation)
pad = calc_padding(pad, size(w)[1:N-2], dilation, stride)
return DepthwiseConv(σ, w, b, stride, pad, dilation)
end
function DepthwiseConv(;weight::AbstractArray{T,N}, bias::Union{Zeros, AbstractVector{T}},
activation = identity, stride = 1, pad = 0, dilation = 1) where {T,N}
DepthwiseConv(weight, bias, activation, stride = stride, pad = pad, dilation = dilation)
end
"""
depthwiseconvfilter(filter::Tuple, in=>out)
Constructs a depthwise convolutional weight array defined by `filter` and channels
from `in` to `out`.
Accepts the keyword `init` (default: `glorot_uniform`) to control the sampling
distribution.
See also: [`convfilter`](@ref)
"""
depthwiseconvfilter(filter::NTuple{N,Integer}, ch::Pair{<:Integer,<:Integer};
init = glorot_uniform) where N = init(filter..., div(ch[2], ch[1]), ch[1])
function DepthwiseConv(k::NTuple{N,Integer}, ch::Pair{<:Integer,<:Integer}, σ = identity;
init = glorot_uniform, stride = 1, pad = 0, dilation = 1) where N
init = glorot_uniform, stride = 1, pad = 0, dilation = 1,
weight = depthwiseconvfilter(k, ch, init = init), bias = zeros(ch[2])) where N
@assert ch[2] % ch[1] == 0 "Output channels must be integer multiple of input channels"
return DepthwiseConv(
init(k..., div(ch[2], ch[1]), ch[1]),
zeros(ch[2]),
weight,
bias,
σ;
stride = stride,
pad = pad,
@ -204,26 +373,38 @@ end
(a::DepthwiseConv{<:Any,<:Any,W})(x::AbstractArray{<:Real}) where {T <: Union{Float32,Float64}, W <: AbstractArray{T}} =
a(T.(x))
"""
CrossCor(size, in=>out)
CrossCor(size, in=>out, relu)
outdims(l::DepthwiseConv, isize) =
output_size(DepthwiseConvDims(_paddims(isize, (1, 1, size(l.weight)[end], 1)), size(l.weight); stride = l.stride, padding = l.pad, dilation = l.dilation))
Standard cross convolutional layer. `size` should be a tuple like `(2, 2)`.
"""
CrossCor(filter, in=>out)
CrossCor(filter, in=>out, activation)
CrossCor(filter, in => out, σ = identity; init = glorot_uniform,
stride = 1, pad = 0, dilation = 1)
Standard cross convolutional layer. `filter` should be a tuple like `(2, 2)`.
`in` and `out` specify the number of input and output channels respectively.
Example: Applying CrossCor layer to a 1-channel input using a 2x2 window size,
giving us a 16-channel output. Output is activated with ReLU.
size = (2,2)
in = 1
out = 16
CrossCor((2, 2), 1=>16, relu)
Data should be stored in WHCN order (width, height, # channels, # batches).
Data should be stored in WHCN order (width, height, # channels, batch size).
In other words, a 100×100 RGB image would be a `100×100×3×1` array,
and a batch of 50 would be a `100×100×3×50` array.
Accepts keyword arguments `weight` and `bias` to set the corresponding fields.
Setting `bias` to `Flux.Zeros()` will switch bias off for the layer.
Takes the keyword arguments `pad`, `stride` and `dilation`.
Use `pad=SamePad()` to apply padding so that outputsize == inputsize / stride.
# Examples
Apply a `CrossCor` layer to a 1-channel input using a 2×2 window filter size, giving us a
16-channel output. Output is activated with ReLU.
```julia
filter = (2,2)
in = 1
out = 16
CrossCor((2, 2), 1=>16, relu)
```
"""
struct CrossCor{N,M,F,A,V}
σ::F
@ -234,18 +415,39 @@ struct CrossCor{N,M,F,A,V}
dilation::NTuple{N,Int}
end
function CrossCor(w::AbstractArray{T,N}, b::AbstractVector{T}, σ = identity;
stride = 1, pad = 0, dilation = 1) where {T,N}
"""
CrossCor(weight::AbstractArray, bias::AbstractArray)
CrossCor(weight::AbstractArray, bias::AbstractArray, activation)
Constructs the standard cross convolutional layer with user defined weight and bias
arrays.
Setting `bias` to `Flux.Zeros()` would switch `bias` off for the layer.
Takes the keyword arguments `pad`, `stride` and `dilation`.
For keyword-only constuctor, see also [`Conv`](@ref)
"""
function CrossCor(w::AbstractArray{T,N}, b::Union{Zeros, AbstractVector{T}}, σ = identity;
stride = 1, pad = 0, dilation = 1) where {T,N}
stride = expand(Val(N-2), stride)
pad = expand(Val(2*(N-2)), pad)
dilation = expand(Val(N-2), dilation)
pad = calc_padding(pad, size(w)[1:N-2], dilation, stride)
return CrossCor(σ, w, b, stride, pad, dilation)
end
CrossCor(k::NTuple{N,Integer}, ch::Pair{<:Integer,<:Integer}, σ = identity;
init = glorot_uniform, stride = 1, pad = 0, dilation = 1) where N =
CrossCor(init(k..., ch...), zeros(ch[2]), σ,
function CrossCor(;weight::AbstractArray{T,N}, bias::Union{Zeros, AbstractVector{T}},
activation = identity, stride = 1, pad = 0, dilation = 1) where {T,N}
CrossCor(weight, bias, activation, stride = stride, pad = pad, dilation = dilation)
end
function CrossCor(k::NTuple{N,Integer}, ch::Pair{<:Integer,<:Integer}, σ = identity;
init = glorot_uniform, stride = 1, pad = 0, dilation = 1,
weight = convfilter(k, ch, init = init), bias = zeros(ch[2])) where N
CrossCor(weight, bias, σ,
stride = stride, pad = pad, dilation = dilation)
end
@functor CrossCor
@ -275,12 +477,66 @@ end
(a::CrossCor{<:Any,<:Any,W})(x::AbstractArray{<:Real}) where {T <: Union{Float32,Float64}, W <: AbstractArray{T}} =
a(T.(x))
outdims(l::CrossCor, isize) =
output_size(DenseConvDims(_paddims(isize, size(l.weight)), size(l.weight); stride = l.stride, padding = l.pad, dilation = l.dilation))
"""
MaxPool(k)
GlobalMaxPool()
Max pooling layer. `k` stands for the size of the window for each dimension of the input.
Global max pooling layer.
Takes the keyword arguments `pad` and `stride`.
Transforms (w,h,c,b)-shaped input into (1,1,c,b)-shaped output,
by performing max pooling on the complete (w,h)-shaped feature maps.
"""
struct GlobalMaxPool end
function (g::GlobalMaxPool)(x)
# Input size
x_size = size(x)
# Kernel size
k = x_size[1:end-2]
# Pooling dimensions
pdims = PoolDims(x, k)
return maxpool(x, pdims)
end
function Base.show(io::IO, g::GlobalMaxPool)
print(io, "GlobalMaxPool()")
end
"""
GlobalMeanPool()
Global mean pooling layer.
Transforms (w,h,c,b)-shaped input into (1,1,c,b)-shaped output,
by performing mean pooling on the complete (w,h)-shaped feature maps.
"""
struct GlobalMeanPool end
function (g::GlobalMeanPool)(x)
# Input size
x_size = size(x)
# Kernel size
k = x_size[1:end-2]
# Pooling dimensions
pdims = PoolDims(x, k)
return meanpool(x, pdims)
end
function Base.show(io::IO, g::GlobalMeanPool)
print(io, "GlobalMeanPool()")
end
"""
MaxPool(k; pad = 0, stride = k)
Max pooling layer. `k` is the size of the window for each dimension of the input.
Use `pad=SamePad()` to apply padding so that outputsize == inputsize / stride.
=======
"""
struct MaxPool{N,M}
k::NTuple{N,Int}
@ -290,8 +546,7 @@ end
function MaxPool(k::NTuple{N,Integer}; pad = 0, stride = k) where N
stride = expand(Val(N), stride)
pad = expand(Val(2*N), pad)
pad = calc_padding(pad, k, 1, stride)
return MaxPool(k, pad, stride)
end
@ -304,12 +559,14 @@ function Base.show(io::IO, m::MaxPool)
print(io, "MaxPool(", m.k, ", pad = ", m.pad, ", stride = ", m.stride, ")")
end
outdims(l::MaxPool{N}, isize) where N = output_size(PoolDims(_paddims(isize, (l.k..., 1, 1)), l.k; stride = l.stride, padding = l.pad))
"""
MeanPool(k)
MeanPool(k; pad = 0, stride = k)
Mean pooling layer. `k` stands for the size of the window for each dimension of the input.
Mean pooling layer. `k` is the size of the window for each dimension of the input.
Takes the keyword arguments `pad` and `stride`.
Use `pad=SamePad()` to apply padding so that outputsize == inputsize / stride.
"""
struct MeanPool{N,M}
k::NTuple{N,Int}
@ -319,7 +576,7 @@ end
function MeanPool(k::NTuple{N,Integer}; pad = 0, stride = k) where N
stride = expand(Val(N), stride)
pad = expand(Val(2*N), pad)
pad = calc_padding(pad, k, 1, stride)
return MeanPool(k, pad, stride)
end
@ -331,3 +588,5 @@ end
function Base.show(io::IO, m::MeanPool)
print(io, "MeanPool(", m.k, ", pad = ", m.pad, ", stride = ", m.stride, ")")
end
outdims(l::MeanPool{N}, isize) where N = output_size(PoolDims(_paddims(isize, (l.k..., 1, 1)), l.k; stride = l.stride, padding = l.pad))

View File

@ -2,11 +2,23 @@ istraining() = false
@adjoint istraining() = true, _ -> nothing
_isactive(m) = isnothing(m.active) ? istraining() : m.active
_dropout_shape(s, ::Colon) = size(s)
_dropout_shape(s, dims) = tuple((i dims ? 1 : si for (i, si) enumerate(size(s)))...)
_dropout_kernel(y::T, p, q) where {T} = y > p ? T(1 / q) : T(0)
"""
dropout(x, p; dims = :)
The dropout function. For each input, either sets that input to `0` (with probability
`p`) or scales it by `1 / (1 - p)`. `dims` specifies the unbroadcasted dimensions,
e.g. `dims=1` applies dropout along columns and `dims=2` along rows.
This is used as a regularisation, i.e. it reduces overfitting during training.
See also the [`Dropout`](@ref) layer.
"""
dropout(x, p; dims = :) = x
@adjoint function dropout(x, p; dims = :)
@ -18,22 +30,31 @@ end
"""
Dropout(p, dims = :)
A Dropout layer. For each input, either sets that input to `0` (with probability
`p`) or scales it by `1/(1-p)`. The `dims` argument is to specified the unbroadcasted
dimensions, i.e. `dims=1` does dropout along columns and `dims=2` along rows. This is
used as a regularisation, i.e. it reduces overfitting during training. see also [`dropout`](@ref).
Dropout layer. In the forward pass, apply the [`Flux.dropout`](@ref) function on the input.
Does nothing to the input once [`Flux.testmode!`](@ref) is `true`.
"""
mutable struct Dropout{F,D}
p::F
dims::D
active::Union{Bool, Nothing}
end
# TODO: deprecate in v0.11
Dropout(p, dims) = Dropout(p, dims, nothing)
function Dropout(p; dims = :)
@assert 0 p 1
Dropout{typeof(p),typeof(dims)}(p, dims)
Dropout{typeof(p),typeof(dims)}(p, dims, nothing)
end
(a::Dropout)(x) = dropout(x, a.p; dims = a.dims)
function (a::Dropout)(x)
_isactive(a) || return x
return dropout(x, a.p; dims = a.dims)
end
testmode!(m::Dropout, mode = true) =
(m.active = (isnothing(mode) || mode == :auto) ? nothing : !mode; m)
function Base.show(io::IO, d::Dropout)
print(io, "Dropout(", d.p)
@ -43,20 +64,25 @@ end
"""
AlphaDropout(p)
A dropout layer. It is used in Self-Normalizing Neural Networks.
(https://papers.nips.cc/paper/6698-self-normalizing-neural-networks.pdf)
The AlphaDropout layer ensures that mean and variance of activations remains the same as before.
A dropout layer. Used in
[Self-Normalizing Neural Networks](https://papers.nips.cc/paper/6698-self-normalizing-neural-networks.pdf).
The AlphaDropout layer ensures that mean and variance of activations
remain the same as before.
Does nothing to the input once [`testmode!`](@ref) is true.
"""
mutable struct AlphaDropout{F}
p::F
function AlphaDropout(p)
active::Union{Bool, Nothing}
function AlphaDropout(p, active = nothing)
@assert 0 p 1
new{typeof(p)}(p)
new{typeof(p)}(p, active)
end
end
function (a::AlphaDropout)(x)
istraining() || return x
_isactive(a) || return x
λ = eltype(x)(1.0507009873554804934193349852946)
α = eltype(x)(1.6732632423543772848170429916717)
α1 = eltype(x)(-λ*α)
@ -68,12 +94,15 @@ function (a::AlphaDropout)(x)
return x
end
testmode!(m::AlphaDropout, mode = true) =
(m.active = (isnothing(mode) || mode == :auto) ? nothing : !mode; m)
"""
LayerNorm(h::Integer)
A [normalisation layer](https://arxiv.org/pdf/1607.06450.pdf) designed to be
used with recurrent hidden states of size `h`. Normalises the mean/stddev of
each input before applying a per-neuron gain/bias.
used with recurrent hidden states of size `h`. Normalises the mean and standard
deviation of each input before applying a per-neuron gain/bias.
"""
struct LayerNorm{T}
diag::Diagonal{T}
@ -95,8 +124,8 @@ end
initβ = zeros, initγ = ones,
ϵ = 1e-8, momentum = .1)
Batch Normalization layer. The `channels` input should be the size of the
channel dimension in your data (see below).
[Batch Normalization](https://arxiv.org/pdf/1502.03167.pdf) layer.
`channels` should be the size of the channel dimension in your data (see below).
Given an array with `N` dimensions, call the `N-1`th the channel dimension. (For
a batch of feature vectors this is just the data dimension, for `WHCN` images
@ -106,10 +135,9 @@ it's the usual channel dimension.)
shifts them to have a new mean and variance (corresponding to the learnable,
per-channel `bias` and `scale` parameters).
See [Batch Normalization: Accelerating Deep Network Training by Reducing
Internal Covariate Shift](https://arxiv.org/pdf/1502.03167.pdf).
Use [`testmode!`](@ref) during inference.
Example:
# Examples
```julia
m = Chain(
Dense(28^2, 64),
@ -127,12 +155,16 @@ mutable struct BatchNorm{F,V,W,N}
σ²::W # moving std
ϵ::N
momentum::N
active::Union{Bool, Nothing}
end
# TODO: deprecate in v0.11
BatchNorm(λ, β, γ, μ, σ², ϵ, momentum) = BatchNorm(λ, β, γ, μ, σ², ϵ, momentum, nothing)
BatchNorm(chs::Integer, λ = identity;
initβ = (i) -> zeros(Float32, i), initγ = (i) -> ones(Float32, i), ϵ = 1f-5, momentum = 0.1f0) =
BatchNorm(λ, initβ(chs), initγ(chs),
zeros(chs), ones(chs), ϵ, momentum)
zeros(chs), ones(chs), ϵ, momentum, nothing)
trainable(bn::BatchNorm) = (bn.β, bn.γ)
@ -145,7 +177,7 @@ function (BN::BatchNorm)(x)
m = div(prod(size(x)), channels)
γ = reshape(BN.γ, affine_shape...)
β = reshape(BN.β, affine_shape...)
if !istraining()
if !_isactive(BN)
μ = reshape(BN.μ, affine_shape...)
σ² = reshape(BN.σ², affine_shape...)
ϵ = BN.ϵ
@ -170,41 +202,15 @@ end
@functor BatchNorm
testmode!(m::BatchNorm, mode = true) =
(m.active = (isnothing(mode) || mode == :auto) ? nothing : !mode; m)
function Base.show(io::IO, l::BatchNorm)
print(io, "BatchNorm($(join(size(l.β), ", "))")
(l.λ == identity) || print(io, ", λ = $(l.λ)")
print(io, ")")
end
"""
InstanceNorm(channels::Integer, σ = identity;
initβ = zeros, initγ = ones,
ϵ = 1e-8, momentum = .1)
Instance Normalization layer. The `channels` input should be the size of the
channel dimension in your data (see below).
Given an array with `N` dimensions, call the `N-1`th the channel dimension. (For
a batch of feature vectors this is just the data dimension, for `WHCN` images
it's the usual channel dimension.)
`InstanceNorm` computes the mean and variance for each each `W×H×1×1` slice and
shifts them to have a new mean and variance (corresponding to the learnable,
per-channel `bias` and `scale` parameters).
See [Instance Normalization: The Missing Ingredient for Fast Stylization](https://arxiv.org/abs/1607.08022).
Example:
```julia
m = Chain(
Dense(28^2, 64),
InstanceNorm(64, relu),
Dense(64, 10),
InstanceNorm(10),
softmax)
```
"""
expand_inst = (x, as) -> reshape(repeat(x, outer=[1, as[length(as)]]), as...)
mutable struct InstanceNorm{F,V,W,N}
@ -215,12 +221,44 @@ mutable struct InstanceNorm{F,V,W,N}
σ²::W # moving std
ϵ::N
momentum::N
active::Union{Bool, Nothing}
end
# TODO: deprecate in v0.11
"""
InstanceNorm(channels::Integer, σ = identity;
initβ = zeros, initγ = ones,
ϵ = 1e-8, momentum = .1)
[Instance Normalization](https://arxiv.org/abs/1607.08022) layer.
`channels` should be the size of the channel dimension in your data (see below).
Given an array with `N` dimensions, call the `N-1`th the channel dimension. (For
a batch of feature vectors this is just the data dimension, for `WHCN` images
it's the usual channel dimension.)
`InstanceNorm` computes the mean and variance for each each `W×H×1×1` slice and
shifts them to have a new mean and variance (corresponding to the learnable,
per-channel `bias` and `scale` parameters).
Use [`testmode!`](@ref) during inference.
# Examples
```julia
m = Chain(
Dense(28^2, 64),
InstanceNorm(64, relu),
Dense(64, 10),
InstanceNorm(10),
softmax)
```
"""
InstanceNorm(λ, β, γ, μ, σ², ϵ, momentum) = InstanceNorm(λ, β, γ, μ, σ², ϵ, momentum, nothing)
InstanceNorm(chs::Integer, λ = identity;
initβ = (i) -> zeros(Float32, i), initγ = (i) -> ones(Float32, i), ϵ = 1f-5, momentum = 0.1f0) =
InstanceNorm(λ, initβ(chs), initγ(chs),
zeros(chs), ones(chs), ϵ, momentum)
zeros(chs), ones(chs), ϵ, momentum, nothing)
trainable(in::InstanceNorm) = (in.β, in.γ)
@ -237,7 +275,7 @@ function (in::InstanceNorm)(x)
m = div(prod(size(x)), c*bs)
γ, β = expand_inst(in.γ, affine_shape), expand_inst(in.β, affine_shape)
if !istraining()
if !_isactive(in)
μ = expand_inst(in.μ, affine_shape)
σ² = expand_inst(in.σ², affine_shape)
ϵ = in.ϵ
@ -263,6 +301,9 @@ end
@functor InstanceNorm
testmode!(m::InstanceNorm, mode = true) =
(m.active = (isnothing(mode) || mode == :auto) ? nothing : !mode; m)
function Base.show(io::IO, l::InstanceNorm)
print(io, "InstanceNorm($(join(size(l.β), ", "))")
(l.λ == identity) || print(io, ", λ = $(l.λ)")
@ -270,26 +311,27 @@ function Base.show(io::IO, l::InstanceNorm)
end
"""
Group Normalization.
This layer can outperform Batch-Normalization and Instance-Normalization.
GroupNorm(chs::Integer, G::Integer, λ = identity;
initβ = (i) -> zeros(Float32, i), initγ = (i) -> ones(Float32, i),
ϵ = 1f-5, momentum = 0.1f0)
GroupNorm(chs::Integer, G::Integer, λ = identity;
initβ = (i) -> zeros(Float32, i), initγ = (i) -> ones(Float32, i),
ϵ = 1f-5, momentum = 0.1f0)
[Group Normalization](https://arxiv.org/pdf/1803.08494.pdf) layer.
This layer can outperform Batch Normalization and Instance Normalization.
``chs`` is the number of channels, the channel dimension of your input.
For an array of N dimensions, the (N-1)th index is the channel dimension.
`chs` is the number of channels, the channel dimension of your input.
For an array of N dimensions, the `N-1`th index is the channel dimension.
``G`` is the number of groups along which the statistics would be computed.
`G` is the number of groups along which the statistics are computed.
The number of channels must be an integer multiple of the number of groups.
Example:
```
m = Chain(Conv((3,3), 1=>32, leakyrelu;pad = 1),
GroupNorm(32,16)) # 32 channels, 16 groups (G = 16), thus 2 channels per group used
```
Use [`testmode!`](@ref) during inference.
Link : https://arxiv.org/pdf/1803.08494.pdf
# Examples
```julia
m = Chain(Conv((3,3), 1=>32, leakyrelu;pad = 1),
GroupNorm(32,16))
# 32 channels, 16 groups (G = 16), thus 2 channels per group used
```
"""
mutable struct GroupNorm{F,V,W,N,T}
G::T # number of groups
@ -300,12 +342,16 @@ mutable struct GroupNorm{F,V,W,N,T}
σ²::W # moving std
ϵ::N
momentum::N
active::Union{Bool, Nothing}
end
# TODO: deprecate in v0.11
GroupNorm(G, λ, β, γ, μ, σ², ϵ, momentum) = GroupNorm(G, λ, β, γ, μ, σ², ϵ, momentum, nothing)
GroupNorm(chs::Integer, G::Integer, λ = identity;
initβ = (i) -> zeros(Float32, i), initγ = (i) -> ones(Float32, i), ϵ = 1f-5, momentum = 0.1f0) =
GroupNorm(G, λ, initβ(chs), initγ(chs),
zeros(G,1), ones(G,1), ϵ, momentum)
zeros(G,1), ones(G,1), ϵ, momentum, nothing)
trainable(gn::GroupNorm) = (gn.β, gn.γ)
@ -329,7 +375,7 @@ function(gn::GroupNorm)(x)
β = reshape(gn.β, affine_shape...)
y = reshape(x,((size(x))[1:end-2]...,channels_per_group,groups,batches))
if !istraining()
if !_isactive(gn)
og_shape = size(x)
μ = reshape(gn.μ, μ_affine_shape...) # Shape : (1,1,...C/G,G,1)
σ² = reshape(gn.σ², μ_affine_shape...) # Shape : (1,1,...C/G,G,1)
@ -360,6 +406,9 @@ end
@functor GroupNorm
testmode!(m::GroupNorm, mode = true) =
(m.active = (isnothing(mode) || mode == :auto) ? nothing : !mode; m)
function Base.show(io::IO, l::GroupNorm)
print(io, "GroupNorm($(join(size(l.β), ", "))")
(l.λ == identity) || print(io, ", λ = $(l.λ)")

View File

@ -12,16 +12,16 @@ in the background. `cell` should be a model of the form:
h, y = cell(h, x...)
For example, here's a recurrent network that keeps a running total of its inputs.
For example, here's a recurrent network that keeps a running total of its inputs:
```julia
accum(h, x) = (h+x, x)
accum(h, x) = (h + x, x)
rnn = Flux.Recur(accum, 0)
rnn(2) # 2
rnn(3) # 3
rnn.state # 5
rnn.(1:10) # apply to a sequence
rnn.state # 60
rnn(2) # 2
rnn(3) # 3
rnn.state # 5
rnn.(1:10) # apply to a sequence
rnn.state # 60
```
"""
mutable struct Recur{T}
@ -45,12 +45,12 @@ Base.show(io::IO, m::Recur) = print(io, "Recur(", m.cell, ")")
"""
reset!(rnn)
Reset the hidden state of a recurrent layer back to its original value. See also
`truncate!`.
Reset the hidden state of a recurrent layer back to its original value.
Assuming you have a `Recur` layer `rnn`, this is roughly equivalent to
rnn.state = hidden(rnn.cell)
Assuming you have a `Recur` layer `rnn`, this is roughly equivalent to:
```julia
rnn.state = hidden(rnn.cell)
```
"""
reset!(m::Recur) = (m.state = m.init)
reset!(m) = foreach(reset!, functor(m)[1])
@ -136,8 +136,8 @@ Base.show(io::IO, l::LSTMCell) =
"""
LSTM(in::Integer, out::Integer)
Long Short Term Memory recurrent layer. Behaves like an RNN but generally
exhibits a longer memory span over sequences.
[Long Short Term Memory](https://www.researchgate.net/publication/13853244_Long_Short-term_Memory)
recurrent layer. Behaves like an RNN but generally exhibits a longer memory span over sequences.
See [this article](https://colah.github.io/posts/2015-08-Understanding-LSTMs/)
for a good overview of the internals.
@ -177,8 +177,8 @@ Base.show(io::IO, l::GRUCell) =
"""
GRU(in::Integer, out::Integer)
Gated Recurrent Unit layer. Behaves like an RNN but generally
exhibits a longer memory span over sequences.
[Gated Recurrent Unit](https://arxiv.org/abs/1406.1078) layer. Behaves like an
RNN but generally exhibits a longer memory span over sequences.
See [this article](https://colah.github.io/posts/2015-08-Understanding-LSTMs/)
for a good overview of the internals.

View File

@ -1,86 +1,296 @@
using CuArrays
using NNlib: logsoftmax, logσ
# Cost functions
"""
mae(, y)
Return the mean of absolute error; calculated as
`sum(abs.(ŷ .- y)) / length(y)`.
"""
mae(, y) = sum(abs.( .- y)) * 1 // length(y)
"""
mse(, y)
Return the mean squared error between and y; calculated as
`sum((ŷ .- y).^2) / length(y)`.
# Examples
```jldoctest
julia> Flux.mse([0, 2], [1, 1])
1//1
```
"""
mse(, y) = sum(( .- y).^2) * 1 // length(y)
"""
msle(, y; ϵ=eps(eltype()))
Return the mean of the squared logarithmic errors; calculated as
`sum((log.(ŷ .+ ϵ) .- log.(y .+ ϵ)).^2) / length(y)`.
The `ϵ` term provides numerical stability.
Penalizes an under-predicted estimate greater than an over-predicted estimate.
"""
msle(, y; ϵ=eps(eltype())) = sum((log.( .+ ϵ) .- log.(y .+ ϵ)).^2) * 1 // length(y)
"""
huber_loss(, y; δ=1.0)
Return the mean of the [Huber loss](https://en.wikipedia.org/wiki/Huber_loss)
given the prediction `` and true values `y`.
| 0.5 * | - y|, for | - y| <= δ
Huber loss = |
| δ * (| - y| - 0.5 * δ), otherwise
"""
#TODO: remove dropgrad when Zygote can handle this function with CuArrays
function huber_loss(, y; δ=eltype()(1))
abs_error = abs.( .- y)
temp = Zygote.dropgrad(abs_error .< δ)
x = eltype()(0.5)
hub_loss = sum(((abs_error.^2) .* temp) .* x .+ δ*(abs_error .- x*δ) .* (1 .- temp)) * 1 // length(y)
end
function _crossentropy(::AbstractVecOrMat, y::AbstractVecOrMat, weight::Nothing)
return -sum(y .* log.()) * 1 // size(y, 2)
return -sum(xlogy.(y, )) * 1 // size(y, 2)
end
function _crossentropy(::AbstractVecOrMat, y::AbstractVecOrMat, weight::Number)
return -sum(y .* log.()) .* weight * 1 // size(y, 2)
return -sum(xlogy.(y, )) .* weight * 1 // size(y, 2)
end
function _crossentropy(::AbstractVecOrMat, y::AbstractVecOrMat, weight::AbstractVector)
return -sum(y .* log.() .* weight) * 1 // size(y, 2)
return -sum(xlogy.(y, ) .* weight) * 1 // size(y, 2)
end
"""
crossentropy(, y; weight = nothing)
Return the cross entropy between the given probability distributions;
calculated as `-sum(y .* log.(ŷ) .* weight) / size(y, 2)`.
`weight` can be `Nothing`, a `Number` or an `AbstractVector`.
`weight=nothing` acts like `weight=1` but is faster.
See also: [`Flux.logitcrossentropy`](@ref), [`Flux.binarycrossentropy`](@ref), [`Flux.logitbinarycrossentropy`](@ref)
# Examples
```jldoctest
julia> Flux.crossentropy(softmax([-1.1491, 0.8619, 0.3127]), [1, 1, 0])
3.085467254747739
```
"""
crossentropy(::AbstractVecOrMat, y::AbstractVecOrMat; weight=nothing) = _crossentropy(, y, weight)
function logitcrossentropy(logŷ::AbstractVecOrMat, y::AbstractVecOrMat; weight = 1)
return -sum(y .* logsoftmax(logŷ) .* weight) * 1 // size(y, 2)
"""
logitcrossentropy(, y; weight = 1)
Return the crossentropy computed after a [`Flux.logsoftmax`](@ref) operation;
calculated as `-sum(y .* logsoftmax(ŷ) .* weight) / size(y, 2)`.
`logitcrossentropy(ŷ, y)` is mathematically equivalent to
[`Flux.crossentropy(softmax(ŷ), y)`](@ref) but it is more numerically stable.
See also: [`Flux.crossentropy`](@ref), [`Flux.binarycrossentropy`](@ref), [`Flux.logitbinarycrossentropy`](@ref)
# Examples
```jldoctest
julia> Flux.logitcrossentropy([-1.1491, 0.8619, 0.3127], [1, 1, 0])
3.085467254747738
```
"""
function logitcrossentropy(::AbstractVecOrMat, y::AbstractVecOrMat; weight = 1)
return -sum(y .* logsoftmax() .* weight) * 1 // size(y, 2)
end
"""
binarycrossentropy(, y; ϵ=eps())
Return `-y*log(ŷ + ϵ) - (1-y)*log(1-ŷ + ϵ)`. The ϵ term provides numerical stability.
Return ``-y*\\log( + ϵ) - (1-y)*\\log(1- + ϵ)``. The `ϵ` term provides numerical stability.
julia> binarycrossentropy.(σ.([-1.1491, 0.8619, 0.3127]), [1, 1, 0.])
3-element Array{Float64,1}:
1.4244
0.352317
0.86167
Typically, the prediction `` is given by the output of a [`sigmoid`](@ref) activation.
See also: [`Flux.crossentropy`](@ref), [`Flux.logitcrossentropy`](@ref), [`Flux.logitbinarycrossentropy`](@ref)
# Examples
```jldoctest
julia> Flux.binarycrossentropy.(σ.([-1.1491, 0.8619, 0.3127]), [1, 1, 0])
3-element Array{Float64,1}:
1.424397097347566
0.35231664672364077
0.8616703662235441
```
"""
binarycrossentropy(, y; ϵ=eps()) = -y*log( + ϵ) - (1 - y)*log(1 - + ϵ)
binarycrossentropy(, y; ϵ=eps()) = -xlogy(y, + ϵ) - xlogy(1 - y, 1 - + ϵ)
# Re-definition to fix interaction with CuArrays.
CuArrays.@cufunc binarycrossentropy(, y; ϵ=eps()) = -y*log( + ϵ) - (1 - y)*log(1 - + ϵ)
"""
logitbinarycrossentropy(logŷ, y)
logitbinarycrossentropy(ŷ, y)
`logitbinarycrossentropy(logŷ, y)` is mathematically equivalent to `binarycrossentropy(σ(logŷ), y)`
but it is more numerically stable.
`logitbinarycrossentropy(ŷ, y)` is mathematically equivalent to
[`Flux.binarycrossentropy(σ(ŷ), y)`](@ref) but it is more numerically stable.
julia> logitbinarycrossentropy.([-1.1491, 0.8619, 0.3127], [1, 1, 0.])
3-element Array{Float64,1}:
1.4244
0.352317
0.86167
See also: [`Flux.crossentropy`](@ref), [`Flux.logitcrossentropy`](@ref), [`Flux.binarycrossentropy`](@ref)
# Examples
```jldoctest
julia> Flux.logitbinarycrossentropy.([-1.1491, 0.8619, 0.3127], [1, 1, 0])
3-element Array{Float64,1}:
1.4243970973475661
0.35231664672364094
0.8616703662235443
```
"""
logitbinarycrossentropy(logŷ, y) = (1 - y)*logŷ - logσ(logŷ)
logitbinarycrossentropy(ŷ, y) = (1 - y)*ŷ - logσ()
# Re-definition to fix interaction with CuArrays.
CuArrays.@cufunc logitbinarycrossentropy(logŷ, y) = (1 - y)*logŷ - logσ(logŷ)
CuArrays.@cufunc logitbinarycrossentropy(ŷ, y) = (1 - y)*ŷ - logσ()
"""
normalise(x::AbstractArray; dims=1)
normalise(x; dims=1)
Normalises `x` to mean 0 and standard deviation 1, across the dimensions given by `dims`. Defaults to normalising over columns.
Normalise `x` to mean 0 and standard deviation 1 across the dimensions given by `dims`.
Defaults to normalising over columns.
julia> a = reshape(collect(1:9), 3, 3)
3×3 Array{Int64,2}:
1 4 7
2 5 8
3 6 9
```jldoctest
julia> a = reshape(collect(1:9), 3, 3)
3×3 Array{Int64,2}:
1 4 7
2 5 8
3 6 9
julia> normalise(a)
3×3 Array{Float64,2}:
-1.22474 -1.22474 -1.22474
0.0 0.0 0.0
1.22474 1.22474 1.22474
julia> Flux.normalise(a)
3×3 Array{Float64,2}:
-1.22474 -1.22474 -1.22474
0.0 0.0 0.0
1.22474 1.22474 1.22474
julia> normalise(a, dims=2)
3×3 Array{Float64,2}:
-1.22474 0.0 1.22474
-1.22474 0.0 1.22474
-1.22474 0.0 1.22474
julia> Flux.normalise(a, dims=2)
3×3 Array{Float64,2}:
-1.22474 0.0 1.22474
-1.22474 0.0 1.22474
-1.22474 0.0 1.22474
```
"""
function normalise(x::AbstractArray; dims=1)
μ′ = mean(x, dims = dims)
σ = std(x, dims = dims, mean = μ′, corrected=false)
return (x .- μ′) ./ σ
end
"""
kldivergence(, y)
Return the
[Kullback-Leibler divergence](https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence)
between the given probability distributions.
KL divergence is a measure of how much one probability distribution is different
from the other.
It is always non-negative and zero only when both the distributions are equal
everywhere.
"""
function kldivergence(, y)
entropy = sum(xlogx.(y)) * 1 //size(y,2)
cross_entropy = crossentropy(, y)
return entropy + cross_entropy
end
"""
poisson(, y)
Return how much the predicted distribution `` diverges from the expected Poisson
distribution `y`; calculated as `sum(ŷ .- y .* log.(ŷ)) / size(y, 2)`.
[More information.](https://peltarion.com/knowledge-center/documentation/modeling-view/build-an-ai-model/loss-functions/poisson).
"""
poisson(, y) = sum( .- xlogy.(y, )) * 1 // size(y,2)
"""
hinge(, y)
Return the [hinge loss](https://en.wikipedia.org/wiki/Hinge_loss) given the
prediction `` and true labels `y` (containing 1 or -1); calculated as
`sum(max.(0, 1 .- ŷ .* y)) / size(y, 2)`.
See also: [`squared_hinge`](@ref)
"""
hinge(, y) = sum(max.(0, 1 .- .* y)) * 1 // size(y, 2)
"""
squared_hinge(, y)
Return the squared hinge loss given the prediction `` and true labels `y`
(containing 1 or -1); calculated as `sum((max.(0, 1 .- ŷ .* y)).^2) / size(y, 2)`.
See also: [`hinge`](@ref)
"""
squared_hinge(, y) = sum((max.(0, 1 .- .* y)).^2) * 1 // size(y, 2)
"""
dice_coeff_loss(, y; smooth=1)
Return a loss based on the dice coefficient.
Used in the [V-Net](https://arxiv.org/pdf/1606.04797v1.pdf) image segmentation
architecture.
Similar to the F1_score. Calculated as:
1 - 2*sum(| .* y| + smooth) / (sum(.^2) + sum(y.^2) + smooth)`
"""
dice_coeff_loss(, y; smooth=eltype()(1.0)) = 1 - (2*sum(y .* ) + smooth) / (sum(y.^2) + sum(.^2) + smooth)
"""
tversky_loss(, y; β=0.7)
Return the [Tversky loss](https://arxiv.org/pdf/1706.05721.pdf).
Used with imbalanced data to give more weight to false negatives.
Larger β weigh recall higher than precision (by placing more emphasis on false negatives)
Calculated as:
1 - sum(|y .* | + 1) / (sum(y .* + β*(1 .- y) .* + (1 - β)*y .* (1 .- )) + 1)
"""
tversky_loss(, y; β=eltype()(0.7)) = 1 - (sum(y .* ) + 1) / (sum(y .* + β*(1 .- y) .* + (1 - β)*y .* (1 .- )) + 1)
"""
flatten(x::AbstractArray)
Transform (w, h, c, b)-shaped input into (w × h × c, b)-shaped output
by linearizing all values for each element in the batch.
"""
function flatten(x::AbstractArray)
return reshape(x, :, size(x)[end])
end
"""
xlogx(x)
Return `x * log(x)` for `x ≥ 0`, handling `x = 0` by taking the downward limit.
"""
function xlogx(x)
result = x * log(x)
ifelse(iszero(x), zero(result), result)
end
CuArrays.@cufunc function xlogx(x)
result = x * log(x)
ifelse(iszero(x), zero(result), result)
end
"""
xlogy(x, y)
Return `x * log(y)` for `y > 0` with correct limit at `x = 0`.
"""
function xlogy(x, y)
result = x * log(y)
ifelse(iszero(x), zero(result), result)
end
CuArrays.@cufunc function xlogy(x, y)
result = x * log(y)
ifelse(iszero(x), zero(result), result)
end
@adjoint function broadcasted(::typeof(xlogy), x::Zygote.Numeric, y::Zygote.Numeric)
res = xlogy.(x, y)
res, Δ -> (nothing, Zygote.unbroadcast(x, xlogy.(Δ, y)), Zygote.unbroadcast(y, Δ .* x ./ y))
end

View File

@ -27,7 +27,8 @@ Base.getindex(xs::OneHotMatrix, ::Colon, ::Colon) = OneHotMatrix(xs.height, copy
Base.getindex(xs::OneHotMatrix, i::Integer, ::Colon) = map(x -> x[i], xs.data)
A::AbstractMatrix * B::OneHotMatrix = A[:, map(x->x.ix, B.data)]
# remove workaround when https://github.com/JuliaGPU/CuArrays.jl/issues/676 is fixed
A::AbstractMatrix * B::OneHotMatrix = A[:, cpu(map(x->x.ix, B.data))]
Base.hcat(x::OneHotVector, xs::OneHotVector...) = OneHotMatrix(length(x), [x, xs...])
@ -37,30 +38,28 @@ import Adapt: adapt, adapt_structure
adapt_structure(T, xs::OneHotMatrix) = OneHotMatrix(xs.height, adapt(T, xs.data))
import .CuArrays: CuArray, cudaconvert
import .CuArrays: CuArray, CuArrayStyle, cudaconvert
import Base.Broadcast: BroadcastStyle, ArrayStyle
BroadcastStyle(::Type{<:OneHotMatrix{<:CuArray}}) = ArrayStyle{CuArray}()
BroadcastStyle(::Type{<:OneHotMatrix{<:CuArray}}) = CuArrayStyle{2}()
cudaconvert(x::OneHotMatrix{<:CuArray}) = OneHotMatrix(x.height, cudaconvert(x.data))
"""
onehot(l, labels[, unk])
Create an [`OneHotVector`](@ref) wtih `l`-th element be `true` based on possible `labels` set.
If `unk` is given, it retruns `onehot(unk, labels)` if the input label `l` is not find in `labels`; otherwise
it will error.
## Examples
Create a `OneHotVector` with its `l`-th element `true` based on the
possible set of `labels`.
If `unk` is given, return `onehot(unk, labels)` if the input label `l` is not found
in `labels`; otherwise, it will raise an error.
# Examples
```jldoctest
julia> using Flux: onehot
julia> onehot(:b, [:a, :b, :c])
julia> Flux.onehot(:b, [:a, :b, :c])
3-element Flux.OneHotVector:
0
1
0
julia> onehot(:c, [:a, :b, :c])
julia> Flux.onehot(:c, [:a, :b, :c])
3-element Flux.OneHotVector:
0
0
@ -82,15 +81,14 @@ end
"""
onehotbatch(ls, labels[, unk...])
Create an [`OneHotMatrix`](@ref) with a batch of labels based on possible `labels` set, returns the
`onehot(unk, labels)` if given labels `ls` is not found in set `labels`.
## Examples
Create a `OneHotMatrix` with a batch of labels based on the
possible set of `labels`.
If `unk` is given, return [`onehot(unk, labels)`](@ref) if one of the input
labels `ls` is not found in `labels`; otherwise it will error.
# Examples
```jldoctest
julia> using Flux: onehotbatch
julia> onehotbatch([:b, :a, :b], [:a, :b, :c])
julia> Flux.onehotbatch([:b, :a, :b], [:a, :b, :c])
3×3 Flux.OneHotMatrix{Array{Flux.OneHotVector,1}}:
0 1 0
1 0 1
@ -107,13 +105,12 @@ Base.argmax(xs::OneHotVector) = xs.ix
Inverse operations of [`onehot`](@ref).
# Examples
```jldoctest
julia> using Flux: onecold
julia> onecold([true, false, false], [:a, :b, :c])
julia> Flux.onecold([true, false, false], [:a, :b, :c])
:a
julia> onecold([0.3, 0.2, 0.5], [:a, :b, :c])
julia> Flux.onecold([0.3, 0.2, 0.5], [:a, :b, :c])
:c
```
"""
@ -125,6 +122,4 @@ onecold(y::AbstractMatrix, labels...) =
onecold(y::OneHotMatrix, labels...) =
mapreduce(x -> Flux.onecold(x, labels...), |, y.data, dims = 2, init = 0)
# TODO probably still want this as a custom adjoint Zygote
# onecold(x::TrackedVector, l...) = onecold(data(x), l...)
# onecold(x::TrackedMatrix, l...) = onecold(data(x), l...)
@nograd onecold, onehot, onehotbatch

View File

@ -1,9 +1,12 @@
module Optimise
export train!,
SGD, Descent, ADAM, Momentum, Nesterov, RMSProp,
using LinearAlgebra
export train!, update!,
Descent, ADAM, Momentum, Nesterov, RMSProp,
ADAGrad, AdaMax, ADADelta, AMSGrad, NADAM, ADAMW,RADAM,
InvDecay, ExpDecay, WeightDecay, stop, Optimiser
InvDecay, ExpDecay, WeightDecay, stop, Optimiser,
ClipValue, ClipNorm
include("optimisers.jl")
include("train.jl")

View File

@ -1,5 +1,4 @@
using Flux
using Base: @get!
using MacroTools: @forward
const ϵ = 1e-8
@ -7,24 +6,25 @@ const ϵ = 1e-8
# TODO: should use weak refs
"""
Descent(η)
Descent(η = 0.1)
Classic gradient descent optimiser with learning rate `η`.
For each parameter `p` and its gradient `δp`, this runs `p -= η*δp`
## Parameters
- Learning Rate (η): The amount by which the gradients are discounted before updating the weights. Defaults to `0.1`.
# Parameters
- Learning rate (`η`): Amount by which gradients are discounted before updating
the weights.
## Example
```julia-repl
opt = Descent() # uses default η (0.1)
# Examples
```julia
opt = Descent()
opt = Descent(0.3) # use provided η
opt = Descent(0.3)
ps = params(model)
gs = gradient(ps) do
loss(x, y)
loss(x, y)
end
Flux.Optimise.update!(opt, ps, gs)
@ -41,17 +41,19 @@ function apply!(o::Descent, x, Δ)
end
"""
Momentum(η, ρ)
Momentum(η = 0.01, ρ = 0.9)
Gradient descent with learning rate `η` and momentum `ρ`.
Gradient descent optimizer with learning rate `η` and momentum `ρ`.
## Parameters
- Learning Rate (`η`): Amount by which gradients are discounted before updating the weights. Defaults to `0.01`.
- Momentum (`ρ`): Parameter that accelerates descent in the relevant direction and dampens oscillations. Defaults to `0.9`.
# Parameters
- Learning rate (`η`): Amount by which gradients are discounted before updating
the weights.
- Momentum (`ρ`): Controls the acceleration of gradient descent in the
prominent direction, in effect dampening oscillations.
## Examples
# Examples
```julia
opt = Momentum() # uses defaults of η = 0.01 and ρ = 0.9
opt = Momentum()
opt = Momentum(0.01, 0.99)
```
@ -72,17 +74,19 @@ function apply!(o::Momentum, x, Δ)
end
"""
Nesterov(η, ρ)
Nesterov(η = 0.001, ρ = 0.9)
Gradient descent with learning rate `η` and Nesterov momentum `ρ`.
Gradient descent optimizer with learning rate `η` and Nesterov momentum `ρ`.
## Parameters
- Learning Rate (η): Amount by which the gradients are dicsounted berfore updating the weights. Defaults to `0.001`.
- Nesterov Momentum (ρ): Paramters controlling the amount of nesterov momentum to be applied. Defaults to `0.9`.
# Parameters
- Learning rate (`η`): Amount by which gradients are discounted before updating
the weights.
- Nesterov momentum (`ρ`): Controls the acceleration of gradient descent in the
prominent direction, in effect dampening oscillations.
## Examples
# Examples
```julia
opt = Nesterov() # uses defaults η = 0.001 and ρ = 0.9
opt = Nesterov()
opt = Nesterov(0.003, 0.95)
```
@ -104,23 +108,25 @@ function apply!(o::Nesterov, x, Δ)
end
"""
RMSProp(η, ρ)
RMSProp(η = 0.001, ρ = 0.9)
Implements the RMSProp algortihm. Often a good choice for recurrent networks. Paramters other than learning rate generally don't need tuning.
Optimizer using the
[RMSProp](https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf)
algorithm. Often a good choice for recurrent networks. Parameters other than learning rate
generally don't need tuning.
## Parameters
- Learning Rate (η): Defaults to `0.001`.
- Rho (ρ): Defaults to `0.9`.
# Parameters
- Learning rate (`η`): Amount by which gradients are discounted before updating
the weights.
- Momentum (`ρ`): Controls the acceleration of gradient descent in the
prominent direction, in effect dampening oscillations.
## Examples
# Examples
```julia
opt = RMSProp() # uses default η = 0.001 and ρ = 0.9
opt = RMSProp()
opt = RMSProp(0.002, 0.95)
```
## References
[RMSProp](https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf)
"""
mutable struct RMSProp
eta::Float64
@ -138,23 +144,22 @@ function apply!(o::RMSProp, x, Δ)
end
"""
ADAM(η, β::Tuple)
ADAM(η = 0.001, β::Tuple = (0.9, 0.999))
Implements the ADAM optimiser.
[ADAM](https://arxiv.org/abs/1412.6980v8) optimiser.
## Paramters
- Learning Rate (`η`): Defaults to `0.001`.
- Beta (`β::Tuple`): The first element refers to β1 and the second to β2. Defaults to `(0.9, 0.999)`.
## Examples
# Parameters
- Learning rate (`η`): Amount by which gradients are discounted before updating
the weights.
- Decay of momentums (`β::Tuple`): Exponential decay for the first (β1) and the
second (β2) momentum estimate.
# Examples
```julia
opt = ADAM() # uses the default η = 0.001 and β = (0.9, 0.999)
opt = ADAM()
opt = ADAM(0.001, (0.9, 0.8))
```
## References
[ADAM](https://arxiv.org/abs/1412.6980v8) optimiser.
"""
mutable struct ADAM
eta::Float64
@ -175,24 +180,22 @@ function apply!(o::ADAM, x, Δ)
end
"""
RADAM(η, β::Tuple)
RADAM(η = 0.001, β::Tuple = (0.9, 0.999))
Implements the rectified ADAM optimizer.
[Rectified ADAM](https://arxiv.org/pdf/1908.03265v1.pdf) optimizer.
## Parameters
- Learning Rate (η): Defaults to `0.001`
- Beta (β::Tuple): The first element refers to β1 and the second to β2. Defaults to `(0.9, 0.999)`.
## Examples
# Parameters
- Learning rate (`η`): Amount by which gradients are discounted before updating
the weights.
- Decay of momentums (`β::Tuple`): Exponential decay for the first (β1) and the
second (β2) momentum estimate.
# Examples
```julia
opt = RADAM() # uses the default η = 0.001 and β = (0.9, 0.999)
opt = RADAM()
opt = RADAM(0.001, (0.9, 0.8))
```
## References
[RADAM](https://arxiv.org/pdf/1908.03265v1.pdf) optimiser (Rectified ADAM).
"""
mutable struct RADAM
eta::Float64
@ -220,22 +223,22 @@ function apply!(o::RADAM, x, Δ)
end
"""
AdaMax(η, β::Tuple)
AdaMax(η = 0.001, β::Tuple = (0.9, 0.999))
Variant of ADAM based on -norm.
[AdaMax](https://arxiv.org/abs/1412.6980v9) is a variant of ADAM based on the -norm.
## Parameters
- Learning Rate (η): Defaults to `0.001`
- Beta (β::Tuple): The first element refers to β1 and the second to β2. Defaults to `(0.9, 0.999)`.
# Parameters
- Learning rate (`η`): Amount by which gradients are discounted before updating
the weights.
- Decay of momentums (`β::Tuple`): Exponential decay for the first (β1) and the
second (β2) momentum estimate.
## Examples
# Examples
```julia
opt = AdaMax() # uses default η and β
opt = AdaMax()
opt = AdaMax(0.001, (0.9, 0.995))
```
## References
[AdaMax](https://arxiv.org/abs/1412.6980v9) optimiser.
"""
mutable struct AdaMax
eta::Float64
@ -256,23 +259,22 @@ function apply!(o::AdaMax, x, Δ)
end
"""
ADAGrad(η)
ADAGrad(η = 0.1)
Implements AdaGrad. It has parameter specific learning rates based on how frequently it is updated.
[ADAGrad](http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf) optimizer. It has
parameter specific learning rates based on how frequently it is updated.
Parameters don't need tuning.
## Parameters
- Learning Rate (η): Defaults to `0.1`
# Parameters
- Learning rate (`η`): Amount by which gradients are discounted before updating
the weights.
## Examples
# Examples
```julia
opt = ADAGrad() # uses default η = 0.1
opt = ADAGrad()
opt = ADAGrad(0.001)
```
## References
[ADAGrad](http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf) optimiser.
Parameters don't need tuning.
"""
mutable struct ADAGrad
eta::Float64
@ -289,21 +291,21 @@ function apply!(o::ADAGrad, x, Δ)
end
"""
ADADelta(ρ)
ADADelta(ρ = 0.9)
Version of ADAGrad that adapts learning rate based on a window of past gradient updates. Parameters don't need tuning.
[ADADelta](https://arxiv.org/abs/1212.5701) is a version of ADAGrad adapting its learning
rate based on a window of past gradient updates.
Parameters don't need tuning.
## Parameters
- Rho (ρ): Factor by which gradient is decayed at each time step. Defaults to `0.9`.
# Parameters
- Rho (`ρ`): Factor by which the gradient is decayed at each time step.
## Examples
# Examples
```julia
opt = ADADelta() # uses default ρ = 0.9
opt = ADADelta()
opt = ADADelta(0.89)
```
## References
[ADADelta](https://arxiv.org/abs/1212.5701) optimiser.
"""
mutable struct ADADelta
rho::Float64
@ -322,22 +324,23 @@ function apply!(o::ADADelta, x, Δ)
end
"""
AMSGrad(η, β::Tuple)
AMSGrad(η = 0.001, β::Tuple = (0.9, 0.999))
Implements AMSGrad version of the ADAM optimiser. Parameters don't need tuning.
The [AMSGrad](https://openreview.net/forum?id=ryQu7f-RZ) version of the ADAM
optimiser. Parameters don't need tuning.
## Parameters
- Learning Rate (η): Defaults to `0.001`.
- Beta (β::Tuple): The first element refers to β1 and the second to β2. Defaults to `(0.9, 0.999)`.
# Parameters
- Learning rate (`η`): Amount by which gradients are discounted before updating
the weights.
- Decay of momentums (`β::Tuple`): Exponential decay for the first (β1) and the
second (β2) momentum estimate.
## Examples
# Examples
```julia
opt = AMSGrad() # uses default η and β
opt = AMSGrad()
opt = AMSGrad(0.001, (0.89, 0.995))
```
## References
[AMSGrad](https://openreview.net/forum?id=ryQu7f-RZ) optimiser.
"""
mutable struct AMSGrad
eta::Float64
@ -357,22 +360,23 @@ function apply!(o::AMSGrad, x, Δ)
end
"""
NADAM(η, β::Tuple)
NADAM(η = 0.001, β::Tuple = (0.9, 0.999))
Nesterov variant of ADAM. Parameters don't need tuning.
[NADAM](http://cs229.stanford.edu/proj2015/054_report.pdf) is a Nesterov variant of ADAM.
Parameters don't need tuning.
## Parameters
- Learning Rate (η): Defaults to `0.001`.
- Beta (β::Tuple): The first element refers to β1 and the second to β2. Defaults to `(0.9, 0.999)`.
# Parameters
- Learning rate (`η`): Amount by which gradients are discounted before updating
the weights.
- Decay of momentums (`β::Tuple`): Exponential decay for the first (β1) and the
second (β2) momentum estimate.
## Examples
# Examples
```julia
opt = NADAM() # uses default η and β
opt = NADAM()
opt = NADAM(0.002, (0.89, 0.995))
```
## References
[NADAM](http://cs229.stanford.edu/proj2015/054_report.pdf) optimiser.
"""
mutable struct NADAM
eta::Float64
@ -393,23 +397,24 @@ function apply!(o::NADAM, x, Δ)
end
"""
ADAMW(η, β::Tuple, decay)
ADAMW(η = 0.001, β::Tuple = (0.9, 0.999), decay = 0)
Variant of ADAM defined by fixing weight decay regularization.
[ADAMW](https://arxiv.org/abs/1711.05101) is a variant of ADAM fixing (as in repairing) its
weight decay regularization.
## Parameters
- Learning Rate (η): Defaults to `0.001`.
- Beta (β::Tuple): The first element refers to β1 and the second to β2. Defaults to (0.9, 0.999).
- decay: Decay applied to weights during optimisation. Defaults to 0.
# Parameters
- Learning rate (`η`): Amount by which gradients are discounted before updating
the weights.
- Decay of momentums (`β::Tuple`): Exponential decay for the first (β1) and the
second (β2) momentum estimate.
- `decay`: Decay applied to weights during optimisation.
## Examples
# Examples
```julia
opt = ADAMW() # uses default η, β and decay
opt = ADAMW()
opt = ADAMW(0.001, (0.89, 0.995), 0.1)
```
## References
[ADAMW](https://arxiv.org/abs/1711.05101)
"""
ADAMW(η = 0.001, β = (0.9, 0.999), decay = 0) =
Optimiser(ADAM(η, β), WeightDecay(decay))
@ -442,17 +447,15 @@ function apply!(o::Optimiser, x, Δ)
end
"""
InvDecay(γ)
InvDecay(γ = 0.001)
Applies inverse time decay to an optimiser, i.e., the effective step size at iteration `n` is `eta / (1 + γ * n)` where `eta` is the initial step size. The wrapped optimiser's step size is not modified.
```
Apply inverse time decay to an optimiser, so that the effective step size at
iteration `n` is `eta / (1 + γ * n)` where `eta` is the initial step size.
The wrapped optimiser's step size is not modified.
## Parameters
- gamma (γ): Defaults to `0.001`
## Example
# Examples
```julia
Optimiser(InvDecay(..), Opt(..))
Optimiser(InvDecay(..), Opt(..))
```
"""
mutable struct InvDecay
@ -471,22 +474,25 @@ function apply!(o::InvDecay, x, Δ)
end
"""
ExpDecay(eta, decay, decay_step, clip)
ExpDecay(η = 0.001, decay = 0.1, decay_step = 1000, clip = 1e-4)
Discount the learning rate `eta` by a multiplicative factor `decay` every `decay_step` till a minimum of `clip`.
Discount the learning rate `η` by the factor `decay` every `decay_step` steps till
a minimum of `clip`.
## Parameters
- Learning Rate (eta): Defaults to `0.001`.
- decay: Factor by which the learning rate is discounted. Defaults to `0.1`.
- decay_step: Schedules decay operations by setting number of steps between two decay operations. Defaults to `1000`.
- clip: Minimum value of learning rate. Defaults to `1e-4`.
# Parameters
- Learning rate (`η`): Amount by which gradients are discounted before updating
the weights.
- `decay`: Factor by which the learning rate is discounted.
- `decay_step`: Schedule decay operations by setting the number of steps between
two decay operations.
- `clip`: Minimum value of learning rate.
## Example
# Examples
To apply exponential decay to an optimiser:
```julia
Optimiser(ExpDecay(..), Opt(..))
Optimiser(ExpDecay(..), Opt(..))
opt = Optimiser(ExpDecay(), ADAM())
opt = Optimiser(ExpDecay(), ADAM())
```
"""
mutable struct ExpDecay
@ -503,19 +509,19 @@ function apply!(o::ExpDecay, x, Δ)
η, s, decay = o.eta, o.step, o.decay
n = o.current[x] = get(o.current, x, 0) + 1
if o.current[x]%s == 0 && count(x -> x%s == 0, values(o.current)) == 1
η = max(η * decay^(s / n), o.clip)
η = max(η * decay, o.clip)
o.eta = η
end
@. Δ *= η
end
"""
WeightDecay(wd)
WeightDecay(wd = 0)
Decays the weight by `wd`
Decay weights by `wd`.
## Parameters
- weight decay (wd): 0
# Parameters
- Weight decay (`wd`)
"""
mutable struct WeightDecay
wd::Real
@ -527,3 +533,31 @@ function apply!(o::WeightDecay, x, Δ)
wd = o.wd
@. Δ += wd * x
end
"""
ClipValue(thresh)
Clip gradients when their absolute value exceeds `thresh`.
"""
mutable struct ClipValue{T}
thresh::T
end
apply!(o::ClipValue, x, Δ) = clamp!(Δ, -o.thresh, o.thresh)
"""
ClipNorm(thresh)
Clip gradients when their L2 norm exceeds `thresh`.
"""
mutable struct ClipNorm{T}
thresh::T
end
function apply!(o::ClipNorm, x, Δ)
Δnrm = norm(Δ)
if Δnrm > o.thresh
rmul!(Δ, o.thresh / Δnrm)
end
return Δ
end

View File

@ -1,11 +1,26 @@
using Juno
import Zygote: Params, gradient
"""
update!(x, )
Update the array `x` according to `x .-= x̄`.
"""
function update!(x::AbstractArray, )
x .+=
return x
x .-=
end
"""
update!(opt, p, g)
update!(opt, ps::Params, gs)
Perform an update step of the parameters `ps` (or the single parameter `p`)
according to optimizer `opt` and the gradients `gs` (the gradient `g`).
As a result, the parameters are mutated and the optimizer's internal state may change.
"""
function update!(opt, x, )
x .-= apply!(opt, x, )
end
@ -28,11 +43,10 @@ struct StopException <: Exception end
stop()
Call `Flux.stop()` in a callback to indicate when a callback condition is met.
This would trigger the train loop to stop and exit.
This will trigger the train loop to stop and exit.
# Examples
```julia
# Example callback:
cb = function ()
accuracy() > 0.9 && Flux.stop()
end
@ -45,18 +59,18 @@ end
"""
train!(loss, params, data, opt; cb)
For each datapoint `d` in `data` computes the gradient of `loss(d...)` through
backpropagation and calls the optimizer `opt`.
For each datapoint `d` in `data` compute the gradient of `loss(d...)` through
backpropagation and call the optimizer `opt`.
Takes a callback as keyword argument `cb`. For example, this will print "training"
every 10 seconds:
In case datapoints `d` are of numeric array type, assume no splatting is needed
and compute the gradient of `loss(d)`.
```julia
Flux.train!(loss, params, data, opt,
cb = throttle(() -> println("training"), 10))
```
A callback is given with the keyword argument `cb`. For example, this will print
"training" every 10 seconds (using [`Flux.throttle`](@ref)):
The callback can call `Flux.stop()` to interrupt the training loop.
train!(loss, params, data, opt, cb = throttle(() -> println("training"), 10))
The callback can call [`Flux.stop`](@ref) to interrupt the training loop.
Multiple optimisers and callbacks can be passed to `opt` and `cb` as arrays.
"""
@ -65,8 +79,14 @@ function train!(loss, ps, data, opt; cb = () -> ())
cb = runall(cb)
@progress for d in data
try
gs = gradient(ps) do
loss(d...)
if d isa AbstractArray{<:Number}
gs = gradient(ps) do
loss(d)
end
else
gs = gradient(ps) do
loss(d...)
end
end
update!(opt, ps, gs)
cb()
@ -86,11 +106,12 @@ end
Run `body` `N` times. Mainly useful for quickly doing multiple epochs of
training in a REPL.
```julia
julia> @epochs 2 println("hello")
INFO: Epoch 1
# Examples
```jldoctest
julia> Flux.@epochs 2 println("hello")
[ Info: Epoch 1
hello
INFO: Epoch 2
[ Info: Epoch 2
hello
```
"""

View File

@ -1,10 +1,40 @@
# Arrays
nfan() = 1, 1 #fan_in, fan_out
nfan(n) = 1, n #A vector is treated as a n×1 matrix
nfan(n_out, n_in) = n_in, n_out #In case of Dense kernels: arranged as matrices
nfan(dims...) = prod(dims[1:end-2]) .* (dims[end-1], dims[end]) #In case of convolution kernels
nfan() = 1, 1 # fan_in, fan_out
nfan(n) = 1, n # A vector is treated as a n×1 matrix
nfan(n_out, n_in) = n_in, n_out # In case of Dense kernels: arranged as matrices
nfan(dims...) = prod(dims[1:end-2]) .* (dims[end-1], dims[end]) # In case of convolution kernels
"""
glorot_uniform(dims...)
Return an `Array` of size `dims` containing random variables taken from a uniform
distribution in the interval ``[-x, x]``, where `x = sqrt(24 / sum(dims)) / 2`.
# Examples
```jldoctest; setup = :(using Random; Random.seed!(0))
julia> Flux.glorot_uniform(2, 3)
2×3 Array{Float32,2}:
0.601094 -0.57414 -0.814925
0.900868 0.805994 0.057514
```
"""
glorot_uniform(dims...) = (rand(Float32, dims...) .- 0.5f0) .* sqrt(24.0f0 / sum(nfan(dims...)))
"""
glorot_normal(dims...)
Return an `Array` of size `dims` containing random variables taken from a normal
distribution with mean 0 and standard deviation `sqrt(2 / sum(dims))`.
# Examples
```jldoctest; setup = :(using Random; Random.seed!(0))
julia> Flux.glorot_normal(3, 2)
3×2 Array{Float32,2}:
0.429505 -0.0852891
0.523935 0.371009
-0.223261 0.188052
```
"""
glorot_normal(dims...) = randn(Float32, dims...) .* sqrt(2.0f0 / sum(nfan(dims...)))
ones(T::Type, dims...) = Base.ones(T, dims...)
@ -13,9 +43,81 @@ zeros(T::Type, dims...) = Base.zeros(T, dims...)
ones(dims...) = Base.ones(Float32, dims...)
zeros(dims...) = Base.zeros(Float32, dims...)
"""
unsqueeze(xs, dim)
Return `xs` reshaped into an `Array` one dimensionality higher than `xs`,
where `dim` indicates in which dimension `xs` is extended.
# Examples
```jldoctest
julia> xs = [[1, 2], [3, 4], [5, 6]]
3-element Array{Array{Int64,1},1}:
[1, 2]
[3, 4]
[5, 6]
julia> Flux.unsqueeze(xs, 1)
1×3 Array{Array{Int64,1},2}:
[1, 2] [3, 4] [5, 6]
julia> Flux.unsqueeze([1 2; 3 4], 2)
2×1×2 Array{Int64,3}:
[:, :, 1] =
1
3
[:, :, 2] =
2
4
```
"""
unsqueeze(xs, dim) = reshape(xs, (size(xs)[1:dim-1]..., 1, size(xs)[dim:end]...))
"""
stack(xs, dim)
Concatenate the given `Array` of `Array`s `xs` into a single `Array` along the
given dimension `dim`.
# Examples
```jldoctest
julia> xs = [[1, 2], [3, 4], [5, 6]]
3-element Array{Array{Int64,1},1}:
[1, 2]
[3, 4]
[5, 6]
julia> Flux.stack(xs, 1)
3×2 Array{Int64,2}:
1 2
3 4
5 6
julia> cat(xs, dims=1)
3-element Array{Array{Int64,1},1}:
[1, 2]
[3, 4]
[5, 6]
```
"""
stack(xs, dim) = cat(unsqueeze.(xs, dim)..., dims=dim)
"""
unstack(xs, dim)
Unroll the given `xs` into an `Array` of `Array`s along the given dimension `dim`.
# Examples
```jldoctest
julia> Flux.unstack([1 3 5 7; 2 4 6 8], 2)
4-element Array{Array{Int64,1},1}:
[1, 2]
[3, 4]
[5, 6]
[7, 8]
```
"""
unstack(xs, dim) = [copy(selectdim(xs, dim, i)) for i in 1:size(xs, dim)]
"""
@ -23,9 +125,16 @@ unstack(xs, dim) = [copy(selectdim(xs, dim, i)) for i in 1:size(xs, dim)]
Split `xs` into `n` parts.
```julia
julia> chunk(1:10, 3)
3-element Array{Array{Int64,1},1}:
# Examples
```jldoctest
julia> Flux.chunk(1:10, 3)
3-element Array{UnitRange{Int64},1}:
1:4
5:8
9:10
julia> Flux.chunk(collect(1:10), 3)
3-element Array{SubArray{Int64,1,Array{Int64,1},Tuple{UnitRange{Int64}},true},1}:
[1, 2, 3, 4]
[5, 6, 7, 8]
[9, 10]
@ -40,11 +149,12 @@ batchindex(xs, i) = (reverse(Base.tail(reverse(axes(xs))))..., i)
Count the number of times that each element of `xs` appears.
```julia
julia> frequencies(['a','b','b'])
# Examples
```jldoctest
julia> Flux.frequencies(['a','b','b'])
Dict{Char,Int64} with 2 entries:
'b' => 2
'a' => 1
'b' => 2
```
"""
function frequencies(xs)
@ -60,12 +170,13 @@ head(x::Tuple) = reverse(Base.tail(reverse(x)))
squeezebatch(x) = reshape(x, head(size(x)))
"""
batch(xs)
batch(xs)
Batch the arrays in `xs` into a single array.
```julia
julia> batch([[1,2,3],[4,5,6]])
# Examples
```jldoctest
julia> Flux.batch([[1,2,3],[4,5,6]])
3×2 Array{Int64,2}:
1 4
2 5
@ -82,6 +193,25 @@ function batch(xs)
return data
end
"""
Return the given sequence padded with `p` up to a maximum length of `n`.
# Examples
```jldoctest
julia> rpad([1, 2], 4, 0)
4-element Array{Int64,1}:
1
2
0
0
julia> rpad([1, 2, 3], 2, 0)
3-element Array{Int64,1}:
1
2
3
```
"""
Base.rpad(v::AbstractVector, n::Integer, p) = [v; fill(p, max(n - length(v), 0))]
"""
@ -90,8 +220,9 @@ Base.rpad(v::AbstractVector, n::Integer, p) = [v; fill(p, max(n - length(v), 0))
Take a list of `N` sequences, and turn them into a single sequence where each
item is a batch of `N`. Short sequences will be padded by `pad`.
```julia
julia> batchseq([[1, 2, 3], [4, 5]], 0)
# Examples
```jldoctest
julia> Flux.batchseq([[1, 2, 3], [4, 5]], 0)
3-element Array{Array{Int64,1},1}:
[1, 4]
[2, 5]
@ -115,6 +246,10 @@ function _restructure(m, xs)
end
end
@adjoint function _restructure(m, xs)
_restructure(m, xs), dm -> (nothing,destructure(dm)[1])
end
"""
destructure(m)
@ -148,11 +283,15 @@ end
# Other
"""
Returns a function that when invoked, will only be triggered at most once
during `timeout` seconds. Normally, the throttled function will run
as much as it can, without ever going more than once per `wait` duration;
but if you'd like to disable the execution on the leading edge, pass
`leading=false`. To enable execution on the trailing edge, ditto.
throttle(f, timeout; leading=true, trailing=false)
Return a function that when invoked, will only be triggered at most once
during `timeout` seconds.
Normally, the throttled function will run as much as it can, without ever
going more than once per `wait` duration; but if you'd like to disable the
execution on the leading edge, pass `leading=false`. To enable execution on
the trailing edge, pass `trailing=true`.
"""
function throttle(f, timeout; leading=true, trailing=false)
cooldown = true

106
src/zeros.jl Normal file
View File

@ -0,0 +1,106 @@
import Base: +, -, *, reshape, size
import Base.Broadcast: broadcasted, Broadcasted, BroadcastStyle
"""
Zeros()
Zeros(size...)
Zeros(Type, size...)
Acts as a stand-in for an array of zeros that can be
used during training which is ignored by the optimisers.
Useful to turn bias off for a forward pass of a layer.
## Examples
```julia
julia> Flux.Zeros(3,3)
3×3 Flux.Zeros{Bool,2}:
false false false
false false false
false false false
julia> Flux.Zeros(Float32, 3,3)
3×3 Flux.Zeros{Float32,2}:
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
julia> rand(3,3) .+ Flux.Zeros()
3×3 Array{Float64,2}:
0.198739 0.490459 0.785386
0.779074 0.39986 0.66383
0.854981 0.447292 0.314497
julia> bias_less_conv = Conv((2,2), 1=>3, bias = Flux.Zeros())
Conv((2, 2), 1=>3)
```
"""
struct Zeros{T,N} <: AbstractArray{T,N}
size::Tuple
end
Zeros(::Type{T}, sz...) where T = Zeros{T,length(sz)}(sz)
Zeros(sz::Integer...) = Zeros(Bool, sz...)
Base.size(xs::Zeros) = xs.size
Base.axes(xs::Zeros) = Base.OneTo.(size(xs))
Base.IndexStyle(::Type{<:Zeros}) = IndexLinear()
Base.getindex(xs::Zeros{T,N}, I::Int) where {T,N} = zero(T)
Base.getindex(xs::Zeros{T,N}, inds::Union{Base.OneTo, Base.UnitRange}) where {T,N} =
Zeros(T, length(inds))
Base.collect(xs::Zeros{T,N}) where {T,N} = fill(zero(T), size(xs))
@adjoint reshape(xs::Zeros{T}, dims...) where T =
reshape(xs, dims...), _ -> nothing
# Define basic ops
for f in (:+, :-)
@eval @inline function $f(a::Union{AbstractArray{<:Number}, Zeros}, b::Zeros)
@assert size(a) == size(b) throw(DimensionMismatch("dimensions must match"))
a
end
end
+(a::Zeros, b::AbstractArray) = b + a
-(a::Zeros, b::AbstractArray) = -b + a
Base.copy(xs::Zeros{T,N}) where {T,N} = xs
# Define broadcasting behaviour
for op in (:+, :-)
@eval function broadcasted(::typeof($op), a::AbstractArray, b::Zeros)
bs = Broadcast.broadcast_shape(size(a), size(b))
size(a) == bs && return a
sz = similar(a, bs)
sz .= a
end
end
broadcasted(::typeof(+), a::Zeros, b::AbstractArray) = broadcasted(+, b, a)
broadcasted(::typeof(-), a::Zeros, b::AbstractArray) = broadcasted(+, -b, a)
function broadcasted(::typeof(*), a::AbstractArray, b::Zeros)
Zeros(Broadcast.broadcast_shape(size(a), size(b))...)
end
broadcasted(::typeof(*), a::Zeros, b::AbstractArray) = broadcasted(*, b, a)
for op in (:+, :-, :*)
@eval broadcasted(::typeof($op), a::Zeros, b::Zeros) = Zeros(Broadcast.broadcast_shape(size(a), size(b))...)
end
# Some opportunities to avoid scalar indexing, intermediaries
# Since it replicates a little of what we expect Base to do,
# it should be possible to remove in the future, but for now,
# these help with performance.
broadcasted(::typeof(+), a::AbstractArray, b::Zeros{T,0}) where T = a
broadcasted(::typeof(+), a::Zeros{T,0}, b::AbstractArray) where T = b
broadcasted(::typeof(-), a::AbstractArray, b::Zeros{T,0}) where T = a
broadcasted(::typeof(-), a::Zeros{T,0}, b::AbstractArray) where T = -b
broadcasted(::typeof(*), a::AbstractArray, b::Zeros{T,0}) where T = zero(a)
broadcasted(::typeof(*), a::Zeros{T,0}, b::AbstractArray) where T = zero(b)
broadcasted(::typeof(/), a::Zeros{T,0}, b::AbstractArray) where T = zero(b)

View File

@ -25,7 +25,7 @@ cm = gpu(m)
@test all(p isa CuArray for p in params(cm))
@test cm(gpu(rand(10, 10))) isa CuArray{Float32,2}
x = [1,2,3]
x = [1.,2.,3.]
cx = gpu(x)
@test Flux.crossentropy(x,x) Flux.crossentropy(cx,cx)
@test Flux.crossentropy(x,x, weight=1.0) Flux.crossentropy(cx,cx, weight=1.0)
@ -33,8 +33,8 @@ cx = gpu(x)
x = [-1.1491, 0.8619, 0.3127]
y = [1, 1, 0.]
@test Flux.binarycrossentropy.(σ.(x),y) Flux.binarycrossentropy.(cu(σ.(x)),cu(y))
@test Flux.logitbinarycrossentropy.(x,y) Flux.logitbinarycrossentropy.(cu(x),cu(y))
@test Flux.binarycrossentropy.(σ.(x),y) Array(Flux.binarycrossentropy.(cu(σ.(x)),cu(y)))
@test Flux.logitbinarycrossentropy.(x,y) Array(Flux.logitbinarycrossentropy.(cu(x),cu(y)))
xs = rand(5, 5)
ys = Flux.onehotbatch(1:5,1:5)
@ -58,10 +58,18 @@ end
@test y[3,:] isa CuArray
end
@testset "restructure gpu" begin
dudt = Dense(1,1) |> gpu
p,re = Flux.destructure(dudt)
foo(x) = sum(re(p)(x))
@test gradient(foo, cu(rand(1)))[1] isa CuArray
end
if CuArrays.has_cudnn()
@info "Testing Flux/CUDNN"
include("cudnn.jl")
include("curnn.jl")
include("layers.jl")
else
@warn "CUDNN unavailable, not testing GPU DNN support"
end

98
test/cuda/layers.jl Normal file
View File

@ -0,0 +1,98 @@
# Test layers and data/model movements on and off the GPU
# Add tests for layers and their gradients on the GPU
# Most of the forward passes should be fine being applied
# to bitstype objects, but this gives higher coverage for our use-cases
# Check that getting the gradients does not throw
# generic movement tests
@testset "Basic GPU Movement" begin
@test gradient(x -> sum(gpu(x)), rand(3,3)) isa Tuple
@test gradient(x -> sum(cpu(x)), gpu(rand(3,3))) isa Tuple
end
# TODO: These layers get into scalar indexing
# `AlphaDropout` throws a compilation error on GPUs,
# whereas, the rest are scalar indexing issues.
const BROKEN_LAYERS = [DepthwiseConv,
AlphaDropout,
InstanceNorm,
GroupNorm]
function gradtest(name::String, layers::Vector, xs = nothing, args...)
isnothing(xs) && error("Missing input to test the layers against.")
@testset "$name GPU grad tests" begin
for layer in layers
@testset "$layer GPU grad test" begin
l = gpu(layer(args...))
xs = gpu(xs)
if any(x -> isa(l, x), BROKEN_LAYERS)
ps = Flux.params(l)
@test_broken gradient(() -> sum(l(xs)), ps) isa Flux.Zygote.Grads
else
ps = Flux.params(l)
@test gradient(() -> sum(l(xs)), ps) isa Flux.Zygote.Grads
gs = gradient(() -> sum(l(xs)), ps)
# Handle pooling layers
if !isempty(ps)
@test gs[first(ps)] isa Flux.CuArrays.CuArray
end
end
end
end
end
end
# Repeats from Conv, CrossCor
r = rand(Float32, 28, 28, 1, 1)
conv_layers = [Conv, ConvTranspose, CrossCor, DepthwiseConv]
gradtest("Conv", conv_layers, r, (2,2), 1=>3)
pooling_layers = [MaxPool, MeanPool]
gradtest("Pooling", pooling_layers, r, (2,2))
dropout_layers = [Dropout, AlphaDropout]
gradtest("Dropout", dropout_layers, r, 0.5f0)
norm_layers = [LayerNorm, BatchNorm]
gradtest("Normalising", norm_layers, rand(Float32, 28,28,3,1), 1)
instancenorm = [InstanceNorm]
gradtest("InstanceNorm", instancenorm, r, 1)
groupnorm = [GroupNorm]
gradtest("GroupNorm", groupnorm, rand(Float32, 28,28,3,1), 3, 1)
const stateless_layers = [Flux.mse,
Flux.crossentropy,
Flux.logitcrossentropy,
Flux.normalise]
const stateless_layers_broadcasted = [Flux.binarycrossentropy,
Flux.logitbinarycrossentropy]
function stateless_gradtest(f, args...)
@test gradient((args...) -> sum(f(args...)), args...)[1] isa CuArray
end
function stateless_gradtest_broadcasted(f, args...)
@test gradient((args...) -> sum(f.(args...)), args...)[1] isa CuArray
end
@testset "Stateless GPU grad tests" begin
x = gpu(rand(3,3))
y = gpu(rand(3,3))
for layer in stateless_layers
if layer == Flux.normalise
stateless_gradtest(layer, x)
else
stateless_gradtest(layer, x, y)
end
end
for layer in stateless_layers_broadcasted
stateless_gradtest_broadcasted(layer, x, y)
end
end

View File

@ -1,22 +1,116 @@
using Flux.Data
using Test
@testset "DataLoader" begin
X = reshape([1:10;], (2, 5))
Y = [1:5;]
@test cmudict()["CATASTROPHE"] == :[K,AH0,T,AE1,S,T,R,AH0,F,IY0].args
d = DataLoader(X, batchsize=2)
@inferred first(d)
batches = collect(d)
@test eltype(batches) == eltype(d) == typeof(X)
@test length(batches) == 3
@test batches[1] == X[:,1:2]
@test batches[2] == X[:,3:4]
@test batches[3] == X[:,5:5]
@test length(CMUDict.phones()) == 39
d = DataLoader(X, batchsize=2, partial=false)
@inferred first(d)
batches = collect(d)
@test eltype(batches) == eltype(d) == typeof(X)
@test length(batches) == 2
@test batches[1] == X[:,1:2]
@test batches[2] == X[:,3:4]
@test length(CMUDict.symbols()) == 84
d = DataLoader((X,), batchsize=2, partial=false)
@inferred first(d)
batches = collect(d)
@test eltype(batches) == eltype(d) == Tuple{typeof(X)}
@test length(batches) == 2
@test batches[1] == (X[:,1:2],)
@test batches[2] == (X[:,3:4],)
@test MNIST.images()[1] isa Matrix
@test MNIST.labels() isa Vector{Int64}
d = DataLoader((X, Y), batchsize=2)
@inferred first(d)
batches = collect(d)
@test eltype(batches) == eltype(d) == Tuple{typeof(X), typeof(Y)}
@test length(batches) == 3
@test length(batches[1]) == 2
@test length(batches[2]) == 2
@test length(batches[3]) == 2
@test batches[1][1] == X[:,1:2]
@test batches[1][2] == Y[1:2]
@test batches[2][1] == X[:,3:4]
@test batches[2][2] == Y[3:4]
@test batches[3][1] == X[:,5:5]
@test batches[3][2] == Y[5:5]
@test FashionMNIST.images()[1] isa Matrix
@test FashionMNIST.labels() isa Vector{Int64}
# test with NamedTuple
d = DataLoader((x=X, y=Y), batchsize=2)
@inferred first(d)
batches = collect(d)
@test eltype(batches) == eltype(d) == NamedTuple{(:x, :y), Tuple{typeof(X), typeof(Y)}}
@test length(batches) == 3
@test length(batches[1]) == 2
@test length(batches[2]) == 2
@test length(batches[3]) == 2
@test batches[1][1] == batches[1].x == X[:,1:2]
@test batches[1][2] == batches[1].y == Y[1:2]
@test batches[2][1] == batches[2].x == X[:,3:4]
@test batches[2][2] == batches[2].y == Y[3:4]
@test batches[3][1] == batches[3].x == X[:,5:5]
@test batches[3][2] == batches[3].y == Y[5:5]
@test Data.Sentiment.train() isa Vector{Data.Tree{Any}}
# test interaction with `train!`
θ = ones(2)
X = zeros(2, 10)
loss(x) = sum((x .- θ).^2)
d = DataLoader(X)
Flux.train!(loss, [θ], ncycle(d, 10), Descent(0.1))
@test norm(θ) < 1e-4
@test Iris.features() isa Matrix
@test size(Iris.features()) == (4,150)
# test interaction with `train!`
θ = zeros(2)
X = ones(2, 10)
Y = fill(2, 10)
loss(x, y) = sum((y - x'*θ).^2)
d = DataLoader((X, Y))
Flux.train!(loss, [θ], ncycle(d, 10), Descent(0.1))
@test norm(θ .- 1) < 1e-10
end
@test Iris.labels() isa Vector{String}
@test size(Iris.labels()) == (150,)
@testset "CMUDict" begin
@test cmudict()["CATASTROPHE"] == :[K,AH0,T,AE1,S,T,R,AH0,F,IY0].args
@test length(CMUDict.phones()) == 39
@test length(CMUDict.symbols()) == 84
end
@testset "MNIST" begin
@test MNIST.images()[1] isa Matrix
@test MNIST.labels() isa Vector{Int64}
end
@testset "FashionMNIST" begin
@test FashionMNIST.images()[1] isa Matrix
@test FashionMNIST.labels() isa Vector{Int64}
end
@testset "Sentiment" begin
@test Data.Sentiment.train() isa Vector{Data.Tree{Any}}
end
@testset "Iris" begin
@test Iris.features() isa Matrix
@test size(Iris.features()) == (4,150)
@test Iris.labels() isa Vector{String}
@test size(Iris.labels()) == (150,)
end
@testset "Housing" begin
@test Housing.features() isa Matrix # test broken due to SSL certifate expiration problem
@test size(Housing.features()) == (506, 13)
@test Housing.targets() isa Array{Float64}
@test size(Housing.targets()) == (506, 1)
end

View File

@ -28,6 +28,14 @@ import Flux: activations
end
@testset "Dense" begin
@testset "constructors" begin
@test size(Dense(10, 100).W) == (100, 10)
@test Dense(rand(100,10), rand(10)).σ == identity
@test_throws MethodError Dense(10, 10.5)
@test_throws MethodError Dense(10, 10.5, tanh)
end
@test length(Dense(10, 5)(randn(10))) == 5
@test_throws DimensionMismatch Dense(10, 5)(randn(1))
@test_throws MethodError Dense(10, 5)(1) # avoid broadcasting
@ -37,7 +45,6 @@ import Flux: activations
@test Dense(10, 1, identity, initW = ones, initb = zeros)(ones(10,2)) == 10*ones(1, 2)
@test Dense(10, 2, identity, initW = ones, initb = zeros)(ones(10,1)) == 10*ones(2, 1)
@test Dense(10, 2, identity, initW = ones, initb = zeros)([ones(10,1) 2*ones(10,1)]) == [10 20; 10 20]
end
@testset "Diagonal" begin
@ -92,4 +99,19 @@ import Flux: activations
@test size(SkipConnection(Dense(10,10), (a,b) -> cat(a, b, dims = 2))(input)) == (10,4)
end
end
@testset "output dimensions" begin
m = Chain(Conv((3, 3), 3 => 16), Conv((3, 3), 16 => 32))
@test Flux.outdims(m, (10, 10)) == (6, 6)
m = Dense(10, 5)
@test Flux.outdims(m, (5, 2)) == (5,)
@test Flux.outdims(m, (10,)) == (5,)
m = Flux.Diagonal(10)
@test Flux.outdims(m, (10,)) == (10,)
m = Maxout(() -> Conv((3, 3), 3 => 16), 2)
@test Flux.outdims(m, (10, 10)) == (8, 8)
end
end

View File

@ -4,6 +4,10 @@ using Flux: gradient
@testset "Pooling" begin
x = randn(Float32, 10, 10, 3, 2)
gmp = GlobalMaxPool()
@test size(gmp(x)) == (1, 1, 3, 2)
gmp = GlobalMeanPool()
@test size(gmp(x)) == (1, 1, 3, 2)
mp = MaxPool((2, 2))
@test mp(x) == maxpool(x, PoolDims(x, 2))
mp = MeanPool((2, 2))
@ -21,6 +25,35 @@ end
Dense(288, 10), softmax)
@test size(m(r)) == (10, 5)
# Test bias switch
bias = Conv(ones(Float32, 2, 2, 1, 3), ones(Float32, 3))
ip = zeros(Float32, 28,28,1,1)
op = bias(ip)
@test sum(op) == prod(size(op))
bias = Conv((2,2), 1=>3, bias = Flux.Zeros())
op = bias(ip)
@test sum(op) === 0.f0
gs = gradient(() -> sum(bias(ip)), Flux.params(bias))
@test gs[bias.bias] == nothing
# Train w/o bias and make sure no convergence happens
# when only bias can be converged
bias = Conv((2, 2), 1=>3, bias = Flux.Zeros());
ip = zeros(Float32, 28,28,1,1)
op = zeros(Float32, 27,27,3,1) .+ 2.f0
opt = Descent()
for _ = 1:10^3
gs = gradient(params(bias)) do
Flux.mse(bias(ip), op)
end
Flux.Optimise.update!(opt, params(bias), gs)
end
@test Flux.mse(bias(ip), op) 4.f0
end
@testset "asymmetric padding" begin
@ -107,3 +140,79 @@ end
true
end
end
@testset "conv output dimensions" begin
m = Conv((3, 3), 3 => 16)
@test Flux.outdims(m, (10, 10)) == (8, 8)
m = Conv((3, 3), 3 => 16; stride = 2)
@test Flux.outdims(m, (5, 5)) == (2, 2)
m = Conv((3, 3), 3 => 16; stride = 2, pad = 3)
@test Flux.outdims(m, (5, 5)) == (5, 5)
m = Conv((3, 3), 3 => 16; stride = 2, pad = 3, dilation = 2)
@test Flux.outdims(m, (5, 5)) == (4, 4)
m = ConvTranspose((3, 3), 3 => 16)
@test Flux.outdims(m, (8, 8)) == (10, 10)
m = ConvTranspose((3, 3), 3 => 16; stride = 2)
@test Flux.outdims(m, (2, 2)) == (5, 5)
m = ConvTranspose((3, 3), 3 => 16; stride = 2, pad = 3)
@test Flux.outdims(m, (5, 5)) == (5, 5)
m = ConvTranspose((3, 3), 3 => 16; stride = 2, pad = 3, dilation = 2)
@test Flux.outdims(m, (4, 4)) == (5, 5)
m = DepthwiseConv((3, 3), 3 => 6)
@test Flux.outdims(m, (10, 10)) == (8, 8)
m = DepthwiseConv((3, 3), 3 => 6; stride = 2)
@test Flux.outdims(m, (5, 5)) == (2, 2)
m = DepthwiseConv((3, 3), 3 => 6; stride = 2, pad = 3)
@test Flux.outdims(m, (5, 5)) == (5, 5)
m = DepthwiseConv((3, 3), 3 => 6; stride = 2, pad = 3, dilation = 2)
@test Flux.outdims(m, (5, 5)) == (4, 4)
m = CrossCor((3, 3), 3 => 16)
@test Flux.outdims(m, (10, 10)) == (8, 8)
m = CrossCor((3, 3), 3 => 16; stride = 2)
@test Flux.outdims(m, (5, 5)) == (2, 2)
m = CrossCor((3, 3), 3 => 16; stride = 2, pad = 3)
@test Flux.outdims(m, (5, 5)) == (5, 5)
m = CrossCor((3, 3), 3 => 16; stride = 2, pad = 3, dilation = 2)
@test Flux.outdims(m, (5, 5)) == (4, 4)
m = MaxPool((2, 2))
@test Flux.outdims(m, (10, 10)) == (5, 5)
m = MaxPool((2, 2); stride = 1)
@test Flux.outdims(m, (5, 5)) == (4, 4)
m = MaxPool((2, 2); stride = 2, pad = 3)
@test Flux.outdims(m, (5, 5)) == (5, 5)
m = MeanPool((2, 2))
@test Flux.outdims(m, (10, 10)) == (5, 5)
m = MeanPool((2, 2); stride = 1)
@test Flux.outdims(m, (5, 5)) == (4, 4)
m = MeanPool((2, 2); stride = 2, pad = 3)
@test Flux.outdims(m, (5, 5)) == (5, 5)
end
@testset "$ltype SamePad kernelsize $k" for ltype in (Conv, ConvTranspose, DepthwiseConv, CrossCor), k in ( (1,), (2,), (3,), (4,5), (6,7,8))
data = ones(Float32, (k .+ 3)..., 1,1)
l = ltype(k, 1=>1, pad=SamePad())
@test size(l(data)) == size(data)
l = ltype(k, 1=>1, pad=SamePad(), dilation = k 2)
@test size(l(data)) == size(data)
stride = 3
l = ltype(k, 1=>1, pad=SamePad(), stride = stride)
if ltype == ConvTranspose
@test size(l(data))[1:end-2] == stride .* size(data)[1:end-2] .- stride .+ 1
else
@test size(l(data))[1:end-2] == ceil.(Int, size(data)[1:end-2] ./ stride)
end
end
@testset "$ltype SamePad windowsize $k" for ltype in (MeanPool, MaxPool), k in ( (1,), (2,), (3,), (4,5), (6,7,8))
data = ones(Float32, (k .+ 3)..., 1,1)
l = ltype(k, pad=SamePad())
@test size(l(data))[1:end-2] == ceil.(Int, size(data)[1:end-2] ./ k)
end

View File

@ -1,30 +1,32 @@
using Flux, Test, Statistics
using Zygote: pullback
trainmode(f, x...) = pullback(f, x...)[1]
trainmode(f) = (x...) -> trainmode(f, x...)
evalwgrad(f, x...) = pullback(f, x...)[1]
@testset "Dropout" begin
x = [1.,2.,3.]
@test x == Dropout(0.1)(x)
@test x == trainmode(Dropout(0), x)
@test zero(x) == trainmode(Dropout(1), x)
@test x == evalwgrad(Dropout(0), x)
@test zero(x) == evalwgrad(Dropout(1), x)
x = rand(100)
m = Dropout(0.9)
y = trainmode(m, x)
y = evalwgrad(m, x)
@test count(a->a==0, y) > 50
y = m(x)
testmode!(m, true)
y = evalwgrad(m, x) # should override istraining
@test count(a->a==0, y) == 0
y = trainmode(m, x)
testmode!(m, false)
y = evalwgrad(m, x)
@test count(a->a==0, y) > 50
x = rand(Float32, 100)
m = Chain(Dense(100,100),
Dropout(0.9))
y = trainmode(m, x)
y = evalwgrad(m, x)
@test count(a->a == 0, y) > 50
y = m(x)
testmode!(m, true)
y = evalwgrad(m, x) # should override istraining
@test count(a->a == 0, y) == 0
x = rand(100, 50)
@ -49,7 +51,7 @@ end
# initial m.σ is 1
# initial m.μ is 0
y = trainmode(m, x)
y = evalwgrad(m, x)
@test isapprox(y, [-1.22474 0 1.22474; -1.22474 0 1.22474], atol = 1.0e-5)
# julia> x
# 2×3 Array{Float64,2}:
@ -82,19 +84,19 @@ end
@test isapprox(y, sigmoid.((x .- m.μ) ./ sqrt.(m.σ² .+ m.ϵ)), atol = 1.0e-7)
end
let m = trainmode(BatchNorm(2)), x = reshape(Float32.(1:6), 3, 2, 1)
let m = trainmode!(BatchNorm(2)), x = reshape(Float32.(1:6), 3, 2, 1)
y = reshape(permutedims(x, [2, 1, 3]), 2, :)
y = permutedims(reshape(m(y), 2, 3, 1), [2, 1, 3])
@test m(x) == y
end
let m = trainmode(BatchNorm(2)), x = reshape(Float32.(1:12), 2, 3, 2, 1)
let m = trainmode!(BatchNorm(2)), x = reshape(Float32.(1:12), 2, 3, 2, 1)
y = reshape(permutedims(x, [3, 1, 2, 4]), 2, :)
y = permutedims(reshape(m(y), 2, 2, 3, 1), [2, 3, 1, 4])
@test m(x) == y
end
let m = trainmode(BatchNorm(2)), x = reshape(Float32.(1:24), 2, 2, 3, 2, 1)
let m = trainmode!(BatchNorm(2)), x = reshape(Float32.(1:24), 2, 2, 3, 2, 1)
y = reshape(permutedims(x, [4, 1, 2, 3, 5]), 2, :)
y = permutedims(reshape(m(y), 2, 2, 2, 3, 1), [2, 3, 4, 1, 5])
@test m(x) == y
@ -117,7 +119,7 @@ end
x = Float64.(x)
@test m.β == [0, 0] # initβ(2)
@test m.γ == [1, 1] # initγ(2)
y = trainmode(m, x)
y = evalwgrad(m, x)
#julia> x
#[:, :, 1] =
@ -162,7 +164,7 @@ end
@test isapprox(y, sigmoid.((x .- expand_inst(m.μ, affine_shape)) ./ sqrt.(expand_inst(m.σ², affine_shape) .+ m.ϵ)), atol = 1.0e-7)
end
let m = trainmode(InstanceNorm(2)), sizes = (2, 4, 1, 2, 3),
let m = trainmode!(InstanceNorm(2)), sizes = (2, 4, 1, 2, 3),
x = Float32.(reshape(collect(1:prod(sizes)), sizes))
y = reshape(permutedims(x, [3, 1, 2, 4, 5]), :, 2, 3)
y = reshape(m(y), sizes...)
@ -172,14 +174,14 @@ end
# check that μ, σ², and the output are the correct size for higher rank tensors
let m = InstanceNorm(2), sizes = (5, 5, 3, 4, 2, 6),
x = reshape(Float32.(collect(1:prod(sizes))), sizes)
y = trainmode(m, x)
y = evalwgrad(m, x)
@test size(m.μ) == (sizes[end - 1], )
@test size(m.σ²) == (sizes[end - 1], )
@test size(y) == sizes
end
# show that instance norm is equal to batch norm when channel and batch dims are squashed
let m_inorm = trainmode(InstanceNorm(2)), m_bnorm = trainmode(BatchNorm(12)), sizes = (5, 5, 3, 4, 2, 6),
let m_inorm = trainmode!(InstanceNorm(2)), m_bnorm = trainmode!(BatchNorm(12)), sizes = (5, 5, 3, 4, 2, 6),
x = reshape(Float32.(collect(1:prod(sizes))), sizes)
@test m_inorm(x) == reshape(m_bnorm(reshape(x, (sizes[1:end - 2]..., :, 1))), sizes)
end
@ -204,7 +206,7 @@ if VERSION >= v"1.1"
@test m.β == [0, 0, 0, 0] # initβ(32)
@test m.γ == [1, 1, 1, 1] # initγ(32)
y = trainmode(m, x)
y = evalwgrad(m, x)
#julia> x
#[:, :, 1] =
@ -263,7 +265,7 @@ if VERSION >= v"1.1"
@test isapprox(y, out, atol = 1.0e-7)
end
let m = trainmode(GroupNorm(2,2)), sizes = (2, 4, 1, 2, 3),
let m = trainmode!(GroupNorm(2,2)), sizes = (2, 4, 1, 2, 3),
x = Float32.(reshape(collect(1:prod(sizes)), sizes))
y = reshape(permutedims(x, [3, 1, 2, 4, 5]), :, 2, 3)
y = reshape(m(y), sizes...)
@ -273,20 +275,20 @@ if VERSION >= v"1.1"
# check that μ, σ², and the output are the correct size for higher rank tensors
let m = GroupNorm(4,2), sizes = (5, 5, 3, 4, 4, 6),
x = Float32.(reshape(collect(1:prod(sizes)), sizes))
y = trainmode(m, x)
y = evalwgrad(m, x)
@test size(m.μ) == (m.G,1)
@test size(m.σ²) == (m.G,1)
@test size(y) == sizes
end
# show that group norm is the same as instance norm when the group size is the same as the number of channels
let IN = trainmode(InstanceNorm(4)), GN = trainmode(GroupNorm(4,4)), sizes = (2,2,3,4,5),
let IN = trainmode!(InstanceNorm(4)), GN = trainmode!(GroupNorm(4,4)), sizes = (2,2,3,4,5),
x = Float32.(reshape(collect(1:prod(sizes)), sizes))
@test IN(x) GN(x)
end
# show that group norm is the same as batch norm for a group of size 1 and batch of size 1
let BN = trainmode(BatchNorm(4)), GN = trainmode(GroupNorm(4,4)), sizes = (2,2,3,4,1),
let BN = trainmode!(BatchNorm(4)), GN = trainmode!(GroupNorm(4,4)), sizes = (2,2,3,4,1),
x = Float32.(reshape(collect(1:prod(sizes)), sizes))
@test BN(x) GN(x)
end

View File

@ -1,9 +1,26 @@
using Test
using Flux: onehotbatch, mse, crossentropy, logitcrossentropy,
σ, binarycrossentropy, logitbinarycrossentropy
σ, binarycrossentropy, logitbinarycrossentropy, flatten,
xlogx, xlogy
const ϵ = 1e-7
@testset "xlogx & xlogy" begin
@test iszero(xlogx(0))
@test isnan(xlogx(NaN))
@test xlogx(2) 2.0 * log(2.0)
@inferred xlogx(2)
@inferred xlogx(0)
@test iszero(xlogy(0, 1))
@test isnan(xlogy(NaN, 1))
@test isnan(xlogy(1, NaN))
@test isnan(xlogy(NaN, NaN))
@test xlogy(2, 3) 2.0 * log(3.0)
@inferred xlogy(2, 3)
@inferred xlogy(0, 1)
end
@testset "losses" begin
# First, regression-style y's
y = [1, 1, 0, 0]
@ -13,6 +30,20 @@ const ϵ = 1e-7
@test mse(ŷ, y) (.1^2 + .9^2)/2
end
@testset "mae" begin
@test Flux.mae(ŷ, y) 1/2
end
@testset "huber_loss" begin
@test Flux.huber_loss(ŷ, y) 0.20500000000000002
end
y = [123.0,456.0,789.0]
ŷ = [345.0,332.0,789.0]
@testset "msle" begin
@test Flux.msle(ŷ, y) 0.38813985859136585
end
# Now onehot y's
y = onehotbatch([1, 1, 0, 0], 0:1)
ŷ = [.1 .9; .9 .1; .9 .1; .1 .9]'
@ -21,6 +52,7 @@ const ϵ = 1e-7
lossvalue = 1.203972804325936
@testset "crossentropy" begin
@test crossentropy([0.1,0.0,0.9], [0.1,0.0,0.9]) crossentropy([0.1,0.9], [0.1,0.9])
@test crossentropy(ŷ, y) lossvalue
end
@ -50,11 +82,52 @@ const ϵ = 1e-7
@test logitbinarycrossentropy.(logŷ, y) binarycrossentropy.(σ.(logŷ), y; ϵ=0)
end
y = [1 2 3]
ŷ = [4.0 5.0 6.0]
@testset "kldivergence" begin
@test Flux.kldivergence([0.1,0.0,0.9], [0.1,0.0,0.9]) Flux.kldivergence([0.1,0.9], [0.1,0.9])
@test Flux.kldivergence(ŷ, y) -1.7661057888493457
@test Flux.kldivergence(y, y) 0
end
y = [1 2 3 4]
ŷ = [5.0 6.0 7.0 8.0]
@testset "hinge" begin
@test Flux.hinge(ŷ, y) 0
@test Flux.hinge(y, 0.5 .* y) 0.125
end
@testset "squared_hinge" begin
@test Flux.squared_hinge(ŷ, y) 0
@test Flux.squared_hinge(y, 0.5 .* y) 0.0625
end
y = [0.1 0.2 0.3]
ŷ = [0.4 0.5 0.6]
@testset "poisson" begin
@test Flux.poisson(ŷ, y) 0.6278353988097339
@test Flux.poisson(y, y) 0.5044459776946685
end
y = [1.0 0.5 0.3 2.4]
ŷ = [0 1.4 0.5 1.2]
@testset "dice_coeff_loss" begin
@test Flux.dice_coeff_loss(ŷ, y) 0.2799999999999999
@test Flux.dice_coeff_loss(y, y) 0.0
end
@testset "tversky_loss" begin
@test Flux.tversky_loss(ŷ, y) -0.06772009029345383
@test Flux.tversky_loss(ŷ, y, β = 0.8) -0.09490740740740744
@test Flux.tversky_loss(y, y) -0.5576923076923075
end
@testset "no spurious promotions" begin
for T in (Float32, Float64)
y = rand(T, 2)
ŷ = rand(T, 2)
for f in (mse, crossentropy, logitcrossentropy)
for f in (mse, crossentropy, logitcrossentropy, Flux.kldivergence, Flux.hinge, Flux.poisson,
Flux.mae, Flux.huber_loss, Flux.msle, Flux.squared_hinge, Flux.dice_coeff_loss, Flux.tversky_loss)
fwd, back = Flux.pullback(f, , y)
@test fwd isa T
@test eltype(back(one(T))[1]) == T
@ -62,3 +135,10 @@ const ϵ = 1e-7
end
end
end
@testset "helpers" begin
@testset "flatten" begin
x = randn(Float32, 10, 10, 3, 2)
@test size(flatten(x)) == (300, 2)
end
end

View File

@ -57,35 +57,57 @@ end
end
@testset "ExpDecay" begin
w = randn(10, 10)
o = ExpDecay(0.1, 0.1, 1000, 1e-4)
w1 = randn(10,10)
loss(x) = Flux.mse(w*x, w1*x)
flag = 1
decay_steps = []
for t = 1:10^5
prev_eta = o.eta
θ = Params([w1])
x = rand(10)
θ̄ = gradient(() -> loss(x), θ)
prev_grad = collect(θ̄[w1])
delta = Optimise.apply!(o, w1, θ̄[w1])
w1 .-= delta
new_eta = o.eta
if new_eta != prev_eta
push!(decay_steps, t)
end
array = fill(o.eta, size(prev_grad))
if array .* prev_grad != delta
flag = 0
end
@testset "Sanity Check" begin
o = ExpDecay(0.2, 0.5, 1, 1e-3)
p = [0.0]
steps = 1:8
eta_expected = @. max(o.eta * 0.5 ^ steps, o.clip)
eta_actual = [Optimise.apply!(o, p, [1.0])[1] for _ in steps]
@test eta_actual == eta_expected
end
w = randn(10, 10)
o = ExpDecay(0.1, 0.1, 1000, 1e-4)
w1 = randn(10,10)
loss(x) = Flux.mse(w*x, w1*x)
flag = 1
decay_steps = []
for t = 1:10^5
prev_eta = o.eta
θ = Params([w1])
x = rand(10)
θ̄ = gradient(() -> loss(x), θ)
prev_grad = collect(θ̄[w1])
delta = Optimise.apply!(o, w1, θ̄[w1])
w1 .-= delta
new_eta = o.eta
if new_eta != prev_eta
push!(decay_steps, t)
end
@test flag == 1
# Test to check if decay happens at decay steps. Eta reaches clip value eventually.
ground_truth = []
for i in 1:11
push!(ground_truth, 1000*i) # Expected decay steps for this example.
array = fill(o.eta, size(prev_grad))
if array .* prev_grad != delta
flag = 0
end
@test decay_steps == ground_truth
@test o.eta == o.clip
end
@test flag == 1
# Test to check if decay happens at decay steps. Eta reaches clip value (1e-4) after 4000 steps (decay by 0.1 every 1000 steps starting at 0.1).
ground_truth = []
for i in 1:4
push!(ground_truth, 1000*i) # Expected decay steps for this example.
end
@test decay_steps == ground_truth
@test o.eta == o.clip
end
@testset "Clipping" begin
w = randn(10, 10)
loss(x) = sum(w * x)
θ = Params([w])
x = 1000 * randn(10)
= gradient(() -> loss(x), θ)[w]
w̄_value = Optimise.apply!(ClipValue(1.0), w, copy())
@test all(w̄_value .<= 1)
w̄_norm = Optimise.apply!(ClipNorm(1.0), w, copy())
@test norm(w̄_norm) <= 1
end

View File

@ -1,32 +1,46 @@
using Flux, Test, Random, Statistics, Documenter
using Random
using Flux
using Flux.Data
using Test
using Random, Statistics, LinearAlgebra
using IterTools: ncycle
Random.seed!(0)
@testset "Flux" begin
@info "Testing Basics"
include("utils.jl")
include("onehot.jl")
include("optimise.jl")
include("data.jl")
@info "Testing Layers"
include("layers/basic.jl")
include("layers/normalisation.jl")
include("layers/stateless.jl")
include("layers/conv.jl")
if Flux.use_cuda[]
include("cuda/cuda.jl")
else
@warn "CUDA unavailable, not testing GPU support"
@testset "Utils" begin
include("utils.jl")
end
if VERSION >= v"1.2"
doctest(Flux)
@testset "Onehot" begin
include("onehot.jl")
end
@testset "Optimise" begin
include("optimise.jl")
end
@testset "Data" begin
include("data.jl")
end
@testset "Layers" begin
include("layers/basic.jl")
include("layers/normalisation.jl")
include("layers/stateless.jl")
include("layers/conv.jl")
end
@testset "CUDA" begin
if Flux.use_cuda[]
include("cuda/cuda.jl")
else
@warn "CUDA unavailable, not testing GPU support"
end
end
@static if VERSION >= v"1.4"
using Documenter
@testset "Docs" begin
DocMeta.setdocmeta!(Flux, :DocTestSetup, :(using Flux); recursive=true)
doctest(Flux)
end
end