Compare commits

..

162 Commits

Author SHA1 Message Date
bors[bot]
7035ee9bea
Merge #1238
1238: Fix inline code block r=dhairyagandhi96 a=harryscholes

### PR Checklist

- [ ] Tests are added
- [ ] Entry in NEWS.md
- [x] Documentation, if applicable
- [ ] Final review from `@MikeInnes` or `@dhairyagandhi96` (for API changes).


Co-authored-by: harryscholes <harryscholes@gmail.com>
2020-06-19 08:28:41 +00:00
harryscholes
57efd7fead Fix inline code block 2020-06-19 09:24:44 +01:00
bors[bot]
19b45b49d3
Merge #1221
1221: DataLoader with NamedTuple r=CarloLucibello a=cossio

Just a couple of small changes, so that `DataLoader` can be created with a `NamedTuple` of tensors instead of `Tuple`. This way the tensors can be referred to by name. For example

```
train_loader = DataLoader((images = Xtrain, labels = Ytrain), batchsize=16)
batch = first(train_loader)
y = model(batch.images)
logitcrossentropy(y, batch.labels)
```

If we only use tuples, then in datasets with multiple tensors one has to be careful about the order in which the tensors are fed into the `DataLoader` constructor and be consistent with this elsewhere. With `NamedTuples` one just have to be consistent about the names used, which I think is a minor improvement.

CC @CarloLucibello 

### PR Checklist

- [x] Tests are added
- [x] Entry in NEWS.md
- [x] Documentation, if applicable

I don't think this qualifies as an API change. It's just a minor feature addition. So final review probably not required.

- [ ] Final review from `@MikeInnes` or `@dhairyagandhi96` (for API changes).


Co-authored-by: cossio <j.cossio.diaz@gmail.com>
Co-authored-by: cossio <cossio@users.noreply.github.com>
2020-06-16 17:21:28 +00:00
bors[bot]
254e4a7058
Merge #1231
1231: use `ntuple` in conv r=MikeInnes a=MikeInnes

This is the right abstraction over `map`, and in particular is a bit easier to compile away in some cases. 

As this is a trivial change from Flux's perspective it's not easy to test here, but there are downstream tests in XLA.jl.

Co-authored-by: Mike J Innes <mike.j.innes@gmail.com>
2020-06-16 13:04:20 +00:00
Mike J Innes
9f931dd7fa use ntuple in conv 2020-06-16 14:02:24 +01:00
cossio
9078f85096 revert selectdim
selectdim can lead to type instability, see https://discourse.julialang.org/t/why-selectdim-is-type-instable/25271/5
2020-06-16 13:32:27 +02:00
cossio
1dbaf32810 DataLoader type inference tests 2020-06-16 13:32:27 +02:00
cossio
cb34bb848b simplify _getobs 2020-06-16 13:32:27 +02:00
cossio
75692161a7 Apply suggestions from code review
accept suggested changes

Co-authored-by: Carlo Lucibello <carlo.lucibello@gmail.com>
2020-06-16 13:32:27 +02:00
cossio
909a55ac10 news and docs 2020-06-16 13:32:27 +02:00
cossio
02ee6ba426 DataLoader with NamedTuple 2020-06-16 13:31:29 +02:00
bors[bot]
97406507fd
Merge #1218
1218: Require weight and bias to be AbstractArrays r=CarloLucibello a=oxinabox

closes #1199
While in theory someone could be using Dense with weights and biases that are not abstract arrays, I would be surprised.
So allowing it is just leaving a food-gun laying around.
If it is common then we can instead close #1199 by adding a special constructor for `Number` subtypes that error if they are not integers, or something a long those lines.

### PR Checklist

- [x] Tests are added
- [x] Entry in NEWS.md

I think this is a bug-fix thus the following are not required:

- [ ] Documentation, if applicable
- [ ] Final review from `@MikeInnes` or `@dhairyagandhi96` (for API changes).


Co-authored-by: Lyndon White <lyndon.white@invenialabs.co.uk>
Co-authored-by: Lyndon White <oxinabox@ucc.asn.au>
2020-06-15 15:21:21 +00:00
Lyndon White
e61787c1c8
Update test/layers/basic.jl 2020-06-12 13:58:10 +01:00
Lyndon White
601f842eaf
bonus test 2020-06-11 23:17:40 +01:00
bors[bot]
99ec30c8c2
Merge #1220
1220: CompatHelper: bump compat for "Adapt" to "2.0" r=CarloLucibello a=github-actions[bot]

This pull request changes the compat entry for the `Adapt` package from `1` to `1, 2.0`.

This keeps the compat entries for earlier versions.

Note: I have not tested your package with this new compat entry. It is your responsibility to make sure that your package tests pass before you merge this pull request.

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2020-06-11 09:54:46 +00:00
github-actions[bot]
fbfc973011 CompatHelper: bump compat for "Adapt" to "2.0" 2020-06-11 00:18:47 +00:00
Lyndon White
a1623aca76
move into 0.11 news 2020-06-10 12:39:00 +01:00
Lyndon White
15c7354c4e
Make release as DEV 2020-06-10 12:38:33 +01:00
Lyndon White
97b0aa4d36 bump version 2020-06-10 12:14:47 +01:00
Lyndon White
cf90517a8a update news.md 2020-06-10 12:14:19 +01:00
Lyndon White
df84628c29 Require weight and bias to be AbstractArrays 2020-06-10 12:06:57 +01:00
bors[bot]
e1f80d4627
Merge #1213
1213: Fixing indentation in train! docstring r=CarloLucibello a=natema

One code block is not correctly displayed in the doc of [Flux.Optimise.train!
](https://fluxml.ai/Flux.jl/stable/training/training/#Flux.Optimise.train!). 
Based on the previous code block, I guess it's an indentation problem.


Co-authored-by: natema <natema@users.noreply.github.com>
2020-06-08 18:29:46 +00:00
bors[bot]
a7bbd3d35b
Merge #1152
1152: extend dataloader r=CarloLucibello a=CarloLucibello

cfr discussion in #1149. Currently DataLoader interface supports

1. `for x in DataLoader(X)`
2. `for (x, y) in DataLoader(X, Y)`

This PR adds

3. `for (x,) in DataLoader((X,))`
4. `for (x, y) in DataLoader((X, Y))`

Edit:
the constructor in 2. is removed in this PR

Co-authored-by: CarloLucibello <carlo.lucibello@gmail.com>
2020-06-08 18:01:06 +00:00
CarloLucibello
0cf46432cf cleanup 2020-06-08 19:59:34 +02:00
natema
70bbf18180
Fixing indentation in train! docstring
One code block is not correctly displayed in the doc of [Flux.Optimise.train!
](https://fluxml.ai/Flux.jl/stable/training/training/#Flux.Optimise.train!). 
Based on the previous code block, I guess it's an indentation problem.
2020-06-07 15:44:04 +02:00
bors[bot]
d9b07475b0
Merge #1129
1129: Added dropgrad in huber_loss r=CarloLucibello a=HenriDeh

Workaround to prevent `iterate(::nothing)` when working with CuArrays. See issue #1128

Co-authored-by: HenriDeh <47037088+HenriDeh@users.noreply.github.com>
2020-06-06 17:21:19 +00:00
bors[bot]
9ebbe8cb4c
Merge #1141
1141: Speedup matmul of CuMatrix and OneHotMatrix r=CarloLucibello a=AStupidBear

This solves #189.

```julia
julia> using Flux


julia> using Flux: CuArrays

julia> A = zeros(300, 10000) |> gpu;

julia> B = Flux.onehotbatch(rand(1:10000, 256), 1:10000) |> gpu;

julia> A * B; CuArrays.@time A * B;
┌ Warning: Performing scalar operations on GPU arrays: This is very slow, consider disallowing these operations with `allowscalar(false)`
└ @ GPUArrays ~/shared/.julia/packages/GPUArrays/OXvxB/src/host/indexing.jl:43
  0.002824 seconds (951 CPU allocations: 38.156 KiB) (2 GPU allocations: 301.000 KiB, 2.32% gc time of which 46.42% spent allocating)

julia> import Base: *

julia> A::AbstractMatrix * B::Flux.OneHotMatrix = @inbounds A[:, map(x->x.ix, B.data)]
* (generic function with 522 methods)

julia> A * B; CuArrays.@time A * B;
  0.000343 seconds (169 CPU allocations: 5.000 KiB) (2 GPU allocations: 301.000 KiB, 15.53% gc time of which 65.97% spent allocating)
```

Co-authored-by: Yao Lu <luyaocns@gmail.com>
2020-06-06 17:00:01 +00:00
CarloLucibello
b1f226eb34 add news 2020-06-06 18:15:04 +02:00
CarloLucibello
a643cb6758 extend dataloader 2020-06-06 18:02:03 +02:00
bors[bot]
792a1c54f8
Merge #1211
1211: Fixing syntax in onehot docstring r=CarloLucibello a=natema

`otherwise, it will error` -> `otherwise, it will raise an error`


Co-authored-by: natema <natema@users.noreply.github.com>
2020-06-06 15:02:40 +00:00
natema
8f6aed5770
Fixing syntax in onehot docstring
`otherwise, it will error` -> `otherwise, it will raise an error`
2020-06-05 18:20:50 +02:00
bors[bot]
22d5e318e5
Merge #1192
1192: Improve `restructure` performance r=dhairyagandhi96 a=MikeInnes

A small change, but it significantly improves the performance on the following test case:

```julia
julia> VERSION
v"1.5.0-DEV.876"

julia> using Flux, DiffEqFlux, BenchmarkTools

julia> using Flux: mse

julia> fastdense = FastDense(784, 32, tanh);

julia> p = initial_params(fastdense);

julia> dense = Dense(784, 32, tanh);

julia> p,re = Flux.destructure(dense);

julia> x = rand(Float32, 784, 10);

julia> y = rand(Float32, 32, 10);

julia> @btime gradient((x,p) -> mse(fastdense(x, p), y), x, p);
  505.530 μs (87 allocations: 240.73 KiB)

julia> @btime gradient((x,p) -> mse(re(p)(x), y), x, p);
  107.796 μs (139 allocations: 340.94 KiB)
```

Co-authored-by: Mike J Innes <mike.j.innes@gmail.com>
2020-06-05 14:53:11 +00:00
bors[bot]
71ebd51e45
Merge #1208
1208: Fixing output format for `onehot` r=dhairyagandhi96 a=natema

Currently `Flux.OneHotVector` is displayed as a binary vector (0/1) rather than a boolean one (true/false). This is also shown in successive examples in the same page. 
I fixed the `onehot(:b, [:a, :b, :c])` and `onehot(:c, [:a, :b, :c])` outputs in the first example of the page accordingly.


Co-authored-by: natema <natema@users.noreply.github.com>
2020-06-05 09:17:12 +00:00
bors[bot]
b5a73f8532
Merge #1207
1207: Fixing typo in docs r=dhairyagandhi96 a=natema

`what ever` -> `whatever`


Co-authored-by: natema <natema@users.noreply.github.com>
2020-06-05 09:00:06 +00:00
natema
48d6f2d0c0
Fixing output format for onehot
`Flux.OneHotVector` is displayed as a binary vector (0/1) rather than a boolean (true/false) one, as is also shown in successive examples in the same page, so I fixed the `onehot(:b, [:a, :b, :c])` and `onehot(:c, [:a, :b, :c])` output as given by the current Julia version 1.4.2.
2020-06-03 17:03:08 +02:00
natema
2c4b1e521e
Fixing typo in docs
`what ever` -> `whatever`
2020-06-02 19:20:41 +02:00
bors[bot]
ca1b1b2c7c
Merge #1206
1206: Fixing ambiguous remark in Preserve inputs' types r=dhairyagandhi96 a=natema

This PR is based on the [discussion in the forum](https://discourse.julialang.org/t/not-clear-what-0-01f0x-is-in-the-flux-docs/40553?u=mathematics) on the ambiguity of `0.01f0x` in the line
> While one could change the activation function (e.g. to use `0.01f0x`)

Co-authored-by: natema <natema@users.noreply.github.com>
2020-06-02 17:09:58 +00:00
natema
a24f46b606
Fixing ambiguous remark in Preserve inputs' types
This PR is based on the [discussion in the forum](https://discourse.julialang.org/t/not-clear-what-0-01f0x-is-in-the-flux-docs/40553?u=mathematics) on the ambiguity of `0.01f0x` in the line
> While one could change the activation function (e.g. to use `0.01f0x`)
2020-06-02 18:48:07 +02:00
Mike J Innes
089ec0832c improved restructure adjoint 2020-05-27 12:28:22 +01:00
bors[bot]
ddd0f4e747
Merge #1191
1191: Pull Request Template r=MikeInnes a=MikeInnes

Hopefully makes it a little clearer what the requirements are, which will lead to easier review, and encourage things like NEWS.md that we want to be better in sync.

cc @dhairyagandhi96 and @CarloLucibello for thoughts.

Co-authored-by: Mike J Innes <mike.j.innes@gmail.com>
2020-05-27 11:15:26 +00:00
Mike J Innes
e10818bbad
Update pull_request_template.md 2020-05-27 12:12:13 +01:00
Mike J Innes
8c3a80c940
Create pull_request_template.md 2020-05-26 12:52:28 +01:00
bors[bot]
85c39e2309
Merge #1190
1190: Correcting advanced.md r=dhairyagandhi96 a=Sleort

To make the example consistent, it should be 
```
julia> Flux.trainable(a::Affine) = (a.W,)
```
not
```
julia> Flux.trainable(a::Affine) = (a.W, a.b)
```

Co-authored-by: Troels Arnfred Bojesen <tr-ab@online.no>
2020-05-25 14:47:42 +00:00
Troels Arnfred Bojesen
17bb00a3fa
Correcting advanced.md
To make the example consistent, it should be 
```
julia> Flux.trainable(a::Affine) = (a.W,)
```
not
```
julia> Flux.trainable(a::Affine) = (a.W, a.b)
```
2020-05-25 23:33:09 +09:00
bors[bot]
bd152ca099
Merge #1177
1177: Align ExpDecay implementation with documentation r=dhairyagandhi96 a=DrChainsaw

Fix for #1176 



Co-authored-by: DrChainsaw <Christian.kyril.skarby@gmail.com>
2020-05-21 14:33:20 +00:00
bors[bot]
f343172daf
Merge #1185
1185: Add some news r=dhairyagandhi96 a=dhairyagandhi96

cc @CarloLucibello please add to this list as well

Co-authored-by: Dhairya Gandhi <dhairya@juliacopmuting.com>
2020-05-21 12:46:39 +00:00
bors[bot]
472e1fbf5e
Merge #957
957: Add some gradient checking tests on GPUs r=dhairyagandhi96 a=dhairyagandhi96

Good to add generic tests for tracking gradients through the various layers on the GPU.

Co-authored-by: Dhairya Gandhi <dhairya@juliacopmuting.com>
Co-authored-by: Dhairya Gandhi <dhairya@juliacomputing.com>
2020-05-21 12:25:53 +00:00
Dhairya Gandhi
0801064d50 add comment on broken layers 2020-05-20 00:11:38 +05:30
Dhairya Gandhi
c4409fa6d1 clearing failures 2020-05-19 23:54:18 +05:30
bors[bot]
87ba651add
Merge #1165
1165: Fix docstring of logitcrossentropy r=dhairyagandhi96 a=cossio

Since `y` is a logit, there is no log (see the diff).

Co-authored-by: cossio <cossio@users.noreply.github.com>
2020-05-19 11:07:15 +00:00
Dhairya Gandhi
55430e207d add news 2020-05-19 16:34:28 +05:30
bors[bot]
0b10f1a8df
Merge #1184
1184: Add some functions to docs r=dhairyagandhi96 a=dhairyagandhi96



Co-authored-by: Dhairya Gandhi <dhairya@juliacopmuting.com>
2020-05-18 21:10:46 +00:00
DrChainsaw
9a24ee0bd7 Change intendation to 2 spaces 2020-05-18 21:52:40 +02:00
Dhairya Gandhi
bdfe567519 add some layers to docs 2020-05-18 23:53:11 +05:30
bors[bot]
b6a5dd7152
Merge #1133
1133: add ClipValue and ClipNorm r=CarloLucibello a=AStupidBear



Co-authored-by: Yao Lu <luyaocns@gmail.com>
2020-05-15 17:15:07 +00:00
Yao Lu
007586858c fix export merge conflict 2020-05-14 17:13:35 +08:00
Dhairya Gandhi
fab53e0a01
Merge pull request #1179 from FluxML/compathelper/new_version/2020-05-13-00-13-17-919-1190174363
CompatHelper: add new compat entry for "Functors" at version "0.1"
2020-05-13 11:27:40 +05:30
github-actions[bot]
3fa9e91c41 CompatHelper: add new compat entry for "Functors" at version "0.1" 2020-05-13 00:13:46 +00:00
DrChainsaw
e8433d0abe Align ExpDecay implementation with documentation 2020-05-12 22:50:17 +02:00
bors[bot]
de39d1095b
Merge #1175
1175: xlogy broadcast adjoint r=MikeInnes a=MikeInnes

This is helpful for performance, since it avoids having to differentiate `xlogy` itself inside of a map.

Co-authored-by: Mike J Innes <mike.j.innes@gmail.com>
2020-05-12 17:10:58 +00:00
Mike J Innes
f5a8900ffb xlogy broadcast adjoint 2020-05-12 17:29:35 +01:00
Mike J Innes
bd43201f37
fix logitcrossentropy doc string 2020-05-12 16:18:29 +01:00
bors[bot]
a84e08cf28
Merge #1174
1174: Functors r=MikeInnes a=MikeInnes

Just splits out the implementation to the [Functors](https://github.com/FluxML/Functors.jl) package, so the same traits can be used elsewhere (e.g. Optimisers.jl) without depending on all of Flux.

Co-authored-by: Mike J Innes <mike.j.innes@gmail.com>
2020-05-12 14:39:08 +00:00
Dhairya Gandhi
36d3a9ce99
Merge pull request #1162 from aminya/patch-5
Update CompatHelper.yml
2020-05-10 14:21:14 +05:30
Yao Lu
5a9eb7411a cpu 2020-05-10 14:39:48 +08:00
Yao Lu
888f286c51 use @inbounds 2020-05-09 19:40:46 +08:00
Yao Lu
63cb70dd23 remove importing CuMatrix 2020-05-09 19:13:52 +08:00
Yao Lu
30648910c8 transfer onehot indices back to cpu 2020-05-09 19:10:46 +08:00
Yao Lu
d1ad8db625 add to docs 2020-05-09 16:40:26 +08:00
bors[bot]
d89ee6cdba
Merge #1167
1167: Update basics.md r=dhairyagandhi96 a=mipals

Removing superfluous ```using Flux```

Co-authored-by: Mikkel Paltorp Schmitt <mikkel.paltorp@gmail.com>
2020-05-08 11:38:22 +00:00
bors[bot]
0287abbf66
Merge #1166
1166: Fix crossentropy when some probabilities are zero r=dhairyagandhi96 a=cossio

Use a function `xlogy(x,y) = x * log(y)` that has the correct limit at `x=0`.

Before this PR:

```julia
julia> Flux.crossentropy([0.1,0.0,0.9], [0.1,0.0,0.9])
NaN
```

After this PR:

```julia
julia> Flux.crossentropy([0.1,0.0,0.9], [0.1,0.0,0.9])
0.3250829733914482
```

Co-authored-by: cossio <j.cossio.diaz@gmail.com>
2020-05-08 11:14:31 +00:00
cossio
17f54e4c6f bump version 2020-05-08 12:57:34 +02:00
cossio
feb72d400a NaN 2020-05-07 12:44:32 +02:00
cossio
86d6555269 cufunc 2020-05-07 09:58:33 +02:00
Mikkel Paltorp Schmitt
40efa9df49
Update basics.md
Removing superfluous ```using Flux```
2020-05-06 13:41:56 +02:00
cossio
8314200c51 generic 2020-05-05 19:23:05 +02:00
cossio
06c1e20372 add tests 2020-05-05 19:05:04 +02:00
cossio
480473a81b xlogy 2020-05-05 18:33:50 +02:00
cossio
9e1fd883d5
Fix docstring of logitbinarycrossentropy and logitcrossentropy 2020-05-05 16:29:29 +02:00
Amin Yahyaabadi
70f76fd6db
Update CompatHelper.yml 2020-05-05 07:11:22 -05:00
bors[bot]
c444226db5
Merge #1160
1160: Build docs on Julia 1.3 r=dhairyagandhi96 a=dhairyagandhi96

This causes red CI otherwise

Co-authored-by: Dhairya Gandhi <dhairya@juliacopmuting.com>
2020-05-04 12:59:25 +00:00
Dhairya Gandhi
f2c66579ec yaml syntax fix 2020-05-04 18:01:33 +05:30
Dhairya Gandhi
fc464f5ef8 build docs on Julia 1.3 2020-05-04 17:54:04 +05:30
bors[bot]
1e2476b3c2
Merge #1156
1156: Add correct overload for apply! in docs r=dhairyagandhi96 a=dhairyagandhi96

Maybe we should considering adding a `const` name that is better than `apply!` (or rename `apply!`) and export it, so folks can just overload `descriptive_apply_my_optimiser_rule!` rather than have to go to the sub-project `Flux.Optimise`?

Co-authored-by: Dhairya Gandhi <dhairya@juliacopmuting.com>
2020-05-04 06:01:23 +00:00
Dhairya Gandhi
d6a1ccd354 add correct overload for apply in docs 2020-05-03 16:56:39 +05:30
bors[bot]
5d9acc7e73
Merge #873
873: Make bias optional r=MikeInnes a=dhairyagandhi96

Addresses #868 



Co-authored-by: Dhairya Gandhi <dhairya@juliacopmuting.com>
2020-05-01 13:28:15 +00:00
Mike J Innes
8f877f2dbf quick fix 2020-05-01 14:22:46 +01:00
Dhairya Gandhi
29215fa5d7 comment on possible future deprecations 2020-04-29 16:17:44 +05:30
Dhairya Gandhi
534809ae78 move zeros to its own file 2020-04-29 16:15:35 +05:30
Dhairya Gandhi
5086c0f4f0 merge conflicts 2020-04-29 16:11:39 +05:30
Yao Lu
114f63a214 norm(Δ) 2020-04-26 17:28:07 +08:00
Yao Lu
eb6898ea19 speedup matmul of CuMatrix and OneHotMatrix 2020-04-25 23:22:46 +08:00
Yao Lu
7d6f711c6f Merge branch 'master' into clip 2020-04-25 22:18:58 +08:00
bors[bot]
9237cdaf5b
Merge #901
901: Add option for "Same" padding to conv and pooling layers r=dhairyagandhi96 a=DrChainsaw

Fixes #813 

This adds the possibility to set "pad=SamePad()" to automatically calculate the amount of padding to apply so that outputsize==inputsize (assuming stide == 1).

Comments on API more than welcome. I considered the following options:

* Call the type just Same and export it, but I was afraid to cause name collisions due to a too generic name
* Call the type Same and not export it
* Dispatch on type instead of instance (so that one can type pad=Same instead of pad=Same())
* Supply a method instead of a type, giving a similar API as above. 

Happy to change to any of the above or to anything else.

I don't think that same padding is common for pooling layers, but I added it just for the sake of consistency. It is a separate commit so it can easily be removed if not wanted.

Co-authored-by: DrChainsaw <Christian.kyril.skarby@gmail.com>
2020-04-25 04:39:18 +00:00
DrChainsaw
4e4f6d9d1f Change next version entry to 0.10.5 2020-04-24 22:07:57 +02:00
DrChainsaw
deff98812a Add v0.11.0 entry and added samepadding option 2020-04-24 21:59:02 +02:00
DrChainsaw
1544f84bb9 Fix merge conflicts 2020-04-24 21:56:26 +02:00
Yao Lu
58a72ec879 Merge branch 'master' of https://github.com/FluxML/Flux.jl into clip 2020-04-22 01:29:13 +08:00
Yao Lu
c4f5e83697 resolve conflict 2020-04-22 01:24:13 +08:00
Yao Lu
1dfec7f38b add test 2020-04-22 01:22:34 +08:00
Yao Lu
def19b058e simplify docstrings 2020-04-21 10:56:38 +08:00
Yao Lu
cc1dcd5590 rm requires 2020-04-20 20:02:29 +08:00
Yao Lu
68b84bba36 add LinearAlgebra 2020-04-20 19:54:44 +08:00
Yao Lu
ba0fca5a19 remove onehot 2020-04-20 19:45:15 +08:00
Yao Lu
b33c4b49be add ClipValue and ClipNorm 2020-04-20 19:41:10 +08:00
Yao Lu
427c55af92 speedup matmul of CuMatrix and OneHotMatrix 2020-04-20 19:11:57 +08:00
HenriDeh
ac94754281
Update stateless.jl 2020-04-18 13:23:11 +02:00
bors[bot]
cdada06472
Merge #1131
1131: Update glorot_normal doc r=dhairyagandhi96 a=AdarshKumar712

Just a minute correction in glorot_normal function doc.

Co-authored-by: Adarsh Kumar <45385384+AdarshKumar712@users.noreply.github.com>
2020-04-18 00:58:49 +00:00
Adarsh Kumar
d53deb9132
Update glorot_normal doc 2020-04-18 03:19:32 +05:30
HenriDeh
1f2643c95c
Add dropgrad in huber_loss
Workaround for issue #1128
2020-04-17 13:34:04 +02:00
bors[bot]
d49d121a65
Merge #1127
1127: Removed deprecated SGD exports r=dhairyagandhi96 a=bhvieira

Closes #1121 

Co-authored-by: Bruno Hebling Vieira <bruno.hebling.vieira@usp.br>
2020-04-16 13:28:00 +00:00
Bruno Hebling Vieira
2c9881bca6 Merge branch 'master' into removeSGD 2020-04-16 09:56:38 -03:00
Bruno Hebling Vieira
db99e41959 Removed SGD exports 2020-04-16 09:50:41 -03:00
Dhairya Gandhi
d8e44fcc1c correct broadcasting for addition 2020-03-04 18:22:45 +05:30
Dhairya Gandhi
7e308e77fd rm unneccesary fns 2020-03-04 17:57:16 +05:30
Dhairya Gandhi
20e78e274e docs fix 2020-02-26 22:41:45 +05:30
Dhairya Gandhi
cf82393ae8 type signatures 2020-02-26 22:36:25 +05:30
Dhairya Gandhi
cd931793ef more docs and constructors 2020-02-26 22:29:14 +05:30
Dhairya Gandhi
58211e31bd docs improve 2020-02-26 22:22:11 +05:30
Dhairya Gandhi
f889d0c4d4 add kwarg constructors 2020-02-26 22:19:17 +05:30
Dhairya Gandhi
26631e1361 test_broken AlphaDropout 2020-02-16 21:22:37 +05:30
Dhairya Gandhi
bc20103ea6 no-op copy 2020-01-31 13:23:33 +05:30
Dhairya Gandhi
b9fbee1ff0 ::typeof(op) -> op 2020-01-31 12:24:36 +05:30
Dhairya Gandhi
29ab410794 test gradients are allocated on the gpu 2020-01-17 15:52:26 +05:30
Dhairya Gandhi
b1e68813a8 cpu -> test_throws 2019-12-20 23:02:44 +05:30
Dhairya Gandhi
efa2cbfd0e checkin Manifest#master 2019-12-11 14:13:41 +05:30
Dhairya Gandhi
a72ca2b05d fix args 2019-12-09 23:18:01 +05:30
Dhairya Gandhi
894c075b6d rm Zeros setindex 2019-12-09 21:40:58 +05:30
Dhairya Gandhi
f39e184814 rm Zeros warning 2019-12-09 21:07:30 +05:30
Dhairya Gandhi
9b6155c77d
Merge branch 'master' into dg/gradtests 2019-12-05 18:17:47 +05:30
Dhairya Gandhi
76dc8ea9d4 formatting fixes 2019-12-05 18:14:04 +05:30
Dhairya Gandhi
717ad9328d add some grad tests on GPU 2019-12-05 18:12:23 +05:30
DrChainsaw
755536bf5e Merge remote-tracking branch 'upstream/master' into samepad 2019-12-04 23:45:03 +01:00
Dhairya Gandhi
ec872bb579 test that bias has no grads with Zeros 2019-11-27 19:45:04 +05:30
Dhairya Gandhi
245563077b cleaner API 2019-11-27 19:40:58 +05:30
Dhairya Gandhi
eb41715d26 define manual rules 2019-11-19 13:30:33 +05:30
Dhairya Gandhi
e89b8eba77 fixes 2019-11-13 01:12:26 +05:30
DrChainsaw
453ecd1f24 Merge remote-tracking branch 'upstream/master' into samepad 2019-11-08 18:49:47 +01:00
Dhairya Gandhi
a4a987f0b0 hook into bcasting 2019-11-07 16:53:41 +05:30
Dhairya Gandhi
7c90fb469d use array to define Zeros 2019-10-23 20:02:15 +05:30
Dhairya Gandhi
4a183aeaf0 make Zeros a dimensionlesss number 2019-10-22 16:11:27 +05:30
DrChainsaw
530d4edb67 Fix for reading comprehension error (dim is not always 2 * (N-2)) Fix for ambiguous method sig 2019-10-20 16:03:01 +02:00
DrChainsaw
411ce5dbd8 Add SamePad for pooling layers 2019-10-20 13:43:39 +02:00
DrChainsaw
fc123d6279 Add SamePad for conv layers 2019-10-20 13:43:23 +02:00
Dhairya Gandhi
c85bad4427 replace weight with filter 2019-10-08 20:26:09 +05:30
Dhairya Gandhi
49ea43e711 ZeroType => Zeros 2019-10-08 20:02:04 +05:30
Dhairya Gandhi
95c5845e99 document bias switch 2019-10-08 17:54:01 +05:30
Dhairya Gandhi
b596faaffa tests bias switch 2019-10-08 17:18:39 +05:30
Dhairya Gandhi
040697fb2b add bias and weight kwarg 2019-10-08 17:18:19 +05:30
Dhairya Gandhi
f3904b4e04 add ZeroType back 2019-10-08 17:17:36 +05:30
Dhairya Gandhi
a1e826b888 fixes 2019-10-06 05:10:56 +05:30
Dhairya Gandhi
214f71f492 add N 2019-10-06 04:55:33 +05:30
Dhairya Gandhi
2ae3ad3b31 doc fixes 2019-10-06 04:46:13 +05:30
Dhairya Gandhi
d00f833c17 rm ZeroType 2019-10-06 04:44:50 +05:30
Dhairya Gandhi
e97d61f257 fixes 2019-10-06 04:42:26 +05:30
Dhairya Gandhi
48a305bd21 ditto remaining layers 2019-10-06 04:41:06 +05:30
Dhairya Gandhi
55ef7c1aba add weight and bias kwargs 2019-10-06 04:25:23 +05:30
Dhairya Gandhi
1fe321781b add to docs 2019-10-01 21:29:18 +05:30
Dhairya Gandhi
dced8c04e5 use ZeroType 2019-10-01 21:25:07 +05:30
Dhairya Gandhi
a801fcb9e7 docstrings 2019-09-27 12:07:55 +05:30
Dhairya Gandhi
9f2ac8fdef ditto remaining conv layers 2019-09-27 12:04:27 +05:30
Dhairya Gandhi
5ea6a33f44 make bias optional 2019-09-27 11:48:12 +05:30
34 changed files with 913 additions and 260 deletions

12
.github/pull_request_template.md vendored Normal file
View File

@ -0,0 +1,12 @@
[Please delete this text and describe your change here.
For bugfixes, please detail the bug and include a test case which your patch fixes.
If you are adding a new feature, please clearly describe the design, its rationale, the possible alternatives considered.
It is easiest to merge new features when there is clear precedent in other systems; we need to know we're taking
the right direction since it can be hard to change later.]
### PR Checklist
- [ ] Tests are added
- [ ] Entry in NEWS.md
- [ ] Documentation, if applicable
- [ ] Final review from `@MikeInnes` or `@dhairyagandhi96` (for API changes).

View File

@ -6,16 +6,8 @@ on:
jobs:
CompatHelper:
runs-on: ${{ matrix.os }}
strategy:
matrix:
julia-version: [1.3]
julia-arch: [x64]
os: [ubuntu-latest]
runs-on: ubuntu-latest
steps:
- uses: julia-actions/setup-julia@latest
with:
version: ${{ matrix.julia-version }}
- name: Pkg.add("CompatHelper")
run: julia -e 'using Pkg; Pkg.add("CompatHelper")'
- name: CompatHelper.main()

View File

@ -16,7 +16,7 @@ notifications:
jobs:
include:
- stage: "Documentation"
julia: 1
julia: 1.3
os: linux
script:
- julia --project=docs/ -e 'using Pkg; Pkg.develop(PackageSpec(path=pwd()));

View File

@ -8,35 +8,35 @@ version = "0.5.0"
[[AbstractTrees]]
deps = ["Markdown"]
git-tree-sha1 = "86d092c2599f1f7bb01668bf8eb3412f98d61e47"
git-tree-sha1 = "33e450545eaf7699da1a6e755f9ea65f14077a45"
uuid = "1520ce14-60c1-5f80-bbc7-55ef81b5835c"
version = "0.3.2"
version = "0.3.3"
[[Adapt]]
deps = ["LinearAlgebra"]
git-tree-sha1 = "c88cfc7f9c1f9f8633cddf0b56e86302b70f64c5"
git-tree-sha1 = "fd04049c7dd78cfef0b06cdc1f0f181467655712"
uuid = "79e6a3ab-5dfb-504d-930d-738a2a938a0e"
version = "1.0.1"
version = "1.1.0"
[[ArrayLayouts]]
deps = ["FillArrays", "LinearAlgebra"]
git-tree-sha1 = "41956a49a8a4fefa1bf6664bca4a3035aba4c3a0"
git-tree-sha1 = "a504dca2ac7eda8761c8f7c1ed52427a1be75a3c"
uuid = "4c555306-a7a7-4459-81d9-ec55ddd5c99a"
version = "0.2.3"
version = "0.2.6"
[[Base64]]
uuid = "2a0f44e3-6c83-55bd-87e4-b1978d98bd5f"
[[BinaryProvider]]
deps = ["Libdl", "SHA"]
git-tree-sha1 = "5b08ed6036d9d3f0ee6369410b830f8873d4024c"
deps = ["Libdl", "Logging", "SHA"]
git-tree-sha1 = "ecdec412a9abc8db54c0efc5548c64dfce072058"
uuid = "b99e7846-7c00-51b0-8f62-c81ae34c0232"
version = "0.5.8"
version = "0.5.10"
[[CEnum]]
git-tree-sha1 = "62847acab40e6855a9b5905ccb99c2b5cf6b3ebb"
git-tree-sha1 = "1b77a77c3b28e0b3f413f7567c9bb8dd9bdccd14"
uuid = "fa961155-64e5-5f13-b03f-caf6b980ea82"
version = "0.2.0"
version = "0.3.0"
[[CUDAapi]]
deps = ["Libdl", "Logging"]
@ -46,21 +46,21 @@ version = "4.0.0"
[[CUDAdrv]]
deps = ["CEnum", "CUDAapi", "Printf"]
git-tree-sha1 = "e650cbaee92b60433313157926b1e80d0c3a0e2e"
git-tree-sha1 = "f56bbf18c86bcff7a961a32a4947a5abb2963a29"
uuid = "c5f51814-7f29-56b8-a69c-e4d8f6be1fde"
version = "6.2.2"
version = "6.3.0"
[[CUDAnative]]
deps = ["Adapt", "BinaryProvider", "CEnum", "CUDAapi", "CUDAdrv", "Cthulhu", "DataStructures", "InteractiveUtils", "LLVM", "Libdl", "MacroTools", "Pkg", "Printf", "TimerOutputs"]
git-tree-sha1 = "d1fc99635d0002c8a819b78cb1f441eb44310725"
deps = ["Adapt", "BinaryProvider", "CEnum", "CUDAapi", "CUDAdrv", "ExprTools", "GPUCompiler", "LLVM", "Libdl", "Pkg", "Printf"]
git-tree-sha1 = "ac86db2b05fdfec96b011e25a504ffe7476e8a68"
uuid = "be33ccc6-a3ff-5ff2-a52e-74243cff1e17"
version = "3.0.2"
version = "3.1.0"
[[CodeTracking]]
deps = ["InteractiveUtils", "UUIDs"]
git-tree-sha1 = "0becdab7e6fbbcb7b88d8de5b72e5bb2f28239f3"
git-tree-sha1 = "cab4da992adc0a64f63fa30d2db2fd8bec40cab4"
uuid = "da1fd8a2-8d9e-5ec2-8556-3022fb5608a2"
version = "0.5.8"
version = "0.5.11"
[[CodecZlib]]
deps = ["TranscodingStreams", "Zlib_jll"]
@ -70,15 +70,15 @@ version = "0.7.0"
[[ColorTypes]]
deps = ["FixedPointNumbers", "Random"]
git-tree-sha1 = "c4c1cca28748906265ed62c788d6fe6f0134d264"
git-tree-sha1 = "c73d9cfc2a9d8433dc77f5bff4bddf46b1d78c20"
uuid = "3da002f7-5984-5a60-b8a6-cbb66c0b333f"
version = "0.10.0"
version = "0.10.3"
[[Colors]]
deps = ["ColorTypes", "FixedPointNumbers", "InteractiveUtils", "Reexport"]
git-tree-sha1 = "2fdeb981ebcf52cd800ddb6a0aa5eac34153552d"
git-tree-sha1 = "1e9bba7984e78aa8cdeea7f9f7cc984ad4e4b1c7"
uuid = "5ae59095-9a9b-59fe-a467-6f913c188581"
version = "0.12.0"
version = "0.12.2"
[[CommonSubexpressions]]
deps = ["Test"]
@ -93,27 +93,27 @@ uuid = "e66e0078-7015-5450-92f7-15fbd957f2ae"
version = "0.3.3+0"
[[Cthulhu]]
deps = ["CodeTracking", "InteractiveUtils", "REPL", "Unicode"]
git-tree-sha1 = "484790098c85c26f8e59051f8ff1a0745c034a7d"
deps = ["CodeTracking", "InteractiveUtils", "REPL", "UUIDs", "Unicode"]
git-tree-sha1 = "f3643e78353199d3097821e806348bd83f364155"
uuid = "f68482b8-f384-11e8-15f7-abe071a5a75f"
version = "1.0.1"
version = "1.1.1"
[[CuArrays]]
deps = ["AbstractFFTs", "Adapt", "CEnum", "CUDAapi", "CUDAdrv", "CUDAnative", "DataStructures", "GPUArrays", "Libdl", "LinearAlgebra", "MacroTools", "NNlib", "Pkg", "Printf", "Random", "Reexport", "Requires", "SparseArrays", "Statistics", "TimerOutputs"]
git-tree-sha1 = "e8c55b38dcca955f5aed8ec4479cdc95810db1e1"
git-tree-sha1 = "1582b74d2322df7dd94549d4ac9d095e0f20e884"
uuid = "3a865a2d-5b23-5a0f-bc46-62713ec82fae"
version = "2.0.1"
version = "2.2.1"
[[DataAPI]]
git-tree-sha1 = "674b67f344687a88310213ddfa8a2b3c76cc4252"
git-tree-sha1 = "176e23402d80e7743fc26c19c681bfb11246af32"
uuid = "9a962f9c-6df0-11e9-0e5d-c546b8b5ee8a"
version = "1.1.0"
version = "1.3.0"
[[DataStructures]]
deps = ["InteractiveUtils", "OrderedCollections"]
git-tree-sha1 = "73eb18320fe3ba58790c8b8f6f89420f0a622773"
git-tree-sha1 = "af6d9c86e191c917c2276fbede1137e8ea20157f"
uuid = "864edb3b-99cc-5e75-8d2d-829cb0a9cfe8"
version = "0.17.11"
version = "0.17.17"
[[Dates]]
deps = ["Printf"]
@ -139,11 +139,16 @@ version = "1.0.1"
deps = ["Random", "Serialization", "Sockets"]
uuid = "8ba89e20-285c-5b6f-9357-94700520ee1b"
[[ExprTools]]
git-tree-sha1 = "6f0517056812fd6aa3af23d4b70d5325a2ae4e95"
uuid = "e2ba6199-217a-4e67-a87a-7c52f15ade04"
version = "0.1.1"
[[FillArrays]]
deps = ["LinearAlgebra", "Random", "SparseArrays"]
git-tree-sha1 = "51cc2f9bc4eb9c6c0e81ec2f779d1085583cc956"
git-tree-sha1 = "44f561e293987ffc84272cd3d2b14b0b93123d63"
uuid = "1a297f60-69ca-5386-bcde-b61e274b549b"
version = "0.8.7"
version = "0.8.10"
[[FixedPointNumbers]]
git-tree-sha1 = "3ba9ea634d4c8b289d590403b4a06f8e227a6238"
@ -162,17 +167,27 @@ git-tree-sha1 = "f40adc6422f548176bb4351ebd29e4abf773040a"
uuid = "d9f16b24-f501-4c13-a1f2-28368ffc5196"
version = "0.1.0"
[[Future]]
deps = ["Random"]
uuid = "9fa8497b-333b-5362-9e8d-4d0656e87820"
[[GPUArrays]]
deps = ["AbstractFFTs", "Adapt", "LinearAlgebra", "Printf", "Random", "Serialization"]
git-tree-sha1 = "d586762b08dcda13228df8967119b9cb6f22ade5"
git-tree-sha1 = "d887693eb1bd5e1fd573262a978745481895ec7d"
uuid = "0c68f7d7-f131-5f86-a1c3-88cf8149b2d7"
version = "3.1.0"
version = "3.4.1"
[[GPUCompiler]]
deps = ["Cthulhu", "DataStructures", "InteractiveUtils", "LLVM", "Libdl", "TimerOutputs"]
git-tree-sha1 = "5275aa268ecd09640b32560e1eae90c78816e4d1"
uuid = "61eb1bfa-7361-4325-ad38-22787b887f55"
version = "0.2.0"
[[IRTools]]
deps = ["InteractiveUtils", "MacroTools", "Test"]
git-tree-sha1 = "1a4355e4b5b50be2311ebb644f34f3306dbd0410"
git-tree-sha1 = "90ee39f9beaaa186e4968417ea2b8ed5673c91c0"
uuid = "7869d1d1-7146-5819-86e3-90919afe41df"
version = "0.3.1"
version = "0.3.3"
[[InteractiveUtils]]
deps = ["Markdown"]
@ -180,15 +195,15 @@ uuid = "b77e0a4c-d291-57a0-90e8-8db25a27a240"
[[Juno]]
deps = ["Base64", "Logging", "Media", "Profile"]
git-tree-sha1 = "e1ba2a612645b3e07c773c3a208f215745081fe6"
git-tree-sha1 = "a686b0cf235fa3e491b79b4783c2d2382292b436"
uuid = "e5e0dc1b-0480-54bc-9374-aad01c23163d"
version = "0.8.1"
version = "0.8.2"
[[LLVM]]
deps = ["CEnum", "Libdl", "Printf", "Unicode"]
git-tree-sha1 = "b6b86801ae2f2682e0a4889315dc76b68db2de71"
git-tree-sha1 = "dd3f584c3dbefe39b2a8fbafa1a3b77e31e21255"
uuid = "929cbde3-209d-540e-8aea-75f648917ca0"
version = "1.3.4"
version = "1.5.1"
[[LibGit2]]
deps = ["Printf"]
@ -247,10 +262,9 @@ uuid = "efe28fd5-8261-553b-a9e1-b2916fc3738e"
version = "0.5.3+3"
[[OrderedCollections]]
deps = ["Random", "Serialization", "Test"]
git-tree-sha1 = "c4c13474d23c60d20a67b217f1d7f22a40edf8f1"
git-tree-sha1 = "12ce190210d278e12644bcadf5b21cbdcf225cd3"
uuid = "bac558e1-5e72-5ebc-8fee-abe8a469f55d"
version = "1.1.0"
version = "1.2.0"
[[Pkg]]
deps = ["Dates", "LibGit2", "Libdl", "Logging", "Markdown", "Printf", "REPL", "Random", "SHA", "UUIDs"]
@ -305,15 +319,15 @@ uuid = "2f01184e-e22b-5df5-ae63-d93ebab69eaf"
[[SpecialFunctions]]
deps = ["OpenSpecFun_jll"]
git-tree-sha1 = "e19b98acb182567bcb7b75bb5d9eedf3a3b5ec6c"
git-tree-sha1 = "d8d8b8a9f4119829410ecd706da4cc8594a1e020"
uuid = "276daf66-3868-5448-9aa4-cd146d93841b"
version = "0.10.0"
version = "0.10.3"
[[StaticArrays]]
deps = ["LinearAlgebra", "Random", "Statistics"]
git-tree-sha1 = "5a3bcb6233adabde68ebc97be66e95dcb787424c"
git-tree-sha1 = "5c06c0aeb81bef54aed4b3f446847905eb6cbda0"
uuid = "90137ffa-7385-5640-81b9-e52037218182"
version = "0.12.1"
version = "0.12.3"
[[Statistics]]
deps = ["LinearAlgebra", "SparseArrays"]
@ -331,9 +345,9 @@ uuid = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
[[TimerOutputs]]
deps = ["Printf"]
git-tree-sha1 = "311765af81bbb48d7bad01fb016d9c328c6ede03"
git-tree-sha1 = "f458ca23ff80e46a630922c555d838303e4b9603"
uuid = "a759f4b9-e2f1-59dc-863e-4aeb61b1ea8f"
version = "0.5.3"
version = "0.5.6"
[[TranscodingStreams]]
deps = ["Random", "Test"]
@ -350,21 +364,21 @@ uuid = "4ec0a83e-493e-50e2-b9ac-8f72acf5a8f5"
[[ZipFile]]
deps = ["Libdl", "Printf", "Zlib_jll"]
git-tree-sha1 = "8748302cfdec02c4ae9c97b112cf10003f7f767f"
git-tree-sha1 = "254975fef2fc526583bb9b7c9420fe66ffe09f2f"
uuid = "a5390f91-8eb1-5f08-bee0-b1d1ffed6cea"
version = "0.9.1"
version = "0.9.2"
[[Zlib_jll]]
deps = ["Libdl", "Pkg"]
git-tree-sha1 = "2f6c3e15e20e036ee0a0965879b31442b7ec50fa"
git-tree-sha1 = "a2e0d558f6031002e380a90613b199e37a8565bf"
uuid = "83775a58-1f1d-513f-b197-d71354ab007a"
version = "1.2.11+9"
version = "1.2.11+10"
[[Zygote]]
deps = ["AbstractFFTs", "ArrayLayouts", "DiffRules", "FillArrays", "ForwardDiff", "IRTools", "InteractiveUtils", "LinearAlgebra", "MacroTools", "NNlib", "NaNMath", "Random", "Requires", "SpecialFunctions", "Statistics", "ZygoteRules"]
git-tree-sha1 = "1ccbfbe8930376e31752b812daa2532c723dc332"
deps = ["AbstractFFTs", "ArrayLayouts", "DiffRules", "FillArrays", "ForwardDiff", "Future", "IRTools", "InteractiveUtils", "LinearAlgebra", "MacroTools", "NNlib", "NaNMath", "Random", "Requires", "SpecialFunctions", "Statistics", "ZygoteRules"]
git-tree-sha1 = "707ceea58e2bd0ff3077ab13a92f8355181d3ee4"
uuid = "e88e6eb3-aa80-5325-afca-941959d7151f"
version = "0.4.13"
version = "0.4.20"
[[ZygoteRules]]
deps = ["MacroTools"]

16
NEWS.md
View File

@ -1,3 +1,19 @@
# v0.11
* Change to `DataLoader`'s constructor [https://github.com/FluxML/Flux.jl/pull/1152]
* Use `DataLoader` with `NamedTuple`s, so that tensors can be accessed by name [https://github.com/FluxML/Flux.jl/pull/1221].
* Error if Dense layers weights and biases are not arrays [https://github.com/FluxML/Flux.jl/pull/1218].
# v0.10.5
* Add option for [same padding](https://github.com/FluxML/Flux.jl/pull/901) to conv and pooling layers by setting `pad=SamePad()`.
* Added option to set `bias` to [Flux.Zeros](https://github.com/FluxML/Flux.jl/pull/873) to eliminating `bias` from being trained.
* Added `GlobalMaxPool` and `GlobalMeanPool` [layers](https://github.com/FluxML/Flux.jl/pull/950) for performing global pooling operations.
* Added `ClipValue` and `ClipNorm` in this [pr](https://github.com/FluxML/Flux.jl/pull/1133) to `Flux.Optimise` to provide a cleaner API for gradient clipping.
* Added new kwarg-only [constructors](https://github.com/FluxML/Flux.jl/pull/873) for the various convolutional layers.
* Documented the convolutional layer constructors accepting `weight` and `bias` keyword arguments to supply custom arrays for those fields.
* Testing suite improvements now test for gradients of all layers along with GPU support.
* Functors have now moved to [Functors.jl](https://github.com/FluxML/Flux.jl/pull/1174) to allow for their use outside of Flux.
* Added [helper functions](https://github.com/FluxML/Flux.jl/pull/873) `Flux.convfilter` and `Flux.depthwiseconvfilter` to construct weight arrays for convolutions outside of layer constructors so as to not have to depend on the default layers for custom implementations.
# v0.10.0
* The default AD engine has switched from [Tracker to Zygote.jl](https://github.com/FluxML/Flux.jl/pull/669)
- The dependency on Tracker.jl has been removed.

View File

@ -1,6 +1,6 @@
name = "Flux"
uuid = "587475ba-b771-5e3f-ad9e-33799f191a9c"
version = "0.10.4"
version = "0.11.0-DEV"
[deps]
AbstractTrees = "1520ce14-60c1-5f80-bbc7-55ef81b5835c"
@ -11,6 +11,7 @@ CuArrays = "3a865a2d-5b23-5a0f-bc46-62713ec82fae"
DelimitedFiles = "8bb1440f-4735-579b-a4ab-409b98df4dab"
Functors = "d9f16b24-f501-4c13-a1f2-28368ffc5196"
Juno = "e5e0dc1b-0480-54bc-9374-aad01c23163d"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
MacroTools = "1914dd2f-81c6-5fcd-8719-6d5c9610ff09"
NNlib = "872c559c-99b0-510c-b3b7-b6c96a88d5cd"
Pkg = "44cfe95a-1eb2-52ea-b672-e2afdf69b78f"
@ -26,10 +27,11 @@ Zygote = "e88e6eb3-aa80-5325-afca-941959d7151f"
[compat]
AbstractTrees = "0.2, 0.3"
Adapt = "1"
Adapt = "1, 2.0"
CodecZlib = "0.5, 0.6, 0.7"
Colors = "0.8, 0.9, 0.10, 0.11, 0.12"
CuArrays = "2"
Functors = "0.1"
Juno = "0.5, 0.6, 0.7, 0.8"
MacroTools = "0.3, 0.4, 0.5"
NNlib = "0.6"

View File

@ -7,15 +7,15 @@ julia> using Flux: onehot, onecold
julia> onehot(:b, [:a, :b, :c])
3-element Flux.OneHotVector:
false
true
false
0
1
0
julia> onehot(:c, [:a, :b, :c])
3-element Flux.OneHotVector:
false
false
true
0
0
1
```
The inverse is `onecold` (which can take a general probability distribution, as well as just booleans).

View File

@ -19,7 +19,7 @@ Affine{Array{Float64,2},Array{Float64,1}}([0.66722 0.774872 0.249809; 0.843321 0
julia> Flux.params(a) # default behavior
Params([[0.66722 0.774872 0.249809; 0.843321 0.403843 0.429232; 0.683525 0.662455 0.065297], [0.42394, 0.0170927, 0.544955]])
julia> Flux.trainable(a::Affine) = (a.W, a.b,)
julia> Flux.trainable(a::Affine) = (a.W,)
julia> Flux.params(a)
Params([[0.66722 0.774872 0.249809; 0.843321 0.403843 0.429232; 0.683525 0.662455 0.065297]])

View File

@ -32,8 +32,6 @@ julia> gradient(f, [2, 1], [2, 0])
But machine learning models can have *hundreds* of parameters! To handle this, Flux lets you work with collections of parameters, via `params`. You can get the gradient of all parameters used in a program without explicitly passing them in.
```jldoctest basics
julia> using Flux
julia> x = [2, 1];
julia> y = [2, 0];

View File

@ -20,7 +20,11 @@ GlobalMeanPool
DepthwiseConv
ConvTranspose
CrossCor
SamePad
flatten
Flux.Zeros
Flux.convfilter
Flux.depthwiseconvfilter
```
## Recurrent Layers

View File

@ -39,7 +39,7 @@ E.g. the following will have run into the same problem as above:
leaky_tanh(x) = 0.01*x + tanh(x)
```
While one could change the activation function (e.g. to use `0.01f0x`), the idiomatic (and safe way) to avoid type casts whenever inputs changes is to use `oftype`:
While one could change the activation function (e.g. to use `0.01f0*x`), the idiomatic (and safe way) to avoid type casts whenever inputs changes is to use `oftype`:
```
leaky_tanh(x) = oftype(x/1, 0.01)*x + tanh(x)
```

View File

@ -80,7 +80,7 @@ Momentum(eta::Real, rho::Real) = Momentum(eta, rho, IdDict())
The `Momentum` type will act as our optimiser in this case. Notice that we have added all the parameters as fields, along with the velocity which we will use as our state dictionary. Each parameter in our models will get an entry in there. We can now define the rule applied when this optimiser is invoked.
```julia
function apply!(o::Momentum, x, Δ)
function Flux.Optimise.apply!(o::Momentum, x, Δ)
η, ρ = o.eta, o.rho
v = get!(o.velocity, x, zero(x))::typeof(x)
@. v = ρ * v - η * Δ
@ -140,3 +140,16 @@ ExpDecay
InvDecay
WeightDecay
```
## Gradient Clipping
Gradient clipping is useful for training recurrent neural networks, which have a tendency to suffer from the exploding gradient problem. An example usage is
```julia
opt = Optimiser(ClipValue(1e-3), ADAM(1e-3))
```
```@docs
ClipValue
ClipNorm
```

View File

@ -142,7 +142,7 @@ function my_custom_train!(loss, ps, data, opt)
for d in data
gs = gradient(ps) do
training_loss = loss(d...)
# Insert what ever code you want here that needs Training loss, e.g. logging
# Insert whatever code you want here that needs Training loss, e.g. logging
return training_loss
end
# insert what ever code you want here that needs gradient

View File

@ -3,14 +3,15 @@ module Flux
# Zero Flux Given
using Base: tail
using Zygote, MacroTools, Juno, Reexport, Statistics, Random
using Statistics, Random, LinearAlgebra
using Zygote, MacroTools, Juno, Reexport
using MacroTools: @forward
@reexport using NNlib
using Zygote: Params, @adjoint, gradient, pullback, @nograd
export gradient
export Chain, Dense, Maxout, RNN, LSTM, GRU, Conv, CrossCor, ConvTranspose,
export Chain, Dense, Maxout, RNN, LSTM, GRU, SamePad, Conv, CrossCor, ConvTranspose,
GlobalMaxPool, GlobalMeanPool, MaxPool, MeanPool, flatten,
DepthwiseConv, Dropout, AlphaDropout, LayerNorm, BatchNorm, InstanceNorm, GroupNorm,
SkipConnection, params, fmap, cpu, gpu, f32, f64, testmode!, trainmode!
@ -18,15 +19,17 @@ export Chain, Dense, Maxout, RNN, LSTM, GRU, Conv, CrossCor, ConvTranspose,
include("optimise/Optimise.jl")
using .Optimise
using .Optimise: @epochs
export SGD, Descent, ADAM, Momentum, Nesterov, RMSProp,
export Descent, ADAM, Momentum, Nesterov, RMSProp,
ADAGrad, AdaMax, ADADelta, AMSGrad, NADAM,
ADAMW, RADAM, InvDecay, ExpDecay, WeightDecay
ADAMW, RADAM, InvDecay, ExpDecay, WeightDecay,
ClipValue, ClipNorm
using CuArrays
const use_cuda = Ref(false)
include("utils.jl")
include("zeros.jl")
include("onehot.jl")
include("functor.jl")

View File

@ -51,4 +51,6 @@ export Iris
include("housing.jl")
export Housing
@deprecate DataLoader(x...; kws...) DataLoader(x; kws...)
end

View File

@ -1,7 +1,7 @@
# Adapted from Knet's src/data.jl (author: Deniz Yuret)
struct DataLoader
data
struct DataLoader{D}
data::D
batchsize::Int
nobs::Int
partial::Bool
@ -11,21 +11,20 @@ struct DataLoader
end
"""
DataLoader(data...; batchsize=1, shuffle=false, partial=true)
DataLoader(data; batchsize=1, shuffle=false, partial=true)
An object that iterates over mini-batches of `data`, each mini-batch containing `batchsize` observations
(except possibly the last one).
Takes as input one or more data tensors, e.g. X in unsupervised learning, X and Y in
supervised learning. The last dimension in each tensor is considered to be the observation
dimension.
Takes as input a single data tensor, or a tuple (or a named tuple) of tensors.
The last dimension in each tensor is considered to be the observation dimension.
If `shuffle=true`, shuffles the observations each time iterations are re-started.
If `partial=false`, drops the last mini-batch if it is smaller than the batchsize.
The original data is preserved as a tuple in the `data` field of the DataLoader.
The original data is preserved in the `data` field of the DataLoader.
Example usage:
Usage example:
Xtrain = rand(10, 100)
train_loader = DataLoader(Xtrain, batchsize=2)
@ -37,9 +36,16 @@ Example usage:
train_loader.data # original dataset
# similar, but yielding tuples
train_loader = DataLoader((Xtrain,), batchsize=2)
for (x,) in train_loader
@assert size(x) == (10, 2)
...
end
Xtrain = rand(10, 100)
Ytrain = rand(100)
train_loader = DataLoader(Xtrain, Ytrain, batchsize=2, shuffle=true)
train_loader = DataLoader((Xtrain, Ytrain), batchsize=2, shuffle=true)
for epoch in 1:100
for (x, y) in train_loader
@assert size(x) == (10, 2)
@ -51,26 +57,26 @@ Example usage:
# train for 10 epochs
using IterTools: ncycle
Flux.train!(loss, ps, ncycle(train_loader, 10), opt)
# can use NamedTuple to name tensors
train_loader = DataLoader((images=Xtrain, labels=Ytrain), batchsize=2, shuffle=true)
for datum in train_loader
@assert size(datum.images) == (10, 2)
@assert size(datum.labels) == (2,)
end
"""
function DataLoader(data...; batchsize=1, shuffle=false, partial=true)
length(data) > 0 || throw(ArgumentError("Need at least one data input"))
function DataLoader(data; batchsize=1, shuffle=false, partial=true)
batchsize > 0 || throw(ArgumentError("Need positive batchsize"))
nx = size(data[1])[end]
for i=2:length(data)
nx != size(data[i])[end] && throw(DimensionMismatch("All data should contain same number of observations"))
n = _nobs(data)
if n < batchsize
@warn "Number of observations less than batchsize, decreasing the batchsize to $n"
batchsize = n
end
if nx < batchsize
@warn "Number of data points less than batchsize, decreasing the batchsize to $nx"
batchsize = nx
end
imax = partial ? nx : nx - batchsize + 1
ids = 1:min(nx, batchsize)
DataLoader(data, batchsize, nx, partial, imax, [1:nx;], shuffle)
imax = partial ? n : n - batchsize + 1
DataLoader(data, batchsize, n, partial, imax, [1:n;], shuffle)
end
getdata(x::AbstractArray, ids) = x[(Base.Colon() for _=1:ndims(x)-1)..., ids]
@propagate_inbounds function Base.iterate(d::DataLoader, i=0) # returns data in d.indices[i+1:i+batchsize]
i >= d.imax && return nothing
if d.shuffle && i == 0
@ -78,11 +84,7 @@ getdata(x::AbstractArray, ids) = x[(Base.Colon() for _=1:ndims(x)-1)..., ids]
end
nexti = min(i + d.batchsize, d.nobs)
ids = d.indices[i+1:nexti]
if length(d.data) == 1
batch = getdata(d.data[1], ids)
else
batch = ((getdata(x, ids) for x in d.data)...,)
end
batch = _getobs(d.data, ids)
return (batch, nexti)
end
@ -90,3 +92,19 @@ function Base.length(d::DataLoader)
n = d.nobs / d.batchsize
d.partial ? ceil(Int,n) : floor(Int,n)
end
_nobs(data::AbstractArray) = size(data)[end]
function _nobs(data::Union{Tuple, NamedTuple})
length(data) > 0 || throw(ArgumentError("Need at least one data input"))
n = _nobs(data[1])
if !all(x -> _nobs(x) == n, Base.tail(data))
throw(DimensionMismatch("All data should contain same number of observations"))
end
return n
end
_getobs(data::AbstractArray, i) = data[ntuple(i -> Colon(), Val(ndims(data) - 1))..., i]
_getobs(data::Union{Tuple, NamedTuple}, i) = map(Base.Fix2(_getobs, i), data)
Base.eltype(::DataLoader{D}) where D = D

View File

@ -24,7 +24,7 @@ testmode!(m, mode = true) = m
trainmode!(m, mode = true)
Set a layer of model's train mode (see below).
Symmetric to [`testmode!`](@ref) (i.e. `trainmode!(m, mode) == testmode!(m, !mode)).
Symmetric to [`testmode!`](@ref) (i.e. `trainmode!(m, mode) == testmode!(m, !mode)`).
_Note_: if you manually set a model into train mode, you need to manually place
it into test mode during testing phase.

View File

@ -102,7 +102,7 @@ julia> d(rand(5))
-0.16210233
0.12311903```
"""
struct Dense{F,S,T}
struct Dense{F,S<:AbstractArray,T<:AbstractArray}
W::S
b::T
σ::F

View File

@ -7,26 +7,59 @@ _convtransoutdims(isize, ksize, ssize, dsize, pad) = (isize .- 1).*ssize .+ 1 .+
expand(N, i::Tuple) = i
expand(N, i::Integer) = ntuple(_ -> i, N)
"""
Conv(size, in => out, σ = identity; init = glorot_uniform,
SamePad
Padding for convolutional layers will be calculated so that outputshape == inputshape when stride = 1.
For stride > 1 the output shape depends on the type of convolution layer.
"""
struct SamePad end
calc_padding(pad, k::NTuple{N,T}, dilation, stride) where {T,N}= expand(Val(2*N), pad)
function calc_padding(::SamePad, k::NTuple{N,T}, dilation, stride) where {N,T}
#Ref: "A guide to convolution arithmetic for deep learning" https://arxiv.org/pdf/1603.07285
# Effective kernel size, including dilation
k_eff = @. k + (k - 1) * (dilation - 1)
# How much total padding needs to be applied?
pad_amt = @. k_eff - 1
# In case amount of padding is odd we need to apply different amounts to each side.
return Tuple(mapfoldl(i -> [ceil(Int, i/2), floor(Int, i/2)], vcat, pad_amt))
end
"""
Conv(filter, in => out, σ = identity; init = glorot_uniform,
stride = 1, pad = 0, dilation = 1)
Standard convolutional layer. `size` should be a tuple like `(2, 2)`.
filter = (2,2)
in = 1
out = 16
Conv((2, 2), 1=>16, relu)
Standard convolutional layer. `filter` should be a tuple like `(2, 2)`.
`in` and `out` specify the number of input and output channels respectively.
Data should be stored in WHCN order (width, height, # channels, batch size).
In other words, a 100×100 RGB image would be a `100×100×3×1` array,
and a batch of 50 would be a `100×100×3×50` array.
Accepts keyword arguments `weight` and `bias` to set the corresponding fields.
Setting `bias` to `Flux.Zeros()` will switch bias off for the layer.
Takes the keyword arguments `pad`, `stride` and `dilation`.
Use `pad=SamePad()` to apply padding so that outputsize == inputsize / stride.
# Examples
Apply a `Conv` layer to a 1-channel input using a 2×2 window size, giving us a
Apply a `Conv` layer to a 1-channel input using a 2×2 window filter size, giving us a
16-channel output. Output is activated with ReLU.
```julia
size = (2,2)
filter = (2,2)
in = 1
out = 16
Conv(size, in => out, relu)
Conv(filter, in => out, relu)
```
"""
struct Conv{N,M,F,A,V}
@ -38,25 +71,68 @@ struct Conv{N,M,F,A,V}
dilation::NTuple{N,Int}
end
function Conv(w::AbstractArray{T,N}, b::AbstractVector{T}, σ = identity;
"""
Conv(weight::AbstractArray, bias::AbstractArray)
Conv(weight::AbstractArray, bias::AbstractArray, activation)
Constructs the convolutional layer with user defined weight and bias arrays.
Setting `bias` to `Flux.Zeros()` would switch `bias` off for the layer.
Takes the keyword arguments `pad`, `stride` and `dilation`.
There is also a keyword-only constuctor available for all convoultional
layers.
```julia
weight = rand(Float32, 3, 3, 5)
bias = zeros(Float32, 5)
Conv(weight = weight,
bias = bias,
σ = sigmoid)
```
"""
function Conv(w::AbstractArray{T,N}, b::Union{Zeros, AbstractVector{T}}, σ = identity;
stride = 1, pad = 0, dilation = 1) where {T,N}
stride = expand(Val(N-2), stride)
pad = expand(Val(2*(N-2)), pad)
dilation = expand(Val(N-2), dilation)
pad = calc_padding(pad, size(w)[1:N-2], dilation, stride)
return Conv(σ, w, b, stride, pad, dilation)
end
Conv(k::NTuple{N,Integer}, ch::Pair{<:Integer,<:Integer}, σ = identity;
init = glorot_uniform, stride = 1, pad = 0, dilation = 1) where N =
Conv(init(k..., ch...), zeros(ch[2]), σ,
stride = stride, pad = pad, dilation = dilation)
function Conv(;weight::AbstractArray{T,N}, bias::Union{Zeros, AbstractVector{T}},
activation = identity, stride = 1, pad = 0, dilation = 1) where {T,N}
Conv(weight, bias, activation, stride = stride, pad = pad, dilation = dilation)
end
"""
convfilter(filter::Tuple, in=>out)
Constructs a standard convolutional weight matrix with given `filter` and
channels from `in` to `out`.
Accepts the keyword `init` (default: `glorot_uniform`) to control the sampling
distribution.
See also: [`depthwiseconvfilter`](@ref)
"""
convfilter(filter::NTuple{N,Integer}, ch::Pair{<:Integer,<:Integer};
init = glorot_uniform) where N = init(filter..., ch...)
function Conv(k::NTuple{N,Integer}, ch::Pair{<:Integer,<:Integer}, σ = identity;
init = glorot_uniform, stride = 1, pad = 0, dilation = 1,
weight = convfilter(k, ch, init = init), bias = zeros(ch[2])) where N
Conv(weight, bias, σ,
stride = stride, pad = pad, dilation = dilation)
end
@functor Conv
function (c::Conv)(x::AbstractArray)
# TODO: breaks gpu broadcast :(
# ndims(x) == ndims(c.weight)-1 && return squeezebatch(c(reshape(x, size(x)..., 1)))
σ, b = c.σ, reshape(c.bias, map(_->1, c.stride)..., :, 1)
σ, b = c.σ, reshape(c.bias, ntuple(_->1, length(c.stride))..., :, 1)
cdims = DenseConvDims(x, c.weight; stride=c.stride, padding=c.pad, dilation=c.dilation)
σ.(conv(x, c.weight, cdims) .+ b)
end
@ -90,15 +166,23 @@ outdims(l::Conv, isize) =
output_size(DenseConvDims(_paddims(isize, size(l.weight)), size(l.weight); stride = l.stride, padding = l.pad, dilation = l.dilation))
"""
ConvTranspose(size, in => out, σ = identity; init = glorot_uniform,
ConvTranspose(filter, in=>out)
ConvTranspose(filter, in=>out, activation)
ConvTranspose(filter, in => out, σ = identity; init = glorot_uniform,
stride = 1, pad = 0, dilation = 1)
Standard convolutional transpose layer. `size` should be a tuple like `(2, 2)`.
Standard convolutional transpose layer. `filter` should be a tuple like `(2, 2)`.
`in` and `out` specify the number of input and output channels respectively.
Data should be stored in WHCN order (width, height, # channels, batch size).
In other words, a 100×100 RGB image would be a `100×100×3×1` array,
and a batch of 50 would be a `100×100×3×50` array.
Accepts keyword arguments `weight` and `bias` to set the corresponding fields.
Setting `bias` to `Flux.Zeros()` will switch bias off for the layer.
Takes the keyword arguments `pad`, `stride` and `dilation`.
Use `pad=SamePad()` to apply padding so that outputsize == stride * inputsize - stride + 1.
"""
struct ConvTranspose{N,M,F,A,V}
σ::F
@ -109,18 +193,39 @@ struct ConvTranspose{N,M,F,A,V}
dilation::NTuple{N,Int}
end
function ConvTranspose(w::AbstractArray{T,N}, b::AbstractVector{T}, σ = identity;
stride = 1, pad = 0, dilation = 1) where {T,N}
"""
ConvTranspose(weight::AbstractArray, bias::AbstractArray)
ConvTranspose(weight::AbstractArray, bias::AbstractArray, activation)
Constructs the convolutional transpose layer with user defined weight and bias arrays.
forward pass.
Setting `bias` to `Flux.Zeros()` would switch `bias` off for the layer.
Takes the keyword arguments `pad`, `stride` and `dilation`.
For keyword-only constuctor, see also [`Conv`](@ref)
"""
function ConvTranspose(w::AbstractArray{T,N}, b::Union{Zeros, AbstractVector{T}}, σ = identity;
stride = 1, pad = 0, dilation = 1) where {T,N}
stride = expand(Val(N-2), stride)
pad = expand(Val(2*(N-2)), pad)
dilation = expand(Val(N-2), dilation)
pad = calc_padding(pad, size(w)[1:N-2], dilation, stride)
return ConvTranspose(σ, w, b, stride, pad, dilation)
end
ConvTranspose(k::NTuple{N,Integer}, ch::Pair{<:Integer,<:Integer}, σ = identity;
init = glorot_uniform, stride = 1, pad = 0, dilation = 1) where N =
ConvTranspose(init(k..., reverse(ch)...), zeros(ch[2]), σ,
function ConvTranspose(;weight::AbstractArray{T,N}, bias::Union{Zeros, AbstractVector{T}},
activation = identity, stride = 1, pad = 0, dilation = 1) where {T,N}
ConvTranspose(weight, bias, activation, stride = stride, pad = pad, dilation = dilation)
end
function ConvTranspose(k::NTuple{N,Integer}, ch::Pair{<:Integer,<:Integer}, σ = identity;
init = glorot_uniform, stride = 1, pad = 0, dilation = 1,
weight = convfilter(k, reverse(ch), init = init), bias = zeros(ch[2])) where N
ConvTranspose(weight, bias, σ,
stride = stride, pad = pad, dilation = dilation)
end
@functor ConvTranspose
@ -132,9 +237,9 @@ function conv_transpose_dims(c::ConvTranspose, x::AbstractArray)
batch_size = size(x)[end]
# Create DenseConvDims() that looks like the corresponding conv()
return DenseConvDims((I..., C_in, batch_size), size(c.weight);
stride=c.stride,
padding=c.pad,
dilation=c.dilation,
stride=c.stride,
padding=c.pad,
dilation=c.dilation,
)
end
@ -145,7 +250,7 @@ function (c::ConvTranspose)(x::AbstractArray)
# ndims(x) == ndims(c.weight)-1 && return squeezebatch(c(reshape(x, size(x)..., 1)))
σ, b = c.σ, reshape(c.bias, map(_->1, c.stride)..., :, 1)
cdims = conv_transpose_dims(c, x)
return σ.(∇conv_data(x, c.weight, cdims) .+ b)
σ.(∇conv_data(x, c.weight, cdims) .+ b)
end
function Base.show(io::IO, l::ConvTranspose)
@ -164,16 +269,24 @@ end
outdims(l::ConvTranspose{N}, isize) where N = _convtransoutdims(isize[1:2], size(l.weight)[1:N], l.stride, l.dilation, l.pad)
"""
DepthwiseConv(size, in => out, σ = identity; init = glorot_uniform,
DepthwiseConv(filter::Tuple, in=>out)
DepthwiseConv(filter::Tuple, in=>out, activation)
DepthwiseConv(filter, in => out, σ = identity; init = glorot_uniform,
stride = 1, pad = 0, dilation = 1)
Depthwise convolutional layer. `size` should be a tuple like `(2, 2)`.
Depthwise convolutional layer. `filter` should be a tuple like `(2, 2)`.
`in` and `out` specify the number of input and output channels respectively.
Note that `out` must be an integer multiple of `in`.
Data should be stored in WHCN order (width, height, # channels, batch size).
In other words, a 100×100 RGB image would be a `100×100×3×1` array,
and a batch of 50 would be a `100×100×3×50` array.
Accepts keyword arguments `weight` and `bias` to set the corresponding fields.
Setting `bias` to `Flux.Zeros()` will switch bias off for the layer.
Takes the keyword arguments `pad`, `stride` and `dilation`.
Use `pad=SamePad()` to apply padding so that outputsize == inputsize / stride.
"""
struct DepthwiseConv{N,M,F,A,V}
σ::F
@ -184,20 +297,54 @@ struct DepthwiseConv{N,M,F,A,V}
dilation::NTuple{N,Int}
end
function DepthwiseConv(w::AbstractArray{T,N}, b::AbstractVector{T}, σ = identity;
stride = 1, pad = 0, dilation = 1) where {T,N}
"""
DepthwiseConv(weight::AbstractArray, bias::AbstractArray)
DepthwiseConv(weight::AbstractArray, bias::AbstractArray, activation)
Constructs the `DepthwiseConv` layer with user defined weight and bias arrays.
forward pass.
Setting `bias` to `Flux.Zeros()` would switch `bias` off for the layer.
Takes the keyword arguments `pad`, `stride` and `dilation`.
For keyword-only constuctor, see also [`Conv`](@ref)
"""
function DepthwiseConv(w::AbstractArray{T,N}, b::Union{Zeros, AbstractVector{T}}, σ = identity;
stride = 1, pad = 0, dilation = 1) where {T,N}
stride = expand(Val(N-2), stride)
pad = expand(Val(2*(N-2)), pad)
dilation = expand(Val(N-2), dilation)
pad = calc_padding(pad, size(w)[1:N-2], dilation, stride)
return DepthwiseConv(σ, w, b, stride, pad, dilation)
end
function DepthwiseConv(;weight::AbstractArray{T,N}, bias::Union{Zeros, AbstractVector{T}},
activation = identity, stride = 1, pad = 0, dilation = 1) where {T,N}
DepthwiseConv(weight, bias, activation, stride = stride, pad = pad, dilation = dilation)
end
"""
depthwiseconvfilter(filter::Tuple, in=>out)
Constructs a depthwise convolutional weight array defined by `filter` and channels
from `in` to `out`.
Accepts the keyword `init` (default: `glorot_uniform`) to control the sampling
distribution.
See also: [`convfilter`](@ref)
"""
depthwiseconvfilter(filter::NTuple{N,Integer}, ch::Pair{<:Integer,<:Integer};
init = glorot_uniform) where N = init(filter..., div(ch[2], ch[1]), ch[1])
function DepthwiseConv(k::NTuple{N,Integer}, ch::Pair{<:Integer,<:Integer}, σ = identity;
init = glorot_uniform, stride = 1, pad = 0, dilation = 1) where N
init = glorot_uniform, stride = 1, pad = 0, dilation = 1,
weight = depthwiseconvfilter(k, ch, init = init), bias = zeros(ch[2])) where N
@assert ch[2] % ch[1] == 0 "Output channels must be integer multiple of input channels"
return DepthwiseConv(
init(k..., div(ch[2], ch[1]), ch[1]),
zeros(ch[2]),
weight,
bias,
σ;
stride = stride,
pad = pad,
@ -230,22 +377,30 @@ outdims(l::DepthwiseConv, isize) =
output_size(DepthwiseConvDims(_paddims(isize, (1, 1, size(l.weight)[end], 1)), size(l.weight); stride = l.stride, padding = l.pad, dilation = l.dilation))
"""
CrossCor(size, in => out, σ = identity; init = glorot_uniform,
CrossCor(filter, in=>out)
CrossCor(filter, in=>out, activation)
CrossCor(filter, in => out, σ = identity; init = glorot_uniform,
stride = 1, pad = 0, dilation = 1)
Standard cross convolutional layer. `size` should be a tuple like `(2, 2)`.
Standard cross convolutional layer. `filter` should be a tuple like `(2, 2)`.
`in` and `out` specify the number of input and output channels respectively.
Data should be stored in WHCN order (width, height, # channels, batch size).
In other words, a 100×100 RGB image would be a `100×100×3×1` array,
and a batch of 50 would be a `100×100×3×50` array.
Accepts keyword arguments `weight` and `bias` to set the corresponding fields.
Setting `bias` to `Flux.Zeros()` will switch bias off for the layer.
Takes the keyword arguments `pad`, `stride` and `dilation`.
Use `pad=SamePad()` to apply padding so that outputsize == inputsize / stride.
# Examples
Apply a `CrossCor` layer to a 1-channel input using a 2×2 window size, giving us a
Apply a `CrossCor` layer to a 1-channel input using a 2×2 window filter size, giving us a
16-channel output. Output is activated with ReLU.
```julia
size = (2,2)
filter = (2,2)
in = 1
out = 16
CrossCor((2, 2), 1=>16, relu)
@ -260,18 +415,39 @@ struct CrossCor{N,M,F,A,V}
dilation::NTuple{N,Int}
end
function CrossCor(w::AbstractArray{T,N}, b::AbstractVector{T}, σ = identity;
stride = 1, pad = 0, dilation = 1) where {T,N}
"""
CrossCor(weight::AbstractArray, bias::AbstractArray)
CrossCor(weight::AbstractArray, bias::AbstractArray, activation)
Constructs the standard cross convolutional layer with user defined weight and bias
arrays.
Setting `bias` to `Flux.Zeros()` would switch `bias` off for the layer.
Takes the keyword arguments `pad`, `stride` and `dilation`.
For keyword-only constuctor, see also [`Conv`](@ref)
"""
function CrossCor(w::AbstractArray{T,N}, b::Union{Zeros, AbstractVector{T}}, σ = identity;
stride = 1, pad = 0, dilation = 1) where {T,N}
stride = expand(Val(N-2), stride)
pad = expand(Val(2*(N-2)), pad)
dilation = expand(Val(N-2), dilation)
pad = calc_padding(pad, size(w)[1:N-2], dilation, stride)
return CrossCor(σ, w, b, stride, pad, dilation)
end
CrossCor(k::NTuple{N,Integer}, ch::Pair{<:Integer,<:Integer}, σ = identity;
init = glorot_uniform, stride = 1, pad = 0, dilation = 1) where N =
CrossCor(init(k..., ch...), zeros(ch[2]), σ,
function CrossCor(;weight::AbstractArray{T,N}, bias::Union{Zeros, AbstractVector{T}},
activation = identity, stride = 1, pad = 0, dilation = 1) where {T,N}
CrossCor(weight, bias, activation, stride = stride, pad = pad, dilation = dilation)
end
function CrossCor(k::NTuple{N,Integer}, ch::Pair{<:Integer,<:Integer}, σ = identity;
init = glorot_uniform, stride = 1, pad = 0, dilation = 1,
weight = convfilter(k, ch, init = init), bias = zeros(ch[2])) where N
CrossCor(weight, bias, σ,
stride = stride, pad = pad, dilation = dilation)
end
@functor CrossCor
@ -358,6 +534,9 @@ end
MaxPool(k; pad = 0, stride = k)
Max pooling layer. `k` is the size of the window for each dimension of the input.
Use `pad=SamePad()` to apply padding so that outputsize == inputsize / stride.
=======
"""
struct MaxPool{N,M}
k::NTuple{N,Int}
@ -367,8 +546,7 @@ end
function MaxPool(k::NTuple{N,Integer}; pad = 0, stride = k) where N
stride = expand(Val(N), stride)
pad = expand(Val(2*N), pad)
pad = calc_padding(pad, k, 1, stride)
return MaxPool(k, pad, stride)
end
@ -387,6 +565,8 @@ outdims(l::MaxPool{N}, isize) where N = output_size(PoolDims(_paddims(isize, (l.
MeanPool(k; pad = 0, stride = k)
Mean pooling layer. `k` is the size of the window for each dimension of the input.
Use `pad=SamePad()` to apply padding so that outputsize == inputsize / stride.
"""
struct MeanPool{N,M}
k::NTuple{N,Int}
@ -396,7 +576,7 @@ end
function MeanPool(k::NTuple{N,Integer}; pad = 0, stride = k) where N
stride = expand(Val(N), stride)
pad = expand(Val(2*N), pad)
pad = calc_padding(pad, k, 1, stride)
return MeanPool(k, pad, stride)
end

View File

@ -46,23 +46,24 @@ given the prediction `ŷ` and true values `y`.
Huber loss = |
| δ * (| - y| - 0.5 * δ), otherwise
"""
#TODO: remove dropgrad when Zygote can handle this function with CuArrays
function huber_loss(, y; δ=eltype()(1))
abs_error = abs.( .- y)
temp = abs_error .< δ
temp = Zygote.dropgrad(abs_error .< δ)
x = eltype()(0.5)
hub_loss = sum(((abs_error.^2) .* temp) .* x .+ δ*(abs_error .- x*δ) .* (1 .- temp)) * 1 // length(y)
end
function _crossentropy(::AbstractVecOrMat, y::AbstractVecOrMat, weight::Nothing)
return -sum(y .* log.()) * 1 // size(y, 2)
return -sum(xlogy.(y, )) * 1 // size(y, 2)
end
function _crossentropy(::AbstractVecOrMat, y::AbstractVecOrMat, weight::Number)
return -sum(y .* log.()) .* weight * 1 // size(y, 2)
return -sum(xlogy.(y, )) .* weight * 1 // size(y, 2)
end
function _crossentropy(::AbstractVecOrMat, y::AbstractVecOrMat, weight::AbstractVector)
return -sum(y .* log.() .* weight) * 1 // size(y, 2)
return -sum(xlogy.(y, ) .* weight) * 1 // size(y, 2)
end
"""
@ -91,7 +92,7 @@ Return the crossentropy computed after a [`Flux.logsoftmax`](@ref) operation;
calculated as `-sum(y .* logsoftmax(ŷ) .* weight) / size(y, 2)`.
`logitcrossentropy(ŷ, y)` is mathematically equivalent to
[`Flux.crossentropy(softmax(log()), y)`](@ref) but it is more numerically stable.
[`Flux.crossentropy(softmax(ŷ), y)`](@ref) but it is more numerically stable.
See also: [`Flux.crossentropy`](@ref), [`Flux.binarycrossentropy`](@ref), [`Flux.logitbinarycrossentropy`](@ref)
@ -123,7 +124,7 @@ julia> Flux.binarycrossentropy.(σ.([-1.1491, 0.8619, 0.3127]), [1, 1, 0])
0.8616703662235441
```
"""
binarycrossentropy(, y; ϵ=eps()) = -y*log( + ϵ) - (1 - y)*log(1 - + ϵ)
binarycrossentropy(, y; ϵ=eps()) = -xlogy(y, + ϵ) - xlogy(1 - y, 1 - + ϵ)
# Re-definition to fix interaction with CuArrays.
CuArrays.@cufunc binarycrossentropy(, y; ϵ=eps()) = -y*log( + ϵ) - (1 - y)*log(1 - + ϵ)
@ -132,7 +133,7 @@ CuArrays.@cufunc binarycrossentropy(ŷ, y; ϵ=eps(ŷ)) = -y*log(ŷ + ϵ) - (1
logitbinarycrossentropy(ŷ, y)
`logitbinarycrossentropy(ŷ, y)` is mathematically equivalent to
[`Flux.binarycrossentropy(σ(log(ŷ)), y)`](@ref) but it is more numerically stable.
[`Flux.binarycrossentropy(σ(ŷ), y)`](@ref) but it is more numerically stable.
See also: [`Flux.crossentropy`](@ref), [`Flux.logitcrossentropy`](@ref), [`Flux.binarycrossentropy`](@ref)
@ -195,7 +196,7 @@ It is always non-negative and zero only when both the distributions are equal
everywhere.
"""
function kldivergence(, y)
entropy = sum(y .* log.(y)) * 1 //size(y,2)
entropy = sum(xlogx.(y)) * 1 //size(y,2)
cross_entropy = crossentropy(, y)
return entropy + cross_entropy
end
@ -208,7 +209,7 @@ distribution `y`; calculated as `sum(ŷ .- y .* log.(ŷ)) / size(y, 2)`.
[More information.](https://peltarion.com/knowledge-center/documentation/modeling-view/build-an-ai-model/loss-functions/poisson).
"""
poisson(, y) = sum( .- y .* log.()) * 1 // size(y,2)
poisson(, y) = sum( .- xlogy.(y, )) * 1 // size(y,2)
"""
hinge(, y)
@ -262,3 +263,34 @@ by linearizing all values for each element in the batch.
function flatten(x::AbstractArray)
return reshape(x, :, size(x)[end])
end
"""
xlogx(x)
Return `x * log(x)` for `x ≥ 0`, handling `x = 0` by taking the downward limit.
"""
function xlogx(x)
result = x * log(x)
ifelse(iszero(x), zero(result), result)
end
CuArrays.@cufunc function xlogx(x)
result = x * log(x)
ifelse(iszero(x), zero(result), result)
end
"""
xlogy(x, y)
Return `x * log(y)` for `y > 0` with correct limit at `x = 0`.
"""
function xlogy(x, y)
result = x * log(y)
ifelse(iszero(x), zero(result), result)
end
CuArrays.@cufunc function xlogy(x, y)
result = x * log(y)
ifelse(iszero(x), zero(result), result)
end
@adjoint function broadcasted(::typeof(xlogy), x::Zygote.Numeric, y::Zygote.Numeric)
res = xlogy.(x, y)
res, Δ -> (nothing, Zygote.unbroadcast(x, xlogy.(Δ, y)), Zygote.unbroadcast(y, Δ .* x ./ y))
end

View File

@ -27,7 +27,8 @@ Base.getindex(xs::OneHotMatrix, ::Colon, ::Colon) = OneHotMatrix(xs.height, copy
Base.getindex(xs::OneHotMatrix, i::Integer, ::Colon) = map(x -> x[i], xs.data)
A::AbstractMatrix * B::OneHotMatrix = A[:, map(x->x.ix, B.data)]
# remove workaround when https://github.com/JuliaGPU/CuArrays.jl/issues/676 is fixed
A::AbstractMatrix * B::OneHotMatrix = A[:, cpu(map(x->x.ix, B.data))]
Base.hcat(x::OneHotVector, xs::OneHotVector...) = OneHotMatrix(length(x), [x, xs...])
@ -48,7 +49,7 @@ cudaconvert(x::OneHotMatrix{<:CuArray}) = OneHotMatrix(x.height, cudaconvert(x.d
Create a `OneHotVector` with its `l`-th element `true` based on the
possible set of `labels`.
If `unk` is given, return `onehot(unk, labels)` if the input label `l` is not found
in `labels`; otherwise it will error.
in `labels`; otherwise, it will raise an error.
# Examples
```jldoctest

View File

@ -1,9 +1,12 @@
module Optimise
using LinearAlgebra
export train!, update!,
SGD, Descent, ADAM, Momentum, Nesterov, RMSProp,
Descent, ADAM, Momentum, Nesterov, RMSProp,
ADAGrad, AdaMax, ADADelta, AMSGrad, NADAM, ADAMW,RADAM,
InvDecay, ExpDecay, WeightDecay, stop, Optimiser
InvDecay, ExpDecay, WeightDecay, stop, Optimiser,
ClipValue, ClipNorm
include("optimisers.jl")
include("train.jl")

View File

@ -509,7 +509,7 @@ function apply!(o::ExpDecay, x, Δ)
η, s, decay = o.eta, o.step, o.decay
n = o.current[x] = get(o.current, x, 0) + 1
if o.current[x]%s == 0 && count(x -> x%s == 0, values(o.current)) == 1
η = max(η * decay^(s / n), o.clip)
η = max(η * decay, o.clip)
o.eta = η
end
@. Δ *= η
@ -533,3 +533,31 @@ function apply!(o::WeightDecay, x, Δ)
wd = o.wd
@. Δ += wd * x
end
"""
ClipValue(thresh)
Clip gradients when their absolute value exceeds `thresh`.
"""
mutable struct ClipValue{T}
thresh::T
end
apply!(o::ClipValue, x, Δ) = clamp!(Δ, -o.thresh, o.thresh)
"""
ClipNorm(thresh)
Clip gradients when their L2 norm exceeds `thresh`.
"""
mutable struct ClipNorm{T}
thresh::T
end
function apply!(o::ClipNorm, x, Δ)
Δnrm = norm(Δ)
if Δnrm > o.thresh
rmul!(Δ, o.thresh / Δnrm)
end
return Δ
end

View File

@ -68,8 +68,7 @@ and compute the gradient of `loss(d)`.
A callback is given with the keyword argument `cb`. For example, this will print
"training" every 10 seconds (using [`Flux.throttle`](@ref)):
train!(loss, params, data, opt,
cb = throttle(() -> println("training"), 10))
train!(loss, params, data, opt, cb = throttle(() -> println("training"), 10))
The callback can call [`Flux.stop`](@ref) to interrupt the training loop.

View File

@ -24,7 +24,7 @@ glorot_uniform(dims...) = (rand(Float32, dims...) .- 0.5f0) .* sqrt(24.0f0 / sum
glorot_normal(dims...)
Return an `Array` of size `dims` containing random variables taken from a normal
distribution with mean 0 and standard deviation `(2 / sum(dims))`.
distribution with mean 0 and standard deviation `sqrt(2 / sum(dims))`.
# Examples
```jldoctest; setup = :(using Random; Random.seed!(0))
@ -246,6 +246,10 @@ function _restructure(m, xs)
end
end
@adjoint function _restructure(m, xs)
_restructure(m, xs), dm -> (nothing,destructure(dm)[1])
end
"""
destructure(m)

106
src/zeros.jl Normal file
View File

@ -0,0 +1,106 @@
import Base: +, -, *, reshape, size
import Base.Broadcast: broadcasted, Broadcasted, BroadcastStyle
"""
Zeros()
Zeros(size...)
Zeros(Type, size...)
Acts as a stand-in for an array of zeros that can be
used during training which is ignored by the optimisers.
Useful to turn bias off for a forward pass of a layer.
## Examples
```julia
julia> Flux.Zeros(3,3)
3×3 Flux.Zeros{Bool,2}:
false false false
false false false
false false false
julia> Flux.Zeros(Float32, 3,3)
3×3 Flux.Zeros{Float32,2}:
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
julia> rand(3,3) .+ Flux.Zeros()
3×3 Array{Float64,2}:
0.198739 0.490459 0.785386
0.779074 0.39986 0.66383
0.854981 0.447292 0.314497
julia> bias_less_conv = Conv((2,2), 1=>3, bias = Flux.Zeros())
Conv((2, 2), 1=>3)
```
"""
struct Zeros{T,N} <: AbstractArray{T,N}
size::Tuple
end
Zeros(::Type{T}, sz...) where T = Zeros{T,length(sz)}(sz)
Zeros(sz::Integer...) = Zeros(Bool, sz...)
Base.size(xs::Zeros) = xs.size
Base.axes(xs::Zeros) = Base.OneTo.(size(xs))
Base.IndexStyle(::Type{<:Zeros}) = IndexLinear()
Base.getindex(xs::Zeros{T,N}, I::Int) where {T,N} = zero(T)
Base.getindex(xs::Zeros{T,N}, inds::Union{Base.OneTo, Base.UnitRange}) where {T,N} =
Zeros(T, length(inds))
Base.collect(xs::Zeros{T,N}) where {T,N} = fill(zero(T), size(xs))
@adjoint reshape(xs::Zeros{T}, dims...) where T =
reshape(xs, dims...), _ -> nothing
# Define basic ops
for f in (:+, :-)
@eval @inline function $f(a::Union{AbstractArray{<:Number}, Zeros}, b::Zeros)
@assert size(a) == size(b) throw(DimensionMismatch("dimensions must match"))
a
end
end
+(a::Zeros, b::AbstractArray) = b + a
-(a::Zeros, b::AbstractArray) = -b + a
Base.copy(xs::Zeros{T,N}) where {T,N} = xs
# Define broadcasting behaviour
for op in (:+, :-)
@eval function broadcasted(::typeof($op), a::AbstractArray, b::Zeros)
bs = Broadcast.broadcast_shape(size(a), size(b))
size(a) == bs && return a
sz = similar(a, bs)
sz .= a
end
end
broadcasted(::typeof(+), a::Zeros, b::AbstractArray) = broadcasted(+, b, a)
broadcasted(::typeof(-), a::Zeros, b::AbstractArray) = broadcasted(+, -b, a)
function broadcasted(::typeof(*), a::AbstractArray, b::Zeros)
Zeros(Broadcast.broadcast_shape(size(a), size(b))...)
end
broadcasted(::typeof(*), a::Zeros, b::AbstractArray) = broadcasted(*, b, a)
for op in (:+, :-, :*)
@eval broadcasted(::typeof($op), a::Zeros, b::Zeros) = Zeros(Broadcast.broadcast_shape(size(a), size(b))...)
end
# Some opportunities to avoid scalar indexing, intermediaries
# Since it replicates a little of what we expect Base to do,
# it should be possible to remove in the future, but for now,
# these help with performance.
broadcasted(::typeof(+), a::AbstractArray, b::Zeros{T,0}) where T = a
broadcasted(::typeof(+), a::Zeros{T,0}, b::AbstractArray) where T = b
broadcasted(::typeof(-), a::AbstractArray, b::Zeros{T,0}) where T = a
broadcasted(::typeof(-), a::Zeros{T,0}, b::AbstractArray) where T = -b
broadcasted(::typeof(*), a::AbstractArray, b::Zeros{T,0}) where T = zero(a)
broadcasted(::typeof(*), a::Zeros{T,0}, b::AbstractArray) where T = zero(b)
broadcasted(::typeof(/), a::Zeros{T,0}, b::AbstractArray) where T = zero(b)

View File

@ -69,6 +69,7 @@ if CuArrays.has_cudnn()
@info "Testing Flux/CUDNN"
include("cudnn.jl")
include("curnn.jl")
include("layers.jl")
else
@warn "CUDNN unavailable, not testing GPU DNN support"
end

98
test/cuda/layers.jl Normal file
View File

@ -0,0 +1,98 @@
# Test layers and data/model movements on and off the GPU
# Add tests for layers and their gradients on the GPU
# Most of the forward passes should be fine being applied
# to bitstype objects, but this gives higher coverage for our use-cases
# Check that getting the gradients does not throw
# generic movement tests
@testset "Basic GPU Movement" begin
@test gradient(x -> sum(gpu(x)), rand(3,3)) isa Tuple
@test gradient(x -> sum(cpu(x)), gpu(rand(3,3))) isa Tuple
end
# TODO: These layers get into scalar indexing
# `AlphaDropout` throws a compilation error on GPUs,
# whereas, the rest are scalar indexing issues.
const BROKEN_LAYERS = [DepthwiseConv,
AlphaDropout,
InstanceNorm,
GroupNorm]
function gradtest(name::String, layers::Vector, xs = nothing, args...)
isnothing(xs) && error("Missing input to test the layers against.")
@testset "$name GPU grad tests" begin
for layer in layers
@testset "$layer GPU grad test" begin
l = gpu(layer(args...))
xs = gpu(xs)
if any(x -> isa(l, x), BROKEN_LAYERS)
ps = Flux.params(l)
@test_broken gradient(() -> sum(l(xs)), ps) isa Flux.Zygote.Grads
else
ps = Flux.params(l)
@test gradient(() -> sum(l(xs)), ps) isa Flux.Zygote.Grads
gs = gradient(() -> sum(l(xs)), ps)
# Handle pooling layers
if !isempty(ps)
@test gs[first(ps)] isa Flux.CuArrays.CuArray
end
end
end
end
end
end
# Repeats from Conv, CrossCor
r = rand(Float32, 28, 28, 1, 1)
conv_layers = [Conv, ConvTranspose, CrossCor, DepthwiseConv]
gradtest("Conv", conv_layers, r, (2,2), 1=>3)
pooling_layers = [MaxPool, MeanPool]
gradtest("Pooling", pooling_layers, r, (2,2))
dropout_layers = [Dropout, AlphaDropout]
gradtest("Dropout", dropout_layers, r, 0.5f0)
norm_layers = [LayerNorm, BatchNorm]
gradtest("Normalising", norm_layers, rand(Float32, 28,28,3,1), 1)
instancenorm = [InstanceNorm]
gradtest("InstanceNorm", instancenorm, r, 1)
groupnorm = [GroupNorm]
gradtest("GroupNorm", groupnorm, rand(Float32, 28,28,3,1), 3, 1)
const stateless_layers = [Flux.mse,
Flux.crossentropy,
Flux.logitcrossentropy,
Flux.normalise]
const stateless_layers_broadcasted = [Flux.binarycrossentropy,
Flux.logitbinarycrossentropy]
function stateless_gradtest(f, args...)
@test gradient((args...) -> sum(f(args...)), args...)[1] isa CuArray
end
function stateless_gradtest_broadcasted(f, args...)
@test gradient((args...) -> sum(f.(args...)), args...)[1] isa CuArray
end
@testset "Stateless GPU grad tests" begin
x = gpu(rand(3,3))
y = gpu(rand(3,3))
for layer in stateless_layers
if layer == Flux.normalise
stateless_gradtest(layer, x)
else
stateless_gradtest(layer, x, y)
end
end
for layer in stateless_layers_broadcasted
stateless_gradtest_broadcasted(layer, x, y)
end
end

View File

@ -3,20 +3,34 @@
Y = [1:5;]
d = DataLoader(X, batchsize=2)
@inferred first(d)
batches = collect(d)
@test eltype(batches) == eltype(d) == typeof(X)
@test length(batches) == 3
@test batches[1] == X[:,1:2]
@test batches[2] == X[:,3:4]
@test batches[3] == X[:,5:5]
d = DataLoader(X, batchsize=2, partial=false)
@inferred first(d)
batches = collect(d)
@test eltype(batches) == eltype(d) == typeof(X)
@test length(batches) == 2
@test batches[1] == X[:,1:2]
@test batches[2] == X[:,3:4]
d = DataLoader(X, Y, batchsize=2)
d = DataLoader((X,), batchsize=2, partial=false)
@inferred first(d)
batches = collect(d)
@test eltype(batches) == eltype(d) == Tuple{typeof(X)}
@test length(batches) == 2
@test batches[1] == (X[:,1:2],)
@test batches[2] == (X[:,3:4],)
d = DataLoader((X, Y), batchsize=2)
@inferred first(d)
batches = collect(d)
@test eltype(batches) == eltype(d) == Tuple{typeof(X), typeof(Y)}
@test length(batches) == 3
@test length(batches[1]) == 2
@test length(batches[2]) == 2
@ -28,6 +42,22 @@
@test batches[3][1] == X[:,5:5]
@test batches[3][2] == Y[5:5]
# test with NamedTuple
d = DataLoader((x=X, y=Y), batchsize=2)
@inferred first(d)
batches = collect(d)
@test eltype(batches) == eltype(d) == NamedTuple{(:x, :y), Tuple{typeof(X), typeof(Y)}}
@test length(batches) == 3
@test length(batches[1]) == 2
@test length(batches[2]) == 2
@test length(batches[3]) == 2
@test batches[1][1] == batches[1].x == X[:,1:2]
@test batches[1][2] == batches[1].y == Y[1:2]
@test batches[2][1] == batches[2].x == X[:,3:4]
@test batches[2][2] == batches[2].y == Y[3:4]
@test batches[3][1] == batches[3].x == X[:,5:5]
@test batches[3][2] == batches[3].y == Y[5:5]
# test interaction with `train!`
θ = ones(2)
X = zeros(2, 10)
@ -41,7 +71,7 @@
X = ones(2, 10)
Y = fill(2, 10)
loss(x, y) = sum((y - x'*θ).^2)
d = DataLoader(X, Y)
d = DataLoader((X, Y))
Flux.train!(loss, [θ], ncycle(d, 10), Descent(0.1))
@test norm(θ .- 1) < 1e-10
end

View File

@ -28,6 +28,14 @@ import Flux: activations
end
@testset "Dense" begin
@testset "constructors" begin
@test size(Dense(10, 100).W) == (100, 10)
@test Dense(rand(100,10), rand(10)).σ == identity
@test_throws MethodError Dense(10, 10.5)
@test_throws MethodError Dense(10, 10.5, tanh)
end
@test length(Dense(10, 5)(randn(10))) == 5
@test_throws DimensionMismatch Dense(10, 5)(randn(1))
@test_throws MethodError Dense(10, 5)(1) # avoid broadcasting
@ -37,7 +45,6 @@ import Flux: activations
@test Dense(10, 1, identity, initW = ones, initb = zeros)(ones(10,2)) == 10*ones(1, 2)
@test Dense(10, 2, identity, initW = ones, initb = zeros)(ones(10,1)) == 10*ones(2, 1)
@test Dense(10, 2, identity, initW = ones, initb = zeros)([ones(10,1) 2*ones(10,1)]) == [10 20; 10 20]
end
@testset "Diagonal" begin

View File

@ -25,6 +25,35 @@ end
Dense(288, 10), softmax)
@test size(m(r)) == (10, 5)
# Test bias switch
bias = Conv(ones(Float32, 2, 2, 1, 3), ones(Float32, 3))
ip = zeros(Float32, 28,28,1,1)
op = bias(ip)
@test sum(op) == prod(size(op))
bias = Conv((2,2), 1=>3, bias = Flux.Zeros())
op = bias(ip)
@test sum(op) === 0.f0
gs = gradient(() -> sum(bias(ip)), Flux.params(bias))
@test gs[bias.bias] == nothing
# Train w/o bias and make sure no convergence happens
# when only bias can be converged
bias = Conv((2, 2), 1=>3, bias = Flux.Zeros());
ip = zeros(Float32, 28,28,1,1)
op = zeros(Float32, 27,27,3,1) .+ 2.f0
opt = Descent()
for _ = 1:10^3
gs = gradient(params(bias)) do
Flux.mse(bias(ip), op)
end
Flux.Optimise.update!(opt, params(bias), gs)
end
@test Flux.mse(bias(ip), op) 4.f0
end
@testset "asymmetric padding" begin
@ -162,4 +191,28 @@ end
@test Flux.outdims(m, (5, 5)) == (4, 4)
m = MeanPool((2, 2); stride = 2, pad = 3)
@test Flux.outdims(m, (5, 5)) == (5, 5)
end
end
@testset "$ltype SamePad kernelsize $k" for ltype in (Conv, ConvTranspose, DepthwiseConv, CrossCor), k in ( (1,), (2,), (3,), (4,5), (6,7,8))
data = ones(Float32, (k .+ 3)..., 1,1)
l = ltype(k, 1=>1, pad=SamePad())
@test size(l(data)) == size(data)
l = ltype(k, 1=>1, pad=SamePad(), dilation = k 2)
@test size(l(data)) == size(data)
stride = 3
l = ltype(k, 1=>1, pad=SamePad(), stride = stride)
if ltype == ConvTranspose
@test size(l(data))[1:end-2] == stride .* size(data)[1:end-2] .- stride .+ 1
else
@test size(l(data))[1:end-2] == ceil.(Int, size(data)[1:end-2] ./ stride)
end
end
@testset "$ltype SamePad windowsize $k" for ltype in (MeanPool, MaxPool), k in ( (1,), (2,), (3,), (4,5), (6,7,8))
data = ones(Float32, (k .+ 3)..., 1,1)
l = ltype(k, pad=SamePad())
@test size(l(data))[1:end-2] == ceil.(Int, size(data)[1:end-2] ./ k)
end

View File

@ -1,9 +1,26 @@
using Test
using Flux: onehotbatch, mse, crossentropy, logitcrossentropy,
σ, binarycrossentropy, logitbinarycrossentropy, flatten
σ, binarycrossentropy, logitbinarycrossentropy, flatten,
xlogx, xlogy
const ϵ = 1e-7
@testset "xlogx & xlogy" begin
@test iszero(xlogx(0))
@test isnan(xlogx(NaN))
@test xlogx(2) 2.0 * log(2.0)
@inferred xlogx(2)
@inferred xlogx(0)
@test iszero(xlogy(0, 1))
@test isnan(xlogy(NaN, 1))
@test isnan(xlogy(1, NaN))
@test isnan(xlogy(NaN, NaN))
@test xlogy(2, 3) 2.0 * log(3.0)
@inferred xlogy(2, 3)
@inferred xlogy(0, 1)
end
@testset "losses" begin
# First, regression-style y's
y = [1, 1, 0, 0]
@ -12,15 +29,15 @@ const ϵ = 1e-7
@testset "mse" begin
@test mse(ŷ, y) (.1^2 + .9^2)/2
end
@testset "mae" begin
@test Flux.mae(ŷ, y) 1/2
end
@testset "huber_loss" begin
@test Flux.huber_loss(ŷ, y) 0.20500000000000002
end
end
y = [123.0,456.0,789.0]
ŷ = [345.0,332.0,789.0]
@testset "msle" begin
@ -35,6 +52,7 @@ const ϵ = 1e-7
lossvalue = 1.203972804325936
@testset "crossentropy" begin
@test crossentropy([0.1,0.0,0.9], [0.1,0.0,0.9]) crossentropy([0.1,0.9], [0.1,0.9])
@test crossentropy(ŷ, y) lossvalue
end
@ -63,46 +81,47 @@ const ϵ = 1e-7
@testset "logitbinarycrossentropy" begin
@test logitbinarycrossentropy.(logŷ, y) binarycrossentropy.(σ.(logŷ), y; ϵ=0)
end
y = [1 2 3]
ŷ = [4.0 5.0 6.0]
@testset "kldivergence" begin
@test Flux.kldivergence([0.1,0.0,0.9], [0.1,0.0,0.9]) Flux.kldivergence([0.1,0.9], [0.1,0.9])
@test Flux.kldivergence(ŷ, y) -1.7661057888493457
@test Flux.kldivergence(y, y) 0
@test Flux.kldivergence(y, y) 0
end
y = [1 2 3 4]
ŷ = [5.0 6.0 7.0 8.0]
@testset "hinge" begin
@test Flux.hinge(ŷ, y) 0
@test Flux.hinge(y, 0.5 .* y) 0.125
end
@testset "squared_hinge" begin
@test Flux.squared_hinge(ŷ, y) 0
@test Flux.squared_hinge(y, 0.5 .* y) 0.0625
end
y = [0.1 0.2 0.3]
ŷ = [0.4 0.5 0.6]
@testset "poisson" begin
@test Flux.poisson(ŷ, y) 0.6278353988097339
@test Flux.poisson(y, y) 0.5044459776946685
end
y = [1.0 0.5 0.3 2.4]
ŷ = [0 1.4 0.5 1.2]
@testset "dice_coeff_loss" begin
@test Flux.dice_coeff_loss(ŷ, y) 0.2799999999999999
@test Flux.dice_coeff_loss(y, y) 0.0
end
@testset "tversky_loss" begin
@test Flux.tversky_loss(ŷ, y) -0.06772009029345383
@test Flux.tversky_loss(ŷ, y, β = 0.8) -0.09490740740740744
@test Flux.tversky_loss(y, y) -0.5576923076923075
end
@testset "no spurious promotions" begin
for T in (Float32, Float64)
y = rand(T, 2)

View File

@ -57,35 +57,57 @@ end
end
@testset "ExpDecay" begin
w = randn(10, 10)
o = ExpDecay(0.1, 0.1, 1000, 1e-4)
w1 = randn(10,10)
loss(x) = Flux.mse(w*x, w1*x)
flag = 1
decay_steps = []
for t = 1:10^5
prev_eta = o.eta
θ = Params([w1])
x = rand(10)
θ̄ = gradient(() -> loss(x), θ)
prev_grad = collect(θ̄[w1])
delta = Optimise.apply!(o, w1, θ̄[w1])
w1 .-= delta
new_eta = o.eta
if new_eta != prev_eta
push!(decay_steps, t)
end
array = fill(o.eta, size(prev_grad))
if array .* prev_grad != delta
flag = 0
end
@testset "Sanity Check" begin
o = ExpDecay(0.2, 0.5, 1, 1e-3)
p = [0.0]
steps = 1:8
eta_expected = @. max(o.eta * 0.5 ^ steps, o.clip)
eta_actual = [Optimise.apply!(o, p, [1.0])[1] for _ in steps]
@test eta_actual == eta_expected
end
w = randn(10, 10)
o = ExpDecay(0.1, 0.1, 1000, 1e-4)
w1 = randn(10,10)
loss(x) = Flux.mse(w*x, w1*x)
flag = 1
decay_steps = []
for t = 1:10^5
prev_eta = o.eta
θ = Params([w1])
x = rand(10)
θ̄ = gradient(() -> loss(x), θ)
prev_grad = collect(θ̄[w1])
delta = Optimise.apply!(o, w1, θ̄[w1])
w1 .-= delta
new_eta = o.eta
if new_eta != prev_eta
push!(decay_steps, t)
end
@test flag == 1
# Test to check if decay happens at decay steps. Eta reaches clip value eventually.
ground_truth = []
for i in 1:11
push!(ground_truth, 1000*i) # Expected decay steps for this example.
array = fill(o.eta, size(prev_grad))
if array .* prev_grad != delta
flag = 0
end
@test decay_steps == ground_truth
@test o.eta == o.clip
end
@test flag == 1
# Test to check if decay happens at decay steps. Eta reaches clip value (1e-4) after 4000 steps (decay by 0.1 every 1000 steps starting at 0.1).
ground_truth = []
for i in 1:4
push!(ground_truth, 1000*i) # Expected decay steps for this example.
end
@test decay_steps == ground_truth
@test o.eta == o.clip
end
@testset "Clipping" begin
w = randn(10, 10)
loss(x) = sum(w * x)
θ = Params([w])
x = 1000 * randn(10)
= gradient(() -> loss(x), θ)[w]
w̄_value = Optimise.apply!(ClipValue(1.0), w, copy())
@test all(w̄_value .<= 1)
w̄_norm = Optimise.apply!(ClipNorm(1.0), w, copy())
@test norm(w̄_norm) <= 1
end

View File

@ -2,49 +2,45 @@ using Flux
using Flux.Data
using Test
using Random, Statistics, LinearAlgebra
using Documenter
using IterTools: ncycle
Random.seed!(0)
@testset "Flux" begin
@testset "Utils" begin
include("utils.jl")
end
@testset "Utils" begin
include("utils.jl")
end
@testset "Onehot" begin
include("onehot.jl")
end
@testset "Optimise" begin
include("optimise.jl")
end
@testset "Data" begin
include("data.jl")
end
@testset "Layers" begin
include("layers/basic.jl")
include("layers/normalisation.jl")
include("layers/stateless.jl")
include("layers/conv.jl")
end
@testset "CUDA" begin
if Flux.use_cuda[]
include("cuda/cuda.jl")
else
@warn "CUDA unavailable, not testing GPU support"
end
@testset "Onehot" begin
include("onehot.jl")
end
@testset "Optimise" begin
include("optimise.jl")
end
@testset "Data" begin
include("data.jl")
end
@testset "Layers" begin
include("layers/basic.jl")
include("layers/normalisation.jl")
include("layers/stateless.jl")
include("layers/conv.jl")
end
@testset "CUDA" begin
if Flux.use_cuda[]
include("cuda/cuda.jl")
else
@warn "CUDA unavailable, not testing GPU support"
end
end
@static if VERSION >= v"1.4"
using Documenter
@testset "Docs" begin
if VERSION >= v"1.4"
DocMeta.setdocmeta!(Flux, :DocTestSetup, :(using Flux); recursive=true)
doctest(Flux)
end
DocMeta.setdocmeta!(Flux, :DocTestSetup, :(using Flux); recursive=true)
doctest(Flux)
end
end # testset Flux
end