Compare commits


1786 Commits

Author SHA1 Message Date
bors[bot]
7035ee9bea
Merge #1238
1238: Fix inline code block r=dhairyagandhi96 a=harryscholes

### PR Checklist

- [ ] Tests are added
- [ ] Entry in NEWS.md
- [x] Documentation, if applicable
- [ ] Final review from `@MikeInnes` or `@dhairyagandhi96` (for API changes).


Co-authored-by: harryscholes <harryscholes@gmail.com>
2020-06-19 08:28:41 +00:00
harryscholes
57efd7fead Fix inline code block 2020-06-19 09:24:44 +01:00
bors[bot]
19b45b49d3
Merge #1221
1221: DataLoader with NamedTuple r=CarloLucibello a=cossio

Just a couple of small changes, so that `DataLoader` can be created with a `NamedTuple` of tensors instead of `Tuple`. This way the tensors can be referred to by name. For example

```
train_loader = DataLoader((images = Xtrain, labels = Ytrain), batchsize=16)
batch = first(train_loader)
y = model(batch.images)
logitcrossentropy(y, batch.labels)
```

If we only use tuples, then in datasets with multiple tensors one has to be careful about the order in which the tensors are fed into the `DataLoader` constructor and be consistent with this elsewhere. With `NamedTuples` one just has to be consistent about the names used, which I think is a minor improvement.
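For illustration, a minimal sketch of the contrast (the data arrays here are hypothetical stand-ins):

```julia
using Flux.Data: DataLoader

Xtrain = rand(Float32, 28, 28, 1, 64)   # hypothetical image batch
Ytrain = rand(Bool, 10, 64)             # hypothetical labels

# With a plain Tuple the positional order is the contract; with a NamedTuple the names are.
tuple_loader = DataLoader((Xtrain, Ytrain), batchsize=16)                    # remember: x first, y second
named_loader = DataLoader((images = Xtrain, labels = Ytrain), batchsize=16)  # refer to batch.images / batch.labels
```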

CC @CarloLucibello 

### PR Checklist

- [x] Tests are added
- [x] Entry in NEWS.md
- [x] Documentation, if applicable

I don't think this qualifies as an API change. It's just a minor feature addition. So final review probably not required.

- [ ] Final review from `@MikeInnes` or `@dhairyagandhi96` (for API changes).


Co-authored-by: cossio <j.cossio.diaz@gmail.com>
Co-authored-by: cossio <cossio@users.noreply.github.com>
2020-06-16 17:21:28 +00:00
bors[bot]
254e4a7058
Merge #1231
1231: use `ntuple` in conv r=MikeInnes a=MikeInnes

This is the right abstraction over `map`, and in particular is a bit easier to compile away in some cases. 

As this is a trivial change from Flux's perspective it's not easy to test here, but there are downstream tests in XLA.jl.
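As a rough sketch of the pattern (hypothetical helper, not the actual Flux code): expanding a scalar stride/pad/dilation to an `N`-tuple with `ntuple` keeps the length visible to the compiler, which can then unroll it.

```julia
expand(i::Integer, N) = ntuple(_ -> i, N)  # length N is known at compile time
expand(i::Tuple, N)   = i                  # already a tuple: pass through

expand(1, 2)       # (1, 1)
expand((2, 2), 2)  # (2, 2)
```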

Co-authored-by: Mike J Innes <mike.j.innes@gmail.com>
2020-06-16 13:04:20 +00:00
Mike J Innes
9f931dd7fa use ntuple in conv 2020-06-16 14:02:24 +01:00
cossio
9078f85096 revert selectdim
selectdim can lead to type instability, see https://discourse.julialang.org/t/why-selectdim-is-type-instable/25271/5
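A minimal sketch of the instability (hypothetical helper names):

```julia
using Test

x = rand(Float32, 10, 5)

first_slice(x, d) = selectdim(x, d, 1)  # dimension is a runtime value, so the
                                        # concrete SubArray return type can't be inferred
first_col(x)      = view(x, :, 1)       # dimension fixed in the code: type stable

@inferred first_col(x)        # passes
# @inferred first_slice(x, 2) # would throw: return type not concretely inferred
```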
2020-06-16 13:32:27 +02:00
cossio
1dbaf32810 DataLoader type inference tests 2020-06-16 13:32:27 +02:00
cossio
cb34bb848b simplify _getobs 2020-06-16 13:32:27 +02:00
cossio
75692161a7 Apply suggestions from code review
accept suggested changes

Co-authored-by: Carlo Lucibello <carlo.lucibello@gmail.com>
2020-06-16 13:32:27 +02:00
cossio
909a55ac10 news and docs 2020-06-16 13:32:27 +02:00
cossio
02ee6ba426 DataLoader with NamedTuple 2020-06-16 13:31:29 +02:00
bors[bot]
97406507fd
Merge #1218
1218: Require weight and bias to be AbstractArrays r=CarloLucibello a=oxinabox

closes #1199
In theory someone could be using Dense with weights and biases that are not abstract arrays, but I would be surprised.
So allowing it is just leaving a foot-gun lying around.
If it is common, then we can instead close #1199 by adding a special constructor for `Number` subtypes that errors if they are not integers, or something along those lines.

### PR Checklist

- [x] Tests are added
- [x] Entry in NEWS.md

I think this is a bug-fix thus the following are not required:

- [ ] Documentation, if applicable
- [ ] Final review from `@MikeInnes` or `@dhairyagandhi96` (for API changes).


Co-authored-by: Lyndon White <lyndon.white@invenialabs.co.uk>
Co-authored-by: Lyndon White <oxinabox@ucc.asn.au>
2020-06-15 15:21:21 +00:00
Lyndon White
e61787c1c8
Update test/layers/basic.jl 2020-06-12 13:58:10 +01:00
Lyndon White
601f842eaf
bonus test 2020-06-11 23:17:40 +01:00
bors[bot]
99ec30c8c2
Merge #1220
1220: CompatHelper: bump compat for "Adapt" to "2.0" r=CarloLucibello a=github-actions[bot]

This pull request changes the compat entry for the `Adapt` package from `1` to `1, 2.0`.

This keeps the compat entries for earlier versions.

Note: I have not tested your package with this new compat entry. It is your responsibility to make sure that your package tests pass before you merge this pull request.

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2020-06-11 09:54:46 +00:00
github-actions[bot]
fbfc973011 CompatHelper: bump compat for "Adapt" to "2.0" 2020-06-11 00:18:47 +00:00
Lyndon White
a1623aca76
move into 0.11 news 2020-06-10 12:39:00 +01:00
Lyndon White
15c7354c4e
Make release as DEV 2020-06-10 12:38:33 +01:00
Lyndon White
97b0aa4d36 bump version 2020-06-10 12:14:47 +01:00
Lyndon White
cf90517a8a update news.md 2020-06-10 12:14:19 +01:00
Lyndon White
df84628c29 Require weight and bias to be AbstractArrays 2020-06-10 12:06:57 +01:00
bors[bot]
e1f80d4627
Merge #1213
1213: Fixing indentation in train! docstring r=CarloLucibello a=natema

One code block is not correctly displayed in the doc of [Flux.Optimise.train!
](https://fluxml.ai/Flux.jl/stable/training/training/#Flux.Optimise.train!). 
Based on the previous code block, I guess it's an indentation problem.


Co-authored-by: natema <natema@users.noreply.github.com>
2020-06-08 18:29:46 +00:00
bors[bot]
a7bbd3d35b
Merge #1152
1152: extend dataloader r=CarloLucibello a=CarloLucibello

cfr discussion in #1149. Currently DataLoader interface supports

1. `for x in DataLoader(X)`
2. `for (x, y) in DataLoader(X, Y)`

This PR adds

3. `for (x,) in DataLoader((X,))`
4. `for (x, y) in DataLoader((X, Y))`

Edit:
the constructor in 2. is removed in this PR
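A usage sketch of the resulting interface (array sizes hypothetical; the last dimension is the observation dimension); form 2 goes away and form 4 replaces it:

```julia
using Flux.Data: DataLoader

X = rand(Float32, 10, 100)
Y = rand(Float32, 1, 100)

loader1 = DataLoader(X; batchsize=16)        # form 1: iterate single batches x
loader3 = DataLoader((X,); batchsize=16)     # form 3: iterate 1-tuples (x,)
loader4 = DataLoader((X, Y); batchsize=16)   # form 4: iterate tuples (x, y)
first(loader4)                               # a 2-tuple of a 10×16 and a 1×16 batch
```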

Co-authored-by: CarloLucibello <carlo.lucibello@gmail.com>
2020-06-08 18:01:06 +00:00
CarloLucibello
0cf46432cf cleanup 2020-06-08 19:59:34 +02:00
natema
70bbf18180
Fixing indentation in train! docstring
One code block is not correctly displayed in the doc of [Flux.Optimise.train!
](https://fluxml.ai/Flux.jl/stable/training/training/#Flux.Optimise.train!). 
Based on the previous code block, I guess it's an indentation problem.
2020-06-07 15:44:04 +02:00
bors[bot]
d9b07475b0
Merge #1129
1129: Added dropgrad in huber_loss r=CarloLucibello a=HenriDeh

Workaround to prevent `iterate(::Nothing)` when working with CuArrays. See issue #1128
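Roughly, the idea looks like this (a hedged sketch with a made-up stand-in for the real `huber_loss`, using `Zygote.dropgrad` so no gradient is propagated through the mask):

```julia
using Zygote

function huberish(ŷ, y; δ = 1f0)
  err  = abs.(ŷ .- y)
  mask = Zygote.dropgrad(err .<= δ)   # treated as a constant by AD
  sum(0.5f0 .* err .^ 2 .* mask .+ δ .* (err .- 0.5f0 * δ) .* (1 .- mask)) / length(y)
end
```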

Co-authored-by: HenriDeh <47037088+HenriDeh@users.noreply.github.com>
2020-06-06 17:21:19 +00:00
bors[bot]
9ebbe8cb4c
Merge #1141
1141: Speedup matmul of CuMatrix and OneHotMatrix r=CarloLucibello a=AStupidBear

This solves #189.

```julia
julia> using Flux


julia> using Flux: CuArrays

julia> A = zeros(300, 10000) |> gpu;

julia> B = Flux.onehotbatch(rand(1:10000, 256), 1:10000) |> gpu;

julia> A * B; CuArrays.@time A * B;
┌ Warning: Performing scalar operations on GPU arrays: This is very slow, consider disallowing these operations with `allowscalar(false)`
└ @ GPUArrays ~/shared/.julia/packages/GPUArrays/OXvxB/src/host/indexing.jl:43
  0.002824 seconds (951 CPU allocations: 38.156 KiB) (2 GPU allocations: 301.000 KiB, 2.32% gc time of which 46.42% spent allocating)

julia> import Base: *

julia> A::AbstractMatrix * B::Flux.OneHotMatrix = @inbounds A[:, map(x->x.ix, B.data)]
* (generic function with 522 methods)

julia> A * B; CuArrays.@time A * B;
  0.000343 seconds (169 CPU allocations: 5.000 KiB) (2 GPU allocations: 301.000 KiB, 15.53% gc time of which 65.97% spent allocating)
```

Co-authored-by: Yao Lu <luyaocns@gmail.com>
2020-06-06 17:00:01 +00:00
CarloLucibello
b1f226eb34 add news 2020-06-06 18:15:04 +02:00
CarloLucibello
a643cb6758 extend dataloader 2020-06-06 18:02:03 +02:00
bors[bot]
792a1c54f8
Merge #1211
1211: Fixing syntax in onehot docstring r=CarloLucibello a=natema

`otherwise, it will error` -> `otherwise, it will raise an error`


Co-authored-by: natema <natema@users.noreply.github.com>
2020-06-06 15:02:40 +00:00
natema
8f6aed5770
Fixing syntax in onehot docstring
`otherwise, it will error` -> `otherwise, it will raise an error`
2020-06-05 18:20:50 +02:00
bors[bot]
22d5e318e5
Merge #1192
1192: Improve `restructure` performance r=dhairyagandhi96 a=MikeInnes

A small change, but it significantly improves the performance on the following test case:

```julia
julia> VERSION
v"1.5.0-DEV.876"

julia> using Flux, DiffEqFlux, BenchmarkTools

julia> using Flux: mse

julia> fastdense = FastDense(784, 32, tanh);

julia> p = initial_params(fastdense);

julia> dense = Dense(784, 32, tanh);

julia> p,re = Flux.destructure(dense);

julia> x = rand(Float32, 784, 10);

julia> y = rand(Float32, 32, 10);

julia> @btime gradient((x,p) -> mse(fastdense(x, p), y), x, p);
  505.530 μs (87 allocations: 240.73 KiB)

julia> @btime gradient((x,p) -> mse(re(p)(x), y), x, p);
  107.796 μs (139 allocations: 340.94 KiB)
```

Co-authored-by: Mike J Innes <mike.j.innes@gmail.com>
2020-06-05 14:53:11 +00:00
bors[bot]
71ebd51e45
Merge #1208
1208: Fixing output format for `onehot` r=dhairyagandhi96 a=natema

Currently `Flux.OneHotVector` is displayed as a binary vector (0/1) rather than a boolean one (true/false). This is also shown in successive examples on the same page. 
I fixed the `onehot(:b, [:a, :b, :c])` and `onehot(:c, [:a, :b, :c])` outputs in the first example of the page accordingly.


Co-authored-by: natema <natema@users.noreply.github.com>
2020-06-05 09:17:12 +00:00
bors[bot]
b5a73f8532
Merge #1207
1207: Fixing typo in docs r=dhairyagandhi96 a=natema

`what ever` -> `whatever`


Co-authored-by: natema <natema@users.noreply.github.com>
2020-06-05 09:00:06 +00:00
natema
48d6f2d0c0
Fixing output format for onehot
`Flux.OneHotVector` is displayed as a binary vector (0/1) rather than a boolean one (true/false), as is also shown in successive examples on the same page, so I fixed the `onehot(:b, [:a, :b, :c])` and `onehot(:c, [:a, :b, :c])` outputs to match what the current Julia version, 1.4.2, actually prints.
2020-06-03 17:03:08 +02:00
natema
2c4b1e521e
Fixing typo in docs
`what ever` -> `whatever`
2020-06-02 19:20:41 +02:00
bors[bot]
ca1b1b2c7c
Merge #1206
1206: Fixing ambiguous remark in Preserve inputs' types r=dhairyagandhi96 a=natema

This PR is based on the [discussion in the forum](https://discourse.julialang.org/t/not-clear-what-0-01f0x-is-in-the-flux-docs/40553?u=mathematics) on the ambiguity of `0.01f0x` in the line
> While one could change the activation function (e.g. to use `0.01f0x`)
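For clarity, `0.01f0x` stands for the `Float32` literal `0.01f0` multiplied by `x`, i.e. a leaky-ReLU-style activation; a small sketch:

```julia
leaky(x) = max(x, 0.01f0 * x)   # 0.01f0x written out explicitly

leaky(-2f0)          # -0.02f0
typeof(leaky(-2f0))  # Float32 -- the input's type is preserved
```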

Co-authored-by: natema <natema@users.noreply.github.com>
2020-06-02 17:09:58 +00:00
natema
a24f46b606
Fixing ambiguous remark in Preserve inputs' types
This PR is based on the [discussion in the forum](https://discourse.julialang.org/t/not-clear-what-0-01f0x-is-in-the-flux-docs/40553?u=mathematics) on the ambiguity of `0.01f0x` in the line
> While one could change the activation function (e.g. to use `0.01f0x`)
2020-06-02 18:48:07 +02:00
Mike J Innes
089ec0832c improved restructure adjoint 2020-05-27 12:28:22 +01:00
bors[bot]
ddd0f4e747
Merge #1191
1191: Pull Request Template r=MikeInnes a=MikeInnes

Hopefully this makes it a little clearer what the requirements are, which will lead to easier review and encourage things like NEWS.md entries, which we want to keep better in sync.

cc @dhairyagandhi96 and @CarloLucibello for thoughts.

Co-authored-by: Mike J Innes <mike.j.innes@gmail.com>
2020-05-27 11:15:26 +00:00
Mike J Innes
e10818bbad
Update pull_request_template.md 2020-05-27 12:12:13 +01:00
Mike J Innes
8c3a80c940
Create pull_request_template.md 2020-05-26 12:52:28 +01:00
bors[bot]
85c39e2309
Merge #1190
1190: Correcting advanced.md r=dhairyagandhi96 a=Sleort

To make the example consistent, it should be 
```
julia> Flux.trainable(a::Affine) = (a.W,)
```
not
```
julia> Flux.trainable(a::Affine) = (a.W, a.b)
```

Co-authored-by: Troels Arnfred Bojesen <tr-ab@online.no>
2020-05-25 14:47:42 +00:00
Troels Arnfred Bojesen
17bb00a3fa
Correcting advanced.md
To make the example consistent, it should be 
```
julia> Flux.trainable(a::Affine) = (a.W,)
```
not
```
julia> Flux.trainable(a::Affine) = (a.W, a.b)
```
2020-05-25 23:33:09 +09:00
bors[bot]
bd152ca099
Merge #1177
1177: Align ExpDecay implementation with documentation r=dhairyagandhi96 a=DrChainsaw

Fix for #1176 



Co-authored-by: DrChainsaw <Christian.kyril.skarby@gmail.com>
2020-05-21 14:33:20 +00:00
bors[bot]
f343172daf
Merge #1185
1185: Add some news r=dhairyagandhi96 a=dhairyagandhi96

cc @CarloLucibello please add to this list as well

Co-authored-by: Dhairya Gandhi <dhairya@juliacopmuting.com>
2020-05-21 12:46:39 +00:00
bors[bot]
472e1fbf5e
Merge #957
957: Add some gradient checking tests on GPUs r=dhairyagandhi96 a=dhairyagandhi96

Good to add generic tests for tracking gradients through the various layers on the GPU.

Co-authored-by: Dhairya Gandhi <dhairya@juliacopmuting.com>
Co-authored-by: Dhairya Gandhi <dhairya@juliacomputing.com>
2020-05-21 12:25:53 +00:00
Dhairya Gandhi
0801064d50 add comment on broken layers 2020-05-20 00:11:38 +05:30
Dhairya Gandhi
c4409fa6d1 clearing failures 2020-05-19 23:54:18 +05:30
bors[bot]
87ba651add
Merge #1165
1165: Fix docstring of logitcrossentropy r=dhairyagandhi96 a=cossio

Since `y` is a logit, there is no log (see the diff).
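For reference, a small sketch of the identity the docstring should reflect (names as exported by Flux at the time of these commits):

```julia
using Flux: crossentropy, logitcrossentropy, softmax, onehotbatch

ŷ = randn(Float32, 3, 5)             # raw logits, no log/softmax applied yet
y = onehotbatch(rand(1:3, 5), 1:3)

# logitcrossentropy applies logsoftmax internally, so:
logitcrossentropy(ŷ, y) ≈ crossentropy(softmax(ŷ), y)  # true (up to floating point)
```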

Co-authored-by: cossio <cossio@users.noreply.github.com>
2020-05-19 11:07:15 +00:00
Dhairya Gandhi
55430e207d add news 2020-05-19 16:34:28 +05:30
bors[bot]
0b10f1a8df
Merge #1184
1184: Add some functions to docs r=dhairyagandhi96 a=dhairyagandhi96



Co-authored-by: Dhairya Gandhi <dhairya@juliacopmuting.com>
2020-05-18 21:10:46 +00:00
DrChainsaw
9a24ee0bd7 Change indentation to 2 spaces 2020-05-18 21:52:40 +02:00
Dhairya Gandhi
bdfe567519 add some layers to docs 2020-05-18 23:53:11 +05:30
bors[bot]
b6a5dd7152
Merge #1133
1133: add ClipValue and ClipNorm r=CarloLucibello a=AStupidBear



Co-authored-by: Yao Lu <luyaocns@gmail.com>
2020-05-15 17:15:07 +00:00
Yao Lu
007586858c fix export merge conflict 2020-05-14 17:13:35 +08:00
Dhairya Gandhi
fab53e0a01
Merge pull request #1179 from FluxML/compathelper/new_version/2020-05-13-00-13-17-919-1190174363
CompatHelper: add new compat entry for "Functors" at version "0.1"
2020-05-13 11:27:40 +05:30
github-actions[bot]
3fa9e91c41 CompatHelper: add new compat entry for "Functors" at version "0.1" 2020-05-13 00:13:46 +00:00
DrChainsaw
e8433d0abe Align ExpDecay implementation with documentation 2020-05-12 22:50:17 +02:00
bors[bot]
de39d1095b
Merge #1175
1175: xlogy broadcast adjoint r=MikeInnes a=MikeInnes

This is helpful for performance, since it avoids having to differentiate `xlogy` itself inside of a map.

Co-authored-by: Mike J Innes <mike.j.innes@gmail.com>
2020-05-12 17:10:58 +00:00
Mike J Innes
f5a8900ffb xlogy broadcast adjoint 2020-05-12 17:29:35 +01:00
Mike J Innes
bd43201f37
fix logitcrossentropy doc string 2020-05-12 16:18:29 +01:00
bors[bot]
a84e08cf28
Merge #1174
1174: Functors r=MikeInnes a=MikeInnes

Just splits out the implementation to the [Functors](https://github.com/FluxML/Functors.jl) package, so the same traits can be used elsewhere (e.g. Optimisers.jl) without depending on all of Flux.

Co-authored-by: Mike J Innes <mike.j.innes@gmail.com>
2020-05-12 14:39:08 +00:00
Mike J Innes
22d29c9bfd released functors.jl 2020-05-12 15:33:14 +01:00
Dhairya Gandhi
36d3a9ce99
Merge pull request #1162 from aminya/patch-5
Update CompatHelper.yml
2020-05-10 14:21:14 +05:30
Yao Lu
5a9eb7411a cpu 2020-05-10 14:39:48 +08:00
Yao Lu
888f286c51 use @inbounds 2020-05-09 19:40:46 +08:00
Yao Lu
63cb70dd23 remove importing CuMatrix 2020-05-09 19:13:52 +08:00
Yao Lu
30648910c8 transfer onehot indices back to cpu 2020-05-09 19:10:46 +08:00
Yao Lu
d1ad8db625 add to docs 2020-05-09 16:40:26 +08:00
bors[bot]
d89ee6cdba
Merge #1167
1167: Update basics.md r=dhairyagandhi96 a=mipals

Removing superfluous `using Flux`

Co-authored-by: Mikkel Paltorp Schmitt <mikkel.paltorp@gmail.com>
2020-05-08 11:38:22 +00:00
bors[bot]
0287abbf66
Merge #1166
1166: Fix crossentropy when some probabilities are zero r=dhairyagandhi96 a=cossio

Use a function `xlogy(x,y) = x * log(y)` that has the correct limit at `x=0`.
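One possible implementation of such a function (a sketch, not necessarily the code that was merged):

```julia
function xlogy(x, y)
  result = x * log(y)
  ifelse(iszero(x), zero(result), result)  # exact zero when x == 0, so 0*log(0) is 0, not NaN
end

xlogy(0.0, 0.0)   # 0.0
xlogy(0.5, 0.25)  # ≈ -0.693
```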

Before this PR:

```julia
julia> Flux.crossentropy([0.1,0.0,0.9], [0.1,0.0,0.9])
NaN
```

After this PR:

```julia
julia> Flux.crossentropy([0.1,0.0,0.9], [0.1,0.0,0.9])
0.3250829733914482
```

Co-authored-by: cossio <j.cossio.diaz@gmail.com>
2020-05-08 11:14:31 +00:00
cossio
17f54e4c6f bump version 2020-05-08 12:57:34 +02:00
cossio
feb72d400a NaN 2020-05-07 12:44:32 +02:00
cossio
86d6555269 cufunc 2020-05-07 09:58:33 +02:00
Mikkel Paltorp Schmitt
40efa9df49
Update basics.md
Removing superfluous `using Flux`
2020-05-06 13:41:56 +02:00
cossio
8314200c51 generic 2020-05-05 19:23:05 +02:00
cossio
06c1e20372 add tests 2020-05-05 19:05:04 +02:00
cossio
480473a81b xlogy 2020-05-05 18:33:50 +02:00
cossio
9e1fd883d5
Fix docstring of logitbinarycrossentropy and logitcrossentropy 2020-05-05 16:29:29 +02:00
Amin Yahyaabadi
70f76fd6db
Update CompatHelper.yml 2020-05-05 07:11:22 -05:00
bors[bot]
c444226db5
Merge #1160
1160: Build docs on Julia 1.3 r=dhairyagandhi96 a=dhairyagandhi96

This causes red CI otherwise

Co-authored-by: Dhairya Gandhi <dhairya@juliacopmuting.com>
2020-05-04 12:59:25 +00:00
Dhairya Gandhi
f2c66579ec yaml syntax fix 2020-05-04 18:01:33 +05:30
Dhairya Gandhi
fc464f5ef8 build docs on Julia 1.3 2020-05-04 17:54:04 +05:30
bors[bot]
1e2476b3c2
Merge #1156
1156: Add correct overload for apply! in docs r=dhairyagandhi96 a=dhairyagandhi96

Maybe we should consider adding a `const` name that is better than `apply!` (or rename `apply!`) and export it, so folks can just overload `descriptive_apply_my_optimiser_rule!` rather than have to go to the sub-project `Flux.Optimise`?
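For context, a minimal sketch of the overload the docs describe, on a hypothetical optimiser rule:

```julia
using Flux

mutable struct MyDescent
  eta::Float64
end

# Custom optimisers implement Flux.Optimise.apply!(opt, x, Δ), mutating and
# returning the update that will be subtracted from x.
function Flux.Optimise.apply!(o::MyDescent, x, Δ)
  Δ .*= o.eta
end
```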

Co-authored-by: Dhairya Gandhi <dhairya@juliacopmuting.com>
2020-05-04 06:01:23 +00:00
Dhairya Gandhi
d6a1ccd354 add correct overload for apply in docs 2020-05-03 16:56:39 +05:30
bors[bot]
5d9acc7e73
Merge #873
873: Make bias optional r=MikeInnes a=dhairyagandhi96

Addresses #868 



Co-authored-by: Dhairya Gandhi <dhairya@juliacopmuting.com>
2020-05-01 13:28:15 +00:00
Mike J Innes
8f877f2dbf quick fix 2020-05-01 14:22:46 +01:00
Dhairya Gandhi
29215fa5d7 comment on possible future deprecations 2020-04-29 16:17:44 +05:30
Dhairya Gandhi
534809ae78 move zeros to its own file 2020-04-29 16:15:35 +05:30
Dhairya Gandhi
5086c0f4f0 merge conflicts 2020-04-29 16:11:39 +05:30
Yao Lu
114f63a214 norm(Δ) 2020-04-26 17:28:07 +08:00
Yao Lu
eb6898ea19 speedup matmul of CuMatrix and OneHotMatrix 2020-04-25 23:22:46 +08:00
Yao Lu
7d6f711c6f Merge branch 'master' into clip 2020-04-25 22:18:58 +08:00
bors[bot]
9237cdaf5b
Merge #901
901: Add option for "Same" padding to conv and pooling layers r=dhairyagandhi96 a=DrChainsaw

Fixes #813 

This adds the possibility to set `pad=SamePad()` to automatically calculate the amount of padding to apply so that outputsize == inputsize (assuming stride == 1).

Comments on API more than welcome. I considered the following options:

* Call the type just Same and export it, but I was afraid of causing name collisions due to the overly generic name
* Call the type Same and not export it
* Dispatch on type instead of instance (so that one can type pad=Same instead of pad=Same())
* Supply a method instead of a type, giving a similar API to the above. 

Happy to change to any of the above or to anything else.

I don't think that same padding is common for pooling layers, but I added it just for the sake of consistency. It is a separate commit so it can easily be removed if not wanted.
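A usage sketch of the added option:

```julia
using Flux

c = Conv((3, 3), 3 => 16, relu; pad = SamePad())
size(c(rand(Float32, 32, 32, 3, 1)))  # (32, 32, 16, 1): spatial size preserved at stride 1
```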

Co-authored-by: DrChainsaw <Christian.kyril.skarby@gmail.com>
2020-04-25 04:39:18 +00:00
DrChainsaw
4e4f6d9d1f Change next version entry to 0.10.5 2020-04-24 22:07:57 +02:00
DrChainsaw
deff98812a Add v0.11.0 entry and added samepadding option 2020-04-24 21:59:02 +02:00
DrChainsaw
1544f84bb9 Fix merge conflicts 2020-04-24 21:56:26 +02:00
Yao Lu
58a72ec879 Merge branch 'master' of https://github.com/FluxML/Flux.jl into clip 2020-04-22 01:29:13 +08:00
Yao Lu
c4f5e83697 resolve conflict 2020-04-22 01:24:13 +08:00
Yao Lu
1dfec7f38b add test 2020-04-22 01:22:34 +08:00
Yao Lu
def19b058e simplify docstrings 2020-04-21 10:56:38 +08:00
Yao Lu
cc1dcd5590 rm requires 2020-04-20 20:02:29 +08:00
Yao Lu
68b84bba36 add LinearAlgebra 2020-04-20 19:54:44 +08:00
Yao Lu
ba0fca5a19 remove onehot 2020-04-20 19:45:15 +08:00
Yao Lu
b33c4b49be add ClipValue and ClipNorm 2020-04-20 19:41:10 +08:00
Yao Lu
427c55af92 speedup matmul of CuMatrix and OneHotMatrix 2020-04-20 19:11:57 +08:00
HenriDeh
ac94754281
Update stateless.jl 2020-04-18 13:23:11 +02:00
bors[bot]
cdada06472
Merge #1131
1131: Update glorot_normal doc r=dhairyagandhi96 a=AdarshKumar712

Just a minute correction in glorot_normal function doc.

Co-authored-by: Adarsh Kumar <45385384+AdarshKumar712@users.noreply.github.com>
2020-04-18 00:58:49 +00:00
Adarsh Kumar
d53deb9132
Update glorot_normal doc 2020-04-18 03:19:32 +05:30
HenriDeh
1f2643c95c
Add dropgrad in huber_loss
Workaround for issue #1128
2020-04-17 13:34:04 +02:00
bors[bot]
d49d121a65
Merge #1127
1127: Removed deprecated SGD exports r=dhairyagandhi96 a=bhvieira

Closes #1121 

Co-authored-by: Bruno Hebling Vieira <bruno.hebling.vieira@usp.br>
2020-04-16 13:28:00 +00:00
Bruno Hebling Vieira
2c9881bca6 Merge branch 'master' into removeSGD 2020-04-16 09:56:38 -03:00
Bruno Hebling Vieira
db99e41959 Removed SGD exports 2020-04-16 09:50:41 -03:00
Mike J Innes
a35335db00 update for functors.jl change 2020-04-14 15:21:45 +01:00
Mike J Innes
6eda279190 split out functor 2020-04-14 13:58:52 +01:00
bors[bot]
32e2435729
Merge #1123
1123: Fix doc indent r=dhairyagandhi96 a=matsueushi

Fix [docs for `update!`](https://fluxml.ai/Flux.jl/stable/training/optimisers/#Flux.Optimise.update!).

Co-authored-by: matsueushi <matsueushi@gmail.com>
2020-04-14 04:20:30 +00:00
matsueushi
be92618473 Fix doc indent 2020-04-14 00:12:06 -04:00
bors[bot]
7a32a703f0
Merge #853
853: Improve docs r=CarloLucibello a=janEbert

If you disagree with any of the changes, please tell me what to reverse or fix.
I am unsure about the docstrings I added to `src/utils.jl` for `unsqueeze` and
the `[un]stack` functions so please give those a more detailed look.

Update Documenter.jl version for new features, fix deprecation warnings in
`docs/make.jl` and import Flux for all doctests.
Add missing docstrings to `src/utils.jl`, `src/layers/stateless.jl` and `src/data/`; add
these and other missing functions to Markdown docs.

Improve docstrings by...
   - fixing typos,
   - removing trailing or double whitespaces,
   - using `jldoctest` blocks where applicable,
   - fixing, updating or correctly setting up existing doctests,
   - improving consistency (for example, always use "# Examples" instead
     of other variants),
   - removing empty lines between docstrings and functions,
   - instead of mentioning keywords, put them into the docstring,
   - adding some missing but useful keywords,
   - adding references (`@ref`),
   - using LaTeX math where applicable, and
   - linking papers.

Debatable stuff that is untouched:
   - BE/AE s/z irregularities (e.g. "normalise" versus "normalize") since
     most papers use the AE version while the Flux source code was
     written with BE spelling.
   - Names of normalization functions are capitalized
     ("Batch Normalization" instead of "batch normalization").
   - Default values in argument lists have spaces around the equals sign (`arg = x` instead of `arg=x`).

Co-authored-by: janEbert <janpublicebert@posteo.net>
2020-04-06 13:47:42 +00:00
bors[bot]
a9f8250b43
Merge #1110
1110: fix tests and new version r=CarloLucibello a=CarloLucibello

Had to set the Boston Housing dataset tests as broken due to an SSL certificate expiration problem, which is not our fault.

Co-authored-by: Carlo Lucibello <carlo.lucibello@gmail.com>
2020-04-06 13:27:58 +00:00
janEbert
684570660a Update doctest version guard (1.2 -> 1.4)
And add the same to docs/make.jl
2020-04-06 13:53:36 +02:00
janEbert
0e9bc82626 Loss -> Loss Functions 2020-04-06 13:52:27 +02:00
Carlo Lucibello
c54d71ce56 update travis 2020-04-06 13:20:28 +02:00
Carlo Lucibello
d6cb9f055d fix housing download 2020-04-06 11:08:20 +02:00
Carlo Lucibello
f9e9710446 update travis and bound julia version 2020-04-06 09:35:34 +02:00
Carlo Lucibello
18ea480388 fix tests and new version 2020-04-06 09:26:38 +02:00
janEbert
2a65a30399 Fix doctests in runtests.jl 2020-04-05 13:58:27 +02:00
janEbert
8d2d15aa70 Remove links to OneHot{Vector,Matrix}
Since they aren't documented, we only get a 404 link.
2020-04-04 23:06:56 +02:00
janEbert
73d631f5cd Fix and improve docs
Add missing docstrings, improve existing ones, fix links to functions
or files.
2020-04-04 23:00:34 +02:00
janEbert
2ce5f6d9bf Further docstring improvements in src/
Some had to be re-done after the rebase
2020-04-04 22:59:45 +02:00
janEbert
64ce32ddcf Fix problems due to rebase 2020-04-04 22:55:14 +02:00
janEbert
e16c24a9b8 General minuscule improvements 2020-04-04 19:43:28 +02:00
janEbert
a614983e0b Improve parameter lists in optimisers.jl 2020-04-04 18:40:20 +02:00
janEbert
aaa0a82b74 Slight modifications in recurrent docstrings 2020-04-04 18:21:10 +02:00
janEbert
3b913cd501 Fix rebase changes
- Remove `Flux.testmode!` reference (the function no longer exists).
- Change TrackedArray to Array in doctest (Tracker -> Zygote).
2020-04-04 18:21:10 +02:00
janEbert
ff9198b939 Add datasets to docs
All the relevant functions. Perhaps discuss a consistent API, describe
it in the docs and then only document the modules.
2020-04-04 18:19:20 +02:00
janEbert
740a59d0a6 Add missing docstrings to src/data. 2020-04-04 18:16:46 +02:00
janEbert
ba80c2e8ab Improve whitespaces in docs 2020-04-04 18:16:46 +02:00
janEbert
ab86e350f2 Improve docstrings
Improvements like...
   - fixing typos,
   - removing trailing and double whitespaces,
   - using `jldoctest` blocks where applicable,
   - fixing, updating or correctly setting up existing doctests,
   - improving consistency (for example, always use "# Examples" instead
     of other variants),
   - removing empty lines between docstrings and functions,
   - instead of mentioning keywords, put them into the docstring,
   - adding some missing but useful keywords,
   - adding references (`@ref`),
   - using LaTeX math where applicable, and
   - linking papers.

Debatable stuff that is untouched:
   - BE/AE s/z irregularities ("normalise" versus "normalize") since
     most papers use the AE version while the Flux source code was
     written with BE spelling.
   - Names of normalization functions are capitalized
     ("Batch Normalization" instead of "batch normalization").
2020-04-04 18:16:46 +02:00
janEbert
c76b7315ac Add loss and utility functions to docs 2020-04-04 17:39:19 +02:00
janEbert
c222e1b124 Add missing docstrings to src/utils.jl
Not sure about the `stack`, `unstack` and `unsqueeze` functions.
2020-04-04 17:38:25 +02:00
janEbert
2f955a33cd src/layers/stateless.jl: Add missing docstrings 2020-04-04 17:36:23 +02:00
janEbert
9b68423e64 Import (using) Flux for all doctests 2020-04-04 17:22:08 +02:00
janEbert
1bf8dc2d5b Update Documenter version and fix warnings
0.23.2 -> 0.23.3
2020-04-04 17:22:08 +02:00
bors[bot]
6b37ce3986
Merge #1098
1098: Allow CuArrays v2.x r=dhairyagandhi96 a=ararslan



Co-authored-by: Tim Besard <tim.besard@gmail.com>
Co-authored-by: Alex Arslan <ararslan@comcast.net>
Co-authored-by: Dhairya Gandhi <dhairya@juliacopmuting.com>
2020-03-26 09:43:21 +00:00
Dhairya Gandhi
6939e03fc6 bump CuArrays version 2020-03-26 14:03:55 +05:30
Dhairya Gandhi
119a66a7cd Merge remote-tracking branch 'origin/tb/cuarraystyle' into aa/cuarrays 2020-03-26 13:42:06 +05:30
Alex Arslan
e85a5d8573
Update CUDAdrv for Tim's bug fix 2020-03-25 15:23:07 -07:00
Alex Arslan
49ba121159
Update Manifest.toml 2020-03-25 12:48:29 -07:00
Alex Arslan
347f53adf6
Allow CuArrays v2.x 2020-03-25 10:58:39 -07:00
bors[bot]
240ab1147f
Merge #1096
1096: fix doc typos r=dhairyagandhi96 a=wenjie-p



Co-authored-by: yuebanyishenqiu <thisispwj@outlook.com>
2020-03-22 06:26:11 +00:00
yuebanyishenqiu
1511778267 fix typos 2020-03-22 09:41:29 +08:00
bors[bot]
1605a01039
Merge #1083
1083: Fix typo in the docstrings of AlphaDropout r=CarloLucibello a=AzamatB



Co-authored-by: AzamatB <aberdysh@gmail.com>
2020-03-14 09:56:05 +00:00
AzamatB
85a9493722
Fix typo in the docstrings of AlphaDropout 2020-03-14 15:42:00 +06:00
bors[bot]
5e09113586
Merge #1080
1080: CompatHelper: bump compat for "Colors" to "0.12" r=dhairyagandhi96 a=github-actions[bot]

This pull request changes the compat entry for the `Colors` package from `0.8, 0.9, 0.10, 0.11` to `0.8, 0.9, 0.10, 0.11, 0.12`.

This keeps the compat entries for earlier versions.

Note: I have not tested your package with this new compat entry. It is your responsibility to make sure that your package tests pass before you merge this pull request.

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2020-03-14 01:47:45 +00:00
github-actions[bot]
bca74213ee CompatHelper: bump compat for "Colors" to "0.12" 2020-03-14 00:12:33 +00:00
bors[bot]
8930021b47
Merge #1078
1078: CompatHelper: bump compat for "CodecZlib" to "0.7" r=CarloLucibello a=github-actions[bot]

This pull request changes the compat entry for the `CodecZlib` package from `0.5, 0.6` to `0.5, 0.6, 0.7`.

This keeps the compat entries for earlier versions.

Note: I have not tested your package with this new compat entry. It is your responsibility to make sure that your package tests pass before you merge this pull request.

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2020-03-13 11:49:02 +00:00
github-actions[bot]
69e96ed1c1 CompatHelper: bump compat for "CodecZlib" to "0.7" 2020-03-13 00:13:04 +00:00
bors[bot]
a874bef6f9
Merge #1076
1076: fix typo in the Dropout docs r=dhairyagandhi96 a=AzamatB



Co-authored-by: AzamatB <aberdysh@gmail.com>
2020-03-10 09:40:28 +00:00
AzamatB
f0d866b2fd
fix typo in the Dropout docs 2020-03-10 12:44:19 +06:00
bors[bot]
d4cf1436df
Merge #950
950: added GlobalMaxPool, GlobalMeanPool, and flatten layers r=CarloLucibello a=gartangh



Co-authored-by: Garben Tanghe <garben.tanghe@gmail.com>
2020-03-08 14:27:10 +00:00
Garben Tanghe
fc3af681ec updated documentation 2020-03-08 14:22:09 +01:00
Garben Tanghe
746e3310f1 removed Flatten struct
updated documentation
2020-03-08 14:22:03 +01:00
Garben Tanghe
82e16a5b29 split up Flatten layer to use the flatten function 2020-03-08 14:21:59 +01:00
Garben Tanghe
3e14bd878c added GlobalMaxPool, GlobalMeanPool, and Flatten layers 2020-03-08 14:18:48 +01:00
Dhairya Gandhi
d8e44fcc1c correct broadcasting for addition 2020-03-04 18:22:45 +05:30
Dhairya Gandhi
7e308e77fd rm unneccesary fns 2020-03-04 17:57:16 +05:30
Dhairya Gandhi
5a4f1932a6 closes #1071 2020-03-04 17:22:45 +05:30
bors[bot]
df3f904f7c
Merge #1072
1072: update freeze docs r=CarloLucibello a=CarloLucibello



Co-authored-by: CarloLucibello <carlo.lucibello@gmail.com>
2020-03-04 03:47:45 +00:00
CarloLucibello
12106ff4cc update freeze docs 2020-03-04 04:45:41 +01:00
bors[bot]
94ba1e8ede
Merge #1028 #1070
1028: Common questions answered in docs r=CarloLucibello a=dhairyagandhi96

cc @MikeInnes 

1070: Prevent breakage due to new `active` field in normalise layers r=CarloLucibello a=ianshmean

Prevents breakage where the normalise structs, such as `BatchNorm`, have been constructed directly but are missing the new `active` field

cc. @darsnack 

Co-authored-by: Dhairya Gandhi <dhairya@juliacopmuting.com>
Co-authored-by: Dhairya Gandhi <dhairya@juliacomputing.com>
Co-authored-by: Ian <i.r.butterworth@gmail.com>
2020-03-04 00:10:39 +00:00
bors[bot]
af23a5756c
Merge #1053
1053: Added Some Loss functions with some doc improvements r=CarloLucibello a=AdarshKumar712

Added the following loss functions with tests:
1. mae
2. mean squared logarithmic error
3. huber loss
4. squared hinge loss
5. dice coeff loss
6. tversky loss 

Also added some documentation improvements for a few other functions. 
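Minimal sketches of two of these losses, under assumed definitions (not necessarily identical to what was merged):

```julia
using Statistics: mean

mae(ŷ, y) = mean(abs.(ŷ .- y))

huber_loss(ŷ, y; δ = 1) = mean(ifelse.(abs.(ŷ .- y) .<= δ,
                                       0.5f0 .* abs2.(ŷ .- y),
                                       δ .* (abs.(ŷ .- y) .- 0.5f0 * δ)))
```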

Co-authored-by: Adarsh Kumar <45385384+AdarshKumar712@users.noreply.github.com>
2020-03-03 23:56:21 +00:00
Ian
61f66e3dcd remove unnecessary helper for AlphaDropout 2020-03-03 13:20:02 -05:00
Ian
078ad7dd50 bump version to 0.10.3 2020-03-03 13:05:23 -05:00
Ian
d63fcf2cb4 add depreciation reminder 2020-03-03 13:05:03 -05:00
Ian
d9ea5fba76 add active helpers for other normalise layers 2020-03-03 11:55:39 -05:00
Ian
0def352383 Prevent breakage due to new active field in BatchNorm 2020-03-03 11:49:34 -05:00
bors[bot]
19a034b215
Merge #1069
1069: Updated activation functions in NNlib doc r=dhairyagandhi96 a=AdarshKumar712



Co-authored-by: Adarshkumar712 <Adarshkumar712.ak@gmail.com>
2020-03-03 12:39:03 +00:00
Adarshkumar712
d0e8a9ff71 Updated activation functions in NNlib doc 2020-03-03 22:07:05 +05:30
Adarsh Kumar
6e5c18bddf
Updated loss functions 2020-03-03 16:02:57 +05:30
bors[bot]
4acc907723
Merge #1065
1065: update documenter r=CarloLucibello a=CarloLucibello



Co-authored-by: CarloLucibello <carlo.lucibello@gmail.com>
2020-03-03 07:20:03 +00:00
bors[bot]
df73b8b8fb
Merge #1064
1064: Include cuda/cuda.jl during precompilation? r=CarloLucibello a=ianshmean

Loading `cuda/cuda.jl` at run time during `__init__()` seems to be causing issues with PackageCompiler (see the error at the bottom).

I'm wondering whether the cost of loading `cuda/cuda.jl` is negligible enough to just do it in all cases and get it precompiled. Setting `Flux.use_cuda[]` would still be used for switching CUDA on or off. 

Load time in 1.3.1 on my mac (without cuda):

This PR:
```
julia> @time using Flux
[ Info: Precompiling Flux [587475ba-b771-5e3f-ad9e-33799f191a9c]
[ Info: CUDAdrv.jl failed to initialize, GPU functionality unavailable (set JULIA_CUDA_SILENT or JULIA_CUDA_VERBOSE to silence or expand this message)
 37.313982 seconds (56.30 M allocations: 2.822 GiB, 2.52% gc time)
...
julia> @time using Flux
[ Info: CUDAdrv.jl failed to initialize, GPU functionality unavailable (set JULIA_CUDA_SILENT or JULIA_CUDA_VERBOSE to silence or expand this message)
 22.111054 seconds (52.93 M allocations: 2.663 GiB, 3.99% gc time)
```
Master:
```
julia> @time using Flux
[ Info: Precompiling Flux [587475ba-b771-5e3f-ad9e-33799f191a9c]
[ Info: CUDAdrv.jl failed to initialize, GPU functionality unavailable (set JULIA_CUDA_SILENT or JULIA_CUDA_VERBOSE to silence or expand this message)
 35.750143 seconds (53.73 M allocations: 2.698 GiB, 2.51% gc time)
...
julia> @time using Flux
[ Info: CUDAdrv.jl failed to initialize, GPU functionality unavailable (set JULIA_CUDA_SILENT or JULIA_CUDA_VERBOSE to silence or expand this message)
 26.267999 seconds (52.92 M allocations: 2.660 GiB, 3.67% gc time)
```


I didn't make `include("cuda/cuda.jl")` dependent on `CuArrays.functional()` because I guess there could be a case where, say, a user doesn't have CUDA installed, loads Flux, installs CUDA, then reloads Flux, and the second time the package isn't re-precompiled.

Below is the PackageCompiler error, which doesn't happen every time. It just seems that the runtime loading of cuda.jl may be introducing dependency tracking issues (?)
```
┌ Warning: Package Zygote does not have InteractiveUtils in its dependencies:
│ - If you have Zygote checked out for development and have
│   added InteractiveUtils as a dependency but haven't updated your primary
│   environment's manifest file, try `Pkg.resolve()`.
│ - Otherwise you may need to report an issue with Zygote
└ Loading InteractiveUtils into Zygote from project dependency, future warnings for Zygote are suppressed.
fatal: error thrown and no exception handler available.
#<null>
require at ./loading.jl:905
_jl_invoke at /home/ian/Documents/julia-kf-31156/src/gf.c:2161 [inlined]
jl_apply_generic at /home/ian/Documents/julia-kf-31156/src/gf.c:2328
jl_apply at /home/ian/Documents/julia-kf-31156/src/julia.h:1695 [inlined]
call_require at /home/ian/Documents/julia-kf-31156/src/toplevel.c:399 [inlined]
eval_import_path at /home/ian/Documents/julia-kf-31156/src/toplevel.c:436
eval_import_from at /home/ian/Documents/julia-kf-31156/src/toplevel.c:557
jl_toplevel_eval_flex at /home/ian/Documents/julia-kf-31156/src/toplevel.c:646
jl_eval_module_expr at /home/ian/Documents/julia-kf-31156/src/toplevel.c:181
jl_toplevel_eval_flex at /home/ian/Documents/julia-kf-31156/src/toplevel.c:640
jl_parse_eval_all at /home/ian/Documents/julia-kf-31156/src/ast.c:907
jl_load_rewrite at /home/ian/Documents/julia-kf-31156/src/toplevel.c:872
include at ./Base.jl:380
include at ./Base.jl:368 [inlined]
include at /home/ian/.julia/packages/Flux/p8ZLv/src/Flux.jl:1 [inlined]
__init__ at /home/ian/.julia/packages/Flux/p8ZLv/src/Flux.jl:56
jfptr___init___22072 at /home/ian/Documents/MyPackage.jl/dev/compilation/MyPackageSysImage.so (unknown line)
_jl_invoke at /home/ian/Documents/julia-kf-31156/src/gf.c:2161 [inlined]
jl_apply_generic at /home/ian/Documents/julia-kf-31156/src/gf.c:2328
jl_apply at /home/ian/Documents/julia-kf-31156/src/julia.h:1695 [inlined]
jl_module_run_initializer at /home/ian/Documents/julia-kf-31156/src/toplevel.c:74
_julia_init at /home/ian/Documents/julia-kf-31156/src/init.c:788
unknown function (ip: 0x5594b1667f)
__libc_start_main at /lib/aarch64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x5594b16733)
unknown function (ip: 0x5594b16733)
```

Co-authored-by: Ian <i.r.butterworth@gmail.com>
2020-03-03 07:07:54 +00:00
CarloLucibello
af99ca27ee docs update 2020-03-03 07:52:20 +01:00
Adarsh Kumar
92e09e204d
Test argument consistency with ŷ and y 2020-03-02 20:33:12 +05:30
Adarsh Kumar
2f05094068
Added consistency with ŷ and unicode chars 2020-03-02 20:00:47 +05:30
CarloLucibello
f5da4d0c70 remove docs manifest 2020-03-02 15:10:08 +01:00
CarloLucibello
ffea8b616d fix docs 2020-03-02 15:08:37 +01:00
CarloLucibello
e51070bf79 update documenter 2020-03-02 15:08:37 +01:00
bors[bot]
ddab979ea9
Merge #1066
1066: fix travis for documentation build r=CarloLucibello a=johnnychen94

The previous build doesn't trigger the documentation stage because the matrix doesn't get expanded for the sole job.

It's not very clear how Travis reads the config, but this change fixes the issue.

😕 weird that it doesn't allow failures on nightly here... The one in my fork works as expected. https://github.com/johnnychen94/Flux.jl/runs/479502998

cc: @CarloLucibello

Co-authored-by: Johnny Chen <johnnychen94@hotmail.com>
2020-03-02 12:29:20 +00:00
Johnny Chen
f30267e037
bring back test on custom Manifest.toml 2020-03-02 20:14:43 +08:00
Johnny Chen
224ec728ac
fix travis for documentation build 2020-03-02 20:05:56 +08:00
Adarsh Kumar
5565250c28
Updated test for tversky 2020-03-02 13:46:33 +05:30
Adarsh Kumar
89d07c07ec
Added Loss functions to docs 2020-03-02 13:33:44 +05:30
Adarsh Kumar
f9e31a020c
Updated huber_loss with other minute changes 2020-03-02 13:25:23 +05:30
Dhairya Gandhi
cbb9a2a929
Merge branch 'master' into dg/params_docs 2020-03-02 12:45:30 +05:30
Dhairya Gandhi
bb5350591f cleanup 2020-03-02 12:42:33 +05:30
Dhairya Gandhi
27949693f3 refactor 2020-03-02 12:40:19 +05:30
bors[bot]
be38146ee9
Merge #1061
1061: fix a few typos in docstrings r=CarloLucibello a=visr



Co-authored-by: Martijn Visser <mgvisser@gmail.com>
2020-03-02 01:03:58 +00:00
bors[bot]
6575fb8f48
Merge #1057
1057: add Julia ecosystem doc section r=CarloLucibello a=CarloLucibello

Partially fixing #251, related to the discussion in #1051.

Not exactly a poem that I wrote here; maybe someone could suggest a better rephrasing. 
Suggestions for additional packages to add to the list are also welcome.

Co-authored-by: CarloLucibello <carlo.lucibello@gmail.com>
2020-03-02 00:52:22 +00:00
Ian
7555e488c6 tweaks 2020-03-01 19:40:03 -05:00
Ian
9b2f4919ee includ cuda/cuda.jl during precompile, even if cuda isn't detected 2020-03-01 19:33:23 -05:00
bors[bot]
3cf131b8de
Merge #1062
1062: docstring ensure signature code formatting r=CarloLucibello a=visr

by using a four space indent instead of two

Fixes issues seen here:

![image](https://user-images.githubusercontent.com/4471859/75627427-54aa6600-5bd0-11ea-93d3-92901d44db59.png)

Where the type signature has no code formatting, and a code block is introduced that throws off the rest of the formatting.
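The convention in question, sketched on a made-up function:

```julia
"""
    myloss(ŷ, y)

Return a toy loss between the prediction `ŷ` and the target `y`.
The signature above is indented by four spaces so Documenter renders it as a code block.
"""
myloss(ŷ, y) = sum(abs2, ŷ .- y)
```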

Co-authored-by: Martijn Visser <mgvisser@gmail.com>
2020-03-01 22:28:10 +00:00
bors[bot]
069d228693
Merge #1044
1044: Add testmode! back for normalization layers r=CarloLucibello a=darsnack

Fixed #909 

I added `testmode!(m, mode)` back to Flux as per v0.9. Now the `mode` can be `false`, `true`, or `:auto`/`nothing` with the default being `:auto` for newly constructed layers. In `:auto` mode, the `istraining()` functions added in v0.10 are used to determine whether we are evaluating within an AD trace or not.

Also plan on adding a doc section in an additional commit.
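A usage sketch of the restored API as described above:

```julia
using Flux

m = Chain(Dense(10, 5), Dropout(0.5), Dense(5, 2))
testmode!(m)           # force test mode: Dropout becomes a no-op
testmode!(m, false)    # force training mode
testmode!(m, :auto)    # default: decide via istraining() inside an AD trace
```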

Co-authored-by: Kyle Daruwalla <daruwalla@wisc.edu>
2020-03-01 19:14:07 +00:00
Kyle Daruwalla
e49d9c4537 Debump version 2020-03-01 13:11:07 -06:00
Kyle Daruwalla
88cad1c5e7 Bump minor version to v0.10.3 2020-03-01 12:50:49 -06:00
Kyle Daruwalla
23f791e32b Add "during X phase" phrasing to testmode!/trainmode! docstring. 2020-03-01 12:49:30 -06:00
Kyle Daruwalla
35e460b044 Fixed broken @ref in docstring 2020-03-01 12:44:36 -06:00
Kyle Daruwalla
4cebf36361
Merge branch 'master' into feature/istraining 2020-03-01 12:32:15 -06:00
Kyle Daruwalla
c001d0f3c5 Added trainmode! and updated docs with warning 2020-03-01 12:30:41 -06:00
Martijn Visser
d67a2e40b3 remove stray code block start from docstring 2020-03-01 15:20:40 +01:00
Martijn Visser
f4365dab94 fix docstring example indentation as well 2020-03-01 15:19:22 +01:00
Martijn Visser
32e0aa9fcb docstring ensure signature code formatting
by using a four space indent instead of two
2020-03-01 15:15:39 +01:00
Martijn Visser
6076847a45 fix a few typos in docstrings 2020-03-01 15:07:12 +01:00
Adarsh Kumar
08dabce57e
Updated loss function docs 2020-03-01 12:00:11 +05:30
Adarsh Kumar
57c1b67d08
Merge branch 'master' into patch-1 2020-03-01 11:49:33 +05:30
Kyle Daruwalla
568ecb1c97 Removed trainmode from tests 2020-02-29 16:25:18 -06:00
Kyle Daruwalla
5cbd2cecf2 Changed testmode! to return model 2020-02-29 16:09:59 -06:00
bors[bot]
77a7606dad
Merge #1051
1051: add DataLoader r=CarloLucibello a=CarloLucibello

Fix #450 

This adds a DataLoader type, largely adapted from the Knet one, therefore pinging @denizyuret to check if he is cool with this. Unfortunately, I cannot "unsee" his implementation, and in any case any reasonable alternative implementation will be pretty much similar I guess. 

This is an initial implementation to get things going; possibly in the future we will also want a distributed and out-of-memory option like the one implemented by @staticfloat here:
https://github.com/FluxML/Metalhead.jl/blob/sf/training/training/ImageNet/dataset.jl



Co-authored-by: CarloLucibello <carlo.lucibello@gmail.com>
2020-02-29 19:27:27 +00:00
CarloLucibello
a1efc434c2 fix typo 2020-02-29 19:40:44 +01:00
CarloLucibello
a72258ea2a fix rebase 2020-02-29 18:55:49 +01:00
CarloLucibello
97141e8c98 improve docstring 2020-02-29 18:51:00 +01:00
CarloLucibello
487002878e restrict train! special casing 2020-02-29 18:51:00 +01:00
CarloLucibello
b6c79b38b4 add DataLoader
special case train! for the unsupervised data iterator
2020-02-29 18:50:59 +01:00
bors[bot]
37af9fb15c
Merge #1023
1023: Feature: Added Boston Housing Dataset r=CarloLucibello a=pranjaldatta

[Boston Housing Dataset](https://archive.ics.uci.edu/ml/machine-learning-databases/housing/) is one of the most commonly used datasets for beginners, as popular as others such as Iris. Hence, it feels only natural that this dataset is a part of Flux.

Added src/data/housing.jl: code for downloading and loading the dataset
Edited src/data/Data.jl: To include and export housing.jl
Edited test/data.jl: Added test for the dataset.

*All tests in test/data.jl are passing*

Co-authored-by: pranjaldatta <pranjaldatta99@gmail.com>
Co-authored-by: Pranjal  Datta <pranjaldatta99@gmail.com>
2020-02-29 15:54:34 +00:00
CarloLucibello
4f693e02cb add model zoo reference 2020-02-29 13:50:23 +01:00
CarloLucibello
4109f2e0d7 cleanup 2020-02-29 13:45:17 +01:00
CarloLucibello
169ed6eb25 add ecosystem 2020-02-29 13:43:03 +01:00
bors[bot]
81a55a0c9e
Merge #1041
1041: add NNlib docs + misc docs improvements r=CarloLucibello a=CarloLucibello

Partially addressing https://github.com/FluxML/NNlib.jl/issues/137.

Also, I'm leaving out the `σ` activation and using its alias `sigmoid`, since `σ` conveys little information and is also used to denote a generic activation in the Dense layer. I think we should deprecate `σ` in NNlib; has this been discussed already?

In an ideal world, before merging this, we should get NNlib to either unexport its undocumented exports or add docs for them.

Co-authored-by: Carlo Lucibello <carlo.lucibello@gmail.com>
2020-02-29 10:24:39 +00:00
Carlo Lucibello
425fcdbe69 NNlib docs + misc docs improvements 2020-02-29 11:14:48 +01:00
bors[bot]
2dd23574c0
Merge #998
998: test restructure on the GPU r=CarloLucibello a=ChrisRackauckas

Requires https://github.com/FluxML/Zygote.jl/pull/474 to pass

Co-authored-by: Chris Rackauckas <accounts@chrisrackauckas.com>
2020-02-29 09:08:11 +00:00
Adarsh Kumar
8afed01345
Apply suggestions from code review
Co-Authored-By: David Lung <lungd@users.noreply.github.com>
2020-02-27 23:23:53 +05:30
Dhairya Gandhi
35f6998be7 pkg up 2020-02-27 22:19:06 +05:30
Adarsh Kumar
9dce623214
Updated Msle loss 2020-02-27 16:26:17 +05:30
Dhairya Gandhi
a121742f9c pkg up 2020-02-27 13:56:05 +05:30
Adarsh Kumar
3d8965230f
Added tests for dice and Tversky loss 2020-02-27 02:29:39 +05:30
Adarsh Kumar
980ce72914
Added tversky and dice loss 2020-02-27 02:00:28 +05:30
bors[bot]
531d3d4d8b
Merge #1052
1052: update docs and export update! r=dhairyagandhi96 a=CarloLucibello

Fix #951 

Co-authored-by: CarloLucibello <carlo.lucibello@gmail.com>
2020-02-26 19:33:53 +00:00
CarloLucibello
759fe9df2f update docs and export update! 2020-02-26 20:27:39 +01:00
Dhairya Gandhi
20e78e274e docs fix 2020-02-26 22:41:45 +05:30
Dhairya Gandhi
cf82393ae8 type signatures 2020-02-26 22:36:25 +05:30
Dhairya Gandhi
cd931793ef more docs and constructors 2020-02-26 22:29:14 +05:30
Dhairya Gandhi
58211e31bd docs improve 2020-02-26 22:22:11 +05:30
Dhairya Gandhi
f889d0c4d4 add kwarg constructors 2020-02-26 22:19:17 +05:30
Pranjal Datta
90bb3205f4
Merge pull request #2 from pranjaldatta/housing_added
added newlines  at end of file
2020-02-26 15:08:37 +05:30
pranjaldatta
569021a9f1 added newlines at end of file 2020-02-26 15:05:23 +05:30
Kyle Daruwalla
ba5259a269 Added docs on testmode! 2020-02-25 13:53:49 -06:00
bors[bot]
55616afc11
Merge #960
960: Added utility function outdims to compute output dimensions of a layer r=dhairyagandhi96 a=darsnack

Based on Slack chatter, I added a utility function, `outdims`, that computes the output dimensions for given input dimensions.

Example
```julia
layer = Conv((3, 3), 3 => 16)
outdims(layer, (10, 10)) # returns (8, 8)
```

Co-authored-by: Kyle Daruwalla <daruwalla@wisc.edu>
2020-02-25 17:40:05 +00:00
Tim Besard
4ed7d984db Adapt to CuArrays ArrayStyle changes. 2020-02-25 14:09:03 +01:00
Dhairya Gandhi
7e58766467
Merge pull request #1047 from MotJuMi/master
Edit description of convolutional layer
2020-02-25 15:39:23 +05:30
Bulat Suleymanov
db4eaf254b
Edit description of convolutional layer 2020-02-24 13:16:51 +05:00
Dhairya Gandhi
34ceed5c1f
Merge pull request #1046 from ianshmean/patch-1
Bump Colors compat to include 0.10, 0.11
2020-02-24 10:41:49 +05:30
Ian Butterworth
6ced7e1ecf
expand Colors compat 2020-02-23 13:42:11 -05:00
Kyle Daruwalla
924b8f49ec Updated to place function definitions in the appropriate places. 2020-02-21 15:10:28 -06:00
Kyle Daruwalla
7c12af065a Added testmode! functionality back to normalization layers. 2020-02-21 14:35:10 -06:00
Kyle Daruwalla
f5b9cf659c Updated docs to specify exactly what layers support outdims 2020-02-20 23:38:56 -06:00
Dhairya Gandhi
88b0c65d72
Merge pull request #1035 from matsueushi/remove_get_macro
Remove get! macro
2020-02-20 12:58:16 +05:30
Dhairya Gandhi
8f7a0bb264
Merge pull request #1030 from JuliaTagBot/master
Install TagBot as a GitHub Action
2020-02-19 21:47:31 +05:30
Dhairya Gandhi
a38af748e5
Merge pull request #1037 from heliosdrm/heliosdrm-patch-1
update compat to Juno 0.8
2020-02-19 21:46:33 +05:30
bors[bot]
e4a84c120f
Merge #1021
1021: nograd for onecold, onehot, onehotbatch r=MikeInnes a=CarloLucibello

fixes #1020 

Co-authored-by: CarloLucibello <carlo.lucibello@gmail.com>
2020-02-17 14:12:48 +00:00
Helios De Rosario
9bb388d953
update Juno compat 2020-02-16 18:29:18 +01:00
Helios De Rosario
6f0710d364
Merge pull request #1 from FluxML/master
update to origin
2020-02-16 18:27:35 +01:00
Dhairya Gandhi
26631e1361 test_broken AlphaDropout 2020-02-16 21:22:37 +05:30
Viral B. Shah
0b8d1574bf
Merge pull request #984 from aminya/CompatHelper
Adding CompatHelper
2020-02-16 09:44:09 -05:00
matsueushi
6ea7b95384 Remove unused using 2020-02-15 20:06:15 -05:00
Dhairya Gandhi
d5ed9a4478
Update docs/src/models/basics.md
Co-Authored-By: Carlo Lucibello <carlo.lucibello@gmail.com>
2020-02-12 11:26:11 +05:30
Dhairya Gandhi
ee6d950696
Update docs/src/models/basics.md
Co-Authored-By: Carlo Lucibello <carlo.lucibello@gmail.com>
2020-02-12 11:25:50 +05:30
bors[bot]
fe85a38d78 Merge #1032
1032: Remove outdated reference to truncate! r=dhairyagandhi96 a=mcognetta



Co-authored-by: Marco <mcognetta@users.noreply.github.com>
2020-02-10 08:30:15 +00:00
Marco
ae0455517a Remove outdated reference to truncate! 2020-02-10 00:03:11 -08:00
Kyle Daruwalla
c37fc3cfa6 Recommitting to trigger build 2020-02-09 19:45:04 -06:00
Julia TagBot
d7b20d1a78 Install TagBot as a GitHub Action 2020-02-08 20:02:52 +07:00
Dhairya Gandhi
37d58e16dd common questions answered in docs 2020-02-08 16:33:18 +05:30
Pranjal Datta
d1522deee4
Merge pull request #1 from pranjaldatta/housing_added
Feature: Added Boston Housing Dataset
2020-02-07 04:01:00 +05:30
pranjaldatta
197a1a70c0 added BostonHousing dataset and testing 2020-02-07 03:47:19 +05:30
CarloLucibello
6499344af3 nograd for onecold, onehot, onehotbatch 2020-02-06 15:41:46 +01:00
Adarsh Kumar
659ba074d1
Updated test for msle 2020-02-06 01:21:51 +05:30
Adarsh Kumar
7710bb0b4b
Removed spurious promotions 2020-02-06 01:06:41 +05:30
Adarsh Kumar
b5184553d4
Error correction in mae 2020-02-05 23:32:55 +05:30
Adarsh Kumar
44a977b7a4
Added tests for new loss functions 2020-02-05 23:20:06 +05:30
Adarsh Kumar
643086c8db
Updated squared_hinge 2020-02-05 22:40:07 +05:30
Adarsh Kumar
7ac647a7ac
Added loss functions 2020-02-05 22:29:15 +05:30
bors[bot]
60043fa2aa
Merge #1013
1013: Adapt to GPUArrays/CuArrays changes r=dhairyagandhi96 a=maleadt

Changes in response to a29df67184 and https://github.com/JuliaGPU/CuArrays.jl/pull/576. I suppose the next CuArrays release will need to be breaking because of this.

Maybe the `crossentropy` signature needs to be adjusted to support integer vectors, but I'll leave that decision up to Flux developers. This at least is the quick fix to get the tests passing again.

Co-authored-by: Tim Besard <tim.besard@gmail.com>
2020-02-03 16:29:48 +00:00
Dhairya Gandhi
ddc2c20e68
Merge pull request #994 from FluxML/ox/doccustomtraining
Add custom training loops to docs
2020-02-01 11:13:54 +05:30
Dhairya Gandhi
bc20103ea6 no-op copy 2020-01-31 13:23:33 +05:30
Tim Besard
e2c2ec5575 Don't invoke GPU crossentropy with integers.
Broadcasting log on integers does not work.
2020-01-31 08:22:54 +01:00
Tim Besard
e66a7f130f Don't compare CPU with GPU arrays. 2020-01-31 08:22:21 +01:00
Dhairya Gandhi
b9fbee1ff0 ::typeof(op) -> op 2020-01-31 12:24:36 +05:30
Dhairya Gandhi
620cffc45c
Merge pull request #1008 from FluxML/tb/cuindex
Remove unused imports.
2020-01-29 18:52:53 +05:30
Tim Besard
d88f63adb4 Remove unused imports. 2020-01-29 12:15:41 +01:00
Chris Rackauckas
9803826a36 test restructure on the GPU
Requires https://github.com/FluxML/Zygote.jl/pull/474
2020-01-20 13:53:28 -05:00
Dhairya Gandhi
29ab410794 test gradients are allocated on the gpu 2020-01-17 15:52:26 +05:30
Lyndon White
7797e31b44
Add custom training loops to docs 2020-01-16 21:57:59 +00:00
bors[bot]
d1edd9b16d
Merge #680
680: Added new loss functions. r=thebhatman a=thebhatman

I have added the KL Divergence Loss function, Poisson loss function, Logcosh loss, and Hinge loss function.

Co-authored-by: Manjunath Bhat <manjunathbhat9920@gmail.com>
Co-authored-by: thebhatman <manjunathbhat9920@gmail.com>
2020-01-13 15:46:25 +00:00
Manjunath Bhat
747e01ea02
Test to check for spurious promotions 2020-01-13 18:33:30 +05:30
Dhairya Gandhi
048c31f609 bump Flux version to v0.10.1 2020-01-13 18:16:29 +05:30
bors[bot]
f7f0ebbffd
Merge #992
992: Compat bounds for a couple more packages r=dhairyagandhi96 a=dhairyagandhi96

adds compatibility bounds for a few more packages

cc @MikeInnes 

Co-authored-by: Dhairya Gandhi <dhairya@juliacopmuting.com>
2020-01-13 12:29:05 +00:00
Dhairya Gandhi
cd4626d5a7 compat bounds for a couple more packages 2020-01-13 17:38:59 +05:30
bors[bot]
2b222b15fa
Merge #991
991: Update CuArrays + Zygote deps  r=dhairyagandhi96 a=dhairyagandhi96

cc @MikeInnes 

Co-authored-by: Dhairya Gandhi <dhairya@juliacopmuting.com>
2020-01-13 11:14:21 +00:00
Dhairya Gandhi
e1698e6617 up cuarrays 2020-01-13 16:18:20 +05:30
Dhairya Gandhi
e2a97aec24 up cuda+zygote deps 2020-01-13 16:16:24 +05:30
Dhairya Gandhi
de40476beb doc tests on julia 1.3 2020-01-13 14:10:34 +05:30
Dhairya Gandhi
d7953ff573 test on julia 1.3+ 2020-01-13 13:45:40 +05:30
Dhairya Gandhi
da9f295a8e bump version to 10.1 2020-01-13 13:41:25 +05:30
Dhairya Gandhi
370fd978fa
Merge pull request #986 from FluxML/restructure
Destructure/restructure for models
2020-01-13 13:04:48 +05:30
Dhairya Gandhi
58a7941386 reduce bors timeout 2020-01-13 11:24:04 +05:30
Dhairya Gandhi
0411b9a3e8 rm second slash 2020-01-12 17:35:04 +05:30
Mike Innes
f96270c213 free zygote 2020-01-09 17:16:41 +00:00
Mike J Innes
17732e7023 restructure; closes #747 2020-01-06 11:53:47 +00:00
aminya
f00b532556 Adding CompatHelper 2020-01-06 03:17:25 +03:30
Dhairya Gandhi
e92da0cf85
Merge pull request #973 from FluxML/sf/nnpack_tolerance
Give `NNPACK` a bit of numerical leeway
2019-12-23 15:57:56 +05:30
Elliot Saba
0fdcc00923 Give NNPACK a bit of numerical leeway 2019-12-23 01:31:26 -08:00
Dhairya Gandhi
b1e68813a8 cpu -> test_throws 2019-12-20 23:02:44 +05:30
Viral B. Shah
8a1e2f19d7
Update README.md 2019-12-19 09:44:17 -05:00
Dhairya Gandhi
efa2cbfd0e checkin Manifest#master 2019-12-11 14:13:41 +05:30
Kyle Daruwalla
2f854bdfc0 Recommitting to trigger new build 2019-12-10 09:57:08 -06:00
Dhairya Gandhi
ac4c49b63e
Merge pull request #954 from FluxML/decaydocs
[WIP] Decaydocs
2019-12-10 12:11:23 +05:30
Dhairya Gandhi
a72ca2b05d fix args 2019-12-09 23:18:01 +05:30
Dhairya Gandhi
894c075b6d rm Zeros setindex 2019-12-09 21:40:58 +05:30
Dhairya Gandhi
f39e184814 rm Zeros warning 2019-12-09 21:07:30 +05:30
Manjunath Bhat
8a93be8c6c
Change loss to cost 2019-12-09 20:39:46 +05:30
Kyle Daruwalla
04991d3261 Added entry to docs for outdims 2019-12-07 14:06:11 -06:00
Kyle Daruwalla
0cdd11c0dc Added tests for varying padding, stride, and dilation with outdims. 2019-12-07 14:05:50 -06:00
Kyle Daruwalla
a64378b112 Switched to using NNlib for conv.jl outdims. 2019-12-07 13:21:26 -06:00
Kyle Daruwalla
6265b1fa39 Added tests for outdims 2019-12-05 22:54:25 -06:00
Kyle Daruwalla
31dda0ce6c Updated with all basic and conv layers outdims 2019-12-05 21:57:10 -06:00
Dhairya Gandhi
9b6155c77d
Merge branch 'master' into dg/gradtests 2019-12-05 18:17:47 +05:30
Dhairya Gandhi
76dc8ea9d4 formatting fixes 2019-12-05 18:14:04 +05:30
Dhairya Gandhi
717ad9328d add some grad tests on GPU 2019-12-05 18:12:23 +05:30
DrChainsaw
755536bf5e Merge remote-tracking branch 'upstream/master' into samepad 2019-12-04 23:45:03 +01:00
Kyle Daruwalla
b4ed16ad9c Added outdims for some basic layers 2019-12-03 22:48:48 -06:00
Kyle Daruwalla
9279d79e63
Merge pull request #1 from FluxML/master
Updating to upstream master
2019-12-03 21:09:35 -06:00
Fredrik Bagge Carlson
e67f09c06d Correct some comments in decay docs 2019-12-03 15:32:23 +08:00
Fredrik Bagge Carlson
6e94e59afd Improve docs for decay optimisers 2019-12-03 15:27:44 +08:00
Mike J Innes
f46b5243db
Merge pull request #946 from FluxML/pkg-up
compat, pkg up
2019-11-29 12:55:47 +00:00
Mike J Innes
0c99f7f4b7 Merge branch 'dg/news' into pkg-up 2019-11-29 10:42:28 +00:00
Dhairya Gandhi
4b63e69b65 bump version to v0.10 2019-11-29 00:02:59 +05:30
Dhairya Gandhi
8519833d17 Merge branch 'dg/news' of https://github.com/FluxML/Flux.jl into dg/news 2019-11-28 23:57:30 +05:30
Dhairya Gandhi
73d572b1a9 rm RADAM 2019-11-28 23:57:01 +05:30
Mike Innes
b65b491e51 compat, pkg up 2019-11-28 16:23:22 +00:00
Dhairya Gandhi
c17dc34e38
phew
Co-Authored-By: Mike J Innes <mike.j.innes@gmail.com>
2019-11-28 21:49:34 +05:30
Dhairya Gandhi
1ae554d82c rm new line 2019-11-28 21:47:37 +05:30
Dhairya Gandhi
4481c74f50 v0.10 changes 2019-11-28 21:45:06 +05:30
Mike J Innes
75d609ecc8
Update README.md 2019-11-28 16:00:55 +00:00
Mike J Innes
99f98ca800
Update README.md 2019-11-28 16:00:21 +00:00
Tim Besard
ab450477f3
Merge pull request #944 from FluxML/rnn-fix
RNN failure hackaround
2019-11-27 16:06:13 +01:00
Dhairya Gandhi
ec872bb579 test that bias has no grads with Zeros 2019-11-27 19:45:04 +05:30
Dhairya Gandhi
245563077b cleaner API 2019-11-27 19:40:58 +05:30
Mike Innes
1c0e9acc45 Update CuArrays to include the workspace fix. 2019-11-27 14:31:03 +01:00
bors[bot]
90a38a3201
Merge #937
937: Fix Glorot initialization, add He initialization r=MikeInnes a=Sleort

Should fix #442 .
Adds He weight initialization as a bonus :-)

Co-authored-by: Troels Arnfred Bojesen <tr-ab@online.no>
2019-11-26 16:17:06 +00:00
bors[bot]
fb4a48f970
Merge #943
943: Fixes #900 r=MikeInnes a=dhairyagandhi96

Thoughts on the test?

cc @MikeInnes

Co-authored-by: Dhairya Gandhi <dhairya@juliacopmuting.com>
2019-11-26 15:09:27 +00:00
Dhairya Gandhi
59bb0d81b0 add TODO 2019-11-26 16:23:09 +05:30
Mike J Innes
4c69b44a7c
Merge pull request #940 from matsueushi/feature/cuda-logitbc
Fix logitbinarycrossentropy on CuArrays
2019-11-26 10:18:07 +00:00
Dhairya Gandhi
c031ae1a94 correct channel value 2019-11-24 13:31:31 +05:30
Tim Besard
fbb377a7b4
Merge pull request #941 from FluxML/tb/include_during_precompile
Don't include the CUDA module during precompilation.
2019-11-24 08:55:43 +01:00
Dhairya Gandhi
5f21238d1a no grad dims helper 2019-11-24 13:25:02 +05:30
Tim Besard
4ece13c649 Don't include the CUDA module during precompilation.
If we do, we could end up replacing it at runtime.
2019-11-22 18:03:51 +01:00
matsueushi
a0314ce682 Fix logitbinarycrossentropy on CuArrays 2019-11-22 05:23:24 +00:00
Troels Arnfred Bojesen
3f97701d4c Merge branch 'HEAD' into weight_init_patch 2019-11-20 13:25:32 +09:00
Troels Arnfred Bojesen
60a29abaf1 Merge branch 'weight_init_patch' into HEAD 2019-11-20 13:25:19 +09:00
Troels Arnfred Bojesen
3b83828e4e Merge branch 'HEAD' into weight_init_patch 2019-11-20 13:24:48 +09:00
Troels Arnfred Bojesen
af96a197c1 Fix Glorot initialization
Should fix #442
2019-11-20 13:20:42 +09:00
Mike J Innes
5839e166f6
Merge pull request #860 from dsweber2/activations
Activations
2019-11-19 16:44:25 +00:00
Tim Besard
2fa3e5673e
Merge pull request #924 from FluxML/tb/cuda_init
CUDA package initialization improvements
2019-11-19 16:48:45 +01:00
Tim Besard
c45cec4cba Simplify warning. 2019-11-19 16:05:41 +01:00
Tim Besard
bd734ed957 Bump CUDA dependencies. 2019-11-19 15:55:25 +01:00
Tim Besard
69bf84278f Remove wrong warning. 2019-11-19 15:53:43 +01:00
Mike J Innes
4f73e434a4
Merge pull request #935 from baggepinnen/patch-4
Fix AMSGrad on GPU
2019-11-19 12:58:37 +00:00
Troels Arnfred Bojesen
2b80573248 Fix Glorot initialization, add He initialization
Should fix #442 .
Adds He weight initialization as a bonus :-)
2019-11-19 18:16:29 +09:00
bors[bot]
8638bcdcd7
Merge #936
936: Avoid unnecessary conversion r=MikeInnes a=baggepinnen

This initialization works for both cpu and gpu

Co-authored-by: Fredrik Bagge Carlson <baggepinnen@gmail.com>
2019-11-19 09:05:23 +00:00
Fredrik Bagge Carlson
2da22f31f0
Avoid unnecessary conversion
This initialization works for both cpu and gpu
2019-11-19 16:31:04 +08:00
Fredrik Bagge Carlson
df7ffb0ef8
Fix AMSGrad on GPU
The previous initialization created a CPU array. Now, the same type of array as `x` is created.
2019-11-19 16:27:44 +08:00
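A minimal Julia sketch of the point made in #936 and #935 above (illustrative function names, not Flux's actual optimiser code): allocate state with the same array type as the parameter, so it lands on the GPU whenever the parameter does.
```
state_cpu_only(x) = zeros(size(x)...)  # always a plain Array; wrong container for CuArrays
state_generic(x)  = zero(x)            # same container type as x (Array or CuArray)

x = rand(Float32, 3, 3)
@assert typeof(state_generic(x)) == typeof(x)
```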
Dhairya Gandhi
eb41715d26 define manual rules 2019-11-19 13:30:33 +05:30
Troels Arnfred Bojesen
4530ac65c7 Fix Glorot initialization, add He initialization
Should fix the issue reported at https://github.com/FluxML/Flux.jl/issues/442 .
Adds He weight initialization as a bonus :-)
2019-11-19 16:50:40 +09:00
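An illustrative, 2D-only sketch of the two schemes mentioned above (textbook formulas, not necessarily the exact functions or fan conventions the PR adds): for a weight matrix of size `(out, in)`, Glorot scales by fan-in plus fan-out, He by fan-in.
```
glorot_uniform_sketch(out, in) = (rand(Float32, out, in) .- 0.5f0) .* sqrt(24f0 / (in + out))
he_normal_sketch(out, in)      = randn(Float32, out, in) .* sqrt(2f0 / in)
```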
Mike J Innes
967cc1c175
Merge pull request #927 from heliosdrm/patch-1
Extend docs about `train!`
2019-11-18 12:22:16 +00:00
dsweber2
dea29532ef Merge branch 'master' into activations 2019-11-15 17:19:43 -08:00
Helios De Rosario
a0e3729679
Update docs/src/training/training.md
Co-Authored-By: Mike J Innes <mike.j.innes@gmail.com>
2019-11-15 21:17:45 +01:00
dsweber2
20eb840882 keeping activations separate 2019-11-15 12:03:08 -08:00
bors[bot]
7eb6a0c98c
Merge #932
932: Travis: test on 1.0 r=MikeInnes a=MikeInnes



Co-authored-by: Mike J Innes <mike.j.innes@gmail.com>
Co-authored-by: Mike Innes <mike.j.innes@gmail.com>
2019-11-15 16:21:30 +00:00
Mike Innes
e24215ca98 guard test on 1.0 2019-11-15 15:59:42 +00:00
Mike J Innes
665e441919 pkg up 2019-11-15 12:12:28 +00:00
Mike J Innes
9d6f6fdaa3
Merge pull request #926 from janEbert/bc-cuda-fix
Fix binarycrossentropy on CuArrays
2019-11-15 13:05:52 +01:00
Mike J Innes
2471596cdb test on 1.0 2019-11-15 11:50:13 +00:00
dsweber2
89afa20410 Merge branch 'activations' of github.com:dsweber2/Flux.jl into activations 2019-11-14 14:09:27 -08:00
dsweber2
58c794702d simpler test 2019-11-14 14:05:53 -08:00
dsweber2
0fe3ac4e77 bring activations into function call 2019-11-14 13:40:52 -08:00
dsweber2
db92b0e3ce super simple test 2019-11-14 13:40:52 -08:00
dsweber2
6475f6a43e recursive way of doing activations 2019-11-14 13:40:52 -08:00
dsweber2
99679f7e16 deal with empty Chain 2019-11-14 13:40:52 -08:00
dsweber2
d0202a2945 adding the extra commits broke the accumulate version 2019-11-14 13:40:52 -08:00
dsweber2
cdaaca8cfa make activations zygote friendly 2019-11-14 13:40:29 -08:00
Helios De Rosario
ba4e3be0d3
explanations about params in train! 2019-11-14 16:22:31 +01:00
Helios De Rosario
074eb47246
Update training.md 2019-11-12 23:29:38 +01:00
Dhairya Gandhi
e89b8eba77 fixes 2019-11-13 01:12:26 +05:30
Helios De Rosario
7e1ffd6507
Extend docs about train!
Related to #921: explain why it is not needed to pass the model as argument.
2019-11-08 21:39:00 +01:00
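A usage sketch illustrating the point of the docs change in #927 above: the model is not passed to `train!`; the implicit parameters collected with `params(model)` are what gets updated.
```
using Flux

model = Dense(2, 1)
loss(x, y) = Flux.mse(model(x), y)
data = [(rand(Float32, 2, 8), rand(Float32, 1, 8))]
Flux.train!(loss, Flux.params(model), data, Descent(0.1))
```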
DrChainsaw
453ecd1f24 Merge remote-tracking branch 'upstream/master' into samepad 2019-11-08 18:49:47 +01:00
janEbert
a00d8d94ec Add test for CUDA binarycrossentropy 2019-11-08 17:28:54 +01:00
janEbert
3dceef427f Fix binarycrossentropy on CuArrays 2019-11-08 16:48:11 +01:00
Dhairya Gandhi
a4a987f0b0 hook into bcasting 2019-11-07 16:53:41 +05:30
Tim Besard
9d05afaccc
Merge pull request #922 from FluxML/tb/backward
Restore Julia 1.0 compatibility.
2019-11-06 20:15:31 +01:00
Tim Besard
8a0745faab Restore Julia 1.0 compatibility. 2019-11-06 18:53:45 +01:00
bors[bot]
84d4ab083d
Merge #920
920: use release versions of packages r=MikeInnes a=MikeInnes

bors r+

Co-authored-by: Mike J Innes <mike.j.innes@gmail.com>
2019-11-06 12:23:44 +00:00
Mike J Innes
61078f3ef0 use release versions of packages 2019-11-06 12:23:12 +00:00
Tim Besard
08804a06d2
Merge pull request #916 from FluxML/tb/runtime_use_cuda
Check for CUDA availability at run time.
2019-11-06 09:46:39 +01:00
Tim Besard
c9f369de86 Update packages. 2019-11-06 07:53:20 +01:00
Tim Besard
6e8f8c1f46 Use latest GPU CI templates. 2019-11-04 16:41:57 +01:00
Tim Besard
916d3dabbd Bump Julia version. 2019-11-04 15:51:33 +01:00
Tim Besard
33d276cdb7 Fix GPU-less tests. 2019-11-04 15:51:33 +01:00
Tim Besard
dbcdf4d1bd Bump GPU packages. 2019-11-04 15:51:33 +01:00
Tim Besard
a82b76cf24 Conditionally include the CUDNN glue code. 2019-11-04 15:27:11 +01:00
Tim Besard
39ab740fb7 Check for CUDA availability at run time. 2019-11-02 11:18:06 +01:00
bors[bot]
7104fd9332
Merge #907
907: Change `gate` function to `view` instead of copy r=MikeInnes a=janEbert

This speeds up code with large inputs by quite a lot. I only added it to the method accepting an `AbstractVector` as input, since copying matrices may be faster than viewing them due to caching (they are sliced per row, so the data will not necessarily have a low stride).

Co-authored-by: janEbert <janpublicebert@posteo.net>
2019-10-24 11:06:41 +00:00
janEbert
7b41bc4ab5 Change gate function to view instead of copy
Only for vector input as copying a matrix may be more efficient due to
caching. A matrix is sliced per row, meaning the view will not be
aligned.
2019-10-24 12:45:22 +02:00
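A sketch of the change in #907 above, with helper signatures modelled on Flux's recurrent cells (assumed, not the verbatim source): `gate` picks the n-th block of length `h` out of the concatenated gate activations.
```
gate(h::Integer, n::Integer) = (1:h) .+ h * (n - 1)
gate(x::AbstractVector, h, n) = @view x[gate(h, n)]   # the PR's change: a view, no copy
gate(x::AbstractMatrix, h, n) = x[gate(h, n), :]      # still a copy; row slices are not contiguous
```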
Dhairya Gandhi
7c90fb469d use array to define Zeros 2019-10-23 20:02:15 +05:30
bors[bot]
645aa04464
Merge #898
898: Fix problem in crossentropy breaking GPU compilation r=MikeInnes a=kshyatt

Trying to run this simple example
```
using Flux, CuArrays
using Flux: crossentropy
model = Chain(
        Dense(728, 128, σ),
        LSTM(128, 256),
        LSTM(256, 128),
        Dense(128, 10),
        softmax) |> gpu
data = [rand(728) for i in 1:100];
out  = [rand(10) for i in 1:100];
loss(x, y) = crossentropy(model(x), y);
Flux.train!(loss, params(model), zip(gpu.(data), gpu.(out)), ADAM())
```
Old version of `crossentropy`:
```
ERROR: GPU compilation of #23(CuArrays.CuKernelState, CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}, Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(*),Tuple{Base.Broadcast.Extruded{Array{Float32,1},Tuple{Bool},Tuple{Int64}},Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArray},Nothing,typeof(conj),Tuple{Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},Tuple{Bool},Tuple{Int64}}}}}}) failed
KernelError: passing and using non-bitstype argument

Argument 4 to your kernel function is of type Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(*),Tuple{Base.Broadcast.Extruded{Array{Float32,1},Tuple{Bool},Tuple{Int64}},Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArray},Nothing,typeof(conj),Tuple{Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},Tuple{Bool},Tuple{Int64}}}}}}.
That type is not isbits, and such arguments are only allowed when they are unused by the kernel.  .args is of type Tuple{Base.Broadcast.Extruded{Array{Float32,1},Tuple{Bool},Tuple{Int64}},Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArray},Nothing,typeof(conj),Tuple{Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},Tuple{Bool},Tuple{Int64}}}}} which is not isbits.
    .1 is of type Base.Broadcast.Extruded{Array{Float32,1},Tuple{Bool},Tuple{Int64}} which is not isbits.
      .x is of type Array{Float32,1} which is not isbits.


Stacktrace:
 [1] check_invocation(::CUDAnative.CompilerJob, ::LLVM.Function) at /mnt/home/khyatt/.julia/dev/CUDAnative/src/compiler/validation.jl:70
 [2] macro expansion at /mnt/home/khyatt/.julia/dev/CUDAnative/src/compiler/driver.jl:187 [inlined]
 [3] macro expansion at /mnt/home/khyatt/.julia/packages/TimerOutputs/7zSea/src/TimerOutput.jl:216 [inlined]
 [4] #codegen#136(::Bool, ::Bool, ::Bool, ::Bool, ::Bool, ::typeof(CUDAnative.codegen), ::Symbol, ::CUDAnative.CompilerJob) at /mnt/home/khyatt/.julia/dev/CUDAnative/src/compiler/driver.jl:186
 [5] #codegen at ./none:0 [inlined]
 [6] #compile#135(::Bool, ::Bool, ::Bool, ::Bool, ::Bool, ::typeof(CUDAnative.compile), ::Symbol, ::CUDAnative.CompilerJob) at /mnt/home/khyatt/.julia/dev/CUDAnative/src/compiler/driver.jl:47
 [7] #compile#134 at ./none:0 [inlined]
 [8] #compile at ./none:0 [inlined] (repeats 2 times)
 [9] macro expansion at /mnt/home/khyatt/.julia/dev/CUDAnative/src/execution.jl:389 [inlined]
 [10] #cufunction#176(::Nothing, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(CUDAnative.cufunction), ::GPUArrays.var"#23#24", ::Type{Tuple{CuArrays.CuKernelState,CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(*),Tuple{Base.Broadcast.Extruded{Array{Float32,1},Tuple{Bool},Tuple{Int64}},Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArray},Nothing,typeof(conj),Tuple{Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},Tuple{Bool},Tuple{Int64}}}}}}}}) at /mnt/home/khyatt/.julia/dev/CUDAnative/src/execution.jl:357
 [11] cufunction(::Function, ::Type) at /mnt/home/khyatt/.julia/dev/CUDAnative/src/execution.jl:357
 [12] macro expansion at /mnt/home/khyatt/.julia/dev/CUDAnative/src/execution.jl:174 [inlined]
 [13] macro expansion at ./gcutils.jl:91 [inlined]
 [14] macro expansion at /mnt/home/khyatt/.julia/dev/CUDAnative/src/execution.jl:171 [inlined]
 [15] _gpu_call(::CuArrays.CuArrayBackend, ::Function, ::CuArray{Float32,1}, ::Tuple{CuArray{Float32,1},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(*),Tuple{Base.Broadcast.Extruded{Array{Float32,1},Tuple{Bool},Tuple{Int64}},Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArray},Nothing,typeof(conj),Tuple{Base.Broadcast.Extruded{CuArray{Float32,1},Tuple{Bool},Tuple{Int64}}}}}}}, ::Tuple{Tuple{Int64},Tuple{Int64}}) at /mnt/home/khyatt/.julia/dev/CuArrays/src/gpuarray_interface.jl:60
 [16] gpu_call at /mnt/home/khyatt/.julia/dev/GPUArrays/src/abstract_gpu_interface.jl:151 [inlined]
 [17] gpu_call at /mnt/home/khyatt/.julia/dev/GPUArrays/src/abstract_gpu_interface.jl:128 [inlined]
 [18] copyto! at /mnt/home/khyatt/.julia/dev/GPUArrays/src/broadcast.jl:48 [inlined]
 [19] copyto! at ./broadcast.jl:863 [inlined]
 [20] copy at ./broadcast.jl:839 [inlined]
 [21] materialize at ./broadcast.jl:819 [inlined]
 [22] (::Zygote.var"#1310#1311"{CuArray{Float32,1},CuArray{Float32,1}})(::Array{Float32,1}) at /mnt/home/khyatt/.julia/dev/Zygote/src/lib/broadcast.jl:68
```
New version:
```
julia> Flux.train!(loss, params(model), zip(gpu.(data), gpu.(out)), ADAM())

julia> # everyone finished happily and went on with their lives
```

Co-authored-by: Katharine Hyatt <khyatt@flatironinstitute.org>
2019-10-23 14:31:53 +00:00
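A hedged sketch of the failure mode behind the error in #898 above (a general CuArrays fact, not the PR's actual diff): a plain CPU `Array` captured in a broadcast over `CuArray`s becomes a non-isbits kernel argument; moving the captured array to the GPU first keeps the kernel compilable.
```
using Flux, CuArrays

ŷ = gpu(softmax(rand(Float32, 10, 5)))
y = gpu(Float32.(Flux.onehotbatch(rand(1:10, 5), 1:10)))
w_cpu = rand(Float32, 10)        # broadcasting this into a GPU kernel fails as above
w_gpu = gpu(w_cpu)               # same data, now device-resident and isbits-compatible
loss = -sum(y .* log.(ŷ) .* w_gpu) / size(y, 2)
```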
Katharine Hyatt
8913c9c741 Make the vector of weights test pass on GPU 2019-10-23 09:53:09 -04:00
Katharine Hyatt
f7ce717aaa Add tests 2019-10-23 09:22:22 -04:00
Katharine Hyatt
e0c1c0e057 Fix problem in crossentropy breaking GPU compilation 2019-10-22 14:00:57 -04:00
bors[bot]
fa5737fb5c
Merge #904
904: Documenting Optimiser Interface r=MikeInnes a=MikeInnes

I needed to add one extra commit to #875 before merging.

Co-authored-by: Dhairya Gandhi <dhairya@juliacopmuting.com>
Co-authored-by: Dhairya Gandhi <dhairya@juliacomputing.com>
Co-authored-by: Mike Innes <mike.j.innes@gmail.com>
2019-10-22 12:38:19 +00:00
Mike Innes
7ead2d6c7b typo 2019-10-22 13:36:39 +01:00
Dhairya Gandhi
a9955fec8a correct train! syntax 2019-10-22 16:25:55 +05:30
Dhairya Gandhi
4a183aeaf0 make Zeros a dimensionlesss number 2019-10-22 16:11:27 +05:30
bors[bot]
b03f34dcb6
Merge #902
902: Backticks and examples for normalise r=MikeInnes a=kshyatt



Co-authored-by: Katharine Hyatt <khyatt@flatironinstitute.org>
2019-10-21 14:35:45 +00:00
Katharine Hyatt
b8b4bc48b9 Backticks and examples for normalise 2019-10-21 10:31:44 -04:00
DrChainsaw
530d4edb67 Fix for reading comprehension error (dim is not always 2 * (N-2)) Fix for ambiguous method sig 2019-10-20 16:03:01 +02:00
DrChainsaw
411ce5dbd8 Add SamePad for pooling layers 2019-10-20 13:43:39 +02:00
DrChainsaw
fc123d6279 Add SamePad for conv layers 2019-10-20 13:43:23 +02:00
Dhairya Gandhi
776023ddad fixes 2019-10-10 20:35:28 +05:30
Dhairya Gandhi
4477dd8d54 reviews 2019-10-10 20:27:11 +05:30
Dhairya Gandhi
a55878453c
typo
Co-Authored-By: Mike J Innes <mike.j.innes@gmail.com>
2019-10-10 20:16:29 +05:30
Dhairya Gandhi
623ee2c29c
typo
Co-Authored-By: Mike J Innes <mike.j.innes@gmail.com>
2019-10-10 20:16:00 +05:30
Dhairya Gandhi
f19066ee29 more docstrings 2019-10-10 16:48:12 +05:30
thebhatman
d591b2b59e Removed colon and capitalised 2019-10-09 21:36:40 +05:30
Dhairya Gandhi
fe52689cfe in depth docstrings 2019-10-09 16:16:11 +05:30
thebhatman
96a23c295c Changes to docs 2019-10-09 14:53:03 +05:30
dsweber2
3b7b780d39 super simple test 2019-10-08 23:04:31 -07:00
Dhairya Gandhi
c85bad4427 replace weight with filter 2019-10-08 20:26:09 +05:30
Dhairya Gandhi
49ea43e711 ZeroType => Zeros 2019-10-08 20:02:04 +05:30
bors[bot]
af0dcb2c63
Merge #882
882: Check if CUDA availability changed during init. r=MikeInnes a=maleadt

With this PR, Flux uses CUDAapi to check whether CUDA is available during initialization, and forces recompilation if that does not agree with what was decided during precompilation. This avoids the scenario where Flux was precompiled without GPU support and then keeps refusing to use the GPU even after the user fixes their CUDA/GPU set-up, because fixing it alone does not force recompilation (and we can't add precompilation dependencies on stuff that doesn't exist).

However, we can't do the same for the case where we have a GPU/CUDA but CuArrays fails to import (checking whether it imports during `__init__` would be much too expensive, if even possible), so this PR removes support for having CUDA/a GPU while CuArrays is broken. That's a little risky now that Flux depends on CuArrays, but the package is pretty mature and I haven't seen many recent bug reports about it failing to load.

Fixes https://github.com/FluxML/Flux.jl/pull/852#issuecomment-538028314

cc @MikeInnes @xukai92

Co-authored-by: Tim Besard <tim.besard@gmail.com>
2019-10-08 13:24:49 +00:00
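A minimal sketch of the scheme described in #882 above; `cuda_available()` is a hypothetical stand-in for the real CUDAapi query, and the actual logic lives in Flux's `__init__`.
```
module GPUSupportCheck

cuda_available() = haskey(ENV, "ASSUME_CUDA")   # stand-in for the real CUDAapi check

const CUDA_AT_PRECOMPILE = cuda_available()     # baked in when the module precompiles

function __init__()
    if cuda_available() != CUDA_AT_PRECOMPILE
        @warn "CUDA availability changed since precompilation; restart Julia so Flux recompiles."
    end
end

end
```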
Dhairya Gandhi
95c5845e99 document bias switch 2019-10-08 17:54:01 +05:30
Dhairya Gandhi
b596faaffa tests bias switch 2019-10-08 17:18:39 +05:30
Dhairya Gandhi
040697fb2b add bias and weight kwarg 2019-10-08 17:18:19 +05:30
Dhairya Gandhi
f3904b4e04 add ZeroType back 2019-10-08 17:17:36 +05:30
Dhairya Gandhi
a1e826b888 fixes 2019-10-06 05:10:56 +05:30
Dhairya Gandhi
214f71f492 add N 2019-10-06 04:55:33 +05:30
Dhairya Gandhi
2ae3ad3b31 doc fixes 2019-10-06 04:46:13 +05:30
Dhairya Gandhi
d00f833c17 rm ZeroType 2019-10-06 04:44:50 +05:30
Dhairya Gandhi
e97d61f257 fixes 2019-10-06 04:42:26 +05:30
Dhairya Gandhi
48a305bd21 ditto remaining layers 2019-10-06 04:41:06 +05:30
Dhairya Gandhi
55ef7c1aba add weight and bias kwargs 2019-10-06 04:25:23 +05:30
Dhairya Gandhi
b503741651 expanded docstrings 2019-10-04 14:46:03 +05:30
Tim Besard
8aea15e6e0 Demote to const variables. 2019-10-03 21:28:55 +02:00
Tim Besard
2369b2b3fd Add an environment variable to disable CUDA usage. 2019-10-03 21:27:54 +02:00
Tim Besard
63d196aa37 Check if CUDA availability changed during init. 2019-10-03 20:05:32 +02:00
thebhatman
ec886c8ce8 Added docstring for hinge loss 2019-10-03 21:13:09 +05:30
Dhairya Gandhi
1fe321781b add to docs 2019-10-01 21:29:18 +05:30
Dhairya Gandhi
dced8c04e5 use ZeroType 2019-10-01 21:25:07 +05:30
bors[bot]
0d3aa8fa5e
Merge #877
877: Fix functor's `params!` to work with complex numbers r=MikeInnes a=PhilipVinc

I believe you forgot to define `params!` for complex-valued arrays.

If I'm wrong, feel free to close this.

Co-authored-by: Filippo Vicentini <filippovicentini@gmail.com>
2019-10-01 15:11:55 +00:00
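A simplified sketch of the dispatch change the commits in #877 describe (`collect_params!` is an illustrative name, not Flux's `params!`): the array method is constrained to `AbstractArray{<:Number}` so complex-valued parameters are collected too.
```
collect_params!(ps, x::AbstractArray{<:Number}) = push!(ps, x)   # includes complex arrays
collect_params!(ps, x) = foreach(f -> collect_params!(ps, getfield(x, f)), fieldnames(typeof(x)))

ps = Any[]
collect_params!(ps, (W = rand(ComplexF64, 2, 2), b = rand(2)))
length(ps) == 2   # both the complex weight matrix and the real bias were collected
```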
Manjunath Bhat
2b30319a55
Merge branch 'master' into patch-6 2019-09-30 21:05:02 +05:30
thebhatman
ec35e9cbaa Loss functions docs added in layers.md 2019-09-30 21:02:13 +05:30
thebhatman
6e289ef939 Merge branch 'patch-6' of https://github.com/thebhatman/Flux.jl into patch-6 2019-09-30 20:55:44 +05:30
Filippo Vicentini
606fe58854
Use <:Number 2019-09-29 12:33:02 +02:00
Filippo Vicentini
14e94c291e
Make it actually work 2019-09-29 12:28:01 +02:00
Filippo Vicentini
d91677f651
Fix params! to work with complex numbers 2019-09-29 12:23:41 +02:00
Dhairya Gandhi
8013c728b1 clearer optimiser docstrings 2019-09-28 16:09:00 +05:30
Dhairya Gandhi
0175485a80 fixup 2019-09-27 22:08:25 +05:30
Dhairya Gandhi
8bb0db7d0c opt docstrings 2019-09-27 22:04:53 +05:30
Dhairya Gandhi
32ac71734d optimiser interface docs 2019-09-27 21:43:59 +05:30
Dhairya Gandhi
a98a1b8bb5 fixes 2019-09-27 21:43:39 +05:30
bors[bot]
e2b93bc78a
Merge #874
874: Move CUDNN wrappers to CuArrays r=MikeInnes a=MikeInnes



Co-authored-by: Tim Besard <tim.besard@gmail.com>
Co-authored-by: Mike Innes <mike.j.innes@gmail.com>
2019-09-27 14:05:37 +00:00
Mike Innes
b90b02872f Merge branch 'master' into tb/cuarrays_dnn 2019-09-27 14:58:32 +01:00
Mike Innes
e287982b78 use CuArrays master 2019-09-27 14:55:30 +01:00
Mike Innes
691a29cf32 cudnn bug is fixed 2019-09-27 14:15:58 +01:00
Dhairya Gandhi
a801fcb9e7 docstrings 2019-09-27 12:07:55 +05:30
Dhairya Gandhi
9f2ac8fdef ditto remaining conv layers 2019-09-27 12:04:27 +05:30
Dhairya Gandhi
5ea6a33f44 make bias optional 2019-09-27 11:48:12 +05:30
Mike Innes
46bc8e5e64 move pullbacks to CuArrays 2019-09-26 17:14:18 +01:00
bors[bot]
12bc06136d
Merge #870
870: Fix printing of SkipConnection r=MikeInnes a=mcabbott

Before:
```
julia> SkipConnection(Dense(2,2),+)
SkipConnection(Error showing value of type SkipConnection:
ERROR: MethodError: no method matching iterate(::Dense{typeof(identity),TrackedArray{…,Array{Float32,2}},TrackedArray{…,Array{Float32,1}}})

julia> SkipConnection(Chain(Dense(2,3), Dense(3,2), LayerNorm(2)),+)
SkipConnection(Dense(2, 3), Dense(3, 2), LayerNorm(2))

julia> SkipConnection(Dense(2, 3), Dense(3, 2), LayerNorm(2))
ERROR: MethodError: no method matching SkipConnection(::Dense{typeof(identity),TrackedArray{…,Array{Float32,2}},TrackedArray{…,Array{Float32,1}}}, ::Dense{typeof(identity),TrackedArray{…,Array{Float32,2}},TrackedArray{…,Array{Float32,1}}}, ::LayerNorm{TrackedArray{…,Array{Float32,1}}})
```
After:
```
julia> SkipConnection(Dense(2,2),+)
SkipConnection(Dense(2, 2), +)

julia> SkipConnection(Chain(Dense(2,3), Dense(3,2), LayerNorm(2)),+)
SkipConnection(Chain(Dense(2, 3), Dense(3, 2), LayerNorm(2)), +)

julia> SkipConnection(Dense(2,2), (a,b) -> a .+ b./2)
SkipConnection(Dense(2, 2), #9)
```

Co-authored-by: Michael Abbott <32575566+mcabbott@users.noreply.github.com>
2019-09-25 14:09:28 +00:00
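A sketch of the kind of `show` method that produces the "After" output in #870 above (field names assumed to match Flux's `SkipConnection`, which stores the wrapped `layers` and the `connection` function):
```
using Flux

function Base.show(io::IO, b::SkipConnection)
  print(io, "SkipConnection(", b.layers, ", ", b.connection, ")")
end
```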
Michael Abbott
806e0c5c57 line 2019-09-25 15:20:13 +02:00
Michael Abbott
4245d9acad eg 2019-09-25 15:18:40 +02:00
Michael Abbott
2de84ce79f simplify 2019-09-25 13:59:32 +02:00
Michael Abbott
1a1a96571a +Chain 2019-09-25 13:47:29 +02:00
Michael Abbott
19830c71b1 fix printing of SkipConnection 2019-09-25 13:37:01 +02:00
bors[bot]
acb6a89245
Merge #865
865: Functor r=MikeInnes a=MikeInnes

This refactors our current `@treelike` infrastructure. It somewhat formalises what we're doing around the idea of a Flux model as a functor, i.e. something that can be mapped over.

This is much more flexible than what we had before, and avoids some issues. It allows layers to have state that isn't mappable; it allows for dispatch when walking the tree, which means layers like `BatchNorm` can have non-trainable parameters; and it also allows for zipped mapping like `fmap(+, xs, ys)`, which isn't implemented yet but will be useful for the new optimisers work.

The main downside is that the term `functor` has been previously used in the Julia community as a malapropism for "thing that behaves like a function"; but hopefully this can start to reduce that usage.

Co-authored-by: Mike Innes <mike.j.innes@gmail.com>
2019-09-24 16:36:10 +00:00
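A small usage sketch of the functor idea in #865 above: `fmap` walks a model's structure and applies a function to every leaf, here converting all array parameters to `Float64`.
```
using Flux

m = Chain(Dense(10, 5, relu), Dense(5, 2))
m64 = fmap(x -> x isa AbstractArray ? Float64.(x) : x, m)
m64(rand(10))   # parameters (and hence the output) are now Float64
```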
bors[bot]
d57636fd48
Merge #861
861: GPU CI maintainance  r=dhairyagandhi96 a=dhairyagandhi96



Co-authored-by: Dhairya Gandhi <dhairya@juliacopmuting.com>
2019-09-24 16:06:13 +00:00
Dhairya Gandhi
ce910da948 compat julia v1.0 2019-09-24 17:04:13 +05:30
Dhairya Gandhi
cf593a5744 revert to custom target 2019-09-24 16:43:48 +05:30
Dhairya Gandhi
fe4ecc5880 trying out extending directly 2019-09-24 16:15:48 +05:30
Dhairya Gandhi
928b5dcc2a fix Zygote 2019-09-24 00:51:35 +05:30
Dhairya Gandhi
822288d63d merge conflicts 2019-09-24 00:31:44 +05:30
Dhairya Gandhi
d8a069b304 fix env 2019-09-24 00:28:52 +05:30
Dhairya Gandhi
98308a85ea add gitlab common yaml 2019-09-23 16:55:53 +05:30
Dhairya Gandhi
783ae137e1 remove targets and env 2019-09-23 16:51:11 +05:30
Dhairya Gandhi
6846551f57 fix cuda init 2019-09-22 22:02:05 +05:30
Dhairya Gandhi
787097f9ea use CuArrays#stable 2019-09-21 00:20:54 +05:30
Mike Innes
b60df53ba1 pkg up 2019-09-19 18:33:33 +01:00
Mike Innes
cabb81e30b internal rename 2019-09-19 15:53:31 +01:00
Mike Innes
b951377426 fix normalisation layer params 2019-09-19 15:33:24 +01:00
Mike Innes
6529dbcbe6 functor refactor 2019-09-19 15:22:11 +01:00
Mike Innes
2c71fc282b rename functor.jl 2019-09-19 14:15:28 +01:00
Mike Innes
f8d5d3b5fc broken normalisation layer params 2019-09-19 14:12:11 +01:00
Dhairya Gandhi
99b6fe57e9 extend test template 2019-09-18 12:32:11 +05:30
Dhairya Gandhi
37fe91d54d remove branch restrictions 2019-09-18 12:05:31 +05:30
Mike Innes
c5e56b7e04 move setweights and copy_transpose 2019-09-17 17:22:35 +01:00
Mike Innes
5baebf48f4 Merge branch 'master' into tb/cuarrays_dnn 2019-09-17 16:17:09 +01:00
Mike Innes
fc9db7ee74 pkg up 2019-09-17 15:49:48 +01:00
Mike Innes
368b1f53b4 tuple support 2019-09-17 15:49:39 +01:00
Mike Innes
b348b20452 cudnn rnns + implicit gradients 2019-09-17 15:41:42 +01:00
Mike Innes
fe57215b7e test fillarray gradients 2019-09-17 15:21:03 +01:00
Dhairya Gandhi
29eae312b8
Merge pull request #863 from Naba7/fix_typo
removed extra parenthesis
2019-09-14 11:43:20 +05:30
Naba7
a600a9ceed removed extra parenthesis 2019-09-14 10:56:17 +05:30
Tim Besard
6ea2557c46 Use correct CuArrays branch for CI. 2019-09-13 08:21:45 +02:00
Tim Besard
4942d7fcfd Move functionality over to CuArrays. 2019-09-13 08:21:45 +02:00
Tim Besard
1e7ff4f65d Query the worksize. 2019-09-13 08:04:05 +02:00
Tim Besard
04fce70019 Move low-level CUDNN wrappers to CuArrays. 2019-09-13 08:04:05 +02:00
dsweber2
46abfbbd5c recursive way of doing activations 2019-09-11 17:36:37 -07:00
Dhairya Gandhi
b8d872d842 update to Flux 0.9+ 2019-09-11 21:11:02 +05:30
Dhairya Gandhi
7ebb2cfac5 test on julia 1.2 2019-09-11 21:10:12 +05:30
Mike J Innes
bdeb9c6d58
Merge pull request #669 from FluxML/zygote
using Zygote
2019-09-11 16:22:26 +01:00
Dhairya Gandhi
e0276139e1
Update docs/src/training/optimisers.md
Co-Authored-By: Mike J Innes <mike.j.innes@gmail.com>
2019-09-11 19:21:15 +05:30
Dhairya Gandhi
b6926f07a5 cleanup 2019-09-11 19:18:50 +05:30
Dhairya Gandhi
b08c949b99 fixes to saving 2019-09-11 14:25:46 +05:30
dsweber2
f41219133e deal with empty Chain 2019-09-10 10:46:56 -07:00
Dhairya Gandhi
6fd66fd3b5 Merge branch 'zygote' of https://github.com/FluxML/Flux.jl into zygote 2019-09-10 21:20:35 +05:30
Dhairya Gandhi
a9d1cbf07c added decays 2019-09-10 21:20:05 +05:30
Mike Innes
250aef5a5a normalise test fixes 2019-09-10 16:19:55 +01:00
Dhairya Gandhi
b6c8312796 optimiser docs 2019-09-10 20:49:15 +05:30
Mike Innes
877415be10 rm gradient checks 2019-09-10 15:35:52 +01:00
Mike Innes
221313c977 formatting changed on 1.1 2019-09-10 15:26:51 +01:00
Mike Innes
de2049450b docs mostly fixed 2019-09-10 15:17:07 +01:00
Mike Innes
ddf06af0b9 remove tracker docs 2019-09-10 15:03:08 +01:00
Mike Innes
c8d460ff84 doctests passing 2019-09-10 15:02:43 +01:00
dsweber2
1bb25dc1f9 adding the extra commits broke the accumulate version 2019-09-10 01:34:12 -07:00
dsweber2
bb84aeeb55 Merge branch 'activations' of github.com:dsweber2/Flux.jl into activations 2019-09-10 01:07:53 -07:00
dsweber2
82261b5bb7 make activations zygote friendly 2019-09-10 01:06:37 -07:00
Mosè Giordano
38790dd4db Restore purity 2019-09-10 01:06:37 -07:00
dsweber2
540b7366ec make activations zygote friendly 2019-09-10 00:54:49 -07:00
Mike J Innes
b8e06ef3b7
Merge pull request #857 from giordano/linguist-ignore-citation
Restore purity
2019-09-09 15:30:18 +01:00
Mosè Giordano
83b998c39d Restore purity 2019-09-08 16:15:35 +01:00
Mike J Innes
67c38b3099 Merge branch 'master' into zygote 2019-09-06 15:18:58 +01:00
thebhatman
ecc9ce9d64 Gradient on AlphaDropout now working 2019-09-06 16:34:19 +05:30
Mike J Innes
4ca320444e pkg up 2019-09-06 11:50:01 +01:00
Mike J Innes
3c1ac84676
Merge pull request #842 from baggepinnen/patch-4
Add RADAM optimizer
2019-09-02 14:36:40 +01:00
Manjunath Bhat
c3cc4bf966
Remove double docstring 2019-08-31 01:35:40 +05:30
thebhatman
2f1a187665 Update AlphaDropout 2019-08-31 01:28:58 +05:30
Fredrik Bagge Carlson
fe2e3c3e6b
Add RADAM news entry 2019-08-30 17:08:16 +08:00
Mike J Innes
7e8021422d
Update Project.toml 2019-08-29 14:40:36 +01:00
Fredrik Bagge Carlson
cb3bfd72f3
Export RADAM from Optimise 2019-08-29 07:46:45 +08:00
Mike J Innes
27934c3674
Merge pull request #852 from FluxML/tb/cuarrays_dep
RFC: Replace Requires with direct CuArrays dependency.
2019-08-27 15:56:16 +01:00
Mike J Innes
61a8cfd6ee libcudnn check fix 2019-08-27 15:41:23 +01:00
Mike J Innes
9cd97f06f7 define has_cuarrays when no cuda 2019-08-27 15:06:04 +01:00
Mike J Innes
9da32e5d78 pkg up 2019-08-27 15:04:20 +01:00
Tim Besard
4fef9d8508 Don't depend on unreleased CuArrays. 2019-08-27 09:40:22 +02:00
Tim Besard
6ad3cdd138 Replace Requires with direct CuArrays dependency. 2019-08-27 09:33:15 +02:00
bors[bot]
6494f73c78
Merge #847
847: Fix CuArrays.libcudnn imports r=dhairyagandhi96 a=janEbert

Closes #846.

Co-authored-by: janEbert <janpublicebert@posteo.net>
2019-08-25 09:51:45 +00:00
janEbert
dec1b37e8e Merge remote-tracking branch 'origin/master' into HEAD 2019-08-24 12:23:10 +02:00
janEbert
978d7bf195 Fix CuArrays.libcudnn imports 2019-08-24 02:21:54 +02:00
Mike Innes
ee74f1a311 pkg up 2019-08-22 13:02:59 +01:00
Mike Innes
487000ac31 fix cuda code and tests 2019-08-19 16:56:48 +01:00
Mike Innes
62ec01a6f5 doc build changes 2019-08-19 15:49:50 +01:00
Mike Innes
6c67404398 update cleanup 2019-08-19 15:44:51 +01:00
Mike Innes
447fd9d604 conv docstring formatting 2019-08-19 15:30:59 +01:00
Mike Innes
2f7ad895aa test cleanups 2019-08-19 15:22:50 +01:00
Mike Innes
9590aa63e3 rm last uses of param/data 2019-08-19 15:14:42 +01:00
thebhatman
a76e4d128b Remove param from crosscor 2019-08-19 19:19:53 +05:30
Manjunath Bhat
8456b7ba45
Remove param from groupnorm 2019-08-19 19:16:21 +05:30
Mike Innes
3ecca436e4 formatting fix 2019-08-19 14:42:07 +01:00
Mike Innes
49044dff7c avoid adjoint on abstract type 2019-08-19 14:39:09 +01:00
Mike Innes
b8fabad337 deprecate param/data 2019-08-19 14:35:48 +01:00
Fredrik Bagge Carlson
3287cf23db
Add RADAM export 2019-08-19 13:07:39 +08:00
Fredrik Bagge Carlson
304b433daa
Add RADAM to tests 2019-08-19 13:01:14 +08:00
Fredrik Bagge Carlson
ebbad0d135
Add RADAM optimizer 2019-08-19 12:22:32 +08:00
bors[bot]
aab3c4e052 Merge #837
837: Use `CuArrays.ones` instead `cuones` which is deprecated r=dhairyagandhi96 a=mimadrid

I

Co-authored-by: Miguel Madrid Mencía <miguel.madrid.mencia@gmail.com>
2019-08-12 05:36:29 +00:00
Miguel Madrid Mencía
14affbc91b
Use CuArrays.ones instead cuones which is deprecated 2019-08-11 13:38:44 +02:00
Mike J Innes
7c111e7cde fixes #645
fixes #831
2019-08-09 13:53:11 +01:00
bors[bot]
109c278f74 Merge #835
835: Fix  cuzeros deprecation r=dhairyagandhi96 a=Moelf



Co-authored-by: Moelf <jerryling315@gmail.com>
2019-08-09 10:33:55 +00:00
Moelf
4d00957b36
Fix CuArray zeros deprecation 2019-08-06 22:23:21 +02:00
Dhairya Gandhi
0a5ce0ed61
Merge pull request #827 from ChrisRackauckas/patch-3
Momentum doesn't need params
2019-07-31 23:36:40 -04:00
Christopher Rackauckas
ed12d4e7c0
Momentum doesn't need params 2019-07-31 17:56:51 -04:00
Mike J Innes
f3551da5a2 dropout printing 2019-07-24 11:20:39 -04:00
thebhatman
faac0ff08b Updated InstanceNorm and GroupNorm to avoid mutation 2019-07-18 16:13:58 +05:30
thebhatman
a645a86927 Manifest updated 2019-07-17 20:45:25 +05:30
Manjunath Bhat
b779d43aca
replaced trunc Int with div 2019-07-16 17:52:55 +05:30
thebhatman
a128a7718d gradients test updated in cudnn 2019-07-16 17:27:35 +05:30
thebhatman
d0b94b88f6 Merge branch 'zygote' of https://github.com/FluxML/Flux.jl into zygote 2019-07-12 22:20:34 +05:30
thebhatman
2816fbb9b2 Fix for getindex error in BatchNorm 2019-07-12 22:19:41 +05:30
Manjunath Bhat
4ef5ec0005
brackets corrected 2019-07-12 21:03:57 +05:30
thebhatman
fc1c0d58ed Merge branch 'zygote' of https://github.com/FluxML/Flux.jl into zygote 2019-07-12 20:47:54 +05:30
thebhatman
8d6028e27a tests with gradients 2019-07-12 20:47:43 +05:30
Mike Innes
a140c31f72 fix batchnorm 2019-07-12 16:09:42 +01:00
Mike Innes
1fc584102d fix dropout 2019-07-12 15:38:28 +01:00
Mike Innes
094b38ac03 require julia 1.1 2019-07-12 15:21:46 +01:00
Mike Innes
c9cb729b9b rm REQUIRE 2019-07-12 14:55:50 +01:00
Mike Innes
e2bf46b7fd gpu test fixes 2019-07-12 14:52:01 +01:00
Mike Innes
c9663c1e71 pkg up 2019-07-12 14:51:42 +01:00
Manjunath Bhat
2b379d0ec0
Allow scalar indexing or onehotbatch tests will fail 2019-07-12 17:56:47 +05:30
Mike J Innes
bab618d168
Merge pull request #767 from oxinabox/patch-6
Some cleanup on performance tips docs
2019-07-11 16:11:44 +01:00
Mike J Innes
27904d349c
Update performance.md 2019-07-11 16:11:32 +01:00
Mike J Innes
174adf94d9
Merge pull request #805 from DrChainsaw/prefor-so-fix
Fix for #803
2019-07-11 16:02:54 +01:00
Mike Innes
33c8d84a60 cuparam -> cuarray 2019-07-11 14:14:56 +01:00
Manjunath Bhat
11c9a8450c
Remove active from GroupNorm 2019-07-11 18:40:48 +05:30
Mike Innes
c2cd7dab91 re-export gradient 2019-07-11 13:55:12 +01:00
DrChainsaw
9b96a3d69b Change to array due to "type definition not allowed inside a local scope" 2019-07-09 01:15:55 +02:00
DrChainsaw
16d5f2bc24 Add x to seen in prefor to avoid infinite recursion if passed something self-referential 2019-07-08 23:11:35 +02:00
thebhatman
cf5bc801d3 Check for nothing in update step 2019-07-08 19:22:23 +05:30
thebhatman
8d78b437ff Merge branch 'sf/zygote_updated' of https://github.com/thebhatman/Flux.jl 2019-07-08 18:47:17 +05:30
Mike J Innes
b3bba4c566
Merge pull request #801 from quatrejuin/master
Fix lack of x
2019-07-08 13:00:58 +01:00
thebhatman
812541f8d6 zeros replaced by fill to avoid nothing grad 2019-07-06 19:41:03 +05:30
thebhatman
8292cfd81f Decay checking test added back 2019-07-03 00:30:16 +05:30
Jason Wu
b24e05bb20
Fix lack of x 2019-07-02 13:15:54 -04:00
thebhatman
4e9f3deb7f Manifest updated with new Zygote version 2019-07-02 20:41:44 +05:30
thebhatman
3ee2a76f61 Removed .data from LSTMCell 2019-07-02 17:38:30 +05:30
thebhatman
517219ba23 Renamed gradients test file 2019-07-02 16:13:42 +05:30
thebhatman
9f6793d63a Project.toml and Manifest updated 2019-07-02 12:16:24 +05:30
Viral B. Shah
5689b39538
Create FUNDING.yml 2019-06-26 17:51:54 -04:00
Mike J Innes
e88440974b
Merge pull request #796 from dhairyagandhi96/nadam
Pick beta from the state - NADAM
2019-06-19 22:18:56 +01:00
thebhatman
618f8a03c8 Hopefully the tests pass 2019-06-20 00:46:11 +05:30
thebhatman
f1bf39977b nograd defined for sleep 2019-06-20 00:38:24 +05:30
thebhatman
b194e7e3a8 Callback being called now 2019-06-20 00:37:54 +05:30
Dhairya Gandhi
dd9cdbef14 remove uncessary call to beta 2019-06-16 19:09:50 +05:30
Dhairya Gandhi
67f18663d9 pick beta from state in NADAM 2019-06-16 19:06:59 +05:30
thebhatman
e6d5846e49 Temporary removal of Float16 test 2019-06-14 23:24:31 +05:30
thebhatman
7ab9d8ed3d Minor update 2019-06-13 18:59:03 +05:30
thebhatman
ce6a1bf84f Modifying tests in curnn.jl 2019-06-13 18:45:37 +05:30
thebhatman
80c680c598 Updated tests in cudnn.jl 2019-06-13 18:44:46 +05:30
thebhatman
25f74d1b4a Modified tests in cuda.jl 2019-06-13 18:44:17 +05:30
thebhatman
1ff4e3188e back on mse failing for Float16 2019-06-13 16:41:25 +05:30
thebhatman
ce11804dc1 CrossCor test passing, hopefully. 2019-06-13 01:21:58 +05:30
thebhatman
48ed93cdaa Silly error in Dropout corrected. 2019-06-12 23:16:15 +05:30
thebhatman
e9797408ec DepthwiseConv corrected again. 2019-06-12 23:01:51 +05:30
thebhatman
00a4f4c26d Correcting Dropout 2019-06-12 22:39:30 +05:30
thebhatman
bd7e3b1f41 Dropout with dims test passing. 2019-06-12 22:16:11 +05:30
thebhatman
c7c0ee2cbc Resolving Merge Conflicts 2019-06-12 21:34:42 +05:30
Dhairya Gandhi
b47238eb74
Merge pull request #793 from amellnik/typos
Two minor typos in docs
2019-06-12 11:31:06 +05:30
Alex Mellnik
e17999f19b Two minor typos 2019-06-11 22:09:59 -07:00
thebhatman
dfd2965e85 GroupNorm tests corrected 2019-06-11 22:32:54 +05:30
thebhatman
11073dcd25 GroupNorm made to use istraining() 2019-06-11 22:04:33 +05:30
thebhatman
a56cfb73c3 BatchNorm test corrected 2019-06-11 20:34:48 +05:30
thebhatman
f465665c73 Corrected test for asymmetric padding 2019-06-11 20:20:00 +05:30
thebhatman
94a2d1987d Updated tests of normalisation layers. 2019-06-11 20:05:07 +05:30
thebhatman
a782524a0e Temporarily removed tests of cudnn and curnn. 2019-06-10 18:29:55 +05:30
thebhatman
ef63f80644 No ops defined for param and data 2019-06-10 18:24:18 +05:30
thebhatman
0ddb5f0265 Tests for Optimisers supporting Zygote 2019-06-06 04:09:17 +05:30
bors[bot]
1902c0e7c5 Merge #446
446: Added the SkipConnection layer and constructor r=MikeInnes a=bhvieira

I added a DenseBlock constructor, which allows one to train DenseNets (you can train ResNets and MixNets with this as well; you only need to change the connection, which is concatenation for DenseNets).

Disclaimer: I created the block for a 3D U-Net, so the assumption here is that whatever layer is inside the block, its output has the same spatial dimension (i.e. all array dimensions excluding the channel and minibatch dimensions) as the input, otherwise the connection wouldn't match. I'm not sure this matches the topology of every DenseNet there is out there, but I suppose this is a good starting point.

No tests yet, will add them as the PR evolve.

I'm open to suggestions! :)


Co-authored-by: Bruno Hebling Vieira <bruno.hebling.vieira@usp.br>
Co-authored-by: Mike J Innes <mike.j.innes@gmail.com>
2019-06-05 13:28:41 +00:00
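A usage sketch of the layer introduced in #446 above, written against the final `SkipConnection(layer, connection)` API: the connection combines the inner block's output with its input, e.g. `+` for ResNet-style blocks or channel-wise concatenation for DenseNet-style blocks.
```
using Flux

resblock   = SkipConnection(Chain(Dense(4, 4, relu), Dense(4, 4)), +)
denseblock = SkipConnection(Conv((3, 3), 8 => 8, pad = 1), (mx, x) -> cat(mx, x, dims = 3))

size(denseblock(rand(Float32, 16, 16, 8, 1)))   # (16, 16, 16, 1): channels concatenated
```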
Mike J Innes
b98075817c
Merge branch 'master' into DenseBlock 2019-06-05 14:27:47 +01:00
Lyndon White
fe759ac43c
Update docs/src/performance.md
Co-Authored-By: Kristoffer Carlsson <kristoffer.carlsson@chalmers.se>
2019-05-28 14:19:56 +01:00
bors[bot]
8ee6af1bee Merge #762
762: CrossCor layer r=avik-pal a=ayush-1506

Same as #423 (which could be edited since I lost access to that github account).

Co-authored-by: ayush-1506 <ayush.shridhar1506@gmail.com>
2019-05-14 10:36:22 +00:00
ayush-1506
98a027a505 typo 2019-05-14 02:56:12 -07:00
ayush-1506
bfc5bb0079 rebase 2019-05-14 02:53:48 -07:00
ayush-1506
f263f0c8ed add to layer docs 2019-05-14 02:53:06 -07:00
ayush-1506
0a2e288c3f another small test 2019-05-14 02:53:06 -07:00
ayush-1506
2161163a82 added crosscor 2019-05-14 02:52:28 -07:00
ayush-1506
451b80da3d add to layer docs 2019-05-14 02:50:18 -07:00
ayush-1506
7c28f7f883 Merge branch 'crosscor' of https://github.com/ayush-1506/Flux.jl into crosscor 2019-05-14 02:47:28 -07:00
Bruno Hebling Vieira
6b3cd825b9 Added SkipConnection to docs tentatively in Other General Purporse Layers 2019-05-13 16:43:14 -03:00
Bruno Hebling Vieira
796a2957c9 Added news and removed type annotation from SkipConnection structure 2019-05-13 16:33:31 -03:00
Bruno Hebling Vieira
c5fc2fb9a3 Added tests 2019-05-13 16:32:00 -03:00
Bruno Hebling Vieira
e7d76b8423 Added the SkipConnection layer and constructor
Added missing export

Corrected channel placement

Dimension 4 cannot be assumed to always be the Channel dimension

Deprecation of `treelike`

Code now makes use of `@treelike` macro instead of the deprecated `treelike` function (it worked on my end because I'm on Julia 0.7, while Julia 1.0 deprecated stuff)

Update basic.jl

Renaming to SkipConnection

* Update Flux.jl

* Update basic.jl

Updated `SkipConnection` with a `connection` field

I'm pretty sure I broke something now, but this PR should follow along these lines: `cat` needs special treatment (the user can declare their own `concatenate` connection, but I foresee it's going to be used often, so we can simply define special treatment)

Forgot to remove some rebasing text

Forgot to remove some more rebasing text

Removed local copy and default cat method from the function calls

Adjusted some more types for inference, could improve on this as well

Re-placed some left-over spaces
2019-05-13 16:32:00 -03:00
Dhairya Gandhi
308b199bd0
Merge pull request #774 from zsz00/patch-1
typo of comvolutional in NEWS.md
2019-05-14 00:37:17 +05:30
zy
a27be0f9ec
typo of comvolutional
comvolutional  -> convolutional
2019-05-14 01:24:45 +08:00
bors[bot]
68ba6e4e2f Merge #563
563: noise shape for dropout r=MikeInnes a=chengchingwen

I add the noise shape for dropout, similar to the `noise_shape` argument in [`tf.nn.dropout`](https://www.tensorflow.org/api_docs/python/tf/nn/dropout)

Co-authored-by: chengchingwen <adgjl5645@hotmail.com>
Co-authored-by: Peter <adgjl5645@hotmail.com>
2019-05-13 17:16:10 +00:00
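A usage sketch of the feature in #563 above, written against the keyword form the API eventually settled on (which may differ from the PR's original signature): with `dims` set, the dropout mask varies only along those dimensions and is shared along the rest, mirroring `noise_shape` in `tf.nn.dropout`.
```
using Flux

d = Dropout(0.5; dims = 1)   # mask drawn per row, shared across columns (the batch)
x = ones(Float32, 4, 3)
Flux.trainmode!(d)           # force the stochastic behaviour outside of training
d(x)                         # zeroed entries appear as whole rows
```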
Peter
9c1bb93aa3
Update NEWS.md
Co-Authored-By: Mike J Innes <mike.j.innes@gmail.com>
2019-05-14 01:12:59 +08:00
chengchingwen
bdf74fe342 update NEWS 2019-05-14 00:57:42 +08:00
chengchingwen
2fc2a5282c Merge remote-tracking branch 'upstream/master' into drop_shape 2019-05-14 00:50:59 +08:00
bors[bot]
16fc41cd00 Merge #756
756: Change `DepthwiseConv()` to use `in=>out` instead of `in=>mult`. r=MikeInnes a=staticfloat

This is an API change, but I think it makes more sense and is more consistent with our `Conv()` API. This also dumps the `DepthwiseConv((3,3), C_in)` API, as I'm not sure why you would want to specify only the input channel count and default the output to a channel multiplier of 1; if anything, I would think you'd want to specify the output channels and leave the input as the default. In any case, I think consistency with `Conv()` is the best thing to chase after here.

Co-authored-by: Elliot Saba <staticfloat@gmail.com>
2019-05-13 16:37:57 +00:00
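A usage sketch of the API change in #756 above: `DepthwiseConv` now takes `in => out` channel counts like `Conv`, with `out` an integer multiple of `in` (the channel multiplier).
```
using Flux

dw = DepthwiseConv((3, 3), 8 => 16)   # 8 input channels, channel multiplier 2
x  = rand(Float32, 28, 28, 8, 4)
size(dw(x))                           # (26, 26, 16, 4)
```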
Mike J Innes
5931b93e09
Merge pull request #772 from johnnychen94/patch-1
delete redundant section
2019-05-13 17:33:01 +01:00
Elliot Saba
06da965301 Add NEWS.md entry for https://github.com/FluxML/Flux.jl/pull/756 2019-05-12 11:20:41 -07:00
Elliot Saba
48fcc66094 Remove vestigial testing println() 2019-05-12 11:20:24 -07:00
Elliot Saba
2e6561bb6a Change DepthwiseConv() to use in=>out instead of in=>mult.
This is an API change, but I think it makes more sense, and is more
consistent with our `Conv()` api.
2019-05-12 11:20:24 -07:00
Johnny Chen
7103a61a1f
delete redundant section 2019-05-11 12:40:01 +08:00
chengchingwen
5c5140683c make dims as field of Dropout 2019-05-10 23:45:50 +08:00
ayush-1506
99d07e67db another small test 2019-05-09 16:43:28 +05:30
ayush-1506
9a3aa18c17 conflicts 2019-05-08 11:56:46 +05:30
Tejan Karmali
79534caca1
Merge pull request #701 from jw3126/test700
Add tests for on quadratic Conv (#700)
2019-05-08 11:09:38 +05:30
Lyndon White
fc4827c48f
Some cleanup on performance tips 2019-05-07 16:38:21 +01:00
Viral B. Shah
7c897394dd
Create CITATION.bib 2019-05-04 18:49:19 -04:00
Jan Weidner
e96a9d7eaf Switch broken #700 test to pass 2019-05-03 22:36:32 +02:00
Jan Weidner
73c5d9f25c fix 2019-05-03 22:22:52 +02:00
Jan Weidner
27a9a7b9cf add broken test for #700 2019-05-03 22:22:52 +02:00
Elliot Saba
fecb6bd16f Update Manifest 2019-05-02 18:59:12 -07:00
Mike J Innes
92ddc618f8 update for arrays 2019-05-02 18:57:52 -07:00
Mike J Innes
c70276ddfe rm more deprecations 2019-05-02 18:57:52 -07:00
Mike J Innes
2bb0c1e1fe update stuff 2019-05-02 18:54:29 -07:00
Mike J Innes
256695262c rm optimiser deprecations 2019-05-02 18:54:01 -07:00
Mike J Innes
3182c1b44b test on 1.1 2019-05-02 18:54:01 -07:00
Mike J Innes
5b79453773 passing tests... ish 2019-05-02 18:54:01 -07:00
Mike J Innes
0c265f305a fix most tests 2019-05-02 18:52:09 -07:00
Mike J Innes
f9d8ea81fb move jacobian test to Tracker 2019-05-02 18:52:09 -07:00
Mike J Innes
82ee61f5be implement #643 2019-05-02 18:52:09 -07:00
Mike J Innes
c313be8e95 rm data/param 2019-05-02 18:52:09 -07:00
Mike J Innes
aa4d221f8c break all the things 2019-05-02 18:50:52 -07:00
ayush-1506
20b79e0bdf added crosscor 2019-05-01 22:29:00 +05:30
bors[bot]
e991228047 Merge #761
761: Fixes #760 r=MikeInnes a=avik-pal



Co-authored-by: Avik Pal <avikpal@iitk.ac.in>
2019-05-01 14:23:08 +00:00
Avik Pal
a0be6fa837
Add missing activation function for batchnorm 2019-05-01 19:47:54 +05:30
Dhairya Gandhi
8355d57c79
Merge pull request #759 from dhairyagandhi96/tag_083
bump version to v0.8.3
2019-05-01 18:59:36 +05:30
Dhairya Gandhi
221670a2b1
Merge pull request #733 from thebhatman/expdecay-fix
Fixed ExpDecay
2019-05-01 18:58:37 +05:30
thebhatman
5ffc3b2d40 Comparing decay steps with expected true decay steps 2019-05-02 00:12:14 +05:30
thebhatman
5e06d8bb76 Test for decay_step 2019-05-01 23:10:00 +05:30
Dhairya Gandhi
eff600642a
Merge pull request #612 from dhairyagandhi96/onecold
Fixes OneHotMatrix/Vector GPU Performance
2019-04-30 19:40:19 +05:30
Dhairya Gandhi
9bbbd17e4b
Merge branch 'master' into onecold 2019-04-30 19:09:36 +05:30
Dhairya Gandhi
3d5b76c0df bump version to v0.8.3 2019-04-29 22:01:46 +05:30
Mike J Innes
b0155ec1fe
Merge pull request #755 from Roger-luo/add-more-docs
add some docs for onehot & onecold
2019-04-26 11:54:54 +01:00
Roger-luo
d63338c242 fix doctest 2019-04-26 18:12:14 +08:00
Mike J Innes
6c3a939133
Update src/onehot.jl
Co-Authored-By: Roger-luo <hiroger@qq.com>
2019-04-26 18:09:14 +08:00
Roger-luo
fabcd05ff2 add examples 2019-04-26 18:05:03 +08:00
Mike J Innes
13cfcb5ffa
Merge pull request #718 from FluxML/sf/asymmetric_padding
Add asymmetric padding
2019-04-25 22:29:14 +01:00
Elliot Saba
732f97fe16 Split out conv_transpose_dims() so that Zygote can ignore it 2019-04-25 10:24:19 -07:00
Elliot Saba
c9148194cf Update docs/ Manifest 2019-04-25 10:22:29 -07:00
Elliot Saba
a81036c2e1 Update Project/Manifest 2019-04-25 10:11:41 -07:00
Elliot Saba
6e22cd4931 Add asymmetric padding to convolutional layers 2019-04-25 09:55:23 -07:00
Elliot Saba
113ddc8760 Update Flux code for new NNlib branch 2019-04-25 09:55:23 -07:00
Viral B. Shah
bc2999b5a7
Merge pull request #752 from hossein-pourbozorg/use_https
use https instead of http for web links
2019-04-25 11:55:05 -04:00
Hossein Pourbozorg
7f06b15f67 use https instead of http for web links 2019-04-25 11:04:03 +00:00
Dhairya Gandhi
01ffa21939
Merge pull request #750 from FluxML/dg/bound_tracker
Added NNlib/ Tracker version bounds
2019-04-24 21:38:30 +05:30
Dhairya Gandhi
77e3ff7a8c fixed docs 2019-04-24 21:16:31 +05:30
Dhairya Gandhi
96b0e751e3 fix NNlib bound 2019-04-24 19:29:54 +05:30
Dhairya Gandhi
4ba640b59e fixes 2019-04-24 19:19:21 +05:30
Dhairya Gandhi
55bb39a259 added NNlib/ Tracker version bounds 2019-04-24 19:15:29 +05:30
Mike J Innes
bd2611da9c
Merge pull request #709 from DoktorMike/master
Small fix for recurrence documentation
2019-04-23 14:51:25 +01:00
Mike J Innes
f2ee87e4d8
Merge pull request #747 from jctops/patch-1
Swap comma for full stop
2019-04-23 12:33:46 +01:00
Jake Topping
ff7adda74b
Swap comma for full stop
"ERROR: LoadError: UndefVarError: G not defined" caused by "gn,G" rather than "gn.G" in line 386. Swapping for full stop should fix this
2019-04-22 17:08:36 +01:00
Michael Green
1eca23e113 Merge branch 'master' of https://github.com/FluxML/Flux.jl 2019-04-20 11:26:24 +02:00
Michael Green
934f7f932d Updated docs again. 2019-04-20 11:22:48 +02:00
Dhairya Gandhi
412e04fef1
Merge pull request #745 from Tokazama/patch-1
Fix typo in Maxout
2019-04-20 13:09:28 +05:30
Zachary P Christensen
83eb5a1df6
Fix typo in Maxout 2019-04-19 17:02:26 -04:00
Viral B. Shah
05b1844419
Update LICENSE.md 2019-04-15 16:59:16 -04:00
thebhatman
e459551336 weights updated in tests 2019-04-11 21:59:50 +05:30
thebhatman
fb3001b8b2 Added test for ExpDecay 2019-04-11 21:53:36 +05:30
thebhatman
31a50ab16a Fixed ExpDecay 2019-04-11 17:28:06 +05:30
Dhairya Gandhi
66ce8d8066
Merge pull request #728 from shreyas-kowshik/gn_news_patch
Added GroupNorm to docs and News.md
2019-04-09 17:58:56 +05:30
thebhatman
710084ffbf Loss functions added to docs 2019-04-05 23:50:16 +05:30
Shreyas
2a6eb35a71 Added GroupNorm to docs and News.md 2019-04-05 23:16:46 +05:30
Dhairya Gandhi
30fc68047e
Merge pull request #727 from hossein-pourbozorg/patch-1
add other optimizers to documentation
2019-04-05 20:18:24 +05:30
Mike J Innes
54d9229be9
Merge pull request #710 from johnnychen94/master
naive implementation of activations
2019-04-05 15:33:31 +01:00
Johnny Chen
a300376f71
fix a typo in comment
`inplementation` --> `implementation`
2019-04-05 19:19:30 +08:00
JohnnyChen
4626f7568c rewrite one test case 2019-04-05 18:50:15 +08:00
JohnnyChen
3cafbbad02 simplify the implementation 2019-04-05 18:44:00 +08:00
JohnnyChen
de7a5f4024 correct the function behavior; support Any type 2019-04-05 18:16:44 +08:00
thebhatman
b84ab7ac95 Removed logcosh 2019-04-05 03:16:54 +05:30
Hossein Pourbozorg
cad2df2c41
add other optimizers to documentation 2019-04-05 01:25:21 +04:30
bors[bot]
bd9d73a941 Merge #655
655: Added support for Float64 for DepthwiseConv r=dhairyagandhi96 a=thebhatman

DepthwiseConv was giving errors for Float64. This fixes the issue.

Co-authored-by: Manjunath Bhat <manjunathbhat9920@gmail.com>
2019-04-04 17:25:52 +00:00
chengchingwen
261235311c change dims as unbroadcasted dims and keyword argument 2019-04-05 01:19:20 +08:00
Dhairya Gandhi
1963f30911
Merge pull request #726 from dhairyagandhi96/iris
use cached iris dataset
2019-04-04 22:46:21 +05:30
Dhairya Gandhi
9c8175b1c0 fixes 2019-04-04 22:32:01 +05:30
Dhairya Gandhi
4f754d33cb switch to http link 2019-04-04 22:18:38 +05:30
Dhairya Gandhi
38cc216a4b switch to azure 2019-04-04 22:03:01 +05:30
Dhairya Gandhi
77274b4af7 change iris link 2019-04-04 21:07:46 +05:30
Dhairya Gandhi
2952bcdab1 fixes 2019-04-04 19:28:40 +05:30
Dhairya Gandhi
5b9c53439b recreate OHV 2019-04-04 19:19:47 +05:30
Dhairya Gandhi
4f1336905f fix colon indexing 2019-04-04 19:16:14 +05:30
bors[bot]
25097c4322 Merge #712
712: Enable GPU CI r=dhairyagandhi96 a=dhairyagandhi96

Looking for feedback on this policy for doing GPU CI.

Co-authored-by: Dhairya Gandhi <dhairya@juliacopmuting.com>
2019-04-03 12:54:18 +00:00
Dhairya Gandhi
f4f8ba32fe fix variable name 2019-04-03 16:01:27 +05:30
Mike J Innes
0b9fddda03
Merge pull request #721 from yukota/devide_test
Devide test specific dependency
2019-04-02 12:40:49 +01:00
Dhairya Gandhi
058378a35c increase bors timeout 2019-04-01 20:10:08 +05:30
Dhairya Gandhi
cff1dfd258 conditionally execute RNN tests 2019-04-01 19:56:49 +05:30
Dhairya Gandhi
bc33108e66 disable rnn tests 2019-03-31 00:29:10 +05:30
YuK_Ota
d6bbbbc4cd devide test specific dependency 2019-03-30 23:36:43 +09:00
Dhairya Gandhi
ac467cfe77 fixes 2019-03-30 18:17:57 +05:30
Dhairya Gandhi
492a3ca707 disable GRU tests 2019-03-30 18:15:42 +05:30
Elliot Saba
7418a2d7d7
Merge pull request #696 from shreyas-kowshik/group_norm_patch
Added GroupNorm Layer
2019-03-29 16:35:51 -07:00
Shreyas
4cb7b9278b Minor changes to docstring according to guidelines 2019-03-30 00:28:23 +05:30
Dhairya Gandhi
a50492ab40 add bors conf 2019-03-29 17:45:19 +05:30
Dhairya Gandhi
438b31a138 dont test with CUDA masters 2019-03-29 00:08:08 +05:30
Dhairya Gandhi
d2ce3f304f fixes 2019-03-28 21:34:18 +05:30
Dhairya Gandhi
be6a606d96 enable gpu ci on julia 1 2019-03-28 21:31:20 +05:30
JohnnyChen
82595648e2 change 4-spaces tab to 2-spaces tab 2019-03-28 22:40:24 +08:00
Shreyas
b6fcd1d837 Added export to Maxout 2019-03-28 19:15:16 +05:30
JohnnyChen
13c58494ec add x into results 2019-03-28 19:28:59 +08:00
Johnny Chen
c4ebd199db
move test cases to "basic" testset 2019-03-28 17:58:02 +08:00
Johnny Chen
47728b1899
fix test case error 2019-03-28 17:45:12 +08:00
JohnnyChen
5c2a071713 add support for 0-element Chain 2019-03-28 17:20:41 +08:00
JohnnyChen
ccfe0f8720 naive implementation of activations 2019-03-28 17:07:04 +08:00
Shreyas
c810fd4818 Corrected Group Size In Batch Norm Test For Group Norm 2019-03-28 01:35:38 +05:30
Shreyas
61c1fbd013 Made Requested Changes 2019-03-28 01:33:04 +05:30
Michael Green
a5c34e8325 Fixed merging with upstream Flux. 2019-03-27 20:30:31 +01:00
Michael Green
d68866a238 Fixed documentation error. 2019-03-27 20:22:01 +01:00
Shreyas
671aed963e Made a few fixes. Added tests 2019-03-28 00:51:50 +05:30
Julian P Samaroo
8033dca0c3 Add note on reset! usage in recurrence docs 2019-03-28 00:51:50 +05:30
Mike J Innes
ab46da11c7
Merge pull request #685 from jpsamaroo/jps/recur-docs-reset
Add note on reset! usage in recurrence docs
2019-03-27 12:47:01 +00:00
thebhatman
4efcc69ba5 logcosh averaged 2019-03-26 23:23:02 +05:30
Shreyas
595f1cf6eb Made Requested Changes 2019-03-26 21:42:49 +05:30
Shreyas
35431e3da9 Merge branch 'master' of https://github.com/FluxML/Flux.jl 2019-03-26 21:32:04 +05:30
Dhairya Gandhi
b5a6207350 add initial GPU CI conf 2019-03-26 18:49:23 +05:30
Julian P Samaroo
1930f40dec Add note on reset! usage in recurrence docs 2019-03-26 00:00:00 -05:00
Manjunath Bhat
930adb122d
Avoided promotion to Float64 in hinge. 2019-03-25 23:43:06 +05:30
thebhatman
6f078857be Added reference links to loss functions 2019-03-26 03:15:28 +05:30
thebhatman
c4d12e57fe Loss function names in lowercase 2019-03-26 03:09:48 +05:30
Mike J Innes
983d87525b
Merge pull request #699 from oxinabox/patch-5
add Maxout news item
2019-03-25 16:15:26 +00:00
Lyndon White
cd3926755a
add Maxout news item 2019-03-25 16:13:11 +00:00
Mike J Innes
8a55969492
Merge pull request #698 from oxinabox/ox/learnablemaxout
make Maxout trainable
2019-03-25 16:06:32 +00:00
Lyndon White
f0cc4a328d make Maxout trainable 2019-03-25 16:02:46 +00:00
Mike J Innes
eeed8b24c3
Merge pull request #681 from dellison/stopdoc
add Flux.stop to training docs
2019-03-25 15:07:07 +00:00
Shreyas Kowshik
b64a9841bc
Merge pull request #1 from FluxML/master
Update
2019-03-24 14:31:59 +05:30
Dhairya Gandhi
912306dfbb
Merge pull request #694 from FluxML/tag_v5
Add Tracker to REQUIRE
2019-03-22 23:36:24 +05:30
Dhairya Gandhi
9249f64e1d add Tracker to REQUIRE 2019-03-22 23:35:29 +05:30
Dhairya Gandhi
f956468e74
Merge pull request #693 from FluxML/tag_v5
Update NNlib
2019-03-22 22:08:16 +05:30
Dhairya Gandhi
db7f1a52db update nnlib 2019-03-22 21:51:04 +05:30
Tim Besard
f2dc57f938
Merge pull request #600 from FluxML/tb/cuptr
Adapt to the new CUDAdrv.CuPtr pointer type.
2019-03-22 14:37:55 +01:00
Tim Besard
0734eeb50e Check CuArrays major version. 2019-03-22 14:15:26 +01:00
Dhairya Gandhi
bc06861320 fix indirect import 2019-03-22 14:15:26 +01:00
Tim Besard
959dd247bf Import CUDAdrv stuff through CuArrays. 2019-03-22 14:15:26 +01:00
Tim Besard
df509ce9f0 Adapt to the new CUDAdrv.CuPtr pointer type. 2019-03-22 14:15:26 +01:00
Mike J Innes
b637311642
Merge pull request #647 from oxinabox/ox/maxout
Add MaxOut layer
2019-03-22 12:18:53 +00:00
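A usage sketch of the layer added in #647 above: `Maxout` applies several inner layers to the same input and returns their element-wise maximum; the zero-argument-closure constructor builds `n` independent copies.
```
using Flux

m = Maxout(() -> Dense(5, 7), 3)   # three Dense(5, 7) layers, elementwise max of their outputs
x = rand(Float32, 5, 4)
size(m(x))                         # (7, 4)
```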
Lyndon White
401d3da884 no arg closures 2019-03-21 17:04:52 +00:00
Lyndon White
7d247ea25b update docstring 2019-03-18 12:20:46 +00:00
Nick Robinson
025d9b678d Update docs/src/models/layers.md
Co-Authored-By: oxinabox <oxinabox@ucc.asn.au>
2019-03-18 12:20:46 +00:00
Nick Robinson
f222555deb Update src/Flux.jl
Co-Authored-By: oxinabox <oxinabox@ucc.asn.au>
2019-03-18 12:20:46 +00:00
Nick Robinson
2bc4b8d1a4 Update docs/src/models/layers.md
Co-Authored-By: oxinabox <oxinabox@ucc.asn.au>
2019-03-18 12:20:46 +00:00
Lyndon White
ca68bf9bec correct casing 2019-03-18 12:20:46 +00:00
Lyndon White
e23c8ddd13 take zero-arge closure 2019-03-18 12:20:46 +00:00
Lyndon White
c76b9c7e2c fix docs 2019-03-18 12:20:46 +00:00
Lyndon White
838047f708 fix docs 2019-03-18 12:19:44 +00:00
Kristoffer Carlsson
b84a60e74e Update src/layers/basic.jl
Co-Authored-By: oxinabox <oxinabox@ucc.asn.au>
2019-03-18 12:19:44 +00:00
Lyndon White
c1a33c556f do things to docs 2019-03-18 12:19:44 +00:00
Lyndon White
fcc3ec471a Add MaxOut layer 2019-03-18 12:19:44 +00:00
Lyndon White
79de829fdc move Dense's overloads to be near its defn 2019-03-18 12:18:14 +00:00
chengchingwen
59da68b4d9 update test 2019-03-14 21:55:37 +08:00
chengchingwen
934f0840b2 change API to dims 2019-03-14 21:51:28 +08:00
David Ellison
263a3248f6 add Flux.stop to training docs 2019-03-11 19:52:05 -07:00
Manjunath Bhat
57a52e3375
Error of recurrent decimals fixed. 2019-03-12 02:58:32 +05:30
Manjunath Bhat
61386c04f8
Tests added. 2019-03-12 02:36:37 +05:30
Manjunath Bhat
633f0df01f
Added new loss functions. 2019-03-12 02:31:42 +05:30
Mike J Innes
22cb732657
Merge pull request #652 from joshua-whittemore/add-module-to-download-iris-dataset
Add module to make iris dataset available.
2019-03-11 15:32:40 +00:00
Mike J Innes
7da7fe98d6
Merge branch 'master' into add-module-to-download-iris-dataset 2019-03-11 15:31:05 +00:00
Mike J Innes
6b7f8a37dd
Merge pull request #676 from dhpollack/instancenorm_news
update NEWS.md with InstanceNorm
2019-03-10 12:18:24 +00:00
David Pollack
6654aae1ec update NEWS.md with InstanceNorm 2019-03-10 11:11:43 +01:00
Joshua Whittemore
61588f72ef add item to NEWS.md describing Data.Iris module 2019-03-09 13:20:35 -08:00
Joshua Whittemore
0cac373539 add tests for Data.Iris module 2019-03-09 13:02:59 -08:00
Joshua Whittemore
f061df3d23 resolves pull request #652 merge conflicts 2019-03-09 12:51:20 -08:00
Manjunath Bhat
d4a1d33a31
Added Float64 tests for DepthwiseConv 2019-03-09 20:17:22 +05:30
Mike J Innes
b348e31f07
Merge pull request #667 from FluxML/donottrack
rm Tracker
2019-03-08 11:38:37 +00:00
Mike J Innes
f5fb19093c
Update NEWS.md 2019-03-08 11:35:31 +00:00
Mike J Innes
194e2ecd50 update docs manifest 2019-03-08 11:20:39 +00:00
Mike J Innes
5c9cd44428 use registered tracker 2019-03-08 11:18:59 +00:00
Dhairya Gandhi
a481cd5cd6
Merge pull request #668 from thebhatman/patch-4
Add AlphaDropout to NEWS.md
2019-03-08 12:31:58 +05:30
Josh Whittemore
930ebaf217 Add module to make iris dataset available. 2019-03-07 16:56:23 -08:00
Manjunath Bhat
3de8c8ede5
Add AlphaDropout to NEWS.md 2019-03-08 03:10:02 +05:30
Elliot Saba
bc12a4d55a
Merge pull request #656 from thebhatman/patch-3
Added AlphaDropout which is used in SNNs.
2019-03-07 10:58:44 -08:00
Manjunath Bhat
c6e51f5cc2
Made lambda and alpha of eltype(x) 2019-03-07 23:42:38 +05:30
Manjunath Bhat
47c1324476
Merge branch 'master' into patch-3 2019-03-07 23:08:40 +05:30
Elliot Saba
82578bfb0d
Merge pull request #634 from dhpollack/instancenorm
instance normalization
2019-03-07 09:33:22 -08:00
Manjunath Bhat
1d310d4532
Removed {typeof(p)} 2019-03-07 21:55:26 +05:30
thebhatman
f4543b7adf Value of alpha updated and dot operations changed 2019-03-08 03:21:26 +05:30
Mike J Innes
c8badcd12f add news.md 2019-03-07 11:23:14 +00:00
Mike J Innes
5118771e3b pkg update 2019-03-07 11:04:40 +00:00
David Pollack
7b9b64f1cb change IN to in 2019-03-07 09:46:44 +01:00
David Pollack
83b4b3a714 changes based on PR comments 2019-03-07 09:46:44 +01:00
David Pollack
c41f891005 changes based on the improved batchnorm in PR#633 2019-03-07 09:46:44 +01:00
David Pollack
129a708b6f instance normalization 2019-03-07 09:46:44 +01:00
Mike J Innes
b5a148fa37 rm Tracker 2019-03-07 01:33:02 +00:00
Mike J Innes
3a4c6274fa
Merge pull request #651 from FluxML/mji/dogfood
Refactor training loop
2019-03-06 16:53:24 +00:00
Mike J Innes
fc6232b779
Merge pull request #633 from Sklan/patch-3
Improving BatchNorm
2019-03-06 16:23:03 +00:00
thebhatman
8e5965ac41 Indentation fixed 2019-03-05 16:28:05 +05:30
thebhatman
d6608682fc Suggested changes made 2019-03-05 16:18:50 +05:30
Manjunath Bhat
29b853e0bb
Made sure Gradients are not lost. 2019-03-04 22:17:19 +05:30
Manjunath Bhat
922e9c9bc2
Updated docs with AlphaDropout 2019-03-04 01:10:12 +05:30
Manjunath Bhat
b5533ee00b
Exported AlphaDropout 2019-03-04 01:09:05 +05:30
Manjunath Bhat
97f874abcf
Added AlphaDropout which is used in SNNs. 2019-03-04 01:05:46 +05:30
Manjunath Bhat
704be49483
Added support for Float64 for DepthwiseConv
DepthwiseConv was giving errors for Float64. This fixes the issue.
2019-03-01 15:04:05 +05:30
Mike Innes
4cf43c0c41 simpler/nicer training loop 2019-02-28 14:58:42 +00:00
Mike Innes
cd091ad005 in place implicit gradients 2019-02-28 14:08:01 +00:00
Mike Innes
8b4bc7cc52 organise params 2019-02-28 13:44:54 +00:00
Dhairya Gandhi
6825639f79 mapreduce for onehotmatrix 2019-02-28 09:17:18 +05:30
Mike J Innes
d6cf116a74
Merge pull request #639 from ropenta/master
Added an example of Conv to Flux.jl/src/layers/conv.jl, and clarified…
2019-02-25 14:35:38 +00:00
Rohith Pentaparthy
1b1dff1266 Added an example of Conv to Flux.jl/src/layers/conv.jl, and clarified what WHCN means 2019-02-23 14:31:27 -06:00
Sklan
7463f09591
Update normalise.jl 2019-02-21 23:56:19 +05:30
Sklan
6044421c5c
Update normalise.jl 2019-02-20 13:47:31 +05:30
Lyndon White
ebf50f4e1c Create performance tips docs section (#615)
* Create performance_tips.jl

* Rename performance_tips.jl to performance_tips.md

* add perf tips

* Update docs/src/performance_tips.md

Co-Authored-By: oxinabox <oxinabox@ucc.asn.au>

* Update docs/src/performance_tips.md

Co-Authored-By: oxinabox <oxinabox@ucc.asn.au>

* Update make.jl

* Update and rename performance_tips.md to performance.md

* spelling

* Update docs/src/performance.md

Co-Authored-By: oxinabox <oxinabox@ucc.asn.au>

* Update docs/src/performance.md

Co-Authored-By: oxinabox <oxinabox@ucc.asn.au>

* Update performance.md

* Update performance.md

* Update docs/src/performance.md

Co-Authored-By: oxinabox <oxinabox@ucc.asn.au>

* Update docs/src/performance.md

Co-Authored-By: oxinabox <oxinabox@ucc.asn.au>
2019-02-19 15:03:41 +00:00
Dhairya Gandhi
78876a14b3
Merge pull request #522 from kolia/tiny_stack_bugfix
Tiny bugfix: `stack` was still calling julia 0.6 `cat`
2019-02-15 20:50:08 +05:30
Dhairya Gandhi
eb9da4084f
remove spurious line change 2019-02-15 20:33:21 +05:30
Dhairya Gandhi
c50ad6cdb5
Merge branch 'master' into tiny_stack_bugfix 2019-02-15 20:20:01 +05:30
Ayan Banerjee
08b87e0bce Transition to doctests (#616)
* basics.md: Initial doctest to an example

Related to https://github.com/FluxML/Flux.jl/issues/561

* make.jl: Allow doctest to run

* Fix comments in order to pass doctests

* basic.md: Add doctests to examples
2019-02-14 18:29:27 +00:00
pshashk
b0a5844afb Remove dims=1 from normalise (#619)
* remove `dims=1`

* add dims arg

* fix test

* remove dims=1 only from deprecated version
2019-02-11 16:11:47 +00:00
Dhairya Gandhi
2ec35861b5 removing non-allocating functions and tests 2019-02-11 21:22:32 +05:30
Dhairya Gandhi
d16ef75b1c remove duplicate allowscalar call 2019-02-11 20:32:23 +05:30
Dhairya Gandhi
1ada9afe81 assert no scalar indexing for onecold 2019-02-09 22:38:49 +05:30
Dhairya Gandhi
35cd9761a8 adding tests 2019-02-09 22:32:02 +05:30
Mike J Innes
f17a5acd2b
Merge pull request #606 from pshashk/patch-3
Add `corrected` argument to std
2019-02-08 19:09:33 +00:00
pshashk
b074b2491a
fix docstring 2019-02-08 21:49:53 +03:00
pshashk
c3e04392d8
drop dims type restriction 2019-02-08 16:15:37 +03:00
pshashk
ae10421bfe
fix normalise test for dims kwarg 2019-02-08 16:02:03 +03:00
pshashk
911c901294
dims kwarg 2019-02-08 16:00:32 +03:00
pshashk
37385e0dbd
test normalise 2019-02-08 15:43:50 +03:00
pshashk
4f6432d133
test 2019-02-08 15:28:07 +03:00
pshashk
368c29e5e3
Add corrected argument to std
Fixes ffe037c485/src/layers/stateless.jl (L49)
2019-02-08 15:23:27 +03:00
Mike J Innes
ffe037c485
Merge pull request #603 from FluxML/kf/namedtupletree
Treat NamedTuple like Tuple for treelike purposes
2019-02-08 11:06:12 +00:00
Mike J Innes
601e2d8ae0
Merge pull request #586 from KristofferC/kc/batchnorm
work around extreme slowdown in BatchNorm due to julia performance bug in broadcast fusion
2019-02-08 11:00:33 +00:00
Mike J Innes
fe712bf338
Merge pull request #596 from IvanYashchuk/ivan/topic-issue-542
Fixed issue #542.
2019-02-08 10:38:23 +00:00
Ivan Yashchuk
6471790819 Pass symmetric matrix to logdet gradtest 2019-02-08 12:22:08 +02:00
Ivan Yashchuk
e00ac88016 Added tracking of logdet and logabsdet. Added gradtests. 2019-02-08 09:55:33 +02:00
Keno Fischer
1e452a3042 Treat NamedTuple like Tuple for treelike purposes 2019-02-06 11:11:00 -05:00
Mike J Innes
57491b6c39
Merge pull request #602 from avik-pal/patch-2
Conv Transpose missing entry in the docs
2019-02-06 15:45:53 +00:00
Avik Pal
c093d089a6
Add conv_transpose to docs 2019-02-06 21:11:41 +05:30
KristofferC
9914c531f6 work around extreme slowdown due to julia performance bug 2019-02-06 16:19:29 +01:00
Mike J Innes
777571d4b4
Merge pull request #601 from FluxML/revert-591-onehot
Revert "Fix OneHotVector/Matrix performance on GPU"
2019-02-06 14:32:48 +00:00
Mike J Innes
ecc55ec9e1
Revert "Fix OneHotVector/Matrix performance on GPU" 2019-02-06 14:31:15 +00:00
Mike J Innes
e8b2ec6f67
Merge pull request #311 from tejank10/conv_transpose
2D Conv transpose support
2019-02-06 14:14:14 +00:00
Tejan Karmali
cc4438cd93 Update NNlib to master in Manifest 2019-02-05 09:33:50 -05:00
Dhairya Gandhi
53875a85a1
Merge pull request #592 from MJ10/master
Layer normalisation for images
2019-02-05 18:51:02 +05:30
Moksh Jain
046f7b4eae fix std arguments in normalise 2019-02-05 18:36:04 +05:30
Ivan Yashchuk
f790fff59a Use other definition for grad(det(A)). 2019-02-05 14:36:28 +02:00
Moksh Jain
c6409d7686 add support for n-dimensional input to normalise layer 2019-02-05 17:09:22 +05:30
Ivan Yashchuk
aa64d2157d Fixed issue #542.
Added tracking of LinearAlgebra.det and its grad method.
2019-02-05 11:38:27 +02:00
Mike J Innes
940b1e6dbf
Merge pull request #587 from KristofferC/patch-2
use uncorrected standard deviation in normalise
2019-02-04 14:35:25 +00:00
Mike J Innes
7fc920240d
Merge pull request #591 from dhairyagandhi96/onehot
Fix OneHotVector/Matrix performance on GPU
2019-02-04 13:53:55 +00:00
Dhairya Gandhi
2f916f9763 better tests 2019-02-04 18:43:25 +05:30
Mike J Innes
17f33b4a6a
Merge pull request #583 from KristofferC/kc/small_fixes
clarify docs on single batch image to conv
2019-02-04 12:33:34 +00:00
Mike J Innes
e774053126
Merge pull request #590 from oxinabox/patch-2
Default to zeroed initial state for all RNNs
2019-02-04 12:28:38 +00:00
Dhairya Gandhi
6654ebfc90 added onecold broadcast test 2019-02-04 17:57:34 +05:30
Mike J Innes
329c8f8f95
Merge pull request #585 from KristofferC/kc/verify_download
add hash verification to datasets
2019-02-04 11:20:53 +00:00
Mike J Innes
cfe6859186 auto-collect in forward 2019-02-04 10:37:02 +00:00
Mike J Innes
838070968e vcat with scalars 2019-02-04 00:05:16 +00:00
Dhairya Gandhi
30aa814c4d fixes #582 2019-02-03 18:43:16 +05:30
Dhairya Gandhi
e243950e28 comment fix 2019-02-03 04:00:08 +05:30
Dhairya Gandhi
bd6158d7f9 onehotvector/matrix behaviour 2019-02-03 03:57:41 +05:30
Lyndon White
26550dacda
Default to zeroed initial state 2019-02-02 20:01:28 +00:00
Tejan Karmali
84eabcd2ae
fixed DepthwiseConv dilation 2019-02-02 12:19:35 +05:30
Tejan Karmali
e54df2de06
Merge branch 'master' into conv_transpose 2019-02-02 10:20:45 +05:30
Kristoffer Carlsson
fd0f1c7a82
use uncorrected standard deviation in normalise
fixes https://github.com/FluxML/Flux.jl/issues/529
2019-01-30 17:42:19 +01:00
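For reference, a minimal sketch of what the commit above amounts to, assuming a simplified `normalise` signature (not Flux's exact source): the standard deviation is computed with `corrected = false`, i.e. dividing by N rather than N-1.

```julia
using Statistics

# Sketch only: normalise with the uncorrected std, per the fix for #529.
# The signature and `dims` default are assumptions for illustration.
function normalise(x::AbstractArray; dims = 1)
    μ = mean(x, dims = dims)
    σ = std(x, dims = dims, mean = μ, corrected = false)
    return (x .- μ) ./ σ
end
```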
Kristoffer Carlsson
f60079d07c add hash verification to datasets 2019-01-30 13:11:26 +01:00
Mike J Innes
0469394715
Merge pull request #576 from mcabbott/patch-1
PermutedDimsArray
2019-01-29 14:55:55 +00:00
Mike J Innes
dd95416a45
Merge pull request #579 from asbisen/master
add tests for stack and unstack
2019-01-29 14:30:23 +00:00
Mike J Innes
490fdd6400 update diffrules 2019-01-29 11:13:50 +00:00
Anand Bisen
3670fabbe6 add tests for stack and unstack 2019-01-29 01:41:15 -08:00
Mike J Innes
66ca92cd03 test against manifest 2019-01-29 09:07:46 +00:00
Mike J Innes
9e553adbf7 add hessian 2019-01-29 08:37:30 +00:00
Michael Abbott
55a7359f67
PermutedDimsArray test 2019-01-28 18:19:06 +01:00
Michael Abbott
031d1b3d57
PermutedDimsArray like permutedims
e.g. PermutedDimsArray(rand(2,3) |> param, (2,1))
2019-01-28 18:15:32 +01:00
Mike J Innes
8386a49bf9
Merge pull request #575 from FluxML/mji/update
Clean up parameter update API
2019-01-28 15:26:57 +00:00
Mike J Innes
e1cac76a34 params update 2019-01-28 14:14:41 +00:00
Mike J Innes
0f8a4a48c6 extend update! with an optimiser 2019-01-28 14:10:09 +00:00
Mike J Innes
0f2975d905 update -> apply 2019-01-28 13:59:23 +00:00
Mike J Innes
bf0b5c5cef
Merge pull request #535 from asbisen/master
fixed stack/unstack function - in utils.jl for v1.0
2019-01-28 12:23:07 +00:00
Mike Innes
af8fdcc7af fix #573 2019-01-28 10:54:58 +00:00
Mike J Innes
013b421b08
Merge pull request #570 from avik-pal/ap/batchnorm_fixes
Patches for default initializers
2019-01-28 10:40:55 +00:00
Mike J Innes
bb2210f552
Merge pull request #553 from xiaodaigh/patch-2
Updated with more detailed instructions for installing CuArrays
2019-01-28 10:36:27 +00:00
Mike Innes
1c3a63c42f fixes #574 2019-01-28 10:11:07 +00:00
susabi
5930ac1730
simplified instructions 2019-01-26 12:26:48 +11:00
Mike J Innes
58ac415f6b forward mode 2019-01-25 16:14:24 +00:00
Mike J Innes
962ce88c0d
Merge pull request #572 from FluxML/precision
Numeric precision utilities
2019-01-25 10:45:13 +00:00
Mike J Innes
2b1a3e92da mapparams 2019-01-25 10:11:46 +00:00
Mike J Innes
791939709b numeric precision utilities 2019-01-25 10:06:37 +00:00
Mike J Innes
1cf37ab9eb rm some old deprecations 2019-01-25 09:54:32 +00:00
Mike J Innes
a9064cad14
Merge pull request #571 from arnaudmgh/patch-1
Adding `nest = true` option in `Tracker.gradient`
2019-01-25 09:47:48 +00:00
Arnaud Amzallag
3cc3c463a3
Adding nest = true option in Tracker.gradient
Without it, the call fails with an error. Note that the option has to be added in both `df` and `d2f`.
2019-01-24 19:29:29 -05:00
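For context, a small sketch of the pattern the commit above describes: both gradient calls need `nest = true` so the first derivative can itself be differentiated (illustrative; the exact docs snippet may differ slightly).

```julia
using Flux.Tracker: gradient

f(x) = 3x^2 + 2x + 1

# `nest = true` keeps the tape nestable, so the result can be differentiated again.
df(x)  = gradient(f, x; nest = true)[1]
d2f(x) = gradient(df, x; nest = true)[1]

df(2)   # 14 (tracked)
d2f(2)  # 6 (tracked)
```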
Avik Pal
2f3ad56166 Add test for Depthwise Conv 2019-01-24 18:53:04 +05:30
Avik Pal
733879681e Change initializer to glorot_uniform 2019-01-24 18:48:30 +05:30
Avik Pal
bb72c528e1 Change initializers to Float32 2019-01-24 18:43:39 +05:30
Avik Pal
73c1485927 Merge branch 'master' of https://github.com/FluxML/Flux.jl 2019-01-24 18:42:28 +05:30
Mike Innes
ca1c73ed35 fixup 2019-01-24 11:15:57 +00:00
Mike Innes
0142d89943 test onecold-of-tracked-gpu-vector
see #556
2019-01-24 10:40:52 +00:00
Kristoffer Carlsson
325e3a4f70 clarify docs on single batch image to conv
fixes #309
2019-01-24 11:24:10 +01:00
Mike J Innes
62d780c77f onecold fix 2019-01-24 10:16:41 +00:00
Mike J Innes
1eee724054
Merge pull request #567 from ayan-b/upd-docs
docs/basics.md: Add `using Flux`
2019-01-24 10:01:39 +00:00
Ayan Banerjee
bc68dfbd75
docs/basics.md: Add using Flux
In order to import the sigmoid function.
2019-01-23 19:20:10 +05:30
Mike J Innes
f5acf442f5
Merge pull request #565 from ayan-b/upd-docs
docs/basics.md: Add `tracked` after 1.0
2019-01-23 12:55:47 +00:00
Ayan Banerjee
236b103b73
docs/basics.md: Add tracked after 1.0 2019-01-22 23:37:34 +05:30
chengchingwen
06003b72c7 noise shape for dropout 2019-01-22 23:51:38 +08:00
Dhairya Gandhi
4be08fe194 remove debug statement 2019-01-22 17:29:12 +05:30
Mike J Innes
152ce4a164 conversions for dual numbers 2019-01-22 10:07:42 +00:00
Mike J Innes
496dbfabd2 make chain collectable 2019-01-22 00:31:55 +00:00
Mike Innes
db3f477e15 update 2019-01-21 10:55:30 +00:00
Mike J Innes
f6397e7358
Merge pull request #517 from FluxML/fix_adamw
Fix decay argument in ADAMW
2019-01-18 10:06:23 +00:00
Mike J Innes
058b4dc7fb
Merge pull request #557 from dhairyagandhi96/dg/transpose
fix transpose/adjoint gradient
2019-01-16 15:46:44 +00:00
Mike J Innes
347678344e
Merge pull request #550 from KristofferC/kc/docs
modernize documentation
2019-01-16 10:31:53 +00:00
Mike J Innes
9d56807bcd cuarrays version check 2019-01-15 11:43:57 -05:00
Mike J Innes
c667423681 package updates 2019-01-15 11:43:23 -05:00
Dhairya Gandhi
0060cc3453 fixes transpose/adjoint gradient 2019-01-15 21:59:32 +05:30
Mike J Innes
4d79f499bf fixes #549 2019-01-15 15:49:37 +00:00
Mike J Innes
a3e0de1ee5 fixes #516 2019-01-15 15:49:18 +00:00
Mike J Innes
67d9016319
Merge pull request #538 from KristofferC/kc/promote
fix promotion by avoiding integer division in mse and crossentropy
2019-01-15 13:20:46 +00:00
Kristoffer Carlsson
c74aa67c5d fix promotion by avoiding integer division in mse and crossentropy
oops

add tests
2019-01-15 14:15:05 +01:00
Mike J Innes
827a7b8ed5
Merge pull request #546 from ChrisRackauckas/random
Support random numbers as constants
2019-01-11 10:06:54 +00:00
Mike J Innes
aa1b4f410f simplify 2019-01-11 10:06:14 +00:00
susabi
3f62bc30b9
Update gpu.md 2019-01-11 15:57:54 +11:00
susabi
e13c6c1125
updated gpu.md with installation instructions 2019-01-11 15:55:39 +11:00
Christopher Rackauckas
f6faa10ee2 remove non-type dispatches 2019-01-10 08:57:10 -08:00
Kristoffer Carlsson
2298e4fea1 modernize documentation 2019-01-10 15:06:11 +01:00
Mike J Innes
f0d5624ed2
Merge pull request #493 from dhairyagandhi96/master
[WIP] New Optimiser Docs
2019-01-10 11:10:38 +00:00
Dhairya Gandhi
4291c1a833 pull master 2019-01-10 16:35:57 +05:30
Mike J Innes
e6f925f977 train docstring simplification 2019-01-10 11:05:21 +00:00
Dhairya Gandhi
f00e1cdedf [docs] replace :stop with Flux.stop() 2019-01-10 16:34:07 +05:30
Mike J Innes
81e5551256 tweaks 2019-01-10 11:01:57 +00:00
Mike J Innes
5caeeccb5f
Merge pull request #548 from FluxML/mji/update
Fix update for scalars
2019-01-10 10:38:08 +00:00
Mike J Innes
735b970c12 fix update for scalars 2019-01-10 10:19:05 +00:00
Mike J Innes
e9bae09a64
Merge pull request #532 from KristofferC/patch-1
Docs: fix link to CuArrays
2019-01-10 09:14:49 +00:00
Christopher Rackauckas
3ee5a99794 hit all possibilities 2019-01-09 23:15:21 -08:00
Christopher Rackauckas
cf061e9207 support random numbers as constants 2019-01-09 23:04:12 -08:00
Dhairya Gandhi
7484c54f03 fix train! API syntax docstring 2019-01-08 00:32:55 +05:30
Anand Bisen
ec8dde79c3 fixed stack/unstack function - in utils.jl for v1.0 2019-01-03 17:32:11 -08:00
Kristoffer Carlsson
202424d1b1
Docs: fix link to CuArrays 2019-01-03 01:25:25 +01:00
kolia
9b897fc601 Tiny bugfix: stack was still calling julia 0.6 cat
Also added a tiny test for good measure.
2018-12-20 10:03:21 -05:00
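The gist of the fix above, as a hedged sketch (the real `stack`/`unsqueeze` definitions live in Flux's utils.jl and may differ in detail): Julia 0.6 took the dimension as the first positional argument to `cat`, while 1.0 takes it as a `dims` keyword.

```julia
# Hypothetical unsqueeze helper, just to keep the example self-contained.
unsqueeze(x, dim) = reshape(x, (size(x)[1:dim-1]..., 1, size(x)[dim:end]...))

# Old (0.6 style):  stack(xs, dim) = cat(dim, unsqueeze.(xs, dim)...)
# New (1.0 style):
stack(xs, dim) = cat(unsqueeze.(xs, dim)...; dims = dim)

stack([rand(2), rand(2), rand(2)], 1)  # 3×2 Matrix
```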
Mike J Innes
9781f063aa package updates 2018-12-19 16:06:23 +00:00
Mike J Innes
6b11c552f3 better h/vcat, fixes #378 2018-12-19 11:19:01 +00:00
Mike J Innes
cdfc97f7c6 fix fix_dec 2018-12-19 11:18:44 +00:00
Mike J Innes
aad4ec572e
Merge pull request #520 from roberthoenig/patch-1
Correct CuArrays requirements.
2018-12-19 09:23:03 +00:00
Robert Hönig
0f243dba29
Correct CuArrays requirements.
According to the CuArrays README, "CuArrays should work out-of-the-box on Julia 1.0."
Correct the outdated Julia 0.6 requirement. Also, update the instructions link to point to the
CuArrays.jl README, which has setup instructions (CUDAnative.jl's README doesn't).
2018-12-19 09:23:26 +01:00
Dhairya Gandhi
e48268ff06 fix argument name in ADAMW 2018-12-12 16:47:42 +05:30
Tejan Karmali
ed835f26fe printing ConvTranspose layer 2018-12-09 12:50:09 -05:00
Tejan Karmali
1648414a5d fixes for layer and test 2018-12-04 11:08:40 -05:00
Dhairya Gandhi
eb287ae9a0 fixed optimisers syntax 2018-12-04 16:08:03 +05:30
Dhairya Gandhi
d412845192 added training api changes 2018-12-01 16:59:27 +05:30
Tejan Karmali
519c3db5c0 clean code 2018-11-28 11:48:53 -05:00
Tejan Karmali
95e490a2c5 merge conflict resolved 2018-11-28 11:10:22 -05:00
Tejan Karmali
89f2709b61 resolved conflicts 2018-11-28 11:07:43 -05:00
Tejan Karmali
bc9bda9a85 in accordance with conv_filter api 2018-11-28 10:55:21 -05:00
Tejan Karmali
10f3a8eae2 conv_filter api changes 2018-11-28 10:55:21 -05:00
Tejan Karmali
ca8ad63fb6 in accordance with conv_data api 2018-11-28 10:55:21 -05:00
Tejan Karmali
9c3e34b15f conv_data grad api change 2018-11-28 10:55:21 -05:00
Tejan Karmali
a71ee386d0 1.0 fix for conv transpose 2018-11-28 10:55:21 -05:00
Mike J Innes
a32c8a2e60
Merge pull request #499 from willtebbutt/wct/leq
Deal with <= for TrackedReals
2018-11-28 00:37:32 +00:00
Mike J Innes
dd154ca049
Merge pull request #294 from avik-pal/cudnn_batchnorm
Wrapper for CuDNN BatchNorm
2018-11-27 23:51:32 +00:00
Mike J Innes
1c36504768 fixup 2018-11-27 18:44:07 -05:00
Mike J Innes
7992de5cba update requires syntax 2018-11-27 18:31:05 -05:00
Avik Pal
1d5b3429ea Missing brackets 2018-11-20 09:26:48 +05:30
Will Tebbutt
c7f5026bd9 Deal with <= for TrackedReals 2018-11-18 13:06:32 +00:00
Mike J Innes
4cba46c293
Merge pull request #494 from FluxML/mji/chain
Make Chain immutable
2018-11-16 13:52:00 +00:00
Mike J Innes
3d41dca338 immutable chain 2018-11-16 12:22:15 +00:00
Mike J Innes
6ac5345339 better printing 2018-11-14 23:53:30 +00:00
Mike J Innes
325035cf60 array conversions 2018-11-14 23:48:32 +00:00
Mike J Innes
fc2f2e9f9f
Merge pull request #490 from ChrisRackauckas/patch-1
Add missing eps overload for TrackedReal
2018-11-14 18:16:35 +00:00
Christopher Rackauckas
f20fa65848
Add missing eps overload for TrackedReal
`eps` can be called on the number type as well, and this is missing from the TrackedReal overloads.
2018-11-14 09:58:41 -08:00
Avik Pal
dfd680646c Fix conflict 2018-11-14 22:18:57 +05:30
Mike J Innes
3ef6bfc0ac
Merge pull request #473 from avik-pal/patch-2
Update CUDNN function calls
2018-11-14 16:07:02 +00:00
Mike J Innes
1eea125582 require adapt 0.4 2018-11-14 10:54:51 -05:00
Mike J Innes
cbc29c889a old cuarrays compat 2018-11-14 10:53:26 -05:00
Mike J Innes
a57f66e58a adapt updates 2018-11-14 15:36:18 +00:00
Mike J Innes
fc9f1e101f package updates 2018-11-12 23:45:04 +00:00
Mike J Innes
b3331205d1 faster default gradient performance 2018-11-12 23:39:25 +00:00
Mike J Innes
95fb46018d
Merge pull request #390 from FluxML/f32
Proposal: Initialise all weights as Float32
2018-11-12 20:44:47 +00:00
Mike J Innes
75ecc0b6ba downconversion for conv 2018-11-12 20:21:27 +00:00
Mike J Innes
903db70673 float32 param initialisers 2018-11-12 20:10:47 +00:00
Dhairya Gandhi
1ea8c5a293 [WIP] add docstrings and doc improvements 2018-11-12 19:17:10 +05:30
Dhairya Gandhi
07397bc950 [WIP] add links to sgd 2018-11-12 17:53:53 +05:30
Dhairya Gandhi
4562682528 [WIP] add optimiser docs 2018-11-12 17:42:52 +05:30
Avik Pal
9f12e8ec68 Make the test more reliable 2018-11-10 14:00:25 +05:30
Avik Pal
4df9e10516 Add test for 2D inputs 2018-11-10 11:52:23 +05:30
Avik Pal
d6aacf4135 Fix reshape 2018-11-10 11:43:49 +05:30
Avik Pal
e2ae8b4e8d Fix dimensions 2018-11-10 11:35:58 +05:30
Avik Pal
3bc809f49e dropdims to make the array 2d 2018-11-10 11:25:37 +05:30
Avik Pal
4d703b31a1 Reshape 2D tensors to use cudnn batchnorm 2018-11-08 19:23:07 +05:30
Avik Pal
564518e448 Merge branch 'master' of https://github.com/FluxML/Flux.jl into cudnn_batchnorm 2018-11-08 19:13:34 +05:30
Avik Pal
02efc264e7 Fix unintentional change to spaces 2018-11-08 19:12:38 +05:30
Mike J Innes
30486f9c03
Merge pull request #441 from Paethon/rm_initn
Removes initn initialization
2018-11-08 13:25:02 +00:00
Mike J Innes
5e572df557
Merge pull request #485 from dhairyagandhi96/master
Add call back
2018-11-08 13:18:17 +00:00
Dhairya Gandhi
392c3c942b re-add removed call function 2018-11-08 18:44:57 +05:30
Mike J Innes
a88b7528bf constructor deprecations 2018-11-06 08:19:46 -05:00
Mike J Innes
0c19dad700 include cudnn.jl 2018-11-06 12:39:54 +00:00
Mike J Innes
39dcfd3933
Merge pull request #469 from invenia/ed/hang-draw-and-quarter
Stop type treason with show of the TrackedArray type
2018-11-06 11:54:07 +00:00
Mike J Innes
4763473079 fixed method 2018-11-06 11:50:04 +00:00
Mike J Innes
8042198475
Merge pull request #479 from dhairyagandhi96/master
Fix deprecations of optimisers
2018-11-05 13:01:59 +00:00
Mike J Innes
d071014fae
Merge pull request #448 from JobJob/adam-match-paper
Match paper for Adam implementation and make epsilon use more consistent
2018-11-05 12:57:30 +00:00
Mike J Innes
a9e6ace308
Merge pull request #467 from invenia/ed/diagm-pair
Add new-style diagm to tracker
2018-11-05 12:23:16 +00:00
Mike J Innes
d0e4fbb1e0 Merge branch 'master' into ed/diagm-pair 2018-11-05 11:51:29 +00:00
Mike J Innes
5df48fbc5d fix 2018-11-05 11:49:38 +00:00
Eric Davies
6b0b51e390 Stop type treason with show of the TrackedArray type 2018-11-02 16:00:58 -05:00
Joel Mason
29832aca92 Move some epsilons about 2018-11-02 22:59:04 +11:00
Dhairya Gandhi
5ec70fe29d allow array parameters to old optimisers 2018-11-01 22:17:54 +05:30
Mike J Innes
c71c610747 separate gradient library 2018-11-01 15:35:55 +00:00
Dhairya Gandhi
ca4e01ac26 use user defined decay in ADAMW 2018-11-01 15:58:40 +05:30
Dhairya Gandhi
58a6c3f225 fix deprecations 2018-11-01 15:02:00 +05:30
Avik Pal
4ba891f666
Remove unnecessary import 2018-11-01 09:37:48 +05:30
Avik Pal
c67e33f387
Make the changes backward compatible 2018-11-01 09:37:16 +05:30
Mike J Innes
43c5f90d93
Merge pull request #379 from dhairyagandhi96/master
New optimisers interface
2018-10-31 16:38:40 +00:00
Mike J Innes
b05cd41c99 require 1.0 2018-10-31 16:26:14 +00:00
Mike J Innes
46049b9f44 tweak update rule 2018-10-31 16:08:18 +00:00
Mike J Innes
554c4c7c7a return Params from params 2018-10-31 15:50:08 +00:00
Mike J Innes
4a54d30cbf correct SGD deprecation 2018-10-31 15:30:30 +00:00
Mike J Innes
bffaceee02 tweaks 2018-10-31 14:58:55 +00:00
Mike J Innes
70283e1971
Merge pull request #465 from FluxML/mji/once
Destroy AD graph when doing in-place gradients
2018-10-31 14:14:38 +00:00
Mike J Innes
9312536b96
Merge pull request #461 from Roger-luo/roger-patch-1
Support view for TrackedArray
2018-10-30 15:24:05 +00:00
Mike J Innes
77178b7d67 remove old-style definition and test 2018-10-30 14:21:22 +00:00
Avik Pal
7804d980b2
Update cudnn.jl 2018-10-30 01:08:21 +05:30
Dhairya Gandhi
bebf4eb95f fixed ExpDecay update! rule 2018-10-29 23:12:24 +05:30
Mike J Innes
59cfe3e891
Merge pull request #471 from FluxML/kf/broadcastvcheck
Add VERSION check around broadcast piracy
2018-10-29 15:20:10 +00:00
Keno Fischer
baf868e851
Add VERSION check around broadcast piracy 2018-10-28 16:07:26 -04:00
Dhairya Gandhi
32ce2d78b8 fixed ExpDecay test 2018-10-27 19:53:06 +05:30
Dhairya Gandhi
ea508a79b0 use explicit update! rule 2018-10-27 19:39:56 +05:30
Dhairya Gandhi
815e8c206d decay fixes 2018-10-27 19:26:42 +05:30
Mike J Innes
b77433cdfd 0.7 fix 2018-10-27 12:23:14 +01:00
Eric Davies
9f9803eec6 Add new-style diagm to tracker 2018-10-26 14:44:59 -05:00
Roger-luo
e5d58699e6 fix and add test 2018-10-26 14:06:17 -04:00
Mike J Innes
c21d768b7c destroy AD graph when doing in-place gradients 2018-10-26 16:57:19 +01:00
Tejan Karmali
a657c287d0 in accordance with conv_filter api 2018-10-26 11:31:34 -04:00
Mike J Innes
44ccdb7ca9 project/manifest 2018-10-26 15:39:32 +01:00
Avik Pal
b838c0bc04 Update the libcudnn_handle 2018-10-26 10:24:30 +05:30
Roger-luo
a3cda9016c apply Mike's change 2018-10-25 13:48:33 -04:00
Mike J Innes
bd004bf2db
Merge pull request #460 from avik-pal/patch-1
Add missing export for DepthwiseConv
2018-10-25 09:30:59 +01:00
Roger-luo
5f99e5775a fix #458 2018-10-24 15:40:10 -04:00
Tejan Karmali
387df8c095 conv_filter api changes 2018-10-24 13:28:22 -04:00
Tejan Karmali
fca93471b3 in accordance with conv_data api 2018-10-24 12:52:43 -04:00
Avik Pal
ec2c00783d
Add missing export for DepthwiseConv 2018-10-24 22:18:26 +05:30
Tejan Karmali
0dc4ec4d6b conv_data grad api change 2018-10-24 07:04:49 -04:00
Tejan Karmali
f540a0daf7 merge with upstream 2018-10-23 13:40:06 -04:00
Avik Pal
ccd86d4795 Merge branch 'cudnn_batchnorm' of https://github.com/avik-pal/Flux.jl into cudnn_batchnorm 2018-10-23 21:54:04 +05:30
Avik Pal
2559e7b4e6 Fix merge conflicts 2018-10-23 21:53:29 +05:30
Mike J Innes
bbccdb3eec
Merge pull request #279 from avik-pal/depthwiseconv
Adds support for Depthwise Convolutions
2018-10-23 17:22:15 +01:00
Mike J Innes
96dbae2d20 Omega and Turing fix 2018-10-23 11:30:37 +01:00
Tejan Karmali
e9bf86dbff Merge branch 'master' of https://github.com/FluxML/Flux.jl into conv_transpose 2018-10-19 02:08:25 -04:00
Sebastian Stabinger
94e5e9f993 Removes initn initialization
It is replaced with glorot_uniform for Conv, following Keras
2018-10-17 17:11:16 +02:00
Mike J Innes
cb773e54c0
Merge pull request #439 from JuliaDocsForks/juliadocs/cap-documenter
Cap Documenter.jl to 0.19 on Travis
2018-10-17 10:21:31 +01:00
Morten Piibeleht
85d56ad8e9 Cap Documenter.jl to 0.19 on Travis
Documenter 0.20 will introduce breaking changes that will invalidate
existing make.jl setups. This commit makes sure that automatic Travis
builds will not use 0.20 automatically, in order to avoid sudden
documentation deployment failures once Documenter 0.20 gets tagged.

This commit has been generated by a script.
2018-10-17 16:38:35 +13:00
Avik Pal
3899907164
Update conv.jl 2018-10-11 21:39:35 +05:30
Dhairya Gandhi
edbcd3c9ea fix train! test 2018-10-11 18:52:16 +05:30
Dhairya Gandhi
1f0f2a5ac2 fixed DescentWeightDecay parameters 2018-10-11 10:21:29 +05:30
Dhairya Gandhi
d8394298bb fix merge conflicts 2018-10-11 10:15:59 +05:30
Dhairya Gandhi
fe8c147f72 fixed weight decay definition 2018-10-11 10:07:16 +05:30
Mike J Innes
ab0763fd41
Merge pull request #428 from tejank10/rnn-fixes
[WIP] Fixes for RNN tests
2018-10-10 16:58:44 +01:00
Tejan Karmali
8987e2c423 rm comments 2018-10-10 11:55:10 -04:00
Tejan Karmali
6b4bbd4fce reverted back the weights changes in rnndesc 2018-10-10 10:29:15 -04:00
Mike J Innes
9f6c3d5a2c fixes #403 2018-10-10 12:26:03 +01:00
Tejan Karmali
7b3e9c35ad changed index to view 2018-10-09 12:57:20 -04:00
Mike J Innes
3285afa45a
Merge pull request #409 from harryscholes/patch-2
Correct Custom Gradients docs
2018-10-09 14:09:09 +01:00
harryscholes
61c14afee4 Add usage example of custom gradients 2018-10-09 13:05:38 +01:00
Mike J Innes
5d3cc044cd
Merge pull request #427 from johnnychen94/master
Support copy(::TrackedArray)
2018-10-08 23:36:03 +01:00
JohnnyChen
de7623ac94 use variable assignment to do "copy" 2018-10-09 03:49:17 +08:00
JohnnyChen
eaacec852f Bug fix 2018-10-09 03:40:02 +08:00
JohnnyChen
27fec15fcc Add explicit copy(x::TrackedArray) method 2018-10-09 03:34:41 +08:00
Tejan Karmali
4d1a6c305b fixed params getting zero 2018-10-08 13:59:29 -04:00
JohnnyChen
36f5f274a5 Support copy(::TrackedArray)
1. fix issue https://github.com/FluxML/Flux.jl/issues/416
2. change the test code so it passes: some tests previously marked broken are no longer broken...
2018-10-09 01:53:32 +08:00
Avik Pal
9bd2c4e006
Update curnn.jl 2018-10-06 00:00:46 +05:30
Avik Pal
d56c626725
Merge branch 'master' into cudnn_batchnorm 2018-10-06 00:00:16 +05:30
Mike J Innes
73385b5dbd
Merge pull request #372 from johnnychen94/issue-#354
Type restriction for Dense layer
2018-10-05 15:03:03 +01:00
Proyag
3b391a1af6 #389 2018-10-05 14:47:06 +01:00
Mike Innes
c6740c5cdd fix unbroadcast 2018-10-05 14:14:43 +01:00
Mike J Innes
325d2ce212
Merge pull request #418 from c-p-murphy/add-fashion-mnist
Add FashionMNIST
2018-10-05 14:05:50 +01:00
Mike Innes
61fb6cdf05 jit macro 2018-10-05 14:02:00 +01:00
Mike Innes
69afdd61a6 avoid a warning 2018-10-05 13:59:58 +01:00
Mike Innes
bfe85e65f1 compose tweaks 2018-10-05 13:52:26 +01:00
Mike Innes
0f2019eba5 compose tweaks 2018-10-05 12:57:03 +01:00
Mike Innes
9bc9771a8d tweaks 2018-10-05 12:43:03 +01:00
Mike Innes
4abe518599 newline fixes 2018-10-05 12:37:47 +01:00
Mike J Innes
f08b6f80d2
Merge pull request #422 from tejank10/cudnn_avail
cudnn_available update
2018-10-05 12:05:18 +01:00
Tejan Karmali
2ff54ee0fd cudnn_available() update 2018-10-04 11:31:29 -04:00
Christopher Murphy
73a526b1de reuse utils from mnist.jl 2018-10-03 12:40:24 -04:00
Mike J Innes
683bbec71c
Merge pull request #413 from mcabbott/patch-2
evaluate both 2-ary DiffRules only when needed
2018-10-03 12:02:12 +01:00
Mike J Innes
fe6793fde5
closes #411 2018-10-03 11:45:29 +01:00
Mike J Innes
3a7b77d104
Merge pull request #419 from r3tex/master
update utils.jl for 1.0
2018-10-03 11:21:40 +01:00
Robert Luciani
252e34e173 1.0+ updates - indices to axes, Vector init with undef 2018-10-02 21:39:00 +02:00
Christopher Murphy
95d72d7f79 update comments 2018-10-02 15:31:44 -04:00
Christopher Murphy
7e67bf06e1 update tests 2018-10-02 15:00:45 -04:00
Christopher Murphy
aff4c7898e add FashionMNIST 2018-10-01 15:26:26 -04:00
Avik Pal
f3e39a1e55 Merge branch 'master' of https://github.com/FluxML/Flux.jl 2018-10-01 09:50:30 +05:30
Dhairya Gandhi
b661db3797 added deprecations and compose 2018-10-01 05:30:53 +05:30
Michael Abbott
d25e05d9ee
evaluate both 2-ary DiffRules only when needed 2018-09-27 10:40:44 +02:00
JohnnyChen
3bf18347e0 Fix dimensional error in test 2018-09-26 22:03:38 +08:00
JohnnyChen
b20ae0546b rebase to pass the test 2018-09-26 20:30:13 +08:00
Harry
179a1e8407
Correct Custom Gradients docs
* Fixed a type signature that was incorrect.
* Also, replaced `data(a)` with `a.data`. Don't know if the syntax has changed (recently). This may also need to be corrected in line 121.

MWE:

```julia
using Flux
using Flux.Tracker
using Flux.Tracker: forward, TrackedReal, track, @grad

minus(a, b) = a - b
minus(a::TrackedReal, b::TrackedReal) = Tracker.track(minus, a, b)
@grad function minus(a, b)
    return minus(a.data, b.data), Δ -> (Δ, -Δ)
end

a, b = param(2), param(4)
c = minus(a, b)  # -2.0 (tracked)
Tracker.back!(c)

Tracker.grad(a)  # 1.00
Tracker.grad(b)  # -1.00
```
2018-09-21 16:57:54 +01:00
Mike J Innes
02ecca4c61
Merge pull request #405 from harryscholes/patch-1
Fix typo
2018-09-19 17:02:26 +01:00
Harry
079614adb2
Fix typo 2018-09-19 16:45:11 +01:00
Mike J Innes
6367cfd696
Merge pull request #404 from ornithos/add-inv-funcs
add inv/ldivide/rdivide + test
2018-09-19 15:32:49 +01:00
Alex Bird
d131853587 add inv/ldivide/rdivide + test 2018-09-19 13:08:30 +01:00
Mike J Innes
b3a08baf55
Merge pull request #400 from IsaacTay/patch-1
updated loadparams! function
2018-09-17 00:03:07 +01:00
Dhairya Gandhi
87c7e65a2d fixed Compose test 2018-09-16 17:45:29 +05:30
Dhairya Gandhi
6665189ff1 added remaining optimizers and tests 2018-09-16 17:34:51 +05:30
Isaac Tay
e803117e25
updated loadparams! function 2018-09-15 16:45:04 +08:00
Avik Pal
eb9b408c0f
Merge branch 'master' into depthwiseconv 2018-09-15 10:21:31 +05:30
Mike J Innes
9d4ee1b3aa
Merge pull request #394 from sambitdash/patch-1
The sample gradient should not use the softdash
2018-09-14 20:24:07 +01:00
Mike J Innes
08fb9b7df1
Merge pull request #397 from FluxML/nest-bcast
Nested Derivatives of Broadcast
2018-09-14 20:23:28 +01:00
Mike Innes
d797999fc5 fix sentiment model 2018-09-14 18:10:24 +01:00
Dhairya Gandhi
63bc71698b updated tests 2018-09-14 20:32:56 +05:30
Sambit Kumar Dash
8b9a98ed01
The sample gradient should not use the softdash
While the soft-dash is a natural, mathematical notation, it is easily confused with the apostrophe used for the LinAlg adjoint. Not worth the unnecessary confusion in a first code example.
2018-09-11 18:58:07 +05:30
Dhairya Gandhi
4860c1d48b fixed white lines 2018-09-11 18:35:21 +05:30
Dhairya Gandhi
d933f2079b pulled tracker from upstream 2018-09-11 18:30:24 +05:30
Avik Pal
cc812a8f89 Fix tests 2018-09-11 17:30:54 +05:30
Avik Pal
dd2fa77681 Fix tests 2018-09-11 17:06:18 +05:30
Avik Pal
7d06f654f0 Fix tests 2018-09-11 16:58:05 +05:30
Avik Pal
7e7a501efd Fix tests 2018-09-11 16:32:14 +05:30
Avik Pal
c4f87ff15c Minor fixes: 2018-09-11 16:21:55 +05:30
Avik Pal
7e83852862 Fixes 2018-09-11 15:58:17 +05:30
Avik Pal
5fd8ffa47e CuRNN updates 2018-09-11 15:44:07 +05:30
Avik Pal
8bea60d980
Merge branch 'master' into cudnn_batchnorm 2018-09-11 15:34:25 +05:30
Tejan Karmali
e86365ed3f 1.0 fix for conv transpose 2018-09-08 15:44:06 -04:00
Mike J Innes
b93d4763cc
Merge pull request #391 from jekbradbury/normalise-1
1.0 compat for `normalise`
2018-09-07 11:01:23 +01:00
James Bradbury
e7783ace12 1.0 compat for normalise 2018-09-06 18:38:11 -07:00
Mike J Innes
6bbed07e96 enable nested broadcast 2018-09-07 02:05:03 +01:00
Dhairya Gandhi
0b440f16ff Merge branch 'master' of https://github.com/FluxML/Flux.jl 2018-09-06 22:48:03 +06:00
Johnny Chen
44049ce00c
Merge branch 'master' into issue-#354 2018-09-06 09:39:31 -05:00
Mike J Innes
5e4ee827e9
Merge pull request #371 from johnnychen94/issue-#323
Fix issue #323
2018-09-06 15:28:15 +01:00
Mike J Innes
395a35d137 better headings 2018-09-05 17:03:41 +01:00
Mike J Innes
193c4ded19 make docs on 1.0 2018-09-05 16:52:50 +01:00
Mike J Innes
d5d9441fc1 make docs on 1.0 2018-09-05 16:31:05 +01:00
Mike J Innes
b7eaf393fc docs updates 2018-09-05 16:01:57 +01:00
Mike J Innes
ec16a2c77d todone: nicer syntax on 0.7 2018-09-05 15:55:08 +01:00
Mike J Innes
8b71350878 make travis happy maybe 2018-09-05 15:39:00 +01:00
Mike J Innes
41cf1f2a84
Merge pull request #381 from piever/pv/docs
fix julia 1 changes in tutorial
2018-09-04 16:00:58 +01:00
Mike J Innes
2005247d5a
Merge pull request #339 from yuehhua/master
Add Maxpool and Meanpool for convention.
2018-09-04 14:52:10 +01:00
Mike J Innes
1e90226077 actually run tests 2018-09-04 14:35:20 +01:00
Mike J Innes
1e0fd07b09 use expand 2018-09-04 14:30:02 +01:00
Mike J Innes
e6be639436 Merge branch 'master' into HEAD 2018-09-04 14:03:46 +01:00
Mike J Innes
93c4a6b4b5 fixes #343 2018-09-04 13:37:54 +01:00
Pietro Vertechi
a012d0bd51 fix vecnorm in docs 2018-08-29 23:39:43 +01:00
Pietro Vertechi
abcefb8ae3 fix foldl in tutorial 2018-08-29 18:36:24 +01:00
Mike J Innes
a2d2d068aa initial sketch 2018-08-28 17:55:59 +05:30
Mike Innes
53be49b102 fix #377 2018-08-28 11:02:38 +01:00
Mike J Innes
fac06751ea
Merge pull request #361 from dhairyagandhi96/with_stop
Add stop() to train loop when callback conditions are met
2018-08-28 10:56:15 +01:00
Mike Innes
2ca189bc96 newlines 2018-08-28 10:54:50 +01:00
Dhairya Gandhi
89bca2d98d remove merge conflicts 2018-08-28 15:14:12 +05:30
Dhairya Gandhi
a964debd8a fixed example in docs 2018-08-28 15:02:47 +05:30
Johnny Chen
b35664c59f Update testsets 2018-08-25 16:30:46 +08:00
Johnny Chen
0c4fb9655a Fix a bug 2018-08-25 15:12:01 +08:00
Johnny Chen
81811a01ce Update testset for ==, ≈, and < 2018-08-25 14:52:08 +08:00
Johnny Chen
4ac76c35b0 fix MethodError for == and ≈
```julia
param([2]).^2 == [4.0]
ERROR: MethodError: ==(::TrackedArray{…,Array{Float64,1}}, ::Array{Float64,1}) is ambiguous. Candidates:
  ==(x::TrackedArray, y) in Main.Flux.Tracker at /Users/jc/.julia/dev/Flux/src/tracker/array.jl:63
  ==(A::AbstractArray, B::AbstractArray) in Base at abstractarray.jl:1686
Possible fix, define
  ==(::TrackedArray, ::AbstractArray)
```
2018-08-25 14:51:40 +08:00
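A hedged sketch of the kind of disambiguating methods such a fix adds (assumed for illustration, not the verbatim patch): defining the more specific signatures removes the ambiguity reported above.

```julia
using Flux
using Flux.Tracker: TrackedArray, data
import Base: ==

# More specific methods resolve the ambiguity; `data` unwraps the tracked array.
==(x::TrackedArray, y::AbstractArray) = data(x) == y
==(x::AbstractArray, y::TrackedArray) = x == data(y)

param([2]).^2 == [4.0]  # true once the ambiguity is resolved
```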
Mike Innes
7d6ec2365f fixes #367 2018-08-24 14:30:39 +01:00
Mike Innes
86cf22675f rewrite broadcast 2018-08-24 14:07:08 +01:00
Mike Innes
e13d28a7a2 cruft 2018-08-24 13:44:21 +01:00
Dhairya Gandhi
c035fe22d7 added deprecation warning 2018-08-24 13:08:03 +05:30
Yueh-Hua Tu
634d34686e Add new constructors and test 2018-08-24 10:31:13 +08:00
Mike J Innes
953280d57f
Merge pull request #364 from boathit/master
fix argmax and add test
2018-08-23 15:52:06 +01:00
Mike Innes
dcde6d2217 tweaks 2018-08-23 15:44:28 +01:00
Johnny Chen
4baf85bbe2 update Testset of basic.jl 2018-08-23 22:29:03 +08:00
Johnny Chen
81e5f7c991 Update test/layers/basic.jl 2018-08-23 21:59:41 +08:00
Johnny Chen
c9d6b5648f Fix issue #354 2018-08-23 21:56:32 +08:00
Johnny Chen
6743d52d08 Fix issue #354 2018-08-23 21:34:11 +08:00
Johnny Chen
7bfe431321 Fix issue #323 2018-08-23 20:58:58 +08:00
boathit
6c97846551 rename argmax as onecold 2018-08-23 20:47:43 +08:00
Mike Innes
dfe7578216 test repeat fix 2018-08-23 11:29:43 +01:00
Mike J Innes
6c355e93d2
Merge pull request #363 from pshashk/patch-1
Fix repeat
2018-08-23 11:28:13 +01:00
Mike Innes
9d1d5187f3 fix activations for 1.0 2018-08-23 10:56:31 +01:00
boathit
33c901c191 redo 2018-08-23 16:01:42 +08:00
boathit
5dca80bd68 fix argmax and batch deprecations 2018-08-23 13:17:58 +08:00
Dhairya Gandhi
2f1a9847fa deprecate :stop from optimizers; housekeeping 2018-08-22 21:25:26 +05:30
Dhairya Gandhi
a7ad620f01 exporting stop 2018-08-22 00:33:30 +05:30
Dhairya Gandhi
3d11322d37 fixed docstring and not exporting stop 2018-08-22 00:29:07 +05:30
Dhairya Gandhi
ed044e2df7 changes as requested 2018-08-21 23:22:20 +05:30
boathit
616ed194df fix argmax and add test 2018-08-21 11:29:57 +08:00
Mike J Innes
930776eb1a
Merge pull request #352 from domluna/mnist-image-init
properly initialize MNIST array for 1.0
2018-08-20 19:57:42 +01:00
Mike Innes
216d278e7b fix mnist loader 2018-08-20 16:57:43 +01:00
Mike J Innes
86683e5991
Merge pull request #362 from FluxML/cuda-1.0
1.0 CUDA support
2018-08-20 16:09:02 +01:00
Mike Innes
3cfecaa4db test cleanup 2018-08-20 15:38:25 +01:00
Mike Innes
e68b8765b6 broadcast fixes 2018-08-20 14:41:46 +01:00
pshashk
1115eda6af
repeat fix
ERROR: UndefVarError: A not defined
2018-08-20 16:11:56 +03:00
Dhairya Gandhi
1af7a53e1f housekeeping: removed commented code 2018-08-20 18:10:20 +05:30
Mike Innes
5a023a9ccc WIP 1.0 support
closes #353
2018-08-20 13:08:04 +01:00
Mike J Innes
0ef6456903
Merge pull request #356 from domluna/recurrent-fix
recurrent bug fixes
2018-08-20 11:27:49 +01:00
Dhairya Gandhi
624dc6cb85 changed training loop test 2018-08-20 15:09:07 +05:30
Dhairya Gandhi
756207e782 added docs 2018-08-20 14:20:33 +05:30
Dhairya Gandhi
51578177a5 removed arguments from StopException 2018-08-20 14:08:23 +05:30
Dhairya Gandhi
df22bc5c8f removed argument from stop function 2018-08-20 14:02:09 +05:30
Dhairya Gandhi
06db6ed314 housekeeping: fixing typo 2018-08-20 13:48:28 +05:30
Dhairya Gandhi
394b4167ce moving stop to Optimise 2018-08-20 13:43:08 +05:30
Dhairya Gandhi
06aad375fc properly importing functions 2018-08-20 13:35:55 +05:30
Dhairya Gandhi
e239eb1105 properly importing functions 2018-08-20 13:30:05 +05:30
Dhairya Gandhi
1228e9c5e2 removed include statement 2018-08-19 22:55:14 +05:30
Dhairya Gandhi
9c98272cf0 catching exception 2018-08-19 17:38:00 +05:30
Dhairya Gandhi
257e2a7d2e checking exception 2018-08-19 17:11:11 +05:30
Dhairya Gandhi
5c42c8689c printing expception 2018-08-19 17:04:31 +05:30
Dhairya Gandhi
b0f83f93ff exported StopException 2018-08-19 16:41:13 +05:30
Dhairya Gandhi
a53a5c8350 exporting stop 2018-08-19 15:31:33 +05:30
Dhairya Gandhi
fbd82a6925 added end 2018-08-19 15:19:45 +05:30
Dhairya Gandhi
8229c8e045 modified training loop 2018-08-19 15:17:07 +05:30
Dhairya Gandhi
2aa057ec08 fixed throwing exception 2018-08-19 14:54:54 +05:30
Dominique Luna
f2021d41ac initn -> init 2018-08-18 14:18:50 -04:00
Dominique Luna
3f42301e07 recurrent bug fixes 2018-08-18 11:50:52 -04:00
Dhairya Gandhi
887bfad312 returning :stop 2018-08-18 08:28:47 +05:30
Dhairya Gandhi
65a5ecccd2 returning 2018-08-18 08:24:49 +05:30
Dhairya Gandhi
999b00b64d fixed typo 2018-08-17 19:45:10 +05:30
Dhairya Gandhi
0524964400 fixed typo 2018-08-17 19:40:48 +05:30
Dhairya Gandhi
8ad72e51ea added function to stop training 2018-08-17 19:33:51 +05:30
Dhairya Gandhi
24a3bce495 added stop to break training loop 2018-08-17 17:46:13 +05:30
Mike Innes
23af487f98 ignore manifest 2018-08-17 11:44:07 +01:00
Mike Innes
995543f648 rm dates 2018-08-17 11:44:01 +01:00
Dominique Luna
517dc58ce0 properly initialize MNIST array for 1.0 2018-08-16 18:17:43 -04:00
Mike J Innes
b460439484
Merge pull request #351 from FluxML/fbot/deps
Fix deprecations
2018-08-16 17:28:53 +01:00
Mike J Innes
4045c322d5
Merge pull request #345 from JoshChristie/update-1.0
Fix for 0.7 and 1.0 updates
2018-08-16 16:51:04 +01:00
Josh Christie
a3ab1cbb98
Merge pull request #1 from SimonDanisch/patch-1
fix copy_transpose! for cuda
2018-08-15 12:18:22 +02:00
Simon
a43127f881
fix copy_transpose! 2018-08-15 12:16:12 +02:00
femtocleaner[bot]
2d80f68087 Fix deprecations 2018-08-14 16:46:23 +00:00
ayush1999
4683e925d4 Final changes 2018-08-12 11:38:48 +01:00
Josh Christie
69ccaf044f Allow failures on nightly 2018-08-11 15:46:01 +01:00
Josh Christie
59bdff2cae Test 0.7 and 1.0 2018-08-11 14:58:29 +01:00
Josh Christie
c8307a0627 Use @info for logging 2018-08-11 14:42:33 +01:00
Josh Christie
710a65fe72 Fix back scalar with a Ref and fix diagonal test 2018-08-11 14:36:33 +01:00
ayush1999
89881a9b21 utils errors fixed 2018-08-11 14:36:33 +01:00
Avik Pal
5db7a3a3ad Fix Optimizers 2018-08-11 18:23:47 +05:30
Avik Pal
355091b9d1 Merge removing conflicts 2018-08-11 18:01:27 +05:30
Josh Christie
837e03613f Updates for julia 1.0 2018-08-11 13:23:02 +01:00
Avik Pal
d3c78a80be Fix layers errors 2018-08-11 17:20:27 +05:30
Avik Pal
4bd13c448f Add updates for julia0.7 2018-08-11 15:23:40 +05:30
Josh Christie
5186e3ba18 Updates for julia 1.0 2018-08-11 10:51:07 +01:00
Avik Pal
3b448ce1ac
Merge branch 'master' into cudnn_batchnorm 2018-08-11 15:02:55 +05:30
Avik Pal
3affed8ef0 Remove track_kw 2018-08-10 03:21:05 +05:30
Mike J Innes
62d594af43 out of place gradients for collect 2018-08-07 22:09:20 +01:00
Avik Pal
a0ec472a4b
Merge branch 'master' into depthwiseconv 2018-08-08 01:20:37 +05:30
Mike J Innes
6cdf4ff56a add statsbase to require 2018-08-03 16:22:54 +01:00
Mike J Innes
92d7e4632c
Merge pull request #269 from FluxML/julia-0.7
Julia 0.7
2018-08-03 15:24:53 +01:00
Mike J Innes
1750cda59e
Merge pull request #327 from pevnak/julia-0.7-fixes
Julia 0.7 fixes
2018-08-03 15:24:22 +01:00
Mike J Innes
7103a0ed7d tweaks 2018-08-03 15:19:10 +01:00
pevnak
926411a449 removed most errors; the only one remaining, in the Fallbacks test, persists 2018-08-03 15:14:25 +01:00
pevnak
c657d4e47f fixed the sum as suggested by mike 2018-08-03 15:14:25 +01:00
Simon Mandlik
02f343d44d fixed more deprecation warnings, also in tests, but maximum, minimum and size in array.jl still need to be updated. As a result, some tests may not pass for the time being 2018-08-03 15:14:25 +01:00
Simon Mandlik
0471c489e6 depwarns 2018-08-03 15:14:25 +01:00
pevnak
3510c837a8 zeros replaced by zero 2018-08-03 15:14:25 +01:00
pevnak
ea38c7dbea some more changes 2018-08-03 15:14:25 +01:00
pevnak
d6f5baee39 fixed fixes proposed by Carlo 2018-08-03 15:14:25 +01:00
pevnak
8ab209126d removed zeros fix 2018-08-03 15:14:25 +01:00
pevnak
e98538673a updated sum to be compliant with latest beta. Removed some depwarns 2018-08-03 15:14:25 +01:00
Mike J Innes
e5b3d27016 track_kw should be unnecessary 2018-08-03 15:14:10 +01:00
Avik Pal
4d17a1a809
Merge branch 'master' into depthwiseconv 2018-08-03 19:41:50 +05:30
Avik Pal
3f6c065523 Update test 2018-08-03 19:32:21 +05:30
Avik Pal
6a41f823c8 Update track function 2018-08-03 19:06:05 +05:30
Avik Pal
b4ba7df03a Merge branch 'master' of https://github.com/FluxML/Flux.jl into cudnn_batchnorm 2018-08-03 18:55:46 +05:30
Mike Innes
f5c9361617 matmul fix 2018-08-03 13:02:47 +01:00
Mike Innes
4cf6bac0c1 fix hook 2018-08-03 13:02:47 +01:00
Mike Innes
a50432324b rm broken test 2018-08-03 13:02:47 +01:00
Mike J Innes
70718e7a64 update treelike 2018-08-03 13:02:47 +01:00
Mike J Innes
d782b33701 syntax 2018-08-03 13:02:47 +01:00
Mike J Innes
85fd77d70a linalg deprecations 2018-08-03 13:02:47 +01:00
Mike J Innes
89872c5a8b val deprecations 2018-08-03 13:02:47 +01:00
Mike J Innes
474f578517 ObjectIdDict -> IdDict 2018-08-03 13:02:47 +01:00
Mike J Innes
297bb5f44e update travis 2018-08-03 13:02:47 +01:00
Mike J Innes
e14641e4e2 rm CuArrays tests for now 2018-08-03 13:02:47 +01:00
Mike J Innes
aa209ee137 no longer needed 2018-08-03 13:02:47 +01:00
Mike J Innes
00cfe24d66 fix cat 2018-08-03 13:02:47 +01:00
Mike J Innes
adc216f182 fix broadcasting 2018-08-03 12:56:32 +01:00
Mike J Innes
e486c50610 fix data 2018-08-03 12:56:31 +01:00
Mike J Innes
fb8a220659 fix matmul 2018-08-03 12:56:31 +01:00
Mike J Innes
7057ca739e fix std usage 2018-08-03 12:56:27 +01:00
Mike J Innes
88a265154c deprecations 2018-08-03 12:54:31 +01:00
Mike J Innes
b18b51656c requires update 2018-08-03 12:54:24 +01:00
Mike J Innes
a49e2eae41 deprecated Void 2018-08-03 12:53:52 +01:00
Mike J Innes
1fd49c2a90 fix array show 2018-08-03 12:53:52 +01:00
Yueh-Hua Tu
5b37319289 Add Maxpool and Meanpool 2018-08-01 00:10:53 +08:00
Mike J Innes
a8ccc79f61 perf hacks 2018-07-30 20:08:44 +01:00
Avik Pal
2cc0f112f1 Updates 2018-07-27 20:12:49 +05:30
Avik Pal
7b2982493a Merge branch 'master' of https://github.com/FluxML/Flux.jl 2018-07-21 09:58:24 +05:30
Mike J Innes
c565317d9e
Merge pull request #328 from jjerphan/patch-1
Very Little typo.
2018-07-19 12:35:58 +01:00
Julien Jerphanion
34d0c39e72
Ditto. 2018-07-19 00:14:02 +02:00
Julien Jerphanion
ee630a8566
Very Little typo. 2018-07-18 23:20:43 +02:00
Avik Pal
b4626c20be Merge branch 'master' of https://github.com/FluxML/Flux.jl 2018-07-17 18:50:23 +05:30
Avik Pal
7dd5ec16c9 Fix 2018-07-17 11:22:12 +05:30
Avik Pal
531ecccd38 Error statement 2018-07-17 10:14:23 +05:30
Avik Pal
4035641f00 Remove imports 2018-07-17 10:06:26 +05:30
Avik Pal
8874d9cccd Fix GPU test 2018-07-17 09:53:39 +05:30
Avik Pal
da7fe93b31 Fix test 2018-07-17 09:47:45 +05:30
Avik Pal
0bb3eaa1f6 Update CUDNN Batchnorm with new Flux AD 2018-07-17 09:40:20 +05:30
Avik Pal
646db81f94 Pull BatchNorm CPU updates 2018-07-17 09:24:38 +05:30
CarloLucibello
071dcdda87 update docs 2018-07-16 07:32:13 +02:00
CarloLucibello
185e9148b6 fix cpu batchnorm 2018-07-16 07:11:33 +02:00
Avik Pal
f57db22abe Remove unnecessary file 2018-07-13 14:27:04 +05:30
Avik Pal
2664a16556 Update as per new AD 2018-07-13 14:12:46 +05:30
Avik Pal
0aabf9d86b
Merge branch 'master' into depthwiseconv 2018-07-13 14:04:19 +05:30
Mike J Innes
a0fd91b866
Merge pull request #307 from jarvist/master
Add ADAMW "Fixing Weight Decay Regularization in Adam"
2018-07-11 19:12:58 +01:00
Mike J Innes
6d8e6c0440
Merge pull request #313 from FluxML/ad-overhaul
AD Overhaul
2018-07-11 15:33:02 +01:00
Mike J Innes
dda51a0140 update docs 2018-07-11 15:31:22 +01:00
Mike Innes
10a169bb77 update cudnn rnn 2018-07-10 18:16:37 +01:00
Mike J Innes
70b5efeb4e basic nested AD 2018-07-10 09:03:09 +01:00
Mike J Innes
80af9a3830 broadcast efficiency 2018-07-09 23:40:07 +01:00
Mike J Innes
e763c342ee shave some memory 2018-07-09 19:44:14 +01:00
Mike J Innes
1430053b69 checkpoints 2018-07-09 17:52:34 +01:00
Mike J Innes
7778d17884 functional API 2018-07-09 16:57:44 +01:00
Mike J Innes
5e319c7395 fix gradient definitions 2018-07-09 13:39:10 +01:00
Mike J Innes
41b9412439 new grad api 2018-07-09 13:36:46 +01:00
Avik Pal
84f977c804 Remove comment 2018-07-09 13:35:30 +05:30
Avik Pal
b239fc684e Update tests 2018-07-04 18:57:43 +05:30
Avik Pal
c38d4edef7 Merge branch 'master' of https://github.com/FluxML/Flux.jl 2018-07-04 07:31:45 +05:30
Jarvist Moore Frost
344a750770 Merge branch 'master' of github.com:jarvist/Flux.jl into HEAD 2018-07-03 11:15:43 +01:00
Jarvist Moore Frost
aee4a83c55 Add ADAMW weight-decay.
See http://www.fast.ai/2018/07/02/adam-weight-decay/ and the original
paper https://arxiv.org/abs/1711.05101.pdf for context.

I don't know what I'm doing, and this is quite possibly wrong - but on
a simple Char-RNN I have lying around on my hard disk, this seems to
improve the rate of learning consistently for different hyperparameters
vs. standard ADAM with the same decay constant.
2018-07-03 11:11:32 +01:00
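For context, the decoupled weight-decay update from the linked paper, as a rough sketch rather than Flux's optimiser code (all names and defaults here are illustrative):

```julia
# One ADAMW step: the usual ADAM step plus weight decay applied directly to the
# parameters (decoupled), instead of being folded into the gradient as L2.
function adamw_step!(θ, g, m, v, t; η = 1e-3, β1 = 0.9, β2 = 0.999, λ = 0.01, ϵ = 1e-8)
    @. m = β1 * m + (1 - β1) * g          # first-moment estimate
    @. v = β2 * v + (1 - β2) * g^2        # second-moment estimate
    m̂ = m ./ (1 - β1^t)                   # bias correction
    v̂ = v ./ (1 - β2^t)
    @. θ -= η * (m̂ / (sqrt(v̂) + ϵ) + λ * θ)  # decoupled decay term λ*θ
    return θ
end
```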
Mike J Innes
ce88273880 gradient hook 2018-07-02 13:19:13 +01:00
Mike Innes
5d8b63dc65 avoid implementation details in docs 2018-06-29 13:53:50 +01:00
Avik Pal
e3b10691d2 make cache optional param 2018-06-28 15:27:59 +05:30
Avik Pal
bcf094451c Fix typo 2018-06-28 14:45:35 +05:30
Avik Pal
d0b79e71e2 fix load error 2018-06-28 14:27:50 +05:30
Avik Pal
7ac9e191cb Revert 1 change 2018-06-28 14:25:22 +05:30
Avik Pal
5ccde88ce6 Minor fix for 5D support 2018-06-28 14:21:17 +05:30
Avik Pal
681d8c4dfc Remove cache 2018-06-28 12:11:32 +05:30
Avik Pal
8f43258ab7 Get the batchnorm working without cache 2018-06-28 12:04:25 +05:30
Avik Pal
4916c8e6da Add treelike for now 2018-06-27 14:54:49 +05:30
Mike J Innes
d76e790818
Merge pull request #306 from maetshju/pull-request/e08fd7a6
Add epsilon term to binarycrossentropy
2018-06-27 08:52:08 +01:00
Matthew Kelley
864d72eef5 Overload Base.eps() for TrackedReal 2018-06-26 23:55:43 -06:00
Matthew Kelley
0e95be3326 Call Flux.Tracker.data() on ŷ for bce 2018-06-26 14:48:51 -06:00
Matthew Kelley
ed032cdb1e Change epsilon value to eps(ŷ) 2018-06-26 12:29:06 -06:00
Matthew Kelley
e08fd7a6d2 Added epsilon term to binarycrossentropy 2018-06-26 11:43:16 -06:00
Mike J Innes
bed6d2311e clearer docs 2018-06-26 16:07:58 +01:00
Mike J Innes
88c16e62dd fixes #284 2018-06-26 15:09:26 +01:00
Mike J Innes
836e3872b6 style 2018-06-26 15:09:21 +01:00
Mike J Innes
2723c9ee04
Merge pull request #257 from staticfloat/sf/back_inf_nan
Check for `Inf` and `NaN` within `back!(::TrackedReal)`
2018-06-26 14:42:33 +01:00
Mike J Innes
d6a75e1289 add activations docs 2018-06-26 14:35:03 +01:00
Mike J Innes
0a04e3ba61 Chain activations 2018-06-26 14:30:46 +01:00
Mike J Innes
4d7548b7a3 Merge commit '1490d87d8387a078a29a336cb37fd7269240179e' 2018-06-26 14:25:36 +01:00
Mike J Innes
1490d87d83 tweaks 2018-06-26 14:25:24 +01:00
Kade
aa8f79f10c Mention CUDAnative.jl's install instructions 2018-06-26 14:22:50 +01:00
Mike J Innes
134ac1586b
Merge pull request #237 from tejank10/scalar_pad_stride
Scalar pad and stride
2018-06-26 14:18:12 +01:00
Mike J Innes
7726a5b605 inferrable 2018-06-26 14:12:57 +01:00
Mike J Innes
3b575930ca Merge branch 'master' into scalar_pad_stride 2018-06-26 14:05:07 +01:00
Mike Innes
7e3cf45ee4 better error 2018-06-25 11:36:52 +01:00
Avik Pal
9a168528de Add tests to make sure CPU and GPU versions have similar outputs 2018-06-23 11:03:15 +05:30
Avik Pal
24ba1c4e6c Make changes as per the review 2018-06-23 11:02:41 +05:30
Avik Pal
f29377123e Add tests for CuDNN BatchNorm 2018-06-22 18:19:18 +05:30
Mike J Innes
aea1e73cde scalar gradients 2018-06-21 13:12:42 +01:00
Avik Pal
91850a8baf Add missing path to curnn.jl 2018-06-20 18:46:42 +05:30
Avik Pal
a4e35e9e91 Adjust atol in tests 2018-06-20 16:22:25 +05:30
Avik Pal
deb4950261 Make cuDNN take only 4D arrays 2018-06-20 15:54:38 +05:30
Avik Pal
3339ad5181 Integrate cudnn BatchNorm with Flux 2018-06-20 15:50:30 +05:30
Avik Pal
714ca23aba Change default value of epsilon to prevent CuDNN BatchNorm warnings 2018-06-20 12:11:22 +05:30
Avik Pal
185f34d9fe Add working backward pass 2018-06-20 12:09:54 +05:30
Avik Pal
bc47d02b3f Remove unnecessary imports 2018-06-17 12:40:01 +05:30
Avik Pal
af5ab7f9ef Fix Tensor Descriptor Bug 2018-06-17 12:28:02 +05:30
Avik Pal
c6dcf079ce Update file structure and make function calls correct 2018-06-17 11:47:49 +05:30
Mike J Innes
ac1448f677
Update README.md 2018-06-13 11:13:48 +01:00
Avik Pal
24d13ac326 Fix missing parenthesis 2018-06-12 21:32:56 +05:30
Avik Pal
f12e367cab Adding untested backward pass code 2018-06-12 18:26:09 +05:30
Avik Pal
a83e5d696d Typo 2018-06-12 17:51:52 +05:30
Avik Pal
d4b066fdf9 Forward Pass for BatchNorm Added 2018-06-12 17:49:21 +05:30
Avik Pal
85158d632b Comment out the test 2018-06-11 16:00:20 +05:30
Avik Pal
5c6b066bd9 Merge branch 'depthwiseconv' of https://github.com/avik-pal/Flux.jl into depthwiseconv 2018-06-11 15:41:21 +05:30
Avik Pal
65f2c33991
Merge pull request #2 from FluxML/master
rebase
2018-06-11 15:40:57 +05:30
Avik Pal
2ef4397cbb
Merge pull request #1 from FluxML/master
Merge master
2018-06-11 15:39:39 +05:30
Avik Pal
4a639687de Typo 2018-06-09 18:59:54 +05:30
Avik Pal
6b294736f9 Add Depthwise Convolution in Docs 2018-06-09 14:19:47 +05:30
Mike J Innes
8f7ee76752
Merge pull request #290 from tejank10/patch-1
Default value of dilation
2018-06-09 08:55:35 +01:00
Avik Pal
b59da95786 Merge branch 'depthwiseconv' of https://github.com/avik-pal/Flux.jl into depthwiseconv 2018-06-09 13:11:42 +05:30
Avik Pal
5d7ee884b8 Fix error during backpropagation 2018-06-09 13:04:49 +05:30
Avik Pal
7f3d11cae0
Merge branch 'master' into depthwiseconv 2018-06-09 11:06:07 +05:30
Avik Pal
1d93fb8e59 Add new constructor and fix a typo in display 2018-06-09 11:02:15 +05:30
Tejan Karmali
d20771d6be
Default value of dilation
dilation should be 1 by default
2018-06-09 02:29:46 +05:30
Mike J Innes
9345607c38
Merge pull request #224 from tejank10/nadam-opt
NADAM optimizer
2018-06-08 12:28:07 +01:00
Tejan Karmali
4a24b69976
Merge branch 'master' into nadam-opt 2018-06-08 16:54:41 +05:30
Mike J Innes
4915b0c8dd
Merge pull request #268 from staticfloat/patch-2
Add `dilation` kwarg to `Conv`
2018-06-07 13:49:02 +01:00
Mike J Innes
af8f3348eb
Merge pull request #270 from staticfloat/sf/tracked_repeat
Add `TrackedArray` support for `repeat(x; inner, outer)`
2018-06-06 17:34:58 +01:00
Mike J Innes
422b31c8f7
Merge pull request #276 from kleskjr/patch-1
Solves Issue #262
2018-06-06 17:33:49 +01:00
Mike Innes
2370bdbe91 see #205 2018-06-06 17:01:28 +01:00
kleskjr
dd3af0c9f7
add 'using Flux: crossentropy'
Following the suggestion from MikeInnes to use 'using Flux: crossentropy' instead of 'Flux.crossentropy'
2018-06-05 14:30:14 +02:00
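The suggested pattern, shown as a tiny usage example (the prediction and target here are made up for illustration):

```julia
using Flux
using Flux: crossentropy     # crossentropy is not exported, so bring it in explicitly

ŷ = softmax(rand(10))        # hypothetical prediction
y = Flux.onehot(3, 1:10)     # hypothetical target
loss = crossentropy(ŷ, y)
```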
Mike J Innes
1105e3ac20
Merge pull request #266 from jekbradbury/patch-1
Use broadcast for dropout
2018-05-31 17:18:53 +01:00
Avik Pal
52a50b2727 Add tests 2018-05-30 17:12:16 +05:30
Avik Pal
33a7f545b7
Merge branch 'master' into depthwiseconv 2018-05-30 15:58:35 +05:30
Avik Pal
cd6a0856d5 Adds support for Depthwise Convolutions 2018-05-30 15:53:57 +05:30
kleskjr
3a73902379
Solves Issue #262
Makes running the basic examples smoother (Issue #262).
crossentropy is not exported by Flux, so either an explicit reference or an explicit export is needed to run the examples.
2018-05-25 13:54:17 +02:00
staticfloat@gmail.com
f390a39d77 Add TrackedArray support for repeat(x; inner, outer) 2018-05-22 17:41:05 -07:00
Elliot Saba
e6efca4bf4 Add dilation kwarg to Conv
Now that we have dilated convolution support in `NNlib`, this enables support in Flux's `Conv` layer.
2018-05-21 13:44:13 -07:00
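Hypothetical usage once the kwarg is available (channel counts and input shape are made up):

```julia
using Flux

# 3×3 convolution, 1 input channel => 16 output channels, with dilation 2.
layer = Conv((3, 3), 1 => 16, relu; dilation = 2)

x = rand(Float32, 28, 28, 1, 1)   # WHCN: width, height, channels, batch
y = layer(x)                      # 24×24×16×1: each spatial dim shrinks by dilation*(k-1) = 4
```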
James Bradbury
af12f006f2
Use broadcast for dropout
Should be fast enough on GPU now that it's not going to be an optimization target again for a while. Hopefully it isn't meaningfully slower on CPU.
2018-05-20 04:04:33 -07:00
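The broadcasted (inverted) dropout pattern being referred to, sketched roughly rather than as Flux's exact code:

```julia
# Zero each element with probability p and rescale survivors by 1/(1-p),
# all in a single broadcast expression.
function dropout_broadcast(x, p)
    mask = rand(eltype(x), size(x)...) .> p
    return x .* mask ./ (1 - p)
end

dropout_broadcast(rand(Float32, 4, 4), 0.5f0)
```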
staticfloat@gmail.com
9fdbe843ef Check for Inf and NaN within back!(::TrackedReal)
This is often checked for within user code; there's no reason to make users do that, so let's
do it for them within `back!(::TrackedReal)`.
2018-05-07 15:30:44 -07:00
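Roughly what the guard amounts to, shown as a hypothetical standalone helper (the real check lives inside `back!(::TrackedReal)`):

```julia
# Hypothetical helper mirroring the check now done inside back!, so user code
# no longer has to verify the loss is finite before backpropagating.
function assert_finite_loss(ℓ::Real)
    isinf(ℓ) && error("Loss is Inf")
    isnan(ℓ) && error("Loss is NaN")
    return ℓ
end

assert_finite_loss(0.37)    # returns 0.37
# assert_finite_loss(NaN)   # would throw "Loss is NaN"
```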
Mike J Innes
e92f840510
Merge pull request #255 from FluxML/ad-docs
AD Docs
2018-05-07 16:14:25 +01:00
Mike Innes
5685df1691 tracker docs 2018-05-07 16:12:55 +01:00
Mike J Innes
24ad384a38
Merge pull request #243 from gustafsson/catdim
Support for hcat and cat
2018-05-07 13:04:31 +01:00
Mike Innes
ef9077d9fa style 2018-05-07 13:03:52 +01:00
Mike Innes
b59161a41e export Tracker again 2018-05-05 17:15:18 +01:00
Mike Innes
b35b27be6e doc fix 2018-05-04 15:05:02 +01:00
Mike J Innes
2d3f00da29
Update README.md 2018-05-03 18:50:28 +01:00
Mike J Innes
180c2433fe
Update README.md 2018-05-03 18:34:03 +01:00
Mike J Innes
cfbead633d
Update README.md 2018-05-03 14:14:53 +01:00
Johan Gustafsson
5fc6190956 RowVector tests 2018-05-02 16:10:39 +02:00
Johan Gustafsson
94bb064a0f more tests of array promotion for concatenation
# Conflicts:
#	test/tracker.jl
2018-05-02 16:00:29 +02:00
Johan Gustafsson
cfdb16e609 vcat test #213
Co-authored-by: improbable22 <improbable+github@gmail.com>
2018-05-02 16:00:29 +02:00
Johan Gustafsson
1c189c62ed cat with multiple dims #156
Co-authored-by: americast <sayan.sinha@iitkgp.ac.in>
2018-05-02 15:59:46 +02:00
Johan Gustafsson
fb68529169 define back function right after forward function 2018-05-02 15:59:46 +02:00
Johan Gustafsson
509a2e59f6 cat promotions and mixed ranks 2018-05-02 15:59:46 +02:00
Johan Gustafsson
eaaf5fd34c vcat arrays with ndims>2 2018-05-02 15:59:46 +02:00
Johan Gustafsson
bcef5c4ab5 Support hcat and cat 2018-05-02 15:59:46 +02:00
Johan Gustafsson
13daaec1cb Refactored tests 2018-05-02 15:59:57 +02:00
Johan Gustafsson
59324c0f91 hcat tests #194
Co-authored-by: Elliot Saba <staticfloat@gmail.com>
2018-05-02 15:59:46 +02:00
Johan Gustafsson
51e7e1b40f cat tests #184
Co-authored-by: pevnak <pevnak@gmail.com>
2018-05-02 15:59:46 +02:00
Mike J Innes
7d7d89569c rm this deprecation for 0.6 2018-05-01 12:20:36 +01:00
Mike J Innes
3870027c91 Merge commit '9a7e6e9c' 2018-05-01 12:19:05 +01:00
Mike J Innes
9a7e6e9c5c hold off on some things 2018-05-01 12:18:56 +01:00
CarloLucibello
e186b958dd more exports 2018-05-01 12:13:14 +01:00
Mike J Innes
ee89a7797e
Merge pull request #245 from freeboson/adamax
Add AdaMax optimizer
2018-05-01 11:28:07 +01:00
Mike J Innes
5efbaddb97
Merge pull request #249 from ninjin/nin/minimum
[RFC] Backpropagation for `maximum` and `minimum`
2018-04-30 18:40:42 +01:00
Mike J Innes
4fb6bc7fea add note on metalhead 2018-04-30 18:04:13 +01:00
Mike J Innes
73a51400b6 better error message 2018-04-30 12:09:15 +01:00
Pontus Stenetorp
cfd29b9c76 Backpropagation for maximum and minimum 2018-04-29 13:52:54 +01:00
Sujeet Akula
8c042bd522
element-wise max() 2018-04-26 21:12:31 +10:00
Sujeet Akula
5e5f255f81
export typo 2018-04-26 17:42:04 +10:00
Sujeet Akula
4586bda5ab
export/test adamax 2018-04-26 17:40:11 +10:00
Sujeet Akula
b6508e2416
add adamax 2018-04-26 17:37:24 +10:00
Mike J Innes
159ca536ec Update README.md 2018-04-25 11:55:21 +01:00
Iblis Lin
5faf5da171
doc: add GRU to layer reference 2018-04-18 17:36:10 +08:00
Mike J Innes
baff20514d gpu broadcast fix 2018-04-17 18:05:58 +01:00
Mike J Innes
8f73dc6e14 fix gpu cross entropy 2018-04-17 17:56:47 +01:00
tejank10
2ef25775c6 removed extra expand and fixed bug 2018-04-16 01:18:26 +05:30
Mike Innes
d12fb98f2a nicer batchnorm shape error 2018-04-15 20:29:25 +01:00
tejank10
2f5473d435 added expand in conv constructor 2018-04-16 00:59:11 +05:30
Mike J Innes
8f29968c32
Merge pull request #207 from safnuk/pull-request/07b0f95d
BatchNorm for convolutions
2018-04-15 20:10:33 +01:00
Mike J Innes
683a73fed3 download info 2018-04-15 20:09:30 +01:00
Mike J Innes
5fd240f525 interface tweaks 2018-04-15 20:04:42 +01:00
Mike J Innes
73a0be3e04 Merge branch 'master' into pull-request/07b0f95d 2018-04-15 17:10:29 +01:00
Mike J Innes
642543808e
Merge pull request #226 from CarloLucibello/reshape
fix reshape
2018-04-15 16:53:21 +01:00
tejank10
b080f5c82e Scalar pad and stride 2018-04-15 20:32:40 +05:30
Mike J Innes
cb3ae8df6a rename normalise.jl 2018-04-15 15:45:46 +01:00
Mike J Innes
b05e755068 rm jit from cuda 2018-04-15 15:08:58 +01:00
tejank10
5cc681317a added stride for pooling in tracker 2018-04-15 15:07:04 +01:00
tejank10
f6097d58d6 Scalar pad/stride for Conv constructor 2018-04-15 12:15:41 +05:30
Mike Innes
0ba5ce4601 update readme 2018-04-14 02:15:44 +01:00
Mike Innes
0fd701c7fe update paper 2018-04-14 02:09:35 +01:00
Mike Innes
9d7164f15f we'll do this differently 2018-04-14 02:09:35 +01:00
Mike Innes
2ff7843bca readme logo 2018-04-11 19:39:32 +01:00
tejank10
65847bb745 moved epsilon into sqrt 2018-04-04 15:25:20 +05:30
tejank10
3ead662987 Update rule fixed 2018-04-04 15:18:44 +05:30
CarloLucibello
b415333233 fix reshape 2018-04-02 16:09:57 -04:00
tejank10
ea9b5471fa NADAM optimizer 2018-04-03 01:27:22 +05:30
Brad Safnuk
b9a66c679d Fix error in initialization of σ. 2018-03-22 22:20:21 -04:00
Brad Safnuk
35299d4621 Fix type instability when loading onto a gpu.
Also fixes Issue #216.
2018-03-22 21:32:32 -04:00
Mike J Innes
4320738d87 fix 2018-03-21 11:25:47 +00:00
Mike J Innes
d09eb8a8e7
Merge pull request #211 from iblis17/patch-1
minor update for gpu.md code block
2018-03-19 12:20:00 +00:00
Iblis Lin
07ed439005
minor update for gpu.md code block 2018-03-19 20:03:31 +08:00
Mike Innes
1c5f8e3534 ndims for shapes 2018-03-16 14:42:08 +00:00
Mike Innes
80e378d6c9 update readme 2018-03-16 12:23:08 +00:00
Brad Safnuk
07b0f95d61 Tests for batch norm with 2D and 3D convolutions. 2018-03-15 22:52:09 -04:00
Brad Safnuk
db2d9efb72 Update BatchNorm documentation 2018-03-15 21:59:38 -04:00
Brad Safnuk
6653ec86d9 Allow multidimensional inputs to batchnorm.
Can be used in conjunction with convolutional layers, in addition
to dense layers, with the same API.
2018-03-15 21:48:59 -04:00
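A hedged usage sketch of the same `BatchNorm` API placed after a convolution; the shapes are arbitrary examples:

```julia
using Flux

m = Chain(Conv((3, 3), 1 => 16), BatchNorm(16, relu), MaxPool((2, 2)))
x = rand(Float32, 28, 28, 1, 8)  # WHCN: 28×28 images, 1 channel, batch of 8
size(m(x))                       # spatial dims shrink; BatchNorm normalises over the 16 channels
```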
Mike J Innes
e931552f7d
Merge pull request #200 from chengchingwen/repmat
implement `back` of `repmat`
2018-03-15 15:18:48 +00:00
Mike J Innes
5d7edb5aaa
Merge pull request #197 from chengchingwen/master
Implement `prod` for `TrackedArray`
2018-03-15 15:17:24 +00:00
Mike J Innes
72f13834f5
Merge pull request #204 from boathit/master
eliminate ambiguity
2018-03-14 15:34:41 +00:00
boathit
2ec37790be eliminate ambiguity 2018-03-13 10:50:56 +08:00
boathit
ff2caf032c eliminate ambiguity 2018-03-12 22:48:16 +08:00
Mike J Innes
9ccbac8b80 jit gpu support 2018-03-07 19:18:27 +00:00
chengchingwen
43af3895b0 change prod implementation to avoid small xs 2018-03-07 21:03:13 +08:00
chengchingwen
c00f7f850f implement back of repmat 2018-03-07 20:43:59 +08:00
chengchingwen
7c721475c6 add gradient check for prod and fix dims in back(::typeof(prod),...) 2018-03-07 16:24:44 +08:00
Mike Innes
261c6db371 fix test 2018-03-06 20:55:01 +00:00
Mike Innes
d21c313ea7 tweaks 2018-03-06 19:58:47 +00:00
Mike Innes
36baa7ec2c convnet primitives 2018-03-06 19:58:05 +00:00
Mike Innes
0802b4d5cf closes #198 2018-03-06 16:56:01 +00:00
Mike J Innes
1a0ddbe4f1
Merge pull request #191 from staticfloat/sf/cachedmodels
Use `cache.julialang.org` to store CMU phones
2018-03-06 16:47:19 +00:00
Elliot Saba
6445295318 Better download detection 2018-03-06 08:45:45 -08:00
Elliot Saba
19f691d342 Use cache.julialang.org to store ML models
It's annoying that our tests break when third-party servers go down.
Let's at least make sure that if our tests break due to server outages,
it's our fault.
2018-03-06 08:03:21 -08:00
Mike Innes
3babeeb440 scalar hashing 2018-03-06 13:49:05 +00:00
chengchingwen
86d782a5ce implement prod for TrackedArray 2018-03-06 18:01:19 +08:00
Mike Innes
eab26be0af optimiser state 2018-03-06 09:41:06 +00:00
Mike J Innes
7fe8308831 bson api 2018-03-06 03:12:42 +00:00
Mike Innes
c95a97f6ae make epochs available 2018-03-06 03:01:56 +00:00
Mike Innes
646e90aae2 checkpointing 2018-03-06 03:01:40 +00:00
Mike Innes
432b9c3222 loadparams! 2018-03-06 02:45:31 +00:00
Mike Innes
6f0d5a44ec model saving docs 2018-03-05 23:52:04 +00:00
Mike Innes
65ed95190a fix 2018-03-05 23:44:25 +00:00
Mike Innes
bfd6a4c0ec cleaner interrupts 2018-03-05 23:05:45 +00:00
Mike Innes
5153cde847 move epochs 2018-03-05 22:56:22 +00:00
Mike J Innes
5c7f856115 cpu docs 2018-03-05 19:25:43 +00:00
Mike J Innes
6b69edfe6c gpu docs 2018-03-05 19:23:13 +00:00
Mike J Innes
662439c164 closes #177 2018-03-05 17:24:46 +00:00
Mike J Innes
eaa9fd2dd3
Merge pull request #187 from staticfloat/patch-1
Add `permutedims()` for tracked arrays
2018-03-02 18:23:19 +00:00
Elliot Saba
36295799ee Add permutedims() for tracked arrays 2018-03-02 10:22:28 -08:00
Mike J Innes
bdd8162bf8 typo 2018-03-01 16:52:58 +00:00
Mike J Innes
cf7dd34767 harder test 2018-03-01 16:37:52 +00:00
Mike J Innes
8019f789f8 use normal log 2018-03-01 16:35:49 +00:00
Mike J Innes
ac57fc3c26 use @ fix in a few places 2018-03-01 16:31:20 +00:00
Mike J Innes
5e84d52ee7 broken test 2018-02-28 23:18:49 +00:00
Mike J Innes
c2fea2acf6 revert this 2018-02-28 23:06:53 +00:00
Mike J Innes
2eb38eedbf update gpu api 2018-02-28 22:51:08 +00:00
Mike J Innes
ccef9f4dd4 jit softmax 2018-02-28 22:07:35 +00:00
Mike J Innes
7606b1a399 single-batch convolution 2018-02-28 14:25:32 +00:00
Mike J Innes
6bdc2b37a9 inline call 2018-02-28 13:47:14 +00:00
Mike J Innes
4606339a57 nd pooling tests 2018-02-28 13:00:38 +00:00
Mike J Innes
3ac6a8ef05 n-d conv tests 2018-02-28 12:20:00 +00:00
Mike J Innes
a401f08cda compile layers 2018-02-27 22:40:51 +00:00
Mike J Innes
5a32976cbf basic compile step 2018-02-27 21:43:41 +00:00
Mike J Innes
bdb8aae107 move cache logic 2018-02-27 21:41:03 +00:00
Mike J Innes
2c74976602 more general 2018-02-27 01:25:40 +00:00
Mike J Innes
466b5c501a cpu/gpu conveniences 2018-02-26 23:10:59 +00:00
Mike J Innes
15d1d3256b conv api updates 2018-02-26 22:43:07 +00:00
Mike J Innes
54919b8dca rm deprecation 2018-02-22 00:23:02 +00:00
Mike J Innes
491785a681 ignore state in mapleaves 2018-02-22 00:22:51 +00:00
Mike J Innes
ec65e2cec7 fix printing 2018-02-22 00:21:48 +00:00
Mike J Innes
af2e6b7e1d fix 2018-02-22 00:15:38 +00:00
Mike J Innes
99b739cf00 fixes #176 2018-02-21 23:21:20 +00:00
Mike J Innes
e3b4b16e01
Merge pull request #178 from schmrlng/pull-request/e6f55641
Convert OneHot CuArrays to dense CuArrays before passing to CUDNN methods
2018-02-21 22:34:11 +00:00
Mike J Innes
6bdd283fbd no longer necessary 2018-02-21 22:29:31 +00:00
Mike J Innes
46f90a6aaf
Merge pull request #181 from iblis17/ib/reexport
introduce Reexport
2018-02-21 22:19:08 +00:00
Mike J Innes
aa82679bef ignore latex 2018-02-21 22:17:15 +00:00
Iblis Lin
043fedde3c
introduce Reexport
- Reexporting NNlib

fix #180
2018-02-21 16:55:20 +08:00
Ed Schmerling
e6f556411a Convert OneHot CuArrays to dense CuArrays before passing to CUDNN methods 2018-02-19 17:32:15 -08:00
Mike J Innes
4035745f6e may help numerical tests 2018-02-19 12:51:02 +00:00
Mike J Innes
d22270050c tweaks 2018-02-18 12:50:51 +00:00
Mike J Innes
989adcdc7d gpu fix 2018-02-17 12:41:53 +00:00
Mike J Innes
11511982a4 numerical stability 2018-02-17 11:56:03 +00:00
Mike J Innes
dda545a24a unnecessary 2018-02-17 11:22:37 +00:00
Mike J Innes
e5791bc5f6 frequencies utility 2018-02-17 11:19:51 +00:00
Mike J Innes
440caa81e7 slack badge 2018-02-16 16:49:36 +00:00
Mike J Innes
91aadf6584 joss paper 2018-02-16 16:39:52 +00:00
Mike J Innes
e3b31b9b87
Merge pull request #169 from jessebett/jessechanges
Reshape with Tuple Dimensions and Kronecker Product
2018-02-16 14:16:42 +00:00
Mike J Innes
60f21d3ff2 don't override base method 2018-02-16 14:15:40 +00:00
Mike J Innes
81ab91418d move train docs 2018-02-16 12:22:53 +00:00
Mike J Innes
1908b4f451 document epochs 2018-02-16 11:23:23 +00:00
Mike J Innes
5e861101f3 epochs util 2018-02-16 11:17:57 +00:00
Mike J Innes
7aa6854c64 more correct 2018-02-16 00:06:15 +00:00
Mike J Innes
ee3784964e fix for external modules 2018-02-15 22:27:00 +00:00
Mike J Innes
63862c2324 easier initialisation with weights 2018-02-15 20:52:29 +00:00
Mike J Innes
01c31e7fcc conv bias 2018-02-15 20:15:41 +00:00
Mike J Innes
bdd07a8bc6 fix 2018-02-14 22:34:11 +00:00
Mike J Innes
1b8b1cd7b1 check params by identity 2018-02-14 21:00:50 +00:00
Mike J Innes
c1ed3e477e more consistent terminology 2018-02-13 17:17:08 +00:00
Mike J Innes
5ea0ef6764 tracker fix 2018-02-13 16:15:36 +00:00
Mike J Innes
1baa7227e3 reorganise batches 2018-02-13 16:05:07 +00:00
Mike J Innes
34217b1fa2 Merge branch 'treebank' 2018-02-13 15:44:27 +00:00
Mike J Innes
d12120207d
Merge pull request #165 from boathit/master
Register back! for logsigmoid and implement (logit)binarycrossentropy
2018-02-13 14:56:53 +00:00
Mike J Innes
49584fb72b rm logsigmoid 2018-02-13 14:52:29 +00:00
Mike J Innes
2f29733888 Merge branch 'master' into HEAD 2018-02-13 14:45:37 +00:00
Mike J Innes
8432d8db06 batchnorm fix 2018-02-13 14:02:35 +00:00
Mike J Innes
820cd3ae42 fixes #164 2018-02-13 13:31:35 +00:00
Mike J Innes
066cb45a38 remove old accuracy fn 2018-02-13 11:12:21 +00:00
Mike J Innes
236edbffec fixes #111 2018-02-13 10:20:38 +00:00
jessebett
fb5b8c7952 Kron test on Vectors also fails occasionally, but again not because of my method. 2018-02-12 17:33:47 -05:00
jessebett
0732d7db00 Added kron tests; kron isn't consistently passing them. 2018-02-12 17:27:10 -05:00
Mike J Innes
f22cfb5b43 re-enable printf 2018-02-12 15:05:09 +00:00
Mike J Innes
334ae9e1cb fixes #171 2018-02-12 12:31:15 +00:00
Mike J Innes
0b3c02fe8d document regularisation, fixes #160 2018-02-09 19:00:26 +00:00
Mike J Innes
0e0057b0c4 basics 2018-02-09 13:51:07 +00:00
jessebett
f84ee8eab0 reshape with tupled dimensions and kronecker product 2018-02-08 14:27:57 -05:00
Mike J Innes
70fbbf48fa humble beginnings of compiler 2018-02-08 18:11:26 +00:00
Mike J Innes
fc157a8c59 TrackedNumber -> TrackedReal 2018-02-08 17:18:40 +00:00
Mike J Innes
d1c56ca768 number fix 2018-02-08 17:04:48 +00:00
Mike J Innes
0f7a1ec022 test params funct 2018-02-08 16:13:20 +00:00
Mike J Innes
961de2ba44
Merge pull request #161 from FluxML/curnn
WIP: CUDNN RNNs
2018-02-08 13:06:52 +00:00
Iblis Lin
f7fdfbe3a9 fix params 2018-02-08 12:56:10 +00:00
Mike J Innes
356ebc4e13 deterministic tests 2018-02-08 10:33:51 +00:00
Mike J Innes
fcbdc49d6b fix reserve usage 2018-02-08 10:27:26 +00:00
Mike J Innes
bc452fcd81 rewrite tests 2018-02-08 02:37:55 +00:00
Mike J Innes
d592f4e327 batch support 2018-02-08 01:45:48 +00:00
Mike J Innes
b8f148b012 hook up backward passes 2018-02-08 00:49:39 +00:00
Mike J Innes
a1d1930097 Merge branch 'master' into curnn 2018-02-07 23:23:02 +00:00
Mike J Innes
4511936a87 fixes #116 2018-02-07 23:21:04 +00:00
Mike J Innes
dc15d6c155
Merge pull request #168 from FluxML/ad
Separate AD infrastructure from types
2018-02-07 23:09:17 +00:00
Mike J Innes
0ac924e8e1 fixups 2018-02-07 22:52:46 +00:00
Mike J Innes
39f7f8fdf3 tracked tuples 2018-02-07 22:21:42 +00:00
Mike J Innes
79e4e25fea separate number type 2018-02-07 20:39:36 +00:00
Mike J Innes
282889970d separate tracking infrastructure from array wrapper 2018-02-07 17:43:25 +00:00
Mike J Innes
30b3437c56 backward passes 2018-02-06 18:56:17 +00:00
Mike J Innes
f866fbe575 nullable c refactor 2018-02-06 15:01:48 +00:00
Mike J Innes
07e1b1e0a9 avoid val 2018-02-06 12:44:18 +00:00
boathit
7e37a96c6f Register back! for logsigmoid and implement (logit)binarycrossentropy 2018-02-06 19:36:16 +08:00
boathit
6e65789828 Register back! for logsigmoid and implement (logit)binarycrossentropy 2018-02-06 19:32:46 +08:00
Mike J Innes
f9be72f545 logsoftmax tests 2018-02-05 18:50:59 +00:00
Mike J Innes
a4bf5936b0 diagm 2018-02-05 18:29:35 +00:00
Mike J Innes
2fec75005d
Merge pull request #123 from GenaBitu/cat-fix
Added vcat for multiple TrackedVectors
2018-02-05 18:10:48 +00:00
Mike J Innes
c6b12217bd make some tests less trivial 2018-02-05 18:10:02 +00:00
Mike J Innes
47cebab26e test multiple inputs/dims 2018-02-05 18:09:54 +00:00
Mike J Innes
2a2475a9c2 get tracker graph 2018-02-05 17:40:07 +00:00
Mike J Innes
14086b8c2d train forward pass 2018-02-02 17:48:08 +00:00
Mike J Innes
9a6fcf057b hook up interface 2018-02-02 16:42:18 +00:00
Mike J Innes
b1c5786012 Merge branch 'master' into curnn 2018-02-02 15:56:44 +00:00
Mike J Innes
49e1e78f67 make data/value available 2018-02-02 15:56:04 +00:00
Mike J Innes
62c3376feb link fix 2018-02-02 15:56:04 +00:00
Mike J Innes
0f1e7b5578 update rnn structure 2018-02-01 20:57:39 +00:00
Mike J Innes
106502a75d typo 2018-01-31 21:57:04 +00:00
Mike J Innes
af3ccf85ff coagulate gates 2018-01-31 16:56:27 +00:00
Mike J Innes
8ad837bb70 LSTM 2018-01-31 14:15:57 +00:00
Mike J Innes
4bfb603da6 gru forward 2018-01-31 13:46:55 +00:00
Mike J Innes
b1bb05403c basic forward pass 2018-01-30 18:18:37 +00:00
Mike J Innes
0b886507dc param offsets 2018-01-30 14:43:39 +00:00
Mike J Innes
af0c5523ff rnnTrainingReserveSize 2018-01-30 14:43:39 +00:00
Mike J Innes
3fb83d642d rnnWorkspaceSize 2018-01-30 14:43:39 +00:00
Mike J Innes
6b4e114d5d rnnParamSize 2018-01-30 14:43:39 +00:00
Mike J Innes
ee6c3e18a9 basic RNNDesc 2018-01-30 14:43:39 +00:00
Mike J Innes
842bf03051 typo 2018-01-30 14:43:05 +00:00
Mike J Innes
0c9549c469 rm lazy 2018-01-24 13:28:52 +00:00
Mike J Innes
5118ef9163 remove batching work for now 2018-01-24 13:12:38 +00:00
Mike J Innes
2b545cf0ca gpu testing message 2018-01-24 13:12:26 +00:00
Mike J Innes
ed8d026723
Merge pull request #153 from boathit/master
Registering backward function for logsoftmax
2018-01-23 13:54:55 +00:00
boathit
374d7a5f1e Registering backward function for logsoftmax 2018-01-21 15:20:59 +08:00
Mike J Innes
72eabde373 load data 2018-01-17 16:39:55 +00:00
Mike J Innes
bd57359535 docstrings 2018-01-17 16:12:12 +00:00
Mike J Innes
8cca7accf2 mnist 2018-01-17 15:55:37 +00:00
Mike J Innes
140d50208f recommend update 2018-01-17 13:56:19 +00:00
Mike J Innes
4207fb98f2 basic GPU tests 2018-01-16 17:58:14 +00:00
GenaBitu
096e20c5af
Added vcat(...) test 2018-01-16 11:08:45 +01:00
GenaBitu
bc8a32bc56
Merge branch 'master' into cat-fix 2018-01-16 11:01:31 +01:00
Mike J Innes
1beb30e19a closes #118 2018-01-15 17:00:47 +00:00
Mike J Innes
872d5b902c
Update README.md 2018-01-15 16:17:14 +00:00
Mike J Innes
b0b8c9dbd1
Merge pull request #112 from baggepinnen/gru
Implement Gated Recurrent Unit
2018-01-10 14:13:01 +00:00
Mike J Innes
8f8589a7f4 fix initialisation 2018-01-10 14:11:52 +00:00
Mike J Innes
b44237468e Merge branch 'master' into gru 2018-01-10 13:59:33 +00:00
Mike J Innes
805cb9178f fixes #146 2018-01-10 12:48:50 +00:00
Mike J Innes
daba42c595
Merge pull request #144 from mtikekar/master
fix typo in conv.jl (fixes #133)
2018-01-10 12:32:50 +00:00
Mehul Tikekar
2fef799109 fix typo in conv.jl (fixes #133) 2018-01-08 16:46:58 -05:00
Mike J Innes
468f641f66 use Adapt 2018-01-08 16:34:22 +00:00
Mike J Innes
9cfa459516
Merge pull request #138 from rdeits/forwarddiff
explicit ForwardDiff version requirement
2018-01-04 10:35:11 +00:00
Robin Deits
7ac6b3fccf explicit forwarddiff requirement 2018-01-03 14:41:20 -05:00
Mike J Innes
98b362729d pool padding 2017-12-18 18:18:14 +00:00
Mike J Innes
6b6974e14a
Merge pull request #128 from FluxML/conv
Convolution API
2017-12-18 18:09:27 +00:00
Mike J Innes
e3577d759c conv docs 2017-12-18 18:05:48 +00:00
Mike J Innes
269d8f36b9 conv padding 2017-12-18 18:05:38 +00:00
Mike J Innes
51f93d9f0e conv polish 2017-12-15 16:24:45 +00:00
Mike J Innes
386eafc443 reshape 2017-12-15 16:18:16 +00:00
Mike J Innes
73ae25289d remove old util 2017-12-15 16:18:01 +00:00
Mike J Innes
6890a61587 todo 2017-12-15 16:17:45 +00:00
Mike J Innes
9b833a4345 more onehot indexing 2017-12-15 16:17:39 +00:00
Mike J Innes
9d0dd9fb7e layer wip 2017-12-15 13:22:57 +00:00
Mike J Innes
0bf22dfb8e pool gradients 2017-12-15 02:29:14 +00:00
Mike J Innes
d949b31aa5 conv gradient 2017-12-15 02:24:32 +00:00
Mike J Innes
5b97d2ba04 closes #127 2017-12-13 18:24:56 +00:00
Mike J Innes
23096824d5 import jacobian 2017-12-13 17:29:32 +00:00
Mike J Innes
9c7c9d2342
Merge pull request #124 from baggepinnen/jacobian
Add jacobian function
2017-12-13 17:07:57 +00:00
Mike J Innes
95d1287455 Merge branch 'master' into jacobian 2017-12-13 17:06:23 +00:00
Mike J Innes
27d896943e
Merge pull request #120 from staticfloat/sf/dense_initialization
Better default initialization for Dense layers
2017-12-13 16:18:02 +00:00
Mike J Innes
9926202408
Merge pull request #122 from staticfloat/sf/weighted_crossentropy
Add `weighted_crossentropy` for imbalanced classification problems
2017-12-13 15:29:54 +00:00
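A standalone sketch of the idea, where per-class weights scale each class's contribution to the loss; per the `use kwarg` commit just below, this was ultimately folded into a keyword argument rather than kept as a separate function. Values here are toy examples.

```julia
# Columns of ŷ and y are samples, rows are classes; w up-weights rare classes.
weighted_crossentropy(ŷ, y, w) = -sum(w .* y .* log.(ŷ)) / size(y, 2)

ŷ = [0.9 0.2; 0.1 0.8]  # predicted class probabilities
y = [1.0 0.0; 0.0 1.0]  # one-hot targets
w = [0.3, 0.7]          # make the second (rarer) class count more
weighted_crossentropy(ŷ, y, w)
```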
Mike J Innes
e3a688e706 use kwarg 2017-12-13 15:27:15 +00:00
Mike J Innes
128725cefd Merge branch 'master' into sf/weighted_crossentropy 2017-12-13 15:14:47 +00:00
Mike J Innes
29787eba45 fixes #114 2017-12-12 17:23:15 +00:00
Mike J Innes
b7b6c975bc fixes #110 2017-12-12 17:07:39 +00:00
Mike J Innes
403cc26327 Merge branch 'master' into gru 2017-12-12 16:54:00 +00:00
Mike J Innes
86097e76fd tweak batchnorm example 2017-12-08 19:34:34 +00:00
Mike J Innes
de69d23901
Merge pull request #84 from iblis17/norm-layer
layer: implement BatchNorm layer
2017-12-08 19:32:55 +00:00
Mike J Innes
6f997e798a Merge branch 'master' into batchnorm 2017-12-08 19:31:50 +00:00
Mike J Innes
1d916c81b5 Merge branch 'master' into HEAD 2017-12-08 18:31:55 +00:00
Mike J Innes
e01c706e71
Merge pull request #119 from baggepinnen/amsgrad
Amsgrad
2017-12-08 18:24:54 +00:00
Mike J Innes
55bbe50f32 regression test 2017-12-08 18:24:07 +00:00
Mike J Innes
24a6569589 Merge branch 'master' into amsgrad 2017-12-08 18:20:53 +00:00
Mike J Innes
9c61cf61ef
Merge pull request #94 from CarloLucibello/dropout
improve optimizers
2017-12-08 17:14:54 +00:00
Mike J Innes
69cc5642b4 regression testing 2017-12-08 17:10:29 +00:00
Mike J Innes
f82dbf4798 Merge branch 'master' into HEAD 2017-12-08 17:00:31 +00:00
Mike J Innes
951c21366a fix regex 2017-12-08 16:42:30 +00:00
GenaBitu
7e51418679
Added back for multi-parameter vcat 2017-12-08 16:10:09 +01:00
baggepinnen
385dee9d16 Add jacobian function 2017-12-08 14:46:12 +01:00
GenaBitu
41f3eedc39
Proper multi-variable vcat 2017-12-07 17:50:18 +01:00
Elliot Saba
41446d547f Add weighted_crossentropy for imbalanced classification problems 2017-12-05 17:09:05 -08:00
Elliot Saba
c59b820bed Add glorot (Xavier) initialization
Set default `Dense` and `RNN` inits to `glorot_uniform()` for `W`, `zeros` for `b`.
2017-12-05 14:24:48 -08:00
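For reference, a minimal sketch of Glorot/Xavier uniform initialisation: draw weights from U(-s, s) with s = sqrt(6 / (fan_in + fan_out)). Flux exports `glorot_uniform`; this standalone version is for illustration only.

```julia
# (rand - 0.5) is U(-0.5, 0.5); scaling by 2s gives U(-s, s), and 2s = sqrt(24 / (fan_in + fan_out)).
glorot_uniform_sketch(fan_out, fan_in) =
    (rand(Float32, fan_out, fan_in) .- 0.5f0) .* sqrt(24f0 / (fan_in + fan_out))

W = glorot_uniform_sketch(10, 784)  # e.g. the weight matrix of a 784 => 10 Dense layer
```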
GenaBitu
62b3600eca
Merge branch 'master' into cat-fix 2017-12-05 11:13:29 +01:00
baggepinnen
41febee9c1 Export and indent 2017-12-04 09:34:27 +01:00
baggepinnen
36001d085a Implement AMSGrad optimiser 2017-12-04 09:17:05 +01:00
Mike J Innes
cab235a578 gpu compat 2017-11-30 13:51:31 +00:00
Mike J Innes
19039f4881 export sigmoid 2017-11-30 13:37:38 +00:00
Mike J Innes
2d33f19346 onehot unk arg 2017-11-29 16:45:50 +00:00
baggepinnen
fa718c7475 Implement Gated Recurrent Unit 2017-11-24 14:33:06 +01:00
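A hedged usage sketch of a GRU-based model in current Flux; sizes and sequence length are arbitrary:

```julia
using Flux

m = Chain(GRU(10, 32), Dense(32, 5), softmax)
xs = [rand(Float32, 10) for _ in 1:20]  # a 20-step sequence of 10-dimensional inputs
ys = [m(x) for x in xs]                 # the GRU carries its hidden state across calls
Flux.reset!(m)                          # clear the state before the next sequence
```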
CarloLucibello
13b934c250 improve optimizers 2017-11-24 12:12:20 +01:00
Mike J Innes
dc1f08a709
Merge pull request #98 from FluxML/log
GPU-ready log function
2017-11-23 17:17:39 +00:00
Mike J Innes
9f5c4dd3e9
Merge pull request #104 from baggepinnen/patch-1
Allow array of optimisers to train!
2017-11-21 17:16:35 +01:00
Mike J Innes
feb35783e6
Merge pull request #95 from FluxML/layernorm
Layer Normalisation
2017-11-21 17:12:49 +01:00
Mike J Innes
351d3d4771 std derivative 2017-11-21 17:04:04 +01:00
Mike J Innes
b06884b912 LayerNorm tweaks 2017-11-21 16:32:36 +01:00
skariel
11d53781b2 adding layer normalization 2017-11-21 16:30:24 +01:00
Mike J Innes
979949d01a style 2017-11-21 15:25:09 +01:00
Mike J Innes
785fbcf68e
Merge pull request #107 from baggepinnen/patch-2
Fix bug in rmsprop and adadelta
2017-11-21 15:24:11 +01:00
Mike J Innes
e51268caf5 mention treelike 2017-11-21 12:59:39 +01:00
Mike J Innes
187fddc11c doc fixes 2017-11-21 12:29:02 +01:00
Fredrik Bagge Carlson
8991ce028c
Fix bug in rmsprop and adadelta
`@. p.Δ = η * p.Δ / √acc` parses correctly, while `@. p.Δ /= √acc*η` parses as `@. p.Δ /= (√acc*η)`, so the step size was effectively `1/η` rather than `η`.
2017-11-14 17:32:16 +01:00
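A standalone illustration of the parsing pitfall described above, with toy numbers rather than Flux's optimiser code:

```julia
η = 0.1
acc = fill(4.0, 3)

Δ_bad = ones(3)
@. Δ_bad /= √acc * η           # divides by the whole product √acc * η, i.e. a step ∝ 1/η

Δ_good = ones(3)
@. Δ_good = η * Δ_good / √acc  # the intended update, a step ∝ η

(Δ_bad, Δ_good)                # ([5.0, 5.0, 5.0], [0.05, 0.05, 0.05])
```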
Mike J Innes
e0657d93ec mv numeric.jl to nnlib 2017-11-09 15:06:29 +00:00
Mike J Innes
2cb94981a0 gpu-ready log 2017-11-09 15:04:01 +00:00
Mike J Innes
e5d99d784e fixes #79 2017-11-09 14:53:26 +00:00
Mike J Innes
ccdc046546 fixes #79 2017-11-09 14:52:28 +00:00
Mike J Innes
752a9e2808 tree utilities 2017-11-08 22:19:01 +00:00
Mike J Innes
6eb2ec154b sentiment treebank loader 2017-11-08 22:19:01 +00:00
Mike J Innes
8777362eee exports 2017-11-08 22:19:01 +00:00
Mike J Innes
8b05317895 basic tree 2017-11-08 22:19:01 +00:00
Mike J Innes
7e9468d8f8 treebank skeleton 2017-11-08 22:19:01 +00:00
Mike J Innes
bdf02e42ae test tweaks 2017-11-08 22:18:45 +00:00
Mike J Innes
fcd091e8f0 Ac_mul_B derivatives 2017-11-08 22:18:45 +00:00
Mike J Innes
d4229c4815 useful params method 2017-11-08 22:18:45 +00:00
Mike J Innes
d6423eefe5 matrix-vector fast path 2017-11-08 22:18:45 +00:00
Fredrik Bagge Carlson
97244e0a68
Allow array of optimisers to train!
This allows an array of optimisers to be sent to `train!`
2017-11-04 13:27:32 +01:00
Mike J Innes
efa51f02e7 basic batch type 2017-11-02 11:49:42 +00:00
Mike J Innes
21ea93ffcd rename treelike 2017-11-02 11:47:34 +00:00
Iblis Lin
6c7613e02b batchnorm: leverage TrackedArray mean 2017-11-02 14:20:34 +08:00
Iblis Lin
88bd8a8fbd batchnorm: make CuArrays happy 2017-11-02 14:02:41 +08:00
Iblis Lin
477da75428 batchnorm: fix mapchildren 2017-11-02 13:32:12 +08:00
Iblis Lin
7f5ba594a9 batchnorm: more test cases 2017-11-02 13:32:12 +08:00
Iblis Lin
5253841acc batchnorm: update docs 2017-11-02 13:32:12 +08:00
Iblis Lin
ce46843459 batchnorm: add test cases 2017-11-02 13:32:12 +08:00
Iblis Lin
b3356cc6bb batchnorm: batch σ correct coefficient 2017-11-02 13:32:12 +08:00
Iblis Lin
e0201be770 batchnorm: parameterize momentum and epsilon 2017-11-02 13:32:12 +08:00
Iblis Lin
669273b008 layer: implement BatchNorm layer
See [Batch Normalization: Accelerating Deep Network Training by Reducing
Internal Covariate Shift](https://arxiv.org/pdf/1502.03167.pdf)
2017-11-02 13:32:12 +08:00
Mike J Innes
e7a510da9a add cmudict dataset 2017-11-01 16:01:55 +00:00
Mike J Innes
0f8ba87dc6 treelike tuples 2017-10-31 16:37:41 +00:00
Mike J Innes
e943a39ee7 combine special cases 2017-10-31 16:37:33 +00:00
Iblis Lin
3d8b7250ae add scalar mean 2017-10-31 10:42:32 +00:00
Mike J Innes
1186170542
Merge pull request #99 from iblis17/mean
TrackedArray: implement `mean`
2017-10-30 16:37:08 +00:00
Iblis Lin
c43bda019b TrackedArray: implement mean
```julia
julia> p
Tracked 2×3 Array{Float64,2}:
 1.0  3.0  5.0
 2.0  4.0  6.0
```

Before
```julia
julia> @benchmark Flux.Tracker.back!(sum($p, 2) ./ size($p, 2), ones(2, 1))
BenchmarkTools.Trial:
  memory estimate:  3.44 KiB
  allocs estimate:  75
  --------------
  minimum time:     20.438 μs (0.00% GC)
  median time:      21.239 μs (0.00% GC)
  mean time:        22.354 μs (1.68% GC)
  maximum time:     3.811 ms (98.51% GC)
  --------------
  samples:          10000
  evals/sample:     1
```

After
```julia
julia> @benchmark Flux.Tracker.back!(mean($p, 2), ones(2, 1))
BenchmarkTools.Trial:
  memory estimate:  1008 bytes
  allocs estimate:  21
  --------------
  minimum time:     5.973 μs (0.00% GC)
  median time:      6.310 μs (0.00% GC)
  mean time:        6.630 μs (1.96% GC)
  maximum time:     680.709 μs (97.28% GC)
  --------------
  samples:          10000
  evals/sample:     6
```
2017-10-30 16:21:02 +08:00
Mike J Innes
4c1b1eb18c Merge pull request #92 from CarloLucibello/drop
add Dropout layer
2017-10-26 12:07:28 +01:00
Mike J Innes
84efbbcc84 tracker predicate tweaks 2017-10-26 12:06:29 +01:00
Mike J Innes
cf6b930f63 reorganise 2017-10-26 11:46:12 +01:00
Mike J Innes
0df300299f clearer error message, fixes #93 2017-10-26 11:15:14 +01:00
GenaBitu
df06c3351d
Merge branch 'master' into cat-fix 2017-10-26 00:52:29 +02:00
CarloLucibello
711ea09d99 address comments 2017-10-25 02:35:27 +02:00
CarloLucibello
536ab3861d setmode! -> testmode! 2017-10-23 16:23:29 +02:00
Mike J Innes
6768afda29 minor wording 2017-10-23 13:07:07 +01:00
CarloLucibello
00a9e5f01f construct TrackedScalar with params(1) 2017-10-23 10:49:45 +01:00
CarloLucibello
86c7c9246e add == and < for tracked arrays 2017-10-23 11:41:08 +02:00
CarloLucibello
2e1ed4c3fc add dropout 2017-10-23 10:12:53 +02:00
Mike J Innes
2a66545ef8 rnn state reset 2017-10-19 17:21:08 +01:00
Mike J Innes
99a7697d13 adam eta default arg 2017-10-19 14:31:34 +01:00
Mike J Innes
e5c8f6d835 only export known good optimisers 2017-10-19 11:26:11 +01:00
Mike J Innes
5b6a5667ed tracked array restructure 2017-10-18 22:54:58 +01:00
Mike J Innes
c8d4844da4 chunk util 2017-10-18 17:07:58 +01:00
Mike J Innes
07ad7cfa40 learning rate as default arg 2017-10-18 17:07:49 +01:00
Mike J Innes
e82428bb83 batching docs 2017-10-18 16:40:14 +01:00
Mike J Innes
b817ce632c syntax highlighting 2017-10-18 15:44:06 +01:00
Mike J Innes
fd249b773e rnn docs 2017-10-18 15:30:05 +01:00
Mike J Innes
897f812055 remove this until nnlib release 2017-10-18 14:52:48 +01:00
Mike J Innes
190f48a709 nnlib docs 2017-10-18 14:40:58 +01:00
Mike J Innes
12944ae125 nnlib exports 2017-10-18 12:56:58 +01:00
Mike J Innes
0fbc8dff61 typo 2017-10-18 12:48:58 +01:00
Mike J Innes
d6dd27dae5 dense layer example 2017-10-18 12:47:45 +01:00
Mike J Innes
92f65f91c5 update community page 2017-10-18 12:39:58 +01:00
Mike J Innes
55ba236da6 model layers note 2017-10-18 12:31:06 +01:00
Mike J Innes
c4166fd725 optimiser clarity 2017-10-18 12:22:45 +01:00
Mike J Innes
7426faf37d optimiser docs 2017-10-18 12:09:48 +01:00
GenaBitu
2084df96ae
Merge branch 'master' into cat-fix 2017-10-06 15:00:26 +02:00
GenaBitu
136f9bbf74
Hack which doesn't break backprop 2017-09-22 11:47:04 +02:00
GenaBitu
a5fe5b6e65
Added multi-variable vcat for TrackedVector 2017-09-22 11:22:21 +02:00
87 changed files with 7061 additions and 746 deletions

2
.gitattributes vendored Normal file

@ -0,0 +1,2 @@
paper/* linguist-documentation
CITATION.bib linguist-detectable=false

1
.github/FUNDING.yml vendored Normal file

@ -0,0 +1 @@
custom: https://numfocus.salsalabs.org/donate-to-julia/index.html

12
.github/pull_request_template.md vendored Normal file

@ -0,0 +1,12 @@
[Please delete this text and describe your change here.
For bugfixes, please detail the bug and include a test case which your patch fixes.
If you are adding a new feature, please clearly describe the design, its rationale, the possible alternatives considered.
It is easiest to merge new features when there is clear precedent in other systems; we need to know we're taking
the right direction since it can be hard to change later.]
### PR Checklist
- [ ] Tests are added
- [ ] Entry in NEWS.md
- [ ] Documentation, if applicable
- [ ] Final review from `@MikeInnes` or `@dhairyagandhi96` (for API changes).

16
.github/workflows/CompatHelper.yml vendored Normal file

@ -0,0 +1,16 @@
name: CompatHelper
on:
schedule:
- cron: '00 00 * * *'
jobs:
CompatHelper:
runs-on: ubuntu-latest
steps:
- name: Pkg.add("CompatHelper")
run: julia -e 'using Pkg; Pkg.add("CompatHelper")'
- name: CompatHelper.main()
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: julia -e 'using CompatHelper; CompatHelper.main()'

11
.github/workflows/TagBot.yml vendored Normal file

@ -0,0 +1,11 @@
name: TagBot
on:
schedule:
- cron: 0 * * * *
jobs:
TagBot:
runs-on: ubuntu-latest
steps:
- uses: JuliaRegistries/TagBot@v1
with:
token: ${{ secrets.GITHUB_TOKEN }}

3
.gitignore vendored

@ -3,5 +3,4 @@
*.jl.mem
docs/build/
docs/site/
docs/flux.css
demos
deps

41
.gitlab-ci.yml Normal file

@ -0,0 +1,41 @@
include:
- 'https://raw.githubusercontent.com/JuliaGPU/gitlab-ci/master/templates/v6.yml'
image: nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04
# julia:1.0:
# extends:
# - .julia:1.0
# - .test
# tags:
# - nvidia
#
# julia:1.1:
# extends:
# - .julia:1.1
# - .test
# tags:
# - nvidia
#
# julia:1.2:
# extends:
# - .julia:1.2
# - .test
# tags:
# - nvidia
julia:1.3:
extends:
- .julia:1.3
- .test
tags:
- nvidia
julia:nightly:
extends:
- .julia:nightly
- .test
tags:
- nvidia
allow_failure: true


@ -1,14 +1,32 @@
# Documentation: http://docs.travis-ci.com/user/languages/julia/
language: julia
os:
- linux
# - osx
julia:
- 0.6
# uncomment the following lines to override the default test script
- 1.3
- 1
- nightly
notifications:
email: false
jobs:
include:
- stage: "Documentation"
julia: 1.3
os: linux
script:
- julia --project=docs/ -e 'using Pkg; Pkg.develop(PackageSpec(path=pwd()));
Pkg.instantiate()'
- julia --project=docs/ docs/make.jl
after_success: skip
allow_failures:
- julia: nightly
## uncomment the following lines to override the default test script
script:
- if [[ -a .git/shallow ]]; then git fetch --unshallow; fi
- julia -e 'Pkg.clone(pwd()); Pkg.build("Flux"); Pkg.test("Flux"; coverage=true)'
after_success:
- julia -e 'Pkg.add("Documenter")'
- julia -e 'cd(Pkg.dir("Flux")); include(joinpath("docs", "make.jl"))'
- julia --color=yes -e 'using Pkg; Pkg.activate(); Pkg.instantiate(); Pkg.test()'

29
CITATION.bib Normal file

@ -0,0 +1,29 @@
@article{Flux.jl-2018,
author = {Michael Innes and
Elliot Saba and
Keno Fischer and
Dhairya Gandhi and
Marco Concetto Rudilosso and
Neethu Mariya Joy and
Tejan Karmali and
Avik Pal and
Viral Shah},
title = {Fashionable Modelling with Flux},
journal = {CoRR},
volume = {abs/1811.01457},
year = {2018},
url = {http://arxiv.org/abs/1811.01457},
archivePrefix = {arXiv},
eprint = {1811.01457},
timestamp = {Thu, 22 Nov 2018 17:58:30 +0100},
biburl = {https://dblp.org/rec/bib/journals/corr/abs-1811-01457},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
@article{innes:2018,
author = {Mike Innes},
title = {Flux: Elegant Machine Learning with Julia},
journal = {Journal of Open Source Software},
year = {2018},
doi = {10.21105/joss.00602},
}


@ -1,6 +1,6 @@
The Flux.jl package is licensed under the MIT "Expat" License:
> Copyright (c) 2016: Mike Innes.
> Copyright (c) 2016-19: Julia Computing, Inc., Mike Innes and Contributors
>
> Permission is hereby granted, free of charge, to any person obtaining
> a copy of this software and associated documentation files (the

387
Manifest.toml Normal file

@ -0,0 +1,387 @@
# This file is machine-generated - editing it directly is not advised
[[AbstractFFTs]]
deps = ["LinearAlgebra"]
git-tree-sha1 = "051c95d6836228d120f5f4b984dd5aba1624f716"
uuid = "621f4979-c628-5d54-868e-fcf4e3e8185c"
version = "0.5.0"
[[AbstractTrees]]
deps = ["Markdown"]
git-tree-sha1 = "33e450545eaf7699da1a6e755f9ea65f14077a45"
uuid = "1520ce14-60c1-5f80-bbc7-55ef81b5835c"
version = "0.3.3"
[[Adapt]]
deps = ["LinearAlgebra"]
git-tree-sha1 = "fd04049c7dd78cfef0b06cdc1f0f181467655712"
uuid = "79e6a3ab-5dfb-504d-930d-738a2a938a0e"
version = "1.1.0"
[[ArrayLayouts]]
deps = ["FillArrays", "LinearAlgebra"]
git-tree-sha1 = "a504dca2ac7eda8761c8f7c1ed52427a1be75a3c"
uuid = "4c555306-a7a7-4459-81d9-ec55ddd5c99a"
version = "0.2.6"
[[Base64]]
uuid = "2a0f44e3-6c83-55bd-87e4-b1978d98bd5f"
[[BinaryProvider]]
deps = ["Libdl", "Logging", "SHA"]
git-tree-sha1 = "ecdec412a9abc8db54c0efc5548c64dfce072058"
uuid = "b99e7846-7c00-51b0-8f62-c81ae34c0232"
version = "0.5.10"
[[CEnum]]
git-tree-sha1 = "1b77a77c3b28e0b3f413f7567c9bb8dd9bdccd14"
uuid = "fa961155-64e5-5f13-b03f-caf6b980ea82"
version = "0.3.0"
[[CUDAapi]]
deps = ["Libdl", "Logging"]
git-tree-sha1 = "831b825d10104bd29e28f6da93312a976830717b"
uuid = "3895d2a7-ec45-59b8-82bb-cfc6a382f9b3"
version = "4.0.0"
[[CUDAdrv]]
deps = ["CEnum", "CUDAapi", "Printf"]
git-tree-sha1 = "f56bbf18c86bcff7a961a32a4947a5abb2963a29"
uuid = "c5f51814-7f29-56b8-a69c-e4d8f6be1fde"
version = "6.3.0"
[[CUDAnative]]
deps = ["Adapt", "BinaryProvider", "CEnum", "CUDAapi", "CUDAdrv", "ExprTools", "GPUCompiler", "LLVM", "Libdl", "Pkg", "Printf"]
git-tree-sha1 = "ac86db2b05fdfec96b011e25a504ffe7476e8a68"
uuid = "be33ccc6-a3ff-5ff2-a52e-74243cff1e17"
version = "3.1.0"
[[CodeTracking]]
deps = ["InteractiveUtils", "UUIDs"]
git-tree-sha1 = "cab4da992adc0a64f63fa30d2db2fd8bec40cab4"
uuid = "da1fd8a2-8d9e-5ec2-8556-3022fb5608a2"
version = "0.5.11"
[[CodecZlib]]
deps = ["TranscodingStreams", "Zlib_jll"]
git-tree-sha1 = "ded953804d019afa9a3f98981d99b33e3db7b6da"
uuid = "944b1d66-785c-5afd-91f1-9de20f533193"
version = "0.7.0"
[[ColorTypes]]
deps = ["FixedPointNumbers", "Random"]
git-tree-sha1 = "c73d9cfc2a9d8433dc77f5bff4bddf46b1d78c20"
uuid = "3da002f7-5984-5a60-b8a6-cbb66c0b333f"
version = "0.10.3"
[[Colors]]
deps = ["ColorTypes", "FixedPointNumbers", "InteractiveUtils", "Reexport"]
git-tree-sha1 = "1e9bba7984e78aa8cdeea7f9f7cc984ad4e4b1c7"
uuid = "5ae59095-9a9b-59fe-a467-6f913c188581"
version = "0.12.2"
[[CommonSubexpressions]]
deps = ["Test"]
git-tree-sha1 = "efdaf19ab11c7889334ca247ff4c9f7c322817b0"
uuid = "bbf7d656-a473-5ed7-a52c-81e309532950"
version = "0.2.0"
[[CompilerSupportLibraries_jll]]
deps = ["Libdl", "Pkg"]
git-tree-sha1 = "7c4f882c41faa72118841185afc58a2eb00ef612"
uuid = "e66e0078-7015-5450-92f7-15fbd957f2ae"
version = "0.3.3+0"
[[Cthulhu]]
deps = ["CodeTracking", "InteractiveUtils", "REPL", "UUIDs", "Unicode"]
git-tree-sha1 = "f3643e78353199d3097821e806348bd83f364155"
uuid = "f68482b8-f384-11e8-15f7-abe071a5a75f"
version = "1.1.1"
[[CuArrays]]
deps = ["AbstractFFTs", "Adapt", "CEnum", "CUDAapi", "CUDAdrv", "CUDAnative", "DataStructures", "GPUArrays", "Libdl", "LinearAlgebra", "MacroTools", "NNlib", "Pkg", "Printf", "Random", "Reexport", "Requires", "SparseArrays", "Statistics", "TimerOutputs"]
git-tree-sha1 = "1582b74d2322df7dd94549d4ac9d095e0f20e884"
uuid = "3a865a2d-5b23-5a0f-bc46-62713ec82fae"
version = "2.2.1"
[[DataAPI]]
git-tree-sha1 = "176e23402d80e7743fc26c19c681bfb11246af32"
uuid = "9a962f9c-6df0-11e9-0e5d-c546b8b5ee8a"
version = "1.3.0"
[[DataStructures]]
deps = ["InteractiveUtils", "OrderedCollections"]
git-tree-sha1 = "af6d9c86e191c917c2276fbede1137e8ea20157f"
uuid = "864edb3b-99cc-5e75-8d2d-829cb0a9cfe8"
version = "0.17.17"
[[Dates]]
deps = ["Printf"]
uuid = "ade2ca70-3891-5945-98fb-dc099432e06a"
[[DelimitedFiles]]
deps = ["Mmap"]
uuid = "8bb1440f-4735-579b-a4ab-409b98df4dab"
[[DiffResults]]
deps = ["StaticArrays"]
git-tree-sha1 = "da24935df8e0c6cf28de340b958f6aac88eaa0cc"
uuid = "163ba53b-c6d8-5494-b064-1a9d43ac40c5"
version = "1.0.2"
[[DiffRules]]
deps = ["NaNMath", "Random", "SpecialFunctions"]
git-tree-sha1 = "eb0c34204c8410888844ada5359ac8b96292cfd1"
uuid = "b552c78f-8df3-52c6-915a-8e097449b14b"
version = "1.0.1"
[[Distributed]]
deps = ["Random", "Serialization", "Sockets"]
uuid = "8ba89e20-285c-5b6f-9357-94700520ee1b"
[[ExprTools]]
git-tree-sha1 = "6f0517056812fd6aa3af23d4b70d5325a2ae4e95"
uuid = "e2ba6199-217a-4e67-a87a-7c52f15ade04"
version = "0.1.1"
[[FillArrays]]
deps = ["LinearAlgebra", "Random", "SparseArrays"]
git-tree-sha1 = "44f561e293987ffc84272cd3d2b14b0b93123d63"
uuid = "1a297f60-69ca-5386-bcde-b61e274b549b"
version = "0.8.10"
[[FixedPointNumbers]]
git-tree-sha1 = "3ba9ea634d4c8b289d590403b4a06f8e227a6238"
uuid = "53c48c17-4a7d-5ca2-90c5-79b7896eea93"
version = "0.8.0"
[[ForwardDiff]]
deps = ["CommonSubexpressions", "DiffResults", "DiffRules", "NaNMath", "Random", "SpecialFunctions", "StaticArrays"]
git-tree-sha1 = "869540e4367122fbffaace383a5bdc34d6e5e5ac"
uuid = "f6369f11-7733-5829-9624-2563aa707210"
version = "0.10.10"
[[Functors]]
deps = ["MacroTools"]
git-tree-sha1 = "f40adc6422f548176bb4351ebd29e4abf773040a"
uuid = "d9f16b24-f501-4c13-a1f2-28368ffc5196"
version = "0.1.0"
[[Future]]
deps = ["Random"]
uuid = "9fa8497b-333b-5362-9e8d-4d0656e87820"
[[GPUArrays]]
deps = ["AbstractFFTs", "Adapt", "LinearAlgebra", "Printf", "Random", "Serialization"]
git-tree-sha1 = "d887693eb1bd5e1fd573262a978745481895ec7d"
uuid = "0c68f7d7-f131-5f86-a1c3-88cf8149b2d7"
version = "3.4.1"
[[GPUCompiler]]
deps = ["Cthulhu", "DataStructures", "InteractiveUtils", "LLVM", "Libdl", "TimerOutputs"]
git-tree-sha1 = "5275aa268ecd09640b32560e1eae90c78816e4d1"
uuid = "61eb1bfa-7361-4325-ad38-22787b887f55"
version = "0.2.0"
[[IRTools]]
deps = ["InteractiveUtils", "MacroTools", "Test"]
git-tree-sha1 = "90ee39f9beaaa186e4968417ea2b8ed5673c91c0"
uuid = "7869d1d1-7146-5819-86e3-90919afe41df"
version = "0.3.3"
[[InteractiveUtils]]
deps = ["Markdown"]
uuid = "b77e0a4c-d291-57a0-90e8-8db25a27a240"
[[Juno]]
deps = ["Base64", "Logging", "Media", "Profile"]
git-tree-sha1 = "a686b0cf235fa3e491b79b4783c2d2382292b436"
uuid = "e5e0dc1b-0480-54bc-9374-aad01c23163d"
version = "0.8.2"
[[LLVM]]
deps = ["CEnum", "Libdl", "Printf", "Unicode"]
git-tree-sha1 = "dd3f584c3dbefe39b2a8fbafa1a3b77e31e21255"
uuid = "929cbde3-209d-540e-8aea-75f648917ca0"
version = "1.5.1"
[[LibGit2]]
deps = ["Printf"]
uuid = "76f85450-5226-5b5a-8eaa-529ad045b433"
[[Libdl]]
uuid = "8f399da3-3557-5675-b5ff-fb832c97cbdb"
[[LinearAlgebra]]
deps = ["Libdl"]
uuid = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
[[Logging]]
uuid = "56ddb016-857b-54e1-b83d-db4d58db5568"
[[MacroTools]]
deps = ["Markdown", "Random"]
git-tree-sha1 = "f7d2e3f654af75f01ec49be82c231c382214223a"
uuid = "1914dd2f-81c6-5fcd-8719-6d5c9610ff09"
version = "0.5.5"
[[Markdown]]
deps = ["Base64"]
uuid = "d6f4376e-aef5-505a-96c1-9c027394607a"
[[Media]]
deps = ["MacroTools", "Test"]
git-tree-sha1 = "75a54abd10709c01f1b86b84ec225d26e840ed58"
uuid = "e89f7d12-3494-54d1-8411-f7d8b9ae1f27"
version = "0.5.0"
[[Missings]]
deps = ["DataAPI"]
git-tree-sha1 = "de0a5ce9e5289f27df672ffabef4d1e5861247d5"
uuid = "e1d29d7a-bbdc-5cf2-9ac0-f12de2c33e28"
version = "0.4.3"
[[Mmap]]
uuid = "a63ad114-7e13-5084-954f-fe012c677804"
[[NNlib]]
deps = ["BinaryProvider", "Libdl", "LinearAlgebra", "Requires", "Statistics"]
git-tree-sha1 = "d9f196d911f55aeaff11b11f681b135980783824"
uuid = "872c559c-99b0-510c-b3b7-b6c96a88d5cd"
version = "0.6.6"
[[NaNMath]]
git-tree-sha1 = "928b8ca9b2791081dc71a51c55347c27c618760f"
uuid = "77ba4419-2d1f-58cd-9bb1-8ffee604a2e3"
version = "0.3.3"
[[OpenSpecFun_jll]]
deps = ["CompilerSupportLibraries_jll", "Libdl", "Pkg"]
git-tree-sha1 = "d51c416559217d974a1113522d5919235ae67a87"
uuid = "efe28fd5-8261-553b-a9e1-b2916fc3738e"
version = "0.5.3+3"
[[OrderedCollections]]
git-tree-sha1 = "12ce190210d278e12644bcadf5b21cbdcf225cd3"
uuid = "bac558e1-5e72-5ebc-8fee-abe8a469f55d"
version = "1.2.0"
[[Pkg]]
deps = ["Dates", "LibGit2", "Libdl", "Logging", "Markdown", "Printf", "REPL", "Random", "SHA", "UUIDs"]
uuid = "44cfe95a-1eb2-52ea-b672-e2afdf69b78f"
[[Printf]]
deps = ["Unicode"]
uuid = "de0858da-6303-5e67-8744-51eddeeeb8d7"
[[Profile]]
deps = ["Printf"]
uuid = "9abbd945-dff8-562f-b5e8-e1ebf5ef1b79"
[[REPL]]
deps = ["InteractiveUtils", "Markdown", "Sockets"]
uuid = "3fa0cd96-eef1-5676-8a61-b3b8758bbffb"
[[Random]]
deps = ["Serialization"]
uuid = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
[[Reexport]]
deps = ["Pkg"]
git-tree-sha1 = "7b1d07f411bc8ddb7977ec7f377b97b158514fe0"
uuid = "189a3867-3050-52da-a836-e630ba90ab69"
version = "0.2.0"
[[Requires]]
deps = ["UUIDs"]
git-tree-sha1 = "d37400976e98018ee840e0ca4f9d20baa231dc6b"
uuid = "ae029012-a4dd-5104-9daa-d747884805df"
version = "1.0.1"
[[SHA]]
uuid = "ea8e919c-243c-51af-8825-aaa63cd721ce"
[[Serialization]]
uuid = "9e88b42a-f829-5b0c-bbe9-9e923198166b"
[[Sockets]]
uuid = "6462fe0b-24de-5631-8697-dd941f90decc"
[[SortingAlgorithms]]
deps = ["DataStructures", "Random", "Test"]
git-tree-sha1 = "03f5898c9959f8115e30bc7226ada7d0df554ddd"
uuid = "a2af1166-a08f-5f64-846c-94a0d3cef48c"
version = "0.3.1"
[[SparseArrays]]
deps = ["LinearAlgebra", "Random"]
uuid = "2f01184e-e22b-5df5-ae63-d93ebab69eaf"
[[SpecialFunctions]]
deps = ["OpenSpecFun_jll"]
git-tree-sha1 = "d8d8b8a9f4119829410ecd706da4cc8594a1e020"
uuid = "276daf66-3868-5448-9aa4-cd146d93841b"
version = "0.10.3"
[[StaticArrays]]
deps = ["LinearAlgebra", "Random", "Statistics"]
git-tree-sha1 = "5c06c0aeb81bef54aed4b3f446847905eb6cbda0"
uuid = "90137ffa-7385-5640-81b9-e52037218182"
version = "0.12.3"
[[Statistics]]
deps = ["LinearAlgebra", "SparseArrays"]
uuid = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
[[StatsBase]]
deps = ["DataAPI", "DataStructures", "LinearAlgebra", "Missings", "Printf", "Random", "SortingAlgorithms", "SparseArrays", "Statistics"]
git-tree-sha1 = "a6102b1f364befdb05746f386b67c6b7e3262c45"
uuid = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"
version = "0.33.0"
[[Test]]
deps = ["Distributed", "InteractiveUtils", "Logging", "Random"]
uuid = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
[[TimerOutputs]]
deps = ["Printf"]
git-tree-sha1 = "f458ca23ff80e46a630922c555d838303e4b9603"
uuid = "a759f4b9-e2f1-59dc-863e-4aeb61b1ea8f"
version = "0.5.6"
[[TranscodingStreams]]
deps = ["Random", "Test"]
git-tree-sha1 = "7c53c35547de1c5b9d46a4797cf6d8253807108c"
uuid = "3bb67fe8-82b1-5028-8e26-92a6c54297fa"
version = "0.9.5"
[[UUIDs]]
deps = ["Random", "SHA"]
uuid = "cf7118a7-6976-5b1a-9a39-7adc72f591a4"
[[Unicode]]
uuid = "4ec0a83e-493e-50e2-b9ac-8f72acf5a8f5"
[[ZipFile]]
deps = ["Libdl", "Printf", "Zlib_jll"]
git-tree-sha1 = "254975fef2fc526583bb9b7c9420fe66ffe09f2f"
uuid = "a5390f91-8eb1-5f08-bee0-b1d1ffed6cea"
version = "0.9.2"
[[Zlib_jll]]
deps = ["Libdl", "Pkg"]
git-tree-sha1 = "a2e0d558f6031002e380a90613b199e37a8565bf"
uuid = "83775a58-1f1d-513f-b197-d71354ab007a"
version = "1.2.11+10"
[[Zygote]]
deps = ["AbstractFFTs", "ArrayLayouts", "DiffRules", "FillArrays", "ForwardDiff", "Future", "IRTools", "InteractiveUtils", "LinearAlgebra", "MacroTools", "NNlib", "NaNMath", "Random", "Requires", "SpecialFunctions", "Statistics", "ZygoteRules"]
git-tree-sha1 = "707ceea58e2bd0ff3077ab13a92f8355181d3ee4"
uuid = "e88e6eb3-aa80-5325-afca-941959d7151f"
version = "0.4.20"
[[ZygoteRules]]
deps = ["MacroTools"]
git-tree-sha1 = "b3b4882cc9accf6731a08cc39543fbc6b669dca8"
uuid = "700de1a5-db45-46bc-99cf-38207098b444"
version = "0.2.0"

61
NEWS.md Normal file

@ -0,0 +1,61 @@
# v0.11
* Change to `DataLoader`'s constructor [https://github.com/FluxML/Flux.jl/pull/1152]
* Use `DataLoader` with `NamedTuple`s, so that tensors can be accessed by name [https://github.com/FluxML/Flux.jl/pull/1221].
* Error if `Dense` layers' weights and biases are not arrays [https://github.com/FluxML/Flux.jl/pull/1218].
# v0.10.5
* Add option for [same padding](https://github.com/FluxML/Flux.jl/pull/901) to conv and pooling layers by setting `pad=SamePad()`.
* Added the option to set `bias` to [Flux.Zeros](https://github.com/FluxML/Flux.jl/pull/873) to exclude `bias` from being trained.
* Added `GlobalMaxPool` and `GlobalMeanPool` [layers](https://github.com/FluxML/Flux.jl/pull/950) for performing global pooling operations.
* Added `ClipValue` and `ClipNorm` in this [pr](https://github.com/FluxML/Flux.jl/pull/1133) to `Flux.Optimise` to provide a cleaner API for gradient clipping.
* Added new kwarg-only [constructors](https://github.com/FluxML/Flux.jl/pull/873) for the various convolutional layers.
* Documented the convolutional layer constructors accepting `weight` and `bias` keyword arguments to supply custom arrays for those fields.
* The testing suite now tests gradients of all layers, along with GPU support.
* Functors have now moved to [Functors.jl](https://github.com/FluxML/Flux.jl/pull/1174) to allow for their use outside of Flux.
* Added [helper functions](https://github.com/FluxML/Flux.jl/pull/873) `Flux.convfilter` and `Flux.depthwiseconvfilter` to construct weight arrays for convolutions outside of layer constructors, so custom implementations need not depend on the default layers.
# v0.10.0
* The default AD engine has switched from [Tracker to Zygote.jl](https://github.com/FluxML/Flux.jl/pull/669)
- The dependency on Tracker.jl has been removed.
- This means Flux now does not depend on using a specialised `TrackedArray` type, and can be used with normal Array implementations directly.
- Tracker compatibility is maintained in most common cases, but Zygote will be the preferred AD backend for Flux from now on.
* The CUDNN wrappers have been [moved from Flux into CuArrays](https://github.com/FluxML/Flux.jl/pull/874), to better support the CUDA backend, improve the user experience, and make Flux leaner.
* `*crossentropy` functions now [work as expected with CuArrays](https://github.com/FluxML/Flux.jl/pull/926). [PR for binarycrossentropy](https://github.com/FluxML/Flux.jl/pull/940).
* Added [clearer docs](https://github.com/FluxML/Flux.jl/pull/904) around training and the Optimiser interface.
* [Layer initialisations](https://github.com/FluxML/Flux.jl/pull/937) have been improved with a clearer API on how to extend it for other purposes.
* [Better messaging around CUDA availability](https://github.com/FluxML/Flux.jl/pull/924), with hooks to initialize the GPU as default where possible.
* `@treelike` has been formalised as a [functor](https://github.com/FluxML/Flux.jl/pull/865), with an effective deprecation.
* `testmode!` is deprecated in favour of [istraining](https://github.com/FluxML/Flux.jl/pull/669)
# v0.9.0
* [Depthwise convolutional layer API changes](https://github.com/FluxML/Flux.jl/pull/756) from `in => mult` channel specification to `in => out` channel specification, and deprecates implicit `out` constructor.
* New [SkipConnection](https://github.com/FluxML/Flux.jl/pull/446), which can be used to train residual neural network architectures.
* New [RADAM](https://github.com/FluxML/Flux.jl/pull/842) optimiser.
# v0.8.0
* [Dropout now has a `dims` argument for specifying the unbroadcast dimensions.](https://github.com/FluxML/Flux.jl/pull/563)
* New [ConvTranspose layer](https://github.com/FluxML/Flux.jl/pull/311).
* New [Maxout layer](https://github.com/FluxML/Flux.jl/pull/647)
* Datasets are now [hash verified on download](https://github.com/FluxML/Flux.jl/pull/585) to avoid corruption.
* We now [zero the initial state for RNNs](https://github.com/FluxML/Flux.jl/pull/590/).
* [Normalisation can now work on arbitrary `dims`.](https://github.com/FluxML/Flux.jl/pull/592)
* Many docs and bugfixes thanks to @KristofferC and others.
* [NamedTuples now work like Tuples](https://github.com/FluxML/Flux.jl/pull/603) when doing `mapleaves`.
* New "performance tips" [section of the docs](https://github.com/FluxML/Flux.jl/pull/615).
* The training loop is [now more readable](https://github.com/FluxML/Flux.jl/pull/651) and better shows how to use the lower-level APIs.
* New [AlphaDropout](https://github.com/FluxML/Flux.jl/pull/656).
* [Data.Iris](https://github.com/FluxML/Flux.jl/pull/652) makes Fisher's Iris dataset available with `Iris.labels` and `Iris.features`.
* New [InstanceNorm](https://github.com/FluxML/Flux.jl/pull/634), as popularized by [Instance Normalization: The Missing Ingredient for Fast Stylization](https://arxiv.org/abs/1607.08022).
* New [GroupNorm](https://github.com/FluxML/Flux.jl/pull/696), as described in [Group Normalization](https://arxiv.org/abs/1803.08494).
* New [CrossCor](https://github.com/FluxML/Flux.jl/pull/762).
AD Changes:
* `det`, `logdet` and `logabsdet` [now have adjoints](https://github.com/FluxML/Flux.jl/pull/596/files).
* Support for [PermuteDimsArray](https://github.com/FluxML/Flux.jl/pull/576).
* Flux.Tracker is now its [own package](https://github.com/FluxML/Tracker.jl), in preparation for replacing it with Zygote.
# v0.7.0
Despite the heroic efforts of scholars and archeologists, pre-0.7 history is lost to the sands of time.

51
Project.toml Normal file

@ -0,0 +1,51 @@
name = "Flux"
uuid = "587475ba-b771-5e3f-ad9e-33799f191a9c"
version = "0.11.0-DEV"
[deps]
AbstractTrees = "1520ce14-60c1-5f80-bbc7-55ef81b5835c"
Adapt = "79e6a3ab-5dfb-504d-930d-738a2a938a0e"
CodecZlib = "944b1d66-785c-5afd-91f1-9de20f533193"
Colors = "5ae59095-9a9b-59fe-a467-6f913c188581"
CuArrays = "3a865a2d-5b23-5a0f-bc46-62713ec82fae"
DelimitedFiles = "8bb1440f-4735-579b-a4ab-409b98df4dab"
Functors = "d9f16b24-f501-4c13-a1f2-28368ffc5196"
Juno = "e5e0dc1b-0480-54bc-9374-aad01c23163d"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
MacroTools = "1914dd2f-81c6-5fcd-8719-6d5c9610ff09"
NNlib = "872c559c-99b0-510c-b3b7-b6c96a88d5cd"
Pkg = "44cfe95a-1eb2-52ea-b672-e2afdf69b78f"
Printf = "de0858da-6303-5e67-8744-51eddeeeb8d7"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
Reexport = "189a3867-3050-52da-a836-e630ba90ab69"
SHA = "ea8e919c-243c-51af-8825-aaa63cd721ce"
Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
ZipFile = "a5390f91-8eb1-5f08-bee0-b1d1ffed6cea"
Zygote = "e88e6eb3-aa80-5325-afca-941959d7151f"
[compat]
AbstractTrees = "0.2, 0.3"
Adapt = "1, 2.0"
CodecZlib = "0.5, 0.6, 0.7"
Colors = "0.8, 0.9, 0.10, 0.11, 0.12"
CuArrays = "2"
Functors = "0.1"
Juno = "0.5, 0.6, 0.7, 0.8"
MacroTools = "0.3, 0.4, 0.5"
NNlib = "0.6"
Reexport = "0.2"
StatsBase = "0"
ZipFile = "0.7, 0.8, 0.9"
Zygote = "0.4.13"
julia = "1.3"
[extras]
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
IterTools = "c8e1da08-722c-5040-9ed9-7db0dc04731e"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
[targets]
test = ["Test", "Documenter", "IterTools", "LinearAlgebra"]


@ -1,11 +1,15 @@
# Флукс
<p align="center">
<img width="400px" src="https://raw.githubusercontent.com/FluxML/fluxml.github.io/master/logo.png"/>
</p>
[![Build Status](https://travis-ci.org/FluxML/Flux.jl.svg?branch=master)](https://travis-ci.org/FluxML/Flux.jl) [![](https://img.shields.io/badge/docs-stable-blue.svg)](https://fluxml.github.io/Flux.jl/stable/) [![Join the chat at https://gitter.im/FluxML](https://badges.gitter.im/FluxML/Lobby.svg)](https://gitter.im/FluxML/Lobby) [Slack](https://discourse.julialang.org/t/announcing-a-julia-slack/4866)
[![Build Status](https://travis-ci.org/FluxML/Flux.jl.svg?branch=master)](https://travis-ci.org/FluxML/Flux.jl) [![](https://img.shields.io/badge/docs-stable-blue.svg)](https://fluxml.github.io/Flux.jl/stable/) [![](https://img.shields.io/badge/chat-on%20slack-yellow.svg)](https://slackinvite.julialang.org/) [![DOI](https://joss.theoj.org/papers/10.21105/joss.00602/status.svg)](https://doi.org/10.21105/joss.00602)
Flux is a refreshing approach to machine learning. It provides lightweight abstractions on top of Julia's native GPU and AD support, while remaining fully hackable (right down to the [GPU kernels](https://github.com/FluxML/CuArrays.jl)).
Flux is an elegant approach to machine learning. It's a 100% pure-Julia stack, and provides lightweight abstractions on top of Julia's native GPU and AD support. Flux makes the easy things easy while remaining fully hackable.
```julia
julia> Pkg.add("Flux")
] add Flux
```
See the [documentation](http://fluxml.github.io/Flux.jl/stable/) or the [model zoo](https://github.com/FluxML/model-zoo/) for examples.
See the [documentation](https://fluxml.github.io/Flux.jl/) or the [model zoo](https://github.com/FluxML/model-zoo/) for examples.
If you use Flux in your research, please [cite](CITATION.bib) our work.


@ -1,7 +0,0 @@
julia 0.6.0
DataFlow 0.2.1
Juno
MacroTools 0.3.3
NNlib
ForwardDiff
Requires

4
bors.toml Normal file

@ -0,0 +1,4 @@
status = [
"ci/gitlab%"
]
timeout-sec = 7200

6
docs/Project.toml Normal file

@ -0,0 +1,6 @@
[deps]
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
NNlib = "872c559c-99b0-510c-b3b7-b6c96a88d5cd"
[compat]
Documenter = "0.24"


@ -1,27 +1,36 @@
using Documenter, Flux
using Documenter, Flux, NNlib
makedocs(modules=[Flux],
doctest = false,
format = :html,
analytics = "UA-36890222-9",
DocMeta.setdocmeta!(Flux, :DocTestSetup, :(using Flux); recursive=true)
makedocs(modules=[Flux, NNlib],
doctest = VERSION >= v"1.4",
sitename = "Flux",
assets = ["../flux.css"],
pages = ["Home" => "index.md",
"Building Models" =>
["Basics" => "models/basics.md",
"Recurrence" => "models/recurrence.md",
"Layer Reference" => "models/layers.md"],
"Regularisation" => "models/regularisation.md",
"Model Reference" => "models/layers.md",
"Advanced Model Building" => "models/advanced.md",
"NNlib" => "models/nnlib.md"],
"Handling Data" =>
["One-Hot Encoding" => "data/onehot.md",
"DataLoader" => "data/dataloader.md"],
"Training Models" =>
["Optimisers" => "training/optimisers.md",
"Training" => "training/training.md"],
"One-Hot Encoding" => "data/onehot.md",
"GPU Support" => "gpu.md",
"Contributing & Help" => "contributing.md"])
"Saving & Loading" => "saving.md",
"The Julia Ecosystem" => "ecosystem.md",
"Utility Functions" => "utilities.md",
"Performance Tips" => "performance.md",
"Datasets" => "datasets.md",
"Community" => "community.md"],
format = Documenter.HTML(
analytics = "UA-36890222-9",
assets = ["assets/flux.css"],
prettyurls = get(ENV, "CI", nothing) == "true"),
)
deploydocs(
repo = "github.com/FluxML/Flux.jl.git",
target = "build",
osname = "linux",
julia = "0.6",
deps = nothing,
make = nothing)
deploydocs(repo = "github.com/FluxML/Flux.jl.git",
target = "build",
push_preview = true)

113
docs/src/assets/flux.css Normal file

@ -0,0 +1,113 @@
@import url('https://fonts.googleapis.com/css?family=Lato:400,400i');
body {
font-family: Lato, "Segoe UI",Roboto,"Helvetica Neue",Arial,sans-serif;
}
nav.toc {
padding-top: 0;
background: rgb(240, 240, 240);
line-height: 2em;
cursor: default;
user-select: none;
}
h1+h2 {
margin-top: 0;
}
/* Green banner in ToC */
nav.toc > h1 {
margin-top: 0;
padding-top: 0.4em;
padding-bottom: 0.5em;
border-bottom: 5px solid white;
box-shadow: 0px -2px 5px rgb(60,60,60);
margin-bottom: 0.5em;
background: rgb(60, 150, 60);
font-style: italic;
font-weight: normal;
font-size: 50pt;
text-transform: lowercase;
text-shadow: 2px 2px 5px rgba(0,0,0,0.2);
color: white;
}
/* Reduce ToC font size */
.toctext {
font-size: 10pt;
}
/* Fade out non-clickable ToC headers */
nav.toc ul span.toctext {
color: rgb(180, 180, 180);
}
nav.toc ul .toctext {
color: rgb(100, 100, 100);
}
nav.toc ul a.toctext:hover {
color: inherit;
background: rgb(220, 220, 220);
cursor: default;
}
nav.toc li.current > .toctext {
background: linear-gradient(90deg, rgb(245,245,245) 0%, white 90%);
font-weight: normal;
}
nav.toc ul.internal li.toplevel {
font-weight: normal;
}
/* Content */
article { max-width: none; }
article > p, article > ul {
max-width: 45em;
}
/* Links */
a, a:visited { color: rgb(0, 120, 0); }
article p a { border-bottom: 1px solid rgb(200, 230, 200); }
a:hover, a:visited:hover { color: rgb(0, 80, 0); }
/* Article Links */
article p a { border-bottom: 1px solid rgb(200, 230, 200); }
article p a:hover, article a:visited:hover { color: rgb(0, 120, 0); }
article p a:hover { border-bottom: 1px solid rgb(150, 200, 150); }
/* Doctstrings */
article section.docstring {
padding: 0.5em 0;
border-left: none;
border-right: none;
border-bottom: none;
}
/* Code */
article pre, article p > code {
background: rgb(245, 250, 245);
}
article pre {
border: none;
max-width: none;
padding: 1em;
border-radius: 10px 0px 0px 10px;
margin-left: -1em;
margin-right: -2em;
}
.hljs-comment {
font-style: italic;
}
.hljs-number {
color: rgb(0, 150, 150);
}

docs/src/community.md (new file, 5 lines)

@ -0,0 +1,5 @@
# Community
All Flux users are welcome to join our community on the [Julia forum](https://discourse.julialang.org/), or the [slack](https://discourse.julialang.org/t/announcing-a-julia-slack/4866) (channel #machine-learning). If you have questions or issues we'll try to help you out.
If you're interested in hacking on Flux, the [source code](https://github.com/FluxML/Flux.jl) is open and easy to understand -- it's all just the same Julia code you work with normally. You might be interested in our [intro issues](https://github.com/FluxML/Flux.jl/issues?q=is%3Aopen+is%3Aissue+label%3A%22help+wanted%22) to get started.


@ -1,9 +0,0 @@
# Contributing & Help
If you need help, please ask on the [Julia forum](https://discourse.julialang.org/), the [slack](https://discourse.julialang.org/t/announcing-a-julia-slack/4866) (channel #machine-learning), or Flux's [Gitter](https://gitter.im/FluxML/Lobby).
Right now, the best way to help out is to try out the examples and report any issues or missing features as you find them. The second best way is to help us spread the word, perhaps by [starring the repo](https://github.com/MikeInnes/Flux.jl).
If you're interested in hacking on Flux, most of the [code](https://github.com/MikeInnes/Flux.jl/tree/master/src) is pretty straightforward. Adding new [layer definitions](https://github.com/MikeInnes/Flux.jl/tree/master/src/layers) or cost functions is simple using the Flux DSL itself, and things like data utilities and training processes are all plain Julia code.
If you get stuck or need anything, let us know!


@ -0,0 +1,6 @@
# DataLoader
Flux provides the `DataLoader` type in the `Flux.Data` module to handle iteration over mini-batches of data.
```@docs
Flux.Data.DataLoader
```
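For a quick feel for the API, here is a minimal sketch (dummy random arrays stand in for a real dataset; the tuple form of the constructor is assumed):
```julia
using Flux.Data: DataLoader

Xtrain = rand(10, 100)   # 100 observations; the last dimension is the observation dimension
Ytrain = rand(100)

train_loader = DataLoader((Xtrain, Ytrain), batchsize=16, shuffle=true)
for (x, y) in train_loader
    # x is 10×16 and y has length 16, except possibly for the last mini-batch
end
```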


@ -3,37 +3,42 @@
It's common to encode categorical variables (like `true`, `false` or `cat`, `dog`) in "one-of-k" or ["one-hot"](https://en.wikipedia.org/wiki/One-hot) form. Flux provides the `onehot` function to make this easy.
```
julia> using Flux: onehot
julia> using Flux: onehot, onecold
julia> onehot(:b, [:a, :b, :c])
3-element Flux.OneHotVector:
false
true
false
0
1
0
julia> onehot(:c, [:a, :b, :c])
3-element Flux.OneHotVector:
false
false
true
0
0
1
```
The inverse is `argmax` (which can take a general probability distribution, as well as just booleans).
The inverse is `onecold` (which can take a general probability distribution, as well as just booleans).
```julia
julia> argmax(ans, [:a, :b, :c])
julia> onecold(ans, [:a, :b, :c])
:c
julia> argmax([true, false, false], [:a, :b, :c])
julia> onecold([true, false, false], [:a, :b, :c])
:a
julia> argmax([0.3, 0.2, 0.5], [:a, :b, :c])
julia> onecold([0.3, 0.2, 0.5], [:a, :b, :c])
:c
```
```@docs
Flux.onehot
Flux.onecold
```
## Batches
`onehotbatch` creates a batch (matrix) of one-hot vectors, and `argmax` treats matrices as batches.
`onehotbatch` creates a batch (matrix) of one-hot vectors, and `onecold` treats matrices as batches.
```julia
julia> using Flux: onehotbatch
@ -52,3 +57,7 @@ julia> onecold(ans, [:a, :b, :c])
```
Note that these operations returned `OneHotVector` and `OneHotMatrix` rather than `Array`s. `OneHotVector`s behave like normal vectors but avoid any unnecessary cost compared to using an integer index directly. For example, multiplying a matrix with a one-hot vector simply slices out the relevant column of the matrix under the hood.
```@docs
Flux.onehotbatch
```
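As a small sketch of that last point (the matrix here is purely illustrative), multiplying by a one-hot vector just picks out one column:
```julia
using Flux: onehot

W = reshape(1:9, 3, 3)        # a 3×3 matrix
v = onehot(:b, [:a, :b, :c])  # hot in position 2
W * v                         # equivalent to W[:, 2]; no dense multiply is performed
```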

docs/src/datasets.md (new file, 20 lines)

@ -0,0 +1,20 @@
# Datasets
Flux includes several standard machine learning datasets.
```@docs
Flux.Data.Iris.features()
Flux.Data.Iris.labels()
Flux.Data.MNIST.images()
Flux.Data.MNIST.labels()
Flux.Data.FashionMNIST.images()
Flux.Data.FashionMNIST.labels()
Flux.Data.CMUDict.phones()
Flux.Data.CMUDict.symbols()
Flux.Data.CMUDict.rawdict()
Flux.Data.CMUDict.cmudict()
Flux.Data.Sentiment.train()
Flux.Data.Sentiment.test()
Flux.Data.Sentiment.dev()
```

docs/src/ecosystem.md (new file, 21 lines)

@ -0,0 +1,21 @@
# The Julia Ecosystem
One of the main strengths of Julia lies in an ecosystem of packages
globally providing a rich and consistent user experience.
This is a non-exhaustive list of Julia packages, nicely complementing `Flux` in typical
machine learning and deep learning workflows:
- [ArgParse.jl](https://github.com/carlobaldassi/ArgParse.jl): package for parsing command-line arguments to Julia programs.
- [Augmentor.jl](https://github.com/Evizero/Augmentor.jl): a fast image augmentation library in Julia for machine learning.
- [BSON.jl](https://github.com/JuliaIO/BSON.jl): package for working with the Binary JSON serialisation format
- [DataFrames.jl](https://github.com/JuliaData/DataFrames.jl): in-memory tabular data in Julia
- [DrWatson.jl](https://github.com/JuliaDynamics/DrWatson.jl): a scientific project assistant software
- [MLDatasets.jl](https://github.com/JuliaML/MLDatasets.jl): utility package for accessing common machine learning datasets
- [OnlineStats.jl](https://github.com/joshday/OnlineStats.jl): single-pass algorithms for statistics
- [Parameters.jl](https://github.com/mauro3/Parameters.jl): types with default field values, keyword constructors and (un-)pack macros
- [ProgressMeter.jl](https://github.com/timholy/ProgressMeter.jl): progress meters for long-running computations
- [TensorBoardLogger.jl](https://github.com/PhilipVinc/TensorBoardLogger.jl): easy peasy logging to [tensorboard](https://www.tensorflow.org/tensorboard) in Julia
This tight integration among Julia packages is shown in some of the examples in the [model-zoo](https://github.com/FluxML/model-zoo) repository.


@ -1,9 +1,15 @@
# GPU Support
Support for array operations on other hardware backends, like GPUs, is provided by external packages like [CuArrays](https://github.com/JuliaGPU/CuArrays.jl) and [CLArrays](https://github.com/JuliaGPU/CLArrays.jl). Flux doesn't care what array type you use, so we can just plug these in without any other changes.
NVIDIA GPU support should work out of the box on systems with CUDA and CUDNN installed. For more details see the [CuArrays](https://github.com/JuliaGPU/CuArrays.jl) readme.
## GPU Usage
Support for array operations on other hardware backends, like GPUs, is provided by external packages like [CuArrays](https://github.com/JuliaGPU/CuArrays.jl). Flux is agnostic to array types, so we simply need to move model weights and data to the GPU and Flux will handle it.
For example, we can use `CuArrays` (with the `cu` converter) to run our [basic example](models/basics.md) on an NVIDIA GPU.
(Note that you need to have CUDA available to use CuArrays; please see the [CuArrays.jl](https://github.com/JuliaGPU/CuArrays.jl) instructions for more details.)
```julia
using CuArrays
@ -19,17 +25,52 @@ loss(x, y) # ~ 3
Note that we convert both the parameters (`W`, `b`) and the data set (`x`, `y`) to cuda arrays. Taking derivatives and training works exactly as before.
If you define a structured model, like a `Dense` layer or `Chain`, you just need to convert the internal parameters. Flux provides `mapleaves`, which allows you to alter all parameters of a model at once.
If you define a structured model, like a `Dense` layer or `Chain`, you just need to convert the internal parameters. Flux provides `fmap`, which allows you to alter all parameters of a model at once.
```julia
d = Dense(10, 5, σ)
d = mapleaves(cu, d)
d.W # Tracked CuArray
d = fmap(cu, d)
d.W # CuArray
d(cu(rand(10))) # CuArray output
m = Chain(Dense(10, 5, σ), Dense(5, 2), softmax)
m = mapleaves(cu, m)
m = fmap(cu, m)
m(cu(rand(10)))
```
The [mnist example](https://github.com/FluxML/model-zoo/blob/master/mnist/mnist.jl) contains the code needed to run the model on the GPU; just uncomment the lines after `using CuArrays`.
As a convenience, Flux provides the `gpu` function to convert models and data to the GPU if one is available. By default, it'll do nothing, but loading `CuArrays` will cause it to move data to the GPU instead.
```julia
julia> using Flux, CuArrays
julia> m = Dense(10,5) |> gpu
Dense(10, 5)
julia> x = rand(10) |> gpu
10-element CuArray{Float32,1}:
0.800225
0.511655
julia> m(x)
5-element CuArray{Float32,1}:
-0.30535
-0.618002
```
The analogue `cpu` is also available for moving models and data back off of the GPU.
```julia
julia> x = rand(10) |> gpu
10-element CuArray{Float32,1}:
0.235164
0.192538
julia> x |> cpu
10-element Array{Float32,1}:
0.235164
0.192538
```


@ -1,14 +1,17 @@
# Flux: The Julia Machine Learning Library
Flux is a library for machine learning. It comes "batteries-included" with many useful tools built in, but also lets you use the full power of the Julia language where you need it. The whole stack is implemented in clean Julia code (right down to the [GPU kernels](https://github.com/FluxML/CuArrays.jl)) and any part can be tweaked to your liking.
Flux is a library for machine learning. It comes "batteries-included" with many useful tools built in, but also lets you use the full power of the Julia language where you need it. We follow a few key principles:
# Installation
* **Doing the obvious thing**. Flux has relatively few explicit APIs for features like regularisation or embeddings. Instead, writing down the mathematical form will work and be fast.
* **You could have written Flux**. All of it, from [LSTMs](https://github.com/FluxML/Flux.jl/blob/ec16a2c77dbf6ab8b92b0eecd11661be7a62feef/src/layers/recurrent.jl#L131) to [GPU kernels](https://github.com/JuliaGPU/CuArrays.jl), is straightforward Julia code. When in doubt, it's well worth looking at [the source](https://github.com/FluxML/Flux.jl/). If you need something different, you can easily roll your own.
* **Play nicely with others**. Flux works well with Julia libraries from [data frames](https://github.com/JuliaComputing/JuliaDB.jl) and [images](https://github.com/JuliaImages/Images.jl) to [differential equation solvers](https://github.com/JuliaDiffEq/DifferentialEquations.jl), so you can easily build complex data processing pipelines that integrate Flux models.
Install [Julia 0.6.0 or later](https://julialang.org/downloads/), if you haven't already.
## Installation
```julia
Pkg.add("Flux")
Pkg.test("Flux") # Check things installed correctly
```
Download [Julia 1.0](https://julialang.org/) or later, if you haven't already. You can add Flux using Julia's package manager, by typing `] add Flux` in the Julia prompt.
Start with the [basics](models/basics.md). The [model zoo](https://github.com/FluxML/model-zoo/) is also a good starting point for many common kinds of models.
If you have CUDA you can also run `] add CuArrays` to get GPU support; see [here](gpu.md) for more details.
## Learning Flux
There are several different ways to learn Flux. If you just want to get started writing models, the [model zoo](https://github.com/FluxML/model-zoo/) gives good starting points for many common ones. This documentation provides a reference to all of Flux's APIs, as well as a from-scratch introduction to Flux's take on models and how they work. Once you understand these docs, congratulations, you also understand [Flux's source code](https://github.com/FluxML/Flux.jl), which is intended to be concise, legible and a good reference for more advanced concepts.


@ -0,0 +1,73 @@
# Advanced Model Building and Customisation
Here we will try and describe usage of some more advanced features that Flux provides to give more control over model building.
## Customising Parameter Collection for a Model
Consider our example `Affine` layer from the [basics](basics.md#Building-Layers-1).
By default all the fields of the `Affine` type are collected as its parameters. However, in some cases we may want to hold other metadata in our "layers" that is not needed for training, and should therefore be ignored when the parameters are collected. With Flux, it is possible to mark the fields of our layers that are trainable in two ways.
The first way of achieving this is through overloading the `trainable` function.
```julia-repl
julia> @functor Affine
julia> a = Affine(rand(3,3), rand(3))
Affine{Array{Float64,2},Array{Float64,1}}([0.66722 0.774872 0.249809; 0.843321 0.403843 0.429232; 0.683525 0.662455 0.065297], [0.42394, 0.0170927, 0.544955])
julia> Flux.params(a) # default behavior
Params([[0.66722 0.774872 0.249809; 0.843321 0.403843 0.429232; 0.683525 0.662455 0.065297], [0.42394, 0.0170927, 0.544955]])
julia> Flux.trainable(a::Affine) = (a.W,)
julia> Flux.params(a)
Params([[0.66722 0.774872 0.249809; 0.843321 0.403843 0.429232; 0.683525 0.662455 0.065297]])
```
Only the fields returned by `trainable` will be collected as trainable parameters of the layer when calling `Flux.params`.
Another way of achieving this is through the `@functor` macro directly. Here, we can mark the fields we are interested in by grouping them in the second argument:
```julia
Flux.@functor Affine (W,)
```
However, doing this requires the `struct` to have a corresponding constructor that accepts those parameters.
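For instance, a minimal sketch of what such a constructor could look like (the `zeros` re-initialisation of `b` is an illustrative choice, not part of Flux's API):
```julia
using Flux

struct Affine
  W
  b
end

# Constructor matching the fields marked in `@functor`: rebuilds an Affine
# from `W` alone, re-creating the unmarked field `b`.
Affine(W) = Affine(W, zeros(size(W, 1)))

Flux.@functor Affine (W,)
```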
## Freezing Layer Parameters
When we do not want to include all the model parameters (for example, in transfer learning), we can simply omit those layers from our call to `params`.
Consider a simple multi-layer perceptron model where we want to avoid optimising the first two `Dense` layers. We can obtain
this using the slicing features `Chain` provides:
```julia
m = Chain(
Dense(784, 64, relu),
Dense(64, 64, relu),
Dense(32, 10)
)
ps = Flux.params(m[3:end])
```
The `Zygote.Params` object `ps` now holds a reference to only the parameters of the layers passed to it.
During training, the gradients will only be computed for (and applied to) the last `Dense` layer, therefore only that would have its parameters changed.
`Flux.params` also takes multiple inputs to make it easy to collect parameters from heterogenous models with a single call. A simple demonstration would be if we wanted to omit optimising the second `Dense` layer in the previous example. It would look something like this:
```julia
Flux.params(m[1], m[3:end])
```
Sometimes, a more fine-tuned control is needed.
We can freeze a specific parameter of a specific layer which already entered a `Params` object `ps`,
by simply deleting it from `ps`:
```julia
ps = params(m)
delete!(ps, m[2].b)
```


@ -2,58 +2,114 @@
## Taking Gradients
Consider a simple linear regression, which tries to predict an output array `y` from an input `x`. (It's a good idea to follow this example in the Julia repl.)
Flux's core feature is taking gradients of Julia code. The `gradient` function takes another Julia function `f` and a set of arguments, and returns the gradient with respect to each argument. (It's a good idea to try pasting these examples in the Julia terminal.)
```jldoctest basics
julia> using Flux
julia> f(x) = 3x^2 + 2x + 1;
julia> df(x) = gradient(f, x)[1]; # df/dx = 6x + 2
julia> df(2)
14
julia> d2f(x) = gradient(df, x)[1]; # d²f/dx² = 6
julia> d2f(2)
6
```
When a function has many parameters, we can get gradients of each one at the same time:
```jldoctest basics
julia> f(x, y) = sum((x .- y).^2);
julia> gradient(f, [2, 1], [2, 0])
([0, 2], [0, -2])
```
But machine learning models can have *hundreds* of parameters! To handle this, Flux lets you work with collections of parameters, via `params`. You can get the gradient of all parameters used in a program without explicitly passing them in.
```jldoctest basics
julia> x = [2, 1];
julia> y = [2, 0];
julia> gs = gradient(params(x, y)) do
f(x, y)
end
Grads(...)
julia> gs[x]
2-element Array{Int64,1}:
0
2
julia> gs[y]
2-element Array{Int64,1}:
0
-2
```
Here, `gradient` takes a zero-argument function; no arguments are necessary because the `params` tell it what to differentiate.
This will come in really handy when dealing with big, complicated models. For now, though, let's start with something simple.
## Simple Models
Consider a simple linear regression, which tries to predict an output array `y` from an input `x`.
```julia
W = rand(2, 5)
b = rand(2)
predict(x) = W*x .+ b
loss(x, y) = sum((predict(x) .- y).^2)
function loss(x, y)
ŷ = predict(x)
sum((y .- ŷ).^2)
end
x, y = rand(5), rand(2) # Dummy data
loss(x, y) # ~ 3
```
To improve the prediction we can take the gradients of `W` and `b` with respect to the loss function and perform gradient descent. We could calculate gradients by hand, but Flux will do it for us if we tell it that `W` and `b` are trainable *parameters*.
To improve the prediction we can take the gradients of `W` and `b` with respect to the loss and perform gradient descent.
```julia
using Flux.Tracker: param, back!, data, grad
using Flux
W = param(W)
b = param(b)
l = loss(x, y)
back!(l)
gs = gradient(() -> loss(x, y), params(W, b))
```
`loss(x, y)` returns the same number, but it's now a *tracked* value that records gradients as it goes along. Calling `back!` then calculates the gradient of `W` and `b`. We can see what this gradient is, and modify `W` to train the model.
Now that we have gradients, we can pull them out and update `W` to train the model.
```julia
grad(W)
W̄ = gs[W]
# Update the parameter
W.data .-= 0.1grad(W)
W .-= 0.1 .* W̄
loss(x, y) # ~ 2.5
```
The loss has decreased a little, meaning that our prediction for `x` is closer to the target `y`. If we have some data we can already try [training the model](../training/training.md).
All deep learning in Flux, however complex, is a simple generalisation of this example. Of course, models can *look* very different; they might have millions of parameters or complex control flow, and there are ways to manage this complexity. Let's see what that looks like.
All deep learning in Flux, however complex, is a simple generalisation of this example. Of course, models can *look* very different; they might have millions of parameters or complex control flow. Let's see how Flux handles more complex models.
## Building Layers
It's common to create more complex models than the linear regression above. For example, we might want to have two linear layers with a nonlinearity like [sigmoid](https://en.wikipedia.org/wiki/Sigmoid_function) (`σ`) in between them. In the above style we could write this as:
```julia
W1 = param(rand(3, 5))
b1 = param(rand(3))
using Flux
W1 = rand(3, 5)
b1 = rand(3)
layer1(x) = W1 * x .+ b1
W2 = param(rand(2, 3))
b2 = param(rand(2))
W2 = rand(2, 3)
b2 = rand(2)
layer2(x) = W2 * x .+ b2
model(x) = layer2(σ.(layer1(x)))
@ -65,8 +121,8 @@ This works but is fairly unwieldy, with a lot of repetition especially as we
```julia
function linear(in, out)
W = param(randn(out, in))
b = param(randn(out))
W = randn(out, in)
b = randn(out)
x -> W * x .+ b
end
@ -75,7 +131,7 @@ linear2 = linear(3, 2)
model(x) = linear2(σ.(linear1(x)))
model(x) # => 2-element vector
model(rand(5)) # => 2-element vector
```
Another (equivalent) way is to create a struct that explicitly represents the affine layer.
@ -87,7 +143,7 @@ struct Affine
end
Affine(in::Integer, out::Integer) =
Affine(param(randn(out, in)), param(randn(out)))
Affine(randn(out, in), randn(out))
# Overload call, so the object can be used as a function
(m::Affine)(x) = m.W * x .+ m.b
@ -118,7 +174,7 @@ using Flux
layers = [Dense(10, 5, σ), Dense(5, 2), softmax]
model(x) = foldl((x, m) -> m(x), x, layers)
model(x) = foldl((x, m) -> m(x), layers, init = x)
model(rand(10)) # => 2-element vector
```
@ -151,3 +207,36 @@ m = Chain(x -> x^2, x -> x+1)
m(5) # => 26
```
## Layer helpers
Flux provides a set of helpers for custom layers, which you can enable by calling
```julia
Flux.@functor Affine
```
This enables a useful extra set of functionality for our `Affine` layer, such as [collecting its parameters](../training/optimisers.md) or [moving it to the GPU](../gpu.md).
For some more helpful tricks, including parameter freezing, please checkout the [advanced usage guide](advanced.md).
## Utility functions
Flux provides some utility functions to help you generate models in an automated fashion.
`outdims` enables you to calculate the spatial output dimensions of layers like `Conv` when applied to input images of a given size.
Currently limited to the following layers:
- `Chain`
- `Dense`
- `Conv`
- `Diagonal`
- `Maxout`
- `ConvTranspose`
- `DepthwiseConv`
- `CrossCor`
- `MaxPool`
- `MeanPool`
```@docs
Flux.outdims
```
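For example, a brief sketch (assuming `outdims` is called with the spatial size of the input, as in `outdims(layer, insize)`):
```julia
using Flux

m = Chain(Conv((3, 3), 3 => 16), Conv((3, 3), 16 => 32))
Flux.outdims(m, (32, 32))          # spatial output size of the stacked convolutions, here (28, 28)

Flux.outdims(Dense(10, 5), (10,))  # (5,)
```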


@ -1,6 +1,92 @@
## Model Layers
## Basic Layers
These core layers form the foundation of almost all neural networks.
```@docs
Chain
Dense
```
## Convolution and Pooling Layers
These layers are used to build convolutional neural networks (CNNs).
```@docs
Conv
MaxPool
GlobalMaxPool
MeanPool
GlobalMeanPool
DepthwiseConv
ConvTranspose
CrossCor
SamePad
flatten
Flux.Zeros
Flux.convfilter
Flux.depthwiseconvfilter
```
## Recurrent Layers
Much like the core layers above, but can be used to process sequence data (as well as other kinds of structured data).
```@docs
RNN
LSTM
GRU
Flux.Recur
Flux.reset!
```
## Other General Purpose Layers
These are marginally more obscure than the Basic Layers.
But in contrast to the layers described in the other sections are not readily grouped around a particular purpose (e.g. CNNs or RNNs).
```@docs
Maxout
SkipConnection
```
## Normalisation & Regularisation
These layers don't affect the structure of the network but may improve training times or reduce overfitting.
```@docs
Flux.normalise
BatchNorm
Flux.dropout
Dropout
AlphaDropout
LayerNorm
InstanceNorm
GroupNorm
```
### Testmode
Many normalisation layers behave differently under training and inference (testing). By default, Flux will automatically determine when a layer evaluation is part of training or inference. Still, depending on your use case, it may be helpful to manually specify when these layers should be treated as being trained or not. For this, Flux provides `Flux.testmode!`. When called on a model (e.g. a layer or chain of layers), this function will place the model into the mode specified.
```@docs
Flux.testmode!
trainmode!
```
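A minimal sketch of toggling this by hand (the model here is arbitrary):
```julia
using Flux

m = Chain(Dense(10, 5, relu), Dropout(0.5), Dense(5, 2))

Flux.testmode!(m)    # treat the model as being in inference mode (dropout becomes a no-op)
Flux.trainmode!(m)   # explicitly switch back to training behaviour
```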
## Cost Functions
```@docs
Flux.mae
Flux.mse
Flux.msle
Flux.huber_loss
Flux.crossentropy
Flux.logitcrossentropy
Flux.binarycrossentropy
Flux.logitbinarycrossentropy
Flux.kldivergence
Flux.poisson
Flux.hinge
Flux.squared_hinge
Flux.dice_coeff_loss
Flux.tversky_loss
```

docs/src/models/nnlib.md (new file, 61 lines)

@ -0,0 +1,61 @@
# NNlib
Flux re-exports all of the functions exported by the [NNlib](https://github.com/FluxML/NNlib.jl) package.
## Activation Functions
Non-linearities that go between layers of your model. Note that, unless otherwise stated, activation functions operate on scalars. To apply them to an array you can call `σ.(xs)`, `relu.(xs)` and so on.
```@docs
NNlib.celu
NNlib.elu
NNlib.gelu
NNlib.hardsigmoid
NNlib.hardtanh
NNlib.leakyrelu
NNlib.lisht
NNlib.logcosh
NNlib.logsigmoid
NNlib.mish
NNlib.relu
NNlib.relu6
NNlib.rrelu
NNlib.selu
NNlib.sigmoid
NNlib.softplus
NNlib.softshrink
NNlib.softsign
NNlib.swish
NNlib.tanhshrink
NNlib.trelu
```
## Softmax
```@docs
NNlib.softmax
NNlib.logsoftmax
```
## Pooling
```@docs
NNlib.maxpool
NNlib.meanpool
```
## Convolution
```@docs
NNlib.conv
NNlib.depthwiseconv
```
## Batched Operations
```@docs
NNlib.batched_mul
NNlib.batched_mul!
NNlib.batched_adjoint
NNlib.batched_transpose
```


@ -77,7 +77,7 @@ If you use the `RNN(10, 5)` constructor as opposed to `RNNCell` you'll s
```julia
julia> RNN(10, 5)
Recur(RNNCell(Dense(15, 5)))
Recur(RNNCell(10, 5, tanh))
```
## Sequences
@ -101,14 +101,4 @@ m = Chain(LSTM(10, 15), Dense(15, 5))
m.(seq)
```
## Truncating Gradients
By default, calculating the gradients in a recurrent layer involves the entire history. For example, if we call the model on 100 inputs, calling `back!` will calculate the gradient for those 100 calls. If we then calculate another 10 inputs we have to calculate 110 gradients; this accumulates and quickly becomes expensive.
To avoid this we can *truncate* the gradient calculation, forgetting the history.
```julia
truncate!(m)
```
Calling `truncate!` wipes the slate clean, so we can call the model with more inputs without building up an expensive gradient computation.
Finally, we can reset the hidden state of the cell back to its initial value using `reset!(m)`.


@ -0,0 +1,70 @@
# Regularisation
Applying regularisation to model parameters is straightforward. We just need to
apply an appropriate regulariser, such as `norm`, to each model parameter and
add the result to the overall loss.
For example, say we have a simple regression.
```julia
using Flux: crossentropy
m = Dense(10, 5)
loss(x, y) = crossentropy(softmax(m(x)), y)
```
We can regularise this by taking the (L2) norm of the parameters, `m.W` and `m.b`.
```julia
using LinearAlgebra
penalty() = norm(m.W) + norm(m.b)
loss(x, y) = crossentropy(softmax(m(x)), y) + penalty()
```
When working with layers, Flux provides the `params` function to grab all
parameters at once. We can easily penalise everything with `sum(norm, params(m))`.
```julia
julia> params(m)
2-element Array{Any,1}:
param([0.355408 0.533092; … 0.430459 0.171498])
param([0.0, 0.0, 0.0, 0.0, 0.0])
julia> sum(norm, params(m))
26.01749952921026
```
Here's a larger example with a multi-layer perceptron.
```julia
m = Chain(
Dense(28^2, 128, relu),
Dense(128, 32, relu),
Dense(32, 10), softmax)
loss(x, y) = crossentropy(m(x), y) + sum(norm, params(m))
loss(rand(28^2), rand(10))
```
One can also easily add per-layer regularisation via the `activations` function:
```julia
julia> using Flux: activations
julia> c = Chain(Dense(10, 5, σ), Dense(5, 2), softmax)
Chain(Dense(10, 5, σ), Dense(5, 2), softmax)
julia> activations(c, rand(10))
3-element Array{Any,1}:
Float32[0.84682214, 0.6704139, 0.42177814, 0.257832, 0.36255655]
Float32[0.1501253, 0.073269576]
Float32[0.5192045, 0.48079553]
julia> sum(norm, ans)
2.1166067f0
```
```@docs
Flux.activations
```

docs/src/performance.md (new file, 76 lines)

@ -0,0 +1,76 @@
# Performance Tips
All the usual [Julia performance tips apply](https://docs.julialang.org/en/v1/manual/performance-tips/).
As always [profiling your code](https://docs.julialang.org/en/v1/manual/profile/#Profiling-1) is generally a useful way of finding bottlenecks.
Below follow some Flux specific tips/reminders.
## Don't use more precision than you need
Flux works great with all kinds of number types.
But often you do not need to be working with say `Float64` (let alone `BigFloat`).
Switching to `Float32` can give you a significant speed up,
not because the operations are faster, but because the memory usage is halved,
which means allocations are cheaper and you use less memory overall.
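For example (a small sketch; `f32` converts a model's parameters to `Float32`, and the input data is kept in `Float32` as well):
```julia
using Flux

m = f32(Chain(Dense(784, 32, relu), Dense(32, 10)))  # parameters stored as Float32
x = rand(Float32, 784)                               # keep the inputs in Float32 too
m(x)
```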
## Preserve inputs' types
Not only should your activation and loss functions be [type-stable](https://docs.julialang.org/en/v1/manual/performance-tips/#Write-%22type-stable%22-functions-1),
they should also preserve the type of their inputs.
A very artificial example using an activation function like
```
my_tanh(x) = Float64(tanh(x))
```
will result in performance on `Float32` input orders of magnitude slower than the normal `tanh` would,
because it results in having to use slow mixed-type multiplication in the dense layers.
Similar situations can occur in the loss function during backpropagation.
This means that if you change your data, say, from `Float64` to `Float32` (which should give a speedup: see above),
you will see a large slow-down instead.
This can occur sneakily, because type promotion can be triggered by interacting with numeric literals.
E.g. the following runs into the same problem as above:
```
leaky_tanh(x) = 0.01*x + tanh(x)
```
While one could change the activation function (e.g. to use `0.01f0*x`), the idiomatic (and safe) way to avoid type casts whenever the input type changes is to use `oftype`:
```
leaky_tanh(x) = oftype(x/1, 0.01)*x + tanh(x)
```
## Evaluate batches as Matrices of features
While it can sometimes be tempting to process your observations (feature vectors) one at a time,
e.g.
```julia
function loss_total(xs::AbstractVector{<:Vector}, ys::AbstractVector{<:Vector})
sum(zip(xs, ys)) do (x, y_target)
y_pred = model(x) # evaluate the model
return loss(y_pred, y_target)
end
end
```
It is much faster to concatenate them into a matrix,
as this will hit BLAS matrix-matrix multiplication, which is much faster than the equivalent sequence of matrix-vector multiplications.
The improvement is enough that it is worthwhile allocating new memory to store them contiguously.
```julia
x_batch = reduce(hcat, xs)
y_batch = reduce(hcat, ys)
...
function loss_total(x_batch::Matrix, y_batch::Matrix)
y_preds = model(x_batch)
sum(loss.(y_preds, y_batch))
end
```
When doing this kind of concatenation use `reduce(hcat, xs)` rather than `hcat(xs...)`.
This will avoid the splatting penalty, and will hit the optimised `reduce` method.

docs/src/saving.md (new file, 118 lines)

@ -0,0 +1,118 @@
# Saving and Loading Models
You may wish to save models so that they can be loaded and run in a later
session. The easiest way to do this is via
[BSON.jl](https://github.com/MikeInnes/BSON.jl).
Save a model:
```julia
julia> using Flux
julia> model = Chain(Dense(10,5,relu),Dense(5,2),softmax)
Chain(Dense(10, 5, NNlib.relu), Dense(5, 2), NNlib.softmax)
julia> using BSON: @save
julia> @save "mymodel.bson" model
```
Load it again:
```julia
julia> using Flux
julia> using BSON: @load
julia> @load "mymodel.bson" model
julia> model
Chain(Dense(10, 5, NNlib.relu), Dense(5, 2), NNlib.softmax)
```
Models are just normal Julia structs, so it's fine to use any Julia storage
format for this purpose. BSON.jl is particularly well supported and most likely
to be forwards compatible (that is, models saved now will load in future
versions of Flux).
!!! note
If a saved model's weights are stored on the GPU, the model will not load
later on if there is no GPU support available. It's best to [move your model
to the CPU](gpu.md) with `cpu(model)` before saving it.
## Saving Model Weights
In some cases it may be useful to save only the model parameters themselves, and
rebuild the model architecture in your code. You can use `params(model)` to get
model parameters. You can also use `data.(params)` to remove tracking.
```Julia
julia> using Flux
julia> model = Chain(Dense(10,5,relu),Dense(5,2),softmax)
Chain(Dense(10, 5, NNlib.relu), Dense(5, 2), NNlib.softmax)
julia> weights = params(model);
julia> using BSON: @save
julia> @save "mymodel.bson" weights
```
You can easily load parameters back into a model with `Flux.loadparams!`.
```julia
julia> using Flux
julia> model = Chain(Dense(10,5,relu),Dense(5,2),softmax)
Chain(Dense(10, 5, NNlib.relu), Dense(5, 2), NNlib.softmax)
julia> using BSON: @load
julia> @load "mymodel.bson" weights
julia> Flux.loadparams!(model, weights)
```
The new `model` we created will now be identical to the one we saved parameters for.
## Checkpointing
In longer training runs it's a good idea to periodically save your model, so that you can resume if training is interrupted (for example, if there's a power cut). You can do this by saving the model in the [callback provided to `train!`](training/training.md).
```julia
using Flux: throttle
using BSON: @save
model = Chain(Dense(10,5,relu),Dense(5,2),softmax)
evalcb = throttle(30) do
# Show loss
@save "model-checkpoint.bson" model
end
```
This will update the `"model-checkpoint.bson"` file every thirty seconds.
You can get more advanced by saving a series of models throughout training, for example
```julia
@save "model-$(now()).bson" model
```
will produce a series of models like `"model-2018-03-06T02:57:10.41.bson"`. You
could also store the current test set loss, so that it's easy to (for example)
revert to an older copy of the model if it starts to overfit.
```julia
@save "model-$(now()).bson" model loss = testloss()
```
You can even store optimiser state alongside the model, to resume training
exactly where you left off.
```julia
opt = ADAM()
@save "model-$(now()).bson" model opt
```
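A hypothetical way to resume from such a file might then look like this (the filename, `loss` and `data` are placeholders):
```julia
using Flux
using BSON: @load

@load "mymodel.bson" model opt    # a file saved as above; the name is illustrative
Flux.train!(loss, params(model), data, opt)
```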


@ -3,52 +3,153 @@
Consider a [simple linear regression](../models/basics.md). We create some dummy data, calculate a loss, and backpropagate to calculate gradients for the parameters `W` and `b`.
```julia
W = param(rand(2, 5))
b = param(rand(2))
using Flux
predict(x) = W*x .+ b
W = rand(2, 5)
b = rand(2)
predict(x) = (W * x) .+ b
loss(x, y) = sum((predict(x) .- y).^2)
x, y = rand(5), rand(2) # Dummy data
l = loss(x, y) # ~ 3
back!(l)
θ = Params([W, b])
grads = gradient(() -> loss(x, y), θ)
```
We want to update each parameter, using the gradient, in order to improve (reduce) the loss. Here's one way to do that:
```julia
using Flux.Tracker: data, grad
using Flux.Optimise: update!
function update()
η = 0.1 # Learning Rate
for p in (W, b)
x, Δ = data(p), grad(p)
x .-= η .* Δ # Apply the update
Δ .= 0 # Clear the gradient
end
η = 0.1 # Learning Rate
for p in (W, b)
update!(p, -η * grads[p])
end
```
If we call `update`, the parameters `W` and `b` will change and our loss should go down.
There are two pieces here: one is that we need a list of trainable parameters for the model (`[W, b]` in this case), and the other is the update step. In this case the update is simply gradient descent (`x .-= η .* Δ`), but we might choose to do something more advanced, like adding momentum.
In this case, getting the variables is trivial, but you can imagine it'd be more of a pain with some complex stack of layers.
Running this will alter the parameters `W` and `b` and our loss should go down. Flux provides a more general way to do optimiser updates like this.
```julia
m = Chain(
Dense(10, 5, σ),
Dense(5, 2), softmax)
opt = Descent(0.1) # Gradient descent with learning rate 0.1
for p in (W, b)
update!(opt, p, grads[p])
end
```
Instead of having to write `[m[1].W, m[1].b, ...]`, Flux provides a params function `params(m)` that returns a list of all parameters in the model for you.
An optimiser `update!` accepts a parameter and a gradient, and updates the parameter according to the chosen rule. We can also pass `opt` to our [training loop](training.md), which will update all parameters of the model in a loop. However, we can now easily replace `Descent` with a more advanced optimiser such as `ADAM`.
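For example, a sketch of the same update loop with `ADAM` swapped in for `Descent`:
```julia
opt = ADAM(0.001)

for p in (W, b)
  Flux.Optimise.update!(opt, p, grads[p])
end
```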
For the update step, there's nothing whatsoever wrong with writing the loop above; it'll work just fine. But Flux provides various *optimisers* that make it more convenient.
## Optimiser Reference
All optimisers return an object that, when passed to `train!`, will update the parameters passed to it.
```@docs
Flux.Optimise.update!
Descent
Momentum
Nesterov
RMSProp
ADAM
RADAM
AdaMax
ADAGrad
ADADelta
AMSGrad
NADAM
ADAMW
```
## Optimiser Interface
Flux's optimisers are built around a `struct` that holds all the optimiser parameters along with a definition of how to apply the update rule associated with it. We do this via the `apply!` function which takes the optimiser as the first argument followed by the parameter and its corresponding gradient.
In this manner Flux also allows one to create custom optimisers to be used seamlessly. Let's work this with a simple example.
```julia
opt = SGD([W, b], 0.1) # Gradient descent with learning rate 0.1
mutable struct Momentum
eta
rho
velocity
end
opt()
Momentum(eta::Real, rho::Real) = Momentum(eta, rho, IdDict())
```
An optimiser takes a parameter list and returns a function that does the same thing as `update` above. We can pass either `opt` or `update` to our [training loop](training.md), which will then run the optimiser after every mini-batch of data.
The `Momentum` type will act as our optimiser in this case. Notice that we have added all the parameters as fields, along with the velocity which we will use as our state dictionary. Each parameter in our models will get an entry in there. We can now define the rule applied when this optimiser is invoked.
```julia
function Flux.Optimise.apply!(o::Momentum, x, Δ)
η, ρ = o.eta, o.rho
v = get!(o.velocity, x, zero(x))::typeof(x)
@. v = ρ * v - η * Δ
@. Δ = -v
end
```
This is the basic definition of a Momentum update rule given by:
```math
v = ρ * v - η * Δ
w = w + v
```
`apply!` defines the update rule for an optimiser `opt`, given a parameter and its gradient, and returns the modified gradient. Here, the velocity for each parameter `x` is retrieved from the optimiser's running state and updated in place each time the rule is applied.
Flux internally calls on this function via the `update!` function. It shares the API with `apply!` but ensures that multiple parameters are handled gracefully.
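A sketch of driving this custom `Momentum` through `update!` (here `loss`, `W`, `b`, `x` and `y` are assumed to come from the earlier linear-regression example):
```julia
opt = Momentum(0.01, 0.9)

θ = params(W, b)
gs = gradient(() -> loss(x, y), θ)
for p in θ
  Flux.Optimise.update!(opt, p, gs[p])
end
```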
## Composing Optimisers
Flux defines a special kind of optimiser simply called `Optimiser` which takes in arbitrary optimisers as input. Its behaviour is similar to the usual optimisers, but differs in that it acts by calling the optimisers listed in it sequentially. Each optimiser produces a modified gradient
that will be fed into the next, and the resultant update will be applied to the parameter as usual. A classic use case is where adding decays is desirable. Flux defines some basic decays including `ExpDecay`, `InvDecay` etc.
```julia
opt = Optimiser(ExpDecay(0.001, 0.1, 1000, 1e-4), Descent())
```
Here we apply exponential decay to the `Descent` optimiser. The defaults of `ExpDecay` say that its learning rate will be decayed every 1000 steps.
It is then applied like any optimiser.
```julia
w = randn(10, 10)
w1 = randn(10,10)
ps = Params([w, w1])
loss(x) = Flux.mse(w * x, w1 * x)
loss(rand(10)) # around 9
for t = 1:10^5
θ = Params([w, w1])
θ̄ = gradient(() -> loss(rand(10)), θ)
Flux.Optimise.update!(opt, θ, θ̄)
end
loss(rand(10)) # around 0.9
```
In this manner it is possible to compose optimisers for some added flexibility.
## Decays
Similar to optimisers, Flux also defines some simple decays that can be used in conjunction with other optimisers, or standalone.
```@docs
ExpDecay
InvDecay
WeightDecay
```
## Gradient Clipping
Gradient clipping is useful for training recurrent neural networks, which have a tendency to suffer from the exploding gradient problem. An example usage is
```julia
opt = Optimiser(ClipValue(1e-3), ADAM(1e-3))
```
```@docs
ClipValue
ClipNorm
```


@ -1,36 +1,48 @@
# Training
To actually train a model we need three things:
To actually train a model we need four things:
* A *model loss function*, that evaluates how well a model is doing given some input data.
* A collection of data points that will be provided to the loss function.
* An *objective function*, that evaluates how well a model is doing given some input data.
* The trainable parameters of the model.
* A collection of data points that will be provided to the objective function.
* An [optimiser](optimisers.md) that will update the model parameters appropriately.
With these we can call `Flux.train!`:
With these we can call `train!`:
```julia
Flux.train!(modelLoss, data, opt)
```@docs
Flux.Optimise.train!
```
There are plenty of examples in the [model zoo](https://github.com/FluxML/model-zoo).
## Loss Functions
The `loss` that we defined in [basics](../models/basics.md) is completely valid for training. We can also define a loss in terms of some model:
The objective function must return a number representing how far the model is from its target the *loss* of the model. The `loss` function that we defined in [basics](../models/basics.md) will work as an objective. We can also define an objective in terms of some model:
```julia
m = Chain(
Dense(784, 32, σ),
Dense(32, 10), softmax)
# Model loss function
loss(x, y) = Flux.mse(m(x), y)
ps = Flux.params(m)
# later
Flux.train!(loss, data, opt)
Flux.train!(loss, ps, data, opt)
```
The loss will almost always be defined in terms of some *cost function* that measures the distance of the prediction `m(x)` from the target `y`. Flux has several of these built in, like `mse` for mean squared error or `crossentropy` for cross entropy loss, but you can calculate it however you want.
The objective will almost always be defined in terms of some *cost function* that measures the distance of the prediction `m(x)` from the target `y`. Flux has several of these built in, like `mse` for mean squared error or `crossentropy` for cross entropy loss, but you can calculate it however you want.
For a list of all built-in loss functions, check out the [layer reference](../models/layers.md).
At first glance it may seem strange that the model that we want to train is not part of the input arguments of `Flux.train!` too. However the target of the optimizer is not the model itself, but the objective function that represents the departure between modelled and observed data. In other words, the model is implicitly defined in the objective function, and there is no need to give it explicitly. Passing the objective function instead of the model and a cost function separately provides more flexibility, and the possibility of optimizing the calculations.
## Model parameters
The model to be trained must have a set of tracked parameters that are used to calculate the gradients of the objective function. In the [basics](../models/basics.md) section it is explained how to create models with such parameters. The second argument of the function `Flux.train!` must be an object containing those parameters, which can be obtained from a model `m` as `params(m)`.
Such an object contains a reference to the model's parameters, not a copy, so that after training the model behaves according to the updated values.
Handling all the parameters on a layer by layer basis is explained in the [Layer Helpers](../models/basics.md) section. Also, for freezing model parameters, see the [Advanced Usage Guide](../models/advanced.md).
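For example, a brief sketch (here `loss`, `data` and `opt` are assumed to be defined as elsewhere in this section):
```julia
m  = Chain(Dense(10, 5, relu), Dense(5, 2), softmax)
ps = Flux.params(m)   # a reference to m's trainable parameters, updated in place by train!

Flux.train!(loss, ps, data, opt)
```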
## Datasets
@ -47,7 +59,8 @@ data = [(x, y)]
```julia
data = [(x, y), (x, y), (x, y)]
# Or equivalently
data = Iterators.repeated((x, y), 3)
using IterTools: ncycle
data = ncycle([(x, y)], 3)
```
It's common to load the `x`s and `y`s separately. In this case you can use `zip`:
@ -58,12 +71,40 @@ ys = [rand( 10), rand( 10), rand( 10)]
data = zip(xs, ys)
```
Training data can be conveniently partitioned for mini-batch training using the [`Flux.Data.DataLoader`](@ref) type:
```julia
X = rand(28, 28, 60000)
Y = rand(0:9, 60000)
data = DataLoader(X, Y, batchsize=128)
```
Note that, by default, `train!` only loops over the data once (a single "epoch").
A convenient way to run multiple epochs from the REPL is provided by `@epochs`.
```julia
julia> using Flux: @epochs
julia> @epochs 2 println("hello")
INFO: Epoch 1
hello
INFO: Epoch 2
hello
julia> @epochs 2 Flux.train!(...)
# Train for two epochs
```
```@docs
Flux.@epochs
```
## Callbacks
`train!` takes an additional argument, `cb`, that's used for callbacks so that you can observe the training process. For example:
```julia
train!(loss, data, opt, cb = () -> println("training"))
train!(objective, ps, data, opt, cb = () -> println("training"))
```
Callbacks are called for every batch of training data. You can slow this down using `Flux.throttle(f, timeout)` which prevents `f` from being called more than once every `timeout` seconds.
@ -74,6 +115,41 @@ A more typical callback might look like this:
test_x, test_y = # ... create single batch of test data ...
evalcb() = @show(loss(test_x, test_y))
Flux.train!(loss, data, opt,
Flux.train!(objective, ps, data, opt,
cb = throttle(evalcb, 5))
```
Calling `Flux.stop()` in a callback will exit the training loop early.
```julia
cb = function ()
accuracy() > 0.9 && Flux.stop()
end
```
## Custom Training loops
The `Flux.train!` function can be very convenient, especially for simple problems.
It's also very flexible with the use of callbacks.
But for some problems it's much cleaner to write your own custom training loop.
An example follows that works similarly to the default `Flux.train!` but with no callbacks.
You don't need callbacks if you just code the calls to your functions directly into the loop,
e.g. in the places marked with comments.
```julia
function my_custom_train!(loss, ps, data, opt)
ps = Params(ps)
for d in data
gs = gradient(ps) do
training_loss = loss(d...)
# Insert whatever code you want here that needs the training loss, e.g. logging
return training_loss
end
# Insert whatever code you want here that needs the gradients,
# e.g. logging them with TensorBoardLogger.jl as a histogram so you can see if they are becoming huge
update!(opt, ps, gs)
# Here you might like to check validation set accuracy, and break out to do early stopping
end
end
```
You could simplify this further, for example by hard-coding in the loss function.

docs/src/utilities.md (new file, 49 lines)

@ -0,0 +1,49 @@
# Utility Functions
Flux contains some utility functions for working with data; these functions
help create inputs for your models or batch your dataset.
Other functions can be used to initialize your layers or to regularly execute
callback functions.
## Working with Data
```@docs
Flux.unsqueeze
Flux.stack
Flux.unstack
Flux.chunk
Flux.frequencies
Flux.batch
Flux.batchseq
Base.rpad(v::AbstractVector, n::Integer, p)
```
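A few quick sketches of the data helpers listed above (shapes noted in comments; exact return types omitted):
```julia
using Flux

Flux.batch([[1, 2, 3], [4, 5, 6]])   # 3×2 matrix, one column per observation
Flux.unstack([1 4; 2 5; 3 6], 2)     # back to a vector of the two columns
Flux.chunk(1:10, 3)                  # three roughly equal-length pieces
Flux.unsqueeze([1, 2, 3], 2)         # 3×1 matrix (adds a singleton dimension)
```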
## Layer Initialization
These are primarily useful if you are planning to write your own layers.
Flux initializes convolutional layers and recurrent cells with `glorot_uniform`
by default.
To change the default on an applicable layer, pass the desired function with the
`init` keyword. For example:
```jldoctest; setup = :(using Flux)
julia> conv = Conv((3, 3), 1 => 8, relu; init=Flux.glorot_normal)
Conv((3, 3), 1=>8, relu)
```
```@docs
Flux.glorot_uniform
Flux.glorot_normal
```
## Model Abstraction
```@docs
Flux.destructure
```
## Callback Helpers
```@docs
Flux.throttle
Flux.stop
```

paper/paper.bib (new file, 50 lines)

@ -0,0 +1,50 @@
@misc{Julia,
author = {Jeff Bezanson and Alan Edelman and Stefan Karpinski and Viral B. Shah},
title = {Julia: A Fresh Approach to Numerical Computing},
journal = {SIAM Review},
volume = {59},
year = {2017},
doi = {10.1137/141000671},
howpublished = {\url{julialang.org/publications/julia-fresh-approach-BEKS.pdf}}
}
@article{besard:2017,
author = {Tim Besard and Christophe Foket and De Sutter, Bjorn},
title = {Effective Extensible Programming: Unleashing {Julia} on {GPUs}},
journal = {arXiv},
volume = {abs/1712.03112},
year = {2017},
url = {https://arxiv.org/abs/1712.03112},
}
@online{MLPL,
author = {Mike Innes and others},
title = {On Machine Learning and Programming Languages},
year = 2017,
url = {https://julialang.org/blog/2017/12/ml&pl},
urldate = {2018-02-16}
}
@online{CuArrays,
author = {Mike Innes and others},
title = {Generic GPU Kernels},
year = 2017,
url = {https://mikeinnes.github.io/2017/08/24/cudanative.html},
urldate = {2018-02-16}
}
@online{Zoo,
author = {Mike Innes and others},
title = {Flux Model Zoo},
year = 2018,
url = {https://github.com/FluxML/model-zoo/},
urldate = {2018-02-16}
}
@online{Minibatch,
author = {James Bradbury},
title = {Minibatch.jl},
year = 2018,
url = {https://github.com/jekbradbury/Minibatch.jl},
urldate = {2018-02-16}
}

paper/paper.md (new file, 31 lines)

@ -0,0 +1,31 @@
---
title: 'Flux: Elegant machine learning with Julia'
tags:
- deep learning
- machine learning
- natural language processing
- computer vision
- reinforcement learning
- robotics
- automatic differentiation
- compiler
authors:
- name: Mike Innes
orcid: 0000-0003-0788-0242
affiliation: 1
affiliations:
- name: Julia Computing
index: 1
date: 16 February 2018
bibliography: paper.bib
---
# Summary
Flux is a library for machine learning (ML), written using the numerical computing language Julia [@Julia]. The package allows models to be written using Julia's simple mathematical syntax, and applies automatic differentiation (AD) to seamlessly calculate derivatives and train the model. Meanwhile, it makes heavy use of Julia's language and compiler features to carry out code analysis and make optimisations. For example, Julia's GPU compilation support [@besard:2017] can be used to JIT-compile custom GPU kernels for model layers [@CuArrays].
The machine learning community has traditionally been divided between "static" and "dynamic" frameworks that are easy to optimise and easy to use, respectively [@MLPL]. Flux blurs the line between these two approaches, combining a highly intuitive programming model with the compiler techniques needed by ML. This enables research into advanced compiler transforms such as batching [@Minibatch] without changing any user code.
Flux has been used heavily for natural language processing, but can also support state-of-the-art research models in areas like computer vision, reinforcement learning and robotics. Many examples of such models can be found in the model zoo [@Zoo].
# References


@ -1,31 +1,57 @@
__precompile__()
module Flux
# Zero Flux Given
using Juno, Requires
using Lazy: @forward
using Base: tail
using Statistics, Random, LinearAlgebra
using Zygote, MacroTools, Juno, Reexport
using MacroTools: @forward
@reexport using NNlib
using Zygote: Params, @adjoint, gradient, pullback, @nograd
export Chain, Dense, RNN, LSTM,
SGD, param, params, mapleaves
export gradient
using NNlib
export σ, relu, softmax
include("tracker/Tracker.jl")
using .Tracker
export Chain, Dense, Maxout, RNN, LSTM, GRU, SamePad, Conv, CrossCor, ConvTranspose,
GlobalMaxPool, GlobalMeanPool, MaxPool, MeanPool, flatten,
DepthwiseConv, Dropout, AlphaDropout, LayerNorm, BatchNorm, InstanceNorm, GroupNorm,
SkipConnection, params, fmap, cpu, gpu, f32, f64, testmode!, trainmode!
include("optimise/Optimise.jl")
using .Optimise
using .Optimise: @epochs
export Descent, ADAM, Momentum, Nesterov, RMSProp,
ADAGrad, AdaMax, ADADelta, AMSGrad, NADAM,
ADAMW, RADAM, InvDecay, ExpDecay, WeightDecay,
ClipValue, ClipNorm
using CuArrays
const use_cuda = Ref(false)
include("utils.jl")
include("zeros.jl")
include("onehot.jl")
include("tree.jl")
include("functor.jl")
include("layers/softmax.jl")
include("layers/stateless.jl")
include("layers/basic.jl")
include("layers/conv.jl")
include("layers/recurrent.jl")
include("layers/normalise.jl")
include("data/Data.jl")
include("deprecations.jl")
include("cuda/cuda.jl")
function __init__()
use_cuda[] = CuArrays.functional() # Can be overridden after load with `Flux.use_cuda[] = false`
if CuArrays.functional()
if !CuArrays.has_cudnn()
@warn "CuArrays.jl found cuda, but did not find libcudnn. Some functionality will not be available."
end
end
end
end # module

src/cuda/cuda.jl (new file, 9 lines)

@ -0,0 +1,9 @@
module CUDA
using ..CuArrays
using CuArrays: CUDNN
include("curnn.jl")
include("cudnn.jl")
end

src/cuda/cudnn.jl (new file, 8 lines)

@ -0,0 +1,8 @@
import ..Flux: data
import CuArrays.CUDNN: batchnorm, ∇batchnorm
(BN::Flux.BatchNorm)(x::Union{CuArray{T,2},CuArray{T,4},CuArray{T,5}}, cache = nothing) where T<:Union{Float32, Float64} =
BN.λ.(batchnorm(BN.γ, BN.β, x, BN.μ, BN.σ², BN.momentum; cache = cache, alpha = 1, beta = 0, eps = BN.ϵ, training = Flux.istraining()))
@adjoint batchnorm(g, b, x, running_mean, running_var, momentum; kw...) =
batchnorm(g, b, x, running_mean, running_var, momentum; kw...), Δ -> (∇batchnorm(g, b, x, Δ, running_mean, running_var, momentum; kw...)..., nothing, nothing, nothing)

src/cuda/curnn.jl (new file, 90 lines)

@ -0,0 +1,90 @@
import ..Flux: Flux, relu
using CuArrays.CUDAnative
CuRNN{T} = Flux.RNNCell{<:Union{typeof(tanh),typeof(relu)},<:CuArray{T,2},<:CuArray{T,1}}
CuGRU{T} = Flux.GRUCell{<:CuArray{T,2},<:CuArray{T,1}}
CuLSTM{T} = Flux.LSTMCell{<:CuArray{T,2},<:CuArray{T,1}}
CuRNNs{T} = Union{CuRNN{T},CuGRU{T},CuLSTM{T}}
function CUDNN.RNNDesc(m::CuRNNs{T}) where T
h, i = length(m.h), size(m.Wi, 2)
mode = m isa CuRNN ?
(m.σ == tanh ? CUDNN.CUDNN_RNN_TANH : CUDNN.CUDNN_RNN_RELU) :
m isa CuGRU ? CUDNN.CUDNN_GRU : CUDNN.CUDNN_LSTM
r = CUDNN.RNNDesc{T}(mode, i, h)
return r
end
const descs = WeakKeyDict()
function desc(rnn)
d = haskey(descs, rnn) ? descs[rnn] : (descs[rnn] = CUDNN.RNNDesc(rnn))
CUDNN.setweights!(d, rnn.Wi, rnn.Wh, rnn.b)
return d
end
import Zygote
using Zygote: @adjoint
function (m::CuRNN{T})(h::CuArray{T}, x::CuArray{T}) where T <: Union{Float32,Float64}
y, h = CUDNN.forward(desc(m), x, h)
return h, y
end
function (m::CuGRU{T})(h::CuArray{T}, x::CuArray{T}) where T <: Union{Float32,Float64}
y, h = CUDNN.forward(desc(m), x, h)
return h, y
end
function (m::CuLSTM{T})(h::NTuple{2,CuArray{T}}, x::CuArray{T}) where T <: Union{Float32,Float64}
y, h, c = CUDNN.forward(desc(m), x, h[1], h[2])
return (h, c), y
end
(m::CuRNN{T})(h::CuArray{T}, x) where T <: Union{Float32,Float64} = m(h, CuArray{T}(x))
(m::CuGRU{T})(h::CuArray{T}, x) where T <: Union{Float32,Float64} = m(h, CuArray{T}(x))
(m::CuLSTM{T})(h::NTuple{2,CuArray{T}}, x) where T <: Union{Float32,Float64} = m(h, CuArray{T}(x))
trim(x, Δ) = reshape(Δ, ntuple(i -> size(Δ, i), Val(ndims(x))))
unbroadcast(x::AbstractArray, Δ) =
size(x) == size(Δ) ? Δ :
length(x) == length(Δ) ? trim(x, Δ) :
trim(x, sum(Δ, dims = ntuple(i -> size(x, i) == 1 ? i : ndims(Δ)+1, Val(ndims(Δ)))))
coerce_cuda(x::Union{CuArray,Nothing}) = x
coerce_cuda(x::Tuple) = coerce_cuda.(x)
coerce_cuda(x::AbstractArray) = x .+ CuArrays.fill(0)
function struct_grad!(cx::Zygote.Context, x, x̄)
for f in fieldnames(typeof(x))
Zygote.accum_param(cx, getfield(x, f), getfield(x̄, f))
end
dx = Zygote.grad_mut(cx, x)
dx[] = Zygote.accum(dx[], x̄)
return dx
end
for RNN in (CuRNN, CuGRU)
@eval @adjoint function (m::$RNN{T})(h::CuArray{T}, x::CuArray{T}) where T <: Union{Float32,Float64}
(y, ho), back = CUDNN.pullback(desc(m), x, h)
(ho, y), function (Δ)
dho, dy = coerce_cuda(Δ) # Support FillArrays etc.
m̄ = back(dy, dho)
dm = struct_grad!(__context__, m, (σ=nothing,Wi=transpose(m̄.Wi),Wh=transpose(m̄.Wh),b=m̄.b,h=nothing))
(dm, unbroadcast(h, m̄.h), m̄.x)
end
end
end
@adjoint function (m::CuLSTM)((h, c)::Tuple{CuArray{T},CuArray{T}}, x::CuArray{T}) where T <: Union{Float32,Float64}
(y, ho, co), back = CUDNN.pullback(desc(m), x, h, c)
((ho, co), y), function (Δ)
dhc, dy = coerce_cuda(Δ) # Support FillArrays etc.
dho, dco = dhc === nothing ? (nothing, nothing) : dhc
m̄ = back(dy, dho, dco)
dm = struct_grad!(__context__, m, (σ=nothing,Wi=transpose(m̄.Wi),Wh=transpose(m̄.Wh),b=m̄.b,h=nothing,c=nothing))
(dm, (unbroadcast(h, m̄.h), unbroadcast(c, m̄.c)), m̄.x)
end
end

src/data/Data.jl (new file, 56 lines)

@ -0,0 +1,56 @@
module Data
import ..Flux
import SHA
using Random: shuffle!
using Base: @propagate_inbounds
export CMUDict, cmudict
deps(path...) = joinpath(@__DIR__, "..", "..", "deps", path...)
function download_and_verify(url, path, hash)
tmppath = tempname()
download(url, tmppath)
hash_download = open(tmppath) do f
bytes2hex(SHA.sha256(f))
end
if hash_download !== hash
msg = "Hash Mismatch!\n"
msg *= " Expected sha256: $hash\n"
msg *= " Calculated sha256: $hash_download"
error(msg)
end
mv(tmppath, path; force=true)
end
function __init__()
mkpath(deps())
end
include("dataloader.jl")
export DataLoader
include("mnist.jl")
export MNIST
include("fashion-mnist.jl")
export FashionMNIST
include("cmudict.jl")
using .CMUDict
include("tree.jl")
include("sentiment.jl")
using .Sentiment
include("iris.jl")
export Iris
include("housing.jl")
export Housing
@deprecate DataLoader(x...; kws...) DataLoader(x; kws...)
end

src/data/cmudict.jl (new file, 76 lines)

@ -0,0 +1,76 @@
module CMUDict
export cmudict
using ..Data: deps, download_and_verify
const version = "0.7b"
const cache_prefix = "https://cache.julialang.org"
function load()
suffixes_and_hashes = [("" , "209a8b4cd265013e96f4658632a9878103b0c5abf62b50d4ef3ae1be226b29e4"),
(".phones" , "ffb588a5e55684723582c7256e1d2f9fadb130011392d9e59237c76e34c2cfd6"),
(".symbols", "408ccaae803641c6d7b626b6299949320c2dbca96b2220fd3fb17887b023b027")]
if isdir(deps("cmudict"))
if all(isfile(deps("cmudict", "cmudict$x")) for (x, _) in suffixes_and_hashes)
return
end
end
@info "Downloading CMUDict dataset"
mkpath(deps("cmudict"))
for (x, hash) in suffixes_and_hashes
download_and_verify("$cache_prefix/https://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/cmudict-$version$x",
deps("cmudict", "cmudict$x"), hash)
end
end
"""
phones()
Return a `Vector` containing the phones used in the CMU Pronouncing Dictionary.
"""
function phones()
load()
Symbol.(first.(split.(split(read(deps("cmudict", "cmudict.phones"),String),
"\n", keepempty = false), "\t")))
end
"""
symbols()
Return a `Vector` containing the symbols used in the CMU Pronouncing Dictionary.
A symbol is a phone with optional auxiliary symbols, indicating for example the
amount of stress on the phone.
"""
function symbols()
load()
Symbol.(split(read(deps("cmudict", "cmudict.symbols"),String),
"\n", keepempty = false))
end
"""
rawdict()
Return the unfiltered CMU Pronouncing Dictionary.
"""
function rawdict()
load()
Dict(String(xs[1]) => Symbol.(xs[2:end]) for xs in
filter(!isempty, split.(split(read(deps("cmudict", "cmudict"),String), "\n"))))
end
validword(s) = isascii(s) && occursin(r"^[\w\-\.]+$", s)
"""
cmudict()
Return a filtered CMU Pronouncing Dictionary.
It is filtered so each word contains only ASCII characters and a combination of
word characters (as determined by the regex engine using `\\w`), '-' and '.'.
"""
cmudict() = filter(p -> validword(p.first), rawdict())
alphabet() = ['A':'Z'..., '0':'9'..., '_', '-', '.']
end
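For illustration, a hedged sketch of the per-line parsing that `rawdict` performs, applied to a single hypothetical dictionary entry:

```julia
# One CMUDict-style line: a word followed by its phones.
line = "HELLO  HH AH0 L OW1"
xs = split(line)
String(xs[1]) => Symbol.(xs[2:end])   # "HELLO" => [:HH, :AH0, :L, :OW1]
```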

110
src/data/dataloader.jl Normal file
@@ -0,0 +1,110 @@
# Adapted from Knet's src/data.jl (author: Deniz Yuret)
struct DataLoader{D}
data::D
batchsize::Int
nobs::Int
partial::Bool
imax::Int
indices::Vector{Int}
shuffle::Bool
end
"""
DataLoader(data; batchsize=1, shuffle=false, partial=true)
An object that iterates over mini-batches of `data`, each mini-batch containing `batchsize` observations
(except possibly the last one).
Takes as input a single data tensor, or a tuple (or a named tuple) of tensors.
The last dimension in each tensor is considered to be the observation dimension.
If `shuffle=true`, shuffles the observations each time iterations are re-started.
If `partial=false`, drops the last mini-batch if it is smaller than the batchsize.
The original data is preserved in the `data` field of the DataLoader.
Usage example:
Xtrain = rand(10, 100)
train_loader = DataLoader(Xtrain, batchsize=2)
# iterate over 50 mini-batches of size 2
for x in train_loader
@assert size(x) == (10, 2)
...
end
train_loader.data # original dataset
# similar, but yielding tuples
train_loader = DataLoader((Xtrain,), batchsize=2)
for (x,) in train_loader
@assert size(x) == (10, 2)
...
end
Xtrain = rand(10, 100)
Ytrain = rand(100)
train_loader = DataLoader((Xtrain, Ytrain), batchsize=2, shuffle=true)
for epoch in 1:100
for (x, y) in train_loader
@assert size(x) == (10, 2)
@assert size(y) == (2,)
...
end
end
# train for 10 epochs
using IterTools: ncycle
Flux.train!(loss, ps, ncycle(train_loader, 10), opt)
# can use NamedTuple to name tensors
train_loader = DataLoader((images=Xtrain, labels=Ytrain), batchsize=2, shuffle=true)
for datum in train_loader
@assert size(datum.images) == (10, 2)
@assert size(datum.labels) == (2,)
end
"""
function DataLoader(data; batchsize=1, shuffle=false, partial=true)
batchsize > 0 || throw(ArgumentError("Need positive batchsize"))
n = _nobs(data)
if n < batchsize
@warn "Number of observations less than batchsize, decreasing the batchsize to $n"
batchsize = n
end
imax = partial ? n : n - batchsize + 1
DataLoader(data, batchsize, n, partial, imax, [1:n;], shuffle)
end
@propagate_inbounds function Base.iterate(d::DataLoader, i=0) # returns data in d.indices[i+1:i+batchsize]
i >= d.imax && return nothing
if d.shuffle && i == 0
shuffle!(d.indices)
end
nexti = min(i + d.batchsize, d.nobs)
ids = d.indices[i+1:nexti]
batch = _getobs(d.data, ids)
return (batch, nexti)
end
function Base.length(d::DataLoader)
n = d.nobs / d.batchsize
d.partial ? ceil(Int,n) : floor(Int,n)
end
_nobs(data::AbstractArray) = size(data)[end]
function _nobs(data::Union{Tuple, NamedTuple})
length(data) > 0 || throw(ArgumentError("Need at least one data input"))
n = _nobs(data[1])
if !all(x -> _nobs(x) == n, Base.tail(data))
throw(DimensionMismatch("All data should contain same number of observations"))
end
return n
end
_getobs(data::AbstractArray, i) = data[ntuple(i -> Colon(), Val(ndims(data) - 1))..., i]
_getobs(data::Union{Tuple, NamedTuple}, i) = map(Base.Fix2(_getobs, i), data)
Base.eltype(::DataLoader{D}) where D = D
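A short usage sketch mirroring the docstring above, exercising the NamedTuple form so batches can be addressed by field name (array sizes here are arbitrary):

```julia
using Flux.Data: DataLoader

Xtrain = rand(Float32, 10, 100)   # 100 observations along the last dimension
Ytrain = rand(Float32, 100)

loader = DataLoader((images = Xtrain, labels = Ytrain), batchsize = 16, shuffle = true)

batch = first(loader)
size(batch.images)   # (10, 16)
size(batch.labels)   # (16,)
length(loader)       # 7 with partial=true, i.e. ceil(100 / 16)
```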

66
src/data/fashion-mnist.jl Normal file
@@ -0,0 +1,66 @@
module FashionMNIST
using ..MNIST: gzopen, imageheader, rawimage, labelheader, rawlabel
using ..Data: download_and_verify
const dir = joinpath(@__DIR__, "../../deps/fashion-mnist")
function load()
mkpath(dir)
cd(dir) do
for (file, hash) in [("train-images-idx3-ubyte", "3aede38d61863908ad78613f6a32ed271626dd12800ba2636569512369268a84"),
("train-labels-idx1-ubyte", "a04f17134ac03560a47e3764e11b92fc97de4d1bfaf8ba1a3aa29af54cc90845"),
("t10k-images-idx3-ubyte" , "346e55b948d973a97e58d2351dde16a484bd415d4595297633bb08f03db6a073"),
("t10k-labels-idx1-ubyte" , "67da17c76eaffca5446c3361aaab5c3cd6d1c2608764d35dfb1850b086bf8dd5")]
isfile(file) && continue
@info "Downloading Fashion-MNIST dataset"
download_and_verify("http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/$file.gz", "$file.gz", hash)
open(file, "w") do io
write(io, gzopen(read, "$file.gz"))
end
end
end
end
const TRAINIMAGES = joinpath(dir, "train-images-idx3-ubyte")
const TRAINLABELS = joinpath(dir, "train-labels-idx1-ubyte")
const TESTIMAGES = joinpath(dir, "t10k-images-idx3-ubyte")
const TESTLABELS = joinpath(dir, "t10k-labels-idx1-ubyte")
"""
images()
images(:test)
Load the Fashion-MNIST images.
Each image is a 28×28 array of `Gray` colour values
(see [Colors.jl](https://github.com/JuliaGraphics/Colors.jl)).
Return the 60,000 training images by default; pass `:test` to retrieve the
10,000 test images.
"""
function images(set = :train)
load()
io = IOBuffer(read(set == :train ? TRAINIMAGES : TESTIMAGES))
_, N, nrows, ncols = imageheader(io)
[rawimage(io) for _ in 1:N]
end
"""
labels()
labels(:test)
Load the labels corresponding to each of the images returned from [`images()`](@ref).
Each label is a number from 0-9.
Return the 60,000 training labels by default; pass `:test` to retrieve the
10,000 test labels.
"""
function labels(set = :train)
load()
io = IOBuffer(read(set == :train ? TRAINLABELS : TESTLABELS))
_, N = labelheader(io)
[rawlabel(io) for _ = 1:N]
end
end

136
src/data/housing.jl Normal file
@@ -0,0 +1,136 @@
"""
1. Title: Boston Housing Data
2. Sources:
(a) Origin: This dataset was taken from the StatLib library which is
maintained at Carnegie Mellon University.
(b) Creator: Harrison, D. and Rubinfeld, D.L. 'Hedonic prices and the
demand for clean air', J. Environ. Economics & Management,
vol.5, 81-102, 1978.
(c) Date: July 7, 1993
3. Number of Instances: 506
4. Number of Attributes: 13 continuous attributes (including "class"
attribute "MEDV"), 1 binary-valued attribute.
5. Attribute Information:
1. CRIM per capita crime rate by town
2. ZN proportion of residential land zoned for lots over
25,000 sq.ft.
3. INDUS proportion of non-retail business acres per town
4. CHAS Charles River dummy variable (= 1 if tract bounds
river; 0 otherwise)
5. NOX nitric oxides concentration (parts per 10 million)
6. RM average number of rooms per dwelling
7. AGE proportion of owner-occupied units built prior to 1940
8. DIS weighted distances to five Boston employment centres
9. RAD index of accessibility to radial highways
10. TAX full-value property-tax rate per 10,000 dollars
11. PTRATIO pupil-teacher ratio by town
12. B 1000(Bk - 0.63)^2 where Bk is the proportion of blacks
by town
13. LSTAT % lower status of the population
14. MEDV Median value of owner-occupied homes in 1000's of dollars
Downloaded From: https://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data
"""
module Housing
using DelimitedFiles
using ..Data: deps, download_and_verify
#Uncomment if package exists
#const cache_prefix = "https://cache.julialang.org/"
const cache_prefix = ""
function load()
isfile(deps("housing.data")) && return
@info "Downloading the Boston housing Dataset"
download_and_verify("$(cache_prefix)http://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data",
deps("housing.data"),
"baadf72995725d76efe787b664e1f083388c79ba21ef9a7990d87f774184735a")
#@info "Download complete. Working on the files"
path = deps()
isfile(deps("housing.data")) && touch(joinpath(path, "tempfile.data"))
open(joinpath(path, "tempfile.data"), "a") do fout
open(deps("housing.data"), "r") do fin
for line in eachline(fin)
line = replace(lstrip(line), r" +" => s",")
println(fout, line)
end
end
end
mv(joinpath(path, "tempfile.data"), deps("housing.data"), force=true)
end
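A hedged sketch of the whitespace-to-comma rewrite performed in `load()` above, applied to one hypothetical row of the raw file:

```julia
line = "  0.00632  18.00   2.310  0  0.5380"
replace(lstrip(line), r" +" => s",")   # "0.00632,18.00,2.310,0,0.5380"
```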
"""
Gets the targets for the Boston housing dataset, a 506 element array listing the targets for each example
```jldoctest
julia> using Flux
julia> target = Flux.Data.Housing.targets()
julia> summary(target)
506×1 Array{Float64,2}
julia> target[1]
24.0
```
"""
function targets()
load()
housing = readdlm(deps("housing.data"), ',')
reshape(Vector{Float64}(housing[1:end,end]), (506, 1))
end
"""
Gets the names of the features provided in the dataset
"""
function feature_names()
["crim","zn","indus","chas","nox","rm","age","dis","rad","tax","ptratio","b","lstat"]
end
"""
Gets the features of the Boston Housing Dataset. This is a 506×13 Matrix of Float64 values.
The values are in the order ["crim","zn","indus","chas","nox","rm","age","dis","rad","tax","ptratio","b","lstat"].
It has 506 examples.
```jldoctest
julia> using Flux
julia> features = Flux.Data.Housing.features()
julia> summary(features)
506×13 Array{Float64,2}
julia> features[1, :]
13-element Array{Float64,1}:
0.00632
18.0
2.31
0.0
0.538
296.0
15.3
396.9
4.98
```
"""
function features()
load()
housing = readdlm(deps("housing.data"), ',')
Matrix{Float64}(housing[1:end, 1:13])
end
end

78
src/data/iris.jl Normal file
@@ -0,0 +1,78 @@
"""
Fisher's classic iris dataset.
Measurements from 3 different species of iris: setosa, versicolor and
virginica. There are 50 examples of each species.
There are 4 measurements for each example: sepal length, sepal width,
petal length and petal width. The measurements are in centimeters.
The module retrieves the data from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/iris).
"""
module Iris
using DelimitedFiles
using ..Data: deps, download_and_verify
# Uncomment if the iris.data file is cached to cache.julialang.org.
const cache_prefix = "https://cache.julialang.org/"
function load()
isfile(deps("iris.data")) && return
@info "Downloading iris dataset."
download_and_verify("$(cache_prefix)https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data",
deps("iris.data"),
"6f608b71a7317216319b4d27b4d9bc84e6abd734eda7872b71a458569e2656c0")
end
"""
labels()
Get the labels of the iris dataset, a 150 element array of strings listing the
species of each example.
```jldoctest; setup = :(Flux.Data.Iris.load())
julia> labels = Flux.Data.Iris.labels();
julia> summary(labels)
"150-element Array{String,1}"
julia> labels[1]
"Iris-setosa"
```
"""
function labels()
load()
iris = readdlm(deps("iris.data"), ',')
Vector{String}(iris[1:end, end])
end
"""
features()
Get the features of the iris dataset. This is a 4x150 matrix of Float64
elements. It has a row for each feature (sepal length, sepal width,
petal length, petal width) and a column for each example.
```jldoctest; setup = :(Flux.Data.Iris.load())
julia> features = Flux.Data.Iris.features();
julia> summary(features)
"4×150 Array{Float64,2}"
julia> features[:, 1]
4-element Array{Float64,1}:
5.1
3.5
1.4
0.2
```
"""
function features()
load()
iris = readdlm(deps("iris.data"), ',')
Matrix{Float64}(iris[1:end, 1:4]')
end
end

116
src/data/mnist.jl Normal file
@@ -0,0 +1,116 @@
module MNIST
using CodecZlib, Colors
using ..Data: download_and_verify
const Gray = Colors.Gray{Colors.N0f8}
const dir = joinpath(@__DIR__, "../../deps/mnist")
function gzopen(f, file)
open(file) do io
f(GzipDecompressorStream(io))
end
end
function load()
mkpath(dir)
cd(dir) do
for (file, hash) in [("train-images-idx3-ubyte", "440fcabf73cc546fa21475e81ea370265605f56be210a4024d2ca8f203523609"),
("train-labels-idx1-ubyte", "3552534a0a558bbed6aed32b30c495cca23d567ec52cac8be1a0730e8010255c"),
("t10k-images-idx3-ubyte" , "8d422c7b0a1c1c79245a5bcf07fe86e33eeafee792b84584aec276f5a2dbc4e6"),
("t10k-labels-idx1-ubyte" , "f7ae60f92e00ec6debd23a6088c31dbd2371eca3ffa0defaefb259924204aec6")]
isfile(file) && continue
@info "Downloading MNIST dataset"
download_and_verify("https://cache.julialang.org/http://yann.lecun.com/exdb/mnist/$file.gz", "$file.gz", hash)
open(file, "w") do io
write(io, gzopen(read, "$file.gz"))
end
end
end
end
const IMAGEOFFSET = 16
const LABELOFFSET = 8
const NROWS = 28
const NCOLS = 28
const TRAINIMAGES = joinpath(dir, "train-images-idx3-ubyte")
const TRAINLABELS = joinpath(dir, "train-labels-idx1-ubyte")
const TESTIMAGES = joinpath(dir, "t10k-images-idx3-ubyte")
const TESTLABELS = joinpath(dir, "t10k-labels-idx1-ubyte")
function imageheader(io::IO)
magic_number = bswap(read(io, UInt32))
total_items = bswap(read(io, UInt32))
nrows = bswap(read(io, UInt32))
ncols = bswap(read(io, UInt32))
return magic_number, Int(total_items), Int(nrows), Int(ncols)
end
function labelheader(io::IO)
magic_number = bswap(read(io, UInt32))
total_items = bswap(read(io, UInt32))
return magic_number, Int(total_items)
end
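The IDX format stores its header integers big-endian, hence the `bswap` calls above. A minimal sketch against a handcrafted in-memory buffer (the magic number and counts are hypothetical):

```julia
io = IOBuffer()
for v in (UInt32(0x0803), UInt32(2), UInt32(28), UInt32(28))
    write(io, bswap(v))            # store big-endian, as IDX files do
end
seekstart(io)

magic  = bswap(read(io, UInt32))        # 0x00000803
nitems = Int(bswap(read(io, UInt32)))   # 2
nrows  = Int(bswap(read(io, UInt32)))   # 28
ncols  = Int(bswap(read(io, UInt32)))   # 28
```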
function rawimage(io::IO)
img = Array{Gray}(undef, NCOLS, NROWS)
for i in 1:NCOLS, j in 1:NROWS
img[i, j] = reinterpret(Colors.N0f8, read(io, UInt8))
end
return img
end
function rawimage(io::IO, index::Integer)
seek(io, IMAGEOFFSET + NROWS * NCOLS * (index - 1))
return rawimage(io)
end
rawlabel(io::IO) = Int(read(io, UInt8))
function rawlabel(io::IO, index::Integer)
seek(io, LABELOFFSET + (index - 1))
return rawlabel(io)
end
getfeatures(io::IO, index::Integer) = vec(getimage(io, index))
"""
images()
images(:test)
Load the MNIST images.
Each image is a 28×28 array of `Gray` colour values
(see [Colors.jl](https://github.com/JuliaGraphics/Colors.jl)).
Return the 60,000 training images by default; pass `:test` to retrieve the
10,000 test images.
"""
function images(set = :train)
load()
io = IOBuffer(read(set == :train ? TRAINIMAGES : TESTIMAGES))
_, N, nrows, ncols = imageheader(io)
[rawimage(io) for _ in 1:N]
end
"""
labels()
labels(:test)
Load the labels corresponding to each of the images returned from [`images()`](@ref).
Each label is a number from 0-9.
Return the 60,000 training labels by default; pass `:test` to retrieve the
10,000 test labels.
"""
function labels(set = :train)
load()
io = IOBuffer(read(set == :train ? TRAINLABELS : TESTLABELS))
_, N = labelheader(io)
[rawlabel(io) for _ = 1:N]
end
end # module

67
src/data/sentiment.jl Normal file
@@ -0,0 +1,67 @@
"Stanford Sentiment Treebank dataset."
module Sentiment
using ZipFile
using ..Data: deps, download_and_verify
function load()
isfile(deps("sentiment.zip")) && return
@info "Downloading sentiment treebank dataset"
download_and_verify("https://cache.julialang.org/https://nlp.stanford.edu/sentiment/trainDevTestTrees_PTB.zip",
deps("sentiment.zip"), "5c613a4f673fc74097d523a2c83f38e0cc462984d847b82c7aaf36b01cbbbfcc")
end
getfile(r, name) = r.files[findfirst(x -> x.name == name, r.files)]
function getfile(name)
r = ZipFile.Reader(deps("sentiment.zip"))
text = read(getfile(r, "trees/$name"), String)
close(r)
return text
end
using ..Data: Tree
totree_(n, w) = Tree{Any}((parse(Int, n), w))
totree_(n, a, b) = Tree{Any}((parse(Int, n), nothing), totree(a), totree(b))
totree(t::Expr) = totree_(t.args...)
function parsetree(s)
s = replace(s, "\\" => "")
s = replace(s, "\$" => "\\\$")
s = replace(s, r"[^ \n\(\)]+" => s -> "\"$s\"")
s = replace(s, " " => ", ")
return totree(Meta.parse(s))
end
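Ignoring the backslash and `$` escaping, the core of `parsetree` is a string rewrite that quotes every token and turns spaces into commas so the treebank s-expression parses as a nested tuple `Expr`. A hedged sketch on a hypothetical fragment:

```julia
s = "(2 (1 hello) (3 world))"
s = replace(s, r"[^ \n\(\)]+" => w -> "\"$w\"")   # quote every token
s = replace(s, " " => ", ")
Meta.parse(s)   # a nested tuple Expr, which `totree` then converts into a `Tree`
```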
function gettrees(name)
load()
ss = split(getfile("$name.txt"), '\n', keepempty = false)
return parsetree.(ss)
end
"""
train()
Return the train split of the Stanford Sentiment Treebank.
The data is in [treebank](https://en.wikipedia.org/wiki/Treebank) format.
"""
train() = gettrees("train")
"""
test()
Return the test split of the Stanford Sentiment Treebank.
The data is in [treebank](https://en.wikipedia.org/wiki/Treebank) format.
"""
test() = gettrees("test")
"""
dev()
Return the dev split of the Stanford Sentiment Treebank.
The data is in [treebank](https://en.wikipedia.org/wiki/Treebank) format.
"""
dev() = gettrees("dev")
end

42
src/data/tree.jl Normal file
@@ -0,0 +1,42 @@
using AbstractTrees
struct Tree{T}
value::T
children::Vector{Tree{T}}
end
Tree{T}(x::T, xs::Tree{T}...) where T = Tree{T}(x, [xs...])
Tree{T}(x) where T = Tree(convert(T, x))
Tree(x::T, xs::Tree{T}...) where T = Tree{T}(x, xs...)
AbstractTrees.children(t::Tree) = t.children
AbstractTrees.printnode(io::IO, t::Tree) = show(io, t.value)
Base.show(io::IO, t::Type{Tree}) = print(io, "Tree")
Base.show(io::IO, t::Type{Tree{T}}) where T = print(io, "Tree{", T, "}")
function Base.show(io::IO, t::Tree)
println(io, typeof(t))
print_tree(io, t)
end
using Juno
@render Juno.Inline t::Tree begin
render(t) = Juno.Tree(t.value, render.(t.children))
Juno.Tree(typeof(t), [render(t)])
end
Base.getindex(t::Tree, i::Integer) = t.children[i]
Base.getindex(t::Tree, i::Integer, is::Integer...) = t[i][is...]
# Utilities
isleaf(t) = isempty(children(t))
leaves(xs::Tree) = map(x -> x.value, Leaves(xs))
Base.map(f, t::Tree, ts::Tree...) =
Tree{Any}(f(map(t -> t.value, (t, ts...))...),
[map(f, chs...) for chs in zip(map(t -> t.children, (t, ts...))...)]...)

2
src/deprecations.jl Normal file
@@ -0,0 +1,2 @@
@deprecate param(x) x
@deprecate data(x) x

82
src/functor.jl Normal file
@@ -0,0 +1,82 @@
import Adapt: adapt, adapt_storage
using Zygote: IdSet
import Functors: @functor, functor, fmap
trainable(m) = functor(m)[1]
"""
testmode!(m, mode = true)
Set a layer or model's test mode (see below).
Using `:auto` mode will treat any gradient computation as training.
_Note_: if you manually set a model into test mode, you need to manually place
it back into train mode during training phase.
Possible values include:
- `false` for training
- `true` for testing
- `:auto` or `nothing` for Flux to detect the mode automatically
"""
testmode!(m, mode = true) = m
"""
trainmode!(m, mode = true)
Set a layer or model's train mode (see below).
Symmetric to [`testmode!`](@ref) (i.e. `trainmode!(m, mode) == testmode!(m, !mode)`).
_Note_: if you manually set a model into train mode, you need to manually place
it into test mode during testing phase.
Possible values include:
- `true` for training
- `false` for testing
- `:auto` or `nothing` for Flux to detect the mode automatically
"""
trainmode!(m, mode = true) = mode isa Bool ? testmode!(m, !mode) : testmode!(m, mode)
params!(p::Params, x::AbstractArray{<:Number}, seen = IdSet()) = push!(p, x)
function params!(p::Params, x, seen = IdSet())
x in seen && return
push!(seen, x)
for child in trainable(x)
params!(p, child, seen)
end
end
function params(m...)
ps = Params()
params!(ps, m)
return ps
end
# Deprecated stuff
macro treelike(args...)
functorm(args...)
end
mapleaves(f, x) = fmap(f, x)
function loadparams!(m, xs)
for (p, x) in zip(params(m), xs)
size(p) == size(x) ||
error("Expected param size $(size(p)), got $(size(x))")
copyto!(p, x)
end
end
# CPU/GPU movement conveniences
cpu(m) = fmap(x -> adapt(Array, x), m)
gpu(x) = use_cuda[] ? fmap(CuArrays.cu, x) : x
# Precision
adapt_storage(T::Type{<:Real}, xs::AbstractArray{<:Real}) = convert.(T, xs)
paramtype(T::Type{<:Real}, m) = fmap(x -> adapt(T, x), m)
f32(m) = paramtype(Float32, m)
f64(m) = paramtype(Float64, m)
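A brief usage sketch of the precision helpers above (layer sizes are arbitrary):

```julia
using Flux

m = Dense(3, 2)        # Float32 parameters by default
m64 = Flux.f64(m)

eltype(m64.W)          # Float64
eltype(m64.b)          # Float64
```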

@@ -4,59 +4,120 @@
Chain multiple layers / functions together, so that they are called in sequence
on a given input.
m = Chain(x -> x^2, x -> x+1)
m(5) == 26
m = Chain(Dense(10, 5), Dense(5, 2))
x = rand(10)
m(x) == m[2](m[1](x))
`Chain` also supports indexing and slicing, e.g. `m[2]` or `m[1:end-1]`.
`m[1:3](x)` will calculate the output of the first three layers.
# Examples
```jldoctest
julia> m = Chain(x -> x^2, x -> x+1);
julia> m(5) == 26
true
julia> m = Chain(Dense(10, 5), Dense(5, 2));
julia> x = rand(10);
julia> m(x) == m[2](m[1](x))
true
```
"""
type Chain
layers::Vector{Any}
Chain(xs...) = new([xs...])
struct Chain{T<:Tuple}
layers::T
Chain(xs...) = new{typeof(xs)}(xs)
end
@forward Chain.layers Base.getindex, Base.first, Base.last, Base.endof, Base.push!
@forward Chain.layers Base.start, Base.next, Base.done
@forward Chain.layers Base.getindex, Base.length, Base.first, Base.last,
Base.iterate, Base.lastindex
children(c::Chain) = c.layers
mapchildren(f, c::Chain) = Chain(f.(c.layers)...)
functor(::Type{<:Chain}, c) = c.layers, ls -> Chain(ls...)
(s::Chain)(x) = foldl((x, m) -> m(x), x, s.layers)
applychain(::Tuple{}, x) = x
applychain(fs::Tuple, x) = applychain(tail(fs), first(fs)(x))
(c::Chain)(x) = applychain(c.layers, x)
Base.getindex(c::Chain, i::AbstractArray) = Chain(c.layers[i]...)
testmode!(m::Chain, mode = true) = (map(x -> testmode!(x, mode), m.layers); m)
function Base.show(io::IO, c::Chain)
print(io, "Chain(")
join(io, c.layers, ", ")
print(io, ")")
end
"""
outdims(c::Chain, isize)
Calculate the output dimensions given the input dimensions, `isize`.
```julia
m = Chain(Conv((3, 3), 3 => 16), Conv((3, 3), 16 => 32))
outdims(m, (10, 10)) == (6, 6)
```
"""
outdims(c::Chain, isize) = foldl(∘, map(l -> (x -> outdims(l, x)), c.layers))(isize)
# This is a temporary and naive implementation
# it might be replaced in the future for better performance
# see issue https://github.com/FluxML/Flux.jl/issues/702
# Johnny Chen -- @johnnychen94
# only slightly changed to better handle interaction with Zygote @dsweber2
"""
activations(c::Chain, input)
Calculate the forward results of each layers in Chain `c` with `input` as model input.
"""
function activations(c::Chain, input)
extraChain(c.layers, input)
end
function extraChain(fs::Tuple, x)
res = first(fs)(x)
return (res, extraChain(Base.tail(fs), res)...)
end
extraChain(::Tuple{}, x) = ()
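A small usage sketch of `activations` (layer sizes are arbitrary): each layer's output is collected as the input flows through the chain.

```julia
using Flux

m = Chain(Dense(3, 4, relu), Dense(4, 2))
x = rand(Float32, 3)

acts = Flux.activations(m, x)   # one entry per layer
size.(acts)                     # ((4,), (2,))
acts[end] == m(x)               # true
```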
"""
Dense(in::Integer, out::Integer, σ = identity)
Creates a traditional `Dense` layer with parameters `W` and `b`.
Create a traditional `Dense` layer with parameters `W` and `b`.
y = σ.(W * x .+ b)
The input `x` must be a vector of length `in`, or a batch of vectors represented
as an `in × N` matrix. The out `y` will be a vector or batch of length `in`.
as an `in × N` matrix. The out `y` will be a vector or batch of length `out`.
# Examples
```jldoctest; setup = :(using Random; Random.seed!(0))
julia> d = Dense(5, 2)
Dense(5, 2)
julia> d(rand(5))
2-element Array{Float32,1}:
-0.16210233
0.12311903
```
"""
struct Dense{F,S,T}
σ::F
struct Dense{F,S<:AbstractArray,T<:AbstractArray}
W::S
b::T
σ::F
end
Dense(in::Integer, out::Integer, σ = identity; init = initn) =
Dense(σ, param(init(out, in)), param(init(out)))
Dense(W, b) = Dense(W, b, identity)
treelike(Dense)
function Dense(in::Integer, out::Integer, σ = identity;
initW = glorot_uniform, initb = zeros)
return Dense(initW(out, in), initb(out), σ)
end
function (a::Dense)(x)
@functor Dense
function (a::Dense)(x::AbstractArray)
W, b, σ = a.W, a.b, a.σ
σ.(W*x .+ b)
end
@@ -66,3 +127,134 @@ function Base.show(io::IO, l::Dense)
l.σ == identity || print(io, ", ", l.σ)
print(io, ")")
end
# Try to avoid hitting generic matmul in some simple cases
# Base's matmul is so slow that it's worth the extra conversion to hit BLAS
(a::Dense{<:Any,W})(x::AbstractArray{T}) where {T <: Union{Float32,Float64}, W <: AbstractArray{T}} =
invoke(a, Tuple{AbstractArray}, x)
(a::Dense{<:Any,W})(x::AbstractArray{<:AbstractFloat}) where {T <: Union{Float32,Float64}, W <: AbstractArray{T}} =
a(T.(x))
"""
outdims(l::Dense, isize)
Calculate the output dimensions given the input dimensions, `isize`.
```julia
m = Dense(10, 5)
outdims(m, (5, 2)) == (5,)
outdims(m, (10,)) == (5,)
```
"""
outdims(l::Dense, isize) = (size(l.W)[1],)
"""
Diagonal(in::Integer)
Create an element-wise linear transformation layer with learnable
vectors `α` and `β`:
y = α .* x .+ β
The input `x` must be an array where `size(x, 1) == in`.
"""
struct Diagonal{T}
α::T
β::T
end
Diagonal(in::Integer; initα = ones, initβ = zeros) =
Diagonal(initα(in), initβ(in))
@functor Diagonal
function (a::Diagonal)(x)
α, β = a.α, a.β
α.*x .+ β
end
function Base.show(io::IO, l::Diagonal)
print(io, "Diagonal(", length(l.α), ")")
end
outdims(l::Diagonal, isize) = (length(l.α),)
"""
Maxout(over)
The [Maxout](https://arxiv.org/pdf/1302.4389.pdf) layer has a number of
internal layers which all receive the same input. It returns the elementwise
maximum of the internal layers' outputs.
Maxout over linear dense layers satisfies the universal approximation theorem.
"""
struct Maxout{FS<:Tuple}
over::FS
end
"""
Maxout(f, n_alts)
Construct a Maxout layer over `n_alts` instances of the layer given by `f`.
The function takes no arguments and should return some callable layer.
Conventionally, this is a linear dense layer.
# Examples
This constructs a `Maxout` layer over 4 internal dense linear layers, each
identical in structure (784 inputs, 128 outputs):
```julia
insize = 784
outsize = 128
Maxout(()->Dense(insize, outsize), 4)
```
"""
function Maxout(f, n_alts)
over = Tuple(f() for _ in 1:n_alts)
return Maxout(over)
end
@functor Maxout
function (mo::Maxout)(input::AbstractArray)
mapreduce(f -> f(input), (acc, out) -> max.(acc, out), mo.over)
end
outdims(l::Maxout, isize) = outdims(first(l.over), isize)
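A short usage sketch of the constructor above (the sizes and number of alternatives are arbitrary):

```julia
using Flux

mo = Maxout(() -> Dense(5, 3), 4)   # elementwise max over 4 dense layers
x = rand(Float32, 5)
size(mo(x))                         # (3,)
```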
"""
SkipConnection(layer, connection)
Create a skip connection which consists of a layer or `Chain` of consecutive
layers and a shortcut connection linking the block's input to the output
through a user-supplied 2-argument callable. The first argument to the callable
will be propagated through the given `layer` while the second is the unchanged,
"skipped" input.
The simplest "ResNet"-type connection is just `SkipConnection(layer, +)`,
and requires the output of the layers to be the same shape as the input.
Here is a more complicated example:
```julia
m = Conv((3,3), 4=>7, pad=(1,1))
x = ones(5,5,4,10);
size(m(x)) == (5, 5, 7, 10)
sm = SkipConnection(m, (mx, x) -> cat(mx, x, dims=3))
size(sm(x)) == (5, 5, 11, 10)
```
"""
struct SkipConnection
layers
connection #user can pass arbitrary connections here, such as (a,b) -> a + b
end
@functor SkipConnection
function (skip::SkipConnection)(input)
skip.connection(skip.layers(input), input)
end
function Base.show(io::IO, b::SkipConnection)
print(io, "SkipConnection(", b.layers, ", ", b.connection, ")")
end

592
src/layers/conv.jl Normal file
@@ -0,0 +1,592 @@
using NNlib: conv, ∇conv_data, depthwiseconv, output_size
# pad dims of x with dims of y until ndims(x) == ndims(y)
_paddims(x::Tuple, y::Tuple) = (x..., y[(end - (length(y) - length(x) - 1)):end]...)
_convtransoutdims(isize, ksize, ssize, dsize, pad) = (isize .- 1).*ssize .+ 1 .+ (ksize .- 1).*dsize .- (pad[1:2:end] .+ pad[2:2:end])
expand(N, i::Tuple) = i
expand(N, i::Integer) = ntuple(_ -> i, N)
"""
SamePad
Padding for convolutional layers will be calculated so that outputshape == inputshape when stride = 1.
For stride > 1 the output shape depends on the type of convolution layer.
"""
struct SamePad end
calc_padding(pad, k::NTuple{N,T}, dilation, stride) where {T,N}= expand(Val(2*N), pad)
function calc_padding(::SamePad, k::NTuple{N,T}, dilation, stride) where {N,T}
#Ref: "A guide to convolution arithmetic for deep learning" https://arxiv.org/pdf/1603.07285
# Effective kernel size, including dilation
k_eff = @. k + (k - 1) * (dilation - 1)
# How much total padding needs to be applied?
pad_amt = @. k_eff - 1
# In case amount of padding is odd we need to apply different amounts to each side.
return Tuple(mapfoldl(i -> [ceil(Int, i/2), floor(Int, i/2)], vcat, pad_amt))
end
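A worked example of the arithmetic above for a hypothetical 3×3 kernel with dilation 1: the effective kernel size is 3, so 2 padding pixels per dimension are split as 1 before and 1 after.

```julia
k, dilation = (3, 3), (1, 1)
k_eff   = @. k + (k - 1) * (dilation - 1)   # (3, 3)
pad_amt = @. k_eff - 1                      # (2, 2)
Tuple(mapfoldl(i -> [ceil(Int, i/2), floor(Int, i/2)], vcat, pad_amt))  # (1, 1, 1, 1)
```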
"""
Conv(filter, in => out, σ = identity; init = glorot_uniform,
stride = 1, pad = 0, dilation = 1)
filter = (2,2)
in = 1
out = 16
Conv((2, 2), 1=>16, relu)
Standard convolutional layer. `filter` should be a tuple like `(2, 2)`.
`in` and `out` specify the number of input and output channels respectively.
Data should be stored in WHCN order (width, height, # channels, batch size).
In other words, a 100×100 RGB image would be a `100×100×3×1` array,
and a batch of 50 would be a `100×100×3×50` array.
Accepts keyword arguments `weight` and `bias` to set the corresponding fields.
Setting `bias` to `Flux.Zeros()` will switch bias off for the layer.
Takes the keyword arguments `pad`, `stride` and `dilation`.
Use `pad=SamePad()` to apply padding so that outputsize == inputsize / stride.
# Examples
Apply a `Conv` layer to a 1-channel input using a 2×2 window filter size, giving us a
16-channel output. Output is activated with ReLU.
```julia
filter = (2,2)
in = 1
out = 16
Conv(filter, in => out, relu)
```
"""
struct Conv{N,M,F,A,V}
σ::F
weight::A
bias::V
stride::NTuple{N,Int}
pad::NTuple{M,Int}
dilation::NTuple{N,Int}
end
"""
Conv(weight::AbstractArray, bias::AbstractArray)
Conv(weight::AbstractArray, bias::AbstractArray, activation)
Constructs the convolutional layer with user defined weight and bias arrays.
Setting `bias` to `Flux.Zeros()` would switch `bias` off for the layer.
Takes the keyword arguments `pad`, `stride` and `dilation`.
There is also a keyword-only constructor available for all convolutional
layers.
```julia
weight = rand(Float32, 3, 3, 5)
bias = zeros(Float32, 5)
Conv(weight = weight,
bias = bias,
σ = sigmoid)
```
"""
function Conv(w::AbstractArray{T,N}, b::Union{Zeros, AbstractVector{T}}, σ = identity;
stride = 1, pad = 0, dilation = 1) where {T,N}
stride = expand(Val(N-2), stride)
dilation = expand(Val(N-2), dilation)
pad = calc_padding(pad, size(w)[1:N-2], dilation, stride)
return Conv(σ, w, b, stride, pad, dilation)
end
function Conv(;weight::AbstractArray{T,N}, bias::Union{Zeros, AbstractVector{T}},
activation = identity, stride = 1, pad = 0, dilation = 1) where {T,N}
Conv(weight, bias, activation, stride = stride, pad = pad, dilation = dilation)
end
"""
convfilter(filter::Tuple, in=>out)
Constructs a standard convolutional weight matrix with given `filter` and
channels from `in` to `out`.
Accepts the keyword `init` (default: `glorot_uniform`) to control the sampling
distribution.
See also: [`depthwiseconvfilter`](@ref)
"""
convfilter(filter::NTuple{N,Integer}, ch::Pair{<:Integer,<:Integer};
init = glorot_uniform) where N = init(filter..., ch...)
function Conv(k::NTuple{N,Integer}, ch::Pair{<:Integer,<:Integer}, σ = identity;
init = glorot_uniform, stride = 1, pad = 0, dilation = 1,
weight = convfilter(k, ch, init = init), bias = zeros(ch[2])) where N
Conv(weight, bias, σ,
stride = stride, pad = pad, dilation = dilation)
end
@functor Conv
function (c::Conv)(x::AbstractArray)
# TODO: breaks gpu broadcast :(
# ndims(x) == ndims(c.weight)-1 && return squeezebatch(c(reshape(x, size(x)..., 1)))
σ, b = c.σ, reshape(c.bias, ntuple(_->1, length(c.stride))..., :, 1)
cdims = DenseConvDims(x, c.weight; stride=c.stride, padding=c.pad, dilation=c.dilation)
σ.(conv(x, c.weight, cdims) .+ b)
end
function Base.show(io::IO, l::Conv)
print(io, "Conv(", size(l.weight)[1:ndims(l.weight)-2])
print(io, ", ", size(l.weight, ndims(l.weight)-1), "=>", size(l.weight, ndims(l.weight)))
l.σ == identity || print(io, ", ", l.σ)
print(io, ")")
end
(a::Conv{<:Any,<:Any,W})(x::AbstractArray{T}) where {T <: Union{Float32,Float64}, W <: AbstractArray{T}} =
invoke(a, Tuple{AbstractArray}, x)
(a::Conv{<:Any,<:Any,W})(x::AbstractArray{<:Real}) where {T <: Union{Float32,Float64}, W <: AbstractArray{T}} =
a(T.(x))
"""
outdims(l::Conv, isize::Tuple)
Calculate the output dimensions given the input dimensions `isize`.
Batch size and channel size are ignored as per [NNlib.jl](https://github.com/FluxML/NNlib.jl).
```julia
m = Conv((3, 3), 3 => 16)
outdims(m, (10, 10)) == (8, 8)
outdims(m, (10, 10, 1, 3)) == (8, 8)
```
"""
outdims(l::Conv, isize) =
output_size(DenseConvDims(_paddims(isize, size(l.weight)), size(l.weight); stride = l.stride, padding = l.pad, dilation = l.dilation))
"""
ConvTranspose(filter, in=>out)
ConvTranspose(filter, in=>out, activation)
ConvTranspose(filter, in => out, σ = identity; init = glorot_uniform,
stride = 1, pad = 0, dilation = 1)
Standard convolutional transpose layer. `filter` should be a tuple like `(2, 2)`.
`in` and `out` specify the number of input and output channels respectively.
Data should be stored in WHCN order (width, height, # channels, batch size).
In other words, a 100×100 RGB image would be a `100×100×3×1` array,
and a batch of 50 would be a `100×100×3×50` array.
Accepts keyword arguments `weight` and `bias` to set the corresponding fields.
Setting `bias` to `Flux.Zeros()` will switch bias off for the layer.
Takes the keyword arguments `pad`, `stride` and `dilation`.
Use `pad=SamePad()` to apply padding so that outputsize == stride * inputsize - stride + 1.
"""
struct ConvTranspose{N,M,F,A,V}
σ::F
weight::A
bias::V
stride::NTuple{N,Int}
pad::NTuple{M,Int}
dilation::NTuple{N,Int}
end
"""
ConvTranspose(weight::AbstractArray, bias::AbstractArray)
ConvTranspose(weight::AbstractArray, bias::AbstractArray, activation)
Constructs the convolutional transpose layer with user-defined weight and bias arrays, which are used in the forward pass.
Setting `bias` to `Flux.Zeros()` would switch `bias` off for the layer.
Takes the keyword arguments `pad`, `stride` and `dilation`.
For the keyword-only constructor, see also [`Conv`](@ref).
"""
function ConvTranspose(w::AbstractArray{T,N}, b::Union{Zeros, AbstractVector{T}}, σ = identity;
stride = 1, pad = 0, dilation = 1) where {T,N}
stride = expand(Val(N-2), stride)
dilation = expand(Val(N-2), dilation)
pad = calc_padding(pad, size(w)[1:N-2], dilation, stride)
return ConvTranspose(σ, w, b, stride, pad, dilation)
end
function ConvTranspose(;weight::AbstractArray{T,N}, bias::Union{Zeros, AbstractVector{T}},
activation = identity, stride = 1, pad = 0, dilation = 1) where {T,N}
ConvTranspose(weight, bias, activation, stride = stride, pad = pad, dilation = dilation)
end
function ConvTranspose(k::NTuple{N,Integer}, ch::Pair{<:Integer,<:Integer}, σ = identity;
init = glorot_uniform, stride = 1, pad = 0, dilation = 1,
weight = convfilter(k, reverse(ch), init = init), bias = zeros(ch[2])) where N
ConvTranspose(weight, bias, σ,
stride = stride, pad = pad, dilation = dilation)
end
@functor ConvTranspose
function conv_transpose_dims(c::ConvTranspose, x::AbstractArray)
# Calculate size of "input", from ∇conv_data()'s perspective...
combined_pad = (c.pad[1:2:end] .+ c.pad[2:2:end])
I = (size(x)[1:end-2] .- 1).*c.stride .+ 1 .+ (size(c.weight)[1:end-2] .- 1).*c.dilation .- combined_pad
C_in = size(c.weight)[end-1]
batch_size = size(x)[end]
# Create DenseConvDims() that looks like the corresponding conv()
return DenseConvDims((I..., C_in, batch_size), size(c.weight);
stride=c.stride,
padding=c.pad,
dilation=c.dilation,
)
end
# TODO: Find proper fix for https://github.com/FluxML/Flux.jl/issues/900
@nograd conv_transpose_dims
function (c::ConvTranspose)(x::AbstractArray)
# ndims(x) == ndims(c.weight)-1 && return squeezebatch(c(reshape(x, size(x)..., 1)))
σ, b = c.σ, reshape(c.bias, map(_->1, c.stride)..., :, 1)
cdims = conv_transpose_dims(c, x)
σ.(∇conv_data(x, c.weight, cdims) .+ b)
end
function Base.show(io::IO, l::ConvTranspose)
print(io, "ConvTranspose(", size(l.weight)[1:ndims(l.weight)-2])
print(io, ", ", size(l.weight, ndims(l.weight)), "=>", size(l.weight, ndims(l.weight)-1))
l.σ == identity || print(io, ", ", l.σ)
print(io, ")")
end
(a::ConvTranspose{<:Any,<:Any,W})(x::AbstractArray{T}) where {T <: Union{Float32,Float64}, W <: AbstractArray{T}} =
invoke(a, Tuple{AbstractArray}, x)
(a::ConvTranspose{<:Any,<:Any,W})(x::AbstractArray{<:Real}) where {T <: Union{Float32,Float64}, W <: AbstractArray{T}} =
a(T.(x))
outdims(l::ConvTranspose{N}, isize) where N = _convtransoutdims(isize[1:2], size(l.weight)[1:N], l.stride, l.dilation, l.pad)
"""
DepthwiseConv(filter::Tuple, in=>out)
DepthwiseConv(filter::Tuple, in=>out, activation)
DepthwiseConv(filter, in => out, σ = identity; init = glorot_uniform,
stride = 1, pad = 0, dilation = 1)
Depthwise convolutional layer. `filter` should be a tuple like `(2, 2)`.
`in` and `out` specify the number of input and output channels respectively.
Note that `out` must be an integer multiple of `in`.
Data should be stored in WHCN order (width, height, # channels, batch size).
In other words, a 100×100 RGB image would be a `100×100×3×1` array,
and a batch of 50 would be a `100×100×3×50` array.
Accepts keyword arguments `weight` and `bias` to set the corresponding fields.
Setting `bias` to `Flux.Zeros()` will switch bias off for the layer.
Takes the keyword arguments `pad`, `stride` and `dilation`.
Use `pad=SamePad()` to apply padding so that outputsize == inputsize / stride.
"""
struct DepthwiseConv{N,M,F,A,V}
σ::F
weight::A
bias::V
stride::NTuple{N,Int}
pad::NTuple{M,Int}
dilation::NTuple{N,Int}
end
"""
DepthwiseConv(weight::AbstractArray, bias::AbstractArray)
DepthwiseConv(weight::AbstractArray, bias::AbstractArray, activation)
Constructs the `DepthwiseConv` layer with user-defined weight and bias arrays, which are used in the forward pass.
Setting `bias` to `Flux.Zeros()` would switch `bias` off for the layer.
Takes the keyword arguments `pad`, `stride` and `dilation`.
For the keyword-only constructor, see also [`Conv`](@ref).
"""
function DepthwiseConv(w::AbstractArray{T,N}, b::Union{Zeros, AbstractVector{T}}, σ = identity;
stride = 1, pad = 0, dilation = 1) where {T,N}
stride = expand(Val(N-2), stride)
dilation = expand(Val(N-2), dilation)
pad = calc_padding(pad, size(w)[1:N-2], dilation, stride)
return DepthwiseConv(σ, w, b, stride, pad, dilation)
end
function DepthwiseConv(;weight::AbstractArray{T,N}, bias::Union{Zeros, AbstractVector{T}},
activation = identity, stride = 1, pad = 0, dilation = 1) where {T,N}
DepthwiseConv(weight, bias, activation, stride = stride, pad = pad, dilation = dilation)
end
"""
depthwiseconvfilter(filter::Tuple, in=>out)
Constructs a depthwise convolutional weight array defined by `filter` and channels
from `in` to `out`.
Accepts the keyword `init` (default: `glorot_uniform`) to control the sampling
distribution.
See also: [`convfilter`](@ref)
"""
depthwiseconvfilter(filter::NTuple{N,Integer}, ch::Pair{<:Integer,<:Integer};
init = glorot_uniform) where N = init(filter..., div(ch[2], ch[1]), ch[1])
function DepthwiseConv(k::NTuple{N,Integer}, ch::Pair{<:Integer,<:Integer}, σ = identity;
init = glorot_uniform, stride = 1, pad = 0, dilation = 1,
weight = depthwiseconvfilter(k, ch, init = init), bias = zeros(ch[2])) where N
@assert ch[2] % ch[1] == 0 "Output channels must be integer multiple of input channels"
return DepthwiseConv(
weight,
bias,
σ;
stride = stride,
pad = pad,
dilation = dilation
)
end
@functor DepthwiseConv
function (c::DepthwiseConv)(x)
σ, b = c.σ, reshape(c.bias, map(_->1, c.stride)..., :, 1)
cdims = DepthwiseConvDims(x, c.weight; stride=c.stride, padding=c.pad, dilation=c.dilation)
σ.(depthwiseconv(x, c.weight, cdims) .+ b)
end
function Base.show(io::IO, l::DepthwiseConv)
print(io, "DepthwiseConv(", size(l.weight)[1:end-2])
print(io, ", ", size(l.weight)[end], "=>", prod(size(l.weight)[end-1:end]))
l.σ == identity || print(io, ", ", l.σ)
print(io, ")")
end
(a::DepthwiseConv{<:Any,<:Any,W})(x::AbstractArray{T}) where {T <: Union{Float32,Float64}, W <: AbstractArray{T}} =
invoke(a, Tuple{AbstractArray}, x)
(a::DepthwiseConv{<:Any,<:Any,W})(x::AbstractArray{<:Real}) where {T <: Union{Float32,Float64}, W <: AbstractArray{T}} =
a(T.(x))
outdims(l::DepthwiseConv, isize) =
output_size(DepthwiseConvDims(_paddims(isize, (1, 1, size(l.weight)[end], 1)), size(l.weight); stride = l.stride, padding = l.pad, dilation = l.dilation))
"""
CrossCor(filter, in=>out)
CrossCor(filter, in=>out, activation)
CrossCor(filter, in => out, σ = identity; init = glorot_uniform,
stride = 1, pad = 0, dilation = 1)
Standard cross convolutional layer. `filter` should be a tuple like `(2, 2)`.
`in` and `out` specify the number of input and output channels respectively.
Data should be stored in WHCN order (width, height, # channels, batch size).
In other words, a 100×100 RGB image would be a `100×100×3×1` array,
and a batch of 50 would be a `100×100×3×50` array.
Accepts keyword arguments `weight` and `bias` to set the corresponding fields.
Setting `bias` to `Flux.Zeros()` will switch bias off for the layer.
Takes the keyword arguments `pad`, `stride` and `dilation`.
Use `pad=SamePad()` to apply padding so that outputsize == inputsize / stride.
# Examples
Apply a `CrossCor` layer to a 1-channel input using a 2×2 window filter size, giving us a
16-channel output. Output is activated with ReLU.
```julia
filter = (2,2)
in = 1
out = 16
CrossCor((2, 2), 1=>16, relu)
```
"""
struct CrossCor{N,M,F,A,V}
σ::F
weight::A
bias::V
stride::NTuple{N,Int}
pad::NTuple{M,Int}
dilation::NTuple{N,Int}
end
"""
CrossCor(weight::AbstractArray, bias::AbstractArray)
CrossCor(weight::AbstractArray, bias::AbstractArray, activation)
Constructs the standard cross convolutional layer with user defined weight and bias
arrays.
Setting `bias` to `Flux.Zeros()` would switch `bias` off for the layer.
Takes the keyword arguments `pad`, `stride` and `dilation`.
For the keyword-only constructor, see also [`Conv`](@ref).
"""
function CrossCor(w::AbstractArray{T,N}, b::Union{Zeros, AbstractVector{T}}, σ = identity;
stride = 1, pad = 0, dilation = 1) where {T,N}
stride = expand(Val(N-2), stride)
dilation = expand(Val(N-2), dilation)
pad = calc_padding(pad, size(w)[1:N-2], dilation, stride)
return CrossCor(σ, w, b, stride, pad, dilation)
end
function CrossCor(;weight::AbstractArray{T,N}, bias::Union{Zeros, AbstractVector{T}},
activation = identity, stride = 1, pad = 0, dilation = 1) where {T,N}
CrossCor(weight, bias, activation, stride = stride, pad = pad, dilation = dilation)
end
function CrossCor(k::NTuple{N,Integer}, ch::Pair{<:Integer,<:Integer}, σ = identity;
init = glorot_uniform, stride = 1, pad = 0, dilation = 1,
weight = convfilter(k, ch, init = init), bias = zeros(ch[2])) where N
CrossCor(weight, bias, σ,
stride = stride, pad = pad, dilation = dilation)
end
@functor CrossCor
function crosscor(x, w, ddims::DenseConvDims)
ddims = DenseConvDims(ddims, F=true)
return conv(x, w, ddims)
end
function (c::CrossCor)(x::AbstractArray)
# TODO: breaks gpu broadcast :(
# ndims(x) == ndims(c.weight)-1 && return squeezebatch(c(reshape(x, size(x)..., 1)))
σ, b = c.σ, reshape(c.bias, map(_->1, c.stride)..., :, 1)
cdims = DenseConvDims(x, c.weight; stride=c.stride, padding=c.pad, dilation=c.dilation)
σ.(crosscor(x, c.weight, cdims) .+ b)
end
function Base.show(io::IO, l::CrossCor)
print(io, "CrossCor(", size(l.weight)[1:ndims(l.weight)-2])
print(io, ", ", size(l.weight, ndims(l.weight)-1), "=>", size(l.weight, ndims(l.weight)))
l.σ == identity || print(io, ", ", l.σ)
print(io, ")")
end
(a::CrossCor{<:Any,<:Any,W})(x::AbstractArray{T}) where {T <: Union{Float32,Float64}, W <: AbstractArray{T}} =
invoke(a, Tuple{AbstractArray}, x)
(a::CrossCor{<:Any,<:Any,W})(x::AbstractArray{<:Real}) where {T <: Union{Float32,Float64}, W <: AbstractArray{T}} =
a(T.(x))
outdims(l::CrossCor, isize) =
output_size(DenseConvDims(_paddims(isize, size(l.weight)), size(l.weight); stride = l.stride, padding = l.pad, dilation = l.dilation))
"""
GlobalMaxPool()
Global max pooling layer.
Transforms (w,h,c,b)-shaped input into (1,1,c,b)-shaped output,
by performing max pooling on the complete (w,h)-shaped feature maps.
"""
struct GlobalMaxPool end
function (g::GlobalMaxPool)(x)
# Input size
x_size = size(x)
# Kernel size
k = x_size[1:end-2]
# Pooling dimensions
pdims = PoolDims(x, k)
return maxpool(x, pdims)
end
function Base.show(io::IO, g::GlobalMaxPool)
print(io, "GlobalMaxPool()")
end
"""
GlobalMeanPool()
Global mean pooling layer.
Transforms (w,h,c,b)-shaped input into (1,1,c,b)-shaped output,
by performing mean pooling on the complete (w,h)-shaped feature maps.
"""
struct GlobalMeanPool end
function (g::GlobalMeanPool)(x)
# Input size
x_size = size(x)
# Kernel size
k = x_size[1:end-2]
# Pooling dimensions
pdims = PoolDims(x, k)
return meanpool(x, pdims)
end
function Base.show(io::IO, g::GlobalMeanPool)
print(io, "GlobalMeanPool()")
end
"""
MaxPool(k; pad = 0, stride = k)
Max pooling layer. `k` is the size of the window for each dimension of the input.
Use `pad=SamePad()` to apply padding so that outputsize == inputsize / stride.
"""
struct MaxPool{N,M}
k::NTuple{N,Int}
pad::NTuple{M,Int}
stride::NTuple{N,Int}
end
function MaxPool(k::NTuple{N,Integer}; pad = 0, stride = k) where N
stride = expand(Val(N), stride)
pad = calc_padding(pad, k, 1, stride)
return MaxPool(k, pad, stride)
end
function (m::MaxPool)(x)
pdims = PoolDims(x, m.k; padding=m.pad, stride=m.stride)
return maxpool(x, pdims)
end
function Base.show(io::IO, m::MaxPool)
print(io, "MaxPool(", m.k, ", pad = ", m.pad, ", stride = ", m.stride, ")")
end
outdims(l::MaxPool{N}, isize) where N = output_size(PoolDims(_paddims(isize, (l.k..., 1, 1)), l.k; stride = l.stride, padding = l.pad))
"""
MeanPool(k; pad = 0, stride = k)
Mean pooling layer. `k` is the size of the window for each dimension of the input.
Use `pad=SamePad()` to apply padding so that outputsize == inputsize / stride.
"""
struct MeanPool{N,M}
k::NTuple{N,Int}
pad::NTuple{M,Int}
stride::NTuple{N,Int}
end
function MeanPool(k::NTuple{N,Integer}; pad = 0, stride = k) where N
stride = expand(Val(N), stride)
pad = calc_padding(pad, k, 1, stride)
return MeanPool(k, pad, stride)
end
function (m::MeanPool)(x)
pdims = PoolDims(x, m.k; padding=m.pad, stride=m.stride)
return meanpool(x, pdims)
end
function Base.show(io::IO, m::MeanPool)
print(io, "MeanPool(", m.k, ", pad = ", m.pad, ", stride = ", m.stride, ")")
end
outdims(l::MeanPool{N}, isize) where N = output_size(PoolDims(_paddims(isize, (l.k..., 1, 1)), l.k; stride = l.stride, padding = l.pad))

416
src/layers/normalise.jl Normal file
@@ -0,0 +1,416 @@
istraining() = false
@adjoint istraining() = true, _ -> nothing
_isactive(m) = isnothing(m.active) ? istraining() : m.active
_dropout_shape(s, ::Colon) = size(s)
_dropout_shape(s, dims) = tuple((i ∉ dims ? 1 : si for (i, si) ∈ enumerate(size(s)))...)
_dropout_kernel(y::T, p, q) where {T} = y > p ? T(1 / q) : T(0)
"""
dropout(x, p; dims = :)
The dropout function. For each input, either sets that input to `0` (with probability
`p`) or scales it by `1 / (1 - p)`. `dims` specifies the unbroadcasted dimensions,
e.g. `dims=1` applies dropout along columns and `dims=2` along rows.
This is used as a regularisation, i.e. it reduces overfitting during training.
See also the [`Dropout`](@ref) layer.
"""
dropout(x, p; dims = :) = x
@adjoint function dropout(x, p; dims = :)
y = rand!(similar(x, _dropout_shape(x, dims)))
y .= _dropout_kernel.(y, p, 1 - p)
return x .* y, Δ -> (Δ .* y, nothing)
end
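A hedged numeric sketch of the kernel above: entries whose random draw exceeds `p` are kept and rescaled by `1 / (1 - p)`, the rest are zeroed, so the expected value of each input is preserved.

```julia
p = 0.3; q = 1 - p
kernel(y) = y > p ? 1 / q : 0.0   # mirrors `_dropout_kernel`

kernel(0.9)   # kept and rescaled: ≈ 1.43
kernel(0.1)   # dropped: 0.0
```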
"""
Dropout(p, dims = :)
Dropout layer. In the forward pass, apply the [`Flux.dropout`](@ref) function on the input.
Does nothing to the input once [`Flux.testmode!`](@ref) is `true`.
"""
mutable struct Dropout{F,D}
p::F
dims::D
active::Union{Bool, Nothing}
end
# TODO: deprecate in v0.11
Dropout(p, dims) = Dropout(p, dims, nothing)
function Dropout(p; dims = :)
@assert 0 ≤ p ≤ 1
Dropout{typeof(p),typeof(dims)}(p, dims, nothing)
end
function (a::Dropout)(x)
_isactive(a) || return x
return dropout(x, a.p; dims = a.dims)
end
testmode!(m::Dropout, mode = true) =
(m.active = (isnothing(mode) || mode == :auto) ? nothing : !mode; m)
function Base.show(io::IO, d::Dropout)
print(io, "Dropout(", d.p)
d.dims != (:) && print(io, ", dims = $(repr(d.dims))")
print(io, ")")
end
"""
AlphaDropout(p)
A dropout layer. Used in
[Self-Normalizing Neural Networks](https://papers.nips.cc/paper/6698-self-normalizing-neural-networks.pdf).
The AlphaDropout layer ensures that mean and variance of activations
remain the same as before.
Does nothing to the input once [`testmode!`](@ref) is true.
"""
mutable struct AlphaDropout{F}
p::F
active::Union{Bool, Nothing}
function AlphaDropout(p, active = nothing)
@assert 0 ≤ p ≤ 1
new{typeof(p)}(p, active)
end
end
function (a::AlphaDropout)(x)
_isactive(a) || return x
λ = eltype(x)(1.0507009873554804934193349852946)
α = eltype(x)(1.6732632423543772848170429916717)
α1 = eltype(x)(-λ*α)
noise = randn(eltype(x), size(x))
x = @. x*(noise > (1 - a.p)) + α1 * (noise < (1 - a.p))
A = (a.p + a.p * (1 - a.p) * α1 ^ 2)^0.5
B = -A * α1 * (1 - a.p)
x = @. A * x + B
return x
end
testmode!(m::AlphaDropout, mode = true) =
(m.active = (isnothing(mode) || mode == :auto) ? nothing : !mode; m)
"""
LayerNorm(h::Integer)
A [normalisation layer](https://arxiv.org/pdf/1607.06450.pdf) designed to be
used with recurrent hidden states of size `h`. Normalises the mean and standard
deviation of each input before applying a per-neuron gain/bias.
"""
struct LayerNorm{T}
diag::Diagonal{T}
end
LayerNorm(h::Integer) =
LayerNorm(Diagonal(h))
@functor LayerNorm
(a::LayerNorm)(x) = a.diag(normalise(x))
function Base.show(io::IO, l::LayerNorm)
print(io, "LayerNorm(", length(l.diag.α), ")")
end
"""
BatchNorm(channels::Integer, σ = identity;
initβ = zeros, initγ = ones,
ϵ = 1e-8, momentum = .1)
[Batch Normalization](https://arxiv.org/pdf/1502.03167.pdf) layer.
`channels` should be the size of the channel dimension in your data (see below).
Given an array with `N` dimensions, call the `N-1`th the channel dimension. (For
a batch of feature vectors this is just the data dimension, for `WHCN` images
it's the usual channel dimension.)
`BatchNorm` computes the mean and variance for each `W×H×1×N` slice and
shifts them to have a new mean and variance (corresponding to the learnable,
per-channel `bias` and `scale` parameters).
Use [`testmode!`](@ref) during inference.
# Examples
```julia
m = Chain(
Dense(28^2, 64),
BatchNorm(64, relu),
Dense(64, 10),
BatchNorm(10),
softmax)
```
"""
mutable struct BatchNorm{F,V,W,N}
λ::F # activation function
β::V # bias
γ::V # scale
μ::W # moving mean
σ²::W # moving std
ϵ::N
momentum::N
active::Union{Bool, Nothing}
end
# TODO: deprecate in v0.11
BatchNorm(λ, β, γ, μ, σ², ϵ, momentum) = BatchNorm(λ, β, γ, μ, σ², ϵ, momentum, nothing)
BatchNorm(chs::Integer, λ = identity;
initβ = (i) -> zeros(Float32, i), initγ = (i) -> ones(Float32, i), ϵ = 1f-5, momentum = 0.1f0) =
BatchNorm(λ, initβ(chs), initγ(chs),
zeros(chs), ones(chs), ϵ, momentum, nothing)
trainable(bn::BatchNorm) = (bn.β, bn.γ)
function (BN::BatchNorm)(x)
size(x, ndims(x)-1) == length(BN.β) ||
error("BatchNorm expected $(length(BN.β)) channels, got $(size(x, ndims(x)-1))")
dims = length(size(x))
channels = size(x, dims-1)
affine_shape = ntuple(i->i == ndims(x) - 1 ? size(x, i) : 1, ndims(x))
m = div(prod(size(x)), channels)
γ = reshape(BN.γ, affine_shape...)
β = reshape(BN.β, affine_shape...)
if !_isactive(BN)
μ = reshape(BN.μ, affine_shape...)
σ² = reshape(BN.σ², affine_shape...)
ϵ = BN.ϵ
else
T = eltype(x)
axes = [1:dims-2; dims] # axes to reduce along (all but channels axis)
μ = mean(x, dims = axes)
σ² = sum((x .- μ) .^ 2, dims = axes) ./ m
ϵ = convert(T, BN.ϵ)
# update moving mean/std
mtm = BN.momentum
S = eltype(BN.μ)
BN.μ = (1 - mtm) .* BN.μ .+ mtm .* S.(reshape(μ, :))
BN.σ² = (1 - mtm) .* BN.σ² .+ (mtm * m / (m - 1)) .* S.(reshape(σ², :))
end
let λ = BN.λ
x̂ = (x .- μ) ./ sqrt.(σ² .+ ϵ)
λ.(γ .* x̂ .+ β)
end
end
@functor BatchNorm
testmode!(m::BatchNorm, mode = true) =
(m.active = (isnothing(mode) || mode == :auto) ? nothing : !mode; m)
function Base.show(io::IO, l::BatchNorm)
print(io, "BatchNorm($(join(size(l.β), ", "))")
(l.λ == identity) || print(io, ", λ = $(l.λ)")
print(io, ")")
end
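A short sketch of the training-mode behaviour above (sizes are arbitrary): with the default `β = 0`, `γ = 1`, each channel of the output has roughly zero mean and unit variance over the batch.

```julia
using Flux, Statistics

bn = BatchNorm(4)
Flux.trainmode!(bn)             # use batch statistics instead of the running ones

x = randn(Float32, 4, 8)        # 4 channels, batch of 8
y = bn(x)

vec(mean(y, dims = 2))          # ≈ zeros(4)
```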
expand_inst = (x, as) -> reshape(repeat(x, outer=[1, as[length(as)]]), as...)
mutable struct InstanceNorm{F,V,W,N}
λ::F # activation function
β::V # bias
γ::V # scale
μ::W # moving mean
σ²::W # moving std
ϵ::N
momentum::N
active::Union{Bool, Nothing}
end
# TODO: deprecate in v0.11
"""
InstanceNorm(channels::Integer, σ = identity;
initβ = zeros, initγ = ones,
ϵ = 1e-8, momentum = .1)
[Instance Normalization](https://arxiv.org/abs/1607.08022) layer.
`channels` should be the size of the channel dimension in your data (see below).
Given an array with `N` dimensions, call the `N-1`th the channel dimension. (For
a batch of feature vectors this is just the data dimension, for `WHCN` images
it's the usual channel dimension.)
`InstanceNorm` computes the mean and variance for each `W×H×1×1` slice and
shifts them to have a new mean and variance (corresponding to the learnable,
per-channel `bias` and `scale` parameters).
Use [`testmode!`](@ref) during inference.
# Examples
```julia
m = Chain(
Dense(28^2, 64),
InstanceNorm(64, relu),
Dense(64, 10),
InstanceNorm(10),
softmax)
```
"""
InstanceNorm(λ, β, γ, μ, σ², ϵ, momentum) = InstanceNorm(λ, β, γ, μ, σ², ϵ, momentum, nothing)
InstanceNorm(chs::Integer, λ = identity;
initβ = (i) -> zeros(Float32, i), initγ = (i) -> ones(Float32, i), ϵ = 1f-5, momentum = 0.1f0) =
InstanceNorm(λ, initβ(chs), initγ(chs),
zeros(chs), ones(chs), ϵ, momentum, nothing)
trainable(in::InstanceNorm) = (in.β, in.γ)
function (in::InstanceNorm)(x)
size(x, ndims(x)-1) == length(in.β) ||
error("InstanceNorm expected $(length(in.β)) channels, got $(size(x, ndims(x)-1))")
ndims(x) > 2 ||
error("InstanceNorm requires at least 3 dimensions. With 2 dimensions an array of zeros would be returned")
# these are repeated later on depending on the batch size
dims = length(size(x))
c = size(x, dims-1)
bs = size(x, dims)
affine_shape = ntuple(i->i == ndims(x) - 1 || i == ndims(x) ? size(x, i) : 1, ndims(x))
m = div(prod(size(x)), c*bs)
γ, β = expand_inst(in.γ, affine_shape), expand_inst(in.β, affine_shape)
if !_isactive(in)
μ = expand_inst(in.μ, affine_shape)
σ² = expand_inst(in.σ², affine_shape)
ϵ = in.ϵ
else
T = eltype(x)
ϵ = convert(T, in.ϵ)
axes = 1:dims-2 # axes to reduce along (all but channels and batch size axes)
μ = mean(x, dims = axes)
σ² = mean((x .- μ) .^ 2, dims = axes)
S = eltype(in.μ)
# update moving mean/std
mtm = in.momentum
in.μ = dropdims(mean(repeat((1 - mtm) .* in.μ, outer=[1, bs]) .+ mtm .* S.(reshape(μ, (c, bs))), dims = 2), dims=2)
in.σ² = dropdims(mean((repeat((1 - mtm) .* in.σ², outer=[1, bs]) .+ (mtm * m / (m - 1)) .* S.(reshape(σ², (c, bs)))), dims = 2), dims=2)
end
let λ = in.λ
x̂ = (x .- μ) ./ sqrt.(σ² .+ ϵ)
λ.(γ .* x̂ .+ β)
end
end
@functor InstanceNorm
testmode!(m::InstanceNorm, mode = true) =
(m.active = (isnothing(mode) || mode == :auto) ? nothing : !mode; m)
function Base.show(io::IO, l::InstanceNorm)
print(io, "InstanceNorm($(join(size(l.β), ", "))")
(l.λ == identity) || print(io, ", λ = $(l.λ)")
print(io, ")")
end
"""
GroupNorm(chs::Integer, G::Integer, λ = identity;
initβ = (i) -> zeros(Float32, i), initγ = (i) -> ones(Float32, i),
ϵ = 1f-5, momentum = 0.1f0)
[Group Normalization](https://arxiv.org/pdf/1803.08494.pdf) layer.
This layer can outperform Batch Normalization and Instance Normalization.
`chs` is the number of channels, the channel dimension of your input.
For an array of N dimensions, the `N-1`th index is the channel dimension.
`G` is the number of groups along which the statistics are computed.
The number of channels must be an integer multiple of the number of groups.
Use [`testmode!`](@ref) during inference.
# Examples
```julia
m = Chain(Conv((3,3), 1=>32, leakyrelu;pad = 1),
GroupNorm(32,16))
# 32 channels, 16 groups (G = 16), thus 2 channels per group used
```
"""
mutable struct GroupNorm{F,V,W,N,T}
G::T # number of groups
λ::F # activation function
β::V # bias
γ::V # scale
μ::W # moving mean
σ²::W # moving std
ϵ::N
momentum::N
active::Union{Bool, Nothing}
end
# TODO: deprecate in v0.11
GroupNorm(G, λ, β, γ, μ, σ², ϵ, momentum) = GroupNorm(G, λ, β, γ, μ, σ², ϵ, momentum, nothing)
GroupNorm(chs::Integer, G::Integer, λ = identity;
initβ = (i) -> zeros(Float32, i), initγ = (i) -> ones(Float32, i), ϵ = 1f-5, momentum = 0.1f0) =
GroupNorm(G, λ, initβ(chs), initγ(chs),
zeros(G,1), ones(G,1), ϵ, momentum, nothing)
trainable(gn::GroupNorm) = (gn.β, gn.γ)
function(gn::GroupNorm)(x)
size(x,ndims(x)-1) == length(gn.β) || error("Group Norm expected $(length(gn.β)) channels, but got $(size(x,ndims(x)-1)) channels")
ndims(x) > 2 || error("Need to pass at least 3 channels for Group Norm to work")
(size(x,ndims(x) -1))%gn.G == 0 || error("The number of groups ($(gn.G)) must divide the number of channels ($(size(x,ndims(x) -1)))")
dims = length(size(x))
groups = gn.G
channels = size(x, dims-1)
batches = size(x,dims)
channels_per_group = div(channels,groups)
affine_shape = ntuple(i->i == ndims(x) - 1 ? size(x, i) : 1, ndims(x))
# Output reshaped to (W,H...,C/G,G,N)
μ_affine_shape = ntuple(i->i == ndims(x) ? groups : 1, ndims(x) + 1)
m = prod(size(x)[1:end-2]) * channels_per_group
γ = reshape(gn.γ, affine_shape...)
β = reshape(gn.β, affine_shape...)
y = reshape(x,((size(x))[1:end-2]...,channels_per_group,groups,batches))
if !_isactive(gn)
og_shape = size(x)
μ = reshape(gn.μ, μ_affine_shape...) # Shape : (1,1,...C/G,G,1)
σ² = reshape(gn.σ², μ_affine_shape...) # Shape : (1,1,...C/G,G,1)
ϵ = gn.ϵ
else
T = eltype(x)
og_shape = size(x)
axes = [(1:ndims(y)-2)...] # axes to reduce along (all but channels axis)
μ = mean(y, dims = axes)
σ² = mean((y .- μ) .^ 2, dims = axes)
ϵ = convert(T, gn.ϵ)
# update moving mean/std
mtm = gn.momentum
S = eltype(gn.μ)
gn.μ = mean((1 - mtm) .* gn.μ .+ mtm .* S.(reshape(μ, (groups,batches))),dims=2)
gn.σ² = mean((1 - mtm) .* gn.σ² .+ (mtm * m / (m - 1)) .* S.(reshape(σ², (groups,batches))),dims=2)
end
let λ = gn.λ
x̂ = (y .- μ) ./ sqrt.(σ² .+ ϵ)
# Reshape x̂
x̂ = reshape(x̂, og_shape)
λ.(γ .* x̂ .+ β)
end
end
@functor GroupNorm
testmode!(m::GroupNorm, mode = true) =
(m.active = (isnothing(mode) || mode == :auto) ? nothing : !mode; m)
function Base.show(io::IO, l::GroupNorm)
print(io, "GroupNorm($(join(size(l.β), ", "))")
(l.λ == identity) || print(io, ", λ = $(l.λ)")
print(io, ")")
end

View File

@ -1,14 +1,36 @@
# TODO: broadcasting cat
combine(x, h) = vcat(x, h .* trues(1, size(x, 2)))
gate(h, n) = (1:h) .+ h*(n-1)
gate(x::AbstractVector, h, n) = @view x[gate(h,n)]
gate(x::AbstractMatrix, h, n) = x[gate(h,n),:]
# Stateful recurrence
"""
Recur(cell)
`Recur` takes a recurrent cell and makes it stateful, managing the hidden state
in the background. `cell` should be a model of the form:
h, y = cell(h, x...)
For example, here's a recurrent network that keeps a running total of its inputs:
```julia
accum(h, x) = (h + x, x)
rnn = Flux.Recur(accum, 0)
rnn(2) # 2
rnn(3) # 3
rnn.state # 5
rnn.(1:10) # apply to a sequence
rnn.state # 60
```
"""
mutable struct Recur{T}
cell::T
init
state
end
Recur(m) = Recur(m, hidden(m))
Recur(m, h = hidden(m)) = Recur(m, h, h)
function (m::Recur)(xs...)
h, y = m.cell(m.state, xs...)
@ -16,79 +38,149 @@ function (m::Recur)(xs...)
return y
end
treelike(Recur)
@functor Recur cell, init
Base.show(io::IO, m::Recur) = print(io, "Recur(", m.cell, ")")
_truncate(x::AbstractArray) = x
_truncate(x::TrackedArray) = x.data
_truncate(x::Tuple) = _truncate.(x)
"""
reset!(rnn)
truncate!(m) = foreach(truncate!, children(m))
truncate!(m::Recur) = (m.state = _truncate(m.state))
Reset the hidden state of a recurrent layer back to its original value.
Assuming you have a `Recur` layer `rnn`, this is roughly equivalent to:
```julia
rnn.state = hidden(rnn.cell)
```
"""
reset!(m::Recur) = (m.state = m.init)
reset!(m) = foreach(reset!, functor(m)[1])
flip(f, xs) = reverse(f.(reverse(xs)))
# Vanilla RNN
struct RNNCell{D,V}
d::D
mutable struct RNNCell{F,A,V}
σ::F
Wi::A
Wh::A
b::V
h::V
end
RNNCell(in::Integer, out::Integer, σ = tanh; init = initn) =
RNNCell(Dense(in+out, out, σ, init = init), param(init(out)))
RNNCell(in::Integer, out::Integer, σ = tanh;
init = glorot_uniform) =
RNNCell(σ, init(out, in), init(out, out),
init(out), zeros(out))
function (m::RNNCell)(h, x)
h = m.d(combine(x, h))
σ, Wi, Wh, b = m.σ, m.Wi, m.Wh, m.b
h = σ.(Wi*x .+ Wh*h .+ b)
return h, h
end
hidden(m::RNNCell) = m.h
treelike(RNNCell)
@functor RNNCell
function Base.show(io::IO, m::RNNCell)
print(io, "RNNCell(", m.d, ")")
function Base.show(io::IO, l::RNNCell)
print(io, "RNNCell(", size(l.Wi, 2), ", ", size(l.Wi, 1))
l.σ == identity || print(io, ", ", l.σ)
print(io, ")")
end
"""
RNN(in::Integer, out::Integer, σ = tanh)
The most basic recurrent layer; essentially acts as a `Dense` layer, but with the
output fed back into the input each time step.
"""
RNN(a...; ka...) = Recur(RNNCell(a...; ka...))
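A minimal usage sketch (not part of this diff): the layer sizes and toy input below are illustrative assumptions, chosen only to show how the stateful wrapper is driven step by step.
```julia
using Flux

rnn = RNN(3, 5)                        # Recur-wrapped RNNCell: 3 inputs, 5 hidden units
xs  = [rand(Float32, 3) for _ in 1:4]  # a toy sequence of 4 timesteps
ys  = [rnn(x) for x in xs]             # hidden state is carried between calls
Flux.reset!(rnn)                       # restore the initial state before the next sequence
```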
# LSTM
struct LSTMCell{D1,D2,V}
forget::D1
input::D1
output::D1
cell::D2
h::V; c::V
mutable struct LSTMCell{A,V}
Wi::A
Wh::A
b::V
h::V
c::V
end
function LSTMCell(in, out; init = initn)
cell = LSTMCell([Dense(in+out, out, σ, init = init) for _ = 1:3]...,
Dense(in+out, out, tanh, init = init),
param(init(out)), param(init(out)))
cell.forget.b.data .= 1
function LSTMCell(in::Integer, out::Integer;
init = glorot_uniform)
cell = LSTMCell(init(out * 4, in), init(out * 4, out), init(out * 4),
zeros(out), zeros(out))
cell.b[gate(out, 2)] .= 1
return cell
end
function (m::LSTMCell)(h_, x)
h, c = h_
x = combine(x, h)
forget, input, output, cell =
m.forget(x), m.input(x), m.output(x), m.cell(x)
function (m::LSTMCell)((h, c), x)
b, o = m.b, size(h, 1)
g = m.Wi*x .+ m.Wh*h .+ b
input = σ.(gate(g, o, 1))
forget = σ.(gate(g, o, 2))
cell = tanh.(gate(g, o, 3))
output = σ.(gate(g, o, 4))
c = forget .* c .+ input .* cell
h = output .* tanh.(c)
return (h, c), h
h = output .* tanh.(c)
return (h, c), h
end
hidden(m::LSTMCell) = (m.h, m.c)
treelike(LSTMCell)
@functor LSTMCell
Base.show(io::IO, m::LSTMCell) =
print(io, "LSTMCell(",
size(m.forget.W, 2) - size(m.forget.W, 1), ", ",
size(m.forget.W, 1), ')')
Base.show(io::IO, l::LSTMCell) =
print(io, "LSTMCell(", size(l.Wi, 2), ", ", size(l.Wi, 1)÷4, ")")
"""
LSTM(in::Integer, out::Integer)
[Long Short Term Memory](https://www.researchgate.net/publication/13853244_Long_Short-term_Memory)
recurrent layer. Behaves like an RNN but generally exhibits a longer memory span over sequences.
See [this article](https://colah.github.io/posts/2015-08-Understanding-LSTMs/)
for a good overview of the internals.
"""
LSTM(a...; ka...) = Recur(LSTMCell(a...; ka...))
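As a rough sketch with assumed sizes (not taken from the diff), an `LSTM` is driven the same way; the `GRU` defined below follows the same pattern.
```julia
using Flux

lstm = LSTM(10, 32)                       # input size 10, hidden size 32
seq  = [rand(Float32, 10) for _ in 1:20]  # hypothetical 20-step sequence
out  = map(lstm, seq)                     # one 32-element output per timestep
Flux.reset!(lstm)                         # clear (h, c) before the next sequence
```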
# GRU
mutable struct GRUCell{A,V}
Wi::A
Wh::A
b::V
h::V
end
GRUCell(in, out; init = glorot_uniform) =
GRUCell(init(out * 3, in), init(out * 3, out),
init(out * 3), zeros(out))
function (m::GRUCell)(h, x)
b, o = m.b, size(h, 1)
gx, gh = m.Wi*x, m.Wh*h
r = σ.(gate(gx, o, 1) .+ gate(gh, o, 1) .+ gate(b, o, 1))
z = σ.(gate(gx, o, 2) .+ gate(gh, o, 2) .+ gate(b, o, 2))
h̃ = tanh.(gate(gx, o, 3) .+ r .* gate(gh, o, 3) .+ gate(b, o, 3))
h = (1 .- z) .* h̃ .+ z .* h
return h, h
end
hidden(m::GRUCell) = m.h
@functor GRUCell
Base.show(io::IO, l::GRUCell) =
print(io, "GRUCell(", size(l.Wi, 2), ", ", size(l.Wi, 1)÷3, ")")
"""
GRU(in::Integer, out::Integer)
[Gated Recurrent Unit](https://arxiv.org/abs/1406.1078) layer. Behaves like an
RNN but generally exhibits a longer memory span over sequences.
See [this article](https://colah.github.io/posts/2015-08-Understanding-LSTMs/)
for a good overview of the internals.
"""
GRU(a...; ka...) = Recur(GRUCell(a...; ka...))

View File

@ -1,23 +0,0 @@
mutable struct Softmax{T,N,A} <: AbstractArray{T,N}
logits::A
probs::A
Softmax{T,N,A}(logits::A) where {T,N,A} = new(logits)
end
Softmax(logits::AbstractVecOrMat{<:AbstractFloat}) =
Softmax{eltype(logits),ndims(logits),typeof(logits)}(logits)
@forward Softmax.logits Base.size
Base.IndexStyle(::Type{Softmax{T,N,A}}) where {T,N,A} = IndexStyle(A)
function Base.getindex(s::Softmax, i)
isdefined(s, :probs) || (s.probs = NNlib.softmax(s.logits))
Tracker.data(s.probs)[i]
end
softmax(xs::AbstractVecOrMat{<:AbstractFloat}) = Softmax(xs)
softmax(xs::AbstractVecOrMat{<:Real}) = softmax(convert.(AbstractFloat, xs))
softmax(xs::TrackedArray) = TrackedArray(Tracker.Call(NNlib.softmax, xs), Softmax(xs))

View File

@ -1,17 +1,296 @@
# Cost functions
"""
mae(ŷ, y)
mse(ŷ, y) = sum((ŷ .- y).^2)/length(y)
Return the mean of absolute error; calculated as
`sum(abs.(ŷ .- y)) / length(y)`.
"""
mae(ŷ, y) = sum(abs.(ŷ .- y)) * 1 // length(y)
crossentropy(ŷ::AbstractVecOrMat, y::AbstractVecOrMat) =
-sum(y .* log.(ŷ)) / size(y, 2)
@deprecate logloss(x, y) crossentropy(x, y)
"""
mse(ŷ, y)
function logitcrossentropy(logŷ::AbstractVecOrMat, y::AbstractVecOrMat)
logŷ = logŷ .- maximum(logŷ, 1)
ypred = logŷ .- log.(sum(exp.(logŷ), 1))
-sum(y .* ypred) / size(y, 2)
Return the mean squared error between `ŷ` and `y`; calculated as
`sum((ŷ .- y).^2) / length(y)`.
# Examples
```jldoctest
julia> Flux.mse([0, 2], [1, 1])
1//1
```
"""
mse(ŷ, y) = sum((ŷ .- y).^2) * 1 // length(y)
"""
msle(ŷ, y; ϵ=eps(eltype(ŷ)))
Return the mean of the squared logarithmic errors; calculated as
`sum((log.(ŷ .+ ϵ) .- log.(y .+ ϵ)).^2) / length(y)`.
The `ϵ` term provides numerical stability.
Penalizes an under-predicted estimate greater than an over-predicted estimate.
"""
msle(ŷ, y; ϵ=eps(eltype(ŷ))) = sum((log.(ŷ .+ ϵ) .- log.(y .+ ϵ)).^2) * 1 // length(y)
"""
huber_loss(ŷ, y; δ=1.0)
Return the mean of the [Huber loss](https://en.wikipedia.org/wiki/Huber_loss)
given the prediction `ŷ` and true values `y`.
                 | 0.5 * |ŷ - y|,            for |ŷ - y| <= δ
    Huber loss = |
                 |  δ * (|ŷ - y| - 0.5 * δ), otherwise
"""
#TODO: remove dropgrad when Zygote can handle this function with CuArrays
function huber_loss(ŷ, y; δ=eltype(ŷ)(1))
abs_error = abs.(ŷ .- y)
temp = Zygote.dropgrad(abs_error .< δ)
x = eltype(ŷ)(0.5)
hub_loss = sum(((abs_error.^2) .* temp) .* x .+ δ*(abs_error .- x*δ) .* (1 .- temp)) * 1 // length(y)
end
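A quick sanity check of the piecewise behaviour, assuming the default `δ = 1` so that errors below `δ` are penalised quadratically and larger ones linearly (values are illustrative):
```julia
using Flux

ŷ, y = [1.0, 4.0], [1.5, 1.0]   # errors 0.5 (quadratic branch) and 3.0 (linear branch)
Flux.huber_loss(ŷ, y)           # (0.5*0.5^2 + 1.0*(3.0 - 0.5)) / 2 = 1.3125
```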
crossentropy(ŷ::Union{Softmax,TrackedArray{<:Softmax}}, y::AbstractVecOrMat) =
logitcrossentropy(Tracker.data(ŷ).logits, y)
function _crossentropy(ŷ::AbstractVecOrMat, y::AbstractVecOrMat, weight::Nothing)
return -sum(xlogy.(y, ŷ)) * 1 // size(y, 2)
end
function _crossentropy(ŷ::AbstractVecOrMat, y::AbstractVecOrMat, weight::Number)
return -sum(xlogy.(y, ŷ)) .* weight * 1 // size(y, 2)
end
function _crossentropy(ŷ::AbstractVecOrMat, y::AbstractVecOrMat, weight::AbstractVector)
return -sum(xlogy.(y, ŷ) .* weight) * 1 // size(y, 2)
end
"""
crossentropy(ŷ, y; weight = nothing)
Return the cross entropy between the given probability distributions;
calculated as `-sum(y .* log.(ŷ) .* weight) / size(y, 2)`.
`weight` can be `Nothing`, a `Number` or an `AbstractVector`.
`weight=nothing` acts like `weight=1` but is faster.
See also: [`Flux.logitcrossentropy`](@ref), [`Flux.binarycrossentropy`](@ref), [`Flux.logitbinarycrossentropy`](@ref)
# Examples
```jldoctest
julia> Flux.crossentropy(softmax([-1.1491, 0.8619, 0.3127]), [1, 1, 0])
3.085467254747739
```
"""
crossentropy(ŷ::AbstractVecOrMat, y::AbstractVecOrMat; weight=nothing) = _crossentropy(ŷ, y, weight)
"""
logitcrossentropy(ŷ, y; weight = 1)
Return the crossentropy computed after a [`Flux.logsoftmax`](@ref) operation;
calculated as `-sum(y .* logsoftmax(ŷ) .* weight) / size(y, 2)`.
`logitcrossentropy(ŷ, y)` is mathematically equivalent to
[`Flux.crossentropy(softmax(ŷ), y)`](@ref) but it is more numerically stable.
See also: [`Flux.crossentropy`](@ref), [`Flux.binarycrossentropy`](@ref), [`Flux.logitbinarycrossentropy`](@ref)
# Examples
```jldoctest
julia> Flux.logitcrossentropy([-1.1491, 0.8619, 0.3127], [1, 1, 0])
3.085467254747738
```
"""
function logitcrossentropy(ŷ::AbstractVecOrMat, y::AbstractVecOrMat; weight = 1)
return -sum(y .* logsoftmax(ŷ) .* weight) * 1 // size(y, 2)
end
"""
binarycrossentropy(ŷ, y; ϵ=eps(ŷ))
Return ``-y*\\log(ŷ + ϵ) - (1-y)*\\log(1-ŷ + ϵ)``. The `ϵ` term provides numerical stability.
Typically, the prediction `ŷ` is given by the output of a [`sigmoid`](@ref) activation.
See also: [`Flux.crossentropy`](@ref), [`Flux.logitcrossentropy`](@ref), [`Flux.logitbinarycrossentropy`](@ref)
# Examples
```jldoctest
julia> Flux.binarycrossentropy.(σ.([-1.1491, 0.8619, 0.3127]), [1, 1, 0])
3-element Array{Float64,1}:
1.424397097347566
0.35231664672364077
0.8616703662235441
```
"""
binarycrossentropy(ŷ, y; ϵ=eps(ŷ)) = -xlogy(y, ŷ + ϵ) - xlogy(1 - y, 1 - ŷ + ϵ)
# Re-definition to fix interaction with CuArrays.
CuArrays.@cufunc binarycrossentropy(ŷ, y; ϵ=eps(ŷ)) = -y*log(ŷ + ϵ) - (1 - y)*log(1 - ŷ + ϵ)
"""
logitbinarycrossentropy(ŷ, y)
`logitbinarycrossentropy(ŷ, y)` is mathematically equivalent to
[`Flux.binarycrossentropy(σ(ŷ), y)`](@ref) but it is more numerically stable.
See also: [`Flux.crossentropy`](@ref), [`Flux.logitcrossentropy`](@ref), [`Flux.binarycrossentropy`](@ref)
# Examples
```jldoctest
julia> Flux.logitbinarycrossentropy.([-1.1491, 0.8619, 0.3127], [1, 1, 0])
3-element Array{Float64,1}:
1.4243970973475661
0.35231664672364094
0.8616703662235443
```
"""
logitbinarycrossentropy(ŷ, y) = (1 - y)*ŷ - logσ(ŷ)
# Re-definition to fix interaction with CuArrays.
CuArrays.@cufunc logitbinarycrossentropy(ŷ, y) = (1 - y)*ŷ - logσ(ŷ)
"""
normalise(x; dims=1)
Normalise `x` to mean 0 and standard deviation 1 across the dimensions given by `dims`.
Defaults to normalising over columns.
```jldoctest
julia> a = reshape(collect(1:9), 3, 3)
3×3 Array{Int64,2}:
1 4 7
2 5 8
3 6 9
julia> Flux.normalise(a)
3×3 Array{Float64,2}:
-1.22474 -1.22474 -1.22474
0.0 0.0 0.0
1.22474 1.22474 1.22474
julia> Flux.normalise(a, dims=2)
3×3 Array{Float64,2}:
-1.22474 0.0 1.22474
-1.22474 0.0 1.22474
-1.22474 0.0 1.22474
```
"""
function normalise(x::AbstractArray; dims=1)
μ′ = mean(x, dims = dims)
σ = std(x, dims = dims, mean = μ′, corrected=false)
return (x .- μ′) ./ σ
end
"""
kldivergence(ŷ, y)
Return the
[Kullback-Leibler divergence](https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence)
between the given probability distributions.
KL divergence is a measure of how much one probability distribution differs
from another.
It is always non-negative and zero only when both the distributions are equal
everywhere.
"""
function kldivergence(ŷ, y)
entropy = sum(xlogx.(y)) * 1 // size(y, 2)
cross_entropy = crossentropy(ŷ, y)
return entropy + cross_entropy
end
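A small illustrative check (assuming each column is a probability distribution): the divergence is zero when prediction and target coincide, and positive otherwise.
```julia
using Flux

p = [0.1, 0.9]
q = [0.5, 0.5]
Flux.kldivergence(p, p)   # ≈ 0
Flux.kldivergence(q, p)   # > 0: divergence of the target p from the prediction q
```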
"""
poisson(ŷ, y)
Return how much the predicted distribution `ŷ` diverges from the expected Poisson
distribution `y`; calculated as `sum(ŷ .- y .* log.(ŷ)) / size(y, 2)`.
[More information.](https://peltarion.com/knowledge-center/documentation/modeling-view/build-an-ai-model/loss-functions/poisson).
"""
poisson(ŷ, y) = sum(ŷ .- xlogy.(y, ŷ)) * 1 // size(y, 2)
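A hand-checkable sketch with illustrative values, matching the formula above:
```julia
using Flux

ŷ = [0.5, 1.5]
y = [1.0, 1.0]
Flux.poisson(ŷ, y)   # (0.5 - log(0.5)) + (1.5 - log(1.5)) ≈ 2.2877
```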
"""
hinge(ŷ, y)
Return the [hinge loss](https://en.wikipedia.org/wiki/Hinge_loss) given the
prediction `ŷ` and true labels `y` (containing 1 or -1); calculated as
`sum(max.(0, 1 .- ŷ .* y)) / size(y, 2)`.
See also: [`squared_hinge`](@ref)
"""
hinge(ŷ, y) = sum(max.(0, 1 .- ŷ .* y)) * 1 // size(y, 2)
"""
squared_hinge(ŷ, y)
Return the squared hinge loss given the prediction `ŷ` and true labels `y`
(containing 1 or -1); calculated as `sum((max.(0, 1 .- ŷ .* y)).^2) / size(y, 2)`.
See also: [`hinge`](@ref)
"""
squared_hinge(ŷ, y) = sum((max.(0, 1 .- ŷ .* y)).^2) * 1 // size(y, 2)
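An illustrative sketch of both margin losses; each sample is a column, and both `ŷ` scores and `±1` labels are made-up values. Note the division is by the number of columns, `size(y, 2)`.
```julia
using Flux

ŷ = [0.8 -0.3 1.2]        # raw scores for three samples (one per column)
y = [1 -1 1]              # labels in {-1, +1}
Flux.hinge(ŷ, y)          # (0.2 + 0.7 + 0.0) / 3 = 0.3
Flux.squared_hinge(ŷ, y)  # (0.04 + 0.49 + 0.0) / 3 ≈ 0.177
```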
"""
dice_coeff_loss(ŷ, y; smooth=1)
Return a loss based on the dice coefficient.
Used in the [V-Net](https://arxiv.org/pdf/1606.04797v1.pdf) image segmentation
architecture.
Similar to the F1_score. Calculated as:
    1 - 2*sum(|ŷ .* y| + smooth) / (sum(ŷ.^2) + sum(y.^2) + smooth)
"""
dice_coeff_loss(ŷ, y; smooth=eltype(ŷ)(1.0)) = 1 - (2*sum(y .* ŷ) + smooth) / (sum(y.^2) + sum(ŷ.^2) + smooth)
"""
tversky_loss(ŷ, y; β=0.7)
Return the [Tversky loss](https://arxiv.org/pdf/1706.05721.pdf).
Used with imbalanced data to give more weight to false negatives.
A larger β weighs recall more heavily than precision (by placing more emphasis on false negatives).
Calculated as:
    1 - sum(|y .* ŷ| + 1) / (sum(y .* ŷ + β*(1 .- y) .* ŷ + (1 - β)*y .* (1 .- ŷ)) + 1)
"""
tversky_loss(ŷ, y; β=eltype(ŷ)(0.7)) = 1 - (sum(y .* ŷ) + 1) / (sum(y .* ŷ + β*(1 .- y) .* ŷ + (1 - β)*y .* (1 .- ŷ)) + 1)
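A tiny sketch of both segmentation losses on an illustrative binary mask (the mask and predictions are made-up values, not from the diff):
```julia
using Flux

y = [1.0, 1.0, 0.0, 0.0]         # ground-truth mask
ŷ = [0.9, 0.8, 0.2, 0.1]         # predicted probabilities
Flux.dice_coeff_loss(ŷ, y)       # ≈ 0.022, near zero for good overlap
Flux.tversky_loss(ŷ, y; β=0.7)   # ≈ 0.1, trades off false positives and negatives via β
```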
"""
flatten(x::AbstractArray)
Transform (w, h, c, b)-shaped input into (w × h × c, b)-shaped output
by linearizing all values for each element in the batch.
"""
function flatten(x::AbstractArray)
return reshape(x, :, size(x)[end])
end
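A quick illustration with an assumed image batch, showing only the resulting shape:
```julia
using Flux

x = rand(Float32, 28, 28, 1, 16)   # e.g. a batch of 16 grayscale images
size(Flux.flatten(x))              # (784, 16): all per-sample dims collapsed
```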
"""
xlogx(x)
Return `x * log(x)` for `x ≥ 0`, handling `x = 0` by taking the downward limit.
"""
function xlogx(x)
result = x * log(x)
ifelse(iszero(x), zero(result), result)
end
CuArrays.@cufunc function xlogx(x)
result = x * log(x)
ifelse(iszero(x), zero(result), result)
end
"""
xlogy(x, y)
Return `x * log(y)` for `y > 0` with correct limit at `x = 0`.
"""
function xlogy(x, y)
result = x * log(y)
ifelse(iszero(x), zero(result), result)
end
CuArrays.@cufunc function xlogy(x, y)
result = x * log(y)
ifelse(iszero(x), zero(result), result)
end
@adjoint function broadcasted(::typeof(xlogy), x::Zygote.Numeric, y::Zygote.Numeric)
res = xlogy.(x, y)
res, Δ -> (nothing, Zygote.unbroadcast(x, xlogy.(Δ, y)), Zygote.unbroadcast(y, Δ .* x ./ y))
end
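A brief sketch of the zero-limit behaviour that motivates using `xlogx`/`xlogy` in the entropy-style losses above:
```julia
using Flux

Flux.xlogx(0.0)        # 0.0 rather than the NaN from 0 * log(0)
Flux.xlogy(0.0, 0.0)   # 0.0, so zero-probability terms drop out of crossentropy
Flux.xlogy(2.0, 3.0)   # 2 * log(3) ≈ 2.197
```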

View File

@ -1,3 +1,5 @@
import Base: *
struct OneHotVector <: AbstractVector{Bool}
ix::UInt32
of::UInt32
@ -7,7 +9,9 @@ Base.size(xs::OneHotVector) = (Int64(xs.of),)
Base.getindex(xs::OneHotVector, i::Integer) = i == xs.ix
Base.:*(A::AbstractMatrix, b::OneHotVector) = A[:, b.ix]
Base.getindex(xs::OneHotVector, ::Colon) = OneHotVector(xs.ix, xs.of)
A::AbstractMatrix * b::OneHotVector = A[:, b.ix]
struct OneHotMatrix{A<:AbstractVector{OneHotVector}} <: AbstractMatrix{Bool}
height::Int
@ -16,34 +20,106 @@ end
Base.size(xs::OneHotMatrix) = (Int64(xs.height),length(xs.data))
Base.getindex(xs::OneHotMatrix, i::Int, j::Int) = xs.data[j][i]
Base.getindex(xs::OneHotMatrix, i::Union{Integer, AbstractVector}, j::Integer) = xs.data[j][i]
Base.getindex(xs::OneHotMatrix, ::Colon, i::Integer) = xs.data[i]
Base.getindex(xs::OneHotMatrix, ::Colon, i::AbstractArray) = OneHotMatrix(xs.height, xs.data[i])
Base.getindex(xs::OneHotMatrix, ::Colon, ::Colon) = OneHotMatrix(xs.height, copy(xs.data))
Base.:*(A::AbstractMatrix, B::OneHotMatrix) = A[:, map(x->x.ix, B.data)]
Base.getindex(xs::OneHotMatrix, i::Integer, ::Colon) = map(x -> x[i], xs.data)
# remove workaround when https://github.com/JuliaGPU/CuArrays.jl/issues/676 is fixed
A::AbstractMatrix * B::OneHotMatrix = A[:, cpu(map(x->x.ix, B.data))]
Base.hcat(x::OneHotVector, xs::OneHotVector...) = OneHotMatrix(length(x), [x, xs...])
batch(xs::AbstractArray{<:OneHotVector}) = OneHotMatrix(length(first(xs)), xs)
import NNlib.adapt
import Adapt: adapt, adapt_structure
adapt(T, xs::OneHotMatrix) = OneHotMatrix(xs.height, adapt(T, xs.data))
adapt_structure(T, xs::OneHotMatrix) = OneHotMatrix(xs.height, adapt(T, xs.data))
@require CuArrays begin
import CuArrays: CuArray, cudaconvert
Base.Broadcast._containertype(::Type{<:OneHotMatrix{<:CuArray}}) = CuArray
cudaconvert(x::OneHotMatrix{<:CuArray}) = OneHotMatrix(x.height, cudaconvert(x.data))
end
import .CuArrays: CuArray, CuArrayStyle, cudaconvert
import Base.Broadcast: BroadcastStyle, ArrayStyle
BroadcastStyle(::Type{<:OneHotMatrix{<:CuArray}}) = CuArrayStyle{2}()
cudaconvert(x::OneHotMatrix{<:CuArray}) = OneHotMatrix(x.height, cudaconvert(x.data))
"""
onehot(l, labels[, unk])
Create a `OneHotVector` with its `l`-th element `true` based on the
possible set of `labels`.
If `unk` is given, return `onehot(unk, labels)` if the input label `l` is not found
in `labels`; otherwise, it will raise an error.
# Examples
```jldoctest
julia> Flux.onehot(:b, [:a, :b, :c])
3-element Flux.OneHotVector:
0
1
0
julia> Flux.onehot(:c, [:a, :b, :c])
3-element Flux.OneHotVector:
0
0
1
```
"""
function onehot(l, labels)
i = findfirst(labels, l)
i = something(findfirst(isequal(l), labels), 0)
i > 0 || error("Value $l is not in labels")
OneHotVector(i, length(labels))
end
onehotbatch(ls, labels) = OneHotMatrix(length(labels), [onehot(l, labels) for l in ls])
function onehot(l, labels, unk)
i = something(findfirst(isequal(l), labels), 0)
i > 0 || return onehot(unk, labels)
OneHotVector(i, length(labels))
end
argmax(y::AbstractVector, labels = 1:length(y)) =
labels[findfirst(y, maximum(y))]
"""
onehotbatch(ls, labels[, unk...])
argmax(y::AbstractMatrix, l...) =
squeeze(mapslices(y -> argmax(y, l...), y, 1), 1)
Create a `OneHotMatrix` with a batch of labels based on the
possible set of `labels`.
If `unk` is given, return [`onehot(unk, labels)`](@ref) if one of the input
labels `ls` is not found in `labels`; otherwise it will error.
# Examples
```jldoctest
julia> Flux.onehotbatch([:b, :a, :b], [:a, :b, :c])
3×3 Flux.OneHotMatrix{Array{Flux.OneHotVector,1}}:
0 1 0
1 0 1
0 0 0
```
"""
onehotbatch(ls, labels, unk...) =
OneHotMatrix(length(labels), [onehot(l, labels, unk...) for l in ls])
Base.argmax(xs::OneHotVector) = xs.ix
"""
onecold(y[, labels = 1:length(y)])
Inverse operations of [`onehot`](@ref).
# Examples
```jldoctest
julia> Flux.onecold([true, false, false], [:a, :b, :c])
:a
julia> Flux.onecold([0.3, 0.2, 0.5], [:a, :b, :c])
:c
```
"""
onecold(y::AbstractVector, labels = 1:length(y)) = labels[Base.argmax(y)]
onecold(y::AbstractMatrix, labels...) =
dropdims(mapslices(y -> onecold(y, labels...), y, dims=1), dims=1)
onecold(y::OneHotMatrix, labels...) =
mapreduce(x -> Flux.onecold(x, labels...), |, y.data, dims = 2, init = 0)
@nograd onecold, onehot, onehotbatch

View File

@ -1,21 +1,14 @@
module Optimise
export update!, params, train!,
SGD, ADAM, Momentum, Nesterov, RMSProp, ADAGrad, ADADelta
using LinearAlgebra
struct Param{T}
x::T
Δ::T
end
Base.convert(::Type{Param}, x::AbstractArray) = Param(x, zeros(x))
export train!, update!,
Descent, ADAM, Momentum, Nesterov, RMSProp,
ADAGrad, AdaMax, ADADelta, AMSGrad, NADAM, ADAMW,RADAM,
InvDecay, ExpDecay, WeightDecay, stop, Optimiser,
ClipValue, ClipNorm
include("optimisers.jl")
include("interface.jl")
include("train.jl")
using Flux.Tracker: TrackedArray
Base.convert(::Type{Param}, x::TrackedArray) = Param(x.data, x.grad[])
end

View File

@ -1,18 +0,0 @@
call(f, xs...) = f(xs...)
function optimiser(ps, fs...)
ps = [Param(p) for p in ps]
fs = map(ps) do p
os = map(f -> f(p), fs)
() -> foreach(call, os)
end
() -> foreach(call, fs)
end
SGD(ps, η = 1) = optimiser(ps, p -> descent(p, η))
ADAM(ps, η = 0.001, β1 = 0.9, β2 = 0.999, ϵ = 1e-08, decay = 0.0) = optimiser(ps, p -> adam(p; η = η, β1 = β1, β2 = β2, ϵ = ϵ), p -> invdecay(p, decay), p -> descent(p, 1))
Momentum(ps,ρ, decay = 0.0) = optimiser(ps, p -> momentum(p, ρ), p -> invdecay(p, decay), p -> descent(p, 1))
Nesterov(ps,ρ, decay = 0.0) = optimiser(ps, p -> nesterov(p, ρ), p -> invdecay(p, decay), p -> descent(p, 1))
RMSProp(ps, η = 0.001, ρ = 0.9, ϵ = 1e-8, decay = 0.0) = optimiser(ps, p -> rmsprop(p; η = η, ρ = ρ, ϵ = ϵ), p -> invdecay(p, decay), p -> descent(p, 1))
ADAGrad(ps, η = 0.01, ϵ = 1e-8, decay = 0.0) = optimiser(ps, p -> adagrad(p; η = η, ϵ = ϵ), p -> invdecay(p, decay), p -> descent(p, 1))
ADADelta(ps, η = 0.01, ρ = 0.95, ϵ = 1e-8, decay = 0.0) = optimiser(ps, p -> adadelta(p; ρ = ρ, ϵ = ϵ), p -> invdecay(p, decay), p -> descent(p, 1))

View File

@ -1,74 +1,563 @@
function descent(p::Param, η::Real)
function ()
p.x .-= p.Δ .* η
p.Δ .= 0
using Flux
using MacroTools: @forward
const ϵ = 1e-8
# TODO: should use weak refs
"""
Descent(η = 0.1)
Classic gradient descent optimiser with learning rate `η`.
For each parameter `p` and its gradient `δp`, this runs `p -= η*δp`
# Parameters
- Learning rate (`η`): Amount by which gradients are discounted before updating
the weights.
# Examples
```julia
opt = Descent()
opt = Descent(0.3)
ps = params(model)
gs = gradient(ps) do
loss(x, y)
end
Flux.Optimise.update!(opt, ps, gs)
```
"""
mutable struct Descent
eta::Float64
end
Descent() = Descent(0.1)
function apply!(o::Descent, x, Δ)
Δ .*= o.eta
end
"""
Momentum(η = 0.01, ρ = 0.9)
Gradient descent optimizer with learning rate `η` and momentum `ρ`.
# Parameters
- Learning rate (`η`): Amount by which gradients are discounted before updating
the weights.
- Momentum (`ρ`): Controls the acceleration of gradient descent in the
prominent direction, in effect dampening oscillations.
# Examples
```julia
opt = Momentum()
opt = Momentum(0.01, 0.99)
```
"""
mutable struct Momentum
eta::Float64
rho::Float64
velocity::IdDict
end
Momentum(η = 0.01, ρ = 0.9) = Momentum(η, ρ, IdDict())
function apply!(o::Momentum, x, Δ)
η, ρ = o.eta, o.rho
v = get!(o.velocity, x, zero(x))::typeof(x)
@. v = ρ * v - η * Δ
@. Δ = -v
end
"""
Nesterov(η = 0.001, ρ = 0.9)
Gradient descent optimizer with learning rate `η` and Nesterov momentum `ρ`.
# Parameters
- Learning rate (`η`): Amount by which gradients are discounted before updating
the weights.
- Nesterov momentum (`ρ`): Controls the acceleration of gradient descent in the
prominent direction, in effect dampening oscillations.
# Examples
```julia
opt = Nesterov()
opt = Nesterov(0.003, 0.95)
```
"""
mutable struct Nesterov
eta::Float64
rho::Float64
velocity::IdDict
end
Nesterov(η = 0.001, ρ = 0.9) = Nesterov(η, ρ, IdDict())
function apply!(o::Nesterov, x, Δ)
η, ρ = o.eta, o.rho
v = get!(o.velocity, x, zero(x))::typeof(x)
d = @. ρ^2 * v - (1+ρ) * η * Δ
@. v = ρ*v - η*Δ
@. Δ = -d
end
"""
RMSProp(η = 0.001, ρ = 0.9)
Optimizer using the
[RMSProp](https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf)
algorithm. Often a good choice for recurrent networks. Parameters other than learning rate
generally don't need tuning.
# Parameters
- Learning rate (`η`): Amount by which gradients are discounted before updating
the weights.
- Momentum (`ρ`): Controls the acceleration of gradient descent in the
prominent direction, in effect dampening oscillations.
# Examples
```julia
opt = RMSProp()
opt = RMSProp(0.002, 0.95)
```
"""
mutable struct RMSProp
eta::Float64
rho::Float64
acc::IdDict
end
RMSProp(η = 0.001, ρ = 0.9) = RMSProp(η, ρ, IdDict())
function apply!(o::RMSProp, x, Δ)
η, ρ = o.eta, o.rho
acc = get!(o.acc, x, zero(x))::typeof(x)
@. acc = ρ * acc + (1 - ρ) * Δ^2
@. Δ *= η / (acc + ϵ)
end
"""
ADAM(η = 0.001, β::Tuple = (0.9, 0.999))
[ADAM](https://arxiv.org/abs/1412.6980v8) optimiser.
# Parameters
- Learning rate (`η`): Amount by which gradients are discounted before updating
the weights.
- Decay of momentums (`β::Tuple`): Exponential decay for the first (β1) and the
second (β2) momentum estimate.
# Examples
```julia
opt = ADAM()
opt = ADAM(0.001, (0.9, 0.8))
```
"""
mutable struct ADAM
eta::Float64
beta::Tuple{Float64,Float64}
state::IdDict
end
ADAM(η = 0.001, β = (0.9, 0.999)) = ADAM(η, β, IdDict())
function apply!(o::ADAM, x, Δ)
η, β = o.eta, o.beta
mt, vt, βp = get!(o.state, x, (zero(x), zero(x), β))
@. mt = β[1] * mt + (1 - β[1]) * Δ
@. vt = β[2] * vt + (1 - β[2]) * Δ^2
@. Δ = mt / (1 - βp[1]) / ((vt / (1 - βp[2])) + ϵ) * η
o.state[x] = (mt, vt, βp .* β)
return Δ
end
"""
RADAM(η = 0.001, β::Tuple = (0.9, 0.999))
[Rectified ADAM](https://arxiv.org/pdf/1908.03265v1.pdf) optimizer.
# Parameters
- Learning rate (`η`): Amount by which gradients are discounted before updating
the weights.
- Decay of momentums (`β::Tuple`): Exponential decay for the first (β1) and the
second (β2) momentum estimate.
# Examples
```julia
opt = RADAM()
opt = RADAM(0.001, (0.9, 0.8))
```
"""
mutable struct RADAM
eta::Float64
beta::Tuple{Float64,Float64}
state::IdDict
end
RADAM(η = 0.001, β = (0.9, 0.999)) = RADAM(η, β, IdDict())
function apply!(o::RADAM, x, Δ)
η, β = o.eta, o.beta
ρ∞ = 2/(1-β[2])-1
mt, vt, βp, t = get!(o.state, x, (zero(x), zero(x), β, 1))
@. mt = β[1] * mt + (1 - β[1]) * Δ
@. vt = β[2] * vt + (1 - β[2]) * Δ^2
ρ = ρ∞ - 2t*βp[2]/(1-βp[2])
if ρ > 4
r = sqrt((ρ-4)*(ρ-2)*ρ∞/((ρ∞-4)*(ρ∞-2)*ρ))
@. Δ = mt / (1 - βp[1]) / ((vt / (1 - βp[2])) + ϵ) * η * r
else
@. Δ = mt / (1 - βp[1]) * η
end
o.state[x] = (mt, vt, βp .* β, t+1)
return Δ
end
function momentum(p::Param, ρ::Real)
mo = zeros(p.x)
() -> p.Δ .= mo .= ρ .* mo .+ p.Δ
"""
AdaMax(η = 0.001, β::Tuple = (0.9, 0.999))
[AdaMax](https://arxiv.org/abs/1412.6980v9) is a variant of ADAM based on the ∞-norm.
# Parameters
- Learning rate (`η`): Amount by which gradients are discounted before updating
the weights.
- Decay of momentums (`β::Tuple`): Exponential decay for the first (β1) and the
second (β2) momentum estimate.
# Examples
```julia
opt = AdaMax()
opt = AdaMax(0.001, (0.9, 0.995))
```
"""
mutable struct AdaMax
eta::Float64
beta::Tuple{Float64,Float64}
state::IdDict
end
function nesterov(p::Param, ρ::Real)
mo = zeros(p.x)
function ()
mo .= ρ .* mo .+ p.Δ
p.Δ .= ρ .* mo .+ p.Δ
AdaMax(η = 0.001, β = (0.9, 0.999)) = AdaMax(η, β, IdDict())
function apply!(o::AdaMax, x, Δ)
η, β = o.eta, o.beta
mt, ut, βp = get!(o.state, x, (zero(x), zero(x), β))
@. mt = β[1] * mt + (1 - β[1]) * Δ
@. ut = max(β[2] * ut, abs(Δ))
@. Δ = (η/(1 - βp[1])) * mt/(ut + ϵ)
o.state[x] = (mt, ut, βp .* β)
return Δ
end
"""
ADAGrad(η = 0.1)
[ADAGrad](http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf) optimizer. It has
parameter specific learning rates based on how frequently it is updated.
Parameters don't need tuning.
# Parameters
- Learning rate (`η`): Amount by which gradients are discounted before updating
the weights.
# Examples
```julia
opt = ADAGrad()
opt = ADAGrad(0.001)
```
"""
mutable struct ADAGrad
eta::Float64
acc::IdDict
end
ADAGrad(η = 0.1) = ADAGrad(η, IdDict())
function apply!(o::ADAGrad, x, Δ)
η = o.eta
acc = get!(o.acc, x, fill!(zero(x), ϵ))::typeof(x)
@. acc += Δ^2
@. Δ *= η / (acc + ϵ)
end
"""
ADADelta(ρ = 0.9)
[ADADelta](https://arxiv.org/abs/1212.5701) is a version of ADAGrad adapting its learning
rate based on a window of past gradient updates.
Parameters don't need tuning.
# Parameters
- Rho (`ρ`): Factor by which the gradient is decayed at each time step.
# Examples
```julia
opt = ADADelta()
opt = ADADelta(0.89)
```
"""
mutable struct ADADelta
rho::Float64
state::IdDict
end
ADADelta(ρ = 0.9) = ADADelta(ρ, IdDict())
function apply!(o::ADADelta, x, Δ)
ρ = o.rho
acc, Δacc = get!(o.state, x, (zero(x), zero(x)))
@. acc = ρ * acc + (1 - ρ) * Δ^2
@. Δ *= Δacc/ (acc + ϵ)
@. Δacc = ρ * Δacc + (1 - ρ) * Δ^2
return Δ
end
"""
AMSGrad(η = 0.001, β::Tuple = (0.9, 0.999))
The [AMSGrad](https://openreview.net/forum?id=ryQu7f-RZ) version of the ADAM
optimiser. Parameters don't need tuning.
# Parameters
- Learning rate (`η`): Amount by which gradients are discounted before updating
the weights.
- Decay of momentums (`β::Tuple`): Exponential decay for the first (β1) and the
second (β2) momentum estimate.
# Examples
```julia
opt = AMSGrad()
opt = AMSGrad(0.001, (0.89, 0.995))
```
"""
mutable struct AMSGrad
eta::Float64
beta::Tuple{Float64, Float64}
state::IdDict
end
AMSGrad(η = 0.001, β = (0.9, 0.999)) = AMSGrad(η, β, IdDict())
function apply!(o::AMSGrad, x, Δ)
η, β = o.eta, o.beta
mt, vt, v̂t = get!(o.state, x, (fill!(zero(x), ϵ), fill!(zero(x), ϵ), fill!(zero(x), ϵ)))
@. mt = β[1] * mt + (1 - β[1]) * Δ
@. vt = β[2] * vt + (1 - β[2]) * Δ ^ 2
@. v̂t = max(v̂t, vt)
@. Δ = η * mt / (v̂t + ϵ)
end
"""
NADAM(η = 0.001, β::Tuple = (0.9, 0.999))
[NADAM](http://cs229.stanford.edu/proj2015/054_report.pdf) is a Nesterov variant of ADAM.
Parameters don't need tuning.
# Parameters
- Learning rate (`η`): Amount by which gradients are discounted before updating
the weights.
- Decay of momentums (`β::Tuple`): Exponential decay for the first (β1) and the
second (β2) momentum estimate.
# Examples
```julia
opt = NADAM()
opt = NADAM(0.002, (0.89, 0.995))
```
"""
mutable struct NADAM
eta::Float64
beta::Tuple{Float64, Float64}
state::IdDict
end
NADAM(η = 0.001, β = (0.9, 0.999)) = NADAM(η, β, IdDict())
function apply!(o::NADAM, x, Δ)
η, β = o.eta, o.beta
mt, vt, (β1p, β2p) = get!(o.state, x, (zero(x), zero(x), o.beta))
@. mt = β[1] * mt + (1 - β[1]) * Δ
@. vt = β[2] * vt + (1 - β[2]) * Δ^2
@. Δ = (β[1] * mt / (1 - β[1] * β1p) + (1 - β[1]) * Δ / (1 - β1p)) / ((vt * β[2] / (1 - β2p)) + ϵ) * η
o.state[x] = (mt, vt, (β1p * β[1], β2p * β[2]))
return Δ
end
"""
ADAMW(η = 0.001, β::Tuple = (0.9, 0.999), decay = 0)
[ADAMW](https://arxiv.org/abs/1711.05101) is a variant of ADAM fixing (as in repairing) its
weight decay regularization.
# Parameters
- Learning rate (`η`): Amount by which gradients are discounted before updating
the weights.
- Decay of momentums (`β::Tuple`): Exponential decay for the first (β1) and the
second (β2) momentum estimate.
- `decay`: Decay applied to weights during optimisation.
# Examples
```julia
opt = ADAMW()
opt = ADAMW(0.001, (0.89, 0.995), 0.1)
```
"""
ADAMW(η = 0.001, β = (0.9, 0.999), decay = 0) =
Optimiser(ADAM(η, β), WeightDecay(decay))
# Compose optimizers
"""
Optimiser(a, b, c...)
Combine several optimisers into one; each optimiser produces a modified gradient
that will be fed into the next, and this is finally applied to the parameter as
usual.
"""
mutable struct Optimiser
os::Vector{Any}
end
Optimiser(o...) = Optimiser(Any[o...])
@forward Optimiser.os Base.getindex, Base.first, Base.last, Base.lastindex, Base.push!, Base.setindex!
@forward Optimiser.os Base.iterate
Base.getindex(c::Optimiser, i::AbstractArray) = Optimiser(c.os[i]...)
function apply!(o::Optimiser, x, Δ)
for opt in o.os
Δ = apply!(opt, x, Δ)
end
return Δ
end
function clip(p::Param, thresh::Real)
() -> clamp!(p.Δ, -thresh, thresh)
"""
InvDecay(γ = 0.001)
Apply inverse time decay to an optimiser, so that the effective step size at
iteration `n` is `eta / (1 + γ * n)` where `eta` is the initial step size.
The wrapped optimiser's step size is not modified.
# Examples
```julia
Optimiser(InvDecay(..), Opt(..))
```
"""
mutable struct InvDecay
gamma::Float64
state::IdDict
end
function weightdecay(p::Param, γ::Real)
() -> p.Δ .+= γ .* p.x
InvDecay(γ = 0.001) = InvDecay(γ, IdDict())
function apply!(o::InvDecay, x, Δ)
γ = o.gamma
n = get!(o.state, x, 1)
Δ .*= 1 / (1 + γ * n)
o.state[x] = n + 1
return Δ
end
function invdecay(p::Param, γ::Real)
n = 0
function ()
p.Δ .*= 1 / (1 + γ * n)
n += 1
"""
ExpDecay(η = 0.001, decay = 0.1, decay_step = 1000, clip = 1e-4)
Discount the learning rate `η` by the factor `decay` every `decay_step` steps till
a minimum of `clip`.
# Parameters
- Learning rate (`η`): Amount by which gradients are discounted before updating
the weights.
- `decay`: Factor by which the learning rate is discounted.
- `decay_step`: Schedule decay operations by setting the number of steps between
two decay operations.
- `clip`: Minimum value of learning rate.
# Examples
To apply exponential decay to an optimiser:
```julia
Optimiser(ExpDecay(..), Opt(..))
opt = Optimiser(ExpDecay(), ADAM())
```
"""
mutable struct ExpDecay
eta::Float64
decay::Float64
step::Int64
clip::Float64
current::IdDict
end
ExpDecay(opt = 0.001, decay = 0.1, decay_step = 1000, clip = 1e-4) = ExpDecay(opt, decay, decay_step, clip, IdDict())
function apply!(o::ExpDecay, x, Δ)
η, s, decay = o.eta, o.step, o.decay
n = o.current[x] = get(o.current, x, 0) + 1
if o.current[x]%s == 0 && count(x -> x%s == 0, values(o.current)) == 1
η = max(η * decay, o.clip)
o.eta = η
end
@. Δ *= η
end
function rmsprop(p::Param; η::Real = 0.001, ρ::Real = 0.9, ϵ::Real = 1e-8)
acc = zeros(p.x) .+ ϵ
function ()
@. acc = ρ * acc + (1 - ρ) * p.Δ ^ 2
@. p.Δ /= acc * η
end
"""
WeightDecay(wd = 0)
Decay weights by `wd`.
# Parameters
- Weight decay (`wd`)
"""
mutable struct WeightDecay
wd::Real
end
function adagrad(p::Param; η::Real = 0.01, ϵ::Real = 1e-8)
acc = zeros(p.x) .+ ϵ
function ()
@. acc += p.Δ ^ 2
@. p.Δ /= acc * η
end
WeightDecay() = WeightDecay(0)
function apply!(o::WeightDecay, x, Δ)
wd = o.wd
@. Δ += wd * x
end
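A rough sketch of how `WeightDecay` is typically composed with another optimiser via `Optimiser`; `model`, `loss`, `x`, and `y` are assumed to exist and the decay value is illustrative.
```julia
using Flux

# ADAM with an L2-style penalty of 1e-4 added to every gradient.
opt = Optimiser(WeightDecay(1f-4), ADAM(0.001))
ps  = params(model)                  # `model` assumed defined elsewhere
gs  = gradient(() -> loss(x, y), ps) # `loss`, `x`, `y` assumed defined elsewhere
Flux.Optimise.update!(opt, ps, gs)
```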
function adadelta(p::Param; ρ::Real = 0.95, ϵ::Real = 1e-8)
acc = zeros(p.x) .+ ϵ
Δacc = zeros(p.x) .+ ϵ
function ()
@. acc = ρ * acc + (1 - ρ) * p.Δ ^ 2
@. p.Δ *= Δacc / acc
@. Δacc = ρ * Δacc + (1 - ρ) * p.Δ ^ 2
end
"""
ClipValue(thresh)
Clip gradients when their absolute value exceeds `thresh`.
"""
mutable struct ClipValue{T}
thresh::T
end
function adam(p::Param; η::Real = 0.001, β1::Real = 0.9, β2::Real = 0.999, ϵ::Real = 1e-8)
mt = zeros(p.x)
vt = zeros(p.x) .+ ϵ
β1p, β2p = β1, β2
function ()
@. mt = β1 * mt + (1 - β1) * p.Δ
@. vt = β2 * vt + (1 - β2) * p.Δ ^ 2
@. p.Δ = (1 - β2p) / (1 - β1p) * mt / vt * η
β1p *= β1
β2p *= β2
end
apply!(o::ClipValue, x, Δ) = clamp!(Δ, -o.thresh, o.thresh)
"""
ClipNorm(thresh)
Clip gradients when their L2 norm exceeds `thresh`.
"""
mutable struct ClipNorm{T}
thresh::T
end
function apply!(o::ClipNorm, x, Δ)
Δnrm = norm(Δ)
if Δnrm > o.thresh
rmul!(Δ, o.thresh / Δnrm)
end
return Δ
end
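A minimal sketch of gradient clipping in an optimiser chain (thresholds are illustrative); the clipping rule runs before the main update rule that follows it in the chain.
```julia
using Flux

opt = Optimiser(ClipNorm(1f0), ADAM(3f-4))      # clip each gradient's L2 norm to 1
opt = Optimiser(ClipValue(0.5f0), Descent(0.1)) # or clip element-wise to ±0.5
```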

View File

@ -1,26 +1,123 @@
using Juno
using Flux.Tracker: back!
import Zygote: Params, gradient
tocb(f) = f
tocb(fs::AbstractVector) = () -> foreach(call, fs)
"""
train!(loss, data, opt; cb = () -> ())
update!(x, x̄)
For each datapoint `d` in `data` computes the gradient of `loss(d...)` through
backpropagation and calls the optimizer `opt` and the callback `cb`
(i.e. `opt()` and `cb()`).
Multiple callbacks can be passed to `cb` as an array.
Update the array `x` according to `x .-= x̄`.
"""
function train!(loss, data, opt; cb = () -> ())
cb = tocb(cb)
@progress for d in data
l = loss(d...)
isinf(l.data[]) && error("Loss is Inf")
isnan(l.data[]) && error("Loss is NaN")
back!(l)
opt()
cb()
function update!(x::AbstractArray, x̄)
x .-= x̄
end
"""
update!(opt, p, g)
update!(opt, ps::Params, gs)
Perform an update step of the parameters `ps` (or the single parameter `p`)
according to optimizer `opt` and the gradients `gs` (the gradient `g`).
As a result, the parameters are mutated and the optimizer's internal state may change.
"""
function update!(opt, x, x̄)
x .-= apply!(opt, x, x̄)
end
function update!(opt, xs::Params, gs)
for x in xs
gs[x] == nothing && continue
update!(opt, x, gs[x])
end
end
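A rough sketch of the single-array form, with a stand-in gradient rather than one produced by `gradient`:
```julia
using Flux

W  = rand(Float32, 5, 3)
gW = rand(Float32, 5, 3)                     # stand-in for a real gradient
Flux.Optimise.update!(Descent(0.1), W, gW)   # W .-= 0.1 .* gW, mutating W in place
```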
# Callback niceties
call(f, xs...) = f(xs...)
runall(f) = f
runall(fs::AbstractVector) = () -> foreach(call, fs)
struct StopException <: Exception end
"""
stop()
Call `Flux.stop()` in a callback to indicate when a callback condition is met.
This will trigger the train loop to stop and exit.
# Examples
```julia
cb = function ()
accuracy() > 0.9 && Flux.stop()
end
```
"""
function stop()
throw(StopException())
end
"""
train!(loss, params, data, opt; cb)
For each datapoint `d` in `data` compute the gradient of `loss(d...)` through
backpropagation and call the optimizer `opt`.
In case datapoints `d` are of numeric array type, assume no splatting is needed
and compute the gradient of `loss(d)`.
A callback is given with the keyword argument `cb`. For example, this will print
"training" every 10 seconds (using [`Flux.throttle`](@ref)):
train!(loss, params, data, opt, cb = throttle(() -> println("training"), 10))
The callback can call [`Flux.stop`](@ref) to interrupt the training loop.
Multiple optimisers and callbacks can be passed to `opt` and `cb` as arrays.
"""
function train!(loss, ps, data, opt; cb = () -> ())
ps = Params(ps)
cb = runall(cb)
@progress for d in data
try
if d isa AbstractArray{<:Number}
gs = gradient(ps) do
loss(d)
end
else
gs = gradient(ps) do
loss(d...)
end
end
update!(opt, ps, gs)
cb()
catch ex
if ex isa StopException
break
else
rethrow(ex)
end
end
end
end
"""
@epochs N body
Run `body` `N` times. Mainly useful for quickly doing multiple epochs of
training in a REPL.
# Examples
```jldoctest
julia> Flux.@epochs 2 println("hello")
[ Info: Epoch 1
hello
[ Info: Epoch 2
hello
```
"""
macro epochs(n, ex)
:(@progress for i = 1:$(esc(n))
@info "Epoch $i"
$(esc(ex))
end)
end

View File

@ -1,80 +0,0 @@
module Tracker
using Base: RefValue
export TrackedArray, param, back!
data(x) = x
istracked(x) = false
struct Call{F,As<:Tuple}
func::F
args::As
end
Call(f, args...) = Call{typeof(f),typeof(args)}(f, args)
(c::Call)() = c.func(data.(c.args)...)
struct TrackedArray{T,N,A} <: AbstractArray{T,N}
ref::RefValue{UInt32}
f::Call
data::A
grad::RefValue{A}
end
TrackedScalar{T,A} = TrackedArray{T,0,A}
TrackedVector{T,A} = TrackedArray{T,1,A}
TrackedMatrix{T,A} = TrackedArray{T,2,A}
TrackedVecOrMat{T,A} = Union{TrackedVector{T,A},TrackedMatrix{T,A}}
TrackedArray(c::Call, x::A, Δ::Ref{A}) where A <: AbstractArray =
TrackedArray{eltype(A),ndims(A),A}(Ref(UInt32(0)), c, x, Δ)
TrackedArray(c::Call, x::AbstractArray) = TrackedArray(c, x, RefValue{typeof(x)}())
TrackedArray(c::Call) = TrackedArray(c, c())
TrackedArray(x::AbstractArray) = TrackedArray(Call(nothing), x, RefValue(zeros(x)))
param(xs) = TrackedArray(AbstractFloat.(xs))
istracked(x::TrackedArray) = true
data(x::TrackedArray) = x.data
grad(x::TrackedArray) = x.grad[]
# Fallthrough methods
for f in :[Base.size, Base.ndims].args
@eval @inline $f(x::TrackedArray, a...) = $f(data(x), a...)
end
Base.similar(x::TrackedArray, dims::Union{AbstractUnitRange,Integer}...) =
similar(data(x), dims...)
Base.similar(x::TrackedArray, T::Type) = similar(data(x), T)
Base.show(io::IO, ::Type{TrackedArray{T,N,A}}) where {T,N,A<:AbstractArray{T,N}} =
print(io, "TrackedArray{…,$A}")
function Base.showarray(io::IO, X::TrackedArray, repr::Bool = true; header = true)
if repr
print(io, "param(")
Base.showarray(io, data(X), true)
print(io, ")")
else
header && print(io, "Tracked ")
Base.showarray(io, data(X), false, header = header)
end
end
include("back.jl")
include("lib.jl")
include("numeric.jl")
import NNlib.adapt
adapt(T, xs::TrackedArray) =
TrackedArray(xs.f, adapt(T, xs.data),
RefValue(adapt(T, grad(xs))))
end

View File

@ -1,43 +0,0 @@
scan(x) = nothing
scan(c::Call) = foreach(scan, c.args)
function scan(x::TrackedArray)
ref = x.ref[] += 1
if ref == 1
scan(x.f)
else
isassigned(x.grad) || (x.grad[] = zeros(x.data))
end
return
end
back(c::Call, Δ) = back(c.func, Δ, c.args...)
back(::Call{Void}, Δ) = nothing
function back(x::TrackedArray, Δ)
ref = x.ref[] -= 1
if isassigned(x.grad)
x.grad[] .+= Δ
ref == 0 && back(x.f, x.grad[])
else
ref == 0 && back(x.f, Δ)
end
return
end
macro back(x, Δ)
quote
x = $(esc(x))
istracked(x) && back(x, $(esc(Δ)))
end
end
# Interface methods
function back!(x::TrackedArray, Δ)
scan(x)
back(x, Δ)
end
back!(x::TrackedScalar) = back!(x, 1)

View File

@ -1,129 +0,0 @@
import Base: *
toarray(xs::AbstractArray, ys::AbstractArray) = ys
toarray(xs::AbstractArray, y) = similar(xs, typeof(y), ()) .= y
unarray(xs) = xs
unarray(xs::AbstractArray{T,0} where T) = xs[]
Base.getindex(xs::TrackedArray, i...) =
TrackedArray(Call(getindex, xs, i...), toarray(xs.data, xs.data[i...]))
function back(::typeof(getindex), Δ, xs::TrackedArray, i...)
Δ′ = zeros(xs.data)
Δ′[i...] = unarray(Δ)
@back(xs, Δ′)
end
Base.:-(xs::TrackedArray) = TrackedArray(Call(-, xs))
back(::typeof(-), Δ, xs::TrackedArray) = back(xs, -Δ)
Base.transpose(xs::TrackedArray) = TrackedArray(Call(transpose, xs))
Base.ctranspose(xs::TrackedArray) = TrackedArray(Call(ctranspose, xs))
back(::typeof(transpose), Δ, xs) = @back(xs, trim(xs, Δ.'))
back(::typeof(ctranspose), Δ, xs) = @back(xs, trim(xs, Δ'))
Base.repmat(x::TrackedVecOrMat, a::Integer...) = TrackedArray(Call(repmat, x, a...))
Base.repmat(x::TrackedVecOrMat, a::Int64...) = TrackedArray(Call(repmat, x, a...))
Base.vcat(a::TrackedVector, b::TrackedVector) = TrackedArray(Call(vcat, a, b))
Base.vcat(a::TrackedVector, b::AbstractVector) = TrackedArray(Call(vcat, a, b))
Base.vcat(a::AbstractVector, b::TrackedVector) = TrackedArray(Call(vcat, a, b))
Base.vcat(a::TrackedVecOrMat, b::TrackedVecOrMat) = TrackedArray(Call(vcat, a, b))
Base.vcat(a::TrackedVecOrMat, b::AbstractVecOrMat) = TrackedArray(Call(vcat, a, b))
Base.vcat(a::AbstractVecOrMat, b::TrackedVecOrMat) = TrackedArray(Call(vcat, a, b))
Base.vcat(a::TrackedMatrix, b::TrackedMatrix) = TrackedArray(Call(vcat, a, b))
Base.vcat(a::TrackedMatrix, b::AbstractMatrix) = TrackedArray(Call(vcat, a, b))
Base.vcat(a::AbstractMatrix, b::TrackedMatrix) = TrackedArray(Call(vcat, a, b))
function back(::typeof(vcat), Δ, xs, ys)
i = Base.tail(map(_ -> :, size(Δ)))
@back(xs, Δ[1:size(xs,1), i...])
@back(ys, Δ[size(xs,1)+1:end, i...])
end
# Reductions
Base.sum(xs::TrackedArray, dim) = TrackedArray(Call(sum, xs, dim))
Base.sum(xs::TrackedArray) = TrackedArray(Call(sum, xs), toarray(xs.data, sum(xs.data)))
Base.sum(xs::TrackedScalar, dim...) = xs
back(::typeof(sum), Δ, xs::TrackedArray, dim...) = back(xs, similar(xs.data) .= Δ)
Base.maximum(xs::TrackedArray, args...) = maximum(xs.data, args...)
Base.findfirst(xs::TrackedArray, args...) = findfirst(xs.data, args...)
# BLAS
a::TrackedMatrix * b::TrackedMatrix = TrackedArray(Call(*, a, b))
a::TrackedMatrix * b::AbstractMatrix = TrackedArray(Call(*, a, b))
a::AbstractMatrix * b::TrackedMatrix = TrackedArray(Call(*, a, b))
a::TrackedMatrix * b::TrackedVector = TrackedArray(Call(*, a, b))
a::TrackedMatrix * b::AbstractVector = TrackedArray(Call(*, a, b))
a::AbstractMatrix * b::TrackedVector = TrackedArray(Call(*, a, b))
function back(::typeof(*), Δ, a::AbstractMatrix, b::AbstractVecOrMat)
@back(a, A_mul_Bt(Δ, data(b)))
@back(b, At_mul_B(data(a), Δ))
end
# NNlib
import NNlib: softmax, ∇softmax
softmax(xs::TrackedArray) = TrackedArray(Call(softmax, xs))
back(::typeof(softmax), Δ, xs) = @back(xs, ∇softmax(Δ, data(xs)))
# Broadcasting
using ForwardDiff: Dual, partials
struct Broadcasted{T}
data::T
end
(b::Broadcasted)(xs...) = map(x -> x.value, b.data)
dualify(xs, n) = xs
dualify(xs::TrackedArray, ps) = map(x -> Dual(x, ps), data(xs))
function tracked_broadcast(f, args::Vararg{Any,N}) where N
dargs = map((x,i) -> dualify(x, ntuple(j -> i==j, Val{N})), args, ntuple(identity, Val{N}))
# TrackedArray(Call(Broadcasted(broadcast(f, dargs...)), args...))
# Works around a 0.6 type inference issue
b = Broadcasted(broadcast(f, dargs...))
TrackedArray(Call(b, args...), b())
end
trim(x, Δ) = reshape(Δ, ntuple(i -> size(Δ, i), Val{ndims(x)}))
unbroadcast(x, Δ) =
size(x) == size(Δ) ? Δ :
trim(x, sum(Δ, filter(n -> size(x, n) == 1, 1:ndims(Δ))))
function getpartial(Δ, x, i)
@inbounds p = getindex(partials(x), i)
return Δ * p
end
function back(b::Broadcasted, Δ, args::Vararg{Any,N}) where N
Δargs = ntuple(i -> getpartial.(Δ, b.data, i), Val{N})
foreach((x, Δ) -> @back(x, unbroadcast(x, Δ)), args, Δargs)
end
Base.Broadcast._containertype(::Type{<:TrackedArray}) = TrackedArray
Base.Broadcast.promote_containertype(::Type{TrackedArray}, ::Type{TrackedArray}) = TrackedArray
Base.Broadcast.promote_containertype(::Type{Array}, ::Type{TrackedArray}) = TrackedArray
Base.Broadcast.promote_containertype(::Type{TrackedArray}, ::Type{Array}) = TrackedArray
Base.Broadcast.promote_containertype(::Type{TrackedArray}, ct) = TrackedArray
Base.Broadcast.promote_containertype(ct, ::Type{TrackedArray}) = TrackedArray
Base.Broadcast.broadcast_indices(::Type{TrackedArray}, A::Ref) = ()
Base.Broadcast.broadcast_indices(::Type{TrackedArray}, A) = indices(A)
Base.Broadcast.broadcast_c(f, ::Type{TrackedArray}, A, Bs...) = tracked_broadcast(f, A, Bs...)

View File

@ -1,22 +0,0 @@
function gradient(f, xs::AbstractArray...)
xs = param.(xs)
back!(f(xs...))
grad.(xs)
end
function ngradient(f, xs::AbstractArray...)
grads = zeros.(xs)
for (x, Δ) in zip(xs, grads), i in 1:length(x)
δ = sqrt(eps())
tmp = x[i]
x[i] = tmp - δ/2
y1 = f(xs...)
x[i] = tmp + δ/2
y2 = f(xs...)
x[i] = tmp
Δ[i] = (y2-y1)/δ
end
return grads
end
gradcheck(f, xs...) = all(isapprox.(ngradient(f, xs...), gradient(f, xs...), rtol = 1e-6))

View File

@ -1,34 +0,0 @@
children(x) = ()
mapchildren(f, x) = x
function treelike(T, fs = fieldnames(T))
@eval begin
children(x::$T) = ($([:(x.$f) for f in fs]...),)
mapchildren(f, x::$T) = $T(f.(children(x))...)
end
end
isleaf(x) = isempty(children(x))
function mapleaves(f, x; cache = ObjectIdDict())
haskey(cache, x) && return cache[x]
cache[x] = isleaf(x) ? f(x) : mapchildren(x -> mapleaves(f, x, cache = cache), x)
end
export mapparams
@deprecate mapparams(f, x) mapleaves(f, x)
using DataFlow: OSet
function forleaves(f, x; seen = OSet())
x ∈ seen && return
push!(seen, x)
isleaf(x) ? f(x) : foreach(x -> forleaves(f, x, seen = seen), children(x))
return
end
function params(m)
ps = []
forleaves(p -> p isa TrackedArray && push!(ps, p), m)
return ps
end

View File

@ -1,67 +1,315 @@
# Arrays
nfan() = 1, 1 # fan_in, fan_out
nfan(n) = 1, n # A vector is treated as a n×1 matrix
nfan(n_out, n_in) = n_in, n_out # In case of Dense kernels: arranged as matrices
nfan(dims...) = prod(dims[1:end-2]) .* (dims[end-1], dims[end]) # In case of convolution kernels
initn(dims...) = randn(dims...)/100
"""
glorot_uniform(dims...)
flatten(xs) = reshape(xs, size(xs, 1), :)
Return an `Array` of size `dims` containing random variables taken from a uniform
distribution in the interval ``[-x, x]``, where `x = sqrt(24 / sum(dims)) / 2`.
# Examples
```jldoctest; setup = :(using Random; Random.seed!(0))
julia> Flux.glorot_uniform(2, 3)
2×3 Array{Float32,2}:
0.601094 -0.57414 -0.814925
0.900868 0.805994 0.057514
```
"""
glorot_uniform(dims...) = (rand(Float32, dims...) .- 0.5f0) .* sqrt(24.0f0 / sum(nfan(dims...)))
"""
glorot_normal(dims...)
Return an `Array` of size `dims` containing random variables taken from a normal
distribution with mean 0 and standard deviation `sqrt(2 / sum(dims))`.
# Examples
```jldoctest; setup = :(using Random; Random.seed!(0))
julia> Flux.glorot_normal(3, 2)
3×2 Array{Float32,2}:
0.429505 -0.0852891
0.523935 0.371009
-0.223261 0.188052
```
"""
glorot_normal(dims...) = randn(Float32, dims...) .* sqrt(2.0f0 / sum(nfan(dims...)))
ones(T::Type, dims...) = Base.ones(T, dims...)
zeros(T::Type, dims...) = Base.zeros(T, dims...)
ones(dims...) = Base.ones(Float32, dims...)
zeros(dims...) = Base.zeros(Float32, dims...)
"""
unsqueeze(xs, dim)
Return `xs` reshaped into an `Array` one dimensionality higher than `xs`,
where `dim` indicates in which dimension `xs` is extended.
# Examples
```jldoctest
julia> xs = [[1, 2], [3, 4], [5, 6]]
3-element Array{Array{Int64,1},1}:
[1, 2]
[3, 4]
[5, 6]
julia> Flux.unsqueeze(xs, 1)
1×3 Array{Array{Int64,1},2}:
[1, 2] [3, 4] [5, 6]
julia> Flux.unsqueeze([1 2; 3 4], 2)
2×1×2 Array{Int64,3}:
[:, :, 1] =
1
3
[:, :, 2] =
2
4
```
"""
unsqueeze(xs, dim) = reshape(xs, (size(xs)[1:dim-1]..., 1, size(xs)[dim:end]...))
stack(xs, dim) = cat(dim, unsqueeze.(xs, dim)...)
unstack(xs, dim) = [slicedim(xs, dim, i) for i = 1:size(xs, dim)]
"""
stack(xs, dim)
batchindex(xs, i) = (reverse(Base.tail(reverse(indices(xs))))..., i)
Concatenate the given `Array` of `Array`s `xs` into a single `Array` along the
given dimension `dim`.
# Examples
```jldoctest
julia> xs = [[1, 2], [3, 4], [5, 6]]
3-element Array{Array{Int64,1},1}:
[1, 2]
[3, 4]
[5, 6]
julia> Flux.stack(xs, 1)
3×2 Array{Int64,2}:
1 2
3 4
5 6
julia> cat(xs, dims=1)
3-element Array{Array{Int64,1},1}:
[1, 2]
[3, 4]
[5, 6]
```
"""
stack(xs, dim) = cat(unsqueeze.(xs, dim)..., dims=dim)
"""
unstack(xs, dim)
Unroll the given `xs` into an `Array` of `Array`s along the given dimension `dim`.
# Examples
```jldoctest
julia> Flux.unstack([1 3 5 7; 2 4 6 8], 2)
4-element Array{Array{Int64,1},1}:
[1, 2]
[3, 4]
[5, 6]
[7, 8]
```
"""
unstack(xs, dim) = [copy(selectdim(xs, dim, i)) for i in 1:size(xs, dim)]
"""
chunk(xs, n)
Split `xs` into `n` parts.
# Examples
```jldoctest
julia> Flux.chunk(1:10, 3)
3-element Array{UnitRange{Int64},1}:
1:4
5:8
9:10
julia> Flux.chunk(collect(1:10), 3)
3-element Array{SubArray{Int64,1,Array{Int64,1},Tuple{UnitRange{Int64}},true},1}:
[1, 2, 3, 4]
[5, 6, 7, 8]
[9, 10]
```
"""
chunk(xs, n) = collect(Iterators.partition(xs, ceil(Int, length(xs)/n)))
batchindex(xs, i) = (reverse(Base.tail(reverse(axes(xs))))..., i)
"""
frequencies(xs)
Count the number of times that each element of `xs` appears.
# Examples
```jldoctest
julia> Flux.frequencies(['a','b','b'])
Dict{Char,Int64} with 2 entries:
'a' => 1
'b' => 2
```
"""
function frequencies(xs)
fs = Dict{eltype(xs),Int}()
for x in xs
fs[x] = get(fs, x, 0) + 1
end
return fs
end
head(x::Tuple) = reverse(Base.tail(reverse(x)))
squeezebatch(x) = reshape(x, head(size(x)))
"""
batch(xs)
Batch the arrays in `xs` into a single array.
# Examples
```jldoctest
julia> Flux.batch([[1,2,3],[4,5,6]])
3×2 Array{Int64,2}:
1 4
2 5
3 6
```
"""
function batch(xs)
data = similar(first(xs), size(first(xs))..., length(xs))
data = first(xs) isa AbstractArray ?
similar(first(xs), size(first(xs))..., length(xs)) :
Vector{eltype(xs)}(undef, length(xs))
for (i, x) in enumerate(xs)
data[batchindex(data, i)...] = x
end
return data
end
"""
Return the given sequence padded with `p` up to a maximum length of `n`.
# Examples
```jldoctest
julia> rpad([1, 2], 4, 0)
4-element Array{Int64,1}:
1
2
0
0
julia> rpad([1, 2, 3], 2, 0)
3-element Array{Int64,1}:
1
2
3
```
"""
Base.rpad(v::AbstractVector, n::Integer, p) = [v; fill(p, max(n - length(v), 0))]
function batchseq(xs, pad, n = maximum(length(x) for x in xs))
"""
batchseq(seqs, pad)
Take a list of `N` sequences, and turn them into a single sequence where each
item is a batch of `N`. Short sequences will be padded by `pad`.
# Examples
```jldoctest
julia> Flux.batchseq([[1, 2, 3], [4, 5]], 0)
3-element Array{Array{Int64,1},1}:
[1, 4]
[2, 5]
[3, 0]
```
"""
function batchseq(xs, pad = nothing, n = maximum(length(x) for x in xs))
xs_ = [rpad(x, n, pad) for x in xs]
[batch([xs_[j][i] for j = 1:length(xs_)]) for i = 1:n]
end
# Other
# Flattening models to weight vectors, and back
function accuracy(m, data)
n = 0
correct = 0
for (x, y) in data
x, y = tobatch.((x, y))
n += size(x, 1)
correct += sum(argmax(m(x)) .== argmax(y))
function _restructure(m, xs)
i = 0
fmap(m) do x
x isa AbstractArray || return x
x = reshape(xs[i.+(1:length(x))], size(x))
i += length(x)
return x
end
return correct/n
end
@adjoint function _restructure(m, xs)
_restructure(m, xs), dm -> (nothing,destructure(dm)[1])
end
"""
Returns a function that when invoked, will only be triggered at most once
during `timeout` seconds. Normally, the throttled function will run
as much as it can, without ever going more than once per `wait` duration;
but if you'd like to disable the execution on the leading edge, pass
`leading=false`. To enable execution on the trailing edge, ditto.
destructure(m)
Flatten a model's parameters into a single weight vector.
julia> m = Chain(Dense(10, 5, σ), Dense(5, 2), softmax)
Chain(Dense(10, 5, σ), Dense(5, 2), softmax)
julia> θ, re = destructure(m);
julia> θ
67-element Array{Float32,1}:
-0.1407104
...
The second return value `re` allows you to reconstruct the original network after making
modifications to the weight vector (for example, with a hypernetwork).
julia> re(θ .* 2)
Chain(Dense(10, 5, σ), Dense(5, 2), softmax)
"""
function destructure(m)
xs = Zygote.Buffer([])
fmap(m) do x
x isa AbstractArray && push!(xs, x)
return x
end
return vcat(vec.(copy(xs))...), p -> _restructure(m, p)
end
# Other
"""
throttle(f, timeout; leading=true, trailing=false)
Return a function that when invoked, will only be triggered at most once
during `timeout` seconds.
Normally, the throttled function will run as much as it can, without ever
going more than once per `wait` duration; but if you'd like to disable the
execution on the leading edge, pass `leading=false`. To enable execution on
the trailing edge, pass `trailing=true`.
"""
function throttle(f, timeout; leading=true, trailing=false)
cooldown = true
later = nothing
result = nothing
function throttled(args...; kwargs...)
yield()
if cooldown
if leading
f(args...; kwargs...)
result = f(args...; kwargs...)
else
later = () -> f(args...; kwargs...)
end
cooldown = false
@schedule try
@async try
while (sleep(timeout); later != nothing)
later()
later = nothing
@ -70,9 +318,24 @@ function throttle(f, timeout; leading=true, trailing=false)
cooldown = true
end
elseif trailing
later = () -> f(args...; kwargs...)
later = () -> (result = f(args...; kwargs...))
end
nothing
return result
end
end
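A small sketch of the typical use as a rate-limited training callback; the message and interval are illustrative.
```julia
using Flux

# Print a status line at most once every 10 seconds, however often the callback fires.
log_cb = Flux.throttle(10) do
    println("still training")
end
# e.g. Flux.train!(loss, ps, data, opt, cb = log_cb)
```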
"""
@jit ...
The `@jit` annotation can be applied to any code, and the code will be compiled
for performance.
@jit f(x) = @jit(x) + @jit(x)
Note that compilation happens regardless of the `@jit` macro, so it should only
be used for aesthetic purposes, or by recovering Python users.
"""
macro jit(ex)
esc(ex)
end

106
src/zeros.jl Normal file
View File

@ -0,0 +1,106 @@
import Base: +, -, *, reshape, size
import Base.Broadcast: broadcasted, Broadcasted, BroadcastStyle
"""
Zeros()
Zeros(size...)
Zeros(Type, size...)
Acts as a stand-in for an array of zeros that can be used during training
and is ignored by the optimisers.
Useful for turning the bias off in a layer's forward pass.
## Examples
```julia
julia> Flux.Zeros(3,3)
3×3 Flux.Zeros{Bool,2}:
false false false
false false false
false false false
julia> Flux.Zeros(Float32, 3,3)
3×3 Flux.Zeros{Float32,2}:
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
julia> rand(3,3) .+ Flux.Zeros()
3×3 Array{Float64,2}:
0.198739 0.490459 0.785386
0.779074 0.39986 0.66383
0.854981 0.447292 0.314497
julia> bias_less_conv = Conv((2,2), 1=>3, bias = Flux.Zeros())
Conv((2, 2), 1=>3)
```
"""
struct Zeros{T,N} <: AbstractArray{T,N}
size::Tuple
end
Zeros(::Type{T}, sz...) where T = Zeros{T,length(sz)}(sz)
Zeros(sz::Integer...) = Zeros(Bool, sz...)
Base.size(xs::Zeros) = xs.size
Base.axes(xs::Zeros) = Base.OneTo.(size(xs))
Base.IndexStyle(::Type{<:Zeros}) = IndexLinear()
Base.getindex(xs::Zeros{T,N}, I::Int) where {T,N} = zero(T)
Base.getindex(xs::Zeros{T,N}, inds::Union{Base.OneTo, Base.UnitRange}) where {T,N} =
Zeros(T, length(inds))
Base.collect(xs::Zeros{T,N}) where {T,N} = fill(zero(T), size(xs))
@adjoint reshape(xs::Zeros{T}, dims...) where T =
reshape(xs, dims...), _ -> nothing
# Define basic ops
for f in (:+, :-)
@eval @inline function $f(a::Union{AbstractArray{<:Number}, Zeros}, b::Zeros)
@assert size(a) == size(b) throw(DimensionMismatch("dimensions must match"))
a
end
end
+(a::Zeros, b::AbstractArray) = b + a
-(a::Zeros, b::AbstractArray) = -b + a
Base.copy(xs::Zeros{T,N}) where {T,N} = xs
# Define broadcasting behaviour
for op in (:+, :-)
@eval function broadcasted(::typeof($op), a::AbstractArray, b::Zeros)
bs = Broadcast.broadcast_shape(size(a), size(b))
size(a) == bs && return a
sz = similar(a, bs)
sz .= a
end
end
broadcasted(::typeof(+), a::Zeros, b::AbstractArray) = broadcasted(+, b, a)
broadcasted(::typeof(-), a::Zeros, b::AbstractArray) = broadcasted(+, -b, a)
function broadcasted(::typeof(*), a::AbstractArray, b::Zeros)
Zeros(Broadcast.broadcast_shape(size(a), size(b))...)
end
broadcasted(::typeof(*), a::Zeros, b::AbstractArray) = broadcasted(*, b, a)
for op in (:+, :-, :*)
@eval broadcasted(::typeof($op), a::Zeros, b::Zeros) = Zeros(Broadcast.broadcast_shape(size(a), size(b))...)
end
# Some opportunities to avoid scalar indexing, intermediaries
# Since it replicates a little of what we expect Base to do,
# it should be possible to remove in the future, but for now,
# these help with performance.
broadcasted(::typeof(+), a::AbstractArray, b::Zeros{T,0}) where T = a
broadcasted(::typeof(+), a::Zeros{T,0}, b::AbstractArray) where T = b
broadcasted(::typeof(-), a::AbstractArray, b::Zeros{T,0}) where T = a
broadcasted(::typeof(-), a::Zeros{T,0}, b::AbstractArray) where T = -b
broadcasted(::typeof(*), a::AbstractArray, b::Zeros{T,0}) where T = zero(a)
broadcasted(::typeof(*), a::Zeros{T,0}, b::AbstractArray) where T = zero(b)
broadcasted(::typeof(/), a::Zeros{T,0}, b::AbstractArray) where T = zero(b)
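As an aside, a brief sketch of how the `Zeros` definitions above compose (not part of the diff; the variable names are illustrative):

```julia
using Flux

W = rand(Float32, 3, 3)
x = rand(Float32, 3)
b = Flux.Zeros(Float32, 3)   # stand-in for a zero bias vector

@assert (W * x .+ b) == W * x                        # adding Zeros is a no-op
@assert rand(3, 3) .* Flux.Zeros() == zeros(3, 3)    # 0-dim Zeros zeroes the product

# The intended use is switching a layer's bias off, as in the docstring above:
# Conv((2, 2), 1 => 3, bias = Flux.Zeros())
```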

75
test/cuda/cuda.jl Normal file

@ -0,0 +1,75 @@
using Flux, Test
using Flux.CuArrays
using Flux: gpu
@info "Testing GPU Support"
@testset "CuArrays" begin
CuArrays.allowscalar(false)
x = randn(5, 5)
cx = gpu(x)
@test cx isa CuArray
@test Flux.onecold(gpu([1.0, 2.0, 3.0])) == 3
x = Flux.onehotbatch([1, 2, 3], 1:3)
cx = gpu(x)
@test cx isa Flux.OneHotMatrix && cx.data isa CuArray
@test (cx .+ 1) isa CuArray
m = Chain(Dense(10, 5, tanh), Dense(5, 2), softmax)
cm = gpu(m)
@test all(p isa CuArray for p in params(cm))
@test cm(gpu(rand(10, 10))) isa CuArray{Float32,2}
x = [1.,2.,3.]
cx = gpu(x)
@test Flux.crossentropy(x,x) ≈ Flux.crossentropy(cx,cx)
@test Flux.crossentropy(x,x, weight=1.0) ≈ Flux.crossentropy(cx,cx, weight=1.0)
@test Flux.crossentropy(x,x, weight=[1.0;2.0;3.0]) ≈ Flux.crossentropy(cx,cx, weight=cu([1.0;2.0;3.0]))
x = [-1.1491, 0.8619, 0.3127]
y = [1, 1, 0.]
@test Flux.binarycrossentropy.(σ.(x),y) ≈ Array(Flux.binarycrossentropy.(cu(σ.(x)),cu(y)))
@test Flux.logitbinarycrossentropy.(x,y) ≈ Array(Flux.logitbinarycrossentropy.(cu(x),cu(y)))
xs = rand(5, 5)
ys = Flux.onehotbatch(1:5,1:5)
@test collect(cu(xs) .+ cu(ys)) ≈ collect(xs .+ ys)
c = gpu(Conv((2,2),3=>4))
x = gpu(rand(10, 10, 3, 2))
l = c(gpu(rand(10,10,3,2)))
@test gradient(x -> sum(c(x)), x)[1] isa CuArray
c = gpu(CrossCor((2,2),3=>4))
x = gpu(rand(10, 10, 3, 2))
l = c(gpu(rand(10,10,3,2)))
@test gradient(x -> sum(c(x)), x)[1] isa CuArray
end
@testset "onecold gpu" begin
y = Flux.onehotbatch(ones(3), 1:10) |> gpu;
@test Flux.onecold(y) isa CuArray
@test y[3,:] isa CuArray
end
@testset "restructure gpu" begin
dudt = Dense(1,1) |> gpu
p,re = Flux.destructure(dudt)
foo(x) = sum(re(p)(x))
@test gradient(foo, cu(rand(1)))[1] isa CuArray
end
if CuArrays.has_cudnn()
@info "Testing Flux/CUDNN"
include("cudnn.jl")
include("curnn.jl")
include("layers.jl")
else
@warn "CUDNN unavailable, not testing GPU DNN support"
end

44
test/cuda/cudnn.jl Normal file

@ -0,0 +1,44 @@
using Flux, CuArrays, Test
using Flux: pullback
@testset "CUDNN BatchNorm" begin
@testset "4D Input" begin
x = Float64.(collect(reshape(1:12, 2, 2, 3, 1)))
m = BatchNorm(3)
cx = gpu(x)
cm = gpu(m)
y, back = pullback((m, x) -> m(x), m, x)
cy, cback = pullback((m, x) -> m(x), cm, cx)
@test cpu(cy) ≈ y
Δ = randn(size(y))
dm, dx = back(Δ)
cdm, cdx = cback(gpu(Δ))
@test dm[].γ ≈ cpu(cdm[].γ)
@test dm[].β ≈ cpu(cdm[].β)
@test dx ≈ cpu(cdx)
end
@testset "2D Input" begin
x = Float64.(collect(reshape(1:12, 3, 4)))
m = BatchNorm(3)
cx = gpu(x)
cm = gpu(m)
y, back = pullback((m, x) -> m(x), m, x)
cy, cback = pullback((m, x) -> m(x), cm, cx)
@test cpu(cy) ≈ y
Δ = randn(size(y))
dm, dx = back(Δ)
cdm, cdx = cback(gpu(Δ))
@test dm[].γ ≈ cpu(cdm[].γ)
@test dm[].β ≈ cpu(cdm[].β)
@test dx ≈ cpu(cdx)
end
end

63
test/cuda/curnn.jl Normal file

@ -0,0 +1,63 @@
using Flux, CuArrays, Test
using Flux: pullback
@testset for R in [RNN, GRU, LSTM]
m = R(10, 5) |> gpu
x = gpu(rand(10))
(m̄,) = gradient(m -> sum(m(x)), m)
Flux.reset!(m)
θ = gradient(() -> sum(m(x)), params(m))
@test collect(m̄[].cell[].Wi) == collect(θ[m.cell.Wi])
end
@testset "RNN" begin
@testset for R in [RNN, GRU, LSTM], batch_size in (1, 5)
rnn = R(10, 5)
curnn = fmap(gpu, rnn)
Flux.reset!(rnn)
Flux.reset!(curnn)
x = batch_size == 1 ?
rand(10) :
rand(10, batch_size)
cux = gpu(x)
y, back = pullback((r, x) -> r(x), rnn, x)
cuy, cuback = pullback((r, x) -> r(x), curnn, cux)
@test y ≈ collect(cuy)
@test haskey(Flux.CUDA.descs, curnn.cell)
ȳ = randn(size(y))
m̄, x̄ = back(ȳ)
cum̄, cux̄ = cuback(gpu(ȳ))
m̄[].cell[].Wi
m̄[].state
cum̄[].state
@test x̄ ≈ collect(cux̄)
@test m̄[].cell[].Wi ≈ collect(cum̄[].cell[].Wi)
@test m̄[].cell[].Wh ≈ collect(cum̄[].cell[].Wh)
@test m̄[].cell[].b ≈ collect(cum̄[].cell[].b)
if m̄[].state isa Tuple
for (x, cx) in zip(m̄[].state, cum̄[].state)
@test x ≈ collect(cx)
end
else
@test m̄[].state ≈ collect(cum̄[].state)
end
Flux.reset!(rnn)
Flux.reset!(curnn)
ohx = batch_size == 1 ?
Flux.onehot(rand(1:10), 1:10) :
Flux.onehotbatch(rand(1:10, batch_size), 1:10)
cuohx = gpu(ohx)
y = (rnn(ohx); rnn(ohx))
cuy = (curnn(cuohx); curnn(cuohx))
@test y ≈ collect(cuy)
end
end

98
test/cuda/layers.jl Normal file

@ -0,0 +1,98 @@
# Test layers and data/model movements on and off the GPU
# Add tests for layers and their gradients on the GPU
# Most of the forward passes should be fine being applied
# to bitstype objects, but this gives higher coverage for our use-cases
# Check that getting the gradients does not throw
# generic movement tests
@testset "Basic GPU Movement" begin
@test gradient(x -> sum(gpu(x)), rand(3,3)) isa Tuple
@test gradient(x -> sum(cpu(x)), gpu(rand(3,3))) isa Tuple
end
# TODO: These layers get into scalar indexing
# `AlphaDropout` throws a compilation error on GPUs,
# whereas, the rest are scalar indexing issues.
const BROKEN_LAYERS = [DepthwiseConv,
AlphaDropout,
InstanceNorm,
GroupNorm]
function gradtest(name::String, layers::Vector, xs = nothing, args...)
isnothing(xs) && error("Missing input to test the layers against.")
@testset "$name GPU grad tests" begin
for layer in layers
@testset "$layer GPU grad test" begin
l = gpu(layer(args...))
xs = gpu(xs)
if any(x -> isa(l, x), BROKEN_LAYERS)
ps = Flux.params(l)
@test_broken gradient(() -> sum(l(xs)), ps) isa Flux.Zygote.Grads
else
ps = Flux.params(l)
@test gradient(() -> sum(l(xs)), ps) isa Flux.Zygote.Grads
gs = gradient(() -> sum(l(xs)), ps)
# Handle pooling layers
if !isempty(ps)
@test gs[first(ps)] isa Flux.CuArrays.CuArray
end
end
end
end
end
end
# Repeats from Conv, CrossCor
r = rand(Float32, 28, 28, 1, 1)
conv_layers = [Conv, ConvTranspose, CrossCor, DepthwiseConv]
gradtest("Conv", conv_layers, r, (2,2), 1=>3)
pooling_layers = [MaxPool, MeanPool]
gradtest("Pooling", pooling_layers, r, (2,2))
dropout_layers = [Dropout, AlphaDropout]
gradtest("Dropout", dropout_layers, r, 0.5f0)
norm_layers = [LayerNorm, BatchNorm]
gradtest("Normalising", norm_layers, rand(Float32, 28,28,3,1), 1)
instancenorm = [InstanceNorm]
gradtest("InstanceNorm", instancenorm, r, 1)
groupnorm = [GroupNorm]
gradtest("GroupNorm", groupnorm, rand(Float32, 28,28,3,1), 3, 1)
const stateless_layers = [Flux.mse,
Flux.crossentropy,
Flux.logitcrossentropy,
Flux.normalise]
const stateless_layers_broadcasted = [Flux.binarycrossentropy,
Flux.logitbinarycrossentropy]
function stateless_gradtest(f, args...)
@test gradient((args...) -> sum(f(args...)), args...)[1] isa CuArray
end
function stateless_gradtest_broadcasted(f, args...)
@test gradient((args...) -> sum(f.(args...)), args...)[1] isa CuArray
end
@testset "Stateless GPU grad tests" begin
x = gpu(rand(3,3))
y = gpu(rand(3,3))
for layer in stateless_layers
if layer == Flux.normalise
stateless_gradtest(layer, x)
else
stateless_gradtest(layer, x, y)
end
end
for layer in stateless_layers_broadcasted
stateless_gradtest_broadcasted(layer, x, y)
end
end

116
test/data.jl Normal file

@ -0,0 +1,116 @@
@testset "DataLoader" begin
X = reshape([1:10;], (2, 5))
Y = [1:5;]
d = DataLoader(X, batchsize=2)
@inferred first(d)
batches = collect(d)
@test eltype(batches) == eltype(d) == typeof(X)
@test length(batches) == 3
@test batches[1] == X[:,1:2]
@test batches[2] == X[:,3:4]
@test batches[3] == X[:,5:5]
d = DataLoader(X, batchsize=2, partial=false)
@inferred first(d)
batches = collect(d)
@test eltype(batches) == eltype(d) == typeof(X)
@test length(batches) == 2
@test batches[1] == X[:,1:2]
@test batches[2] == X[:,3:4]
d = DataLoader((X,), batchsize=2, partial=false)
@inferred first(d)
batches = collect(d)
@test eltype(batches) == eltype(d) == Tuple{typeof(X)}
@test length(batches) == 2
@test batches[1] == (X[:,1:2],)
@test batches[2] == (X[:,3:4],)
d = DataLoader((X, Y), batchsize=2)
@inferred first(d)
batches = collect(d)
@test eltype(batches) == eltype(d) == Tuple{typeof(X), typeof(Y)}
@test length(batches) == 3
@test length(batches[1]) == 2
@test length(batches[2]) == 2
@test length(batches[3]) == 2
@test batches[1][1] == X[:,1:2]
@test batches[1][2] == Y[1:2]
@test batches[2][1] == X[:,3:4]
@test batches[2][2] == Y[3:4]
@test batches[3][1] == X[:,5:5]
@test batches[3][2] == Y[5:5]
# test with NamedTuple
d = DataLoader((x=X, y=Y), batchsize=2)
@inferred first(d)
batches = collect(d)
@test eltype(batches) == eltype(d) == NamedTuple{(:x, :y), Tuple{typeof(X), typeof(Y)}}
@test length(batches) == 3
@test length(batches[1]) == 2
@test length(batches[2]) == 2
@test length(batches[3]) == 2
@test batches[1][1] == batches[1].x == X[:,1:2]
@test batches[1][2] == batches[1].y == Y[1:2]
@test batches[2][1] == batches[2].x == X[:,3:4]
@test batches[2][2] == batches[2].y == Y[3:4]
@test batches[3][1] == batches[3].x == X[:,5:5]
@test batches[3][2] == batches[3].y == Y[5:5]
# test interaction with `train!`
θ = ones(2)
X = zeros(2, 10)
loss(x) = sum((x .- θ).^2)
d = DataLoader(X)
Flux.train!(loss, [θ], ncycle(d, 10), Descent(0.1))
@test norm(θ) < 1e-4
# test interaction with `train!`
θ = zeros(2)
X = ones(2, 10)
Y = fill(2, 10)
loss(x, y) = sum((y - x'*θ).^2)
d = DataLoader((X, Y))
Flux.train!(loss, [θ], ncycle(d, 10), Descent(0.1))
@test norm(θ .- 1) < 1e-10
end
@testset "CMUDict" begin
@test cmudict()["CATASTROPHE"] == :[K,AH0,T,AE1,S,T,R,AH0,F,IY0].args
@test length(CMUDict.phones()) == 39
@test length(CMUDict.symbols()) == 84
end
@testset "MNIST" begin
@test MNIST.images()[1] isa Matrix
@test MNIST.labels() isa Vector{Int64}
end
@testset "FashionMNIST" begin
@test FashionMNIST.images()[1] isa Matrix
@test FashionMNIST.labels() isa Vector{Int64}
end
@testset "Sentiment" begin
@test Data.Sentiment.train() isa Vector{Data.Tree{Any}}
end
@testset "Iris" begin
@test Iris.features() isa Matrix
@test size(Iris.features()) == (4,150)
@test Iris.labels() isa Vector{String}
@test size(Iris.labels()) == (150,)
end
@testset "Housing" begin
@test Housing.features() isa Matrix # test broken due to SSL certificate expiration problem
@test size(Housing.features()) == (506, 13)
@test Housing.targets() isa Array{Float64}
@test size(Housing.targets()) == (506, 1)
end

117
test/layers/basic.jl Normal file

@ -0,0 +1,117 @@
using Test, Random
import Flux: activations
@testset "basic" begin
@testset "helpers" begin
@testset "activations" begin
dummy_model = Chain(x->x.^2, x->x .- 3, x -> tan.(x))
x = randn(10)
@test activations(dummy_model, x)[1] == x.^2
@test activations(dummy_model, x)[2] == (x.^2 .- 3)
@test activations(dummy_model, x)[3] == tan.(x.^2 .- 3)
@test activations(Chain(), x) == ()
@test activations(Chain(identity, x->:foo), x)[2] == :foo # results include `Any` type
end
end
@testset "Chain" begin
@test_nowarn Chain(Dense(10, 5, σ), Dense(5, 2))(randn(10))
@test_throws DimensionMismatch Chain(Dense(10, 5, σ),Dense(2, 1))(randn(10))
# numeric test should be put into testset of corresponding layer
end
@testset "Activations" begin
c = Chain(Dense(3,5,relu), Dense(5,1,relu))
X = Float32.([1.0; 1.0; 1.0])
@test_nowarn gradient(()->Flux.activations(c, X)[2][1], params(c))
end
@testset "Dense" begin
@testset "constructors" begin
@test size(Dense(10, 100).W) == (100, 10)
@test Dense(rand(100,10), rand(10)).σ == identity
@test_throws MethodError Dense(10, 10.5)
@test_throws MethodError Dense(10, 10.5, tanh)
end
@test length(Dense(10, 5)(randn(10))) == 5
@test_throws DimensionMismatch Dense(10, 5)(randn(1))
@test_throws MethodError Dense(10, 5)(1) # avoid broadcasting
@test_throws MethodError Dense(10, 5).(randn(10)) # avoid broadcasting
@test Dense(10, 1, identity, initW = ones, initb = zeros)(ones(10,1)) == 10*ones(1, 1)
@test Dense(10, 1, identity, initW = ones, initb = zeros)(ones(10,2)) == 10*ones(1, 2)
@test Dense(10, 2, identity, initW = ones, initb = zeros)(ones(10,1)) == 10*ones(2, 1)
@test Dense(10, 2, identity, initW = ones, initb = zeros)([ones(10,1) 2*ones(10,1)]) == [10 20; 10 20]
end
@testset "Diagonal" begin
@test length(Flux.Diagonal(10)(randn(10))) == 10
@test length(Flux.Diagonal(10)(1)) == 10
@test length(Flux.Diagonal(10)(randn(1))) == 10
@test_throws DimensionMismatch Flux.Diagonal(10)(randn(2))
@test Flux.Diagonal(2)([1 2]) == [1 2; 1 2]
@test Flux.Diagonal(2)([1,2]) == [1,2]
@test Flux.Diagonal(2)([1 2; 3 4]) == [1 2; 3 4]
end
@testset "Maxout" begin
# Note that the normal common usage of Maxout is as per the docstring
# These are abnormal constructors used for testing purposes
@testset "Constructor" begin
mo = Maxout(() -> identity, 4)
input = rand(40)
@test mo(input) == input
end
@testset "simple alternatives" begin
mo = Maxout((x -> x, x -> 2x, x -> 0.5x))
input = rand(40)
@test mo(input) == 2*input
end
@testset "complex alternatives" begin
mo = Maxout((x -> [0.5; 0.1]*x, x -> [0.2; 0.7]*x))
input = [3.0 2.0]
target = [0.5, 0.7].*input
@test mo(input) == target
end
@testset "params" begin
mo = Maxout(()->Dense(32, 64), 4)
ps = params(mo)
@test length(ps) == 8 #4 alts, each with weight and bias
end
end
@testset "SkipConnection" begin
@testset "zero sum" begin
input = randn(10, 10, 10, 10)
@test SkipConnection(x -> zeros(size(x)), (a,b) -> a + b)(input) == input
end
@testset "concat size" begin
input = randn(10, 2)
@test size(SkipConnection(Dense(10,10), (a,b) -> cat(a, b, dims = 2))(input)) == (10,4)
end
end
@testset "output dimensions" begin
m = Chain(Conv((3, 3), 3 => 16), Conv((3, 3), 16 => 32))
@test Flux.outdims(m, (10, 10)) == (6, 6)
m = Dense(10, 5)
@test Flux.outdims(m, (5, 2)) == (5,)
@test Flux.outdims(m, (10,)) == (5,)
m = Flux.Diagonal(10)
@test Flux.outdims(m, (10,)) == (10,)
m = Maxout(() -> Conv((3, 3), 3 => 16), 2)
@test Flux.outdims(m, (10, 10)) == (8, 8)
end
end

218
test/layers/conv.jl Normal file

@ -0,0 +1,218 @@
using Flux, Test
using Flux: maxpool, meanpool
using Flux: gradient
@testset "Pooling" begin
x = randn(Float32, 10, 10, 3, 2)
gmp = GlobalMaxPool()
@test size(gmp(x)) == (1, 1, 3, 2)
gmp = GlobalMeanPool()
@test size(gmp(x)) == (1, 1, 3, 2)
mp = MaxPool((2, 2))
@test mp(x) == maxpool(x, PoolDims(x, 2))
mp = MeanPool((2, 2))
@test mp(x) == meanpool(x, PoolDims(x, 2))
end
@testset "CNN" begin
r = zeros(Float32, 28, 28, 1, 5)
m = Chain(
Conv((2, 2), 1=>16, relu),
MaxPool((2,2)),
Conv((2, 2), 16=>8, relu),
MaxPool((2,2)),
x -> reshape(x, :, size(x, 4)),
Dense(288, 10), softmax)
@test size(m(r)) == (10, 5)
# Test bias switch
bias = Conv(ones(Float32, 2, 2, 1, 3), ones(Float32, 3))
ip = zeros(Float32, 28,28,1,1)
op = bias(ip)
@test sum(op) == prod(size(op))
bias = Conv((2,2), 1=>3, bias = Flux.Zeros())
op = bias(ip)
@test sum(op) === 0.f0
gs = gradient(() -> sum(bias(ip)), Flux.params(bias))
@test gs[bias.bias] == nothing
# Train w/o bias and make sure no convergence happens
# when only bias can be converged
bias = Conv((2, 2), 1=>3, bias = Flux.Zeros());
ip = zeros(Float32, 28,28,1,1)
op = zeros(Float32, 27,27,3,1) .+ 2.f0
opt = Descent()
for _ = 1:10^3
gs = gradient(params(bias)) do
Flux.mse(bias(ip), op)
end
Flux.Optimise.update!(opt, params(bias), gs)
end
@test Flux.mse(bias(ip), op) ≈ 4.f0
end
@testset "asymmetric padding" begin
r = ones(Float32, 28, 28, 1, 1)
m = Conv((3, 3), 1=>1, relu; pad=(0,1,1,2))
m.weight[:] .= 1.0
m.bias[:] .= 0.0
y_hat = m(r)[:,:,1,1]
@test size(y_hat) == (27, 29)
@test y_hat[1, 1] ≈ 6.0
@test y_hat[2, 2] ≈ 9.0
@test y_hat[end, 1] ≈ 4.0
@test y_hat[1, end] ≈ 3.0
@test y_hat[1, end-1] ≈ 6.0
@test y_hat[end, end] ≈ 2.0
end
@testset "Depthwise Conv" begin
r = zeros(Float32, 28, 28, 3, 5)
m1 = DepthwiseConv((2, 2), 3=>15)
@test size(m1(r), 3) == 15
m3 = DepthwiseConv((2, 3), 3=>9)
@test size(m3(r), 3) == 9
# Test that we cannot ask for non-integer multiplication factors
@test_throws AssertionError DepthwiseConv((2,2), 3=>10)
end
@testset "ConvTranspose" begin
x = zeros(Float32, 28, 28, 1, 1)
y = Conv((3,3), 1 => 1)(x)
x_hat = ConvTranspose((3, 3), 1 => 1)(y)
@test size(x_hat) == size(x)
m = ConvTranspose((3,3), 1=>1)
# Test that the gradient call does not throw: #900
@test gradient(()->sum(m(x)), params(m)) isa Flux.Zygote.Grads
end
@testset "CrossCor" begin
x = rand(Float32, 28, 28, 1, 1)
w = rand(2,2,1,1)
y = CrossCor(w, [0.0])
@test isapprox(sum(w .* x[1:2, 1:2, :, :]), y(x)[1, 1, 1, 1], rtol=1e-7)
r = zeros(Float32, 28, 28, 1, 5)
m = Chain(
CrossCor((2, 2), 1=>16, relu),
MaxPool((2,2)),
CrossCor((2, 2), 16=>8, relu),
MaxPool((2,2)),
x -> reshape(x, :, size(x, 4)),
Dense(288, 10), softmax)
@test size(m(r)) == (10, 5)
@test y(x) != Conv(w, [0.0])(x)
@test CrossCor(w[end:-1:1, end:-1:1, :, :], [0.0])(x) == Conv(w, [0.0])(x)
end
@testset "Conv with non quadratic window #700" begin
data = zeros(Float32, 7,7,1,1)
data[4,4,1,1] = 1
l = Conv((3,3), 1=>1)
expected = zeros(eltype(l.weight),5,5,1,1)
expected[2:end-1,2:end-1,1,1] = l.weight
@test expected ≈ l(data)
l = Conv((3,1), 1=>1)
expected = zeros(eltype(l.weight),5,7,1,1)
expected[2:end-1,4,1,1] = l.weight
@test expected ≈ l(data)
l = Conv((1,3), 1=>1)
expected = zeros(eltype(l.weight),7,5,1,1)
expected[4,2:end-1,1,1] = l.weight
@test expected ≈ l(data)
@test begin
# we test that the next expression does not throw
randn(Float32, 10,10,1,1) |> Conv((6,1), 1=>1, Flux.σ)
true
end
end
@testset "conv output dimensions" begin
m = Conv((3, 3), 3 => 16)
@test Flux.outdims(m, (10, 10)) == (8, 8)
m = Conv((3, 3), 3 => 16; stride = 2)
@test Flux.outdims(m, (5, 5)) == (2, 2)
m = Conv((3, 3), 3 => 16; stride = 2, pad = 3)
@test Flux.outdims(m, (5, 5)) == (5, 5)
m = Conv((3, 3), 3 => 16; stride = 2, pad = 3, dilation = 2)
@test Flux.outdims(m, (5, 5)) == (4, 4)
m = ConvTranspose((3, 3), 3 => 16)
@test Flux.outdims(m, (8, 8)) == (10, 10)
m = ConvTranspose((3, 3), 3 => 16; stride = 2)
@test Flux.outdims(m, (2, 2)) == (5, 5)
m = ConvTranspose((3, 3), 3 => 16; stride = 2, pad = 3)
@test Flux.outdims(m, (5, 5)) == (5, 5)
m = ConvTranspose((3, 3), 3 => 16; stride = 2, pad = 3, dilation = 2)
@test Flux.outdims(m, (4, 4)) == (5, 5)
m = DepthwiseConv((3, 3), 3 => 6)
@test Flux.outdims(m, (10, 10)) == (8, 8)
m = DepthwiseConv((3, 3), 3 => 6; stride = 2)
@test Flux.outdims(m, (5, 5)) == (2, 2)
m = DepthwiseConv((3, 3), 3 => 6; stride = 2, pad = 3)
@test Flux.outdims(m, (5, 5)) == (5, 5)
m = DepthwiseConv((3, 3), 3 => 6; stride = 2, pad = 3, dilation = 2)
@test Flux.outdims(m, (5, 5)) == (4, 4)
m = CrossCor((3, 3), 3 => 16)
@test Flux.outdims(m, (10, 10)) == (8, 8)
m = CrossCor((3, 3), 3 => 16; stride = 2)
@test Flux.outdims(m, (5, 5)) == (2, 2)
m = CrossCor((3, 3), 3 => 16; stride = 2, pad = 3)
@test Flux.outdims(m, (5, 5)) == (5, 5)
m = CrossCor((3, 3), 3 => 16; stride = 2, pad = 3, dilation = 2)
@test Flux.outdims(m, (5, 5)) == (4, 4)
m = MaxPool((2, 2))
@test Flux.outdims(m, (10, 10)) == (5, 5)
m = MaxPool((2, 2); stride = 1)
@test Flux.outdims(m, (5, 5)) == (4, 4)
m = MaxPool((2, 2); stride = 2, pad = 3)
@test Flux.outdims(m, (5, 5)) == (5, 5)
m = MeanPool((2, 2))
@test Flux.outdims(m, (10, 10)) == (5, 5)
m = MeanPool((2, 2); stride = 1)
@test Flux.outdims(m, (5, 5)) == (4, 4)
m = MeanPool((2, 2); stride = 2, pad = 3)
@test Flux.outdims(m, (5, 5)) == (5, 5)
end
@testset "$ltype SamePad kernelsize $k" for ltype in (Conv, ConvTranspose, DepthwiseConv, CrossCor), k in ( (1,), (2,), (3,), (4,5), (6,7,8))
data = ones(Float32, (k .+ 3)..., 1,1)
l = ltype(k, 1=>1, pad=SamePad())
@test size(l(data)) == size(data)
l = ltype(k, 1=>1, pad=SamePad(), dilation = k .÷ 2)
@test size(l(data)) == size(data)
stride = 3
l = ltype(k, 1=>1, pad=SamePad(), stride = stride)
if ltype == ConvTranspose
@test size(l(data))[1:end-2] == stride .* size(data)[1:end-2] .- stride .+ 1
else
@test size(l(data))[1:end-2] == ceil.(Int, size(data)[1:end-2] ./ stride)
end
end
@testset "$ltype SamePad windowsize $k" for ltype in (MeanPool, MaxPool), k in ( (1,), (2,), (3,), (4,5), (6,7,8))
data = ones(Float32, (k .+ 3)..., 1,1)
l = ltype(k, pad=SamePad())
@test size(l(data))[1:end-2] == ceil.(Int, size(data)[1:end-2] ./ k)
end

296
test/layers/normalisation.jl Normal file

@ -0,0 +1,296 @@
using Flux, Test, Statistics
using Zygote: pullback
evalwgrad(f, x...) = pullback(f, x...)[1]
@testset "Dropout" begin
x = [1.,2.,3.]
@test x == Dropout(0.1)(x)
@test x == evalwgrad(Dropout(0), x)
@test zero(x) == evalwgrad(Dropout(1), x)
x = rand(100)
m = Dropout(0.9)
y = evalwgrad(m, x)
@test count(a->a==0, y) > 50
testmode!(m, true)
y = evalwgrad(m, x) # should override istraining
@test count(a->a==0, y) == 0
testmode!(m, false)
y = evalwgrad(m, x)
@test count(a->a==0, y) > 50
x = rand(Float32, 100)
m = Chain(Dense(100,100),
Dropout(0.9))
y = evalwgrad(m, x)
@test count(a->a == 0, y) > 50
testmode!(m, true)
y = evalwgrad(m, x) # should override istraining
@test count(a->a == 0, y) == 0
x = rand(100, 50)
m = Dropout(0.5, dims = 2)
y = m(x)
c = map(i->count(a->a==0, @view y[i, :]), 1:100)
@test minimum(c) == maximum(c)
m = Dropout(0.5, dims = 1)
y = m(x)
c = map(i->count(a->a==0, @view y[:, i]), 1:50)
@test minimum(c) == maximum(c)
end
@testset "BatchNorm" begin
let m = BatchNorm(2), x = [1.0 3.0 5.0;
2.0 4.0 6.0]
@test length(params(m)) == 2
@test m.β == [0, 0] # initβ(2)
@test m.γ == [1, 1] # initγ(2)
# initial m.σ is 1
# initial m.μ is 0
y = evalwgrad(m, x)
@test isapprox(y, [-1.22474 0 1.22474; -1.22474 0 1.22474], atol = 1.0e-5)
# julia> x
# 2×3 Array{Float64,2}:
# 1.0 3.0 5.0
# 2.0 4.0 6.0
#
# μ of batch will be
# (1. + 3. + 5.) / 3 = 3
# (2. + 4. + 6.) / 3 = 4
#
# ∴ update rule with momentum:
# .1 * 3 + 0 = .3
# .1 * 4 + 0 = .4
@test m.μ ≈ reshape([0.3, 0.4], 2, 1)
# julia> .1 .* var(x, dims = 2, corrected=false) .* (3 / 2).+ .9 .* [1., 1.]
# 2×1 Array{Float64,2}:
# 1.3
# 1.3
@test m.σ² ≈ .1 .* var(x, dims = 2, corrected=false) .* (3 / 2).+ .9 .* [1., 1.]
x = m(x)
@test isapprox(x[1], (1 .- 0.3) / sqrt(1.3), atol = 1.0e-5)
end
# with activation function
let m = BatchNorm(2, sigmoid), x = [1.0 3.0 5.0;
2.0 4.0 6.0]
y = m(x)
@test isapprox(y, sigmoid.((x .- m.μ) ./ sqrt.(m.σ² .+ m.ϵ)), atol = 1.0e-7)
end
let m = trainmode!(BatchNorm(2)), x = reshape(Float32.(1:6), 3, 2, 1)
y = reshape(permutedims(x, [2, 1, 3]), 2, :)
y = permutedims(reshape(m(y), 2, 3, 1), [2, 1, 3])
@test m(x) == y
end
let m = trainmode!(BatchNorm(2)), x = reshape(Float32.(1:12), 2, 3, 2, 1)
y = reshape(permutedims(x, [3, 1, 2, 4]), 2, :)
y = permutedims(reshape(m(y), 2, 2, 3, 1), [2, 3, 1, 4])
@test m(x) == y
end
let m = trainmode!(BatchNorm(2)), x = reshape(Float32.(1:24), 2, 2, 3, 2, 1)
y = reshape(permutedims(x, [4, 1, 2, 3, 5]), 2, :)
y = permutedims(reshape(m(y), 2, 2, 2, 3, 1), [2, 3, 4, 1, 5])
@test m(x) == y
end
let m = BatchNorm(32), x = randn(Float32, 416, 416, 32, 1);
m(x)
@test (@allocated m(x)) < 100_000_000
end
end
@testset "InstanceNorm" begin
# helper functions
expand_inst = (x, as) -> reshape(repeat(x, outer=[1, as[length(as)]]), as...)
# begin tests
let m = InstanceNorm(2), sizes = (3, 2, 2),
x = reshape(collect(1:prod(sizes)), sizes)
@test length(params(m)) == 2
x = Float64.(x)
@test m.β == [0, 0] # initβ(2)
@test m.γ == [1, 1] # initγ(2)
y = evalwgrad(m, x)
#julia> x
#[:, :, 1] =
# 1.0 4.0
# 2.0 5.0
# 3.0 6.0
#
#[:, :, 2] =
# 7.0 10.0
# 8.0 11.0
# 9.0 12.0
#
# μ will be
# (1. + 2. + 3.) / 3 = 2.
# (4. + 5. + 6.) / 3 = 5.
#
# (7. + 8. + 9.) / 3 = 8.
# (10. + 11. + 12.) / 3 = 11.
#
# ∴ update rule with momentum:
# (1. - .1) * 0 + .1 * (2. + 8.) / 2 = .5
# (1. - .1) * 0 + .1 * (5. + 11.) / 2 = .8
@test m.μ ≈ [0.5, 0.8]
# momentum * var * num_items / (num_items - 1) + (1 - momentum) * sigma_sq
# julia> reshape(mean(.1 .* var(x, dims = 1, corrected=false) .* (3 / 2), dims=3), :) .+ .9 .* 1.
# 2-element Array{Float64,1}:
# 1.
# 1.
@test m.σ² ≈ reshape(mean(.1 .* var(x, dims = 1, corrected=false) .* (3 / 2), dims=3), :) .+ .9 .* 1.
x = m(x)
@test isapprox(x[1], (1 - 0.5) / sqrt(1. + 1f-5), atol = 1.0e-5)
end
# with activation function
let m = InstanceNorm(2, sigmoid), sizes = (3, 2, 2),
x = reshape(collect(1:prod(sizes)), sizes)
x = Float64.(x)
affine_shape = collect(sizes)
affine_shape[1] = 1
y = m(x)
@test isapprox(y, sigmoid.((x .- expand_inst(m.μ, affine_shape)) ./ sqrt.(expand_inst(m.σ², affine_shape) .+ m.ϵ)), atol = 1.0e-7)
end
let m = trainmode!(InstanceNorm(2)), sizes = (2, 4, 1, 2, 3),
x = Float32.(reshape(collect(1:prod(sizes)), sizes))
y = reshape(permutedims(x, [3, 1, 2, 4, 5]), :, 2, 3)
y = reshape(m(y), sizes...)
@test m(x) == y
end
# check that μ, σ², and the output are the correct size for higher rank tensors
let m = InstanceNorm(2), sizes = (5, 5, 3, 4, 2, 6),
x = reshape(Float32.(collect(1:prod(sizes))), sizes)
y = evalwgrad(m, x)
@test size(m.μ) == (sizes[end - 1], )
@test size(m.σ²) == (sizes[end - 1], )
@test size(y) == sizes
end
# show that instance norm is equal to batch norm when channel and batch dims are squashed
let m_inorm = trainmode!(InstanceNorm(2)), m_bnorm = trainmode!(BatchNorm(12)), sizes = (5, 5, 3, 4, 2, 6),
x = reshape(Float32.(collect(1:prod(sizes))), sizes)
@test m_inorm(x) == reshape(m_bnorm(reshape(x, (sizes[1:end - 2]..., :, 1))), sizes)
end
let m = InstanceNorm(32), x = randn(Float32, 416, 416, 32, 1);
m(x)
@test (@allocated m(x)) < 100_000_000
end
end
if VERSION >= v"1.1"
@testset "GroupNorm" begin
# begin tests
squeeze(x) = dropdims(x, dims = tuple(findall(size(x) .== 1)...)) # To remove all singular dimensions
let m = GroupNorm(4,2), sizes = (3,4,2),
x = reshape(collect(1:prod(sizes)), sizes)
@test length(params(m)) == 2
x = Float64.(x)
@test m.β == [0, 0, 0, 0] # initβ(32)
@test m.γ == [1, 1, 1, 1] # initγ(32)
y = evalwgrad(m, x)
#julia> x
#[:, :, 1] =
# 1.0 4.0 7.0 10.0
# 2.0 5.0 8.0 11.0
# 3.0 6.0 9.0 12.0
#
#[:, :, 2] =
# 13.0 16.0 19.0 22.0
# 14.0 17.0 20.0 23.0
# 15.0 18.0 21.0 24.0
#
# μ will be
# (1. + 2. + 3. + 4. + 5. + 6.) / 6 = 3.5
# (7. + 8. + 9. + 10. + 11. + 12.) / 6 = 9.5
#
# (13. + 14. + 15. + 16. + 17. + 18.) / 6 = 15.5
# (19. + 20. + 21. + 22. + 23. + 24.) / 6 = 21.5
#
# μ =
# 3.5 15.5
# 9.5 21.5
#
# ∴ update rule with momentum:
# (1. - .1) * 0 + .1 * (3.5 + 15.5) / 2 = 0.95
# (1. - .1) * 0 + .1 * (9.5 + 21.5) / 2 = 1.55
@test m.μ ≈ [0.95, 1.55]
# julia> mean(var(reshape(x,3,2,2,2),dims=(1,2)).* .1,dims=2) .+ .9*1.
# 2-element Array{Float64,1}:
# 1.25
# 1.25
@test m.σ² ≈ mean(squeeze(var(reshape(x,3,2,2,2),dims=(1,2))).*.1,dims=2) .+ .9*1.
x = m(x)
@test isapprox(x[1], (1 - 0.95) / sqrt(1.25 + 1f-5), atol = 1.0e-5)
end
# with activation function
let m = GroupNorm(4,2, sigmoid), sizes = (3, 4, 2),
x = reshape(collect(1:prod(sizes)), sizes)
x = Float64.(x)
μ_affine_shape = ones(Int,length(sizes) + 1)
μ_affine_shape[end-1] = 2 # Number of groups
affine_shape = ones(Int,length(sizes) + 1)
affine_shape[end-2] = 2 # Channels per group
affine_shape[end-1] = 2 # Number of groups
affine_shape[1] = sizes[1]
affine_shape[end] = sizes[end]
og_shape = size(x)
y = m(x)
x_ = reshape(x,affine_shape...)
out = reshape(sigmoid.((x_ .- reshape(m.μ,μ_affine_shape...)) ./ sqrt.(reshape(m.σ²,μ_affine_shape...) .+ m.ϵ)),og_shape)
@test isapprox(y, out, atol = 1.0e-7)
end
let m = trainmode!(GroupNorm(2,2)), sizes = (2, 4, 1, 2, 3),
x = Float32.(reshape(collect(1:prod(sizes)), sizes))
y = reshape(permutedims(x, [3, 1, 2, 4, 5]), :, 2, 3)
y = reshape(m(y), sizes...)
@test m(x) == y
end
# check that μ, σ², and the output are the correct size for higher rank tensors
let m = GroupNorm(4,2), sizes = (5, 5, 3, 4, 4, 6),
x = Float32.(reshape(collect(1:prod(sizes)), sizes))
y = evalwgrad(m, x)
@test size(m.μ) == (m.G,1)
@test size(m.σ²) == (m.G,1)
@test size(y) == sizes
end
# show that group norm is the same as instance norm when the group size is the same as the number of channels
let IN = trainmode!(InstanceNorm(4)), GN = trainmode!(GroupNorm(4,4)), sizes = (2,2,3,4,5),
x = Float32.(reshape(collect(1:prod(sizes)), sizes))
@test IN(x) ≈ GN(x)
end
# show that group norm is the same as batch norm for a group of size 1 and batch of size 1
let BN = trainmode!(BatchNorm(4)), GN = trainmode!(GroupNorm(4,4)), sizes = (2,2,3,4,1),
x = Float32.(reshape(collect(1:prod(sizes)), sizes))
@test BN(x) ≈ GN(x)
end
end
end

144
test/layers/stateless.jl Normal file

@ -0,0 +1,144 @@
using Test
using Flux: onehotbatch, mse, crossentropy, logitcrossentropy,
σ, binarycrossentropy, logitbinarycrossentropy, flatten,
xlogx, xlogy
const ϵ = 1e-7
@testset "xlogx & xlogy" begin
@test iszero(xlogx(0))
@test isnan(xlogx(NaN))
@test xlogx(2) ≈ 2.0 * log(2.0)
@inferred xlogx(2)
@inferred xlogx(0)
@test iszero(xlogy(0, 1))
@test isnan(xlogy(NaN, 1))
@test isnan(xlogy(1, NaN))
@test isnan(xlogy(NaN, NaN))
@test xlogy(2, 3) ≈ 2.0 * log(3.0)
@inferred xlogy(2, 3)
@inferred xlogy(0, 1)
end
@testset "losses" begin
# First, regression-style y's
y = [1, 1, 0, 0]
ŷ = [.9, .1, .1, .9]
@testset "mse" begin
@test mse(ŷ, y) (.1^2 + .9^2)/2
end
@testset "mae" begin
@test Flux.mae(ŷ, y) 1/2
end
@testset "huber_loss" begin
@test Flux.huber_loss(ŷ, y) 0.20500000000000002
end
y = [123.0,456.0,789.0]
ŷ = [345.0,332.0,789.0]
@testset "msle" begin
@test Flux.msle(ŷ, y) 0.38813985859136585
end
# Now onehot y's
y = onehotbatch([1, 1, 0, 0], 0:1)
ŷ = [.1 .9; .9 .1; .9 .1; .1 .9]'
v = log(.1 / .9)
logŷ = [v 0.0; 0.0 v; 0.0 v; v 0.0]'
lossvalue = 1.203972804325936
@testset "crossentropy" begin
@test crossentropy([0.1,0.0,0.9], [0.1,0.0,0.9]) crossentropy([0.1,0.9], [0.1,0.9])
@test crossentropy(ŷ, y) lossvalue
end
@testset "logitcrossentropy" begin
@test logitcrossentropy(logŷ, y) lossvalue
end
@testset "weighted_crossentropy" begin
@test crossentropy(ŷ, y, weight = ones(2)) lossvalue
@test crossentropy(ŷ, y, weight = [.5, .5]) lossvalue/2
@test crossentropy(ŷ, y, weight = [2, .5]) 1.5049660054074199
end
@testset "weighted_logitcrossentropy" begin
@test logitcrossentropy(logŷ, y, weight = ones(2)) lossvalue
@test logitcrossentropy(logŷ, y, weight = [.5, .5]) lossvalue/2
@test logitcrossentropy(logŷ, y, weight = [2, .5]) 1.5049660054074199
end
logŷ, y = randn(3), rand(3)
@testset "binarycrossentropy" begin
@test binarycrossentropy.(σ.(logŷ), y; ϵ=0) -y.*log.(σ.(logŷ)) - (1 .- y).*log.(1 .- σ.(logŷ))
@test binarycrossentropy.(σ.(logŷ), y) -y.*log.(σ.(logŷ) .+ eps.(σ.(logŷ))) - (1 .- y).*log.(1 .- σ.(logŷ) .+ eps.(σ.(logŷ)))
end
@testset "logitbinarycrossentropy" begin
@test logitbinarycrossentropy.(logŷ, y) binarycrossentropy.(σ.(logŷ), y; ϵ=0)
end
y = [1 2 3]
ŷ = [4.0 5.0 6.0]
@testset "kldivergence" begin
@test Flux.kldivergence([0.1,0.0,0.9], [0.1,0.0,0.9]) Flux.kldivergence([0.1,0.9], [0.1,0.9])
@test Flux.kldivergence(ŷ, y) -1.7661057888493457
@test Flux.kldivergence(y, y) 0
end
y = [1 2 3 4]
ŷ = [5.0 6.0 7.0 8.0]
@testset "hinge" begin
@test Flux.hinge(ŷ, y) 0
@test Flux.hinge(y, 0.5 .* y) 0.125
end
@testset "squared_hinge" begin
@test Flux.squared_hinge(ŷ, y) 0
@test Flux.squared_hinge(y, 0.5 .* y) 0.0625
end
y = [0.1 0.2 0.3]
ŷ = [0.4 0.5 0.6]
@testset "poisson" begin
@test Flux.poisson(ŷ, y) 0.6278353988097339
@test Flux.poisson(y, y) 0.5044459776946685
end
y = [1.0 0.5 0.3 2.4]
ŷ = [0 1.4 0.5 1.2]
@testset "dice_coeff_loss" begin
@test Flux.dice_coeff_loss(ŷ, y) 0.2799999999999999
@test Flux.dice_coeff_loss(y, y) 0.0
end
@testset "tversky_loss" begin
@test Flux.tversky_loss(ŷ, y) -0.06772009029345383
@test Flux.tversky_loss(ŷ, y, β = 0.8) -0.09490740740740744
@test Flux.tversky_loss(y, y) -0.5576923076923075
end
@testset "no spurious promotions" begin
for T in (Float32, Float64)
y = rand(T, 2)
ŷ = rand(T, 2)
for f in (mse, crossentropy, logitcrossentropy, Flux.kldivergence, Flux.hinge, Flux.poisson,
Flux.mae, Flux.huber_loss, Flux.msle, Flux.squared_hinge, Flux.dice_coeff_loss, Flux.tversky_loss)
fwd, back = Flux.pullback(f, ŷ, y)
@test fwd isa T
@test eltype(back(one(T))[1]) == T
end
end
end
end
@testset "helpers" begin
@testset "flatten" begin
x = randn(Float32, 10, 10, 3, 2)
@test size(flatten(x)) == (300, 2)
end
end

19
test/onehot.jl Normal file

@ -0,0 +1,19 @@
using Flux:onecold
using Test
@testset "onecold" begin
a = [1, 2, 5, 3.]
A = [1 20 5; 2 7 6; 3 9 10; 2 1 14]
labels = ['A', 'B', 'C', 'D']
@test onecold(a) == 3
@test onecold(A) == [3, 1, 4]
@test onecold(a, labels) == 'C'
@test onecold(A, labels) == ['C', 'A', 'D']
end
@testset "onehotbatch indexing" begin
y = Flux.onehotbatch(ones(3), 1:10)
@test y[:,1] isa Flux.OneHotVector
@test y[:,:] isa Flux.OneHotMatrix
end

113
test/optimise.jl Normal file

@ -0,0 +1,113 @@
using Flux.Optimise
using Flux.Optimise: runall
using Flux: Params, gradient
using Test
@testset "Optimise" begin
w = randn(10, 10)
@testset for opt in [ADAMW(), ADAGrad(0.1), AdaMax(), ADADelta(0.9), AMSGrad(),
NADAM(), RADAM(), Descent(0.1), ADAM(), Nesterov(), RMSProp(),
Momentum()]
w = randn(10, 10)
loss(x) = Flux.mse(w*x, w*x)
for t = 1: 10^5
θ = Params([w])
x = rand(10)
θ̄ = gradient(() -> loss(x), θ)
Optimise.update!(opt, θ, θ̄)
end
@test loss(rand(10, 10)) < 0.01
end
end
@testset "Optimiser" begin
w = randn(10, 10)
@testset for Opt in [InvDecay, WeightDecay, ExpDecay]
w = randn(10, 10)
loss(x) = Flux.mse(w*x, w*x)
opt = Optimiser(Opt(), ADAM(0.001))
for t = 1:10^5
θ = Params([w])
x = rand(10)
θ̄ = gradient(() -> loss(x), θ)
Optimise.update!(opt, θ, θ̄)
end
@test loss(rand(10, 10)) < 0.01
end
end
@testset "Training Loop" begin
i = 0
l = 1
Flux.train!(() -> (sleep(0.1); i += 1; l),
(),
Iterators.repeated((), 100),
Descent(),
cb = Flux.throttle(() -> (i > 3 && Flux.stop()), 1))
@test 3 < i < 50
# Test multiple callbacks
x = 0
fs = [() -> (), () -> x = 1]
cbs = runall(fs)
cbs()
@test x == 1
end
@testset "ExpDecay" begin
@testset "Sanity Check" begin
o = ExpDecay(0.2, 0.5, 1, 1e-3)
p = [0.0]
steps = 1:8
eta_expected = @. max(o.eta * 0.5 ^ steps, o.clip)
eta_actual = [Optimise.apply!(o, p, [1.0])[1] for _ in steps]
@test eta_actual == eta_expected
end
w = randn(10, 10)
o = ExpDecay(0.1, 0.1, 1000, 1e-4)
w1 = randn(10,10)
loss(x) = Flux.mse(w*x, w1*x)
flag = 1
decay_steps = []
for t = 1:10^5
prev_eta = o.eta
θ = Params([w1])
x = rand(10)
θ̄ = gradient(() -> loss(x), θ)
prev_grad = collect(θ̄[w1])
delta = Optimise.apply!(o, w1, θ̄[w1])
w1 .-= delta
new_eta = o.eta
if new_eta != prev_eta
push!(decay_steps, t)
end
array = fill(o.eta, size(prev_grad))
if array .* prev_grad != delta
flag = 0
end
end
@test flag == 1
# Test to check if decay happens at decay steps. Eta reaches clip value (1e-4) after 4000 steps (decay by 0.1 every 1000 steps starting at 0.1).
ground_truth = []
for i in 1:4
push!(ground_truth, 1000*i) # Expected decay steps for this example.
end
@test decay_steps == ground_truth
@test o.eta == o.clip
end
@testset "Clipping" begin
w = randn(10, 10)
loss(x) = sum(w * x)
θ = Params([w])
x = 1000 * randn(10)
w̄ = gradient(() -> loss(x), θ)[w]
w̄_value = Optimise.apply!(ClipValue(1.0), w, copy(w̄))
@test all(w̄_value .<= 1)
w̄_norm = Optimise.apply!(ClipNorm(1.0), w, copy(w̄))
@test norm(w̄_norm) <= 1
end

test/runtests.jl

@ -1,8 +1,46 @@
using Flux, Base.Test
using Flux
using Flux.Data
using Test
using Random, Statistics, LinearAlgebra
using IterTools: ncycle
@testset "Flux" begin
include("utils.jl")
include("tracker.jl")
Random.seed!(0)
@testset "Utils" begin
include("utils.jl")
end
@testset "Onehot" begin
include("onehot.jl")
end
@testset "Optimise" begin
include("optimise.jl")
end
@testset "Data" begin
include("data.jl")
end
@testset "Layers" begin
include("layers/basic.jl")
include("layers/normalisation.jl")
include("layers/stateless.jl")
include("layers/conv.jl")
end
@testset "CUDA" begin
if Flux.use_cuda[]
include("cuda/cuda.jl")
else
@warn "CUDA unavailable, not testing GPU support"
end
end
@static if VERSION >= v"1.4"
using Documenter
@testset "Docs" begin
DocMeta.setdocmeta!(Flux, :DocTestSetup, :(using Flux); recursive=true)
doctest(Flux)
end
end

test/tracker.jl

@ -1,30 +0,0 @@
using Flux.Tracker, Base.Test, NNlib
using Flux.Tracker: gradcheck
gradtest(f, xs::AbstractArray...) = gradcheck((xs...) -> sum(f(xs...)), xs...)
gradtest(f, dims...) = gradtest(f, rand.(dims)...)
@testset "Tracker" begin
@test gradtest((x, W, b) -> σ.(W*x .+ b), 5, (2,5), 2)
@test gradtest((x, W, b) -> σ.(W*x .+ b), (5,3), (2,5), 2)
@test gradtest(x -> sin.(sum(x, (2, 3))), (3,4,5))
@test gradtest(x -> NNlib.softmax(x).*(1:3), 3)
@test gradtest(x -> NNlib.softmax(x).*(1:3), (3,5))
@test gradtest(Flux.mse, rand(5,5), rand(5, 5))
@test gradtest(Flux.crossentropy, rand(5,5), rand(5, 5))
@test gradtest(x -> x', rand(5))
@test gradtest(vcat, rand(5), rand(3))
@test gradtest(vcat, rand(2,3), rand(3,3))
@test gradtest(rand(5)) do x
y = x.^2
2y + x
end
end

test/utils.jl

@ -1,9 +1,13 @@
using Flux: throttle
using Flux
using Flux: throttle, nfan, glorot_uniform, glorot_normal, stack, unstack
using StatsBase: var
using Random
using Test
@testset "Throttle" begin
@testset "default behaviour" begin
a = []
f = throttle(()->push!(a, now()), 1, leading=true, trailing=false)
f = throttle(()->push!(a, time()), 1, leading=true, trailing=false)
f()
f()
f()
@ -13,7 +17,7 @@ using Flux: throttle
@testset "leading behaviour" begin
a = []
f = throttle(()->push!(a, now()), 1, leading=true, trailing=false)
f = throttle(()->push!(a, time()), 1, leading=true, trailing=false)
f()
@test length(a) == 1
f()
@ -25,7 +29,7 @@ using Flux: throttle
@testset "trailing behaviour" begin
a = []
f = throttle(()->push!(a, now()), 1, leading=false, trailing=true)
f = throttle(()->push!(a, time()), 1, leading=false, trailing=true)
f()
@test length(a) == 0
f()
@ -47,3 +51,70 @@ using Flux: throttle
@test a == [1, 3]
end
end
@testset "Initialization" begin
# Set random seed so that these tests don't fail randomly
Random.seed!(0)
@testset "Fan in/out" begin
@test nfan() == (1, 1) #For a constant
@test nfan(100) == (1, 100) #For vector
@test nfan(100, 200) == (200, 100) #For Dense layer
@test nfan(2, 30, 40) == (2 * 30, 2 * 40) #For 1D Conv layer
@test nfan(2, 3, 40, 50) == (2 * 3 * 40, 2 * 3 * 50) #For 2D Conv layer
@test nfan(2, 3, 4, 50, 60) == (2 * 3 * 4 * 50, 2 * 3 * 4 * 60) #For 3D Conv layer
end
@testset "glorot" begin
# glorot_uniform and glorot_normal should both yield a kernel with
# variance ≈ 2/(fan_in + fan_out)
for dims ∈ [(1000,), (100, 100), (100, 400), (2, 3, 32, 64), (2, 3, 4, 32, 64)]
for init ∈ [glorot_uniform, glorot_normal]
v = init(dims...)
fan_in, fan_out = nfan(dims...)
σ2 = 2 / (fan_in + fan_out)
@test 0.9σ2 < var(v) < 1.1σ2
end
end
end
end
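As an aside, a back-of-the-envelope check of the variance claim in the glorot testset above (not part of the diff; a sketch assuming `nfan` and `glorot_uniform` are importable as shown):

```julia
using Statistics
using Flux: glorot_uniform, nfan

# A uniform distribution on ±sqrt(6 / (fan_in + fan_out)) has variance
# (2b)^2 / 12 = 2 / (fan_in + fan_out), which is what the testset verifies.
dims = (100, 400)                  # e.g. the weight matrix of Dense(400, 100)
fan_in, fan_out = nfan(dims...)
σ² = 2 / (fan_in + fan_out)

W = glorot_uniform(dims...)
@assert isapprox(var(W), σ²; rtol = 0.2)   # loose tolerance for sampling noise
```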
@testset "Params" begin
m = Dense(10, 5)
@test size.(params(m)) == [(5, 10), (5,)]
m = RNN(10, 5)
@test size.(params(m)) == [(5, 10), (5, 5), (5,), (5,)]
# Layer duplicated in same chain, params just once pls.
c = Chain(m, m)
@test size.(params(c)) == [(5, 10), (5, 5), (5,), (5,)]
# Self-referential array. Just want params, no stack overflow pls.
r = Any[nothing,m]
r[1] = r
@test size.(params(r)) == [(5, 10), (5, 5), (5,), (5,)]
end
@testset "Basic Stacking" begin
x = randn(3,3)
stacked = stack([x, x], 2)
@test size(stacked) == (3,2,3)
end
@testset "Precision" begin
m = Chain(Dense(10, 5, relu), Dense(5, 2))
x = rand(10)
@test eltype(m[1].W) == Float32
@test eltype(m(x)) == Float32
@test eltype(f64(m)(x)) == Float64
@test eltype(f64(m)[1].W) == Float64
@test eltype(f32(f64(m))[1].W) == Float32
end
@testset "Stacking" begin
stacked_array=[ 8 9 3 5; 9 6 6 9; 9 1 7 2; 7 4 10 6 ]
unstacked_array=[[8, 9, 9, 7], [9, 6, 1, 4], [3, 6, 7, 10], [5, 9, 2, 6]]
@test unstack(stacked_array, 2) == unstacked_array
@test stack(unstacked_array, 2) == stacked_array
@test stack(unstack(stacked_array, 1), 1) == stacked_array
end