936: Avoid unnecessary conversion r=MikeInnes a=baggepinnen
This initialization works on both the CPU and the GPU.
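Roughly, the idea (a minimal sketch, not the actual diff from this PR) is to derive new buffers from an existing array with `similar`, so the result automatically lives on the same device and has the same element type as its input, instead of being allocated on the CPU and converted afterwards:
```
# Hypothetical helper (illustrative only): build a zero buffer with the same
# element type and storage (Array or CuArray) as `x`, avoiding a separate CPU
# allocation followed by a conversion to the GPU.
init_state(x::AbstractVector, n::Integer) = fill!(similar(x, n), 0)

x = rand(Float32, 4)
h = init_state(x, 8)   # Vector{Float32}; a CuArray `x` would yield a CuArray
```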
Co-authored-by: Fredrik Bagge Carlson <baggepinnen@gmail.com>
932: Travis: test on 1.0 r=MikeInnes a=MikeInnes
Co-authored-by: Mike J Innes <mike.j.innes@gmail.com>
Co-authored-by: Mike Innes <mike.j.innes@gmail.com>
907: Change `gate` function to `view` instead of copy r=MikeInnes a=janEbert
This speeds up code with large inputs considerably. I only added it to the method accepting an `AbstractVector` as input, since copying matrices may be faster than viewing them due to caching (they are sliced per row, so the data will not necessarily have a low stride).
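Roughly, the change amounts to the following (a sketch modelled on Flux's recurrent-cell `gate` helpers; the merged code may differ in details):
```
# gate(h, n): index range of the n-th gate when gates of length h are stacked
gate(h, n) = (1:h) .+ h*(n-1)

# Before: indexing copies the slice.
# gate(x::AbstractVector, h, n) = x[gate(h, n)]

# After: return a view, so extracting a gate from a large hidden vector
# allocates nothing.
gate(x::AbstractVector, h, n) = @view x[gate(h, n)]

# The matrix method keeps the copy: rows are strided in memory, so a copy can
# be faster than a view here.
gate(x::AbstractMatrix, h, n) = x[gate(h, n), :]
```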
Co-authored-by: janEbert <janpublicebert@posteo.net>
898: Fix problem in crossentropy breaking GPU compilation r=MikeInnes a=kshyatt
Trying to run this simple example:
```
using Flux, CuArrays
using Flux: crossentropy
model = Chain(
  Dense(728, 128, σ),
  LSTM(128, 256),
  LSTM(256, 128),
  Dense(128, 10),
  softmax) |> gpu
data = [rand(728) for i in 1:100];
out = [rand(10) for i in 1:100];
loss(x, y) = crossentropy(model(x), y);
Flux.train!(loss, params(model), zip(gpu.(data), gpu.(out)), ADAM())
```
Old version of `crossentropy`:
```
ERROR: GPU compilation of #23(CuArrays.CuKernelState, CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}, Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(*),Tuple{Base.Broadcast.Extruded{Array{Float32,1},Tuple{Bool},Tuple{Int64}},Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArray},Nothing,typeof(conj),Tuple{Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},Tuple{Bool},Tuple{Int64}}}}}}) failed
KernelError: passing and using non-bitstype argument
Argument 4 to your kernel function is of type Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(*),Tuple{Base.Broadcast.Extruded{Array{Float32,1},Tuple{Bool},Tuple{Int64}},Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArray},Nothing,typeof(conj),Tuple{Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},Tuple{Bool},Tuple{Int64}}}}}}.
That type is not isbits, and such arguments are only allowed when they are unused by the kernel. .args is of type Tuple{Base.Broadcast.Extruded{Array{Float32,1},Tuple{Bool},Tuple{Int64}},Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArray},Nothing,typeof(conj),Tuple{Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},Tuple{Bool},Tuple{Int64}}}}} which is not isbits.
.1 is of type Base.Broadcast.Extruded{Array{Float32,1},Tuple{Bool},Tuple{Int64}} which is not isbits.
.x is of type Array{Float32,1} which is not isbits.
Stacktrace:
[1] check_invocation(::CUDAnative.CompilerJob, ::LLVM.Function) at /mnt/home/khyatt/.julia/dev/CUDAnative/src/compiler/validation.jl:70
[2] macro expansion at /mnt/home/khyatt/.julia/dev/CUDAnative/src/compiler/driver.jl:187 [inlined]
[3] macro expansion at /mnt/home/khyatt/.julia/packages/TimerOutputs/7zSea/src/TimerOutput.jl:216 [inlined]
[4] #codegen#136(::Bool, ::Bool, ::Bool, ::Bool, ::Bool, ::typeof(CUDAnative.codegen), ::Symbol, ::CUDAnative.CompilerJob) at /mnt/home/khyatt/.julia/dev/CUDAnative/src/compiler/driver.jl:186
[5] #codegen at ./none:0 [inlined]
[6] #compile#135(::Bool, ::Bool, ::Bool, ::Bool, ::Bool, ::typeof(CUDAnative.compile), ::Symbol, ::CUDAnative.CompilerJob) at /mnt/home/khyatt/.julia/dev/CUDAnative/src/compiler/driver.jl:47
[7] #compile#134 at ./none:0 [inlined]
[8] #compile at ./none:0 [inlined] (repeats 2 times)
[9] macro expansion at /mnt/home/khyatt/.julia/dev/CUDAnative/src/execution.jl:389 [inlined]
[10] #cufunction#176(::Nothing, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(CUDAnative.cufunction), ::GPUArrays.var"#23#24", ::Type{Tuple{CuArrays.CuKernelState,CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(*),Tuple{Base.Broadcast.Extruded{Array{Float32,1},Tuple{Bool},Tuple{Int64}},Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArray},Nothing,typeof(conj),Tuple{Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},Tuple{Bool},Tuple{Int64}}}}}}}}) at /mnt/home/khyatt/.julia/dev/CUDAnative/src/execution.jl:357
[11] cufunction(::Function, ::Type) at /mnt/home/khyatt/.julia/dev/CUDAnative/src/execution.jl:357
[12] macro expansion at /mnt/home/khyatt/.julia/dev/CUDAnative/src/execution.jl:174 [inlined]
[13] macro expansion at ./gcutils.jl:91 [inlined]
[14] macro expansion at /mnt/home/khyatt/.julia/dev/CUDAnative/src/execution.jl:171 [inlined]
[15] _gpu_call(::CuArrays.CuArrayBackend, ::Function, ::CuArray{Float32,1}, ::Tuple{CuArray{Float32,1},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(*),Tuple{Base.Broadcast.Extruded{Array{Float32,1},Tuple{Bool},Tuple{Int64}},Base.Broadcast.Broadcasted{Base.Broadcast.ArrayStyle{CuArray},Nothing,typeof(conj),Tuple{Base.Broadcast.Extruded{CuArray{Float32,1},Tuple{Bool},Tuple{Int64}}}}}}}, ::Tuple{Tuple{Int64},Tuple{Int64}}) at /mnt/home/khyatt/.julia/dev/CuArrays/src/gpuarray_interface.jl:60
[16] gpu_call at /mnt/home/khyatt/.julia/dev/GPUArrays/src/abstract_gpu_interface.jl:151 [inlined]
[17] gpu_call at /mnt/home/khyatt/.julia/dev/GPUArrays/src/abstract_gpu_interface.jl:128 [inlined]
[18] copyto! at /mnt/home/khyatt/.julia/dev/GPUArrays/src/broadcast.jl:48 [inlined]
[19] copyto! at ./broadcast.jl:863 [inlined]
[20] copy at ./broadcast.jl:839 [inlined]
[21] materialize at ./broadcast.jl:819 [inlined]
[22] (::Zygote.var"#1310#1311"{CuArray{Float32,1},CuArray{Float32,1}})(::Array{Float32,1}) at /mnt/home/khyatt/.julia/dev/Zygote/src/lib/broadcast.jl:68
```
New version:
```
julia> Flux.train!(loss, params(model), zip(gpu.(data), gpu.(out)), ADAM())
julia> # everyone finished happily and went on with their lives
```
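For context, a hedged sketch of one way to keep a weighted `crossentropy` GPU-friendly: split the weighted and unweighted cases by dispatch, so the common unweighted path stays a single fused broadcast over device arrays. The `_crossentropy` helper, its signatures, and the `weight = nothing` default below are illustrative, not necessarily the merged change:
```
# Illustrative sketch only; see Flux's stateless loss definitions for the
# real implementation.
_crossentropy(ŷ, y, weight::Nothing)        = -sum(y .* log.(ŷ)) / size(y, 2)
_crossentropy(ŷ, y, weight::Number)         = -sum(y .* log.(ŷ)) * weight / size(y, 2)
_crossentropy(ŷ, y, weight::AbstractVector) = -sum(y .* log.(ŷ) .* weight) / size(y, 2)

crossentropy(ŷ, y; weight = nothing) = _crossentropy(ŷ, y, weight)
```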
Co-authored-by: Katharine Hyatt <khyatt@flatironinstitute.org>