Merge #1141

1141: Speedup matmul of CuMatrix and OneHotMatrix r=CarloLucibello a=AStupidBear

This solves #189.

```julia
julia> using Flux

julia> using Flux: CuArrays

julia> A = zeros(300, 10000) |> gpu;

julia> B = Flux.onehotbatch(rand(1:10000, 256), 1:10000) |> gpu;

julia> A * B; CuArrays.@time A * B;
┌ Warning: Performing scalar operations on GPU arrays: This is very slow, consider disallowing these operations with `allowscalar(false)`
└ @ GPUArrays ~/shared/.julia/packages/GPUArrays/OXvxB/src/host/indexing.jl:43
  0.002824 seconds (951 CPU allocations: 38.156 KiB) (2 GPU allocations: 301.000 KiB, 2.32% gc time of which 46.42% spent allocating)

julia> import Base: *

julia> A::AbstractMatrix * B::Flux.OneHotMatrix = @inbounds A[:, map(x->x.ix, B.data)]
* (generic function with 522 methods)

julia> A * B; CuArrays.@time A * B;
  0.000343 seconds (169 CPU allocations: 5.000 KiB) (2 GPU allocations: 301.000 KiB, 15.53% gc time of which 65.97% spent allocating)
```

Co-authored-by: Yao Lu <luyaocns@gmail.com>
2020-06-06 17:00:01 +00:00
commit 9ebbe8cb4c
parent 792a1c54f8 5a9eb7411a
1 changed file with 2 additions and 1 deletion
--- a/src/onehot.jl
+++ b/src/onehot.jl
@@ -27,7 +27,8 @@ Base.getindex(xs::OneHotMatrix, ::Colon, ::Colon) = OneHotMatrix(xs.height, copy

 Base.getindex(xs::OneHotMatrix, i::Integer, ::Colon) = map(x -> x[i], xs.data)

-A::AbstractMatrix * B::OneHotMatrix = A[:, map(x->x.ix, B.data)]
+# remove workaround when https://github.com/JuliaGPU/CuArrays.jl/issues/676 is fixed
+A::AbstractMatrix * B::OneHotMatrix = A[:, cpu(map(x->x.ix, B.data))]

 Base.hcat(x::OneHotVector, xs::OneHotVector...) = OneHotMatrix(length(x), [x, xs...])
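
The `*` method touched by this diff relies on the fact that multiplying by a one-hot matrix is the same as gathering columns of `A`. The sketch below checks that identity in plain Julia on the CPU, without Flux or a GPU. The names `ixs` and `dense_onehot` are hypothetical and exist only for illustration; they are not part of Flux, and the sketch does not reproduce the `cpu(...)` workaround itself.

```julia
# Illustrative check: A * (one-hot matrix) == A[:, hot_indices].
# `ixs` and `dense_onehot` are hypothetical names, not Flux API.
height = 5
ixs = [3, 1, 5, 3]                          # hot row index of each one-hot column
A = reshape(collect(1.0:15.0), 3, height)   # 3×5 matrix, filled column-major

# Build the dense equivalent of the one-hot matrix:
# column j has a single 1.0 in row ixs[j].
dense_onehot = zeros(height, length(ixs))
for (j, i) in enumerate(ixs)
    dense_onehot[i, j] = 1.0
end

via_matmul = A * dense_onehot   # full matrix product
via_gather = A[:, ixs]          # column gather, as in the method above

@assert via_matmul == via_gather
```

The gather form skips the multiplications by zero entirely, which is consistent with the speedup reported in the commit message.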