1141: Speedup matmul of CuMatrix and OneHotMatrix r=CarloLucibello a=AStupidBear

This solves #189.

```julia
julia> using Flux

julia> using Flux: CuArrays

julia> A = zeros(300, 10000) |> gpu;

julia> B = Flux.onehotbatch(rand(1:10000, 256), 1:10000) |> gpu;

julia> A * B; CuArrays.@time A * B;
┌ Warning: Performing scalar operations on GPU arrays: This is very slow, consider disallowing these operations with `allowscalar(false)`
└ @ GPUArrays ~/shared/.julia/packages/GPUArrays/OXvxB/src/host/indexing.jl:43
  0.002824 seconds (951 CPU allocations: 38.156 KiB) (2 GPU allocations: 301.000 KiB, 2.32% gc time of which 46.42% spent allocating)

julia> import Base: *

julia> A::AbstractMatrix * B::Flux.OneHotMatrix = @inbounds A[:, map(x->x.ix, B.data)]
* (generic function with 522 methods)

julia> A * B; CuArrays.@time A * B;
  0.000343 seconds (169 CPU allocations: 5.000 KiB) (2 GPU allocations: 301.000 KiB, 15.53% gc time of which 65.97% spent allocating)
```

Co-authored-by: Yao Lu <luyaocns@gmail.com>

| | | |
---|---|---|
.github | ||
docs | ||
paper | ||
src | ||
test | ||
.gitattributes | ||
.gitignore | ||
.gitlab-ci.yml | ||
.travis.yml | ||
CITATION.bib | ||
LICENSE.md | ||
Manifest.toml | ||
NEWS.md | ||
Project.toml | ||
README.md | ||
bors.toml |
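
The speedup reported in the commit message above relies on a simple identity: multiplying a matrix by a one-hot matrix selects columns, so the product can be replaced by an indexing operation. The following is a minimal plain-Julia sketch of that identity, assuming nothing beyond Base. It runs on the CPU and uses neither Flux nor CuArrays; `onehot_dense` is a hypothetical helper written for this illustration, not a Flux function.

```julia
# Hypothetical helper (not part of Flux): build a dense one-hot matrix
# with one column per sample and a single 1.0 at the sample's class index.
function onehot_dense(indices::AbstractVector{<:Integer}, nclasses::Integer)
    B = zeros(Float64, nclasses, length(indices))
    for (col, ix) in enumerate(indices)
        B[ix, col] = 1.0
    end
    return B
end

A = reshape(collect(1.0:12.0), 3, 4)   # 3×4 matrix with distinct entries
indices = [2, 4, 4, 1]                 # class index for each of 4 samples
B = onehot_dense(indices, 4)           # 4×4 dense one-hot matrix

dense_product = A * B                  # full matrix multiplication
gathered = A[:, indices]               # column gather, the same idea as in the commit

# Both routes give the same 3×4 result.
@assert dense_product == gathered
```

The gather touches only the selected columns of `A`, while the dense product multiplies through every row of the one-hot matrix, which is mostly zeros.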
README.md
Flux is an elegant approach to machine learning. It's a 100% pure-Julia stack, and provides lightweight abstractions on top of Julia's native GPU and AD support. Flux makes the easy things easy while remaining fully hackable.
To install, run `] add Flux` at the Julia REPL.
See the documentation or the model zoo for examples.
If you use Flux in your research, please cite our work.