This commit is contained in:
Mike Innes 2016-04-01 22:11:42 +01:00 committed by Mike J Innes
parent 484d9f45ab
commit e5856d8b27
16 changed files with 512 additions and 3 deletions

README.md

@@ -1,3 +1,27 @@
# Flux
# Флукс
[![Build Status](https://travis-ci.org/one-more-minute/Flux.jl.svg?branch=master)](https://travis-ci.org/one-more-minute/Flux.jl)
## What?
Flux is an experimental machine perception / ANN library for Julia. It's designed to make experimenting with novel layer types and architectures really fast, without sacrificing runtime speed.
## Why?
Flux has a few key differences from other libraries:
* Flux's [graph-based DSL](https://github.com/MikeInnes/Flow.jl), which provides optimisations and automatic differentiation, is very tightly integrated with the language. This means nice syntax for your equations (`σ(W*x+b)` anyone?) and no unwieldy `compile` steps.
* The graph DSL is used directly to represent models (not just computations), so custom architectures, and in particular recurrent models, are easy to express.
* Those fancy features are completely optional. You can implement functionality in a Torch-like fashion if you wish, since layers are simply objects that satisfy a small interface.
* Flux is written in [Julia](http://julialang.org), which means there's no "dropping down" to C. It's Julia all the way down, and you can prototype both high-level architectures and high-performance GPU kernels from the same language. This also makes the library itself very easy to understand and extend.
Future work will also include:
* Integration with other backends, so that models can be described using Flux and run using (say) TensorFlow.
* Carrying out runtime optimisations of the graph, in particular to handle small matrices efficiently.
## How?
See [the design docs](design.md).
## Is it any good?
Yes.

REQUIRE

@@ -1 +1,2 @@
julia 0.4
Flow

design.md (new file)

@@ -0,0 +1,53 @@
# Flux
Flux tries to provide the best of both worlds: the do-it-yourself flexibility of frameworks like Torch/NN and the do-it-for-you convenience of frameworks like Keras. It has much in common with, and differs in important ways from, both.
At the core is the abstract type `Model`, which is analogous to Torch's `module`: essentially, it's a function which (a) has some internal state and (b) can be differentiated and can update its state accordingly.
```julia
model(x) -> y # Map input -> output (e.g. image -> classification)
back!(model, ∇) # Back-propagate and accumulate errors for the parameters
update!(model) # Update the model's parameters using the accumulated errors
```
That's it! The `Model` abstraction extends upwards in a nice way: you can stack a bunch of models together like pancakes and just get a more powerful `Model` back, which can then be reused in the same way.
(It extends downwards, too. Elementary functions like `exp` or `*` are really just `Model`s with zero parameters. Turtles all the way up, turtles all the way down.)
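To make the stacking concrete, here's what composition looks like with the layers bundled in this commit (`Input`, `Dense`, `Sigmoid` and `Sequence` from `src/layers/`); the sizes are just illustrative:
```julia
using Flux

# A stack of Models behaves like a single Model: it can be called,
# back-propagated through with back!, and updated with update!.
mlp = Sequence(
  Input(784),
  Dense(30), Sigmoid(),
  Dense(10), Sigmoid())

y = mlp(rand(Float32, 784))  # forward pass, same interface as any single layer
```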
So far this is all very Torch-esque. The downside of Torch's DIY philosophy is that you have to take care of managing memory, pre-allocating temporaries, differentiation and so on yourself. In Flux, however, we can define a type like this:
```julia
@flux type Perceptron <: Model
W; b
x -> σ( W * x + b )
end
Perceptron(in::Integer, out::Integer) =
Perceptron(randn(out, in), randn(out))
```
We've defined a simple Julia type with a couple of parameters, and added a convenient constructor in the usual way. We also defined what should happen when the model is called with an input vector `x`.
The difference is that the `back!` and `update!` functions are now defined for `Perceptron` objects. Flux differentiates the `σ( W * x + b )` expression automatically and figures out the handling of temporaries and so on. That's forty or so lines of code that you *could* have written yourself, but it's much nicer not to have to, and the benefits multiply with more complex layers.
Like symbolic frameworks, then, we aim for a very declarative way of defining new layers and architectures. The key difference is that we don't require *all* computation to happen in an opaque, custom runtime. In contrast, Flux simply writes a little code for you and then gets out of the way, making it very easy to understand and extend.
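To make "writes a little code for you" concrete, here is roughly what the generated methods amount to for `Perceptron`, written out by hand. This is a simplified sketch in the style of the hand-written `src/layers/dense.jl` (the macro's real output differs in detail, e.g. in how it reuses temporaries); `σ′` is the sigmoid derivative from `src/activation.jl`, and `ΔW`/`Δb` stand for the gradient accumulators the macro adds alongside `W` and `b`:
```julia
# Hand-written sketch of what @flux effectively provides for Perceptron.
(self::Perceptron)(x) = map(σ, self.W * x + self.b)  # σ applied elementwise

function back!(self::Perceptron, Δ, x)
  z  = self.W * x + self.b
  Δz = Δ .* map(σ′, z)   # chain rule through the elementwise σ
  self.ΔW += Δz * x'     # accumulate parameter gradients
  self.Δb += Δz
  return self.W' * Δz    # gradient with respect to the input x
end

function update!(self::Perceptron, η)
  self.W -= η * self.ΔW; fill!(self.ΔW, 0)
  self.b -= η * self.Δb; fill!(self.Δb, 0)
  return self
end
```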
## Recurrence
What really sets Flux apart is how easy it makes it to compose models into arbitrary graphs. For one thing, this makes it very easy to express network architecture: splits, merges, networks running in parallel and so on. But it's also great for recurrence:
```julia
@flux type Recurrent
Wxh; Whh; Bh
Wxy; Why; By
function (x)
hidden = σ( Wxh*x + Whh*hidden + Bh )
σ( Wxy*x + Why*hidden + By )
end
end
```
Above, `hidden` is a variable that depends on itself; it creates a cycle in the network. Flux can resolve this cycle by unrolling the network in time.
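Conceptually, the unrolled computation is just an explicit loop over time steps, carrying the hidden state along; something like the following sketch (illustrative only, not Flux's actual unrolling machinery):
```julia
# Unrolling the Recurrent cell above over a sequence of inputs xs,
# starting from an initial hidden state h. σ is applied elementwise.
function unrolled(r::Recurrent, xs, h)
  ys = []
  for x in xs
    h = map(σ, r.Wxh*x + r.Whh*h + r.Bh)         # the self-referential line, made explicit
    push!(ys, map(σ, r.Wxy*x + r.Why*h + r.By))  # output at this time step
  end
  return ys
end
```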
[recurrence is still very preliminary, I haven't worked out the details of the design yet.]

examples/MNIST.jl (new file)

@@ -0,0 +1,13 @@
using Flux, MNIST
const data = collect(zip([trainfeatures(i) for i = 1:60_000],
                         [onehot(trainlabel(i), 0:9) for i = 1:60_000]))
const train = data[1:50_000]
const test = data[50_001:60_000]
const m = Sequence(
Input(784),
Dense(30), Sigmoid(),
Dense(10), Sigmoid())
@time Flux.train!(m, train, test, epoch = 30)

examples/sketch.jl (new file)

@@ -0,0 +1,57 @@
# Simple Perceptron Layer
@flux type Simple
weight
bias
x -> σ( weight*x + bias )
end
Simple(nx::Integer, ny::Integer; init = randn) =
  Simple(init(ny, nx), init(ny))
# Time Delay Node
type Delay
n::Int
next
end
# feed(l::Delay, x) = ...
# back(l::Delay, y) = ...
# Simple Recurrent
@flux type RecurrentU
Wxh; Whh; Bh
Wxy; Why; By
function feed(x, hidden)
hidden = σ( Wxh*x + Whh*hidden + Bh )
y = σ( Wxy*x + Why*hidden + By )
y, hidden
end
end
RecurrentU(nx, ny, nh; init = randn) =
  RecurrentU(init(nh, nx), init(nh, nh), init(nh),
             init(ny, nx), init(ny, nh), init(ny))
@flux type Looped{T}
delay::Delay
layer::T
function (x)
y, hidden = layer(x, delay(hidden))
return y
end
end
type Recurrent
layer::Looped{RecurrentU}
end
Recurrent(nx, ny, nh; init = randn, delay = 10) =
  Recurrent(Looped(Delay(delay, init(nh)), RecurrentU(nx, ny, nh)))
@forward Recurrent.layer feed

src/Flux.jl

@@ -1,5 +1,28 @@
module Flux
# package code goes here
using MacroTools, Lazy, Flow
# Zero Flux Given
export Model, back!, update!
abstract Model
abstract Activation <: Model
back!(m::Model, ∇) = error("Backprop not implemented for $(typeof(m))")
update!(m::Model, η) = m
include("capacitor.jl")
include("compiler/diff.jl")
include("compiler/loop.jl")
include("compiler/code.jl")
include("cost.jl")
include("activation.jl")
include("layers/input.jl")
include("layers/dense.jl")
include("layers/sequence.jl")
include("utils.jl")
end # module

src/activation.jl (new file)

@@ -0,0 +1,28 @@
export Sigmoid
σ(x) = 1/(1+exp(-x))
σ′(x) = σ(x)*(1-σ(x))
∇₁(::typeof(σ)) = σ′
type Sigmoid <: Activation
in::Vector{Float32}
out::Vector{Float32}
∇in::Vector{Float32}
end
Sigmoid(size::Integer) = Sigmoid(zeros(size), zeros(size), zeros(size))
function (l::Sigmoid)(x)
l.in = x
map!(σ, l.out, x)
end
function back!(l::Sigmoid, ∇)
  map!(σ′, l.∇in, l.in)
  map!(*, l.∇in, l.∇in, ∇)
end
shape(l::Sigmoid) = length(l.in)
Sigmoid() = Init(in -> Sigmoid(in[1]))

src/capacitor.jl (new file)

@@ -0,0 +1,8 @@
type Capacitor{T}
Δs::Vector{T}
end
type Patch{T}
η::Float32
Δs::Capacitor{T}
end

src/compiler/code.jl (new file)

@@ -0,0 +1,84 @@
function process_func(ex, params)
@capture(shortdef(ex), (args__,) -> body_)
body = il(graphm(body))
body = map(x -> x in params ? :(self.$x) : x, body)
return args, body
end
function build_type(T, params)
quote
type $T
$(params...)
$([symbol("Δ", s) for s in params]...)
end
$T($(params...)) = $T($(params...),
$((:(zeros($p)) for p in params)...))
end
end
function build_forward(body, args)
body = cut_forward(body, args)
cse(body)
end
function build_backward(body, x, params)
Δs, Δloops = cut_backward(body, [x])
back = IVertex{Any}(Flow.Do())
for param in params
haskey(Δs, :(self.$param)) || continue
k = symbol("Δ", param)
ksym = Expr(:quote, k)
ex = Δs[:(self.$param)]
for Δloop in Δloops
ex = addΔ(ex, get(Δloop, :(self.$param), vertex(0)))
end
thread!(back, @v(setfield!(:self, ksym, :(self.$k) + ex)))
end
ex = Δs[x]
for Δloop in Δloops
ex = addΔ(ex, get(Δloop, x, vertex(0)))
end
thread!(back, @flow(tuple($ex)))
cse(back)
end
function build_update(T, params)
updates = []
for p in params
Δp = symbol("Δ", p)
push!(updates, :(self.$p += self.$Δp; fill!(self.$Δp, 0)))
end
:(update!(self::$T) = $(updates...))
end
function process_type(ex)
@capture(ex, type T_ fs__ end)
@destruct [params = true || [],
funcs = false || []] = groupby(x->isa(x, Symbol), fs)
@assert length(funcs) == 1
args, body = process_func(funcs[1], params)
@assert length(args) == 1
quote
$(build_type(T, params))
(self::$T)($(args...),) = $(syntax(build_forward(body, args)))
back!(self::$T, Δ, $(args...)) = $(syntax(build_backward(body, args[1], params)))
$(build_update(T, params))
end |> longdef
end
# process_type(:(type Sigmoid
# W
# b
# bp
# x -> σ(W*x+b)
# end)) |> prettify
process_type(:(type Recurrent
Wxh; Whh; Bh
Why; By
function (x)
hidden = σ( Wxh*x + Whh*Delay(hidden) + Bh )
y = σ( Why*hidden + By )
end
end)) |> prettify

src/compiler/diff.jl (new file)

@@ -0,0 +1,34 @@
import Flow: isconstant, il, dl, cse, prewalk, graphm, syntax, @v
vertex(a...) = IVertex{Any}(a...)
addΔ(a, b) = vertex(:+, a, b)
# Special case a couple of operators to clean up output code
const symbolic = Dict()
symbolic[:+] = (Δ, args...) -> map(_->Δ, args)
function ∇v(v::Vertex, Δ)
haskey(symbolic, value(v)) && return symbolic[value(v)](Δ, inputs(v)...)
Δ = vertex(:back!, vertex(value(v)), Δ, inputs(v)...)
map(i -> @flow(getindex($Δ, $i)), 1:Flow.nin(v))
end
function invert(v::IVertex, Δ = vertex(), out = d())
@assert !iscyclic(v)
if isconstant(v)
@assert !haskey(out, value(v))
out[value(v)] = il(Δ)
else
Δs = ∇v(v, Δ)
for (v, Δ′) in zip(inputs(v), Δs)
invert(v, Δ′, out)
end
end
return out
end
back!(::typeof(+), Δ, args...) = map(_ -> Δ, args)
back!(::typeof(*), Δ, a, b) = Δ*b', Δ*a'

src/compiler/loop.jl (new file)

@@ -0,0 +1,59 @@
function delays(v::IVertex)
ds = []
Flow.prefor(v) do w
value(w) == :Delay &&
push!(ds, w)
end
return ds
end
function cut(v::IVertex, f = _ -> il(@flow(last(self.delay))))
prewalk(v) do v
value(v) == :Delay ? f(v) : v
end
end
replaceall(d::Dict, args...) = Dict(k => replace(v, args...) for (k, v) in d)
# Create the forward function; a single delay node becomes an
# input and an output node.
function cut_forward(v::IVertex, params, ds = delays(v))
pushes = map(x->vertex(:push!, vertex(:(self.delay)), x[1], map(vertex, params)...), ds)
isempty(pushes) && return v
@assert length(pushes) == 1
v = vertex(Flow.Do(), pushes..., v)
cut(v)
end
# Given a delay node, give the parameter gradients with respect to
# the node and a function which will propagate gradients around
# the loop.
function invertloop(v::IVertex, params)
@gensym input
v = cut(v[1], v -> vertex(input))
Δs = invert(v, @flow(Δloop))
Δs = replaceall(Δs, vertex(input), il(@flow(last(self.delay))))
Δs, :((Δ, $input, $(params...)) -> $(syntax(cse(Δs[input]))))
end
# Returns:
# Parameter gradients with respect to the function
# Parameter gradients with respect to each delay node
function cut_backward(v::IVertex, params, ds = delays(v))
isempty(ds) && return invert(v), []
@assert length(ds) == 1
@gensym input
Δs = invert(cut(v, _ -> vertex(input)))
Δs = replaceall(Δs, vertex(input), il(@flow(last(self.delay))))
Δloop, ∇loop = invertloop(ds[1], params)
Δh = vertex(:back!, vertex(:(self.delay)), Δs[input], vertex(∇loop))
Δloop = replaceall(Δloop, vertex(:Δloop), Δh)
Δs, [Δloop]
end
# g = il(@flow begin
# hidden = σ( Wxh*x + Whh*Delay(hidden) + bh )
# y = σ( Why*hidden + by )
# end)
# cut_backward(g, [:x])[1]

src/cost.jl (new file)

@@ -0,0 +1,8 @@
export mse, mse!
function mse!(∇, pred, target)
  map!(-, ∇, pred, target)
  sumabs2(∇)/2
end
mse(pred, target) = mse!(similar(pred), pred, target)

src/layers/dense.jl (new file)

@@ -0,0 +1,41 @@
export Dense
type Dense <: Model
W::Matrix{Float32}
b::Vector{Float32}
∇W::Matrix{Float32}
∇b::Vector{Float32}
in::Vector{Float32}
out::Vector{Float32}
∇in::Vector{Float32}
end
Dense(in::Integer, out::Integer) =
Dense(randn(out, in), randn(out),
zeros(out, in), zeros(out),
zeros(in), zeros(out), zeros(in))
Dense(out::Integer) = Init(in -> Dense(in[1], out))
function (l::Dense)(x)
l.in = x
A_mul_B!(l.out, l.W, x)
map!(+, l.out, l.out, l.b)
end
function back!(l::Dense, ∇)
  map!(+, l.∇b, l.∇b, ∇)
  # l.∇W += ∇ * l.in'
  BLAS.gemm!('N', 'T', eltype(∇)(1), ∇, l.in, eltype(∇)(1), l.∇W)
  At_mul_B!(l.∇in, l.W, ∇)
end
function update!(l::Dense, η)
map!((x, ∇x) -> x - η*∇x, l.W, l.W, l.∇W)
map!((x, ∇x) -> x - η*∇x, l.b, l.b, l.∇b)
fill!(l.∇W, 0)
fill!(l.∇b, 0)
end
shape(d::Dense) = size(d.b)

src/layers/input.jl (new file)

@@ -0,0 +1,26 @@
export Input
typealias Dims{N} NTuple{N,Int}
dims(d::Dims) = d
dims(i...) = (i...,)
type Input{N} <: Model
dims::Dims{N}
end
Input(i) = Input(dims(i))
(::Input)(x) = x
back!(::Input, ∇) = ∇
shape(i::Input) = i.dims
# Initialise placeholder
type Init{F}
f::F
end
(f::Init)(args...) = f.f(args...)

src/layers/sequence.jl (new file)

@@ -0,0 +1,23 @@
export Sequence
type Sequence
layers::Vector{Model}
end
Sequence() = Sequence([])
@forward Sequence.layers Base.getindex, Base.first, Base.last
Base.push!(s::Sequence, m::Model) = push!(s.layers, m)
Base.push!(s::Sequence, f::Init) = push!(s, f(shape(last(s))))
function Sequence(ms...)
s = Sequence()
foreach(m -> push!(s, m), ms)
return s
end
(s::Sequence)(x) = foldl((x, m) -> m(x), x, s.layers)
back!(s::Sequence, ∇) = foldr((m, ∇) -> back!(m, ∇), ∇, s.layers)
update!(s::Sequence, η) = foreach(l -> update!(l, η), s.layers)

src/utils.jl (new file)

@@ -0,0 +1,27 @@
export onehot, onecold
onehot(label, labels) = [i == label for i in labels]
onecold(pred, labels = 1:length(pred)) = labels[findfirst(pred, maximum(pred))]
function train!(m::Model, train, test = []; epoch = 1, batch = 10, η = 0.1)
i = 0
  ∇ = zeros(length(train[1][2]))
for _ in 1:epoch
for (x, y) in shuffle!(train)
i += 1
      err = mse!(∇, m(x), y)
      back!(m, ∇)
i % batch == 0 && update!(m, η/batch)
end
@show accuracy(m, test)
end
return m
end
function accuracy(m::Model, data)
correct = 0
for (x, y) in data
onecold(m(x)) == onecold(y) && (correct += 1)
end
return correct/length(data)
end