init
parent 484d9f45ab
commit e5856d8b27

README.md | 28
@@ -1,3 +1,27 @@
# Flux
# Флукс

[](https://travis-ci.org/one-more-minute/Flux.jl)

## What?

Flux is an experimental machine perception / ANN library for Julia. It's designed to make experimenting with novel layer types and architectures really fast, without sacrificing runtime speed.

## Why?

Flux has a few key differences from other libraries:

* Flux's [graph-based DSL](https://github.com/MikeInnes/Flow.jl), which provides optimisations and automatic differentiation, is very tightly integrated with the language. This means nice syntax for your equations (`σ(W*x+b)`, anyone?) and no unwieldy `compile` steps (see the short sketch after this list).
* The graph DSL is used directly to represent models (not just computations), so custom architectures – and in particular, recurrent models – are easy to express.
* Those fancy features are completely optional. You can implement functionality in a Torch-like fashion if you wish, since layers are simply objects that satisfy a small interface.
* Flux is written in [Julia](http://julialang.org), which means there's no "dropping down" to C. It's Julia all the way down, and you can prototype both high-level architectures and high-performance GPU kernels from the same language. This also makes the library itself very easy to understand and extend.
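
As a quick illustration of the first point, here is a hedged sketch in the style of [the design docs](design.md); `Affine` is just an illustrative name, not part of the library:

```julia
# Define a layer with its equation written directly in Julia syntax –
# no separate graph-building or `compile` step.
@flux type Affine <: Model
  W; b
  x -> σ( W * x + b )
end

layer = Affine(randn(5, 10), randn(5))  # 10 inputs, 5 outputs
y = layer(randn(10))                    # call it like any Julia function
```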

Future work will also include:

* Integration with other backends, so that models can be described using Flux and run using (say) TensorFlow.
* Carrying out runtime optimisations of the graph, in particular to handle small matrices efficiently.

## How?

See [the design docs](design.md).

## Is it any good?

Yes.

@@ -0,0 +1,53 @@
# Flux

Flux tries to provide the best of both worlds: the do-it-yourself flexibility of frameworks like Torch/NN and the do-it-for-you convenience of frameworks like Keras. It has a lot in common with both, and plenty that differs from both.

At the core is the abstract type `Model`, which is analogous to Torch's `module` – essentially, it's a function which (a) has some internal state and (b) can be differentiated and update its state accordingly.

```julia
model(x) -> y    # Map input -> output (e.g. image -> classification)
back!(model, ∇)  # Back-propagate and accumulate errors for the parameters
update!(model)   # Update the model's parameters using the accumulated errors
```
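
To make the contract concrete, here is a minimal sketch of a gradient-descent step written purely against this interface; `loss_grad` is a hypothetical helper standing in for whatever cost function you use, and the assumption that `back!` is all a model needs to accumulate its gradients follows from the description above:

```julia
# A single training step, using only the three-function Model interface.
# `loss_grad` is a hypothetical ∂loss/∂output function, not part of Flux.
function sgd_step!(model, x, y, loss_grad)
  ŷ = model(x)          # forward pass: input -> output
  ∇ = loss_grad(ŷ, y)   # gradient of the loss w.r.t. the output
  back!(model, ∇)       # accumulate parameter gradients
  update!(model)        # apply (and reset) the accumulated errors
  return ŷ
end
```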

That's it! The `Model` abstraction extends upwards in a nice way – that is, you can stack a bunch of models together like pancakes and you just get a more powerful `Model` back, which can then be reused in the same way.

(It extends downwards, too. Elementary functions like `exp` or `*` are really just `Model`s with zero parameters. Turtles all the way up, turtles all the way down.)
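
As an illustration of that stacking, two models satisfying the interface can be wrapped into a third one. This is only a sketch – `Chain2` is a hypothetical wrapper, not part of Flux – and it assumes `back!` returns the gradient with respect to a model's input, as the layer code elsewhere in this commit does:

```julia
# Hypothetical composite model: feeds one model's output into another.
type Chain2 <: Model
  first
  second
end

(c::Chain2)(x) = c.second(c.first(x))

function back!(c::Chain2, ∇)
  ∇ = back!(c.second, ∇)  # gradient w.r.t. the intermediate value
  back!(c.first, ∇)       # propagate on to the first model's input
end

update!(c::Chain2) = (update!(c.first); update!(c.second); c)
```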

So far this is all very Torch-esque. The downside of Torch's DIY philosophy is that you have to take care of managing memory, pre-allocating temporaries, differentiation and so on yourself. In Flux, however, we can define a type like this:

```julia
@flux type Perceptron <: Model
  W; b
  x -> σ( W * x + b )
end

Perceptron(in::Integer, out::Integer) =
  Perceptron(randn(out, in), randn(out))
```

We've defined a simple Julia type with a couple of parameters, and added a convenient constructor in the usual way. We also defined what should happen when the model is called with an input vector `x`.

The difference is that the `back!` and `update!` functions are now defined for `Perceptron` objects. Flux differentiated the `σ( W * x + b )` expression automatically and figured out the handling of temporaries and so on. That's forty or so lines of code that you *could* have written yourself, but it's much nicer not to have to – and the benefits multiply with more complex layers.
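
To make that concrete, the generated code is roughly equivalent to something like the following hand-written version. This is a sketch of the idea, not the actual output of `@flux`; the `ΔW`/`Δb` accumulator fields are assumed to be added by the macro.

```julia
# Roughly what back! must do for σ(W*x + b), written by hand.
# σ′ is the sigmoid derivative: σ′(z) = σ(z)*(1 - σ(z)).
function back!(m::Perceptron, ∇, x)
  z = m.W*x + m.b
  Δ = ∇ .* map(σ′, z)  # chain rule through the sigmoid
  m.ΔW += Δ*x'         # accumulate the weight gradient
  m.Δb += Δ            # accumulate the bias gradient
  return m.W'*Δ        # gradient with respect to the input x
end
```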

Like symbolic frameworks, then, we aim for a very declarative way of defining new layers and architectures. The key difference is that we don't require *all* computation to happen in an opaque, custom runtime. Instead, Flux simply writes a little code for you and then gets out of the way, which makes it very easy to understand and extend.

## Recurrence

What really sets Flux apart is how easy it makes it to compose models together in an arbitrary graph. For one thing, this makes it very easy to express network architecture: splits, merges, networks running in parallel and so on. But it's also great for recurrence:

```julia
@flux type Recurrent
  Wxh; Whh; Bh
  Wxy; Why; By

  function (x)
    hidden = σ( Wxh*x + Whh*hidden + Bh )
    σ( Wxy*x + Why*hidden + By )
  end
end
```

Above, `hidden` is a variable that depends on itself; it creates a cycle in the network. Flux can resolve this cycle by unrolling the network in time.
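
For intuition, "unrolling in time" just means applying the recurrent step once per timestep while threading the hidden state along. The following is only a hand-written sketch, assuming the weight fields above and an explicit initial `hidden`:

```julia
# Manual unrolling of the recurrent cell over a sequence of inputs.
function unrolled(m, xs, hidden)
  ys = []
  for x in xs
    hidden = σ( m.Wxh*x + m.Whh*hidden + m.Bh )
    push!(ys, σ( m.Wxy*x + m.Why*hidden + m.By ))
  end
  return ys, hidden
end
```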

[recurrence is still very preliminary, I haven't worked out the details of the design yet.]

@@ -0,0 +1,13 @@
using Flux, MNIST

# Load the 60,000 MNIST training images and one-hot encode their labels.
const data = collect(zip([trainfeatures(i) for i = 1:60_000],
                         [onehot(trainlabel(i), 1:10) for i = 1:60_000]))
const train = data[1:50_000]
const test = data[50_001:60_000]

# A small multi-layer perceptron: 784 inputs → 30 hidden units → 10 outputs.
const m = Sequence(
  Input(784),
  Dense(30), Sigmoid(),
  Dense(10), Sigmoid())

@time Flux.train!(m, train, test, epoch = 30)

@@ -0,0 +1,57 @@
# Simple Perceptron Layer

@flux type Simple
  weight
  bias
  x -> σ( weight*x + bias )
end

Simple(nx::Integer, ny::Integer; init = randn) =
  Simple(init(ny, nx), init(ny))

# Time Delay Node

type Delay
  n::Int
  next
end

# feed(l::Delay, x) = ...

# back(l::Delay, y) = ...

# Simple Recurrent

@flux type RecurrentU
  Wxh; Whh; Bh
  Wxy; Why; By

  function feed(x, hidden)
    hidden′ = σ( Wxh*x + Whh*hidden + Bh )
    y = σ( Wxy*x + Why*hidden′ + By )
    y, hidden′
  end
end

RecurrentU(nx, ny, nh; init = randn) =
  RecurrentU(init(nh, nx), init(nh, nh), init(nh),
             init(ny, nx), init(ny, nh), init(ny))

@flux type Looped{T}
  delay::Delay
  layer::T

  function (x)
    y, hidden = layer(x, delay(hidden))
    return y
  end
end

type Recurrent
  layer::Looped{RecurrentU}
end

Recurrent(nx, ny, nh; init = randn, delay = 10) =
  Recurrent(Looped(Delay(delay, init(nh)), RecurrentU(nx, ny, nh)))

@forward Recurrent.layer feed

src/Flux.jl | 25
@@ -1,5 +1,28 @@
module Flux

# package code goes here
using MacroTools, Lazy, Flow

# Zero Flux Given

export Model, back!, update!

abstract Model
abstract Activation <: Model

back!(m::Model, ∇) = error("Backprop not implemented for $(typeof(m))")
update!(m::Model, η) = m

include("capacitor.jl")

include("compiler/diff.jl")
include("compiler/loop.jl")
include("compiler/code.jl")

include("cost.jl")
include("activation.jl")
include("layers/input.jl")
include("layers/dense.jl")
include("layers/sequence.jl")
include("utils.jl")

end # module

@@ -0,0 +1,28 @@
export Sigmoid

σ(x) = 1/(1+exp(-x))
σ′(x) = σ(x)*(1-σ(x))

∇₁(::typeof(σ)) = σ′

type Sigmoid <: Activation
  in::Vector{Float32}
  out::Vector{Float32}
  ∇in::Vector{Float32}
end

Sigmoid(size::Integer) = Sigmoid(zeros(size), zeros(size), zeros(size))

function (l::Sigmoid)(x)
  l.in = x
  map!(σ, l.out, x)
end

function back!(l::Sigmoid, ∇)
  map!(σ′, l.∇in, l.in)
  map!(*, l.∇in, l.∇in, ∇)
end

shape(l::Sigmoid) = length(l.in)

Sigmoid() = Init(in -> Sigmoid(in[1]))

@@ -0,0 +1,8 @@
type Capacitor{T}
  Δs::Vector{T}
end

type Patch{T}
  η::Float32
  Δs::Capacitor{T}
end

@@ -0,0 +1,84 @@
# Parse a layer's forward function into a data-flow graph,
# rewriting references to the layer's parameters as `self.param`.
function process_func(ex, params)
  @capture(shortdef(ex), (args__,) -> body_)
  body = il(graphm(body))
  body = map(x -> x in params ? :(self.$x) : x, body)
  return args, body
end

# Generate the layer type itself, with a Δ-prefixed gradient
# accumulator for each parameter, plus a constructor that zeroes them.
function build_type(T, params)
  quote
    type $T
      $(params...)
      $([symbol("Δ", s) for s in params]...)
    end
    $T($(params...)) = $T($(params...),
                          $((:(zeros($p)) for p in params)...))
  end
end

# Build the forward-pass graph (handling any Delay nodes) and apply CSE.
function build_forward(body, args)
  body = cut_forward(body, args)
  cse(body)
end

# Build the backward pass: accumulate each parameter's gradient into its
# Δ field and return the gradient with respect to the input.
function build_backward(body, x, params)
  Δs, Δloops = cut_backward(body, [x])
  back = IVertex{Any}(Flow.Do())
  for param in params
    haskey(Δs, :(self.$param)) || continue
    k = symbol("Δ", param)
    ksym = Expr(:quote, k)
    ex = Δs[:(self.$param)]
    for Δloop in Δloops
      ex = addΔ(ex, get(Δloop, :(self.$param), vertex(0)))
    end
    thread!(back, @v(setfield!(:self, ksym, :(self.$k) + ex)))
  end
  ex = Δs[x]
  for Δloop in Δloops
    ex = addΔ(ex, get(Δloop, x, vertex(0)))
  end
  thread!(back, @flow(tuple($ex)))
  cse(back)
end

# update! adds each accumulated gradient into its parameter and resets it.
function build_update(T, params)
  updates = []
  for p in params
    Δp = symbol("Δ", p)
    push!(updates, :(self.$p += self.$Δp; fill!(self.$Δp, 0)))
  end
  :(update!(self::$T) = $(updates...))
end

# Expand a full layer definition into the type itself, a call overload
# for the forward pass, and the back!/update! methods.
function process_type(ex)
  @capture(ex, type T_ fs__ end)
  @destruct [params = true || [],
             funcs = false || []] = groupby(x->isa(x, Symbol), fs)
  @assert length(funcs) == 1
  args, body = process_func(funcs[1], params)
  @assert length(args) == 1
  quote
    $(build_type(T, params))
    (self::$T)($(args...),) = $(syntax(build_forward(body, args)))
    back!(self::$T, Δ, $(args...)) = $(syntax(build_backward(body, args[1], params)))
    $(build_update(T, params))
  end |> longdef
end

# process_type(:(type Sigmoid
#   W
#   b
#   bp
#   x -> σ(W*x+b)
# end)) |> prettify

process_type(:(type Recurrent
  Wxh; Whh; Bh
  Why; By

  function (x)
    hidden = σ( Wxh*x + Whh*Delay(hidden) + Bh )
    y = σ( Why*hidden + By )
  end
end)) |> prettify

@@ -0,0 +1,34 @@
import Flow: isconstant, il, dl, cse, prewalk, graphm, syntax, @v

vertex(a...) = IVertex{Any}(a...)

addΔ(a, b) = vertex(:+, a, b)

# Special case a couple of operators to clean up output code
const symbolic = Dict()

symbolic[:+] = (Δ, args...) -> map(_->Δ, args)

function ∇v(v::Vertex, Δ)
  haskey(symbolic, value(v)) && return symbolic[value(v)](Δ, inputs(v)...)
  Δ = vertex(:back!, vertex(value(v)), Δ, inputs(v)...)
  map(i -> @flow(getindex($Δ, $i)), 1:Flow.nin(v))
end

function invert(v::IVertex, Δ = vertex(:Δ), out = d())
  @assert !iscyclic(v)
  if isconstant(v)
    @assert !haskey(out, value(v))
    out[value(v)] = il(Δ)
  else
    Δ′s = ∇v(v, Δ)
    for (v′, Δ′) in zip(inputs(v), Δ′s)
      invert(v′, Δ′, out)
    end
  end
  return out
end

back!(::typeof(+), Δ, args...) = map(_ -> Δ, args)

back!(::typeof(*), Δ, a, b) = Δ*b', a'*Δ

@@ -0,0 +1,59 @@
function delays(v::IVertex)
  ds = []
  Flow.prefor(v) do w
    value(w) == :Delay &&
      push!(ds, w)
  end
  return ds
end

function cut(v::IVertex, f = _ -> il(@flow(last(self.delay))))
  prewalk(v) do v
    value(v) == :Delay ? f(v) : v
  end
end

replaceall(d::Dict, args...) = Dict(k => replace(v, args...) for (k, v) in d)

# Create the forward function; a single delay node becomes an
# input and an output node.
function cut_forward(v::IVertex, params, ds = delays(v))
  pushes = map(x->vertex(:push!, vertex(:(self.delay)), x[1], map(vertex, params)...), ds)
  isempty(pushes) && return v
  @assert length(pushes) == 1
  v = vertex(Flow.Do(), pushes..., v)
  cut(v)
end

# Given a delay node, give the parameter gradients with respect to
# the node and a function which will propagate gradients around
# the loop.
function invertloop(v::IVertex, params)
  @gensym input
  v = cut(v[1], v -> vertex(input))
  Δs = invert(v, @flow(Δloop))
  Δs = replaceall(Δs, vertex(input), il(@flow(last(self.delay))))
  Δs, :((Δ, $input, $(params...)) -> $(syntax(cse(Δs[input]))))
end

# Returns:
#   Parameter gradients with respect to the function
#   Parameter gradients with respect to each delay node
function cut_backward(v::IVertex, params, ds = delays(v))
  isempty(ds) && return invert(v), []
  @assert length(ds) == 1
  @gensym input
  Δs = invert(cut(v, _ -> vertex(input)))
  Δs = replaceall(Δs, vertex(input), il(@flow(last(self.delay))))
  Δloop, ∇loop = invertloop(ds[1], params)
  Δh = vertex(:back!, vertex(:(self.delay)), Δs[input], vertex(∇loop))
  Δloop = replaceall(Δloop, vertex(:Δloop), Δh)
  Δs, [Δloop]
end

# g = il(@flow begin
#   hidden = σ( Wxh*x + Whh*Delay(hidden) + bh )
#   y = σ( Why*hidden + by )
# end)

# cut_backward(g, [:x])[1]

@@ -0,0 +1,8 @@
export mse, mse!

# Write the error (pred - target) into ∇ and return half the squared error.
function mse!(∇, pred, target)
  map!(-, ∇, pred, target)
  sumabs2(∇)/2
end

mse(pred, target) = mse!(similar(pred), pred, target)

@@ -0,0 +1,41 @@
export Dense

type Dense <: Model
  W::Matrix{Float32}
  b::Vector{Float32}
  ∇W::Matrix{Float32}
  ∇b::Vector{Float32}

  in::Vector{Float32}
  out::Vector{Float32}
  ∇in::Vector{Float32}
end

Dense(in::Integer, out::Integer) =
  Dense(randn(out, in), randn(out),
        zeros(out, in), zeros(out),
        zeros(in), zeros(out), zeros(in))

Dense(out::Integer) = Init(in -> Dense(in[1], out))

function (l::Dense)(x)
  l.in = x
  A_mul_B!(l.out, l.W, x)
  map!(+, l.out, l.out, l.b)
end

function back!(l::Dense, ∇)
  map!(+, l.∇b, l.∇b, ∇)
  # l.∇W += ∇ * l.in'
  BLAS.gemm!('N', 'T', eltype(∇)(1), ∇, l.in, eltype(∇)(1), l.∇W)
  At_mul_B!(l.∇in, l.W, ∇)
end

function update!(l::Dense, η)
  map!((x, ∇x) -> x - η*∇x, l.W, l.W, l.∇W)
  map!((x, ∇x) -> x - η*∇x, l.b, l.b, l.∇b)
  fill!(l.∇W, 0)
  fill!(l.∇b, 0)
end

shape(d::Dense) = size(d.b)

@@ -0,0 +1,26 @@
export Input

typealias Dims{N} NTuple{N,Int}

dims(d::Dims) = d

dims(i...) = (i...,)

type Input{N} <: Model
  dims::Dims{N}
end

Input(i) = Input(dims(i))

(::Input)(x) = x
back!(::Input, ∇) = ∇

shape(i::Input) = i.dims

# Initialise placeholder

type Init{F}
  f::F
end

(f::Init)(args...) = f.f(args...)

@@ -0,0 +1,23 @@
export Sequence

type Sequence <: Model
  layers::Vector{Model}
end

Sequence() = Sequence([])

@forward Sequence.layers Base.getindex, Base.first, Base.last

Base.push!(s::Sequence, m::Model) = push!(s.layers, m)

Base.push!(s::Sequence, f::Init) = push!(s, f(shape(last(s))))

function Sequence(ms...)
  s = Sequence()
  foreach(m -> push!(s, m), ms)
  return s
end

(s::Sequence)(x) = foldl((x, m) -> m(x), x, s.layers)
back!(s::Sequence, ∇) = foldr((m, ∇) -> back!(m, ∇), ∇, s.layers)
update!(s::Sequence, η) = foreach(l -> update!(l, η), s.layers)

@@ -0,0 +1,27 @@
export onehot, onecold

# onehot(3, 1:10) gives a Bool vector with a single true at index 3;
# onecold maps a prediction vector back to the label of its largest entry.
onehot(label, labels) = [i == label for i in labels]
onecold(pred, labels = 1:length(pred)) = labels[findfirst(pred, maximum(pred))]

function train!(m::Model, train, test = []; epoch = 1, batch = 10, η = 0.1)
  i = 0
  ∇ = zeros(length(train[1][2]))
  for _ in 1:epoch
    for (x, y) in shuffle!(train)
      i += 1
      err = mse!(∇, m(x), y)
      back!(m, ∇)
      # Apply the accumulated (averaged) gradients once per batch.
      i % batch == 0 && update!(m, η/batch)
    end
    @show accuracy(m, test)
  end
  return m
end

function accuracy(m::Model, data)
  correct = 0
  for (x, y) in data
    onecold(m(x)) == onecold(y) && (correct += 1)
  end
  return correct/length(data)
end