From 37d58e16dd234b16fb59bb2dd9cf1a37f54fcbde Mon Sep 17 00:00:00 2001
From: Dhairya Gandhi
Date: Sat, 8 Feb 2020 16:33:18 +0530
Subject: [PATCH] common questions answered in docs

---
 docs/src/models/basics.md     | 18 ++++++++++++++++++
 docs/src/training/training.md | 17 +++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/docs/src/models/basics.md b/docs/src/models/basics.md
index d83fc462..76f93684 100644
--- a/docs/src/models/basics.md
+++ b/docs/src/models/basics.md
@@ -219,3 +219,21 @@ Flux.@functor Affine
 ```
 
 This enables a useful extra set of functionality for our `Affine` layer, such as [collecting its parameters](../training/optimisers.md) or [moving it to the GPU](../gpu.md).
+
+By default all the fields in the `Affine` type are collected as its parameters. However, in some cases our "layers" may hold other metadata that is not needed for training and should therefore be ignored when the parameters are collected. With Flux, there are two ways to mark which fields of a layer are trainable.
+
+The first way is to overload the `trainable` function:
+
+```julia
+Flux.trainable(a::Affine) = (a.W, a.b,)
+```
+
+Only the fields returned by `trainable` are collected as parameters; to include other fields, simply add them to the tuple.
+
+Another way is through the `@functor` macro, where we can mark the fields we are interested in like so:
+
+```julia
+Flux.@functor Affine (W,)
+```
+
+However, doing this requires the `struct` to have a corresponding constructor that accepts those parameters.
diff --git a/docs/src/training/training.md b/docs/src/training/training.md
index b42db7c9..7680a776 100644
--- a/docs/src/training/training.md
+++ b/docs/src/training/training.md
@@ -41,6 +41,23 @@ The model to be trained must have a set of tracked parameters that are used to c
 
 Such an object contains a reference to the model's parameters, not a copy, such that after their training, the model behaves according to their updated values.
 
+When we don't want to include all of the model's parameters (e.g. for transfer learning), we can simply omit those layers from our call to `params`.
+
+Consider a simple multi-layer model where we want to avoid optimising the second layer. The setup would look something like this:
+
+```julia
+m = Chain(
+  Dense(784, 64, σ),
+  Dense(64, 32),
+  Dense(32, 10), softmax)
+
+ps = Flux.params(m[1], m[3:end])
+```
+
+`ps` now holds a reference to the parameters of only the layers that were passed to it.
+
+Handling all the parameters on a layer-by-layer basis is explained in the [Layer Helpers](../models/basics.md) section.
+
 ## Datasets
 
 The `data` argument provides a collection of data to train with (usually a set of inputs `x` and target outputs `y`). For example, here's a dummy data set with only one data point:
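
To make the `trainable` mechanism from the basics.md hunk concrete, here is a minimal self-contained sketch of a layer that carries a non-trainable array field. Everything in it (the `MaskedAffine` name, its `mask` field, the initialisation) is hypothetical illustration, not part of the patch:

```julia
using Flux

# A hypothetical layer with a fixed connectivity mask that
# should never be updated during training.
struct MaskedAffine
  W
  b
  mask   # fixed 0/1 mask; metadata, not a parameter
end

MaskedAffine(in::Integer, out::Integer) =
  MaskedAffine(randn(out, in), randn(out), rand(0:1, out, in))

(l::MaskedAffine)(x) = (l.mask .* l.W) * x .+ l.b

Flux.@functor MaskedAffine

# Without this overload, `params` would also collect `mask`,
# since it is an array; with it, only W and b are trained.
Flux.trainable(l::MaskedAffine) = (l.W, l.b)
```

With this overload, `Flux.params(MaskedAffine(10, 5))` collects `W` and `b` but ignores `mask`.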
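
Similarly, a minimal sketch of how the restricted `ps` from the training.md hunk might be used; the dummy batch, the `loss` definition, and the choice of `ADAM` are illustrative assumptions, not part of the patch:

```julia
using Flux

# `m` and `ps` as in the hunk above: the second layer is
# deliberately left out of the parameter collection.
m = Chain(
  Dense(784, 64, σ),
  Dense(64, 32),
  Dense(32, 10), softmax)

ps = Flux.params(m[1], m[3:end])

# A dummy one-batch dataset, just to make the sketch runnable:
x = rand(Float32, 784, 10)            # 10 fake inputs
y = onehotbatch(rand(0:9, 10), 0:9)   # 10 fake one-hot labels

loss(x, y) = crossentropy(m(x), y)
opt = ADAM()

# Only the parameters collected in `ps` are updated; the second
# Dense layer keeps its initial weights.
Flux.train!(loss, ps, [(x, y)], opt)
```

After the call, `m[2].W` is unchanged, while the first and third layers have taken a gradient step.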