Loss Functions

Flux provides a large number of common loss functions used for training machine learning models.

Loss functions for supervised learning typically expect as inputs a target y and a prediction ŷ. In Flux's convention, the order of the arguments is the following:

loss(ŷ, y)

Most loss functions in Flux have an optional argument agg, denoting the type of aggregation performed over the batch:

loss(ŷ, y)                         # defaults to `mean`
loss(ŷ, y, agg=sum)                # use `sum` for reduction
loss(ŷ, y, agg=x->sum(x, dims=2))  # partial reduction
loss(ŷ, y, agg=x->mean(w .* x))    # weighted mean
loss(ŷ, y, agg=identity)           # no aggregation. 
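
For concreteness, here is a minimal sketch (using the signatures documented below, with made-up data) of how the aggregation changes the result:

using Flux

ŷ = [0.9, 2.1, 3.3]    # predictions
y = [1.0, 2.0, 3.0]    # targets

Flux.mse(ŷ, y)                # averaged over all elements (default agg=mean)
Flux.mse(ŷ, y, agg=sum)       # summed instead of averaged
Flux.mse(ŷ, y, agg=identity)  # elementwise losses, no reduction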

Losses Reference

Flux.mae (Function)
mae(ŷ, y; agg=mean)

Return the loss corresponding to mean absolute error:

agg(abs.(ŷ .- y))
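
For example (made-up numbers), the result agrees with computing the formula by hand:

using Flux, Statistics

ŷ = [0.5, 1.5, 2.0]
y = [1.0, 1.0, 2.0]

Flux.mae(ŷ, y)        # ≈ 0.333
mean(abs.(ŷ .- y))    # same value, straight from the formula above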
Flux.mse (Function)
mse(ŷ, y; agg=mean)

Return the loss corresponding to mean square error:

agg((ŷ .- y).^2)
Flux.msle (Function)
msle(ŷ, y; agg=mean, ϵ=eps(eltype(ŷ)))

Return the loss corresponding to the mean squared logarithmic error, calculated as

agg((log.(ŷ .+ ϵ) .- log.(y .+ ϵ)).^2)

The ϵ term provides numerical stability. This loss penalizes an under-predicted estimate more heavily than an over-predicted one.
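
A quick sketch of that asymmetry (hypothetical values): missing the target by the same absolute amount costs more from below than from above:

using Flux

y = [10.0]
Flux.msle([5.0],  y)    # under-predict by 5: (log(5)  - log(10))^2 ≈ 0.48
Flux.msle([15.0], y)    # over-predict by 5:  (log(15) - log(10))^2 ≈ 0.16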

Flux.huber_loss (Function)
huber_loss(ŷ, y; δ=1, agg=mean)

Return the mean of the Huber loss given the prediction ŷ and true values y.

             | 0.5 * |ŷ - y|^2,          for |ŷ - y| <= δ
Huber loss = |
             |  δ * (|ŷ - y| - 0.5 * δ), otherwise
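
A small sketch (hypothetical values, default δ = 1) showing both branches:

using Flux

y = [0.0, 0.0]
ŷ = [0.5, 3.0]          # one small residual, one large residual

Flux.huber_loss(ŷ, y)   # mean of 0.5 * 0.5^2 = 0.125    (quadratic branch)
                        # and     1 * (3.0 - 0.5) = 2.5   (linear branch)
                        # giving (0.125 + 2.5) / 2 = 1.3125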
Flux.crossentropy (Function)
crossentropy(ŷ, y; weight=nothing, dims=1, ϵ=eps(eltype(ŷ)), 
                   logits=false, agg=mean)

Return the cross entropy between the given probability distributions; calculated as

agg(.-sum(weight .* y .* log.(ŷ .+ ϵ); dims=dims))

weight can be nothing, a number or an array. weight=nothing acts like weight=1 but is faster.

If logits=true, the input ŷ is first fed to a softmax layer.

See also: Flux.logitcrossentropy, Flux.binarycrossentropy, Flux.logitbinarycrossentropy
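
As an illustrative sketch (random logits, one-hot targets), applying softmax and then crossentropy gives essentially the same value as logitcrossentropy on the raw scores, which is the numerically safer route:

using Flux

logits = randn(Float32, 3, 5)              # raw scores: 3 classes, batch of 5
y = Flux.onehotbatch(rand(1:3, 5), 1:3)    # one-hot targets

Flux.crossentropy(Flux.softmax(logits), y) # probabilities, then cross entropy
Flux.logitcrossentropy(logits, y)          # ≈ same value, computed more stably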

Flux.kldivergence (Function)
kldivergence(ŷ, y; dims=1, agg=mean, ϵ=eps(eltype(ŷ)))

Return the Kullback-Leibler divergence between the given arrays interpreted as probability distributions.

KL divergence is a measure of how much one probability distribution is different from the other. It is always non-negative and zero only when both the distributions are equal everywhere.
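
A small sketch with hand-picked distributions illustrating both properties:

using Flux

p = [0.1, 0.9]               # a probability distribution over two outcomes
q = [0.5, 0.5]               # a different distribution

Flux.kldivergence(q, p)      # > 0: the distributions differ
Flux.kldivergence(p, p)      # ≈ 0: identical distributions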

Flux.poisson_loss (Function)
poisson_loss(ŷ, y; agg=mean, ϵ=eps(eltype(ŷ)))

Loss function derived from the likelihood of a Poisson random variable with mean ŷ taking the value y. It is given by

agg(ŷ .- y .* log.(ŷ .+ ϵ))
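
For instance (a hypothetical count-prediction setting), the value can be checked against the formula directly:

using Flux, Statistics

ŷ = [1.5, 3.2, 0.8]          # predicted Poisson rates
y = [2.0, 3.0, 1.0]          # observed counts

Flux.poisson_loss(ŷ, y)
mean(ŷ .- y .* log.(ŷ))      # matches, up to the ϵ stabiliser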


Flux.hinge (Function)
hinge(ŷ, y; agg=mean)

Return the hinge loss given the prediction ŷ and true labels y (containing 1 or -1); calculated as

agg(max.(0, 1 .- ŷ .* y))

See also: squared_hinge
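
A brief sketch with made-up scores: confidently correct predictions incur no loss, while unconfident or wrong ones do:

using Flux

y = [1, -1, 1]               # true labels in {-1, 1}
ŷ = [2.0, -0.5, -0.3]        # raw prediction scores

Flux.hinge(ŷ, y)             # mean(max.(0, 1 .- ŷ .* y)) = mean([0.0, 0.5, 1.3]) = 0.6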

Flux.squared_hinge (Function)
squared_hinge(ŷ, y; agg=mean)

Return the squared hinge loss given the prediction ŷ and true labels y (containing 1 or -1); calculated as

agg(max.(0, 1 .- ŷ .* y).^2)

See also: hinge

Flux.dice_coeff_loss (Function)
dice_coeff_loss(ŷ, y; smooth=1, dims=size(ŷ)[1:end-1], agg=mean)

Return a loss based on the Dice coefficient. Used in the V-Net architecture for image segmentation. Current implementation only works for the binary segmentation case.

The arrays ŷ and y contain the predicted and true probabilities, respectively, of the foreground being present in a certain pixel. The loss is computed as

1 - (2*sum(ŷ .* y; dims) .+ smooth) ./ (sum(ŷ.^2 .+ y.^2; dims) .+ smooth)

and then aggregated with agg over the batch.
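
A tiny sketch on a 2×2 single-channel mask with batch size one (hypothetical probabilities):

using Flux

ŷ = reshape([0.9, 0.1, 0.8, 0.2], 2, 2, 1, 1)   # predicted foreground probabilities
y = reshape([1.0, 0.0, 1.0, 0.0], 2, 2, 1, 1)   # ground-truth binary mask

Flux.dice_coeff_loss(ŷ, y)   # close to 0, since prediction and mask agree well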

Flux.tversky_loss (Function)
tversky_loss(ŷ, y; β=0.7, α=1-β, dims=size(ŷ)[1:end-1], agg=mean)

Return the Tversky loss for binary classification. The arrays ŷ and y contain the predicted and true probabilities respectively. Used with imbalanced data to give more weight to false negatives. A larger β weighs recall higher than precision (by placing more emphasis on false negatives). Calculated as:

num = sum(y .* ŷ, dims=dims)
den = sum(@.(ŷ*y + α*ŷ*(1-y) + β*(1-ŷ)*y), dims=dims)
tversky_loss = 1 - num/den

and then aggregated with agg over the batch.

When α+β=1, it is equal to 1-F_β, where F_β is an F-score.
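
A rough sketch on a small binary mask (hypothetical probabilities), using the β keyword documented above:

using Flux

ŷ = reshape([0.9, 0.2, 0.7, 0.1], 2, 2, 1, 1)   # predicted foreground probabilities
y = reshape([1.0, 0.0, 1.0, 0.0], 2, 2, 1, 1)   # ground-truth binary mask

Flux.tversky_loss(ŷ, y)          # default β = 0.7 puts more weight on false negatives
Flux.tversky_loss(ŷ, y, β=0.3)   # β < 0.5 shifts the emphasis toward false positives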
