Loss Functions

Flux provides a large number of common loss functions used for training machine learning models.

Most loss functions in Flux have an optional argument agg, denoting the type of aggregation performed over the batch:

loss(ŷ, y; agg=mean)
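
For example, with the mse loss documented below, the default agg=mean averages the per-element losses, while agg=sum adds them up. A minimal sketch (in recent Flux versions these functions also live in the Flux.Losses module):

using Flux

ŷ = [0.9, 0.1, 0.3]       # predictions
y = [1.0, 0.0, 0.0]       # targets
Flux.mse(ŷ, y)            # mean((ŷ .- y).^2) ≈ 0.0367
Flux.mse(ŷ, y; agg=sum)   # sum((ŷ .- y).^2) ≈ 0.11
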
Flux.mae
mae(ŷ, y; agg=mean)

Return the loss corresponding to mean absolute error:

agg(abs.(ŷ .- y))
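
A minimal usage sketch:

using Flux

Flux.mae([2.0, 4.0], [1.0, 6.0])  # mean(abs.([1.0, -2.0])) = 1.5
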
Flux.mse
mse(ŷ, y; agg=mean)

Return the loss corresponding to mean square error:

agg((ŷ .- y).^2)
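
A minimal usage sketch:

using Flux

Flux.mse([2.0, 4.0], [1.0, 6.0])  # mean([1.0, 4.0]) = 2.5
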
Flux.msle
msle(ŷ, y; agg=mean, ϵ=eps(eltype(ŷ)))

Return the loss corresponding to the mean squared logarithmic error, calculated as

agg((log.(ŷ .+ ϵ) .- log.(y .+ ϵ)).^2)

The ϵ term provides numerical stability. This loss penalizes an under-predicted estimate more heavily than an over-predicted one.

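The asymmetry is easy to see on a single prediction. A minimal sketch (values approximate):

using Flux

Flux.msle([0.5], [1.0])  # ≈ 0.48: under-predicting by 0.5 is penalized heavily
Flux.msle([1.5], [1.0])  # ≈ 0.16: over-predicting by 0.5 costs far less
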
Flux.huber_loss
huber_loss(ŷ, y; δ=1, agg=mean)

Return the mean of the Huber loss given the prediction ŷ and true values y.

             | 0.5 * |ŷ - y|^2,           for |ŷ - y| <= δ
Huber loss = |
             |  δ * (|ŷ - y| - 0.5 * δ),  otherwise
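
A minimal sketch exercising both branches with the default δ = 1:

using Flux

Flux.huber_loss([1.2], [1.0])  # |error| = 0.2 ≤ δ: 0.5 * 0.2^2 = 0.02
Flux.huber_loss([3.0], [1.0])  # |error| = 2.0 > δ: 1 * (2.0 - 0.5) = 1.5
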
Flux.kldivergence
kldivergence(ŷ, y; dims=1, agg=mean, ϵ=eps(eltype(ŷ)))

Return the Kullback-Leibler divergence between the given arrays interpreted as probability distributions.

KL divergence is a measure of how much one probability distribution differs from another. It is always non-negative, and zero only when both distributions are equal everywhere.

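A minimal sketch with two discrete distributions:

using Flux

p = [0.5, 0.5]
q = [0.9, 0.1]
Flux.kldivergence(p, p)  # ≈ 0: the distributions coincide
Flux.kldivergence(q, p)  # > 0: they differ
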
Flux.poisson_loss
poisson_loss(ŷ, y; agg=mean, ϵ=eps(eltype(ŷ)))

Return how much the predicted distribution ŷ diverges from the expected Poisson distribution y; calculated as sum(ŷ .- y .* log.(ŷ)) / size(y, 2).

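A minimal sketch, treating ŷ as predicted Poisson rates (which must be positive) and y as observed counts:

using Flux

y = [1.0, 1.0]                    # observed counts
Flux.poisson_loss([1.0, 1.0], y)  # lower: predicted rates match the observations
Flux.poisson_loss([0.1, 5.0], y)  # higher: predicted rates are far off
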
Flux.hinge
hinge(ŷ, y; agg=mean)

Return the hinge loss given the prediction ŷ and true labels y (containing 1 or -1); calculated as agg(max.(0, 1 .- ŷ .* y)).

See also: squared_hinge

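A minimal sketch (note that later Flux releases rename this function to hinge_loss):

using Flux

y = [1, -1, 1]          # true labels in {1, -1}
ŷ = [0.7, -2.0, -0.5]   # raw prediction scores
Flux.hinge(ŷ, y)        # mean(max.(0, 1 .- ŷ .* y)) = mean([0.3, 0.0, 1.5]) = 0.6
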
Flux.squared_hinge
squared_hinge(ŷ, y; agg=mean)

Return the squared hinge loss given the prediction ŷ and true labels y (containing 1 or -1); calculated as agg((max.(0, 1 .- ŷ .* y)).^2).

See also: hinge

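The same inputs as the hinge example above, squared before aggregation (later Flux releases rename this function to squared_hinge_loss):

using Flux

Flux.squared_hinge([0.7, -2.0, -0.5], [1, -1, 1])  # mean([0.09, 0.0, 2.25]) = 0.78
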
Flux.dice_coeff_loss
dice_coeff_loss(ŷ, y; smooth=1)

Return a loss based on the dice coefficient. Used in the V-Net image segmentation architecture. Similar to the F1 score. Calculated as:

1 - 2*sum(|ŷ .* y| + smooth) / (sum(ŷ.^2) + sum(y.^2) + smooth)

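A minimal sketch with a toy binary mask; the loss approaches 0 as the prediction overlaps the target:

using Flux

ŷ = [0.9, 0.1, 0.8, 0.2]    # predicted mask probabilities
y = [1.0, 0.0, 1.0, 0.0]    # ground-truth mask
Flux.dice_coeff_loss(ŷ, y)  # ≈ 0.02: near-perfect overlap
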
Flux.tversky_loss
tversky_loss(ŷ, y; β=0.7)

Return the Tversky loss. Used with imbalanced data to give more weight to false negatives. Larger β weighs recall higher than precision (by placing more emphasis on false negatives). Calculated as:

1 - sum(|y .* ŷ| + 1) / (sum(y .* ŷ + β*(1 .- y) .* ŷ + (1 - β)*y .* (1 .- ŷ)) + 1)

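A minimal sketch; lowering β from its 0.7 default shifts weight from false negatives toward false positives:

using Flux

ŷ = [0.9, 0.1, 0.8, 0.2]
y = [1.0, 0.0, 1.0, 0.0]
Flux.tversky_loss(ŷ, y)          # default β = 0.7 penalizes false negatives more
Flux.tversky_loss(ŷ, y; β=0.5)   # β = 0.5 treats both error types symmetrically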