Loss Functions
Flux provides a large number of common loss functions used for training machine learning models.
Most loss functions in Flux have an optional argument agg, denoting the type of aggregation performed over the batch:
loss(ŷ, y; agg=mean)
Flux.mae — Function
mae(ŷ, y; agg=mean)
Return the loss corresponding to mean absolute error:
agg(abs.(ŷ .- y))
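For example, a minimal usage sketch of the agg keyword (the numbers are purely illustrative):

using Flux

ŷ, y = [0.9, 0.1, 0.3, 0.7], [1.0, 0.0, 0.0, 1.0]
Flux.mae(ŷ, y)           # default agg=mean: (0.1 + 0.1 + 0.3 + 0.3) / 4 ≈ 0.2
Flux.mae(ŷ, y; agg=sum)  # total absolute error over the batch: ≈ 0.8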
Flux.mse — Function
mse(ŷ, y; agg=mean)
Return the loss corresponding to mean square error:
agg((ŷ .- y).^2)
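With the same illustrative values as above, the squared errors are averaged instead:

Flux.mse([0.9, 0.1, 0.3, 0.7], [1.0, 0.0, 0.0, 1.0])  # mean of 0.01, 0.01, 0.09, 0.09 ≈ 0.05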
Flux.msle — Function
msle(ŷ, y; agg=mean, ϵ=eps(eltype(ŷ)))
Return the loss corresponding to mean squared logarithmic error, calculated as
agg((log.(ŷ .+ ϵ) .- log.(y .+ ϵ)).^2)
The ϵ term provides numerical stability. This loss penalizes an under-predicted estimate more than an over-predicted one.
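A sketch of that asymmetry (illustrative values; ϵ is negligible here):

Flux.msle([0.5], [1.0])  # under-prediction by 0.5: (log(0.5) - log(1.0))^2 ≈ 0.48
Flux.msle([1.5], [1.0])  # over-prediction by 0.5:  (log(1.5) - log(1.0))^2 ≈ 0.16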
Flux.huber_loss — Function
huber_loss(ŷ, y; δ=1, agg=mean)
Return the mean of the Huber loss given the prediction ŷ and true values y.
             | 0.5 * |ŷ - y|^2,           for |ŷ - y| <= δ
Huber loss = |
             | δ * (|ŷ - y| - 0.5 * δ),   otherwise
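A sketch of the two regimes with the default δ = 1 (illustrative values):

Flux.huber_loss([0.5], [0.0])  # |error| = 0.5 <= δ, quadratic regime: 0.5 * 0.5^2 = 0.125
Flux.huber_loss([3.0], [0.0])  # |error| = 3 > δ, linear regime: 1 * (3 - 0.5 * 1) = 2.5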
Flux.crossentropy — Function
crossentropy(ŷ, y; weight=nothing, dims=1, ϵ=eps(eltype(ŷ)), agg=mean)
Return the cross entropy between the given probability distributions; calculated as
agg(.-sum(weight .* y .* log.(ŷ .+ ϵ); dims=dims))

weight can be nothing, a number, or an array. weight=nothing acts like weight=1 but is faster.
See also: Flux.logitcrossentropy, Flux.binarycrossentropy, Flux.logitbinarycrossentropy
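A minimal sketch with one-hot targets (Flux.onehotbatch and softmax are both provided by Flux; the random logits are illustrative):

using Flux

y = Flux.onehotbatch([2, 1], 1:3)  # targets as one-hot columns
ŷ = softmax(randn(3, 2))           # predictions must already be probabilities
Flux.crossentropy(ŷ, y)            # sums over classes (dims=1), then averages over the batch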
Flux.logitcrossentropy — Function
logitcrossentropy(ŷ, y; weight=nothing, agg=mean, dims=1)
Return the cross entropy computed after a Flux.logsoftmax operation; calculated as
agg(.-sum(weight .* y .* logsoftmax(ŷ; dims=dims); dims=dims))
logitcrossentropy(ŷ, y) is mathematically equivalent to Flux.crossentropy(softmax(ŷ), y), but is more numerically stable.
See also: Flux.crossentropy, Flux.binarycrossentropy, Flux.logitbinarycrossentropy
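A sketch of that equivalence, continuing the one-hot setup above (illustrative logits):

logits = randn(3, 2)
y = Flux.onehotbatch([3, 1], 1:3)
Flux.logitcrossentropy(logits, y)      # preferred: operates on raw logits
Flux.crossentropy(softmax(logits), y)  # mathematically the same, but less stable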
Flux.binarycrossentropy — Function
binarycrossentropy(ŷ, y; ϵ=eps(ŷ))
Return $-y \log(ŷ + ϵ) - (1 - y) \log(1 - ŷ + ϵ)$. The ϵ term provides numerical stability.
Typically, the prediction ŷ is given by the output of a sigmoid activation.
See also: Flux.crossentropy, Flux.logitcrossentropy, Flux.logitbinarycrossentropy
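With the scalar signature above, the loss is typically broadcast over predictions and 0/1 labels; a sketch (σ is Flux's sigmoid, values illustrative):

probs = σ.([2.0, -1.0, 0.5])             # sigmoid outputs lie in (0, 1)
labels = [1.0, 0.0, 1.0]
Flux.binarycrossentropy.(probs, labels)  # elementwise losses; aggregate with e.g. mean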
Flux.logitbinarycrossentropy — Function
logitbinarycrossentropy(ŷ, y; agg=mean)
logitbinarycrossentropy(ŷ, y) is mathematically equivalent to Flux.binarycrossentropy(σ(ŷ), y), but is more numerically stable.
See also: Flux.crossentropy, Flux.logitcrossentropy, Flux.binarycrossentropy
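A sketch on raw logits, matching the broadcast example above (illustrative values):

logits = [2.0, -1.0, 0.5]
labels = [1.0, 0.0, 1.0]
Flux.logitbinarycrossentropy(logits, labels)  # ≈ mean of binarycrossentropy.(σ.(logits), labels)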
Flux.kldivergence — Function
kldivergence(ŷ, y; dims=1, agg=mean, ϵ=eps(eltype(ŷ)))
Return the Kullback-Leibler divergence between the given arrays interpreted as probability distributions.
The KL divergence measures how much one probability distribution differs from another. It is always non-negative, and is zero only when both distributions are equal everywhere.
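A sketch (each vector is treated as a single distribution along dims=1; values illustrative):

p = [0.1, 0.2, 0.7]
Flux.kldivergence(p, p)                # ≈ 0 for identical distributions
Flux.kldivergence([0.3, 0.3, 0.4], p)  # strictly positive once they differ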
Flux.poisson_loss — Function
poisson_loss(ŷ, y; agg=mean, ϵ=eps(eltype(ŷ)))
Return how much the predicted distribution ŷ diverges from the expected Poisson distribution y; calculated as sum(ŷ .- y .* log.(ŷ)) / size(y, 2).
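A sketch with count-like targets (illustrative values; predictions must be positive):

ŷ = [0.5, 1.5, 2.5]  # predicted rates
y = [1.0, 2.0, 2.0]  # observed counts
Flux.poisson_loss(ŷ, y)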
Flux.hinge — Function
hinge(ŷ, y; agg=mean)
Return the hinge loss given the prediction ŷ and true labels y (containing 1 or -1); calculated as agg(max.(0, 1 .- ŷ .* y)). See the sketch after squared_hinge below.
See also: squared_hinge
Flux.squared_hinge — Function
squared_hinge(ŷ, y; agg=mean)
Return the squared hinge loss given the prediction ŷ and true labels y (containing 1 or -1); calculated as agg((max.(0, 1 .- ŷ .* y)).^2).
See also: hinge
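A sketch comparing the two variants (labels must be -1 or 1; values illustrative):

ŷ = [0.8, -0.4, 1.5]
y = [1.0, 1.0, -1.0]
Flux.hinge(ŷ, y)          # mean of the margin terms 0.2, 1.4, 2.5 ≈ 1.37
Flux.squared_hinge(ŷ, y)  # squares each margin term before averaging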
Flux.dice_coeff_loss — Function
dice_coeff_loss(ŷ, y; smooth=1)
Return a loss based on the Dice coefficient. Used in the V-Net image segmentation architecture. Similar to the F1 score. Calculated as: 1 - 2*sum(|ŷ .* y| + smooth) / (sum(ŷ.^2) + sum(y.^2) + smooth)
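A sketch on flattened soft masks (illustrative values):

ŷ = [0.9, 0.8, 0.1, 0.2]    # predicted segmentation mask
y = [1.0, 1.0, 0.0, 0.0]    # ground-truth mask
Flux.dice_coeff_loss(ŷ, y)  # close to 0 when the masks overlap well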
Flux.tversky_loss — Function
tversky_loss(ŷ, y; β=0.7)
Return the Tversky loss. Used with imbalanced data to give more weight to false negatives. Larger β weighs recall higher than precision (by placing more emphasis on false negatives). Calculated as: 1 - sum(|y .* ŷ| + 1) / (sum(y .* ŷ + β*(1 .- y) .* ŷ + (1 - β)*y .* (1 .- ŷ)) + 1)
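A sketch with similar illustrative masks (β > 0.5 penalizes false negatives more heavily):

ŷ = [0.9, 0.3, 0.1, 0.2]  # the 0.3 is a probable false negative
y = [1.0, 1.0, 0.0, 0.0]
Flux.tversky_loss(ŷ, y)         # default β = 0.7 favours recall
Flux.tversky_loss(ŷ, y; β=0.3)  # β < 0.5 would favour precision instead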