Equivalence of Empirical Risk Minimization to Regularization on the Family of $f$-Divergences

Francisco Daunas; Iñaki Esnaola; Samir M. Perlaza; H. Vincent Poor

doi:10.1109/isit57864.2024.10619260

Abstract


The solution to empirical risk minimization with $f$-divergence regularization (ERM-$f$DR) is presented under mild conditions on $f$. Under such conditions, the optimal measure is shown to be unique. Examples of the solution for particular choices of the function $f$ are presented. Previously known solutions to common regularization choices are obtained by leveraging the flexibility of the family of $f$-divergences. These include the unique solutions to empirical risk minimization with relative entropy regularization (Type-I and Type-II).
The analysis of the solution unveils the following properties of $f$-divergences when used in the ERM-$f$DR problem: i) $f$-divergence regularization forces the support of the solution to coincide with the support of the reference measure, which introduces a strong inductive bias that dominates the evidence provided by the training data; and ii) any $f$-divergence regularization is equivalent to a different $f$-divergence regularization with an appropriate transformation of the empirical risk function.
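
The two properties can be checked numerically on a toy problem. The sketch below is illustrative only and rests on assumptions not stated in the abstract: a finite set of four models, a reference measure `q`, empirical risks `risks`, and a regularization factor `lam`, all hypothetical values. It also assumes the standard closed forms on a finite set, with the solution restricted to measures absolutely continuous with respect to `q`: for Type-I, a Gibbs measure with mass proportional to `q_i * exp(-risk_i / lam)`; for Type-II, mass `q_i * lam / (risk_i + beta)` with `beta` chosen so that the masses sum to one. The function names are made up for this example.

```python
import math

def type1_solution(q, risks, lam):
    """Type-I (assumed closed form): p_i proportional to q_i * exp(-risk_i / lam)."""
    weights = [qi * math.exp(-ri / lam) for qi, ri in zip(q, risks)]
    total = sum(weights)
    return [w / total for w in weights]

def type2_solution(q, risks, lam, iters=200):
    """Type-II (assumed closed form): p_i = q_i * lam / (risk_i + beta).

    beta is found by bisection so that the masses sum to one.
    Models with q_i == 0 receive zero mass.
    """
    supp = [(qi, ri) for qi, ri in zip(q, risks) if qi > 0.0]
    r_min = min(ri for _, ri in supp)

    def mass(beta):
        return sum(qi * lam / (ri + beta) for qi, ri in supp)

    # mass() is decreasing in beta; mass(lo) > 1 and mass(hi) <= 1.
    lo, hi = -r_min + 1e-12, -r_min + lam
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mass(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    p = [qi * lam / (ri + beta) if qi > 0.0 else 0.0
         for qi, ri in zip(q, risks)]
    return p, beta

# Hypothetical toy data: the last model has the lowest risk but zero reference mass.
q = [0.4, 0.3, 0.3, 0.0]
risks = [0.1, 0.5, 0.9, 0.0]
lam = 0.5

p1 = type1_solution(q, risks, lam)
p2, beta = type2_solution(q, risks, lam)

# Property i): both solutions vanish exactly where q vanishes.
print("Type-I :", p1)
print("Type-II:", p2)

# Property ii): Type-II with risk L matches Type-I with risk lam * log(L + beta).
# Values off the support of q are irrelevant, so they are set to 0.0.
transformed = [lam * math.log(ri + beta) if qi > 0.0 else 0.0
               for qi, ri in zip(q, risks)]
p_equiv = type1_solution(q, transformed, lam)
print("Type-I on transformed risk:", p_equiv)
```

In this toy setting the fourth model never receives mass, however small its risk, because the reference measure excludes it. The last two printed vectors agree up to numerical tolerance, which is one concrete instance of the equivalence between two regularizers under a transformation of the risk.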

