# Gradient descent example

Let's consider the function \( f: \mathbb{R}^2 \to \mathbb{R} \) given by:

$$ f(x,y) = (x-2)^2 + 2(y-3)^2 $$

[Figure: 3D surface plot of this function]

We want to apply the gradient descent algorithm to find the minimum. Steps are given by the following formula:

$$ X_{n+1} = X_n - \alpha \nabla f(X_n) $$

Let's start by calculating the gradient of \( f(x,y) \):

$$ \nabla f(X) = \begin{pmatrix} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \end{pmatrix} = \begin{pmatrix} 2x-4 \\ 4y-12 \end{pmatrix} $$

The coordinates will be updated according to:

$$ x_{n+1} = x_n - \alpha(2x_n - 4) $$

$$ y_{n+1} = y_n - \alpha(4y_n - 12) $$
In the following example, we arbitrarily placed the starting point at \( X_0 = (30, 20) \). Here is an illustration of the convergence to \( X_{200} \approx (2, 3) \) after 200 iterations:

[Figure: convergence of the iterates on a contour plot of \( f \)]
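To make the update rule concrete, here is the first iteration computed by hand, with the step size \( \alpha = 0.05 \) used in the code below:

$$ x_1 = 30 - 0.05 \, (2 \cdot 30 - 4) = 27.2 $$

$$ y_1 = 20 - 0.05 \, (4 \cdot 20 - 12) = 16.6 $$

Each coordinate moves a fixed fraction of its remaining distance to the minimum at \( (2, 3) \).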

# Source code

Here is the source code of this minimization in MATLAB, Python, and C++.

MATLAB:

close all;
clear all;
clc;


%% Display function
[X,Y] = meshgrid(-30:2:30,-30:2:30);
Z = (X-2).^2 + 2*(Y-3).^2;
%surf(X,Y,Z);
contour(X,Y,Z,20);
hold on;
% Axis labels
xlabel ('X');
ylabel ('Y');
title ('f(x,y)=(x-2)² + 2(y-3)²');


%% Parameters
% Starting point
X=[30;20];
% Step size multiplier
alpha=0.05;


%% Gradient descent
for i=1:200
    % Plot the current point on the contour map
    plot(X(1),X(2),'k.');
    % One descent step: X_{n+1} = X_n - alpha * grad f(X_n)
    X = X - alpha * [ 2*X(1)-4 ; 4*X(2)-12 ];
end


%% Display result (no trailing semicolon, so the values are printed)
X
F=(X(1)-2).^2 + 2*(X(2)-3).^2
    

Output:

X =

    2.0000
    3.0000


F =

   3.9023e-16
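Note that F is not exactly zero: convergence is geometric rather than exact. Subtracting the fixed point from the update rule gives \( x_n - 2 = (1-2\alpha)^n (x_0 - 2) \) and \( y_n - 3 = (1-4\alpha)^n (y_0 - 3) \). With \( \alpha = 0.05 \), \( 0.9^{200} \approx 7 \cdot 10^{-10} \), so \( x_{200} - 2 \approx 2 \cdot 10^{-8} \) and \( f(X_{200}) \approx 4 \cdot 10^{-16} \), consistent with the value of F printed above.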

Python:

import numpy as np

# Function to minimize
def function(X):
    return (X[0]-2)**2 + 2*(X[1]-3)**2

# Gradient of the function
def gradient(X):
    return np.array([2*X[0]-4, 4*X[1]-12])

# Starting point
X = np.array([30, 20])

# Step size multiplier
alpha = 0.05

# Gradient descent
for _ in range(200):
    X = X - alpha*gradient(X)

# Print results
print('X=')
print(X)
print('f=')
print(function(X))

Output:

X=
[ 2.00000002  3.        ]
f=
3.90229279472e-16
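The number of iterations (200) is fixed in advance here. A common alternative, sketched below under the same setup, is to stop once the gradient norm falls below a tolerance; the threshold 1e-10 is an arbitrary illustrative choice:

import numpy as np

def gradient(X):
    return np.array([2*X[0]-4, 4*X[1]-12])

X = np.array([30.0, 20.0])
alpha = 0.05

# Stop when the gradient is numerically zero
iterations = 0
while np.linalg.norm(gradient(X)) > 1e-10:
    X = X - alpha*gradient(X)
    iterations += 1

print('iterations=', iterations)
print('X=', X)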

C++:

#include <iostream>

// Coordinates of the starting point
#define     X_INIT  30
#define     Y_INIT  20

// Step size multiplier
#define     ALPHA   0.05

// Function to optimize
inline double function(double x, double y) { return (x-2)*(x-2) + 2*(y-3)*(y-3); }

// Gradient
inline double gradient_x(double x, double y) { return 2*x-4; }
inline double gradient_y(double x, double y) { return 4*y-12; }


int main(void)
{
    // Starting point
    double X[2]= { X_INIT , Y_INIT };

    // Gradient descent main loop
    for (int i=0;i<200;i++)
    {
        // Evaluate the gradient at the current point before updating,
        // so that both coordinates use the same X_n
        double gx = gradient_x(X[0],X[1]);
        double gy = gradient_y(X[0],X[1]);
        X[0] = X[0] - ALPHA*gx;
        X[1] = X[1] - ALPHA*gy;
    }

    std::cout << "::: Results :::" << std::endl;

    // Display X and Y
    std::cout << "x=" << X[0] << std::endl;
    std::cout << "y=" << X[1] << std::endl;

    // Display minimum of the function
    std::cout << "f(" << X[0] << "," << X[1] << ")=" << function(X[0],X[1]) << std::endl;

    return 0;
}

Output:

::: Results :::
x=2
y=3
f(2,3)=3.90229e-16
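One last remark on the step size: because each coordinate follows a linear recurrence, \( x_{n+1} - 2 = (1-2\alpha)(x_n - 2) \) and \( y_{n+1} - 3 = (1-4\alpha)(y_n - 3) \), the iteration converges for this particular function only when \( |1-2\alpha| < 1 \) and \( |1-4\alpha| < 1 \), i.e. \( 0 < \alpha < 0.5 \). Larger values make the \( y \) coordinate oscillate and diverge; \( \alpha = 0.05 \) is comfortably inside this range.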