Building a Forward Automatic Differentiation System in Python

Automatic differentiation (AD) is a fundamental technique in deep learning frameworks. This article demonstrates how to implement forward-mode automatic differentiation using Python operator overloading. The implementation is concise: a working system fits in a single small class.

Understanding Forward-Mode Automatic Differentiation

Forward-mode automatic differentiation computes derivatives by traversing the computational graph from inputs to outputs. At each node, it calculates both the function value and its derivative with respect to the input variables. This approach corresponds to applying the chain rule from the innermost to the outermost function when differentiating composite functions.

The core idea can be summarized in three steps:

  • Decompose the program into elementary operations with known derivative rules
  • Apply the corresponding derivative rules to each elementary expression
  • Combine derivative results using the chain rule based on data dependencies between expressions
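The three steps can be traced by hand on a small function before any classes are involved. The sketch below is illustrative only: the function y = sin(x * x), the name forward_trace, and the intermediate names v1, v2, dv1, dv2 are assumptions made for this example and are not part of the implementation that follows.

```python
import math

def forward_trace(x, dx=1.0):
    """Evaluate y = sin(x * x) and dy/dx by carrying (value, derivative) pairs."""
    # Elementary operation 1: v1 = x * x, differentiated with the product rule.
    v1 = x * x
    dv1 = dx * x + x * dx
    # Elementary operation 2: v2 = sin(v1); the chain rule multiplies
    # the local derivative cos(v1) by the derivative already carried in dv1.
    v2 = math.sin(v1)
    dv2 = math.cos(v1) * dv1
    return v2, dv2

value, derivative = forward_trace(1.5)
print(f"value: {value:.4f}, derivative: {derivative:.4f}")
```

Each intermediate value is paired with its derivative as soon as it is computed, which is the defining property of forward mode.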

When implementing this in Python, operator overloading performs the decomposition for us: every arithmetic operator or math function applied to a value object is one elementary operation, and its overloaded method computes both the result and the derivative.

Implementation

First, we import numpy for numerical computations:

import numpy as np

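We create a class called ADValue that holds two pieces of information: the actual value and its derivative. The derivative (grad) is the derivative of this value with respect to the input variable being differentiated. Inputs are seeded by hand: the chosen input gets grad = 1 and every other input gets grad = 0.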

class ADValue:
    """Forward-mode automatic differentiation value container."""
    
    def __init__(self, val, grad):
        self.val = val
        self.grad = grad
    
    def __repr__(self):
        return f'value:{self.val:.4f}, derivative:{self.grad:.4f}'
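One common way to read the (val, grad) pair is as a dual number. The notation below is an assumption introduced for illustration: a and c play the role of val, b and d play the role of grad, and ε is a formal symbol with ε² = 0. The arithmetic rules then reduce to the familiar derivative rules:

```latex
\begin{aligned}
(a + b\,\varepsilon) + (c + d\,\varepsilon) &= (a + c) + (b + d)\,\varepsilon \\
(a + b\,\varepsilon)\,(c + d\,\varepsilon) &= ac + (ad + bc)\,\varepsilon
  \qquad (\varepsilon^2 = 0) \\
f(a + b\,\varepsilon) &= f(a) + f'(a)\,b\,\varepsilon
\end{aligned}
```

The coefficient of ε in the product is the product rule, and the last line is the chain rule for an elementary function f.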

Now we implement operator overloading for basic arithmetic operations. The methods below belong inside the ADValue class. The key insight is that each operation must also propagate the derivative using standard calculus rules.

    def __add__(self, other):
        if isinstance(other, ADValue):
            new_val = self.val + other.val
            new_grad = self.grad + other.grad
        elif isinstance(other, (int, float)):
            new_val = self.val + other
            new_grad = self.grad
        else:
            return NotImplemented
        return ADValue(new_val, new_grad)
    
    def __radd__(self, other):
        return self.__add__(other)
    
    def __sub__(self, other):
        if isinstance(other, ADValue):
            new_val = self.val - other.val
            new_grad = self.grad - other.grad
        elif isinstance(other, (int, float)):
            new_val = self.val - other
            new_grad = self.grad
        else:
            return NotImplemented
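        return ADValue(new_val, new_grad)
    
    def __rsub__(self, other):
        # Handles constant - ADValue: the derivative changes sign
        if isinstance(other, (int, float)):
            return ADValue(other - self.val, -self.grad)
        return NotImplemented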
    
    def __mul__(self, other):
        if isinstance(other, ADValue):
            # Product rule: (f*g)' = f'*g + f*g'
            new_val = self.val * other.val
            new_grad = self.grad * other.val + self.val * other.grad
        elif isinstance(other, (int, float)):
            new_val = self.val * other
            new_grad = self.grad * other
        else:
            return NotImplemented
        return ADValue(new_val, new_grad)
    
    def __rmul__(self, other):
        return self.__mul__(other)
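The operators above cover addition, subtraction, and multiplication only. Division and negation follow the same pattern. The sketch below is a hedged extension rather than part of the original class: Dual is a hypothetical stand-in for ADValue that contains only what this example needs, so that the block runs on its own.

```python
class Dual:
    """Hypothetical stand-in for ADValue with only the pieces this sketch needs."""

    def __init__(self, val, grad):
        self.val = val
        self.grad = grad

    def __neg__(self):
        # d/dx (-f) = -f'
        return Dual(-self.val, -self.grad)

    def __truediv__(self, other):
        if isinstance(other, Dual):
            # Quotient rule: (f/g)' = (f'*g - f*g') / g^2
            val = self.val / other.val
            grad = (self.grad * other.val - self.val * other.grad) / other.val ** 2
        elif isinstance(other, (int, float)):
            val = self.val / other
            grad = self.grad / other
        else:
            return NotImplemented
        return Dual(val, grad)

    def __rtruediv__(self, other):
        # Handles constant / Dual: d/dx (c/g) = -c * g' / g^2
        if isinstance(other, (int, float)):
            return Dual(other / self.val, -other * self.grad / self.val ** 2)
        return NotImplemented


x = Dual(4.0, 1.0)
y = 1 / x       # value 0.25, derivative -0.0625
z = -(x / 2)    # value -2.0, derivative -0.5
```

Unlike addition and multiplication, division is not commutative, so the reflected method needs its own rule instead of delegating to the forward one.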

We also need methods for common mathematical functions. Each one multiplies the function's own derivative by the incoming grad, which is the chain rule:

    def log(self):
        # d/dx log(x) = 1/x
        new_val = np.log(self.val)
        new_grad = (1.0 / self.val) * self.grad
        return ADValue(new_val, new_grad)
    
    def sin(self):
        # d/dx sin(x) = cos(x)
        new_val = np.sin(self.val)
        new_grad = self.grad * np.cos(self.val)
        return ADValue(new_val, new_grad)

Testing the Implementation

Let's verify our implementation with the following function:

f(x₁, x₂) = ln(x₁) + x₁ * x₂ - sin(x₂)

We initialize the input variables. To compute the partial derivative with respect to x₁, we set its derivative to 1 and x₂'s derivative to 0:

x1 = ADValue(val=2.0, grad=1.0)
x2 = ADValue(val=5.0, grad=0.0)

# Compute f(x1, x2) = ln(x1) + x1 * x2 - sin(x2)
result = x1.log() + x1 * x2 - x2.sin()

print(f"Function value: {result.val:.4f}")
print(f"Partial derivative df/dx1: {result.grad:.4f}")

Output:

Function value: 11.6521
Partial derivative df/dx1: 5.5000

To obtain the derivative with respect to x₂, we swap the seed values so that x₂ carries grad = 1 and x₁ carries grad = 0:

x1 = ADValue(val=2.0, grad=0.0)
x2 = ADValue(val=5.0, grad=1.0)

result = x1.log() + x1 * x2 - x2.sin()
print(f"Partial derivative df/dx2: {result.grad:.4f}")

Output:

Partial derivative df/dx2: 1.7163
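Re-seeding by hand for each input can be wrapped in a small helper. The following is a minimal sketch under stated assumptions: Dual is a trimmed-down, hypothetical stand-in for ADValue (no support for plain numbers), and gradient is a helper name invented for this example.

```python
import math

class Dual:
    """Hypothetical, trimmed-down stand-in for ADValue."""
    def __init__(self, val, grad=0.0):
        self.val, self.grad = val, grad
    def __add__(self, o):
        return Dual(self.val + o.val, self.grad + o.grad)
    def __sub__(self, o):
        return Dual(self.val - o.val, self.grad - o.grad)
    def __mul__(self, o):
        return Dual(self.val * o.val, self.grad * o.val + self.val * o.grad)
    def log(self):
        return Dual(math.log(self.val), self.grad / self.val)
    def sin(self):
        return Dual(math.sin(self.val), self.grad * math.cos(self.val))

def gradient(f, point):
    """Run one forward pass per input, seeding only that input with grad = 1."""
    grads = []
    for i in range(len(point)):
        args = [Dual(v, 1.0 if j == i else 0.0) for j, v in enumerate(point)]
        grads.append(f(*args).grad)
    return grads

def f(x1, x2):
    return x1.log() + x1 * x2 - x2.sin()

g = gradient(f, [2.0, 5.0])
print([f"{v:.4f}" for v in g])  # ['5.5000', '1.7163']
```

The loop makes the cost of forward mode visible: a function with n inputs needs n passes to produce its full gradient.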

Verification with Deep Learning Frameworks

We can verify our results against PyTorch:

import torch

x1 = torch.tensor([2.0], requires_grad=True)
x2 = torch.tensor([5.0], requires_grad=True)
f = torch.log(x1) + x1 * x2 - torch.sin(x2)
f.backward()

print(f"PyTorch - df/dx1: {x1.grad.item():.4f}")
print(f"PyTorch - df/dx2: {x2.grad.item():.4f}")

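PyTorch's backward() uses reverse-mode differentiation, so both partial derivatives come out of a single pass. They agree with our values to the four decimal places shown (5.5000 and 1.7163), which confirms that the forward-mode implementation is correct for this function.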
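If PyTorch is not installed, a central finite difference gives an independent, approximate cross-check using only the standard library. This is a sketch: the helper name central_difference and the step size h = 1e-6 are assumptions chosen for illustration, and the result is a numerical approximation rather than an exact derivative.

```python
import math

def f(x1, x2):
    return math.log(x1) + x1 * x2 - math.sin(x2)

def central_difference(func, point, index, h=1e-6):
    """Approximate the partial derivative of func at point along one input."""
    lo = list(point)
    hi = list(point)
    lo[index] -= h
    hi[index] += h
    return (func(*hi) - func(*lo)) / (2 * h)

df_dx1 = central_difference(f, [2.0, 5.0], 0)
df_dx2 = central_difference(f, [2.0, 5.0], 1)
print(f"Finite difference - df/dx1: {df_dx1:.4f}")
print(f"Finite difference - df/dx2: {df_dx2:.4f}")
```

Finite differences carry truncation and rounding error that depend on h, whereas automatic differentiation applies the exact derivative rule at each step, so small disagreements in the trailing digits are expected.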

Tags: automatic-differentiation python machine-learning mathematical-optimization deep-learning

Posted on Sun, 24 May 2026 17:04:07 +0000 by kindoman