Neural Networks and Deep Learning Fundamentals

Deep learning, a subset of machine learning, relies on neural networks with multiple layers to model complex patterns in data. At its core is the artificial neural network (ANN), inspired by biological neurons, which processes inputs through layered computations to produce meaningful outputs.

Structure of a Neural Network

A typical feedforward neural network consists of three types of layers:

  • Input layer: Receives raw feature vectors.
  • Hidden layer(s): Performs nonlinear transformations to extract abstract representations.
  • Output layer: Produces predictions (e.g., class probabilities or regression values).

Each neuron in layer \( l \) computes a weighted sum of activations from the previous layer, adds a bias, and applies a nonlinear activation function.

Forward Propagation

Given input \( \mathbf{x} \), the computation proceeds layer by layer:

For layer \( l \), the pre-activation value for neuron \( j \) is:

\[ z_j^{(l)} = \sum_{k=1}^{n^{(l-1)}} w_{jk}^{(l)} a_k^{(l-1)} + b_j^{(l)} \] where \( a_k^{(l-1)} \) is the activation from the previous layer, \( w_{jk}^{(l)} \) is the weight, and \( b_j^{(l)} \) is the bias.

The activation is then:

\[ a_j^{(l)} = f(z_j^{(l)}) \] Common choices for \( f \) include ReLU (\( f(x) = \max(0, x) \)), sigmoid, and tanh.
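In matrix form the same computation can be written for a whole layer at once. The following NumPy sketch illustrates this; the names layer_forward, a_prev, W, and b are illustrative only. Each row of W holds the weights \( w_{jk}^{(l)} \) of one neuron, so the matrix-vector product yields every pre-activation \( z_j^{(l)} \) in a single step.

import numpy as np

def relu(z):
    # Elementwise ReLU: max(0, z)
    return np.maximum(0, z)

def layer_forward(a_prev, W, b, f=relu):
    # Pre-activations for the whole layer: z = W a_prev + b
    z = W @ a_prev + b
    # Apply the nonlinearity to obtain the layer's activations
    return f(z)

# Illustrative layer with 3 inputs and 2 neurons
rng = np.random.default_rng(0)
a_prev = rng.standard_normal(3)    # activations from the previous layer
W = rng.standard_normal((2, 3))    # one row of weights per neuron
b = np.zeros(2)                    # biases
print(layer_forward(a_prev, W, b))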

Backpropagation and Optimization

To train the network, we minimize a loss function \( \mathcal{L} \) using gradient-based optimization. Backpropagation efficiently computes the required gradients via the chain rule.

The error signal at the output layer \( L \) is:

\[ \delta_j^{(L)} = \frac{\partial \mathcal{L}}{\partial z_j^{(L)}} \] For hidden layers, the error propagates backward:

\[ \delta_j^{(l)} = f'(z_j^{(l)}) \sum_k \delta_k^{(l+1)} w_{kj}^{(l+1)} \] Gradients for weights and biases are:

\[ \frac{\partial \mathcal{L}}{\partial w_{jk}^{(l)}} = \delta_j^{(l)} a_k^{(l-1)}, \quad \frac{\partial \mathcal{L}}{\partial b_j^{(l)}} = \delta_j^{(l)} \] Parameters are updated using stochastic gradient descent (SGD):

\[ w_{jk}^{(l)} \leftarrow w_{jk}^{(l)} - \alpha \cdot \frac{1}{m} \sum_{i=1}^m \delta_j^{(l,i)} a_k^{(l-1,i)} \] \[ b_j^{(l)} \leftarrow b_j^{(l)} - \alpha \cdot \frac{1}{m} \sum_{i=1}^m \delta_j^{(l,i)} \] where \( \alpha \) is the learning rate and \( m \) is the batch size.

Implementation Example

Below is a minimal implementation of a two-layer neural network for binary classification:

import numpy as np

class SimpleNN:
    def __init__(self, n_in, n_hid, n_out):
        self.W1 = np.random.randn(n_in, n_hid) * 0.1
        self.b1 = np.zeros(n_hid)
        self.W2 = np.random.randn(n_hid, n_out) * 0.1
        self.b2 = np.zeros(n_out)

    def relu(self, x):
        return np.maximum(0, x)

    def sigmoid(self, x):
        # Clip x to avoid overflow
        x = np.clip(x, -500, 500)
        return 1 / (1 + np.exp(-x))

    def forward(self, X):
        self.z1 = X @ self.W1 + self.b1
        self.a1 = self.relu(self.z1)
        self.z2 = self.a1 @ self.W2 + self.b2
        self.y_pred = self.sigmoid(self.z2)
        return self.y_pred

    def backward(self, X, y_true, lr=0.01):
        m = X.shape[0]
        # For a sigmoid output with binary cross-entropy loss,
        # dL/dz2 simplifies to (y_pred - y_true)
        dz2 = self.y_pred - y_true
        dW2 = (self.a1.T @ dz2) / m
        db2 = np.sum(dz2, axis=0) / m

        da1 = dz2 @ self.W2.T
        dz1 = da1 * (self.z1 > 0)  # ReLU derivative
        dW1 = (X.T @ dz1) / m
        db1 = np.sum(dz1, axis=0) / m

        # Update parameters
        self.W2 -= lr * dW2
        self.b2 -= lr * db2
        self.W1 -= lr * dW1
        self.b1 -= lr * db1

# Training on XOR-like data
X = np.array([[0,0], [0,1], [1,0], [1,1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

model = SimpleNN(n_in=2, n_hid=4, n_out=1)
for epoch in range(2000):
    pred = model.forward(X)
    loss = -np.mean(y * np.log(pred + 1e-8) + (1 - y) * np.log(1 - pred + 1e-8))
    model.backward(X, y, lr=1.0)
    if epoch % 500 == 0:
        print(f"Epoch {epoch}, Loss: {loss:.4f}")

This example trains the network to solve XOR, a problem that is not linearly separable, using backpropagation and SGD.
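A quick way to check the result is to inspect the trained model's outputs; a short sketch (exact values depend on the random initialization):

# Inspect predictions after training; outputs should approach [0, 1, 1, 0]
print(np.round(model.forward(X), 3))

If the loss plateaus near 0.69 (the binary cross-entropy of always predicting 0.5), re-running with a different random seed or more hidden units usually helps, since a small network can occasionally start in a poor region of the loss surface.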

Applications and Frameworks

Neural networks power modern AI systems:

  • Convolutional Neural Networks (CNNs) excel in image and video analysis.
  • Recurrent Neural Networks (RNNs) and Transformers handle sequential data like text and speech.
  • Used in recommendation engines, autonomous vehicles, medical diagnosis, and more.

Popular deep learning frameworks include TensorFlow, PyTorch, and Keras, which provide high-level APIs and GPU acceleration for scalable training.
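For comparison, here is a rough sketch of the same two-layer XOR model written with PyTorch (assuming torch is available); nn.Sequential, nn.BCELoss, and torch.optim.SGD take over the forward pass, loss, and parameter updates that the NumPy example implements by hand.

import torch
import torch.nn as nn

# Same XOR data as above, as float tensors
X_t = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y_t = torch.tensor([[0.], [1.], [1.], [0.]])

# Two weight layers: 2 -> 4 hidden units (ReLU) -> 1 sigmoid output
net = nn.Sequential(nn.Linear(2, 4), nn.ReLU(), nn.Linear(4, 1), nn.Sigmoid())
loss_fn = nn.BCELoss()
opt = torch.optim.SGD(net.parameters(), lr=1.0)

for epoch in range(2000):
    opt.zero_grad()                # clear gradients from the previous step
    loss = loss_fn(net(X_t), y_t)  # forward pass + binary cross-entropy
    loss.backward()                # autograd performs backpropagation
    opt.step()                     # SGD update of all parameters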

Tags: neural-networks deep-learning backpropagation stochastic-gradient-descent ReLU
