Many APIs in popular deep learning frameworks closely mirror NumPy, so mastering NumPy makes learning those framework APIs significantly easier. This first part covers array creation and statistical operations through practical code examples.
# Import libraries
import numpy as np
# Visualization tool
import matplotlib.pyplot as plt
Array Creation Methods
This section covers array creation techniques. We begin with these because:
- Real-world work often requires testing array APIs or checking interoperability
- Tools like scikit-learn, matplotlib, PyTorch, and TensorFlow need sample data for experimentation
Key creation methods we'll cover:
- From Python lists/tuples
- Using arange
- Using linspace/logspace
- Using ones/zeros
- Using random generation
- Loading from files
The most frequently used are linspace/logspace (for plotting) and random (for generating synthetic data).
Creating from Python Lists or Tuples
⭐⭐ Master creating arrays from lists: np.array(list)
⚠️ Note the automatic type conversion. Arrays require uniform element types, so mixed types will be upcast.
# From a simple list
matrix_a = np.array([5, 10, 15])
print(matrix_a)
# Output: [ 5 10 15]
# 2D array (note the float)
matrix_b = np.array([[1, 2.5, 3], [4, 5, 6]])
print(matrix_b)
# Output: [[1. 2.5 3. ]
# [4. 5. 6. ]]
# Explicit type specification
matrix_c = np.array([1, 2, 3], dtype=np.float32)
print(matrix_c)
# Output: [1. 2. 3.]
# Casting floats to int truncates the decimal part (no rounding)
data = [[1, 2, 3], [4, 5, 6.9]]
matrix_d = np.array(data, dtype=np.int32)
print(matrix_d)
# Output: [[1 2 3]
# [4 5 6]]
Using arange
⭐⭐ NumPy's range generator creates 1D vectors, useful for:
- Creating sequential inputs (e.g., positional encodings)
- Examining filtered/sampled results with ordered data
⚠️ Ensure reshape target shape matches element count.
# Basic usage
sequence = np.arange(8).reshape(2, 4)
print(sequence)
# Output: [[0 1 2 3]
# [4 5 6 7]]
# With float step
float_seq = np.arange(0, 9, 1.5).reshape(2, 3)
print(float_seq)
# Output: [[0. 1.5 3. ]
# [4.5 6. 7.5]]
# 3D reshape
tensor = np.arange(24).reshape(2, 3, 4)
print(tensor)
# Output: [[[ 0 1 2 3]
# [ 4 5 6 7]
# [ 8 9 10 11]]
#
# [[12 13 14 15]
# [16 17 18 19]
# [20 21 22 23]]]
Using linspace/logspace
⭐⭐⭐ These are crucial APIs. linspace takes start, stop, and the number of points; logspace adds a base parameter (default 10) and treats start and stop as exponents of that base.
⚠️ The third parameter is element count, not step size.
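A minimal sketch of both calls (the specific values below are illustrative choices, not from the original text):

```python
import numpy as np

# 5 evenly spaced points from 0 to 1, endpoint included
lin = np.linspace(0, 1, 5)
print(lin)
# Output: [0.   0.25 0.5  0.75 1.  ]

# 4 points from 10**0 to 10**3; start/stop are exponents of the base
log = np.logspace(0, 3, 4)
print(log)
# Output: [   1.   10.  100. 1000.]
```

linspace is the usual choice for plot x-axes; logspace suits sweeps spanning several orders of magnitude.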
Using ones/zeros
⭐ Quick methods for creating uniform arrays. The _like variants create arrays matching the shape of existing arrays.
⚠️ Default dtype is float64.
# Ones array
ones_2d = np.ones((3, 2))
print(ones_2d)
# Output: [[1. 1.]
# [1. 1.]
# [1. 1.]]
# Zeros array
zeros_3d = np.zeros((2, 2, 3))
print(zeros_3d)
# Output: [[[0. 0. 0.]
# [0. 0. 0.]]
#
# [[0. 0. 0.]
# [0. 0. 0.]]]
# Shape matching
sample_array = np.array([[1, 2, 3], [4, 5, 6]])
zeros_like = np.zeros_like(sample_array)
print(zeros_like)
# Output: [[0 0 0]
# [0 0 0]]
Using Random Generation
⭐⭐⭐⭐⭐ Essential for generating training/test data and neural network initialization.
⚠️ Use the new Generator API: np.random.default_rng()
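A minimal sketch of the Generator API; the seed and shapes below are arbitrary choices for illustration:

```python
import numpy as np

# Seeded generator for reproducible results
rng = np.random.default_rng(seed=0)

# Uniform floats in [0, 1), shape (2, 3)
uniform = rng.random((2, 3))
print(uniform.shape)  # (2, 3)

# Gaussian samples, e.g. for weight initialization
weights = rng.normal(loc=0.0, scale=0.1, size=(3, 2))
print(weights.shape)  # (3, 2)

# Random integers in [0, 10)
ints = rng.integers(0, 10, size=5)
print(ints.shape)  # (5,)
```

Passing a seed makes experiments reproducible; omit it when you want fresh randomness on every run.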
Loading from Files
⭐ Useful for loading preprocessed datasets or trained model parameters.
⚠️ The .npy/.npz extensions are appended automatically if the filename lacks them.
# Save single array
weights = np.array([[0.1, 0.2], [0.3, 0.4]])
np.save('model_weights.npy', weights)
# Save multiple arrays
features = np.random.rand(100, 10)
labels = np.random.randint(0, 2, 100)
np.savez('dataset.npz', X=features, y=labels)
# Load single array
loaded_weights = np.load('model_weights.npy')
print(loaded_weights)
# Output: [[0.1 0.2]
# [0.3 0.4]]
# Load multiple arrays
data = np.load('dataset.npz')
print(data['X'].shape) # (100, 10)
print(data['y'].shape) # (100,)
Statistical Operations and Properties
This section explores array statistics including:
- Dimension properties
- Maximum, minimum, median, quantiles
- Mean, sum, standard deviation
Key concepts to remember:
- Axis parameter (for a 2D array, axis=0 aggregates down each column, axis=1 across each row)
- keepdims parameter for dimension preservation
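A quick sketch of why keepdims matters: keeping the reduced axis as size 1 lets the result broadcast back against the original array (the toy values here are made up):

```python
import numpy as np

scores = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])

# Without keepdims, the row maxima collapse to shape (2,)
flat_max = scores.max(axis=1)
print(flat_max.shape)  # (2,)

# With keepdims, the result stays 2D with shape (2, 1),
# so it broadcasts cleanly against the (2, 3) original
row_max = scores.max(axis=1, keepdims=True)
print(scores - row_max)
# Output: [[-2. -1.  0.]
#          [-2. -1.  0.]]
```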
# Create sample data for demonstration
rng = np.random.default_rng(seed=42)
sample_data = rng.uniform(0, 10, (4, 5))
print(sample_data)
Dimension Properties
⭐⭐ Access dimension information including shape and size.
⚠️ size = total elements, shape = tuple of dimensions
# Number of dimensions
print(sample_data.ndim) # Output: 2
# Shape tuple
print(sample_data.shape) # Output: (4, 5)
# Total element count
print(sample_data.size) # Output: 20
Extremes and Quantiles
⭐⭐⭐ Finding maximum, minimum, median, and quantiles.
⚠️ The quantile value can be anywhere between 0 and 1; the result is interpolated and may not exist in the original array.
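A one-line illustration of that warning, using made-up values:

```python
import numpy as np

values = np.array([1, 2, 3, 4])
# With an even number of elements, the 0.5 quantile is
# interpolated between the two middle values
print(np.quantile(values, 0.5))
# Output: 2.5
```

2.5 appears nowhere in the input; it is computed by interpolation.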
# Global maximum
global_max = sample_data.max()
print(f"Global max: {global_max:.3f}")
# Row-wise maximum with dimension preservation
row_maxes = sample_data.max(axis=1, keepdims=True)
print("Row maxes shape:", row_maxes.shape) # (4, 1)
# Column-wise minimum
col_mins = sample_data.min(axis=0)
print("Column minima:", col_mins)
# Median
median_val = np.median(sample_data)
print(f"Median: {median_val:.3f}")
# 25th percentile along rows
q25_rows = np.quantile(sample_data, 0.25, axis=1)
print("25th percentile per row:", q25_rows)
# 75th percentile along columns with dimension preservation
q75_cols = np.quantile(sample_data, 0.75, axis=0, keepdims=True)
print("Per-column 75th percentile shape:", q75_cols.shape)
Averages, Sums, and Variance
⭐⭐⭐ Computing mean, sum, standard deviation, and variance.
# Overall average
overall_avg = np.average(sample_data)
print(f"Overall average: {overall_avg:.3f}")
# Row-wise average
row_avgs = np.average(sample_data, axis=1)
print("Row averages:", row_avgs)
# Column-wise sum
col_sums = np.sum(sample_data, axis=0)
print("Column sums:", col_sums)
# Cumulative sum along rows
cumsum_rows = np.cumsum(sample_data, axis=1)
print("Cumulative sum shape:", cumsum_rows.shape)
# Standard deviation along columns
std_cols = np.std(sample_data, axis=0)
print("Column standard deviations:", std_cols)
# Variance along rows
var_rows = np.var(sample_data, axis=1)
print("Row variances:", var_rows)