LSTM Background
LSTM core concepts, internal structure, and the backpropagation derivation are covered in depth by many existing resources. A high-level understanding of how LSTMs store and propagate information is sufficient to work through this implementation.
PyTorch Environment Setup
When configuring PyTorch in a conda environment on Windows, a common dependency error may occur:
OSError: [WinError 126] The specified module could not be found. Error loading "...\torch\lib\c10_cuda.dll" or one of its dependencies
This issue is often resolved by creating a new, clean conda environment with Python 3.9 and then reinstalling PyTorch in that fresh environment. See the official PyTorch installation documentation for step-by-step setup instructions.
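After reinstalling, a quick sanity check can confirm that the import now succeeds. The following is a minimal sketch rather than an official diagnostic; the helper name `check_torch_install` is made up for illustration. It catches `OSError` as well as `ImportError` because the WinError 126 failure shown above surfaces as an `OSError` at import time.

```python
import importlib


def check_torch_install():
    """Try to import torch and report basic facts about the installation.

    Hypothetical helper for illustration; it returns a dict instead of raising.
    """
    try:
        torch = importlib.import_module("torch")
    except (ImportError, OSError) as exc:
        # ImportError: torch is not installed in the active environment.
        # OSError: a DLL (such as the one in the error above) failed to load.
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
    return {
        "ok": True,
        "version": torch.__version__,
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(check_torch_install())
```

If `ok` is `False`, the `error` field carries the same message that a plain `import torch` would raise, which makes it easier to compare environments side by side.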
LSTM Implementation
This is a simplified end-to-end example of LSTM training and prediction, designed for beginners. All code includes detailed comments to aid readability, and the full project (including raw data, preprocessing scripts, and training code) is available in a public GitHub repository.
LSTM Network Definition
The following code defines a reusable LSTM model that leverages PyTorch's built-in LSTM implementation, so we do not need to code LSTM cell logic from scratch:
import torch
import torch.nn as nn


class TimeSeriesLSTM(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim, num_stacked_layers=1):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.num_layers = num_stacked_layers
        # Initialize LSTM layer: batch_first=True sets input shape to
        # (batch_size, sequence_length, feature_count)
        self.lstm_layer = nn.LSTM(
            input_dim,
            hidden_dim,
            num_stacked_layers,
            batch_first=True
        )
        # Fully connected layer maps LSTM output to final prediction
        self.output_layer = nn.Linear(hidden_dim, output_dim)

    def forward(self, input_sequence):
        # Initialize hidden and cell states to zero, matching the input device
        batch_size = input_sequence.size(0)
        state_shape = (self.num_layers, batch_size, self.hidden_dim)
        hidden_init = torch.zeros(state_shape, device=input_sequence.device)
        cell_init = torch.zeros(state_shape, device=input_sequence.device)
        # Forward pass through LSTM layers; the returned final states are not needed here
        lstm_output, _ = self.lstm_layer(input_sequence, (hidden_init, cell_init))
        # Extract output from the last time step and generate final prediction
        final_prediction = self.output_layer(lstm_output[:, -1, :])
        return final_prediction
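To make the tensor shapes in `forward` concrete, here is a small sketch that calls `nn.LSTM` directly on random data. The dimensions (batch of 4, 10 time steps, 3 features, hidden size 8, 2 layers) are arbitrary values chosen for illustration, not settings from the project. It also shows why slicing the last time step works: for a unidirectional LSTM, `output[:, -1, :]` equals the final hidden state of the top layer.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes only; these are assumptions, not values from the project.
batch_size, seq_len, n_features, hidden_dim, n_layers = 4, 10, 3, 8, 2

lstm = nn.LSTM(n_features, hidden_dim, n_layers, batch_first=True)
x = torch.randn(batch_size, seq_len, n_features)

# When the (h_0, c_0) tuple is omitted, nn.LSTM defaults both states to zeros.
output, (h_n, c_n) = lstm(x)

print(output.shape)  # torch.Size([4, 10, 8])  -> (batch, seq_len, hidden_dim)
print(h_n.shape)     # torch.Size([2, 4, 8])   -> (num_layers, batch, hidden_dim)

# The last time step of the output matches the top layer's final hidden state.
last_step = output[:, -1, :]
same = torch.allclose(last_step, h_n[-1])
print(same)          # True
```

Note that `batch_first=True` only affects the input and output tensors; the hidden and cell states keep the layer dimension first, which is why the zero states in the model are created with shape `(num_layers, batch_size, hidden_dim)`.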
Data Loading Module
This module loads time series data from a local CSV file, preprocesses features, and splits data into training and testing loaders:
import numpy as np
import pandas as pd
import torch
from sklearn.preprocessing import MinMaxScaler
from torch.utils.data import Dataset, DataLoader, random_split


class NormalizedTimeSeriesDataset(Dataset):
    def __init__(self, raw_data, sequence_length, target_column_index=-1):
        self.seq_len = sequence_length
        self.target_idx = target_column_index
        # Normalize all features to the 0-1 range for stable training.
        # Note: the scaler is fitted on the full series, test portion included.
        self.scaler = MinMaxScaler()
        self.normalized_data = self.scaler.fit_transform(raw_data)
        # Generate input-output sequence pairs
        self.input_seqs, self.target_vals = self._build_sequences()

    def _build_sequences(self):
        X, y = [], []
        # Each window of seq_len rows is paired with the target value of the next row
        for i in range(len(self.normalized_data) - self.seq_len):
            seq_x = self.normalized_data[i:i + self.seq_len, :]
            seq_y = self.normalized_data[i + self.seq_len, self.target_idx]
            X.append(seq_x)
            y.append(seq_y)
        # Stack into single NumPy arrays first: building a tensor directly from
        # a list of arrays is very slow
        return (
            torch.tensor(np.array(X), dtype=torch.float32),
            torch.tensor(np.array(y), dtype=torch.float32).unsqueeze(1)
        )

    def __len__(self):
        return len(self.input_seqs)

    def __getitem__(self, idx):
        return self.input_seqs[idx], self.target_vals[idx]


def get_train_test_loaders(csv_file_path, sequence_length, batch_size, train_split_ratio=0.8):
    # Load raw data from CSV (all columns are assumed to be numeric)
    raw_df = pd.read_csv(csv_file_path)
    dataset = NormalizedTimeSeriesDataset(raw_df.values, sequence_length)
    # Split into training and test sets.
    # Note: random_split assigns windows at random, so overlapping windows can
    # end up in both sets and make test scores look optimistic.
    train_size = int(train_split_ratio * len(dataset))
    test_size = len(dataset) - train_size
    train_dataset, test_dataset = random_split(dataset, [train_size, test_size])
    # Create batched dataloaders
    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)
    return train_loader, test_loader, dataset.scaler
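The sliding-window logic in `_build_sequences` is easier to follow on a toy series. The sketch below reproduces it with plain Python lists and adds a chronological split as one possible alternative to a random split; `build_windows` and `chronological_split` are hypothetical helper names introduced here for illustration and are not part of the project code.

```python
def build_windows(series, seq_len):
    """Pair each run of seq_len consecutive values with the value that follows it."""
    inputs, targets = [], []
    for i in range(len(series) - seq_len):
        inputs.append(series[i:i + seq_len])
        targets.append(series[i + seq_len])
    return inputs, targets


def chronological_split(n_samples, train_ratio=0.8):
    """Return index ranges in which every training window precedes every test window."""
    train_size = int(train_ratio * n_samples)
    return range(0, train_size), range(train_size, n_samples)


series = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
inputs, targets = build_windows(series, seq_len=3)

print(len(inputs))               # 7
print(inputs[0], targets[0])     # [0, 1, 2] 3
print(inputs[-1], targets[-1])   # [6, 7, 8] 9

train_idx, test_idx = chronological_split(len(inputs))
print(list(train_idx))           # [0, 1, 2, 3, 4]
print(list(test_idx))            # [5, 6]
```

A series of length N yields N - seq_len samples, which is why the dataset is shorter than the CSV. With PyTorch, index ranges like these could be applied through `torch.utils.data.Subset(dataset, list(train_idx))` so that the test windows always lie later in time than the training windows, although the windows nearest the boundary still share some raw values.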