Implementing Multi-step Time Series Forecasting with PyTorch Encoder-Decoder Architecture

Data Preparation

The dataset originates from a Kaggle competition involving store item demand forecasting. It contains 5 years of sales data (2013-2017) for 50 items across 10 stores, requiring predictions for the next 3 months (January-March 2018). This represents a multi-step multivariate time series problem with 500 distinct time series to forecast.

Feature Engineering

Key observations from the data include weekly/monthly seasonality and annual trends. To capture these patterns, we incorporate:

  • DateTime features with cyclic encoding (sine/cosine transformations)
  • Annual autocorrelation values
  • Per-series normalization of all features
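
The feature steps listed above can be sketched with the standard library alone. This is a minimal illustration, not the code used for the competition: the helper names (cyclic_encode, date_features, lag_autocorrelation, standardize) are hypothetical, and the choice of day-of-week and month as the encoded fields is an assumption.

```python
import math
from datetime import date


def cyclic_encode(value, period):
    """Map a periodic value onto the unit circle as (sin, cos)."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def date_features(d):
    """Cyclic day-of-week and month features for one calendar date."""
    dow_sin, dow_cos = cyclic_encode(d.weekday(), 7)    # Monday=0 .. Sunday=6
    mon_sin, mon_cos = cyclic_encode(d.month - 1, 12)   # January=0 .. December=11
    return [dow_sin, dow_cos, mon_sin, mon_cos]


def lag_autocorrelation(series, lag):
    """Autocorrelation of a series with a copy of itself shifted by `lag` steps."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series)
    if var == 0 or lag >= n:
        return 0.0
    cov = sum((series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag))
    return cov / var


def standardize(series):
    """Z-score one series using its own mean and standard deviation."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    std = std if std > 0 else 1.0  # guard against constant series
    return [(v - mean) / std for v in series], mean, std


features = date_features(date(2018, 1, 1))  # a Monday in January
```

The point of the sine/cosine pair is that December and January end up as close to each other as any other pair of adjacent months, which a plain integer month index would not give. For the annual autocorrelation, lag would be set to roughly 365 on daily data. The mean and standard deviation returned by standardize need to be kept so that predictions can be mapped back to the original scale.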

Sequence Construction

The model requires fixed-length input/output sequences:

  • Output sequence: 90 days (3 months)
  • Input sequence: 180 days (6 months)
  • Sliding window approach generates sequential training samples
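
As a rough sketch of the sliding window, the function below cuts one series into (input, output) pairs of 180 and 90 steps. The name make_windows and the stride parameter are illustrative assumptions, and the series here holds placeholder values rather than real sales.

```python
def make_windows(series, input_len=180, output_len=90, stride=1):
    """Split one series into (input, target) pairs with a sliding window."""
    samples = []
    last_start = len(series) - input_len - output_len
    for start in range(0, last_start + 1, stride):
        split = start + input_len
        x_window = series[start:split]                # past values fed to the encoder
        y_window = series[split:split + output_len]   # future values to predict
        samples.append((x_window, y_window))
    return samples


# Placeholder data: 2013-2017 spans 1826 days (2016 is a leap year)
daily_sales = list(range(1826))
windows = make_windows(daily_sales)
print(len(windows))  # 1557
```

With a stride of 1, each series yields 1826 - 270 + 1 = 1557 overlapping samples, so consecutive samples are highly correlated. A larger stride trades sample count for less redundancy.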

PyTorch Data Pipeline

import torch
from torch.utils.data import Dataset


class TimeSeriesDataset(Dataset):
    def __init__(self, categorical_cols=None, numeric_cols=None, embed_dims=None, include_decoder_input=True):
        self.sequences = None
        # Use None defaults: mutable default lists would be shared across instances
        self.cat_cols = categorical_cols or []
        self.num_cols = numeric_cols or []
        self.embed_config = []
        self.embed_dims = embed_dims if embed_dims else {}
        self.decoder_input = include_decoder_input

    def load_data(self, processed_df):
        # One row per training sample, holding 'x_sequence' and 'y_sequence' arrays
        self.sequences = processed_df

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, idx):
        sample = self.sequences.iloc[[idx]]
        x_seq = torch.tensor(sample['x_sequence'].values[0], dtype=torch.float32)
        y_seq = torch.tensor(sample['y_sequence'].values[0], dtype=torch.float32)

        # Column 0 of y_seq is the target; the remaining columns are the
        # per-step features passed to the decoder
        decoder_in = y_seq[:, 1:].clone() if self.decoder_input else None

        # Broadcast each static numeric feature along the time axis
        for col in self.num_cols:
            num_val = torch.tensor([sample[col].values[0]], dtype=torch.float32)
            x_seq = torch.cat((x_seq, num_val.repeat(x_seq.size(0)).unsqueeze(1)), dim=1)
            if decoder_in is not None:
                decoder_in = torch.cat((decoder_in, num_val.repeat(decoder_in.size(0)).unsqueeze(1)), dim=1)

        if decoder_in is None:
            return x_seq, y_seq[:, 0]
        return (x_seq, decoder_in), y_seq[:, 0]

Model Architecture

The encoder-decoder framework consists of two main components:

Encoder Network

from torch import nn


class SequenceEncoder(nn.Module):
    def __init__(self, input_dim, hidden_size, num_layers=1, bidirectional=False, dropout=0.2):
        super().__init__()
        self.gru = nn.GRU(
            input_size=input_dim,
            hidden_size=hidden_size,
            num_layers=num_layers,
            bidirectional=bidirectional,
            # nn.GRU applies dropout only between stacked layers and warns otherwise
            dropout=dropout if num_layers > 1 else 0.0,
            batch_first=True
        )

    def forward(self, x):
        # Univariate input of shape (batch, seq_len) gets a trailing feature dimension
        if x.ndim < 3:
            x = x.unsqueeze(2)
        # The initial hidden state defaults to zeros when it is omitted
        output, hidden = self.gru(x)
        # hidden[-1] has shape (batch, hidden_size); for a bidirectional GRU it is
        # the backward direction of the last layer
        return output, hidden[-1]

Decoder Network

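from torch import nn


class DecoderUnit(nn.Module):
    def __init__(self, input_dim, hidden_size, dropout=0.2):
        super().__init__()
        # A single GRU cell, so each prediction can be fed back in step by step
        self.gru_cell = nn.GRUCell(input_dim, hidden_size)
        self.linear = nn.Linear(hidden_size, 1)
        self.dropout = nn.Dropout(dropout)

    def forward(self, hidden, x):
        # x: (batch, input_dim), hidden: (batch, hidden_size)
        hidden = self.gru_cell(x, hidden)
        # One scalar forecast per sample, plus the updated state for the next step
        return self.linear(self.dropout(hidden)), hidden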

Complete Model

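import torch
from torch import nn


class Seq2SeqForecaster(nn.Module):
    def __init__(self, encoder, decoder, pred_length=90, teacher_forcing=0.3):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder
        self.pred_len = pred_length
        self.teacher_forcing = teacher_forcing

    def forward(self, x, y_true=None):
        # x is (encoder_input, decoder_features), as returned by TimeSeriesDataset
        enc_input, dec_features = x
        _, hidden = self.encoder(enc_input)
        predictions = torch.zeros(enc_input.size(0), self.pred_len, device=enc_input.device)
        # Seed the decoder with the last observed target (feature column 0)
        prev_val = enc_input[:, -1, 0].unsqueeze(1)

        for i in range(self.pred_len):
            # Teacher forcing: with probability `teacher_forcing`, feed the true value
            # of the previous step instead of the model's own prediction. Feeding
            # y_true[:, i] here would leak the value being predicted at step i.
            if y_true is not None and i > 0 and torch.rand(1).item() < self.teacher_forcing:
                prev_val = y_true[:, i - 1].unsqueeze(1)
            dec_input = torch.cat((prev_val, dec_features[:, i]), dim=1)

            pred, hidden = self.decoder(hidden, dec_input)
            predictions[:, i] = pred.squeeze(1)
            prev_val = pred

        return predictions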

Training Strategy

Key training considerations:

  1. Validation Approach: Time-based split (2014-2016 train, 2017 validation)
  2. Optimizer: AdamW with separate optimizers for encoder/decoder
  3. Learning Rate: 1cycle policy with maximum rate determined via LR finder
  4. Loss Function: MSE (more stable than SMAPE during training)
  5. Regularization: Dropout in both encoder and decoder networks
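
A minimal sketch of how the optimizer, schedule and loss choices above could fit together in PyTorch. The encoder and decoder here are small stand-in modules, the data is random, and max_lr is a placeholder for the value an LR finder would suggest. The smape helper is one common formulation of the metric, assumed here for evaluation only.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Stand-in modules (assumption): replace with the real encoder and decoder networks
encoder = nn.GRU(input_size=4, hidden_size=8, batch_first=True)
decoder = nn.Linear(8, 3)  # toy head that predicts 3 steps at once

steps_per_epoch, epochs = 5, 2
total_steps = steps_per_epoch * epochs
max_lr = 1e-2  # placeholder; in practice taken from an LR finder run

# Separate AdamW optimizers, each driven by its own 1cycle schedule.
# OneCycleLR overrides the lr given to the optimizer.
enc_opt = torch.optim.AdamW(encoder.parameters(), lr=1e-3, weight_decay=1e-2)
dec_opt = torch.optim.AdamW(decoder.parameters(), lr=1e-3, weight_decay=1e-2)
enc_sched = torch.optim.lr_scheduler.OneCycleLR(enc_opt, max_lr=max_lr, total_steps=total_steps)
dec_sched = torch.optim.lr_scheduler.OneCycleLR(dec_opt, max_lr=max_lr, total_steps=total_steps)

loss_fn = nn.MSELoss()


def smape(pred, target, eps=1e-8):
    """Symmetric MAPE in percent; eps avoids division by zero."""
    denom = (pred.abs() + target.abs()) / 2 + eps
    return 100.0 * ((pred - target).abs() / denom).mean()


# Random toy batch: 16 samples, 10 input steps, 4 features, 3 target steps
x = torch.randn(16, 10, 4)
y = torch.randn(16, 3)

lrs = []
for step in range(total_steps):
    enc_opt.zero_grad()
    dec_opt.zero_grad()
    _, hidden = encoder(x)
    pred = decoder(hidden[-1])
    loss = loss_fn(pred, y)       # train on MSE
    loss.backward()
    enc_opt.step()
    dec_opt.step()
    enc_sched.step()              # 1cycle schedules step once per batch
    dec_sched.step()
    lrs.append(enc_sched.get_last_lr()[0])

with torch.no_grad():
    val_smape = smape(decoder(encoder(x)[1][-1]), y)  # report SMAPE separately
```

Keeping two optimizers makes it possible to give the encoder and decoder different learning rates or weight decay, although the sketch uses the same values for both.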

Performance

The model placed in the top 10% of the Kaggle competition. Future improvements could include attention mechanisms and additional hyperparameter tuning.

Tags: pytorch time-series forecasting encoder-decoder deep-learning

Posted on Wed, 03 Jun 2026 18:16:35 +0000 by nadeemshafi9