Testing DeepSpeed Strategy Integration with PyTorch Lightning in Microsoft/DeepSpeed

This test validates the integration between DeepSpeed and PyTorch Lightning using a minimal model. It checks that a one-epoch training run with Lightning’s DeepSpeedStrategy completes without error on a basic linear model fed with random data. The test contains no assertions on the resulting loss or weights, so it passes as long as training does not raise.

Test Coverage Overview

The file contains a single test, test_lightning_model, which exercises the basic integration path between DeepSpeed and PyTorch Lightning.

Key areas tested include:
  • Model training workflow with DeepSpeed strategy
  • 16-bit precision training compatibility
  • GPU acceleration support
  • Basic data loading and batch processing
  • Loss computation and logging functionality

Implementation Analysis

The implementation is a minimal working example. The file defines its own BoringModel, following the minimal-model pattern that PyTorch Lightning uses for testing framework integrations.

Key implementation patterns include:
  • Custom Dataset implementation with random data generation
  • LightningModule with basic linear layer
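  • Training, validation, and test step definitions (only training and validation dataloaders are defined, so test_step is not exercised by trainer.fit)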
  • Integration with DeepSpeed strategy configuration

Technical Details

Testing tools and configuration:
  • PyTorch Lightning Trainer with DeepSpeed strategy
  • 16-bit precision training
  • Single GPU configuration
  • Custom RandomDataset class for synthetic data
  • SGD optimizer with 0.1 learning rate
  • Batch size of 2 for training and validation, with 64 random samples per dataset
  • 32-dimensional input features, 2-dimensional output
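
As a quick sanity check on the figures listed above, the following sketch works out what they imply for one epoch. It uses only the standard library. The helper name batches_per_epoch is illustrative and not part of the test file, and the batch count assumes the DataLoader default of drop_last=False.

```python
import math

# Values taken from the test file: RandomDataset(32, 64), batch_size=2,
# and torch.nn.Linear(32, 2).
FEATURES = 32      # input width of the linear layer
OUTPUTS = 2        # output width of the linear layer
NUM_SAMPLES = 64   # dataset length
BATCH_SIZE = 2     # DataLoader batch size


def batches_per_epoch(num_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of batches yielded in one pass over the data (hypothetical helper)."""
    if drop_last:
        # An incomplete final batch is discarded.
        return num_samples // batch_size
    # An incomplete final batch is kept as a smaller batch.
    return math.ceil(num_samples / batch_size)


train_batches = batches_per_epoch(NUM_SAMPLES, BATCH_SIZE)
input_shape = (BATCH_SIZE, FEATURES)          # shape of one batch fed to forward()
output_shape = (BATCH_SIZE, OUTPUTS)          # shape of self(batch) before .sum()
linear_params = FEATURES * OUTPUTS + OUTPUTS  # weight matrix plus bias vector

print(train_batches, input_shape, output_shape, linear_params)
```

With one device and no gradient accumulation, this suggests about 32 training batches in the single epoch the test runs, each reducing a 2 x 2 output to a scalar loss.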

Best Practices Demonstrated

The test illustrates several practices that suit integration tests for deep learning frameworks.

Notable practices include:
  • Minimal working example approach
  • Clear separation of data, model, and training components
  • Proper logging of metrics
  • Standard Lightning module structure
  • Explicit strategy configuration
  • Hardware-specific settings for GPU acceleration
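
Because the Trainer is configured with accelerator="gpu" and devices=1, the test can only run where a GPU and the required packages are present. The test file itself contains no environment guard. The sketch below is one possible way to detect an unsuitable environment up front. The helper names missing_modules and cuda_available are hypothetical, and the list of required modules is an assumption based on the file's imports plus deepspeed itself.

```python
import importlib.util

# Assumption: these are the packages the test needs at runtime. The first two
# are imported by the test file; deepspeed is needed by DeepSpeedStrategy.
REQUIRED_MODULES = ("torch", "pytorch_lightning", "deepspeed")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be located in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def cuda_available() -> bool:
    """Report whether torch sees a CUDA device; False when torch is absent."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # imported lazily so the check works without torch installed

    return torch.cuda.is_available()


if __name__ == "__main__":
    print("missing modules:", missing_modules())
    print("CUDA available:", cuda_available())
```

A check like this could feed a pytest.mark.skipif condition so that the test is skipped, rather than failed, on machines without a GPU. Whether that is desirable depends on how the project's CI is set up.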

microsoft/deepspeed

tests/lightning/test_simple.py

# Copyright (c) Microsoft Corporation.
# SPDX-License-Identifier: Apache-2.0

# DeepSpeed Team

import torch
from pytorch_lightning import LightningModule, Trainer
from pytorch_lightning.strategies import DeepSpeedStrategy
from torch.utils.data import DataLoader, Dataset


class RandomDataset(Dataset):

    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len


class BoringModel(LightningModule):

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        loss = self(batch).sum()
        self.log("train_loss", loss)
        return {"loss": loss}

    def validation_step(self, batch, batch_idx):
        loss = self(batch).sum()
        self.log("valid_loss", loss)

    def test_step(self, batch, batch_idx):
        loss = self(batch).sum()
        self.log("test_loss", loss)

    def configure_optimizers(self):
        return torch.optim.SGD(self.layer.parameters(), lr=0.1)

    def train_dataloader(self):
        return DataLoader(RandomDataset(32, 64), batch_size=2)

    def val_dataloader(self):
        return DataLoader(RandomDataset(32, 64), batch_size=2)


def test_lightning_model():
    """Test that DeepSpeed works with a simple LightningModule that defines its own dataloaders."""

    model = BoringModel()
    trainer = Trainer(strategy=DeepSpeedStrategy(), max_epochs=1, precision=16, accelerator="gpu", devices=1)
    trainer.fit(model)