
Testing CPU Adam Optimizer Performance in DeepSpeed

This test suite evaluates the performance of DeepSpeed’s CPU-based Adam optimizer implementation against PyTorch’s native Adam optimizer. It measures and compares optimizer step times on large parameter tensors, providing a straightforward per-step performance benchmark.

Test Coverage Overview

The test benchmarks the per-step performance of DeepSpeed’s CPU Adam implementation against PyTorch’s baseline optimizer.

  • Measures step() execution time across multiple iterations (see the timing sketch after this list)
  • Tests with a large-scale parameter tensor of 1 × 1024³ elements (≈4 GiB in fp32)
  • Exercises parameter groups of very different sizes (1 × 1024³ and 274,432 elements)
  • Applies identical, constant gradients before every optimizer step
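
Concretely, the timing referred to in the first bullet reduces to measuring optimizer.step() in isolation and averaging over a fixed number of iterations. The helper below is a condensed sketch of the _test_perf function from the source listing further down; gradient assignment is deliberately kept outside the timed window.

import time
import torch

def average_step_time(params, optimizer, num_iters=100):
    total = 0.0
    for _ in range(num_iters):
        for p in params:
            p.grad = torch.ones_like(p) * 2   # constant gradient, set before timing starts
        start = time.time()
        optimizer.step()                      # only the step itself is timed
        total += time.time() - start
    return total / num_iters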

Implementation Analysis

The testing approach benchmarks both optimizers with identical parameter configurations and an identical timing procedure.

  • Benchmarks both optimizer variants back-to-back on the same parameter list (see the drop-in example after this list)
  • Uses a controlled setup with a fixed number of iterations (NUM_ITERS = 100)
  • Assigns the same constant gradient before every step for a fair comparison
  • Wraps the test tensors in torch.nn.Parameter so both optimizers receive standard PyTorch parameters
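
To make the drop-in nature of the comparison concrete, the minimal sketch below constructs both optimizers the same way on a deliberately tiny parameter (the benchmark itself uses the multi-gigabyte groups listed under Technical Details). It assumes DeepSpeed is installed with its CPU Adam extension available.

import torch
from deepspeed.ops.adam import DeepSpeedCPUAdam

# Identical parameter setup; only the optimizer constructor differs.
params = [torch.nn.Parameter(torch.ones(1024))]
for optimizer_cls in (torch.optim.Adam, DeepSpeedCPUAdam):
    optimizer = optimizer_cls(params)
    params[0].grad = torch.ones_like(params[0]) * 2
    optimizer.step()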

Technical Details

  • Testing Framework: standalone Python script (no unittest/pytest harness)
  • Primary Dependencies: PyTorch, DeepSpeed
  • Test Parameters: parameter groups of 1 × 1024³ (≈1.07 billion) and 274,432 elements (a sizing estimate follows this list)
  • Device Target: CPU-specific implementation
  • Performance Metrics: Average step time calculation
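
As a rough sizing note (an estimate, not something the script reports): the large group alone holds 2³⁰ fp32 elements, and Adam-style optimizers keep two fp32 state buffers (exp_avg and exp_avg_sq) per parameter, so together with gradients the run needs on the order of 16 GiB of host RAM.

# Back-of-the-envelope memory estimate for the large parameter group (fp32 assumed).
elements = 1 * 1024**3                       # 1,073,741,824 parameters
params_gib = elements * 4 / 1024**3          # 4 GiB of parameters
grads_gib = params_gib                       # 4 GiB of gradients
adam_state_gib = 2 * params_gib              # exp_avg + exp_avg_sq: 8 GiB
print(f"~{params_gib + grads_gib + adam_state_gib:.0f} GiB total")   # ~16 GiB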

Best Practices Demonstrated

The test implementation showcases several performance testing best practices for optimizer comparisons.

  • Consistent test conditions across implementations
  • Multiple iteration averaging for reliable metrics
  • Large-scale tensor testing for real-world scenarios
  • Clean separation of test setup and execution
  • Self-contained benchmark script requiring no fixtures or external harness

microsoft/deepspeed

tests/perf/adam_test.py

# Copyright (c) Microsoft Corporation.
# SPDX-License-Identifier: Apache-2.0

# DeepSpeed Team

import torch
from deepspeed.ops.adam import DeepSpeedCPUAdam
import time

NUM_ITERS = 100


def _test_perf(param, optimizer_func):
    # Build the optimizer under test from the shared parameter list.
    optimizer = optimizer_func(param)
    avg = 0
    for _ in range(NUM_ITERS):
        # Assign the same constant gradient every iteration, outside the timed window.
        for p in param:
            p.grad = torch.ones_like(p) * 2
        start = time.time()
        optimizer.step()
        stop = time.time()
        avg += (stop - start)

    # Average step time over all iterations.
    return avg / NUM_ITERS


def _main():
    device = 'cpu'
    # One large group (~1.07 billion elements, ~4 GiB in fp32) and one small group.
    model_size = 1 * 1024**3
    group_size = [model_size, 274432]
    param = [torch.nn.Parameter(torch.ones(size, device=device)) for size in group_size]
    # Benchmark PyTorch's Adam first, then DeepSpeed's CPU Adam, on the same parameters.
    torch_time = _test_perf(param, torch.optim.Adam)
    ds_time = _test_perf(param, DeepSpeedCPUAdam)
    print(f"Step time: {torch_time=} {ds_time=}")


_main()