
Testing CPU Adam Optimizer Performance in DeepSpeed

This test suite evaluates the performance of DeepSpeed’s CPU-based Adam optimizer implementation against PyTorch’s native Adam optimizer. It measures and compares optimizer step times on large parameter tensors, providing a straightforward per-step performance benchmark.

Test Coverage Overview

The test benchmarks the per-step performance of DeepSpeed’s CPU Adam implementation against PyTorch’s baseline optimizer.

  • Measures step() execution time across multiple iterations (see the timing sketch after this list)
  • Tests with a large-scale parameter tensor of 1 × 1024³ elements (≈4 GiB in fp32)
  • Exercises parameter groups of very different sizes (1 × 1024³ and 274,432 elements)
  • Applies identical, constant gradients before every optimizer step
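
Concretely, the timing referred to in the first bullet reduces to measuring optimizer.step() in isolation and averaging over a fixed number of iterations. The helper below is a condensed sketch of the _test_perf function from the source listing further down; gradient assignment is deliberately kept outside the timed window.

import time
import torch

def average_step_time(params, optimizer, num_iters=100):
    total = 0.0
    for _ in range(num_iters):
        for p in params:
            p.grad = torch.ones_like(p) * 2   # constant gradient, set before timing starts
        start = time.time()
        optimizer.step()                      # only the step itself is timed
        total += time.time() - start
    return total / num_iters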

Implementation Analysis

The testing approach benchmarks both optimizers with identical parameter configurations and an identical timing procedure.

  • Benchmarks both optimizer variants back-to-back on the same parameter list (see the drop-in example after this list)
  • Uses a controlled setup with a fixed number of iterations (NUM_ITERS = 100)
  • Assigns the same constant gradient before every step for a fair comparison
  • Wraps the test tensors in torch.nn.Parameter so both optimizers receive standard PyTorch parameters
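
To make the drop-in nature of the comparison concrete, the minimal sketch below constructs both optimizers the same way on a deliberately tiny parameter (the benchmark itself uses the multi-gigabyte groups listed under Technical Details). It assumes DeepSpeed is installed with its CPU Adam extension available.

import torch
from deepspeed.ops.adam import DeepSpeedCPUAdam

# Identical parameter setup; only the optimizer constructor differs.
params = [torch.nn.Parameter(torch.ones(1024))]
for optimizer_cls in (torch.optim.Adam, DeepSpeedCPUAdam):
    optimizer = optimizer_cls(params)
    params[0].grad = torch.ones_like(params[0]) * 2
    optimizer.step()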

Technical Details

  • Testing Framework: standalone Python script (no unittest/pytest harness)
  • Primary Dependencies: PyTorch, DeepSpeed
  • Test Parameters: parameter groups of 1 × 1024³ (≈1.07 billion) and 274,432 elements (a sizing estimate follows this list)
  • Device Target: CPU-specific implementation
  • Performance Metrics: Average step time calculation
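
As a rough sizing note (an estimate, not something the script reports): the large group alone holds 2³⁰ fp32 elements, and Adam-style optimizers keep two fp32 state buffers (exp_avg and exp_avg_sq) per parameter, so together with gradients the run needs on the order of 16 GiB of host RAM.

# Back-of-the-envelope memory estimate for the large parameter group (fp32 assumed).
elements = 1 * 1024**3                       # 1,073,741,824 parameters
params_gib = elements * 4 / 1024**3          # 4 GiB of parameters
grads_gib = params_gib                       # 4 GiB of gradients
adam_state_gib = 2 * params_gib              # exp_avg + exp_avg_sq: 8 GiB
print(f"~{params_gib + grads_gib + adam_state_gib:.0f} GiB total")   # ~16 GiB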

Best Practices Demonstrated

The test implementation showcases several performance testing best practices for optimizer comparisons.

  • Consistent test conditions across implementations
  • Multiple iteration averaging for reliable metrics
  • Large-scale tensor testing for real-world scenarios
  • Clean separation of test setup and execution
  • Self-contained benchmark script requiring no fixtures or external harness

microsoft/deepspeed

tests/perf/adam_test.py

# Copyright (c) Microsoft Corporation.
# SPDX-License-Identifier: Apache-2.0

# DeepSpeed Team

import torch
from deepspeed.ops.adam import DeepSpeedCPUAdam
import time

NUM_ITERS = 100


def _test_perf(param, optimizer_func):
    # Build the optimizer under test from the shared parameter list.
    optimizer = optimizer_func(param)
    avg = 0
    for _ in range(NUM_ITERS):
        # Assign the same constant gradient every iteration, outside the timed window.
        for p in param:
            p.grad = torch.ones_like(p) * 2
        start = time.time()
        optimizer.step()
        stop = time.time()
        avg += (stop - start)

    # Average step time over all iterations.
    return avg / NUM_ITERS


def _main():
    device = 'cpu'
    # One large group (~1.07 billion elements, ~4 GiB in fp32) and one small group.
    model_size = 1 * 1024**3
    group_size = [model_size, 274432]
    param = [torch.nn.Parameter(torch.ones(size, device=device)) for size in group_size]
    # Benchmark PyTorch's Adam first, then DeepSpeed's CPU Adam, on the same parameters.
    torch_time = _test_perf(param, torch.optim.Adam)
    ds_time = _test_perf(param, DeepSpeedCPUAdam)
    print(f"Step time: {torch_time=} {ds_time=}")


_main()