Testing Transformer Inference Numerical Accuracy in DeepSpeed
This test utility module provides the shared testing infrastructure for DeepSpeed's transformer inference operations, focusing on numerical accuracy and dtype compatibility. It implements per-dtype tolerance management and tensor comparison utilities covering FP32, FP16, and BF16.
Source: `tests/unit/ops/transformer/inference/inference_test_utils.py` in `microsoft/deepspeed`:
```python
# Copyright (c) Microsoft Corporation.
# SPDX-License-Identifier: Apache-2.0

# DeepSpeed Team

import torch
from deepspeed.accelerator import get_accelerator

# Per-dtype (rtol, atol) pairs, built lazily so the accelerator is only
# queried once a test actually asks for tolerances.
TOLERANCES = None


def get_tolerances():
    global TOLERANCES
    if TOLERANCES is None:
        TOLERANCES = {torch.float32: (5e-4, 5e-5), torch.float16: (3e-2, 2e-3)}
        if get_accelerator().is_bf16_supported():
            # Note: BF16 tolerance is higher than FP16 because of the lower precision (7 (+1) bits vs
            # 10 (+1) bits)
            TOLERANCES[torch.bfloat16] = (4.8e-1, 3.2e-2)
    return TOLERANCES
```
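The mantissa-width reasoning in that comment can be verified directly with `torch.finfo`; the sketch below is illustrative and not part of the module:

```python
import torch

# BF16 keeps FP32's 8 exponent bits but only 7 (+1 implicit) mantissa bits,
# while FP16 carries 10 (+1), so BF16's machine epsilon is roughly 8x coarser.
print(torch.finfo(torch.float16).eps)   # 0.0009765625 (2**-10)
print(torch.finfo(torch.bfloat16).eps)  # 0.0078125    (2**-7)
print(torch.finfo(torch.float32).eps)   # ~1.19e-07    (2**-23)
```

The same lazy-initialization pattern is applied to the list of dtypes the active accelerator supports, followed by the dtype-aware `allclose` wrapper: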
```python
# Supported dtypes depend on the active accelerator (CUDA, ROCm, CPU, ...),
# so they are likewise cached on first use.
DTYPES = None


def get_dtypes():
    global DTYPES
    if DTYPES is None:
        DTYPES = get_accelerator().supported_dtypes()
    return DTYPES


def allclose(x, y):
    # Both tensors must share a dtype so a single (rtol, atol) pair applies.
    assert x.dtype == y.dtype
    rtol, atol = get_tolerances()[x.dtype]
    return torch.allclose(x, y, rtol=rtol, atol=atol)
```
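In a test, this wrapper stands in for a bare `torch.allclose` so the tolerance automatically tracks the dtype under test. A hypothetical usage sketch, where `run_kernel` and `run_reference` are stand-ins for a DeepSpeed inference op and its PyTorch reference implementation:

```python
import pytest
import torch


@pytest.mark.parametrize("dtype", get_dtypes())
def test_op_matches_reference(dtype):
    # `run_kernel` / `run_reference` are hypothetical; any op/reference pair fits.
    x = torch.randn(4, 256, dtype=dtype, device=get_accelerator().device_name())
    expected = run_reference(x)
    actual = run_kernel(x)
    assert allclose(actual, expected)  # rtol/atol chosen per `dtype`
```

For cases where decimal-place semantics are preferable, the module also provides a NumPy-backed comparison: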
```python
def assert_almost_equal(x, y, decimal=2, err_msg=''):
    import numpy.testing as npt

    # NumPy has no bfloat16 dtype, so BF16 tensors are upcast to FP32
    # before being detached, moved to the CPU, and converted.
    if isinstance(x, torch.Tensor):
        if x.dtype == torch.bfloat16:
            x = x.float()
        x = x.cpu().detach().numpy()
    if isinstance(y, torch.Tensor):
        if y.dtype == torch.bfloat16:
            y = y.float()
        y = y.cpu().detach().numpy()
    npt.assert_array_almost_equal(x, y, err_msg=err_msg, decimal=decimal)
```
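For reference, NumPy's `assert_array_almost_equal` with `decimal=2` asserts an elementwise absolute difference below `1.5 * 10**-2`. A quick standalone illustration (not part of the module):

```python
import numpy.testing as npt

# Passes: |1.014 - 1.000| = 0.014 < 1.5 * 10**-2
npt.assert_array_almost_equal([1.000], [1.014], decimal=2)
# Would raise AssertionError: 0.016 > 0.015
# npt.assert_array_almost_equal([1.000], [1.016], decimal=2)
```

Finally, a small debugging helper reports where two tensors diverge the most: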
```python
def max_diff(a, b):
    # Upcast to FP32 and flatten so a single index addresses both tensors.
    a = a.to(torch.float32).flatten()
    b = b.to(torch.float32).flatten()
    diff = torch.abs(a - b)
    # Index of the largest absolute difference (torch.argmax(diff) would be
    # the more direct equivalent of sorting and taking the last index).
    max_diff_indices = torch.argsort(diff)[-1]
    print("Max difference indices:", max_diff_indices)
    print("Max difference values:", diff[max_diff_indices])
    print(f"{a[max_diff_indices]} vs {b[max_diff_indices]}")
    # Note: returns the flattened index of the worst element, not the
    # difference value itself.
    return max_diff_indices
```
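When an `allclose` check fails, `max_diff` can pinpoint the worst-offending element. A minimal sketch with synthetic tensors (the perturbation is chosen to exceed the FP32 tolerances above):

```python
import torch

ref = torch.randn(2, 8, dtype=torch.float32)
out = ref + 1e-3 * torch.randn_like(ref)  # noise above rtol=5e-4 / atol=5e-5

if not allclose(out, ref):
    idx = max_diff(out, ref)  # prints the index, |diff|, and both values
```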