Testing Numerical Precision Comparison Utilities in DeepSpeed

This test utility module provides core support for DeepSpeed’s inference testing framework, focusing on dtype handling and numerical tolerance comparisons. It defines per-dtype tolerance tables and a thin wrapper around torch.allclose, covering FP32, FP16, and, when the accelerator supports it, BF16.
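For quick reference, the default (rtol, atol) pairs the module installs can be written out as a plain dictionary; the values are copied from the source shown at the end of this page, and the BF16 entry exists only when the accelerator reports BF16 support.

import torch

# Default (rtol, atol) pairs installed by get_tolerances()
DEFAULT_TOLERANCES = {
    torch.float32: (5e-4, 5e-5),
    torch.float16: (3e-2, 2e-3),
    torch.bfloat16: (4.8e-1, 3.2e-2),  # registered only on BF16-capable accelerators
}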

Test Coverage Overview

These utilities underpin the numerical comparisons made throughout DeepSpeed’s inference test suite; a sketch of typical usage follows the list below.

Key areas include:
  • Dynamic tolerance management for different precision types
  • Dtype support detection and handling
  • Custom allclose implementation with configurable tolerances
  • Edge cases for different accelerator configurations
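A minimal sketch of how a test might combine these helpers; the test body and the import path are illustrative assumptions, not code taken from the DeepSpeed suite.

import pytest
import torch

from inference_test_utils import allclose, get_dtypes

# Parametrize over whichever dtypes the current accelerator supports.
@pytest.mark.parametrize("dtype", get_dtypes())
def test_identity_kernel(dtype):
    x = torch.randn(16, 32, dtype=dtype)
    y = x.clone()          # stand-in for the kernel output under test
    assert allclose(x, y)  # tolerances resolved automatically from the dtype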

Implementation Analysis

The implementation lazily initializes module-level caches, so the tolerance table and the dtype list are built once per process and reused on every later call; a short demonstration follows the list below.

Technical patterns include:
  • Singleton-style caching for tolerances and dtypes
  • Dynamic BF16 support detection
  • Flexible tolerance configuration
  • Type-specific comparison logic
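A short demonstration of the caching behaviour this buys, assuming the module has been imported as inference_test_utils (illustrative only):

import torch

from inference_test_utils import get_tolerances

tols_first = get_tolerances()    # first call: builds the dict and probes BF16 support
tols_again = get_tolerances()    # later calls: return the cached dict unchanged
assert tols_first is tols_again  # same object, so the accelerator probe ran only once
rtol, atol = tols_first[torch.float16]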

Technical Details

Testing infrastructure utilizes:
  • PyTorch’s tensor operations and dtype system
  • DeepSpeed accelerator abstraction layer
  • Custom tolerance definitions for different precision levels
  • Global caching mechanisms for performance
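The accelerator abstraction layer is what keeps these utilities platform-neutral. Below is a minimal sketch of querying it directly; get_accelerator and is_bf16_supported are the DeepSpeed APIs the module itself uses, while the surrounding script is illustrative.

from deepspeed.accelerator import get_accelerator

acc = get_accelerator()
print(acc.device_name())    # e.g. "cuda" or "cpu", depending on the platform
if acc.is_bf16_supported():
    print("BF16 comparisons enabled")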

Best Practices Demonstrated

The utilities demonstrate sound testing practice by handling numerical precision and platform differences explicitly, rather than hard-coding them into individual tests.

Notable practices include:
  • Explicit type checking and validation
  • Configurable comparison tolerances
  • Platform-aware implementation
  • Efficient caching mechanisms
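For example, configurable tolerances let a single comparison opt out of the shared defaults without touching the global table; the import path and the (rtol, atol) values below are illustrative, not DeepSpeed-recommended settings.

import torch

from inference_test_utils import allclose

out = torch.full((8,), 1.0, dtype=torch.float16)
ref = torch.full((8,), 1.1, dtype=torch.float16)

assert not allclose(out, ref)                       # rejected under the default FP16 tolerances (3e-2, 2e-3)
assert allclose(out, ref, tolerances=(2e-1, 1e-2))  # accepted with an explicit (rtol, atol) override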

microsoft/deepspeed

tests/unit/inference/v2/inference_test_utils.py

# Copyright (c) Microsoft Corporation.
# SPDX-License-Identifier: Apache-2.0

# DeepSpeed Team

from typing import Tuple

import torch
from deepspeed.accelerator import get_accelerator

TOLERANCES = None


def get_tolerances():
    global TOLERANCES
    if TOLERANCES is None:
        TOLERANCES = {torch.float32: (5e-4, 5e-5), torch.float16: (3e-2, 2e-3)}
        if get_accelerator().is_bf16_supported():
            # Note: BF16 tolerance is higher than FP16 because of the lower precision (7 (+1) bits vs
            # 10 (+1) bits)
            TOLERANCES[torch.bfloat16] = (4.8e-1, 3.2e-2)
    return TOLERANCES


DTYPES = None


def get_dtypes(include_float=True):
    global DTYPES
    if DTYPES is None:
        DTYPES = [torch.float16, torch.float32] if include_float else [torch.float16]
        try:
            if get_accelerator().is_bf16_supported():
                DTYPES.append(torch.bfloat16)
        except (AssertionError, AttributeError):
            pass
    return DTYPES


def allclose(x, y, tolerances: Tuple[int, int] = None):
    assert x.dtype == y.dtype
    if tolerances is None:
        rtol, atol = get_tolerances()[x.dtype]
    else:
        rtol, atol = tolerances
    return torch.allclose(x, y, rtol=rtol, atol=atol)