
Testing RoPE Warp Size Alignment Implementation in DeepSpeed

This test suite validates the RoPE (Rotary Position Embedding) implementation in DeepSpeed’s inference operations, focusing on warp size alignment across different numbers of attention heads. It ensures correct position encoding behavior for transformer models during inference.

Test Coverage Overview

The test suite covers warp size alignment verification for RoPE operations across varying attention head configurations (64, 32, 16, 8 heads).
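To illustrate how the head count interacts with warp-size alignment, here is one simple formula that reproduces the threads-per-head mapping quoted in the test's inline comment (64, 32, 16, 8 heads → 4, 8, 16, 32 threads). This is an illustrative reconstruction assuming a 256-thread block; the actual selection logic lives in `apply_rotary_pos_emb.cu` and may differ.

```python
WARP_SIZE = 32           # CUDA warp size on NVIDIA hardware
THREADS_PER_BLOCK = 256  # assumed block size, for illustration only

def threads_per_head(num_heads: int) -> int:
    # Distribute the block's threads across heads, never exceeding one warp.
    # Reproduces the mapping noted in the test's comment:
    # 64 -> 4, 32 -> 8, 16 -> 16, 8 -> 32
    return min(THREADS_PER_BLOCK // num_heads, WARP_SIZE)
```

The point of the test is that each of these configurations keeps the per-head thread count a power of two that divides the warp evenly, so no warp straddles two heads.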

Key functionality tested includes:
  • Position embedding application with different head configurations
  • Tensor shape and alignment validation
  • GPU-specific operation verification
  • Memory alignment optimization checks
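For context on what the kernel under test computes, the sketch below is a plain PyTorch reference of one common RoPE formulation (interleaved channel pairs, consistent with the test's `rotate_half=False` path). It is not DeepSpeed's kernel, and the default `theta` is an assumption; it only shows the rotation the test exercises.

```python
import torch

def apply_rope_reference(x: torch.Tensor, rotary_dim: int, offset: int = 0,
                         theta: float = 10000.0) -> torch.Tensor:
    """Rotate the first `rotary_dim` channels of x, shaped
    (batch, heads, seq_len, head_dim), using interleaved RoPE pairs."""
    batch, heads, seq_len, head_dim = x.shape
    # One inverse frequency per channel pair
    inv_freq = 1.0 / (theta ** (torch.arange(0, rotary_dim, 2,
                                             dtype=torch.float32) / rotary_dim))
    positions = torch.arange(offset, offset + seq_len, dtype=torch.float32)
    angles = torch.outer(positions, inv_freq)       # (seq_len, rotary_dim/2)
    cos, sin = angles.cos(), angles.sin()
    x_rot = x[..., :rotary_dim].float()
    x1, x2 = x_rot[..., 0::2], x_rot[..., 1::2]     # interleaved pairs
    rotated = torch.empty_like(x_rot)
    rotated[..., 0::2] = x1 * cos - x2 * sin        # 2D rotation per pair
    rotated[..., 1::2] = x1 * sin + x2 * cos
    out = x.clone()
    out[..., :rotary_dim] = rotated.to(x.dtype)     # channels past rotary_dim pass through
    return out
```

Because each channel pair undergoes a pure rotation, the transform preserves vector norms, which is a useful sanity check when comparing against a fused kernel.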

Implementation Analysis

The testing approach uses pytest parametrization to validate RoPE operations across multiple head configurations. It employs CUDA-specific testing patterns, ensuring proper device placement and tensor operations.

Framework features utilized:
  • pytest.mark decorators for test categorization
  • Parametrized test cases for multiple configurations
  • Device-specific test skipping logic
  • DeepSpeed’s inference builder integration

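The parametrize-plus-skip pattern these features describe can be sketched as a minimal skeleton. This is a hypothetical stand-in, not the DeepSpeed test: it checks CUDA availability via `torch.cuda.is_available()` rather than DeepSpeed's accelerator API, and runs a trivial tensor operation in place of the real kernel call.

```python
import pytest
import torch

# One test case per head configuration, skipped at runtime on non-CUDA hosts.
@pytest.mark.parametrize("num_heads", [64, 32, 16, 8])
def test_kernel_runs(num_heads):
    if not torch.cuda.is_available():
        pytest.skip("This test runs only on GPU")
    # Placeholder workload standing in for the real kernel invocation
    x = torch.randn(1, num_heads, 16, 32, device="cuda")
    assert x.shape[1] == num_heads
```

The actual suite additionally uses a module-level `pytest.skip(..., allow_module_level=True)` so that the whole file is skipped when the inference ops cannot even be built, before any test case is collected.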
Technical Details

Testing tools and configuration:
  • PyTest framework with CUDA support
  • DeepSpeed InferenceBuilder for operation loading
  • PyTorch tensor operations
  • CUDA device management
  • Configurable parameters: batch size, sequence length, head dimensions
  • Rotary dimension and offset controls

Best Practices Demonstrated

The test implementation demonstrates several quality practices for testing GPU operations.

Notable practices include:
  • Proper device compatibility checking
  • Systematic parameter variation
  • Clean test isolation
  • Efficient resource handling
  • Clear test case organization
  • Hardware-specific test skipping

microsoft/deepspeed

tests/unit/ops/transformer/inference/test_rope.py

# Copyright (c) Microsoft Corporation.
# SPDX-License-Identifier: Apache-2.0

# DeepSpeed Team

import pytest
import torch
import deepspeed
from deepspeed.ops.op_builder import InferenceBuilder
from deepspeed.accelerator import get_accelerator

if not deepspeed.ops.__compatible_ops__[InferenceBuilder.NAME]:
    pytest.skip("Inference ops are not available on this system", allow_module_level=True)


@pytest.mark.inference_ops
@pytest.mark.parametrize("num_heads", [64, 32, 16, 8])
def test_rope_warp_size_alignment(num_heads):
    if get_accelerator().device_name() != "cuda":
        pytest.skip("This test runs only on GPU")

    batch = 1
    head = 8
    seq_len = 1024
    head_dim = 32
    rotary_dim = 32
    offset = 8
    rotate_half = False
    rope_theta = 2

    cuda0 = torch.device('cuda:0')
    query = torch.randn(batch, head, seq_len, head_dim, device=cuda0)
    key = torch.randn(batch, head, seq_len, head_dim, device=cuda0)

    inference = InferenceBuilder().load()
    # For num_heads values of 64, 32, 16, 8
    # corresponding threads_per_head (defined in apply_rotary_pos_emb.cu) values are 4, 8, 16, 32
    inference.apply_rotary_pos_emb(query, key, rotary_dim, offset, num_heads, rotate_half, rope_theta)