
Testing RoPE Warp Size Alignment Implementation in DeepSpeed

This test suite validates the RoPE (Rotary Position Embedding) implementation in DeepSpeed’s inference operations, focusing on warp size alignment across different numbers of attention heads. It ensures correct position encoding behavior for transformer models during inference.

Test Coverage Overview

The test suite covers warp size alignment verification for RoPE operations across varying attention head configurations (64, 32, 16, 8 heads).
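To illustrate how the head count interacts with warp-size alignment, here is one simple formula that reproduces the threads-per-head mapping quoted in the test's inline comment (64, 32, 16, 8 heads → 4, 8, 16, 32 threads). This is an illustrative reconstruction assuming a 256-thread block; the actual selection logic lives in `apply_rotary_pos_emb.cu` and may differ.

```python
WARP_SIZE = 32           # CUDA warp size on NVIDIA hardware
THREADS_PER_BLOCK = 256  # assumed block size, for illustration only

def threads_per_head(num_heads: int) -> int:
    # Distribute the block's threads across heads, never exceeding one warp.
    # Reproduces the mapping noted in the test's comment:
    # 64 -> 4, 32 -> 8, 16 -> 16, 8 -> 32
    return min(THREADS_PER_BLOCK // num_heads, WARP_SIZE)
```

The point of the test is that each of these configurations keeps the per-head thread count a power of two that divides the warp evenly, so no warp straddles two heads.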

Key functionality tested includes:
  • Position embedding application with different head configurations
  • Tensor shape and alignment validation
  • GPU-specific operation verification
  • Memory alignment optimization checks
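For context on what the kernel under test computes, the sketch below is a plain PyTorch reference of one common RoPE formulation (interleaved channel pairs, consistent with the test's `rotate_half=False` path). It is not DeepSpeed's kernel, and the default `theta` is an assumption; it only shows the rotation the test exercises.

```python
import torch

def apply_rope_reference(x: torch.Tensor, rotary_dim: int, offset: int = 0,
                         theta: float = 10000.0) -> torch.Tensor:
    """Rotate the first `rotary_dim` channels of x, shaped
    (batch, heads, seq_len, head_dim), using interleaved RoPE pairs."""
    batch, heads, seq_len, head_dim = x.shape
    # One inverse frequency per channel pair
    inv_freq = 1.0 / (theta ** (torch.arange(0, rotary_dim, 2,
                                             dtype=torch.float32) / rotary_dim))
    positions = torch.arange(offset, offset + seq_len, dtype=torch.float32)
    angles = torch.outer(positions, inv_freq)       # (seq_len, rotary_dim/2)
    cos, sin = angles.cos(), angles.sin()
    x_rot = x[..., :rotary_dim].float()
    x1, x2 = x_rot[..., 0::2], x_rot[..., 1::2]     # interleaved pairs
    rotated = torch.empty_like(x_rot)
    rotated[..., 0::2] = x1 * cos - x2 * sin        # 2D rotation per pair
    rotated[..., 1::2] = x1 * sin + x2 * cos
    out = x.clone()
    out[..., :rotary_dim] = rotated.to(x.dtype)     # channels past rotary_dim pass through
    return out
```

Because each channel pair undergoes a pure rotation, the transform preserves vector norms, which is a useful sanity check when comparing against a fused kernel.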

Implementation Analysis

The testing approach uses pytest parametrization to validate RoPE operations across multiple head configurations. It employs CUDA-specific testing patterns, ensuring proper device placement and tensor operations.

Framework features utilized:
  • pytest.mark decorators for test categorization
  • Parametrized test cases for multiple configurations
  • Device-specific test skipping logic
  • DeepSpeed’s inference builder integration

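The parametrize-plus-skip pattern these features describe can be sketched as a minimal skeleton. This is a hypothetical stand-in, not the DeepSpeed test: it checks CUDA availability via `torch.cuda.is_available()` rather than DeepSpeed's accelerator API, and runs a trivial tensor operation in place of the real kernel call.

```python
import pytest
import torch

# One test case per head configuration, skipped at runtime on non-CUDA hosts.
@pytest.mark.parametrize("num_heads", [64, 32, 16, 8])
def test_kernel_runs(num_heads):
    if not torch.cuda.is_available():
        pytest.skip("This test runs only on GPU")
    # Placeholder workload standing in for the real kernel invocation
    x = torch.randn(1, num_heads, 16, 32, device="cuda")
    assert x.shape[1] == num_heads
```

The actual suite additionally uses a module-level `pytest.skip(..., allow_module_level=True)` so that the whole file is skipped when the inference ops cannot even be built, before any test case is collected.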
Technical Details

Testing tools and configuration:
  • PyTest framework with CUDA support
  • DeepSpeed InferenceBuilder for operation loading
  • PyTorch tensor operations
  • CUDA device management
  • Configurable parameters: batch size, sequence length, head dimensions
  • Rotary dimension and offset controls

Best Practices Demonstrated

The test implementation demonstrates several quality practices for testing GPU operations.

Notable practices include:
  • Proper device compatibility checking
  • Systematic parameter variation
  • Clean test isolation
  • Efficient resource handling
  • Clear test case organization
  • Hardware-specific test skipping

microsoft/deepspeed

tests/unit/ops/transformer/inference/test_rope.py

# Copyright (c) Microsoft Corporation.
# SPDX-License-Identifier: Apache-2.0

# DeepSpeed Team

import pytest
import torch
import deepspeed
from deepspeed.ops.op_builder import InferenceBuilder
from deepspeed.accelerator import get_accelerator

if not deepspeed.ops.__compatible_ops__[InferenceBuilder.NAME]:
    pytest.skip("Inference ops are not available on this system", allow_module_level=True)


@pytest.mark.inference_ops
@pytest.mark.parametrize("num_heads", [64, 32, 16, 8])
def test_rope_warp_size_alignment(num_heads):
    if get_accelerator().device_name() != "cuda":
        pytest.skip("This test runs only on GPU")

    batch = 1
    head = 8
    seq_len = 1024
    head_dim = 32
    rotary_dim = 32
    offset = 8
    rotate_half = False
    rope_theta = 2

    cuda0 = torch.device('cuda:0')
    query = torch.randn(batch, head, seq_len, head_dim, device=cuda0)
    key = torch.randn(batch, head, seq_len, head_dim, device=cuda0)

    inference = InferenceBuilder().load()
    # For num_heads values of 64, 32, 16, 8
    # corresponding threads_per_head (defined in apply_rotary_pos_emb.cu) values are 4, 8, 16, 32
    inference.apply_rotary_pos_emb(query, key, rotary_dim, offset, num_heads, rotate_half, rope_theta)