
Testing Expert Parallel Group Formation in DeepSpeed

This test validates _get_expert_parallel_ranks, the DeepSpeed helper that computes which process ranks belong to each expert parallel and expert data parallel group when model (tensor), expert, and data parallelism are combined.

Test Coverage Overview

The test checks that expert parallel and expert data parallel groups are generated correctly for one representative distributed training configuration:

  • Computes groups for a world size of 16 (the function only builds rank lists; no processes are launched)
  • Uses a tensor parallel size of 2 and an expert parallel size of 4
  • Checks the exact rank assignment of every expert parallel and expert data parallel group
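For the 16-process configuration, the tensor parallel size alone fixes the base layout before any expert grouping happens. The sketch below uses a hypothetical helper, tp_and_dp_groups (not part of DeepSpeed), and assumes that consecutive ranks form tensor parallel groups, as the mp_group line in the test docstring shows. Data parallel groups are then the ranks that share the same offset within their tensor parallel group.

```python
def tp_and_dp_groups(world_size, tensor_parallel_size):
    """Hypothetical helper: derive tensor (model) parallel and data parallel rank groups.

    Assumes consecutive ranks share a tensor parallel group, e.g. [0, 1], [2, 3], ...
    """
    if world_size % tensor_parallel_size != 0:
        raise ValueError("world_size must be divisible by tensor_parallel_size")
    # Consecutive blocks of `tensor_parallel_size` ranks form one tensor parallel group
    tp_groups = [list(range(start, start + tensor_parallel_size))
                 for start in range(0, world_size, tensor_parallel_size)]
    # Ranks at the same offset inside their tensor parallel group hold the same shard
    dp_groups = [list(range(offset, world_size, tensor_parallel_size))
                 for offset in range(tensor_parallel_size)]
    return tp_groups, dp_groups


tp_groups, dp_groups = tp_and_dp_groups(16, 2)
print(tp_groups[:3])  # [[0, 1], [2, 3], [4, 5]]
print(dp_groups)      # [[0, 2, 4, 6, 8, 10, 12, 14], [1, 3, 5, 7, 9, 11, 13, 15]]
```

These two data parallel groups match the data_parallel_group line in the test docstring.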

Implementation Analysis

The test calls _get_expert_parallel_ranks with a single fixed configuration and compares both returned lists against hard-coded expected values.

The test verifies:
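  • Expert parallel groups: each group contains the correct process ranks, in the correct order
  • Expert data parallel groups: ranks holding the same expert shard are grouped correctly
  • Interaction with tensor parallelism: expert groups are built only from ranks in the same data parallel group, so tensor parallel partners (for example ranks 0 and 1) never share an expert parallel group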
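One way to reproduce the expected groups is sketched below. It is a standalone re-derivation written for illustration, not DeepSpeed's implementation. It assumes each data parallel group is split into consecutive slices of expert_parallel_size ranks (the expert parallel groups), and that ranks at the same position across those slices form the expert data parallel groups. The function name expert_parallel_ranks is hypothetical.

```python
def expert_parallel_ranks(world_size, tensor_parallel_size, expert_parallel_size):
    """Hypothetical re-derivation of expert parallel (EP) and expert data parallel (EDP) groups."""
    if world_size % tensor_parallel_size != 0:
        raise ValueError("world_size must be divisible by tensor_parallel_size")
    dp_world_size = world_size // tensor_parallel_size
    if dp_world_size % expert_parallel_size != 0:
        raise ValueError("data parallel size must be divisible by expert_parallel_size")

    # Data parallel groups: ranks sharing the same offset within a tensor parallel group
    dp_groups = [list(range(offset, world_size, tensor_parallel_size))
                 for offset in range(tensor_parallel_size)]

    ep_groups, edp_groups = [], []
    for dp_ranks in dp_groups:
        # Slice the data parallel group into consecutive EP groups, e.g. [0, 2, 4, 6], [8, 10, 12, 14]
        slices = [dp_ranks[i:i + expert_parallel_size]
                  for i in range(0, dp_world_size, expert_parallel_size)]
        ep_groups.extend(slices)
        # Ranks at the same position in each slice hold the same experts, e.g. [0, 8], [2, 10]
        edp_groups.extend(list(ranks) for ranks in zip(*slices))
    return ep_groups, edp_groups


ep, edp = expert_parallel_ranks(16, 2, 4)
print(ep)   # [[0, 2, 4, 6], [8, 10, 12, 14], [1, 3, 5, 7], [9, 11, 13, 15]]
print(edp)  # [[0, 8], [2, 10], [4, 12], [6, 14], [1, 9], [3, 11], [5, 13], [7, 15]]
```

Each expert data parallel group takes one rank from every slice of a data parallel group, so its size is dp_world_size // expert_parallel_size, which is 8 // 4 = 2 in this configuration.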

Technical Details

The test relies on:
  • Python’s built-in assert statements for validation
  • The deepspeed.utils.groups module, from which _get_expert_parallel_ranks is imported
  • A fixed test configuration: world_size=16, tensor_parallel_size=2, expert_parallel_size=4
  • Exact list comparisons, so both group membership and group order are verified
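Beyond comparing literal lists, the configuration implies structural properties that any correct grouping should satisfy. The sketch below checks them against the expected lists copied from the test. The helpers is_partition and check_layout are hypothetical, and the properties are derived from the expected output rather than from DeepSpeed documentation: each grouping covers all 16 ranks exactly once, group sizes follow from the configuration, and an expert parallel group never shares more than one rank with an expert data parallel group.

```python
from itertools import product

# Expected groups copied from test_get_expert_parallel_ranks
EXPERT_PARALLEL = [[0, 2, 4, 6], [8, 10, 12, 14], [1, 3, 5, 7], [9, 11, 13, 15]]
EXPERT_DATA_PARALLEL = [[0, 8], [2, 10], [4, 12], [6, 14], [1, 9], [3, 11], [5, 13], [7, 15]]


def is_partition(groups, world_size):
    """Hypothetical helper: True if every rank in [0, world_size) appears in exactly one group."""
    flat = [rank for group in groups for rank in group]
    return sorted(flat) == list(range(world_size))


def check_layout(ep_groups, edp_groups, world_size, tp_size, ep_size):
    """Hypothetical helper: raise AssertionError if a structural property fails, else return True."""
    dp_world_size = world_size // tp_size
    # Each grouping must cover all ranks exactly once
    assert is_partition(ep_groups, world_size)
    assert is_partition(edp_groups, world_size)
    # Group counts and sizes implied by the configuration
    assert len(ep_groups) == world_size // ep_size
    assert all(len(g) == ep_size for g in ep_groups)
    assert all(len(g) == dp_world_size // ep_size for g in edp_groups)
    # An EP group and an EDP group never share more than one rank
    assert all(len(set(a) & set(b)) <= 1 for a, b in product(ep_groups, edp_groups))
    return True


print(check_layout(EXPERT_PARALLEL, EXPERT_DATA_PARALLEL, world_size=16, tp_size=2, ep_size=4))  # True
```

Checks like these complement the literal comparison: they still hold if the group order changes, so they can help tell an ordering change apart from a genuinely wrong layout.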

Best Practices Demonstrated

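The test illustrates several useful testing practices:

  • A docstring that spells out the tensor, data, expert, and expert data parallel layout for the tested configuration
  • Validation of every returned group rather than a sample
  • Expected outputs written out in full, so a failure shows exactly which groups differ
  • One assertion per returned value, keeping the test short and easy to read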

microsoft/deepspeed

tests/unit/utils/test_groups.py

            
# Copyright (c) Microsoft Corporation.
# SPDX-License-Identifier: Apache-2.0

# DeepSpeed Team

from deepspeed.utils.groups import _get_expert_parallel_ranks


def test_get_expert_parallel_ranks():
    """
    Example - E + M + D parallel
    world_size = 16
    model_degree = 2
    expert_degree = 4  # number of ranks in each expert parallel group
    mp_group = [0, 1], [2, 3], [4, 5], ...
    data_parallel_group = [0, 2, 4, 6, 8, 10, 12, 14], [1, 3, 5, 7, 9, 11, 13, 15]
    expert_parallel_group = [0, 2, 4, 6], [8, 10, 12, 14], [1, 3, 5, 7], [9, 11, 13, 15]
    expert_data_parallel_group = [0, 8], [2, 10], [4, 12], [6, 14], [1, 9], [3, 11], [5, 13], [7, 15]
    """
    expert_parallel_groups, expert_data_parallel_groups = _get_expert_parallel_ranks(world_size=16,
                                                                                     tensor_parallel_size_=2,
                                                                                     expert_parallel_size_=4)
    assert expert_parallel_groups == [
        [0, 2, 4, 6],
        [8, 10, 12, 14],
        [1, 3, 5, 7],
        [9, 11, 13, 15],
    ]
    assert expert_data_parallel_groups == [
        [0, 8],
        [2, 10],
        [4, 12],
        [6, 14],
        [1, 9],
        [3, 11],
        [5, 13],
        [7, 15],
    ]