Testing OpenAI API Server Implementation with Multi-Model Workers in FastChat

This test script launches an OpenAI-compatible API server backed by multiple model workers in FastChat. It covers both standard language models and a multimodal configuration, coordinating the worker processes and their model-specific settings.

Test Coverage Overview

The test coverage focuses on launching and coordinating multiple model workers behind an OpenAI-compatible API server; a quick verification sketch follows the list. Key functionality includes:

  • Controller and API server initialization
  • Dynamic model worker allocation
  • Support for both standard and multimodal models
  • Port management and worker addressing
  • Custom tokenizer configurations
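
Once the script is up, the simplest end-to-end check is to list the models registered with the API server through its OpenAI-compatible endpoint. This is a minimal sketch, assuming the server is listening on its default localhost:8000 and that the requests library is installed:

import requests

# Ask the OpenAI-compatible server which models are available
# (assumes the FastChat API server default of localhost:8000).
resp = requests.get("http://localhost:8000/v1/models")
resp.raise_for_status()
model_ids = [m["id"] for m in resp.json()["data"]]
print(model_ids)  # e.g. ["vicuna-7b-v1.5", ...]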

Implementation Analysis

The testing approach uses a process-based architecture to simulate a production deployment: the controller, the API server, and each model worker run as separate shell-launched processes. Worker-specific command-line flags handle the differences between model types and worker variants.

The implementation leverages argument parsing for flexible test configuration and assigns each worker its own GPU through environment-based device allocation.
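
The script's os.popen calls are fire-and-forget: no handles are kept, so the workers cannot be shut down programmatically. A sketch of a hypothetical variant that retains process handles for teardown (not part of the FastChat script) might look like this:

import subprocess

procs = []

def launch_process(cmd):
    # Keep a handle so each child can be terminated on teardown.
    proc = subprocess.Popen(cmd, shell=True)
    procs.append(proc)
    return proc

def shutdown():
    # Terminate every launched process when the test run ends.
    for proc in procs:
        proc.terminate()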

Technical Details

  • Uses Python’s os.popen for fire-and-forget process launching
  • Implements argparse for CLI argument handling
  • Pins each worker to a GPU via CUDA_VISIBLE_DEVICES
  • Manages multiple worker types (model_worker, vllm_worker, sglang_worker); registration can be checked as sketched after this list
  • Assigns each worker a sequential port starting at 40000
  • Supports specialized tokenizer configurations
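
Whether each worker actually registered can be checked against the FastChat controller. A minimal sketch, assuming the controller is running on its default localhost:21001:

import requests

# Ask the controller which models have registered workers
# (assumes the FastChat controller default of localhost:21001).
resp = requests.post("http://localhost:21001/list_models")
resp.raise_for_status()
print(resp.json()["models"])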

Best Practices Demonstrated

The test implementation demonstrates several quality practices for test infrastructure, including modular process management, clear separation of concerns, and flexible configuration options; a sketch of extending the model selection follows the list.

  • Configurable model selection
  • Systematic resource allocation
  • Clean process management
  • Environment-aware setup
  • Extensible worker configuration
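
The model list itself is hard-coded. One way to make the selection configurable, in the spirit of the script's existing argparse setup, would be a hypothetical --models flag (illustrative only, not in the repository):

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--multimodal", action="store_true", default=False)
# Hypothetical extension: select model paths from the command line
# instead of the hard-coded list.
parser.add_argument("--models", nargs="+", default=["lmsys/vicuna-7b-v1.5"])
args = parser.parse_args()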

lm-sys/FastChat

tests/launch_openai_api_test_server.py

"""
Launch an OpenAI API server with multiple model workers.
"""
import os
import argparse


def launch_process(cmd):
    os.popen(cmd)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--multimodal", action="store_true", default=False)
    args = parser.parse_args()

    launch_process("python3 -m fastchat.serve.controller")
    launch_process("python3 -m fastchat.serve.openai_api_server")

    if args.multimodal:
        models = [
            ("liuhaotian/llava-v1.5-7b", "sglang_worker"),
        ]
    else:
        models = [
            ("lmsys/vicuna-7b-v1.5", "model_worker"),
            ("lmsys/fastchat-t5-3b-v1.0", "model_worker"),
            ("THUDM/chatglm-6b", "model_worker"),
            ("mosaicml/mpt-7b-chat", "model_worker"),
            ("meta-llama/Llama-2-7b-chat-hf", "vllm_worker"),
        ]

    for i, (model_path, worker_name) in enumerate(models):
        cmd = (
            f"CUDA_VISIBLE_DEVICES={i} python3 -m fastchat.serve.{worker_name} "
            f"--model-path {model_path} --port {40000+i} "
            f"--worker-address http://localhost:{40000+i} "
        )

        if "llava" in model_path.lower():
            cmd += f"--tokenizer-path llava-hf/llava-1.5-7b-hf"

        if worker_name == "vllm_worker":
            cmd += "--tokenizer hf-internal-testing/llama-tokenizer"

        launch_process(cmd)

    while True:
        pass
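
The closing busy-wait keeps the launcher alive but spins a CPU core at 100%. A gentler variant, assuming no other teardown logic is needed, could sleep between iterations:

import time

# Keep the launcher alive without burning a CPU core.
while True:
    time.sleep(60)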