
Testing CLI Model Inference Workflows in FastChat

This test suite validates FastChat's command-line interface for model inference, focusing on different GPU configurations and model-loading scenarios. It checks that loading and inference work reliably across a range of popular open language models.

Test Coverage Overview

The test suite provides comprehensive coverage for CLI-based model inference, spanning multiple scenarios:
  • Single GPU model loading and inference
  • Multi-GPU configurations with memory management
  • 8-bit quantization testing
  • Hugging Face API integration testing
The suite runs against a diverse set of models, including Vicuna, LongChat, FastChat-T5, Llama-2, ChatGLM, MPT, Falcon, and RWKV.

Implementation Analysis

Each test function launches the CLI as a subprocess through the run_cmd utility from fastchat.utils and targets one deployment scenario: single-GPU inference, multi-GPU inference with a per-GPU memory cap, 8-bit quantized loading, or the Hugging Face API entry point. Prompts are fed to the CLI through stdin redirection from test_cli_inputs.txt, and the model lists mix Hugging Face Hub identifiers with local checkpoint paths.
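The run_cmd utility is a thin wrapper around shell execution. A minimal sketch of such a helper, assuming it simply echoes the command and returns the shell's exit status (the actual fastchat.utils implementation may differ), looks like this:

import os


def run_cmd(cmd: str) -> int:
    # Echo the command for easier debugging, then run it through the shell
    # and return the wait status (0 on success).
    print(cmd, flush=True)
    return os.system(cmd)

Because every test only compares the result against zero, the exact encoding of the return value does not matter; any non-zero status is treated as a failure.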

Technical Details

Key technical components include:
  • Subprocess-based CLI execution through the run_cmd utility
  • Per-GPU memory caps for multi-GPU runs (--max-gpu-memory)
  • 8-bit quantized loading (--load-8bit)
  • Multi-GPU distribution testing (--num-gpus)
  • Model path handling for both local checkpoints and Hugging Face Hub models
  • Programmatic-style inference with prompts redirected from test_cli_inputs.txt (see the sketch after this list)
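The flag combinations listed above can be captured in a single parametrized helper. The sketch below is not part of the repository; it only illustrates how the options used in the test file (--style programmatic, --num-gpus, --max-gpu-memory, --load-8bit) combine into one invocation, and run_cli_inference is a hypothetical name:

import subprocess
from typing import Optional


def run_cli_inference(
    model_path: str,
    num_gpus: int = 1,
    max_gpu_memory: Optional[str] = None,
    load_8bit: bool = False,
    input_file: str = "test_cli_inputs.txt",
) -> int:
    # Build one fastchat.serve.cli command from the same flags the tests use.
    cmd = f"python3 -m fastchat.serve.cli --model-path {model_path} --style programmatic"
    if num_gpus > 1:
        cmd += f" --num-gpus {num_gpus}"
    if max_gpu_memory is not None:
        cmd += f" --max-gpu-memory {max_gpu_memory}"
    if load_8bit:
        cmd += " --load-8bit"
    # shell=True so the shell handles the stdin redirection, as in the test file.
    return subprocess.run(f"{cmd} < {input_file}", shell=True).returncode

With such a helper, the multi-GPU case would reduce to run_cli_inference("lmsys/vicuna-13b-v1.3", num_gpus=2, max_gpu_memory="14Gib").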

Best Practices Demonstrated

The test suite exemplifies several testing best practices:
  • Modular test functions, one per deployment scenario
  • Broad model-family coverage driven by simple model lists
  • Graceful early termination when a command exits with a non-zero status (see the sketch after this list)
  • Systematic testing of resource configurations (GPU count, memory caps, quantization)
  • Clear separation of test scenarios
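The early-exit pattern (returning on the first non-zero exit code) keeps one broken model from obscuring the output of later runs. The variant below is only an illustrative sketch, not how the repository structures its tests; run_until_failure and build_cmd are hypothetical names:

import sys

from fastchat.utils import run_cmd


def run_until_failure(models, build_cmd):
    # Mirror the test file's early-termination pattern, but report which
    # model failed and with what status before stopping.
    for model_path in models:
        ret = run_cmd(build_cmd(model_path))
        if ret != 0:
            print(f"Aborting: {model_path} exited with status {ret}", file=sys.stderr)
            return ret
    return 0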

lm-sys/fastchat

tests/test_cli.py

"""Test command line interface for model inference."""
import os

from fastchat.utils import run_cmd


def test_single_gpu():
    models = [
        "lmsys/vicuna-7b-v1.5",
        "lmsys/longchat-7b-16k",
        "lmsys/fastchat-t5-3b-v1.0",
        "meta-llama/Llama-2-7b-chat-hf",
        "THUDM/chatglm-6b",
        "THUDM/chatglm2-6b",
        "mosaicml/mpt-7b-chat",
        "tiiuae/falcon-7b-instruct",
        "~/model_weights/alpaca-7b",
        "~/model_weights/RWKV-4-Raven-7B-v11x-Eng99%-Other1%-20230429-ctx8192.pth",
    ]

    for model_path in models:
        # Skip local checkpoints that are not present on this machine.
        if "model_weights" in model_path and not os.path.exists(
            os.path.expanduser(model_path)
        ):
            continue
        # Run the CLI in programmatic mode, feeding prompts from test_cli_inputs.txt.
        cmd = (
            f"python3 -m fastchat.serve.cli --model-path {model_path} "
            f"--style programmatic < test_cli_inputs.txt"
        )
        ret = run_cmd(cmd)
        # Stop at the first failing model (non-zero exit status).
        if ret != 0:
            return

        print("")


def test_multi_gpu():
    models = [
        "lmsys/vicuna-13b-v1.3",
    ]

    for model_path in models:
        # Split the 13B model across two GPUs with a per-GPU memory cap.
        cmd = (
            f"python3 -m fastchat.serve.cli --model-path {model_path} "
            f"--style programmatic --num-gpus 2 --max-gpu-memory 14Gib < test_cli_inputs.txt"
        )
        ret = run_cmd(cmd)
        if ret != 0:
            return
        print("")


def test_8bit():
    models = [
        "lmsys/vicuna-13b-v1.3",
    ]

    for model_path in models:
        # Load the model with 8-bit quantization to reduce GPU memory use.
        cmd = (
            f"python3 -m fastchat.serve.cli --model-path {model_path} "
            f"--style programmatic --load-8bit < test_cli_inputs.txt"
        )
        ret = run_cmd(cmd)
        if ret != 0:
            return
        print("")


def test_hf_api():
    models = [
        "lmsys/vicuna-7b-v1.5",
        "lmsys/fastchat-t5-3b-v1.0",
    ]

    for model_path in models:
        # Run inference through the Hugging Face transformers API instead of the CLI.
        cmd = f"python3 -m fastchat.serve.huggingface_api --model-path {model_path}"
        ret = run_cmd(cmd)
        if ret != 0:
            return
        print("")


if __name__ == "__main__":
    test_single_gpu()
    test_multi_gpu()
    test_8bit()
    test_hf_api()