Testing Multiband MelGAN Vocoder Training Workflow in Coqui-AI TTS

This test suite validates the training workflow of the Multiband MelGAN vocoder model in the Coqui-AI TTS system. It covers configuration setup, single-epoch training execution, and checkpoint restoration with continued training.

Test Coverage Overview

The test suite covers essential training workflows for the Multiband MelGAN vocoder model.

Key areas tested include:
  • Configuration initialization and parameter validation
  • Single epoch training execution
  • Model checkpoint restoration and continued training
  • GPU device handling and CUDA configuration (see the device-selection sketch below)
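
The CUDA handling relies on a device-selection helper imported from the shared tests package. The following is a minimal sketch of what such a helper could look like, assuming PyTorch is available; the actual implementation in tests/__init__.py may differ.

import torch


def get_device_id() -> str:
    """Return a GPU index for CUDA_VISIBLE_DEVICES, or an empty string to force CPU execution."""
    if torch.cuda.is_available():
        return "0"  # expose the first GPU to the training subprocess
    return ""  # an empty value hides all GPUs, so training falls back to CPU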

Implementation Analysis

The test performs end-to-end training validation through the TTS/bin/train_vocoder.py CLI. A small-scale configuration (batch size of 8, a single epoch, and a reduced discriminator) keeps execution time low.

Notable patterns include:
  • Dynamic CUDA device ID selection
  • File path handling with glob patterns (illustrated in the snippet after this list)
  • Automated cleanup of training artifacts
  • Custom audio preprocessing settings
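
The path handling and cleanup pattern can be shown in isolation. This is an illustrative snippet rather than the test itself; output_path here is a placeholder, and the real test removes only the most recent run folder after continued training succeeds.

import glob
import os
import shutil

output_path = "tests/outputs/train_outputs"  # placeholder path for illustration

# pick the most recently modified run folder created by the trainer
run_dirs = glob.glob(os.path.join(output_path, "*/"))
if run_dirs:
    latest_run = max(run_dirs, key=os.path.getmtime)
    # ... continue training from latest_run, then remove its artifacts ...
    shutil.rmtree(latest_run)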

Technical Details

Testing components include:
  • MultibandMelganConfig for model configuration
  • CLI-based training execution via a shared run_cli helper (sketched after this list)
  • CUDA device management
  • File system operations for checkpoint handling
  • LJSpeech dataset integration
  • Custom test utilities for path management
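
CLI-based execution goes through a small run_cli helper shared by the vocoder tests. A plausible sketch is shown below, assuming it simply shells out and asserts on the exit status; the real helper in tests/__init__.py may differ.

import os


def run_cli(command: str) -> None:
    """Run a shell command and fail the calling test if it exits with a non-zero status."""
    exit_status = os.system(command)
    assert exit_status == 0, f"command failed: {command}"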

Best Practices Demonstrated

The test implementation showcases several testing best practices.

Notable examples include:
  • Isolated test environment with dedicated output paths
  • Comprehensive configuration validation
  • Proper resource cleanup after test execution (see the cleanup sketch after this list)
  • Modular test structure with clear separation of concerns
  • Efficient test execution with minimal computational requirements
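
One way to harden the cleanup step is to wrap the training commands in try/finally so artifacts are removed even when a command fails. The snippet below sketches that variant and is not the repository's code; the actual test deletes the run folder only after both training runs succeed.

import os
import shutil

from tests import get_tests_output_path

output_path = os.path.join(get_tests_output_path(), "train_outputs")
os.makedirs(output_path, exist_ok=True)
try:
    pass  # ... run the one-epoch training and continue-training commands here ...
finally:
    shutil.rmtree(output_path, ignore_errors=True)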

coqui-ai/tts

tests/vocoder_tests/test_multiband_melgan_train.py

import glob
import os
import shutil

from tests import get_device_id, get_tests_output_path, run_cli
from TTS.vocoder.configs import MultibandMelganConfig

config_path = os.path.join(get_tests_output_path(), "test_vocoder_config.json")
output_path = os.path.join(get_tests_output_path(), "train_outputs")

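# small-scale configuration: tiny batches, a single epoch, a short sequence length,
# and a reduced discriminator so the training run finishes quickly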
config = MultibandMelganConfig(
    batch_size=8,
    eval_batch_size=8,
    num_loader_workers=0,
    num_eval_loader_workers=0,
    run_eval=True,
    test_delay_epochs=-1,
    epochs=1,
    seq_len=8192,
    eval_split_size=1,
    print_step=1,
    print_eval=True,
    steps_to_start_discriminator=1,
    data_path="tests/data/ljspeech",
    discriminator_model_params={"base_channels": 16, "max_channels": 64, "downsample_factors": [4, 4, 4]},
    output_path=output_path,
)
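# override audio preprocessing: trim leading/trailing silence at a 60 dB threshold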
config.audio.do_trim_silence = True
config.audio.trim_db = 60
config.save_json(config_path)

# train the model for one epoch
command_train = f"CUDA_VISIBLE_DEVICES='{get_device_id()}' python TTS/bin/train_vocoder.py --config_path {config_path} "
run_cli(command_train)

# Find latest folder
continue_path = max(glob.glob(os.path.join(output_path, "*/")), key=os.path.getmtime)

# restore the model and continue training for one more epoch
command_train = (
    f"CUDA_VISIBLE_DEVICES='{get_device_id()}' python TTS/bin/train_vocoder.py --continue_path {continue_path} "
)
run_cli(command_train)
shutil.rmtree(continue_path)