
Testing Fullband MelGAN Vocoder Training Pipeline in Coqui-AI TTS

This test suite validates the training functionality of the Fullband MelGAN vocoder in the Coqui-AI TTS system. It covers the model configuration, training initialization, and training continuation scenarios.

Test Coverage Overview

The test exercises the Fullband MelGAN vocoder training pipeline end to end: it writes a configuration file, trains for one epoch, and then resumes training from the resulting run folder.

Key areas tested include:
  • Model configuration setup and persistence
  • Initial training execution
  • Training continuation from checkpoints
  • Audio preprocessing parameters (silence trimming with trim_db set to 60)
  • GPU selection through the CUDA_VISIBLE_DEVICES environment variable

Implementation Analysis

The test drives a practical training workflow through the command-line interface. To keep the run fast, it uses a minimal configuration: one epoch, a batch size of 8, and no data loader worker processes.

Notable patterns include:
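  • Dynamic device selection via get_device_id()
  • File path management with os.path.join and get_tests_output_path()
  • Training state persistence through the --continue_path option
  • Resource cleanup with shutil.rmtree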
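
The command-building pattern above can be sketched as a small helper. This is only an illustration: build_train_command is a hypothetical name that does not exist in the repository, and the device_id argument stands in for whatever the suite's get_device_id() returns. The script path and the --config_path and --continue_path flags are taken from the test itself; quoting the path with shlex.quote is an added precaution that the original test does not apply.

```python
import shlex


def build_train_command(device_id, config_path=None, continue_path=None):
    """Assemble a train_vocoder.py shell command (hypothetical helper)."""
    # Exactly one of the two paths must be given: a fresh run uses
    # --config_path, a resumed run uses --continue_path.
    if (config_path is None) == (continue_path is None):
        raise ValueError("pass exactly one of config_path or continue_path")
    if config_path is not None:
        flag, value = "--config_path", config_path
    else:
        flag, value = "--continue_path", continue_path
    # Restrict the visible GPUs for this one command only.
    return (
        f"CUDA_VISIBLE_DEVICES='{device_id}' "
        f"python TTS/bin/train_vocoder.py {flag} {shlex.quote(value)}"
    )


print(build_train_command("0", config_path="out/config.json"))
```

Keeping the two modes in one function makes it harder to pass both flags by accident, which the inline f-strings in the test do not guard against.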

Technical Details

Testing components include:
  • FullbandMelganConfig for model configuration
  • CUDA device management
  • CLI command execution wrapper
  • File system operations for checkpoint management
  • LJSpeech dataset integration
  • Python’s glob and shutil modules for file operations
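
The glob and shutil usage listed above amounts to "pick the newest run folder, then delete it when done". A minimal, self-contained sketch of that idea follows. The function name latest_run_dir and the folder names run-a and run-b are made up for the demonstration, and the empty-folder check is an addition that the test does not include.

```python
import glob
import os
import shutil
import tempfile


def latest_run_dir(output_path):
    """Return the most recently modified subdirectory of output_path."""
    # The trailing "*/" makes glob match directories only.
    run_dirs = glob.glob(os.path.join(output_path, "*/"))
    if not run_dirs:
        raise FileNotFoundError(f"no run folders under {output_path}")
    return max(run_dirs, key=os.path.getmtime)


# Demonstration in a throwaway directory (no training involved).
root = tempfile.mkdtemp()
try:
    for offset, name in enumerate(["run-a", "run-b"]):
        path = os.path.join(root, name)
        os.makedirs(path)
        # Set explicit modification times so the ordering is deterministic.
        os.utime(path, (1_000_000 + offset, 1_000_000 + offset))
    newest_name = os.path.basename(os.path.normpath(latest_run_dir(root)))
finally:
    # Remove everything the demonstration created, even on failure.
    shutil.rmtree(root)

print(newest_name)
```

Wrapping the work in try/finally ensures the cleanup runs even if an earlier step raises, whereas a cleanup call placed at the end of a script is skipped when a preceding command fails.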

Best Practices Demonstrated

The test implementation showcases several testing best practices for ML model training.

Notable practices include:
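  • Isolated test environment with controlled parameters
  • Cleanup of the resumed run folder after the test completes
  • Modular configuration management through a single config object saved to JSON
  • Repeatable training scenarios driven by a fixed configuration file
  • Efficient test data handling with the small LJSpeech sample under tests/data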

coqui-ai/tts

tests/vocoder_tests/test_fullband_melgan_train.py

            
import glob
import os
import shutil

from tests import get_device_id, get_tests_output_path, run_cli
from TTS.vocoder.configs import FullbandMelganConfig

config_path = os.path.join(get_tests_output_path(), "test_vocoder_config.json")
output_path = os.path.join(get_tests_output_path(), "train_outputs")

config = FullbandMelganConfig(
    batch_size=8,
    eval_batch_size=8,
    num_loader_workers=0,
    num_eval_loader_workers=0,
    run_eval=True,
    test_delay_epochs=-1,
    epochs=1,
    seq_len=8192,
    eval_split_size=1,
    print_step=1,
    print_eval=True,
    data_path="tests/data/ljspeech",
    discriminator_model_params={"base_channels": 16, "max_channels": 64, "downsample_factors": [4, 4, 4]},
    output_path=output_path,
)
config.audio.do_trim_silence = True
config.audio.trim_db = 60
config.save_json(config_path)

# train the model for one epoch
command_train = f"CUDA_VISIBLE_DEVICES='{get_device_id()}' python TTS/bin/train_vocoder.py --config_path {config_path} "
run_cli(command_train)

# find the most recently modified run folder under output_path
continue_path = max(glob.glob(os.path.join(output_path, "*/")), key=os.path.getmtime)

# restore the model and continue training for one more epoch
command_train = (
    f"CUDA_VISIBLE_DEVICES='{get_device_id()}' python TTS/bin/train_vocoder.py --continue_path {continue_path} "
)
run_cli(command_train)

# remove the run folder created by this test
shutil.rmtree(continue_path)