
Testing Fullband MelGAN Vocoder Training Pipeline in Coqui-AI TTS

This test suite validates the training functionality of the Fullband MelGAN vocoder in the Coqui-AI TTS system. It covers the model configuration, training initialization, and training continuation scenarios.

Test Coverage Overview

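The test exercises the Fullband MelGAN vocoder training pipeline end to end: it writes a configuration file, trains for one epoch, and then resumes training from the run folder that the first run produced.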

Key areas tested include:

Model configuration setup and persistence
Initial training execution
Training continuation from checkpoints
Audio preprocessing parameters
Device selection through the CUDA_VISIBLE_DEVICES environment variable
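The configuration persistence item above amounts to a JSON round trip: the test writes the config to disk and the training script reads it back. The following is a minimal sketch of that idea only. `TinyVocoderConfig`, its `load_json` method, and `round_trip` are hypothetical stand-ins written for illustration; they are not the `FullbandMelganConfig` class from TTS, and the real class may serialize differently.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass


@dataclass
class TinyVocoderConfig:
    """Hypothetical stand-in for a vocoder config (not the TTS class)."""

    batch_size: int = 8
    epochs: int = 1
    seq_len: int = 8192
    data_path: str = "tests/data/ljspeech"

    def save_json(self, path: str) -> None:
        # Write every field to a JSON file.
        with open(path, "w", encoding="utf-8") as f:
            json.dump(asdict(self), f, indent=4)

    @classmethod
    def load_json(cls, path: str) -> "TinyVocoderConfig":
        # Rebuild the config from the JSON file written by save_json.
        with open(path, encoding="utf-8") as f:
            return cls(**json.load(f))


def round_trip(config: TinyVocoderConfig) -> TinyVocoderConfig:
    # Save to a temporary file, then load it back, as a training script would.
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "test_vocoder_config.json")
        config.save_json(path)
        return TinyVocoderConfig.load_json(path)
```

Comparing the loaded object with the original is a cheap way to check that no field is lost or changed during persistence.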

Implementation Analysis

The test drives a real training workflow through the command-line interface. To keep the run short, it uses a minimal configuration: one epoch, a batch size of 8, no data-loader worker processes, and a reduced discriminator (base_channels of 16, max_channels of 64).

Notable patterns include:

Dynamic device selection
File path management
Training state persistence
Resource cleanup
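The file path and state persistence patterns listed above depend on locating the run folder that the first training command created. A minimal sketch of that lookup follows, using the same `glob` and `os.path.getmtime` approach as the test. The `latest_run_dir` and `demo` helpers are hypothetical names introduced here, and the explicit error for an empty output folder is an addition the test itself does not have.

```python
import glob
import os
import tempfile


def latest_run_dir(output_path: str) -> str:
    # The trailing "/" in the pattern restricts the matches to directories.
    run_dirs = glob.glob(os.path.join(output_path, "*/"))
    if not run_dirs:
        raise FileNotFoundError(f"no run folders found under {output_path}")
    # Pick the directory with the most recent modification time.
    return max(run_dirs, key=os.path.getmtime)


def demo() -> str:
    # Create two fake run folders with fixed modification times.
    with tempfile.TemporaryDirectory() as root:
        for name, mtime in (("run_a", 1_000_000), ("run_b", 2_000_000)):
            path = os.path.join(root, name)
            os.mkdir(path)
            os.utime(path, (mtime, mtime))
        newest = latest_run_dir(root)
        return os.path.basename(os.path.normpath(newest))
```

Selecting by modification time works here because the test starts only one training run before looking up the folder; with several concurrent runs writing to the same output path, the choice would be ambiguous.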

Technical Details

Testing components include:

FullbandMelganConfig for model configuration
CUDA device management
CLI command execution wrapper
File system operations for checkpoint management
LJSpeech dataset integration
Python's glob, os, and shutil modules for file operations
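The CLI wrapper and device management components above can be pictured with a small helper that runs a command and fails the test on a non-zero exit code. This is a sketch under stated assumptions: `run_command` is a hypothetical name, and the actual `run_cli` and `get_device_id` helpers in the `tests` package are not shown in this page and may work differently. Unlike the test, which prefixes the shell command with `CUDA_VISIBLE_DEVICES=...`, the sketch passes the variable through the child process environment.

```python
import os
import shlex
import subprocess
import sys
from typing import Mapping, Optional, Sequence


def run_command(args: Sequence[str], extra_env: Optional[Mapping[str, str]] = None) -> None:
    # Start from the current environment and overlay any overrides,
    # for example {"CUDA_VISIBLE_DEVICES": "0"} to pin one GPU.
    env = dict(os.environ)
    if extra_env:
        env.update(extra_env)
    completed = subprocess.run(list(args), env=env, check=False)
    # Fail loudly so the test stops at the first broken command.
    assert completed.returncode == 0, f"command failed: {shlex.join(args)}"


if __name__ == "__main__":
    # An empty CUDA_VISIBLE_DEVICES value hides all GPUs from the child process.
    run_command([sys.executable, "-c", "print('ok')"], {"CUDA_VISIBLE_DEVICES": ""})
```

Passing the arguments as a list avoids shell quoting issues when paths contain spaces, at the cost of not supporting shell syntax such as inline variable assignments.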

Best Practices Demonstrated

The test implementation showcases several testing best practices for ML model training.

Notable practices include:

Isolated test environment with controlled parameters
Cleanup of the run folder once both training commands have finished
Modular configuration management
Reproducible training scenarios
Efficient test data handling
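On the cleanup practice above: in the test, `shutil.rmtree` is the last statement, so it is reached only if nothing before it raises. One possible hardening, shown here as a sketch rather than as what the test does, wraps the work in `try`/`finally` so the folder is removed even on failure. `run_with_cleanup` and its `step` callback are hypothetical names introduced for illustration.

```python
import os
import shutil
import tempfile
from typing import Callable


def run_with_cleanup(step: Callable[[str], None]) -> str:
    # Create an isolated working folder for the duration of one step.
    work_dir = tempfile.mkdtemp(prefix="train_outputs_")
    try:
        step(work_dir)
    finally:
        # Runs whether step succeeded or raised, so no folder is left behind.
        shutil.rmtree(work_dir, ignore_errors=True)
    return work_dir


def write_fake_checkpoint(work_dir: str) -> None:
    # Placeholder for a training run: it only creates an empty file.
    with open(os.path.join(work_dir, "checkpoint.pth"), "w", encoding="utf-8"):
        pass
```

The trade-off is that a failed run leaves no artifacts to inspect, so a debugging workflow may prefer to keep the folder when the step raises.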

coqui-ai/tts

tests/vocoder_tests/test_fullband_melgan_train.py

            
import glob
import os
import shutil

from tests import get_device_id, get_tests_output_path, run_cli
from TTS.vocoder.configs import FullbandMelganConfig

config_path = os.path.join(get_tests_output_path(), "test_vocoder_config.json")
output_path = os.path.join(get_tests_output_path(), "train_outputs")

config = FullbandMelganConfig(
    batch_size=8,
    eval_batch_size=8,
    num_loader_workers=0,
    num_eval_loader_workers=0,
    run_eval=True,
    test_delay_epochs=-1,
    epochs=1,
    seq_len=8192,
    eval_split_size=1,
    print_step=1,
    print_eval=True,
    data_path="tests/data/ljspeech",
    discriminator_model_params={"base_channels": 16, "max_channels": 64, "downsample_factors": [4, 4, 4]},
    output_path=output_path,
)
config.audio.do_trim_silence = True
config.audio.trim_db = 60
config.save_json(config_path)

# train the model for one epoch
command_train = f"CUDA_VISIBLE_DEVICES='{get_device_id()}' python TTS/bin/train_vocoder.py --config_path {config_path} "
run_cli(command_train)

# find the most recently modified run folder created by the first training run
continue_path = max(glob.glob(os.path.join(output_path, "*/")), key=os.path.getmtime)

# restore the model and continue training for one more epoch
command_train = (
    f"CUDA_VISIBLE_DEVICES='{get_device_id()}' python TTS/bin/train_vocoder.py --continue_path {continue_path} "
)
run_cli(command_train)
# remove the run folder so it does not accumulate across test runs
shutil.rmtree(continue_path)