
Validating HiFi-GAN Vocoder Training Workflow in Coqui-AI TTS

This test suite validates the training functionality of the HiFi-GAN vocoder model in the Coqui-AI TTS framework. It covers model initialization, training execution, and checkpoint restoration capabilities.

Test Coverage Overview

The suite exercises the HiFi-GAN vocoder training workflow from configuration through checkpoint restoration, using a small sample dataset.

Key areas tested include:

Configuration initialization and serialization to JSON
Single epoch training execution
Model checkpoint management
Training restoration from saved checkpoints
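
The configuration step in the list above amounts to building a config object and writing it to JSON so the training script can read it back. The sketch below illustrates that round trip with a plain dataclass. `TinyTrainConfig` and its `load_json` classmethod are hypothetical stand-ins written for this example; they are not the `HifiganConfig` API, and only the field names are borrowed from the test.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass


@dataclass
class TinyTrainConfig:
    """Hypothetical stand-in for HifiganConfig (assumption, not the real class)."""

    batch_size: int = 8
    epochs: int = 1
    seq_len: int = 1024
    data_path: str = "tests/data/ljspeech"

    def save_json(self, path):
        # Serialize every field to a JSON file.
        with open(path, "w", encoding="utf-8") as f:
            json.dump(asdict(self), f, indent=4)

    @classmethod
    def load_json(cls, path):
        # Rebuild the config from the JSON file written by save_json.
        with open(path, "r", encoding="utf-8") as f:
            return cls(**json.load(f))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test_vocoder_config.json")
        TinyTrainConfig(epochs=1).save_json(path)
        print(TinyTrainConfig.load_json(path))
```

Writing the config to disk keeps the test and the training script decoupled: the script only needs a file path.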

Implementation Analysis

The test builds a HifiganConfig, saves it to JSON, and drives the training pipeline through CLI commands. It runs in two phases: an initial one-epoch training run, then a second run that resumes from the output folder of the first via --continue_path. A small dataset and a single epoch keep the test fast.

Technical patterns include:

Dynamic device selection for CUDA
File path handling and cleanup
Configuration serialization
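
The file path handling in the test hinges on one step: picking the most recently modified run folder so that training can resume from it. A minimal sketch of that step follows. The helper name `latest_run_dir` and the empty-folder check are additions for illustration; the test itself performs the lookup inline.

```python
import glob
import os
import tempfile


def latest_run_dir(output_path):
    """Return the most recently modified sub-directory of output_path."""
    # A pattern ending in a path separator matches directories only.
    run_dirs = glob.glob(os.path.join(output_path, "*/"))
    if not run_dirs:
        raise FileNotFoundError(f"no run folders found in {output_path}")
    return max(run_dirs, key=os.path.getmtime)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for name, mtime in (("run-old", 1000), ("run-new", 2000)):
            folder = os.path.join(root, name)
            os.mkdir(folder)
            # Set modification times explicitly so the demo is deterministic.
            os.utime(folder, (mtime, mtime))
        print(latest_run_dir(root))
```

Selecting by modification time assumes that no other process writes to the output folder between the two training runs.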

Technical Details

Testing infrastructure includes:

Module-level test code, with no test classes or test functions
Custom CLI execution utilities
HifiganConfig for model configuration
File system operations for cleanup
CUDA device management
LJSpeech dataset for training
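
The implementation of the `run_cli` helper is not shown here. As a rough idea of what such a utility might do, the sketch below runs a shell command and raises on a non-zero exit code. The name `run_cli_sketch` and its behavior are assumptions made for this example, not the actual helper from the `tests` package.

```python
import subprocess


def run_cli_sketch(command):
    """Run a shell command and raise if it exits with a non-zero status.

    Hypothetical illustration only; the real run_cli may behave differently.
    """
    # shell=True lets the command carry an environment prefix such as
    # CUDA_VISIBLE_DEVICES='0' on POSIX shells.
    completed = subprocess.run(command, shell=True)
    if completed.returncode != 0:
        raise RuntimeError(
            f"command failed with exit code {completed.returncode}: {command}"
        )
    return completed.returncode


if __name__ == "__main__":
    run_cli_sketch("exit 0")  # succeeds silently
```

Raising on failure matters in a script like this one, because otherwise a crashed training run would go unnoticed and the test would appear to pass.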

Best Practices Demonstrated

The test illustrates several practices that carry over to other ML training tests.

Notable practices include:

Isolated test environment with dedicated output paths
Proper resource cleanup after test execution
Minimal dataset usage for efficient testing
Explicit configuration of batch size, epochs, sequence length, and evaluation settings
Hardware-agnostic device selection
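
The isolation and cleanup practices above can be made more robust by tying cleanup to a context manager, so the output folder is removed even when a step fails. This is a sketch of an alternative, not what the test does: the test calls `shutil.rmtree` once at the end. The name `isolated_output_dir` is hypothetical.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def isolated_output_dir(prefix="train_outputs_"):
    """Yield a fresh temporary folder and delete it afterwards."""
    path = tempfile.mkdtemp(prefix=prefix)
    try:
        yield path
    finally:
        # Runs on normal exit and when the body raises.
        shutil.rmtree(path, ignore_errors=True)


if __name__ == "__main__":
    with isolated_output_dir() as out_dir:
        # Training outputs would be written here.
        print(os.path.isdir(out_dir))
```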

coqui-ai/tts

tests/vocoder_tests/test_hifigan_train.py

import glob
import os
import shutil

from tests import get_device_id, get_tests_output_path, run_cli
from TTS.vocoder.configs import HifiganConfig

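# Paths for the serialized config and the training outputs,
# both placed under the shared test output folder
config_path = os.path.join(get_tests_output_path(), "test_vocoder_config.json")
output_path = os.path.join(get_tests_output_path(), "train_outputs")


# Small, fast training setup: one epoch, small batches,
# and data loading in the main process (no worker processes)
config = HifiganConfig(
    batch_size=8,
    eval_batch_size=8,
    num_loader_workers=0,
    num_eval_loader_workers=0,
    run_eval=True,
    test_delay_epochs=-1,
    epochs=1,
    seq_len=1024,
    eval_split_size=1,
    print_step=1,
    print_eval=True,
    data_path="tests/data/ljspeech",
    output_path=output_path,
)
# Trim silence from the audio, using a 60 dB threshold
config.audio.do_trim_silence = True
config.audio.trim_db = 60
# Write the config to JSON so the training script can load it via --config_path
config.save_json(config_path)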

# train the model for one epoch
command_train = f"CUDA_VISIBLE_DEVICES='{get_device_id()}' python TTS/bin/train_vocoder.py --config_path {config_path} "
run_cli(command_train)

# Find the most recently modified run folder created by the first training run
continue_path = max(glob.glob(os.path.join(output_path, "*/")), key=os.path.getmtime)

# restore the model and continue training for one more epoch
command_train = (
    f"CUDA_VISIBLE_DEVICES='{get_device_id()}' python TTS/bin/train_vocoder.py --continue_path {continue_path} "
)
run_cli(command_train)
shutil.rmtree(continue_path)