
Testing Tacotron2 Training and Inference Pipeline in Coqui-AI TTS

This test validates the training and inference workflow of the Tacotron2 model in the Coqui-AI TTS framework. It is a single end-to-end script that builds a model configuration, trains for one epoch, checks the saved configuration, runs inference from the latest checkpoint, and then resumes training from that checkpoint.

Test Coverage Overview

The test exercises each stage of the Tacotron2 training pipeline once, using the small LJSpeech sample under tests/data/ljspeech.

Key areas tested include:

Model configuration setup and validation
Training initialization and execution
Checkpoint management and model restoration
Inference pipeline verification
Configuration persistence and loading
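
The configuration persistence check can be illustrated with a plain JSON round trip. This is a minimal sketch, not the test's own code: it uses an ordinary dictionary and two hypothetical helpers (`save_config`, `load_config`) in place of `Tacotron2Config` and its `save_json` method, and the keys are only a small subset chosen for illustration.

```python
import json
import os
import tempfile


def save_config(config, path):
    """Write a config dictionary to disk as JSON (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=4)


def load_config(path):
    """Read a JSON config file back into a dictionary (hypothetical helper)."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = os.path.join(tmp_dir, "test_model_config.json")
    original = {"epochs": 1, "batch_size": 8, "test_delay_epochs": 0}
    save_config(original, config_path)
    restored = load_config(config_path)

# The values read back should match what was written.
assert restored == original
```

The same idea underlies the assertions on the reloaded config.json in the test: values written before training should still be readable, with any command-line overrides applied, afterwards.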

Implementation Analysis

The test validates a full train, infer, and resume cycle. It uses a minimal Tacotron2 configuration (one epoch, a batch size of 8, no data loader workers, and at most 50 decoder steps) so the run finishes quickly while still covering the critical code paths.

Notable patterns include:

Dynamic device selection for CUDA compatibility
File path management for artifacts
CLI command execution validation
Saved configuration integrity checks and last-checkpoint lookup
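
The file path pattern used to locate the newest training run can be sketched in isolation. The helper name `latest_run_dir` and the folder names below are hypothetical; the sketch only assumes, as the test does, that each training run writes into its own subfolder of the output path and that the newest run has the most recent modification time.

```python
import glob
import os
import tempfile


def latest_run_dir(output_path):
    """Return the most recently modified subdirectory of output_path."""
    run_dirs = glob.glob(os.path.join(output_path, "*/"))
    if not run_dirs:
        raise FileNotFoundError(f"no run folders found under {output_path}")
    return max(run_dirs, key=os.path.getmtime)


with tempfile.TemporaryDirectory() as output_path:
    # Create two fake run folders with explicit, well-separated mtimes.
    for name, mtime in (("run_old", 1_000_000_000), ("run_new", 2_000_000_000)):
        run_dir = os.path.join(output_path, name)
        os.makedirs(run_dir)
        os.utime(run_dir, (mtime, mtime))
    newest = latest_run_dir(output_path)
    newest_name = os.path.basename(os.path.normpath(newest))

assert newest_name == "run_new"
```

Raising an error when no run folder exists gives a clearer failure than the `ValueError` that `max` would raise on an empty list.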

Technical Details

Testing infrastructure includes:

Python unit testing framework
CUDA device management
File system operations for artifact management
JSON configuration handling
CLI command execution utilities
LJSpeech dataset integration
Checkpoint management utilities
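
The CLI execution utility can be approximated as follows. The implementation of `run_cli` is not shown in this file, so this is a sketch of the general idea under that assumption: run a command and fail the test if it exits with a non-zero status. The name `run_command` is hypothetical.

```python
import subprocess
import sys


def run_command(args):
    """Run a command and fail loudly on a non-zero exit status."""
    result = subprocess.run(args, check=False)
    assert result.returncode == 0, f"command failed: {args}"


# A command that succeeds passes silently.
run_command([sys.executable, "-c", "print('ok')"])

# A command that exits with a non-zero status raises AssertionError.
try:
    run_command([sys.executable, "-c", "import sys; sys.exit(3)"])
    failure_detected = False
except AssertionError:
    failure_detected = True

assert failure_detected
```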

Best Practices Demonstrated

The test suite exemplifies several testing best practices in ML model validation.

Notable practices include:

Isolated test environment with cleanup
Comprehensive configuration validation
End-to-end pipeline testing
Resource-efficient test execution
Deterministic test data usage
Proper error handling and cleanup
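
In the test as written, the run folder is removed by the final statement, so it is only deleted when every earlier step succeeds. One possible way to guarantee cleanup even on failure is a `try`/`finally` wrapper. This is a sketch of that alternative, not the test's own approach; `run_with_cleanup` and `failing_step` are hypothetical names.

```python
import os
import shutil
import tempfile


def run_with_cleanup(work_dir, step):
    """Run step(work_dir) and always remove work_dir afterwards."""
    try:
        step(work_dir)
    finally:
        shutil.rmtree(work_dir, ignore_errors=True)


def failing_step(path):
    """Stand-in for a training or inference step that fails."""
    raise RuntimeError("simulated training failure")


work_dir = tempfile.mkdtemp()
try:
    run_with_cleanup(work_dir, failing_step)
except RuntimeError:
    pass  # the failure still propagates to the caller

# The directory is gone even though the step raised.
removed = not os.path.exists(work_dir)
assert removed
```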

coqui-ai/tts

tests/tts_tests/test_tacotron2_train.py

import glob
import json
import os
import shutil

from trainer import get_last_checkpoint

from tests import get_device_id, get_tests_output_path, run_cli
from TTS.tts.configs.tacotron2_config import Tacotron2Config

config_path = os.path.join(get_tests_output_path(), "test_model_config.json")
output_path = os.path.join(get_tests_output_path(), "train_outputs")

config = Tacotron2Config(
    r=5,
    batch_size=8,
    eval_batch_size=8,
    num_loader_workers=0,
    num_eval_loader_workers=0,
    text_cleaner="english_cleaners",
    use_phonemes=False,
    phoneme_language="en-us",
    phoneme_cache_path=os.path.join(get_tests_output_path(), "train_outputs/phoneme_cache/"),
    run_eval=True,
    test_delay_epochs=-1,  # overridden to 0 on the training command line below
    epochs=1,
    print_step=1,
    test_sentences=[
        "Be a voice, not an echo.",
    ],
    print_eval=True,
    max_decoder_steps=50,
)
config.audio.do_trim_silence = True
config.audio.trim_db = 60
config.save_json(config_path)

# train the model for one epoch
command_train = (
    f"CUDA_VISIBLE_DEVICES='{get_device_id()}' python TTS/bin/train_tts.py --config_path {config_path} "
    f"--coqpit.output_path {output_path} "
    "--coqpit.datasets.0.formatter ljspeech "
    "--coqpit.datasets.0.meta_file_train metadata.csv "
    "--coqpit.datasets.0.meta_file_val metadata.csv "
    "--coqpit.datasets.0.path tests/data/ljspeech "
    "--coqpit.test_delay_epochs 0 "
)
run_cli(command_train)

# Find the most recently modified run folder under output_path
continue_path = max(glob.glob(os.path.join(output_path, "*/")), key=os.path.getmtime)

# Locate the saved config and last checkpoint for inference with the `tts` CLI
continue_config_path = os.path.join(continue_path, "config.json")
continue_restore_path, _ = get_last_checkpoint(continue_path)
out_wav_path = os.path.join(get_tests_output_path(), "output.wav")

# Check integrity of the config
with open(continue_config_path, "r", encoding="utf-8") as f:
    config_loaded = json.load(f)
assert config_loaded["characters"] is not None
assert config_loaded["output_path"] in continue_path
assert config_loaded["test_delay_epochs"] == 0

# Load the model and run inference
inference_command = (
    f"CUDA_VISIBLE_DEVICES='{get_device_id()}' tts --text 'This is an example.' "
    f"--config_path {continue_config_path} --model_path {continue_restore_path} --out_path {out_wav_path}"
)
run_cli(inference_command)

# restore the model and continue training for one more epoch
command_train = f"CUDA_VISIBLE_DEVICES='{get_device_id()}' python TTS/bin/train_tts.py --continue_path {continue_path} "
run_cli(command_train)
# remove the run folder created by this test
shutil.rmtree(continue_path)