
Testing SpeedySpeech TTS Training Implementation in Coqui-AI/TTS

This test validates the training and inference pipeline of the SpeedySpeech TTS model in the Coqui-AI/TTS repository. It covers model configuration, a one-epoch training run, checkpointing and resumed training, and inference from the saved checkpoint.

Test Coverage Overview

The test exercises the following SpeedySpeech functionality:

Model configuration and initialization
Training pipeline validation
Checkpoint management and restoration
Inference pipeline verification
Configuration persistence and loading
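The configuration persistence check comes down to a save-and-reload round trip. The sketch below imitates it with a hypothetical MiniConfig dataclass and the standard json module. It is a simplified stand-in and is not SpeedySpeechConfig or its real save_json.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field


# Hypothetical stand-in for a TTS config class; NOT SpeedySpeechConfig.
@dataclass
class MiniConfig:
    batch_size: int = 8
    use_phonemes: bool = True
    phoneme_language: str = "en-us"
    test_delay_epochs: int = -1
    test_sentences: list = field(default_factory=lambda: ["Be a voice, not an echo."])

    def save_json(self, path: str) -> None:
        # Serialize every field to JSON, similar in spirit to config.save_json(config_path).
        with open(path, "w", encoding="utf-8") as f:
            json.dump(asdict(self), f, indent=2)

    @classmethod
    def load_json(cls, path: str) -> "MiniConfig":
        # Rebuild the config object from the JSON file written by save_json.
        with open(path, "r", encoding="utf-8") as f:
            return cls(**json.load(f))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.json")
    original = MiniConfig(test_delay_epochs=0)
    original.save_json(path)
    restored = MiniConfig.load_json(path)
    print(restored.test_delay_epochs)  # 0
```

The real test goes one step further and asserts on the config that the training run writes into its own output folder. That config is not the one saved before training.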

Implementation Analysis

The test runs a full train, infer, and resume cycle. It drives training and inference through the command-line entry points a user would call (TTS/bin/train_tts.py and tts). It checks that the configuration saved with the run can be reloaded, and it verifies that training resumes from the latest checkpoint. The training data is a small LJSpeech-format sample in tests/data/ljspeech.
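The test points the ljspeech formatter at tests/data/ljspeech/metadata.csv. LJSpeech metadata is pipe-separated, with one row per clip: clip ID, raw transcription, and normalized transcription. The sketch below parses rows of that shape. Several parts are assumptions: audio files are taken to live under a wavs/ subfolder, the normalized column is used as the training text, and the function name and sample rows are hypothetical. The sketch is not Coqui's actual formatter.

```python
import os
from typing import Dict, List


def parse_ljspeech_metadata(root_path: str, lines: List[str]) -> List[Dict[str, str]]:
    """Illustrative parser for LJSpeech-style metadata rows (hypothetical helper)."""
    items = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Each row: clip_id|raw transcription|normalized transcription
        cols = line.split("|")
        # Assumption: audio lives at <root>/wavs/<clip_id>.wav
        wav_file = os.path.join(root_path, "wavs", cols[0] + ".wav")
        # Assumption: prefer the normalized column, fall back to the raw one
        text = cols[2] if len(cols) > 2 else cols[1]
        items.append({"audio_file": wav_file, "text": text})
    return items


# Hypothetical sample rows, not taken from the repository's test data.
rows = [
    "clip_0001|Hello world.|Hello world.",
    "clip_0002|It cost $5.|It cost five dollars.",
]
items = parse_ljspeech_metadata("tests/data/ljspeech", rows)
print(items[1]["text"])  # It cost five dollars.
```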

Technical Details

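Selects the GPU by prefixing each command with CUDA_VISIBLE_DEVICES, set from get_device_id()
Finds the newest run folder with glob and os.path.getmtime, and removes it with shutil.rmtree
Reloads the saved config.json and checks characters, output_path, and test_delay_epochs
Runs TTS/bin/train_tts.py and the tts CLI through the run_cli helper
Retrieves the latest checkpoint with get_last_checkpoint from the trainer package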
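The test builds each command as a shell string with a CUDA_VISIBLE_DEVICES='…' prefix and hands it to run_cli. The sketch below shows an alternative that passes an argument list and sets the variable through the child's environment. It fails loudly when the command exits with a non-zero status. The helper name run_on_device is hypothetical, and the child process stands in for train_tts.py.

```python
import os
import subprocess
import sys
from typing import List


def run_on_device(args: List[str], device_id: str) -> str:
    """Run a command with CUDA_VISIBLE_DEVICES set; assert it exits with status 0.

    Hypothetical helper, not part of the TTS test utilities.
    """
    # Copy the current environment and override only the GPU selection.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=device_id)
    result = subprocess.run(args, env=env, capture_output=True, text=True)
    assert result.returncode == 0, f"command {args} failed:\n{result.stderr}"
    return result.stdout


# Stand-in for "python TTS/bin/train_tts.py ...": a child that echoes the variable.
out = run_on_device(
    [sys.executable, "-c", "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"],
    device_id="0",
)
print(out.strip())  # 0
```

Passing a list sidesteps shell quoting, and capturing stderr puts the failure reason in the assertion message. The test's shell-string approach is shorter but depends on POSIX shell syntax for the variable prefix.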

Best Practices Demonstrated

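The test demonstrates several sound testing practices:

Writes all artifacts under get_tests_output_path() and deletes the run folder after the final step
Asserts on key fields of the persisted configuration
Tests the workflow end to end: training, inference, and resumed training
Keeps runs small (one epoch, batch size 8, no extra data-loader workers)
Uses a small, checked-in dataset under tests/data/ljspeech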
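The test calls shutil.rmtree(continue_path) only as its last statement. If an earlier run_cli assertion fails, the run folder is therefore left behind. The sketch below shows one way to make cleanup unconditional with try/finally. The helper run_with_cleanup and the simulated failing step are hypothetical. pytest's tmp_path fixture is another common way to get the same effect.

```python
import os
import shutil
import tempfile


def run_with_cleanup(work_dir: str, step) -> None:
    """Run step(work_dir) and always delete work_dir afterwards, even if step raises."""
    os.makedirs(work_dir, exist_ok=True)
    try:
        step(work_dir)
    finally:
        shutil.rmtree(work_dir, ignore_errors=True)


def failing_step(path: str) -> None:
    # Simulates a training run that writes an artifact and then crashes.
    with open(os.path.join(path, "checkpoint_1.pth"), "wb") as f:
        f.write(b"\x00")
    raise RuntimeError("simulated training failure")


run_dir = tempfile.mkdtemp(prefix="speedy_speech_sketch_")
try:
    run_with_cleanup(run_dir, failing_step)
except RuntimeError:
    pass  # the failure still propagates, but only after cleanup has run
print(os.path.exists(run_dir))  # False
```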

coqui-ai/tts

tests/tts_tests/test_speedy_speech_train.py

            
import glob
import json
import os
import shutil

from trainer import get_last_checkpoint

from tests import get_device_id, get_tests_output_path, run_cli
from TTS.tts.configs.speedy_speech_config import SpeedySpeechConfig

config_path = os.path.join(get_tests_output_path(), "test_speedy_speech_config.json")
output_path = os.path.join(get_tests_output_path(), "train_outputs")


config = SpeedySpeechConfig(
    batch_size=8,
    eval_batch_size=8,
    num_loader_workers=0,
    num_eval_loader_workers=0,
    text_cleaner="english_cleaners",
    use_phonemes=True,
    phoneme_language="en-us",
    phoneme_cache_path="tests/data/ljspeech/phoneme_cache/",
    run_eval=True,
    test_delay_epochs=-1,
    epochs=1,
    print_step=1,
    print_eval=True,
    test_sentences=[
        "Be a voice, not an echo.",
    ],
)
config.audio.do_trim_silence = True
config.audio.trim_db = 60
config.save_json(config_path)

# train the model for one epoch
command_train = (
    f"CUDA_VISIBLE_DEVICES='{get_device_id()}' python TTS/bin/train_tts.py --config_path {config_path} "
    f"--coqpit.output_path {output_path} "
    "--coqpit.datasets.0.formatter ljspeech "
    "--coqpit.datasets.0.meta_file_train metadata.csv "
    "--coqpit.datasets.0.meta_file_val metadata.csv "
    "--coqpit.datasets.0.path tests/data/ljspeech "
    "--coqpit.datasets.0.meta_file_attn_mask tests/data/ljspeech/metadata_attn_mask.txt "
    "--coqpit.test_delay_epochs 0"
)
run_cli(command_train)

# Find latest folder
continue_path = max(glob.glob(os.path.join(output_path, "*/")), key=os.path.getmtime)

# Locate the saved config and the latest checkpoint for inference
continue_config_path = os.path.join(continue_path, "config.json")
continue_restore_path, _ = get_last_checkpoint(continue_path)
out_wav_path = os.path.join(get_tests_output_path(), "output.wav")

# Check integrity of the config
with open(continue_config_path, "r", encoding="utf-8") as f:
    config_loaded = json.load(f)
assert config_loaded["characters"] is not None
assert config_loaded["output_path"] in continue_path
assert config_loaded["test_delay_epochs"] == 0

# Load the model and run inference
inference_command = (
    f"CUDA_VISIBLE_DEVICES='{get_device_id()}' tts --text 'This is an example for it.' "
    f"--config_path {continue_config_path} "
    f"--model_path {continue_restore_path} "
    f"--out_path {out_wav_path}"
)
run_cli(inference_command)

# restore the model and continue training for one more epoch
command_train = f"CUDA_VISIBLE_DEVICES='{get_device_id()}' python TTS/bin/train_tts.py --continue_path {continue_path} "
run_cli(command_train)
shutil.rmtree(continue_path)
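The test picks the newest run folder with max(glob(...), key=os.path.getmtime) and then asks get_last_checkpoint for the newest checkpoint in it. The sketch below reproduces both steps with the standard library. It assumes that checkpoints are named checkpoint_<step>.pth. That naming, and the helper names latest_run_folder and latest_checkpoint, are illustrative assumptions. The trainer package's real selection logic may differ.

```python
import glob
import os
import re
import tempfile
from typing import Optional


def latest_run_folder(output_path: str) -> str:
    # Same idea as the test: the subdirectory with the newest modification time.
    return max(glob.glob(os.path.join(output_path, "*/")), key=os.path.getmtime)


def latest_checkpoint(run_folder: str) -> Optional[str]:
    # Assumption: checkpoints are named checkpoint_<step>.pth; pick the highest step.
    # Comparing steps as integers matters: "checkpoint_5" > "checkpoint_20" as strings.
    best_step, best_path = -1, None
    for path in glob.glob(os.path.join(run_folder, "checkpoint_*.pth")):
        match = re.search(r"checkpoint_(\d+)\.pth$", path)
        if match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), path
    return best_path


with tempfile.TemporaryDirectory() as out:
    for name in ("run-old", "run-new"):
        os.makedirs(os.path.join(out, name))
    for step in (5, 20):
        open(os.path.join(out, "run-new", f"checkpoint_{step}.pth"), "wb").close()
    # Set mtimes after creating files, because adding files updates a directory's mtime.
    os.utime(os.path.join(out, "run-old"), (1_000, 1_000))
    os.utime(os.path.join(out, "run-new"), (2_000, 2_000))
    run = latest_run_folder(out)
    ckpt = latest_checkpoint(run)
    print(os.path.basename(os.path.normpath(run)), os.path.basename(ckpt))
    # run-new checkpoint_20.pth
```

Selecting by modification time works in the test because each training run creates a fresh folder. If other processes can touch those folders, a name- or timestamp-based choice would be more robust.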