
Validating the Phoneme Detection System in Coqui-TTS

This test suite validates the phoneme detection functionality in the Coqui-TTS system, focusing on unique phoneme identification and handling across different language configurations. It ensures proper phoneme extraction and caching mechanisms for text-to-speech processing.

Test Coverage Overview

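The test suite has two test methods, test_espeak_phonemes and test_no_espeak_phonemes, named for eSpeak and non-eSpeak phoneme detection. As written, both methods build the same VitsConfig, so the two scenarios differ only by name. Key functionality includes: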

  • Phoneme extraction for English (phoneme_language="en-us")
  • Building and saving a VitsConfig for the run
  • Setting a phoneme cache path under tests/data/ljspeech
  • Dataset configuration handling through BaseDatasetConfig
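
The idea behind unique phoneme identification can be sketched in a few lines. This is a minimal illustration, not the code in TTS/bin/find_unique_phonemes.py: the function name find_unique_symbols and the sample strings are hypothetical, and it assumes the utterances have already been phonemized into strings of symbols.

```python
def find_unique_symbols(phoneme_strings):
    """Return the sorted unique symbols found across all phonemized utterances."""
    unique = set()
    for utterance in phoneme_strings:
        # Each character of a phonemized string is treated as one symbol.
        unique.update(utterance)
    # Spaces separate words and are not counted as phonemes in this sketch.
    unique.discard(" ")
    return sorted(unique)


# Hypothetical phonemized utterances; real ones would come from a phonemizer.
samples = ["həloʊ wɜːld", "həloʊ"]
symbols = find_unique_symbols(samples)
print("".join(symbols))
```

A set keeps the pass over the dataset linear in the total text length, and sorting at the end gives a stable symbol order between runs.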

Implementation Analysis

The implementation uses the unittest framework with one static test method per scenario. PyTorch is imported only to fix the random seed with torch.manual_seed(1); configuration is handled by VitsConfig and BaseDatasetConfig. Each test saves its configuration to a JSON file and then runs TTS/bin/find_unique_phonemes.py on that file through the run_cli helper.

The testing pattern follows a configuration-driven approach with explicit dataset and model parameters.
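
The configuration-driven pattern can be sketched with the standard library alone. The classes DatasetSettings and ModelSettings below are hypothetical stand-ins for BaseDatasetConfig and VitsConfig, and save_config and build_command are illustrative helpers, not part of the TTS package. The sketch only shows the shape of the flow: build a config object, serialize it to JSON, and pass the file path to a command line.

```python
import json
import os
import shlex
import tempfile
from dataclasses import asdict, dataclass, field


@dataclass
class DatasetSettings:  # hypothetical stand-in for BaseDatasetConfig
    formatter: str = "ljspeech"
    path: str = "tests/data/ljspeech"
    language: str = "en"


@dataclass
class ModelSettings:  # hypothetical stand-in for VitsConfig
    use_phonemes: bool = True
    phoneme_language: str = "en-us"
    datasets: list = field(default_factory=lambda: [DatasetSettings()])


def save_config(settings, path):
    """Serialize the settings, including nested datasets, to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(asdict(settings), f, indent=2)


def build_command(config_path):
    """Build the command line that consumes the saved config."""
    return f"python TTS/bin/find_unique_phonemes.py --config_path {shlex.quote(config_path)}"


# Write the config to a temporary directory so runs do not interfere.
config_path = os.path.join(tempfile.mkdtemp(), "test_model_config.json")
save_config(ModelSettings(), config_path)
command = build_command(config_path)

# Read the file back to confirm the settings survived serialization.
with open(config_path, encoding="utf-8") as f:
    loaded = json.load(f)
print(command)
```

Keeping every parameter in the saved file means the script under test receives a single argument, and the exact settings of a failing run can be inspected afterwards.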

Technical Details

  • Testing Framework: unittest
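  • Dependencies: PyTorch (random seed only), TTS configuration classes, and the get_tests_output_path and run_cli helpers from the tests package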
  • Test Data: LJSpeech dataset format
  • Configuration: VitsConfig with customizable parameters
  • Environment Control: CUDA_VISIBLE_DEVICES="" hides all GPUs from the script, so it runs on CPU
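
The environment control listed above can also be expressed without a shell prefix. The sketch below is an alternative to the test's run_cli call, not a description of how run_cli works: the helper name run_on_cpu is hypothetical, and the child process is a one-line Python script that only reports the variable it sees.

```python
import os
import subprocess
import sys


def run_on_cpu(args):
    """Run a command with CUDA_VISIBLE_DEVICES set to an empty string."""
    # Copy the parent environment and override only the one variable.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES="")
    return subprocess.run(args, env=env, capture_output=True, text=True, check=True)


# The child prints the value it received, so the override can be checked.
child_code = "import os; print(repr(os.environ.get('CUDA_VISIBLE_DEVICES')))"
result = run_on_cpu([sys.executable, "-c", child_code])
visible = result.stdout.strip()
print(visible)
```

Passing the variable through env leaves the parent process untouched, and check=True raises an exception if the command exits with a non-zero status.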

Best Practices Demonstrated

The test suite exemplifies several testing best practices:

  • Isolation of test cases for different phoneme processing methods
  • Proper configuration management and file handling
  • Clear separation of test data and configuration
  • Controlled environment setup for reproducible tests

coqui-ai/tts

tests/aux_tests/test_find_unique_phonemes.py

            
import os
import unittest

import torch

from tests import get_tests_output_path, run_cli
from TTS.config.shared_configs import BaseDatasetConfig
from TTS.tts.configs.vits_config import VitsConfig

torch.manual_seed(1)

config_path = os.path.join(get_tests_output_path(), "test_model_config.json")

dataset_config_en = BaseDatasetConfig(
    formatter="ljspeech",
    meta_file_train="metadata.csv",
    meta_file_val="metadata.csv",
    path="tests/data/ljspeech",
    language="en",
)

"""
dataset_config_pt = BaseDatasetConfig(
    formatter="ljspeech",
    meta_file_train="metadata.csv",
    meta_file_val="metadata.csv",
    path="tests/data/ljspeech",
    language="pt-br",
)
"""


# pylint: disable=protected-access
class TestFindUniquePhonemes(unittest.TestCase):
    @staticmethod
    def test_espeak_phonemes():
        # prepare the config
        config = VitsConfig(
            batch_size=2,
            eval_batch_size=2,
            num_loader_workers=0,
            num_eval_loader_workers=0,
            text_cleaner="english_cleaners",
            use_phonemes=True,
            phoneme_language="en-us",
            phoneme_cache_path="tests/data/ljspeech/phoneme_cache/",
            run_eval=True,
            test_delay_epochs=-1,
            epochs=1,
            print_step=1,
            print_eval=True,
            datasets=[dataset_config_en],
        )
        config.save_json(config_path)

        # run the script on the saved config; the empty CUDA_VISIBLE_DEVICES hides GPUs
        run_cli(f'CUDA_VISIBLE_DEVICES="" python TTS/bin/find_unique_phonemes.py --config_path "{config_path}"')

    @staticmethod
    def test_no_espeak_phonemes():
        # prepare the config
        config = VitsConfig(
            batch_size=2,
            eval_batch_size=2,
            num_loader_workers=0,
            num_eval_loader_workers=0,
            text_cleaner="english_cleaners",
            use_phonemes=True,
            phoneme_language="en-us",
            phoneme_cache_path="tests/data/ljspeech/phoneme_cache/",
            run_eval=True,
            test_delay_epochs=-1,
            epochs=1,
            print_step=1,
            print_eval=True,
            datasets=[dataset_config_en],
        )
        config.save_json(config_path)

        # run the script on the saved config; the empty CUDA_VISIBLE_DEVICES hides GPUs
        run_cli(f'CUDA_VISIBLE_DEVICES="" python TTS/bin/find_unique_phonemes.py --config_path "{config_path}"')