Testing Korean Phonemization Implementation in Coqui-AI TTS

This test suite validates the Korean text phonemization functionality in the Coqui-AI TTS library, focusing on converting Korean text to phonetic representations in both Korean and English character sets.

Test Coverage Overview

The test suite checks nine sample sentences: five against expected Hangul pronunciation spellings and four against expected English romanizations.

Key areas tested include:
  • Basic Korean sentence phonemization
  • Numerical text conversion
  • Mixed alphabet handling
  • English romanization of Korean text
  • Punctuation pass-through (sentence-final periods)

Implementation Analysis

The implementation uses Python’s unittest framework with a data-driven approach. Test cases live in two multi-line string constants, one case per line, with the input text and the expected phonemes separated by a slash. A single test method splits each constant into lines, splits each line at the slash, and uses assertEqual to check that the phonemizer output matches the expected string exactly.

Technical implementation features:
  • Split-based test case parsing
  • Direct assertion comparison
  • Character set switching capability
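
The split-based parsing can be tried in isolation, without installing the TTS package. The following is a minimal sketch: the two sample lines are copied from _TEST_CASES, while the constant SAMPLE_CASES and the helper parse_cases are hypothetical names introduced here for illustration and are not part of the test file.

```python
# Two cases copied from _TEST_CASES in the test file; each line is "input/expected".
SAMPLE_CASES = """
이게 제 마음이에요./이게 제 마으미에요.
오늘은 8월 31일 입니다./오느른 파뤌 삼시비리 림니다.
"""


def parse_cases(block):
    """Return (text, expected) pairs from a newline-separated block of cases.

    Hypothetical helper: the test file performs the same two splits inline.
    """
    pairs = []
    for line in block.strip().split("\n"):
        # This relies on "/" never appearing inside the text itself;
        # a line with a second slash would make the unpacking fail.
        text, expected = line.split("/")
        pairs.append((text, expected))
    return pairs


pairs = parse_cases(SAMPLE_CASES)
for text, expected in pairs:
    print(text, "->", expected)
```

Because the slash doubles as the field separator, this format only works as long as no test sentence contains a slash of its own.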

Technical Details

Testing tools and configuration:
  • unittest framework
  • korean_text_to_phonemes function from TTS.tts.utils.text.korean.phonemizer
  • Predefined test cases in _TEST_CASES and _TEST_CASES_EN
  • The character keyword argument (character="english") for switching from the default Korean output to English romanization

Best Practices Demonstrated

The test suite demonstrates several good practices for Python unit tests.

Notable practices include:
  • Structured test data separation
  • Clear input/expected output pairs
  • Consistent test case formatting
  • Multiple character set support
  • Coverage of edge cases such as digits, Latin letters, and embedded English words
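
One possible refinement of this pattern, not used in the test file itself, is unittest's subTest context manager, which reports each input/expected pair separately instead of stopping at the first mismatch. The sketch below assumes a stand-in function, stub_phonemize, so that it runs without the TTS package; in the real suite that call would be korean_text_to_phonemes. The names CASES, _LOOKUP, stub_phonemize, and TestPerCase are hypothetical.

```python
import unittest

# Two cases copied from _TEST_CASES; each line is "input/expected".
CASES = """
이게 제 마음이에요./이게 제 마으미에요.
친구 100명 만들기가 목표입니다./친구 뱅명 만들기가 목표임니다.
"""

# Hypothetical stand-in so the sketch runs without the TTS package installed.
# It simply looks up the expected string; it does no real phonemization.
_LOOKUP = dict(line.split("/") for line in CASES.strip().split("\n"))


def stub_phonemize(text):
    return _LOOKUP[text]


class TestPerCase(unittest.TestCase):
    def test_cases(self):
        for line in CASES.strip().split("\n"):
            text, expected = line.split("/")
            # subTest labels each case, so a failure names the sentence
            # and the loop continues with the remaining cases.
            with self.subTest(text=text):
                self.assertEqual(stub_phonemize(text), expected)


# Run the case programmatically instead of calling unittest.main(),
# which would exit the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestPerCase)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```

With a plain loop of assertions, the first failing sentence hides the status of every sentence after it; subTest trades a little extra indentation for a full per-case report.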

coqui-ai/tts

tests/text_tests/test_korean_phonemizer.py

            
import unittest

from TTS.tts.utils.text.korean.phonemizer import korean_text_to_phonemes

_TEST_CASES = """
포상은 열심히 한 아이에게만 주어지기 때문에 포상인 것입니다./포상으 녈심히 하 나이에게만 주어지기 때무네 포상인 거심니다.
오늘은 8월 31일 입니다./오느른 파뤌 삼시비리 림니다.
친구 100명 만들기가 목표입니다./친구 뱅명 만들기가 목표임니다.
A부터 Z까지 입니다./에이부터 제트까지 임니다.
이게 제 마음이에요./이게 제 마으미에요.
"""
_TEST_CASES_EN = """
이제야 이쪽을 보는구나./IJeYa IJjoGeul BoNeunGuNa.
크고 맛있는 cake를 부탁해요./KeuGo MaSinNeun KeIKeuLeul BuTaKaeYo.
전부 거짓말이야./JeonBu GeoJinMaLiYa.
좋은 노래를 찾았어요./JoEun NoLaeLeul ChaJaSseoYo.
"""


class TestText(unittest.TestCase):
    def test_korean_text_to_phonemes(self):
        for line in _TEST_CASES.strip().split("\n"):
            text, phone = line.split("/")
            self.assertEqual(korean_text_to_phonemes(text), phone)
        for line in _TEST_CASES_EN.strip().split("\n"):
            text, phone = line.split("/")
            self.assertEqual(korean_text_to_phonemes(text, character="english"), phone)


if __name__ == "__main__":
    unittest.main()