Validating Vocoder Loss Functions Implementation in Coqui-AI TTS

This test suite validates the vocoder loss functions in the Coqui-AI TTS system, focusing on STFT-based losses and the MelGAN feature-matching loss. It checks that the underlying audio transforms and loss computations behave as expected for neural vocoder training.

Test Coverage Overview

The test suite provides comprehensive coverage of vocoder loss functions including:
  • TorchSTFT implementation verification against librosa
  • Single-scale STFT loss computation
  • Multi-scale STFT loss validation
  • MelGAN feature loss testing across different scales and layers
Edge cases covered include identical inputs, which must yield zero loss, and random inputs, which must yield a positive loss.
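
The sketch below shows how the losses under test are constructed and called. It mirrors the constructor arguments the suite itself passes, but the FFT, hop, and window sizes are placeholder values rather than the suite's actual audio configuration.

import torch

from TTS.vocoder.layers.losses import MultiScaleSTFTLoss, STFTLoss

# Placeholder analysis parameters; the suite reads these from BaseAudioConfig.
fft_size, hop_length, win_length = 1024, 256, 1024

single_scale = STFTLoss(fft_size, hop_length, win_length)
multi_scale = MultiScaleSTFTLoss(
    [fft_size // 2, fft_size, fft_size * 2],
    [hop_length // 2, hop_length, hop_length * 2],
    [win_length // 2, win_length, win_length * 2],
)

wav = torch.rand(1, 16000)                # fake single-channel audio batch
loss_m, loss_sc = single_scale(wav, wav)  # identical inputs -> zero loss
loss_m, loss_sc = multi_scale(wav, torch.rand_like(wav))  # mismatch -> positive loss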

Implementation Analysis

The testing approach uses PyTorch tensors and NumPy arrays to validate the audio processing transforms. It compares independent STFT implementations against one another and verifies loss-function behavior on both identical and random inputs.

Key patterns include tensor shape validation, numerical accuracy checks, and multi-scale feature comparison.
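
A generic version of this comparison pattern, written against librosa and torch.stft directly rather than the TTS helpers, might look like the sketch below; the input file, analysis parameters, and tolerance are illustrative, and close agreement assumes the window, centering, and padding settings match between the two libraries.

import librosa
import numpy as np
import torch

# Hypothetical input file and placeholder analysis parameters.
y, _sr = librosa.load("example.wav", sr=None)
n_fft, hop_length, win_length = 1024, 256, 1024

# Reference magnitude spectrogram from librosa.
M_ref = np.abs(
    librosa.stft(y, n_fft=n_fft, hop_length=hop_length, win_length=win_length,
                 window="hann", center=True, pad_mode="reflect")
)

# PyTorch magnitude spectrogram computed with matching settings.
M_torch = torch.stft(
    torch.from_numpy(y).float(), n_fft, hop_length=hop_length, win_length=win_length,
    window=torch.hann_window(win_length), center=True, pad_mode="reflect",
    return_complex=True,
).abs()

# The two implementations should agree to within a small numeric tolerance;
# the threshold may need loosening for unnormalized, high-amplitude signals.
assert np.abs(M_ref - M_torch.numpy()).max() < 1e-4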

Technical Details

Testing tools and configuration:
  • PyTorch for tensor operations
  • AudioProcessor for wav file handling
  • Custom TorchSTFT implementation
  • librosa as the reference STFT implementation
  • Base audio configuration supplying the FFT size, hop length, and window length parameters (set up as sketched below)
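
The setup sketched below mirrors the module-level configuration in the test file: a default BaseAudioConfig is expanded into an AudioProcessor, whose STFT parameters the loss tests then reuse. The WAV path here is a placeholder; the suite resolves its own example_1.wav via get_tests_input_path().

from TTS.config import BaseAudioConfig
from TTS.utils.audio import AudioProcessor

# Default audio configuration, passed to AudioProcessor unchanged.
config = BaseAudioConfig()
ap = AudioProcessor(**config.to_dict())

# STFT parameters shared with the loss constructors.
print(ap.fft_size, ap.hop_length, ap.win_length)

# Placeholder path; substitute any mono WAV file.
wav = ap.load_wav("path/to/example.wav")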

Best Practices Demonstrated

The test suite demonstrates excellent testing practices including:
  • Isolated test functions for each loss component
  • Numerical accuracy validation with appropriate tolerances
  • Comprehensive edge case handling
  • Clear test organization with setup and configuration separation
  • Effective use of assert statements for validation (a generic form of this pattern is sketched below)
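
A condensed, generic form of the identical-versus-random assertion pattern used throughout the suite is sketched below; the helper name and the placeholder STFT parameters are illustrative, not part of the test file.

import torch

from TTS.vocoder.layers.losses import STFTLoss

def check_loss_sanity(loss_fn, length=8000):
    """Identical inputs must give zero loss; a random target must not."""
    x = torch.rand(1, length)
    loss_m, loss_sc = loss_fn(x, x)
    assert float(loss_m + loss_sc) == 0
    loss_m, loss_sc = loss_fn(x, torch.rand_like(x))
    assert float(loss_m + loss_sc) > 0

# Example usage with placeholder analysis parameters.
check_loss_sanity(STFTLoss(1024, 256, 1024))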

coqui-ai/tts

tests/vocoder_tests/test_vocoder_losses.py

import os

import torch

from tests import get_tests_input_path, get_tests_output_path, get_tests_path
from TTS.config import BaseAudioConfig
from TTS.utils.audio import AudioProcessor
from TTS.utils.audio.numpy_transforms import stft
from TTS.vocoder.layers.losses import MelganFeatureLoss, MultiScaleSTFTLoss, STFTLoss, TorchSTFT

TESTS_PATH = get_tests_path()

OUT_PATH = os.path.join(get_tests_output_path(), "audio_tests")
os.makedirs(OUT_PATH, exist_ok=True)

WAV_FILE = os.path.join(get_tests_input_path(), "example_1.wav")

ap = AudioProcessor(**BaseAudioConfig().to_dict())


def test_torch_stft():
    torch_stft = TorchSTFT(ap.fft_size, ap.hop_length, ap.win_length)
    # librosa stft
    wav = ap.load_wav(WAV_FILE)
    M_librosa = abs(stft(y=wav, fft_size=ap.fft_size, hop_length=ap.hop_length, win_length=ap.win_length))
    # torch stft
    wav = torch.from_numpy(wav[None, :]).float()
    M_torch = torch_stft(wav)
    # check the difference between the librosa and torch magnitude outputs
    assert (M_librosa - M_torch[0].data.numpy()).max() < 1e-5


def test_stft_loss():
    stft_loss = STFTLoss(ap.fft_size, ap.hop_length, ap.win_length)
    wav = ap.load_wav(WAV_FILE)
    wav = torch.from_numpy(wav[None, :]).float()
    loss_m, loss_sc = stft_loss(wav, wav)
    assert loss_m + loss_sc == 0
    loss_m, loss_sc = stft_loss(wav, torch.rand_like(wav))
    assert loss_sc < 1.0
    assert loss_m + loss_sc > 0


def test_multiscale_stft_loss():
    stft_loss = MultiScaleSTFTLoss(
        [ap.fft_size // 2, ap.fft_size, ap.fft_size * 2],
        [ap.hop_length // 2, ap.hop_length, ap.hop_length * 2],
        [ap.win_length // 2, ap.win_length, ap.win_length * 2],
    )
    wav = ap.load_wav(WAV_FILE)
    wav = torch.from_numpy(wav[None, :]).float()
    loss_m, loss_sc = stft_loss(wav, wav)
    assert loss_m + loss_sc == 0
    loss_m, loss_sc = stft_loss(wav, torch.rand_like(wav))
    assert loss_sc < 1.0
    assert loss_m + loss_sc > 0


def test_melgan_feature_loss():
    feats_real = []
    feats_fake = []

    # case 1: real and fake features all differ -> loss is non-zero but bounded
    for _ in range(5):  # different scales
        scale_feats_real = []
        scale_feats_fake = []
        for _ in range(4):  # different layers
            scale_feats_real.append(torch.rand([3, 5, 7]))
            scale_feats_fake.append(torch.rand([3, 5, 7]))
        feats_real.append(scale_feats_real)
        feats_fake.append(scale_feats_fake)

    loss_func = MelganFeatureLoss()
    loss = loss_func(feats_fake, feats_real)
    assert loss.item() <= 1.0

    feats_real = []
    feats_fake = []

    # case 2: real and fake features are identical -> loss must be zero
    for _ in range(5):  # different scales
        scale_feats_real = []
        scale_feats_fake = []
        for _ in range(4):  # different layers
            tensor = torch.rand([3, 5, 7])
            scale_feats_real.append(tensor)
            scale_feats_fake.append(tensor)
        feats_real.append(scale_feats_real)
        feats_fake.append(scale_feats_fake)

    loss_func = MelganFeatureLoss()
    loss = loss_func(feats_fake, feats_real)
    assert loss.item() == 0