
Testing EPUB File Processing Implementation in QuivrHQ/quivr

This test suite validates EPUB file processing in the Quivr application. The tests verify that EpubProcessor handles an empty EPUB, a valid EPUB, and an input of the wrong file type, using pytest’s asynchronous testing support.

Test Coverage Overview

The suite consists of three asynchronous tests, each targeting one EPUB processing scenario.

Key areas tested include:
  • Empty EPUB file processing (page-blanche.epub)
  • Valid EPUB content extraction (sway.epub)
  • Error handling for invalid file types

The tests integrate with two components: the QuivrFile class, which describes the input file, and the unstructured library, which is used for content processing.
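
The invalid-file-type scenario can be illustrated without Quivr installed. The following is a minimal sketch using only the standard library. `Ext`, `FakeFile`, and `FakeEpubProcessor` are hypothetical stand-ins for FileExtension, QuivrFile, and EpubProcessor, and the extension check is an assumption about why a ValueError might be raised, not a description of how quivr_core actually does it.

```python
import asyncio
from dataclasses import dataclass
from enum import Enum


class Ext(str, Enum):  # hypothetical stand-in for FileExtension
    epub = ".epub"
    txt = ".txt"


@dataclass
class FakeFile:  # hypothetical stand-in for QuivrFile
    name: str
    file_extension: Ext


class FakeEpubProcessor:  # hypothetical stand-in for EpubProcessor
    supported = (Ext.epub,)

    async def process_file(self, file: FakeFile) -> list:
        # Assumed behavior: reject unsupported extensions before parsing.
        if file.file_extension not in self.supported:
            raise ValueError(f"unsupported extension: {file.file_extension.value}")
        return [f"chunk from {file.name}"]


async def rejects_wrong_extension() -> bool:
    """Return True if a .txt file is rejected with ValueError."""
    try:
        await FakeEpubProcessor().process_file(FakeFile("notes", Ext.txt))
    except ValueError:
        return True
    return False
```

In the real suite the same check is expressed with `pytest.raises(ValueError)` around the awaited call, which fails the test if no exception is raised.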

Implementation Analysis

Each test is a coroutine marked with @pytest.mark.asyncio (a marker provided by the pytest-asyncio plugin) so that pytest can await it. The failure test receives its input from the quivr_txt fixture and uses pytest.raises to check that a ValueError is raised.

Framework-specific features include:
  • pytest.importorskip for conditional test execution
  • Custom markers (@pytest.mark.unstructured)
  • Async/await pattern for file processing
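
Custom markers such as `unstructured` generally need to be registered, otherwise pytest emits an unknown-mark warning (or an error under `--strict-markers`). The sketch below shows one way to do that from a `conftest.py`. Both the file location and the description string are assumptions, and the project may instead register the marker in its pytest configuration file.

```python
# conftest.py (hypothetical location; the marker could also be declared
# in the project's pytest configuration file)


def pytest_configure(config):
    """Register the custom marker so pytest recognizes @pytest.mark.unstructured."""
    config.addinivalue_line(
        "markers",
        # The description text after the colon is illustrative only.
        "unstructured: tests that need the optional 'unstructured' dependency",
    )
```

With the marker registered, `pytest -m unstructured` selects only these tests and `pytest -m "not unstructured"` deselects them.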

Technical Details

Testing tools and configuration:
  • pytest as the primary testing framework
  • unstructured library for EPUB processing
  • Path from pathlib for file handling
  • UUID generation for unique identifiers
  • Custom QuivrFile class for file representation
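
As a small illustration of the standard-library pieces listed above, the snippet below shows what the tests derive from a path and from `uuid4()`. It uses the same relative path as the first test and does not touch the file system; the variable names are illustrative.

```python
from pathlib import Path
from uuid import UUID, uuid4

p = Path("./tests/processor/epub/page-blanche.epub")

stem = p.stem      # file name without its extension: "page-blanche"
suffix = p.suffix  # extension including the dot: ".epub"

# Each call returns a fresh random (version 4) UUID, so the file id and
# brain id used by a test are distinct values.
file_id = uuid4()
brain_id = uuid4()
```

Note that the tests pass `p.stem` as `original_filename`, so the name stored on the QuivrFile does not include the `.epub` extension.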

Best Practices Demonstrated

The suite keeps each scenario in its own test and checks both the success and the failure paths. Notable practices include:
  • Isolated test cases for different scenarios
  • Proper exception handling verification
  • Use of fixtures for common test data
  • Clear test naming conventions
  • Appropriate use of async/await patterns

quivrhq/quivr

core/tests/processor/epub/test_epub_processor.py

            
from pathlib import Path
from uuid import uuid4

import pytest
from quivr_core.files.file import FileExtension, QuivrFile
from quivr_core.processor.implementations.default import EpubProcessor

# Skip every test in this module if the optional `unstructured` package is not installed.
unstructured = pytest.importorskip("unstructured")


@pytest.mark.unstructured
@pytest.mark.asyncio
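async def test_epub_page_blanche():
    """An empty EPUB (page-blanche.epub) should yield an empty result."""
    # Relative path: assumes pytest is run from the core/ directory.
    p = Path("./tests/processor/epub/page-blanche.epub")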
    f = QuivrFile(
        id=uuid4(),
        brain_id=uuid4(),
        original_filename=p.stem,
        path=p,
        file_extension=FileExtension.epub,
        file_sha1="123",
    )
    processor = EpubProcessor()
    result = await processor.process_file(f)
    assert len(result) == 0


@pytest.mark.unstructured
@pytest.mark.asyncio
async def test_epub_processor():
    """A valid EPUB (sway.epub) should yield at least one result."""
    p = Path("./tests/processor/epub/sway.epub")
    f = QuivrFile(
        id=uuid4(),
        brain_id=uuid4(),
        original_filename=p.stem,
        path=p,
        file_extension=FileExtension.epub,
        file_sha1="123",
    )

    processor = EpubProcessor()
    result = await processor.process_file(f)
    assert len(result) > 0


@pytest.mark.unstructured
@pytest.mark.asyncio
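async def test_epub_processor_fail(quivr_txt):
    """An input of the wrong type (the quivr_txt fixture, defined outside this module) should raise ValueError."""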
    processor = EpubProcessor()
    with pytest.raises(ValueError):
        await processor.process_file(quivr_txt)