
Testing ODT Document Processing Implementation in QuivrHQ/quivr

This test suite validates the ODT file processing functionality in Quivr, focusing on handling OpenDocument Text files. The tests ensure proper processing of valid ODT files and appropriate error handling for invalid cases.

Test Coverage Overview

The suite consists of two tests: one success path that processes a sample ODT file, and one failure path that expects a ValueError.

Key areas tested include:
  • Successful processing of valid ODT files
  • Error handling for invalid ODT files
  • File extension validation
  • Integration with the QuivrFile system

Implementation Analysis

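Both tests are coroutine functions run through the @pytest.mark.asyncio marker, which is provided by the pytest-asyncio plugin rather than by pytest itself. ODT parsing relies on the unstructured library: pytest.importorskip("unstructured") skips the whole module when that library is not installed, and the custom @pytest.mark.unstructured marker lets these tests be selected or deselected at run time. No fixtures are used; each test builds its own QuivrFile inline.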

Key patterns include awaiting the processor's async process_file method, generating file and brain IDs with uuid4(), and using pathlib.Path for file locations (p.stem supplies the original filename).
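The flow described above can be sketched with the standard library alone. StubFile and StubODTProcessor below are hypothetical stand-ins, not the real QuivrFile or ODTProcessor; the extension check is an assumption about how a processor might reject a file, chosen to mirror the ValueError the failure test expects.

```python
import asyncio
from dataclasses import dataclass
from pathlib import Path
from uuid import UUID, uuid4


@dataclass
class StubFile:
    """Hypothetical stand-in for QuivrFile with only the fields this sketch needs."""
    id: UUID
    original_filename: str
    path: Path
    file_extension: str


class StubODTProcessor:
    """Hypothetical stand-in for ODTProcessor; it does no real parsing."""

    supported_extensions = {".odt"}

    async def process_file(self, file: StubFile) -> list[str]:
        # Assumed behaviour: reject files whose declared extension is unsupported.
        if file.file_extension not in self.supported_extensions:
            raise ValueError(f"can't process a file of type {file.file_extension}")
        # Yield to the event loop once, as an I/O-bound processor would.
        await asyncio.sleep(0)
        return [f"chunk from {file.original_filename}"]


def make_file(path: Path, extension: str) -> StubFile:
    # Path.stem drops the directory and the final suffix: "sample.odt" -> "sample".
    return StubFile(
        id=uuid4(),
        original_filename=path.stem,
        path=path,
        file_extension=extension,
    )


async def main() -> tuple[list[str], bool]:
    processor = StubODTProcessor()
    # Success path: declared extension matches what the processor supports.
    chunks = await processor.process_file(make_file(Path("sample.odt"), ".odt"))
    # Failure path: declared extension is .txt, so a ValueError is expected.
    try:
        await processor.process_file(make_file(Path("bad_odt.odt"), ".txt"))
        rejected = False
    except ValueError:
        rejected = True
    return chunks, rejected


chunks, rejected = asyncio.run(main())
```

In the real tests, pytest-asyncio takes the place of asyncio.run and pytest.raises takes the place of the try/except.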

Technical Details

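Testing tools and configuration:
  • pytest as the test framework, with the pytest-asyncio plugin supplying @pytest.mark.asyncio
  • unstructured library for ODT processing (the module is skipped if it is not installed)
  • pathlib.Path for file path handling
  • uuid.uuid4 for unique identifier generation
  • QuivrFile and FileExtension from quivr_core.files.file
  • async/await for calling the processor's process_file coroutine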

Best Practices Demonstrated

Notable practices in the suite include:
  • Proper test isolation and setup
  • Explicit error case testing
  • Use of pytest markers for test categorization
  • Clean separation of test cases
  • Plain assert statements for results and pytest.raises for expected exceptions
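Custom markers such as unstructured are normally registered so that pytest does not warn about unknown marks. The source does not show how the repository registers its marker; the following is one possible way, a minimal conftest.py sketch using pytest's pytest_configure hook. The marker description text is made up for illustration.

```python
# conftest.py (hypothetical placement; the repository may register markers elsewhere)
def pytest_configure(config):
    # Register the custom marker so runs with --strict-markers accept it.
    config.addinivalue_line(
        "markers",
        "unstructured: tests that need the optional 'unstructured' dependency",
    )
```

With the marker registered, `pytest -m unstructured` runs only the marked tests and `pytest -m "not unstructured"` leaves them out.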

quivrhq/quivr

core/tests/processor/odt/test_odt.py

from pathlib import Path
from uuid import uuid4

import pytest
from quivr_core.files.file import FileExtension, QuivrFile
from quivr_core.processor.implementations.default import ODTProcessor

# Skip every test in this module if the optional "unstructured" package is missing.
unstructured = pytest.importorskip("unstructured")


@pytest.mark.unstructured
@pytest.mark.asyncio
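async def test_odt_processor():
    """A valid ODT sample should produce a non-empty result."""
    # Relative path: resolved against the directory pytest is run from.
    p = Path("./tests/processor/odt/sample.odt")
    f = QuivrFile(
        id=uuid4(),
        brain_id=uuid4(),
        original_filename=p.stem,
        path=p,
        file_extension=FileExtension.odt,
        file_sha1="123",  # placeholder hash, not the real SHA-1 of the file
    )
    processor = ODTProcessor()
    result = await processor.process_file(f)
    assert len(result) > 0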


@pytest.mark.unstructured
@pytest.mark.asyncio
async def test_odt_processor_fail():
    """Processing should raise ValueError for this file."""
    p = Path("./tests/processor/odt/bad_odt.odt")
    f = QuivrFile(
        id=uuid4(),
        brain_id=uuid4(),
        original_filename=p.stem,
        path=p,
        # Note: the declared extension is .txt, not .odt.
        file_extension=FileExtension.txt,
        file_sha1="123",
    )
    processor = ODTProcessor()
    with pytest.raises(ValueError):
        await processor.process_file(f)
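
Both tests locate their sample files with paths relative to the current working directory, so they only find the files when pytest is started from the directory that contains tests/. One possible alternative, shown here as a sketch and not taken from the repository, is to build the path from the test module's own location. The helper name sample_path is hypothetical.

```python
from pathlib import Path


def sample_path(test_file: str, name: str) -> Path:
    """Return the path of a sample file stored next to the given test module."""
    return Path(test_file).parent / name
```

Inside a test module this would be called as sample_path(__file__, "sample.odt"), which gives the same file regardless of where pytest is launched.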