
Testing DOCX Document Processing Implementation in QuivrHQ/quivr

This test suite exercises the DOCXProcessor implementation in Quivr. The tests verify that a valid DOCX file is processed successfully and that a file of the wrong type is rejected with an error.

Test Coverage Overview

The suite contains two tests: one processes a sample DOCX file (demo.docx), and the other passes a non-DOCX fixture (quivr_txt) to the processor and expects a ValueError.

Key areas tested include:
  • Successful DOCX file processing and content extraction
  • Error handling for incompatible file types
  • Check that the processor returns a non-empty result
  • File extension validation

Implementation Analysis

Both tests are asynchronous and are marked with @pytest.mark.asyncio, which is provided by the pytest-asyncio plugin. The module calls pytest.importorskip("unstructured"), so the tests are skipped when the optional unstructured library is not installed, and each test also carries a custom unstructured marker. The failure test receives its input through the quivr_txt fixture.
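The listing at the end of this page guards its optional dependency with pytest.importorskip("unstructured"). As a rough illustration of the idea, the sketch below uses only the standard library: it tries to import a module by name and reports whether it is available. The helper name optional_import is hypothetical and is not part of pytest or Quivr; pytest.importorskip additionally skips the test module instead of returning None.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Return the imported module, or None if it cannot be imported.

    Hypothetical helper for illustration only. pytest.importorskip does a
    similar import attempt but skips the tests when the import fails.
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# A standard-library module is always importable.
json_module = optional_import("json")

# A name that is (presumably) not installed yields None instead of an error.
missing_module = optional_import("surely_not_an_installed_package_xyz")
```

In a real test module, pytest.importorskip is the simpler choice because the skip is reported in the test summary.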

Technical patterns include:
  • Async/await pattern for file processing
  • UUID-based file identification
  • Exception handling verification
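The patterns listed above can be sketched without Quivr installed. The example below is a minimal sketch, not the project's implementation: FakeFile and FakeDocxProcessor are hypothetical stand-ins for QuivrFile and DOCXProcessor, and the assumption that the processor rejects files by extension is made only for illustration.

```python
import asyncio
from dataclasses import dataclass
from pathlib import Path
from typing import List, Tuple
from uuid import UUID, uuid4


@dataclass
class FakeFile:
    # Hypothetical stand-in for QuivrFile, with only the fields used here.
    id: UUID
    path: Path
    file_extension: str


class FakeDocxProcessor:
    # Hypothetical stand-in for DOCXProcessor.
    supported_extensions = {".docx"}

    async def process_file(self, file: FakeFile) -> List[str]:
        # Assumed behavior: reject anything that is not a .docx file.
        if file.file_extension not in self.supported_extensions:
            raise ValueError(f"unsupported extension: {file.file_extension}")
        return [f"chunk from {file.path.name}"]


async def run_checks() -> Tuple[List[str], bool]:
    processor = FakeDocxProcessor()

    # Success path: await the coroutine and keep its result.
    docx = FakeFile(uuid4(), Path("demo.docx"), ".docx")
    chunks = await processor.process_file(docx)

    # Failure path: the call is expected to raise ValueError.
    txt = FakeFile(uuid4(), Path("notes.txt"), ".txt")
    try:
        await processor.process_file(txt)
        raised = False
    except ValueError:
        raised = True
    return chunks, raised


chunks, raised = asyncio.run(run_checks())
```

Under pytest, the try/except block would be written as `with pytest.raises(ValueError):`, and the @pytest.mark.asyncio marker would take the place of the explicit asyncio.run call.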

Technical Details

Testing tools and configuration:
  • pytest with the pytest-asyncio plugin
  • unstructured library for DOCX processing
  • QuivrFile and FileExtension from quivr_core.files.file
  • Path handling via pathlib
  • UUID generation for unique identifiers
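The pathlib and uuid pieces behave the same outside the test. A short sketch, reusing the path from the test listing (the file does not need to exist for these attribute lookups):

```python
from pathlib import Path
from uuid import UUID, uuid4

p = Path("./tests/processor/docx/demo.docx")

# stem drops the suffix, so original_filename=p.stem passes "demo".
stem = p.stem        # "demo"
suffix = p.suffix    # ".docx"

# uuid4() returns a fresh random UUID on each call, so the file id and
# the brain id in the test are distinct values.
file_id = uuid4()
brain_id = uuid4()
```

Because the ids are random, the test does not depend on any particular identifier value.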

Best Practices Demonstrated

The test suite illustrates several common Python testing practices.

Notable practices include:
  • Optional dependency isolated with pytest.importorskip
  • Explicit test case separation for success and failure scenarios
  • Use of pytest fixtures for test data management
  • Clear assertion patterns and error validation
  • Appropriate use of markers for test categorization

quivrhq/quivr

core/tests/processor/docx/test_docx.py

from pathlib import Path
from uuid import uuid4

import pytest
from quivr_core.files.file import FileExtension, QuivrFile
from quivr_core.processor.implementations.default import DOCXProcessor

# Skip every test in this module when the optional dependency is missing.
unstructured = pytest.importorskip("unstructured")


@pytest.mark.unstructured
@pytest.mark.asyncio
async def test_docx_filedocx():
    # Relative path: assumes pytest is run from the core/ directory.
    p = Path("./tests/processor/docx/demo.docx")
    f = QuivrFile(
        id=uuid4(),
        brain_id=uuid4(),
        original_filename=p.stem,
        path=p,
        file_extension=FileExtension.docx,
        file_sha1="123",
    )
    processor = DOCXProcessor()
    result = await processor.process_file(f)
    assert len(result) > 0


@pytest.mark.unstructured
@pytest.mark.asyncio
async def test_docx_processor_fail(quivr_txt):
    processor = DOCXProcessor()
    with pytest.raises(ValueError):
        await processor.process_file(quivr_txt)