
Testing Tika Document Processing Integration in Quivr

This test suite validates the TikaProcessor implementation in Quivr, focusing on PDF document processing and error handling capabilities. The tests verify both successful document parsing and proper exception handling when the Tika server is unavailable.

Test Coverage Overview

The suite contains two tests, both of which call TikaProcessor.process_file on a PDF fixture:

  • test_process_file checks that at least one document is returned and that the first document's content, with newlines stripped, equals "Dummy PDF download"
  • test_send_parse_tika_exception builds the processor with an unreachable tika_url ("test.test") and expects a RuntimeError

Implementation Analysis

Both tests are async functions marked with @pytest.mark.asyncio, a marker provided by the pytest-asyncio plugin, so that process_file can be awaited directly. The PDF input comes from the quivr_pdf fixture, and the failure path is checked with pytest.raises(RuntimeError).
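The pattern described above can be sketched without a Tika server. The example below is a minimal illustration, not the Quivr implementation: FakeProcessor, its reachable flag, and the sample_path fixture are hypothetical stand-ins for TikaProcessor and quivr_pdf.

```python
import asyncio

import pytest


class FakeProcessor:
    """Hypothetical stand-in for TikaProcessor; not the quivr_core class."""

    def __init__(self, reachable: bool = True) -> None:
        self.reachable = reachable

    async def process_file(self, path: str) -> list[str]:
        # Yield to the event loop once to mimic an awaited server call.
        await asyncio.sleep(0)
        if not self.reachable:
            raise RuntimeError("parsing server unreachable")
        return [f"parsed:{path}"]


@pytest.fixture
def sample_path() -> str:
    # Hypothetical fixture playing the role that quivr_pdf plays in the suite.
    return "dummy.pdf"


@pytest.mark.asyncio
async def test_success(sample_path):
    docs = await FakeProcessor().process_file(sample_path)
    assert len(docs) > 0


@pytest.mark.asyncio
async def test_failure(sample_path):
    # Only the awaited call belongs inside the block: once it raises,
    # nothing after it in the block would run.
    with pytest.raises(RuntimeError):
        await FakeProcessor(reachable=False).process_file(sample_path)
```

Under pytest with pytest-asyncio installed, both coroutines are collected and awaited by the plugin. Keeping the body of pytest.raises down to the single awaited call makes it clear which statement is expected to fail.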

Technical Details

Testing tools and configuration:

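  • pytest with the pytest-asyncio plugin
  • Custom marker (@pytest.mark.tika) to tag tests that depend on Tika
  • TikaProcessor class from quivr_core.processor.implementations.tika_processor
  • An unreachable tika_url ("test.test") to trigger the error path; mocking the Tika server is still a TODO in the source
  • quivr_pdf fixture supplying the sample PDF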
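A custom marker such as tika is normally registered so that pytest does not warn about an unknown mark. One common way is a pytest_configure hook in conftest.py, sketched below. Where Quivr actually registers the marker is not shown in this listing, so the file location and the description string are assumptions.

```python
# conftest.py (assumed location; the real project may use pyproject.toml
# or pytest.ini instead)


def pytest_configure(config):
    # Register the "tika" marker. The description text is illustrative.
    config.addinivalue_line(
        "markers",
        "tika: tests that need a reachable Apache Tika server",
    )
```

With the marker registered, `pytest -m tika` selects only the Tika tests and `pytest -m "not tika"` excludes them, which helps on machines without a Tika server.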

Best Practices Demonstrated

The test suite exemplifies several testing best practices:

  • Isolated test cases with clear separation of concerns
  • Proper exception handling verification
  • Use of markers for test categorization
  • Async/await pattern implementation
  • Fixture-based test data management

quivrhq/quivr

core/tests/processor/test_tika_processor.py

            
import pytest
from quivr_core.processor.implementations.tika_processor import TikaProcessor

# TODO: TIKA server should be set


@pytest.mark.tika
@pytest.mark.asyncio
async def test_process_file(quivr_pdf):
    tparser = TikaProcessor()
    doc = await tparser.process_file(quivr_pdf)
    assert len(doc) > 0
    assert doc[0].page_content.strip("\n") == "Dummy PDF download"


@pytest.mark.tika
@pytest.mark.asyncio
async def test_send_parse_tika_exception(quivr_pdf):
    # TODO: Mock correct tika for retries
    # "test.test" is not a reachable Tika endpoint, so processing should fail.
    tparser = TikaProcessor(tika_url="test.test")
    with pytest.raises(RuntimeError):
        # Assertions after this call would never run once it raises.
        await tparser.process_file(quivr_pdf)