
Testing Document Ingestion API Endpoints in private-gpt

This test suite validates the ingestion endpoints of the private-gpt server, covering both file uploads and plain-text submission. The tests check that .txt and .pdf files are accepted and that an ingested document appears in the document list.

Test Coverage Overview

The suite contains four tests that exercise the ingestion API: two file uploads through the ingestion helper, one listing check against /v1/ingest/list, and one plain-text submission to /v1/ingest/text.

Key areas tested include:
  • Text file (.txt) ingestion
  • PDF file ingestion
  • Plain text ingestion via API
  • Document listing functionality
  • Document count verification

Implementation Analysis

The tests use FastAPI’s TestClient to send HTTP requests to the endpoints and Python’s tempfile module to create a throwaway file. Common ingestion steps are delegated to the IngestHelper class, which the tests receive as the ingest_helper fixture.

Notable patterns include:
  • Isolated test environments using temporary files
  • Response validation using Pydantic models
  • File path handling with Path objects
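The temporary-file pattern listed above can be sketched with the standard library alone. This is a minimal illustration, not the project's code: read_by_path is a hypothetical stand-in for a helper that reopens the file by its path, as ingest_helper.ingest_file appears to do. It shows why the content is flushed before the path is handed off, and that the file is gone once the with block exits.

```python
import tempfile
from pathlib import Path


def read_by_path(path: Path) -> str:
    """Hypothetical stand-in for a helper that reopens a file by its path."""
    return path.read_text()


with tempfile.NamedTemporaryFile("w", suffix=".txt") as tmp:
    tmp.write("Foo bar; hello there!")
    # Without flush(), the text may still sit in Python's write buffer,
    # so a second reader of the same path could see an empty file.
    tmp.flush()
    tmp_path = Path(tmp.name)
    # The file must be consumed inside the with block, while it still exists.
    content = read_by_path(tmp_path)
    existed_inside = tmp_path.exists()

# Leaving the with block closes the handle and deletes the file.
existed_after = tmp_path.exists()
```

Reopening a NamedTemporaryFile by name while it is still open works on Unix-like systems but may fail on Windows, so this sketch assumes a Unix-like test environment.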

Technical Details

Testing tools and configuration:
  • FastAPI TestClient for API testing
  • Pytest fixtures for test setup
  • Path from pathlib for file operations
  • tempfile for temporary file management
  • Custom IngestHelper class for ingestion operations
  • Pydantic models for response validation

Best Practices Demonstrated

The test suite demonstrates several testing practices: each test stands alone, the temporary file is removed automatically when its with block exits, and the plain-text test checks both the status code and the shape of the response.

Notable practices:
  • Independent test cases with clear assertions
  • Proper temporary resource management
  • Consistent response validation
  • Reusable test helpers and fixtures
  • Clear test naming conventions
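One way to read the independence point: the listing test compares the document count before and after ingestion instead of asserting an absolute number, so it does not depend on what earlier tests left behind. The sketch below illustrates that idea with a hypothetical in-memory FakeDocumentStore. It is an assumption made for illustration, not private-gpt's storage or API.

```python
class FakeDocumentStore:
    """Hypothetical in-memory stand-in for the server's document list."""

    def __init__(self) -> None:
        self._docs: list[str] = []

    def ingest(self, name: str) -> list[str]:
        # Record one document and return the newly ingested entries.
        self._docs.append(name)
        return [name]

    def list_docs(self) -> list[str]:
        return list(self._docs)


def check_list_grows_by_one(store: FakeDocumentStore) -> None:
    # Measure the delta, not the absolute count.
    count_before = len(store.list_docs())
    ingested = store.ingest("temp.txt")
    assert len(ingested) == 1, "The temp doc should have been ingested"
    count_after = len(store.list_docs())
    assert count_after == count_before + 1, "The temp doc should be returned"


# The same check passes on an empty store...
empty_store = FakeDocumentStore()
check_list_grows_by_one(empty_store)

# ...and on one that already holds documents from earlier activity.
used_store = FakeDocumentStore()
used_store.ingest("earlier.txt")
used_store.ingest("another.txt")
check_list_grows_by_one(used_store)
```

An absolute assertion such as "the list has exactly one document" would only hold on a fresh store, which ties the test to execution order.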

zylon-ai/private-gpt

tests/server/ingest/test_ingest_routes.py

import tempfile
from pathlib import Path

from fastapi.testclient import TestClient

from private_gpt.server.ingest.ingest_router import IngestResponse
from tests.fixtures.ingest_helper import IngestHelper


def test_ingest_accepts_txt_files(ingest_helper: IngestHelper) -> None:
    path = Path(__file__).parents[0] / "test.txt"
    ingest_result = ingest_helper.ingest_file(path)
    assert len(ingest_result.data) == 1


def test_ingest_accepts_pdf_files(ingest_helper: IngestHelper) -> None:
    path = Path(__file__).parents[0] / "test.pdf"
    ingest_result = ingest_helper.ingest_file(path)
    assert len(ingest_result.data) == 1


def test_ingest_list_returns_something_after_ingestion(
    test_client: TestClient, ingest_helper: IngestHelper
) -> None:
    response_before = test_client.get("/v1/ingest/list")
    count_ingest_before = len(response_before.json()["data"])
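    with tempfile.NamedTemporaryFile("w", suffix=".txt") as test_file:
        test_file.write("Foo bar; hello there!")
        # Flush so the text is on disk before the helper reads the file by path.
        test_file.flush()
        test_file.seek(0)
        # Ingest inside the with block; the file is deleted when the block exits.
        ingest_result = ingest_helper.ingest_file(Path(test_file.name))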
    assert len(ingest_result.data) == 1, "The temp doc should have been ingested"
    response_after = test_client.get("/v1/ingest/list")
    count_ingest_after = len(response_after.json()["data"])
    assert (
        count_ingest_after == count_ingest_before + 1
    ), "The temp doc should be returned"


def test_ingest_plain_text(test_client: TestClient) -> None:
    response = test_client.post(
        "/v1/ingest/text", json={"file_name": "file_name", "text": "text"}
    )
    assert response.status_code == 200
    ingest_result = IngestResponse.model_validate(response.json())
    assert len(ingest_result.data) == 1