
Testing Platform Evaluation Dataset Management in llama_index

This test suite covers the platform evaluation functionality in llama_index, focusing on uploading and managing evaluation datasets. It verifies the integration between the local test environment and the Llama Cloud platform when creating datasets and submitting their questions.

Test Coverage Overview

The test coverage focuses on the evaluation dataset upload functionality and cloud platform integration.

Key areas tested include:
  • Dataset creation and naming with Python version specificity (see the sketch after this list)
  • Project management and dataset overwriting capabilities
  • Question batch handling and verification
  • Cloud platform API interaction and response validation
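
A minimal sketch of the naming approach used by the test (the variable names mirror the source below): suffixing dataset and project names with the interpreter version string keeps concurrent CI jobs on different Python versions from overwriting each other's cloud-side test data.

import sys

python_version = sys.version  # e.g. "3.11.4 (main, ...)"

# Version-suffixed names keep parallel CI runs isolated from one another.
dataset_name = "test_dataset" + python_version
project_name = "test_project" + python_version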

Implementation Analysis

The implementation uses pytest's conditional test execution, with environment-based logic that skips the test when cloud credentials are unavailable.

Notable patterns include:
  • Environment variable validation for cloud credentials (see the sketch after this list)
  • Integration test marking for CI/CD pipeline organization
  • Dynamic test dataset naming to prevent CI test conflicts
  • Cloud client initialization and API interaction verification
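
The skip pattern can be summarized as follows; this is a sketch distilled from the test file below, not additional library API. The test body only runs when both credentials are present, otherwise pytest reports it as skipped rather than failed.

import os

import pytest

base_url = os.environ.get("LLAMA_CLOUD_BASE_URL", None)
api_key = os.environ.get("LLAMA_CLOUD_API_KEY", None)


# Skip unless both cloud credentials are exported; mark as an integration test
# so CI pipelines can select or exclude it explicitly.
@pytest.mark.skipif(
    not base_url or not api_key, reason="No platform base url or api key set"
)
@pytest.mark.integration()
def test_upload_eval_dataset() -> None:
    ...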

Technical Details

Testing infrastructure includes:
  • pytest framework for test organization
  • Environment variable configuration for cloud connectivity
  • LlamaCloud client for API interactions (illustrated after this list)
  • Python interpreter version (sys.version) used to namespace test data for isolation
  • Conditional test execution using pytest.mark.skipif
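
Client construction follows the pattern in the test: credentials are read from the environment and passed straight to the LlamaCloud client. This is a sketch with error handling omitted.

import os

from llama_cloud.client import LlamaCloud

# Credentials are read from the environment, matching the test module.
base_url = os.environ.get("LLAMA_CLOUD_BASE_URL", None)
api_key = os.environ.get("LLAMA_CLOUD_API_KEY", None)

if base_url and api_key:
    # The client is later used to read back the uploaded dataset and its questions.
    client = LlamaCloud(base_url=base_url, token=api_key)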

Best Practices Demonstrated

The test implementation showcases several testing best practices for cloud-integrated applications.

Notable practices include:
  • Environment-aware test configuration
  • Isolated test data management
  • Clear assertion patterns for validation
  • Integration test separation with markers (see the conftest sketch after this list)
  • Graceful skipping when cloud credentials are missing
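
One common way to support the marker-based separation is to register the custom marker in a conftest.py. The snippet below is an assumed sketch, not code from the repository.

# conftest.py (hypothetical): registers the "integration" marker so pytest does
# not warn about it, and so CI can select or deselect these tests with
# "-m integration" / "-m 'not integration'".
def pytest_configure(config):
    config.addinivalue_line(
        "markers", "integration: tests that talk to the live Llama Cloud API"
    )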

run-llama/llama_index

llama-index-core/tests/evaluation/test_platform_eval.py

import os
import sys

import pytest
from llama_index.core.evaluation.eval_utils import upload_eval_dataset

base_url = os.environ.get("LLAMA_CLOUD_BASE_URL", None)
api_key = os.environ.get("LLAMA_CLOUD_API_KEY", None)
python_version = sys.version


@pytest.mark.skipif(
    not base_url or not api_key, reason="No platform base url or api key set"
)
@pytest.mark.integration()
def test_upload_eval_dataset() -> None:
    from llama_cloud.client import LlamaCloud

    eval_dataset_id = upload_eval_dataset(
        "test_dataset" + python_version,  # avoid CI test clashes
        project_name="test_project" + python_version,
        questions=["foo", "bar"],
        overwrite=True,
    )

    client = LlamaCloud(base_url=base_url, token=api_key)
    eval_dataset = client.evals.get_dataset(dataset_id=eval_dataset_id)
    assert eval_dataset.name == "test_dataset" + python_version

    eval_questions = client.evals.get_questions(dataset_id=eval_dataset_id)
    assert len(eval_questions) == 2
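
In a CI pipeline, this test only executes when LLAMA_CLOUD_BASE_URL and LLAMA_CLOUD_API_KEY are set; combined with the integration marker, it can be selected explicitly (for example with pytest -m integration) or excluded from credential-free runs.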