Testing Platform Evaluation Dataset Management in llama_index
This test suite covers the platform evaluation functionality in llama_index, specifically the ability to upload and manage evaluation datasets. It verifies that a dataset of evaluation questions created locally can be uploaded to the Llama Cloud platform and read back through the platform client, and it only runs when Llama Cloud credentials are available in the environment.
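For orientation, upload_eval_dataset is the helper under test. A minimal sketch of calling it directly, assuming LLAMA_CLOUD_BASE_URL and LLAMA_CLOUD_API_KEY are set as in the test below (the dataset name, project name, and questions here are placeholders):

```python
from llama_index.core.evaluation.eval_utils import upload_eval_dataset

# Hypothetical standalone call mirroring what the test below exercises:
# create (or overwrite) a named dataset of questions under a project.
dataset_id = upload_eval_dataset(
    "demo_eval_dataset",          # placeholder dataset name
    project_name="demo_project",  # placeholder project name
    questions=["What does the retriever return?", "How are nodes scored?"],
    overwrite=True,
)
print(f"Uploaded eval dataset with id {dataset_id}")
```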
Repository: run-llama/llama_index
File: llama-index-core/tests/evaluation/test_platform_eval.py
```python
import os
import sys

import pytest
from llama_index.core.evaluation.eval_utils import upload_eval_dataset

# Llama Cloud credentials; both must be set for the integration test to run.
base_url = os.environ.get("LLAMA_CLOUD_BASE_URL", None)
api_key = os.environ.get("LLAMA_CLOUD_API_KEY", None)
python_version = sys.version
```
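The sys.version suffix keeps dataset and project names unique across CI jobs running on different Python versions. A per-run identifier would stay unique even when two jobs share the same Python version; a minimal sketch of that alternative (our own variation, not something the repository does):

```python
import uuid

# Hypothetical alternative: a short random suffix instead of sys.version.
run_suffix = uuid.uuid4().hex[:8]
dataset_name = f"test_dataset_{run_suffix}"
project_name = f"test_project_{run_suffix}"
```

The trade-off is that random names accumulate on the platform, whereas the fixed sys.version-based names are simply replaced on each run because the test passes overwrite=True.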
```python
@pytest.mark.skipif(
    not base_url or not api_key, reason="No platform base url or api key set"
)
@pytest.mark.integration()
def test_upload_eval_dataset() -> None:
    from llama_cloud.client import LlamaCloud

    # Upload a small dataset of questions, replacing any leftover copy from a previous run.
    eval_dataset_id = upload_eval_dataset(
        "test_dataset" + python_version,  # avoid CI test clashes
        project_name="test_project" + python_version,
        questions=["foo", "bar"],
        overwrite=True,
    )

    # Read the dataset back through the platform client and verify the round trip.
    client = LlamaCloud(base_url=base_url, token=api_key)
    eval_dataset = client.evals.get_dataset(dataset_id=eval_dataset_id)
    assert eval_dataset.name == "test_dataset" + python_version

    eval_questions = client.evals.get_questions(dataset_id=eval_dataset_id)
    assert len(eval_questions) == 2
```
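The credential check happens once at collection time via skipif. An alternative is to defer it to a fixture, so that building the client and skipping on missing credentials live in one place. A sketch of that pattern, assuming the same environment variables (the fixture name and skip message are ours, not part of the repository):

```python
import os

import pytest


@pytest.fixture
def llama_cloud_client():
    """Build a LlamaCloud client, or skip the test when credentials are missing."""
    base_url = os.environ.get("LLAMA_CLOUD_BASE_URL")
    api_key = os.environ.get("LLAMA_CLOUD_API_KEY")
    if not base_url or not api_key:
        pytest.skip("LLAMA_CLOUD_BASE_URL / LLAMA_CLOUD_API_KEY not configured")
    from llama_cloud.client import LlamaCloud  # lazy import, as in the test above

    return LlamaCloud(base_url=base_url, token=api_key)
```

The module-level skipif used in the actual test reports the test as skipped without running any setup; either approach keeps the suite green when no Llama Cloud credentials are configured.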