
Testing Structured Planner Agent Workflows in LlamaIndex

This test suite validates a structured planner agent implementation in LlamaIndex, focusing on plan creation, task execution, and state management. It exercises the core planning-and-execution workflow using mock LLM responses and a dummy tool.

Test Coverage Overview

The test suite provides comprehensive coverage of the StructuredPlannerAgent’s core functionalities.

  • Plan creation and management
  • Task dependency handling
  • State tracking for completed and remaining tasks
  • Plan refinement capabilities
  • Integration with ReActAgentWorker
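The dependency handling above can be sketched with plain data classes. The `SubTask` stand-in and `next_tasks` helper below are illustrative, mirroring the shapes used in the test rather than the LlamaIndex API itself:

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class SubTask:
    # minimal stand-in for the planner's sub-task record
    name: str
    dependencies: List[str] = field(default_factory=list)


def next_tasks(sub_tasks: List[SubTask], completed: Set[str]) -> List[SubTask]:
    """Return sub-tasks that have not run yet and whose dependencies
    are all completed."""
    return [
        t
        for t in sub_tasks
        if t.name not in completed and all(d in completed for d in t.dependencies)
    ]


plan = [
    SubTask("one"),
    SubTask("two"),
    SubTask("three", dependencies=["one", "two"]),
]

print([t.name for t in next_tasks(plan, set())])           # → ['one', 'two']
print([t.name for t in next_tasks(plan, {"one", "two"})])  # → ['three']
```

This is the same gating the test asserts: two independent tasks are runnable immediately, and the third only unblocks once both of its dependencies complete.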

Implementation Analysis

The testing approach uses a mock LLM implementation to simulate planned responses and validate agent behavior. It employs a structured testing pattern with clear setup, execution, and verification phases.

Key technical aspects include:
  • Custom LLM implementation with controlled responses
  • Function tool integration
  • State management verification
  • Task dependency resolution
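The controlled-response pattern reduces to a small stand-in: a class that returns a pre-baked structured payload when the prompt contains a trigger phrase, and a generic answer otherwise. The `CannedLLM` name and its single-method interface are illustrative, not the `CustomLLM` interface itself:

```python
import json


class CannedLLM:
    """Return a canned JSON payload for prompts containing a trigger
    phrase; otherwise fall back to a generic final answer."""

    def __init__(self, trigger: str, payload: dict):
        self.trigger = trigger
        self.payload = payload

    def complete(self, prompt: str) -> str:
        if self.trigger in prompt:
            return json.dumps(self.payload)
        return "Final Answer: All done"


llm = CannedLLM("CREATE A PLAN", {"sub_tasks": [{"name": "one"}]})
print(llm.complete("please CREATE A PLAN now"))  # the structured payload
print(llm.complete("anything else"))             # the generic fallback
```

The real `MockLLM` in the test follows the same branch structure, serializing a `Plan` model when the planning prompt is detected and emitting a ReAct-style final answer for every other call.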

Technical Details

Testing components and configuration:

  • MockLLM class implementing CustomLLM interface
  • FunctionTool for basic operation testing
  • Plan and SubTask data structures
  • ReActAgentWorker integration
  • State tracking through plan_dict
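The `plan_dict` bookkeeping can be sketched as a dictionary of per-plan state keyed by plan id. The `PlanState`/`PlannerState` names below are hypothetical stand-ins for the agent's internal state, assumed only for illustration:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class PlanState:
    # hypothetical stand-in for one plan's bookkeeping
    sub_tasks: List[str]
    completed: Set[str] = field(default_factory=set)


class PlannerState:
    """Track every plan by id, mirroring the plan_dict lookup in the test."""

    def __init__(self) -> None:
        self.plan_dict: Dict[str, PlanState] = {}

    def add_completed(self, plan_id: str, name: str) -> None:
        self.plan_dict[plan_id].completed.add(name)

    def remaining(self, plan_id: str) -> List[str]:
        state = self.plan_dict[plan_id]
        return [t for t in state.sub_tasks if t not in state.completed]


state = PlannerState()
state.plan_dict["p1"] = PlanState(["one", "two", "three"])
state.add_completed("p1", "one")
print(state.remaining("p1"))  # → ['two', 'three']
```

The test's assertions on `get_completed_sub_tasks` and `get_remaining_subtasks` are queries over exactly this kind of per-plan record.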

Best Practices Demonstrated

The test implementation showcases several testing best practices for agent-based systems.

  • Isolated testing environment with mocked dependencies
  • Step-by-step verification of state changes
  • Clear assertion patterns
  • Comprehensive workflow validation
  • Modular test structure
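The setup/execution/verification pattern can be illustrated with a self-contained test that exercises the same unblocking behavior the suite checks. Task names and the in-test dependency table are illustrative:

```python
def test_next_task_unblocks_after_dependencies() -> None:
    # setup: a plan whose last step depends on the first two
    deps = {"one": [], "two": [], "three": ["one", "two"]}
    completed = set()

    # execution: complete the two independent steps
    completed.update({"one", "two"})

    # verification: the dependent step is now the only runnable task
    runnable = [
        n for n, d in deps.items() if n not in completed and set(d) <= completed
    ]
    assert runnable == ["three"]


test_next_task_unblocks_after_dependencies()
print("ok")
```

Keeping each phase explicit, as the actual test does with its interleaved assertions, makes it obvious which state transition a failing assertion corresponds to.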

run-llama/llama_index

llama-index-core/tests/agent/runner/test_planner.py

from typing import Any

from llama_index.core.agent import ReActAgentWorker, StructuredPlannerAgent
from llama_index.core.agent.runner.planner import Plan, SubTask
from llama_index.core.llms.custom import CustomLLM
from llama_index.core.llms import LLMMetadata, CompletionResponse, CompletionResponseGen
from llama_index.core.tools import FunctionTool


class MockLLM(CustomLLM):
    @property
    def metadata(self) -> LLMMetadata:
        """LLM metadata.

        Returns:
            LLMMetadata: LLM metadata containing various information about the LLM.
        """
        return LLMMetadata()

    def complete(
        self, prompt: str, formatted: bool = False, **kwargs: Any
    ) -> CompletionResponse:
        if "CREATE A PLAN" in prompt:
            text = Plan(
                sub_tasks=[
                    SubTask(
                        name="one", input="one", expected_output="one", dependencies=[]
                    ),
                    SubTask(
                        name="two", input="two", expected_output="two", dependencies=[]
                    ),
                    SubTask(
                        name="three",
                        input="three",
                        expected_output="three",
                        dependencies=["one", "two"],
                    ),
                ]
            ).model_dump_json()
            return CompletionResponse(text=text)

        # dummy response for react
        return CompletionResponse(text="Final Answer: All done")

    def stream_complete(
        self, prompt: str, formatted: bool = False, **kwargs: Any
    ) -> CompletionResponseGen:
        raise NotImplementedError


def dummy_function(a: int, b: int) -> int:
    """A dummy function that adds two numbers together."""
    return a + b


def test_planner_agent() -> None:
    dummy_tool = FunctionTool.from_defaults(fn=dummy_function)
    dummy_llm = MockLLM()

    worker = ReActAgentWorker.from_tools([dummy_tool], llm=dummy_llm)
    agent = StructuredPlannerAgent(worker, tools=[dummy_tool], llm=dummy_llm)

    # create a plan
    plan_id = agent.create_plan("CREATE A PLAN")
    plan = agent.state.plan_dict[plan_id]
    assert plan is not None
    assert len(plan.sub_tasks) == 3
    assert len(agent.state.get_completed_sub_tasks(plan_id)) == 0
    assert len(agent.state.get_remaining_subtasks(plan_id)) == 3
    assert len(agent.state.get_next_sub_tasks(plan_id)) == 2

    next_tasks = agent.state.get_next_sub_tasks(plan_id)

    # execute the two independent sub-tasks and mark them complete
    for task in next_tasks:
        agent.run_task(task.name)
        agent.state.add_completed_sub_task(plan_id, task)

    assert len(agent.state.get_completed_sub_tasks(plan_id)) == 2

    next_tasks = agent.state.get_next_sub_tasks(plan_id)
    assert len(next_tasks) == 1

    # will insert the original dummy plan again
    agent.refine_plan("CREATE A PLAN", plan_id)

    assert len(plan.sub_tasks) == 3
    assert len(agent.state.get_completed_sub_tasks(plan_id)) == 2
    assert len(agent.state.get_remaining_subtasks(plan_id)) == 1
    assert len(agent.state.get_next_sub_tasks(plan_id)) == 1