nerva docs v0.2.1

Testing

Nerva ships a first-class testkit. Every primitive has a spy wrapper that delegates to the real implementation and records every call. Nerva's own components are never mocked: real code runs, and you assert against what happened.

Philosophy

| Principle | What it means |
| --- | --- |
| Real code first | Spies wrap real implementations (RuleRouter, InProcessRuntime, etc.), not fakes |
| Mock only at boundaries | The only thing you stub is the LLM API. Everything else runs for real. |
| Expectation queues | Set what a boundary returns with .expect_*(). Expectations are consumed FIFO, then fall back to real behavior. |
| Zero config | TestOrchestrator.build() / createTestOrchestrator() gives you a fully wired system in one call |

Quick start

from nerva.testkit import (
    TestOrchestrator,
    assert_routed_to,
    assert_handler_invoked,
    assert_no_unconsumed_expectations,
)
from nerva.context import ExecContext


async def test_greeting_agent():
    result = TestOrchestrator.build(
        handlers={"default": lambda inp, ctx: "Hello!"},
    )

    # Set an LLM expectation at the boundary
    result.runtime.expect_result("Hello from the agent!")

    ctx = ExecContext.create(user_id="test-user", session_id="test-session")
    response = await result.orchestrator.handle("hi there", ctx)

    assert_routed_to(result.router, "default")
    assert_handler_invoked(result.runtime, "default")
    assert_no_unconsumed_expectations(result)

Spy wrappers

Every Nerva primitive has a corresponding spy. Spies implement the same Protocol / interface, so they drop in anywhere the real primitive is used.

| Spy | Wraps | Records |
| --- | --- | --- |
| SpyRouter | RuleRouter (catch-all) | classifyCalls |
| SpyRuntime | InProcessRuntime | invokeCalls, delegateCalls |
| SpyResponder | PassthroughResponder | formatCalls |
| SpyMemory | TieredMemory + InMemoryHotMemory | recallCalls, storeCalls |
| SpyPolicy | NoopPolicyEngine | evaluateCalls, recordCalls |
| SpyToolManager | FunctionToolManager | discoverCalls, callCalls |
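
To make the drop-in idea concrete, here is a minimal sketch of the spy pattern in plain Python. It does not use Nerva's classes: `Router`, `KeywordRouter`, `RecordingRouter`, `CallRecord`, and `route` are hypothetical names invented for illustration, and the sketch assumes a router exposes a single async `classify` method.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Protocol


class Router(Protocol):
    """Hypothetical interface: anything with an async classify()."""

    async def classify(self, message: str) -> str: ...


class KeywordRouter:
    """A 'real' implementation: picks a handler name from the message text."""

    async def classify(self, message: str) -> str:
        return "billing_agent" if "billing" in message else "general_agent"


@dataclass
class CallRecord:
    message: str
    result: str


@dataclass
class RecordingRouter:
    """Spy: same interface as Router, delegates to the wrapped instance."""

    inner: Router
    classify_calls: list[CallRecord] = field(default_factory=list)

    async def classify(self, message: str) -> str:
        result = await self.inner.classify(message)  # the real code runs
        self.classify_calls.append(CallRecord(message, result))
        return result


async def route(router: Router, message: str) -> str:
    """Code under test depends only on the Router interface."""
    return await router.classify(message)


spy = RecordingRouter(KeywordRouter())
handler = asyncio.run(route(spy, "billing question"))
```

Because `route` is typed against the interface rather than a concrete class, the spy can be passed wherever the real router would be, and the test can inspect `spy.classify_calls` afterwards.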

Using spies directly

from nerva.context import ExecContext
from nerva.router.rule import RuleRouter
from nerva.testkit import SpyRouter


async def test_spy_records_real_routing():
    router = SpyRouter(RuleRouter([
        {"pattern": "billing.*", "handler": "billing_agent", "intent": "billing"},
        {"pattern": ".*", "handler": "general_agent", "intent": "general"},
    ]))
    ctx = ExecContext.create(user_id="test-user", session_id="test-session")

    result = await router.classify("billing question", ctx)

    assert len(router.classify_calls) == 1
    assert router.classify_calls[0].result.handler == "billing_agent"
    assert router.classify_calls[0].was_expected is False  # came from real router

Expectation queues

Call .expect_*() to enqueue a canned response. The spy returns expectations in FIFO order, then falls back to the real implementation when the queue is empty.

# spy_router and ctx are the testkit's pytest fixtures
async def test_expectations_then_fallback(spy_router, ctx):
    spy_router.expect_handler("billing_agent", confidence=0.95)
    spy_router.expect_handler("support_agent", confidence=0.8)

    # First call -> billing_agent (from expectation)
    r1 = await spy_router.classify("anything", ctx)
    assert r1.handler == "billing_agent"
    assert spy_router.classify_calls[0].was_expected is True

    # Second call -> support_agent (from expectation)
    r2 = await spy_router.classify("anything", ctx)
    assert r2.handler == "support_agent"

    # Third call -> real RuleRouter runs (no more expectations)
    r3 = await spy_router.classify("hello", ctx)
    assert spy_router.classify_calls[2].was_expected is False
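
One way to picture the queue-then-fallback behavior is a `collections.deque` that is drained before the wrapped object is consulted. The sketch below is an illustration under that assumption, not Nerva's implementation; `QueueingSpy`, `RealRouter`, `Call`, and the `unconsumed` property are hypothetical names.

```python
import asyncio
from collections import deque
from dataclasses import dataclass


@dataclass
class Call:
    result: str
    was_expected: bool


class RealRouter:
    """Stand-in for a real router; always answers 'general_agent'."""

    async def classify(self, message: str) -> str:
        return "general_agent"


class QueueingSpy:
    """Hypothetical spy: canned results first (FIFO), then the real router."""

    def __init__(self, inner: RealRouter) -> None:
        self._inner = inner
        self._expected: deque[str] = deque()
        self.classify_calls: list[Call] = []

    def expect_handler(self, handler: str) -> None:
        self._expected.append(handler)  # enqueue at the tail

    async def classify(self, message: str) -> str:
        if self._expected:
            # Oldest expectation wins; the real router is not called.
            result, was_expected = self._expected.popleft(), True
        else:
            result, was_expected = await self._inner.classify(message), False
        self.classify_calls.append(Call(result, was_expected))
        return result

    @property
    def unconsumed(self) -> int:
        """Number of expectations that were queued but never returned."""
        return len(self._expected)


spy = QueueingSpy(RealRouter())
spy.expect_handler("billing_agent")
spy.expect_handler("support_agent")


async def classify_three() -> list[str]:
    return [await spy.classify("anything") for _ in range(3)]


results = asyncio.run(classify_three())
```

Recording `was_expected` on every call lets a test distinguish canned answers from real ones, and a non-zero `unconsumed` count signals an expectation that the code under test never reached.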

Expectation methods by spy

| Spy | Methods |
| --- | --- |
| SpyRouter | expect_handler() / expectHandler() |
| SpyRuntime | expect_result() / expectResult() |
| SpyResponder | expect_response() / expectResponse() |
| SpyMemory | expect_recall() / expectRecall() |
| SpyPolicy | expect_allow(), expect_deny() / expectAllow(), expectDeny() |
| SpyToolManager | expect_tool_result() / expectToolResult() |

TestOrchestrator builder

One call wires all primitives with spy-wrapped real defaults. Override any primitive — the rest stay wired.

from nerva.testkit import TestOrchestrator, DenyAllPolicy

# Default: all spies wrap real in-memory implementations
result = TestOrchestrator.build()

# Override one primitive; the rest use spy-wrapped defaults
result = TestOrchestrator.build(
    policy=DenyAllPolicy("budget exceeded"),
)

# Register custom handlers
result = TestOrchestrator.build(
    handlers={
        "greet": lambda inp, ctx: f"Hello, {inp.message}!",
        "farewell": lambda inp, ctx: "Goodbye!",
    },
)

# Access all spies
result.router        # SpyRouter
result.runtime       # SpyRuntime
result.responder     # SpyResponder
result.memory        # SpyMemory
result.policy        # SpyPolicy
result.tools         # SpyToolManager
result.orchestrator  # Orchestrator (fully wired)

# Lifecycle
result.reset_all()                         # clear all call histories
result.verify_all_expectations_consumed()  # assert no leftover expectations
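
The override behavior can be pictured as a factory whose keyword arguments fall back to defaults when omitted. The sketch below is not Nerva's builder: `Spy`, `Wiring`, and `build` are hypothetical, it wires only two stand-in components, and it assumes that overrides are wrapped in a spy the same way defaults are.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class Spy:
    """Hypothetical recorder around any callable component."""

    inner: Callable[..., Any]
    calls: list[tuple] = field(default_factory=list)

    def __call__(self, *args: Any) -> Any:
        self.calls.append(args)
        return self.inner(*args)

    def reset(self) -> None:
        self.calls.clear()


@dataclass
class Wiring:
    """Two-stage pipeline: route the message, then ask the policy."""

    router: Spy
    policy: Spy

    def handle(self, message: str) -> str:
        handler = self.router(message)
        return handler if self.policy(handler) else "denied"

    def reset_all(self) -> None:
        for spy in (self.router, self.policy):
            spy.reset()


def build(
    router: Optional[Callable[[str], str]] = None,
    policy: Optional[Callable[[str], bool]] = None,
) -> Wiring:
    """Wrap each component in a Spy, using a default when none is given."""
    return Wiring(
        router=Spy(router or (lambda message: "default")),
        policy=Spy(policy or (lambda handler: True)),
    )


default_wiring = build()
strict_wiring = build(policy=lambda handler: False)  # override one piece
allowed = default_wiring.handle("hi")
blocked = strict_wiring.handle("hi")
```

Only the policy was replaced in `strict_wiring`, yet the default router still ran and was recorded, which is the property the builder description relies on.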

Assertion helpers

Readable, purpose-built assertions that inspect spy call histories.

from nerva.testkit import (
    assert_routed_to,
    assert_handler_invoked,
    assert_policy_allowed,
    assert_policy_denied,
    assert_memory_stored,
    assert_memory_recalled,
    assert_tool_called,
    assert_pipeline_order,
    assert_no_unconsumed_expectations,
)

assert_routed_to(result.router, "billing_agent")
assert_handler_invoked(result.runtime, "billing_agent")
assert_policy_allowed(result.policy)
assert_policy_denied(result.policy, reason="budget exceeded")
assert_memory_stored(result.memory, content="important fact")
assert_memory_recalled(result.memory, query="previous question")
assert_tool_called(result.tools, "search", args={"query": "nerva"})
assert_pipeline_order(result, ["router", "runtime", "responder"])
assert_no_unconsumed_expectations(result)
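
A purpose-built assertion is mostly a loop over recorded calls plus a failure message that shows what actually happened. As a rough sketch of that idea (not the testkit's code; `RouterLog` and `check_routed_to` are hypothetical):

```python
from dataclasses import dataclass, field


@dataclass
class RouterLog:
    """Hypothetical call history: handler names in call order."""

    handlers: list[str] = field(default_factory=list)


def check_routed_to(log: RouterLog, expected: str) -> None:
    """Fail with the full history so the mismatch is obvious."""
    if not log.handlers:
        raise AssertionError(
            f"expected routing to {expected!r}, but the router was never called"
        )
    if expected not in log.handlers:
        raise AssertionError(
            f"expected routing to {expected!r}, got {log.handlers!r}"
        )


log = RouterLog(handlers=["general_agent"])
check_routed_to(log, "general_agent")  # passes silently

try:
    check_routed_to(log, "billing_agent")
except AssertionError as exc:
    failure_message = str(exc)
```

Compared with a bare `assert "billing_agent" in log.handlers`, the helper reports the recorded handlers in the failure, so a wrong route can be diagnosed from the test output alone.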

Boundary stubs

For the lowest-level external boundaries (LLM API calls), use boundary stubs instead of spies.

StubLLMHandler

Returns canned responses in sequence. When the queue is empty, returns a default.

from nerva.testkit import StubLLMHandler

stub = StubLLMHandler(
    responses=["First answer", "Second answer"],
    default_response="Fallback answer",
)

# inp and ctx: the handler input and ExecContext already built by the test
r1 = await stub.handle(inp, ctx)  # -> "First answer"
r2 = await stub.handle(inp, ctx)  # -> "Second answer"
r3 = await stub.handle(inp, ctx)  # -> "Fallback answer"
assert stub.call_count == 3
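
Unlike a spy, a stub has nothing real behind it: once the canned responses run out it returns a fixed default instead of delegating. A minimal sketch of that difference, assuming a single async `handle` method (`SequenceStub` is a hypothetical name, not the testkit class):

```python
import asyncio
from collections import deque


class SequenceStub:
    """Hypothetical stand-in for an LLM boundary; never touches the network."""

    def __init__(self, responses: list[str], default_response: str) -> None:
        self._responses = deque(responses)
        self._default = default_response
        self.call_count = 0

    async def handle(self, prompt: str) -> str:
        self.call_count += 1
        if self._responses:
            return self._responses.popleft()
        return self._default  # no real implementation to fall back to


stub = SequenceStub(["First answer", "Second answer"], "Fallback answer")


async def ask_three_times() -> list[str]:
    return [await stub.handle("question") for _ in range(3)]


answers = asyncio.run(ask_three_times())
```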

DenyAllPolicy / AllowAllPolicy

Deterministic policy engines for testing permission boundaries.

from nerva.testkit import DenyAllPolicy, AllowAllPolicy
# Every action is denied
deny = DenyAllPolicy(reason="test: always deny")
decision = await deny.evaluate(action, ctx)
assert decision.allowed is False
# Every action is allowed
allow = AllowAllPolicy()
decision = await allow.evaluate(action, ctx)
assert decision.allowed is True

Pytest fixtures

Register the testkit’s fixtures in your conftest.py:

# conftest.py
from nerva.testkit.fixtures import *  # noqa: F401, F403

Available fixtures:

| Fixture | Type | Description |
| --- | --- | --- |
| ctx | ExecContext | Fresh execution context (user_id="test-user") |
| test_orchestrator | TestOrchestratorResult | Fully wired orchestrator with all spies |
| spy_router | SpyRouter | Standalone spy router |
| spy_runtime | SpyRuntime | Standalone spy runtime |
| spy_responder | SpyResponder | Standalone spy responder |
| spy_memory | SpyMemory | Standalone spy memory |
| spy_policy | SpyPolicy | Standalone spy policy |
| spy_tools | SpyToolManager | Standalone spy tool manager |

Testing patterns

Test that policy blocks an action

from nerva.context import ExecContext
from nerva.testkit import assert_policy_denied


async def test_policy_blocks_expensive_agent(test_orchestrator):
    orch = test_orchestrator
    orch.policy.expect_deny(reason="budget exceeded")

    ctx = ExecContext.create(user_id="test-user", session_id="s1")
    response = await orch.orchestrator.handle("do expensive thing", ctx)

    assert_policy_denied(orch.policy, reason="budget exceeded")

Test memory recall influences agent behavior

from nerva.context import ExecContext
from nerva.testkit import assert_memory_recalled
# MemoryContext must also be imported; its module path is not given here.


async def test_memory_provides_context(test_orchestrator):
    orch = test_orchestrator
    orch.memory.expect_recall(MemoryContext(
        conversation=[
            {"role": "user", "content": "My name is Alice"},
            {"role": "assistant", "content": "Nice to meet you, Alice!"},
        ],
    ))

    ctx = ExecContext.create(user_id="test-user", session_id="s1")
    await orch.orchestrator.handle("What's my name?", ctx)

    assert_memory_recalled(orch.memory, query="What's my name?")

Test the full pipeline order

from nerva.context import ExecContext
from nerva.testkit import assert_pipeline_order, assert_no_unconsumed_expectations


async def test_pipeline_executes_in_order(test_orchestrator):
    orch = test_orchestrator

    ctx = ExecContext.create(user_id="test-user", session_id="s1")
    await orch.orchestrator.handle("hello", ctx)

    assert_pipeline_order(orch, ["router", "runtime", "responder"])
    assert_no_unconsumed_expectations(orch)
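
An ordering check like this needs some shared notion of "what ran first" across spies. One plausible approach, shown here purely as an assumption about how such a check could work, is a single event log that every stage appends to; `first_seen_order` and `check_pipeline_order` are hypothetical helpers.

```python
def first_seen_order(events: list[str]) -> list[str]:
    """Collapse an event log to stage names in order of first appearance."""
    seen: list[str] = []
    for name in events:
        if name not in seen:
            seen.append(name)
    return seen


def check_pipeline_order(events: list[str], expected: list[str]) -> None:
    """Raise if the expected stages did not first run in the given order."""
    actual = [name for name in first_seen_order(events) if name in expected]
    if actual != expected:
        raise AssertionError(f"expected order {expected!r}, got {actual!r}")


# An event log as a shared recorder might capture it: policy and memory run
# too, and the runtime is invoked twice.
events = ["policy", "router", "memory", "runtime", "runtime", "responder"]
check_pipeline_order(events, ["router", "runtime", "responder"])

try:
    check_pipeline_order(events, ["runtime", "router"])
except AssertionError as exc:
    order_error = str(exc)
```

Stages that are not named in `expected` are ignored, so the check stays stable when unrelated primitives are added to the pipeline.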

Testing pyramid

| Layer | Real code | Stubbed |
| --- | --- | --- |
| Unit (default) | All primitives (in-memory) | Nothing |
| Integration | + MCP Armor | LLM API only |
| E2E | Everything + StubLLMHandler | Nothing |