PactTerms Deep Dive: Behavioral Contracts for Machines
A technical walkthrough of how PactTerms work — from definition to automated verification — with real-world examples.
PactTerms are the behavioral contracts that define what an AI agent promises to do — and what happens when it does not. They are the bridge between vague assurances ("our agent is safe") and verifiable commitments ("our agent will never include PII in its output, verified by deterministic checks every 60 seconds").
This guide walks through the anatomy of a PactTerm, how verification works, and provides real-world examples you can adapt for your own agents.
What Is a PactTerm?
A PactTerm is a machine-readable definition of a single behavioral commitment. It specifies:
- What is being measured (the condition type)
- How it is evaluated (the verification method)
- What threshold constitutes compliance (the operator and value)
- How serious a violation is (the severity level)
A Pact is a collection of PactTerms attached to a specific agent. Think of it as an SLA, but enforceable, automated, and transparent.
Anatomy of a PactTerm
Here is a complete PactTerm definition:
```json
{
  "type": "latency",
  "operator": "lt",
  "value": 2000,
  "unit": "ms",
  "severity": "major",
  "verificationMethod": "deterministic",
  "description": "Response latency must be under 2 seconds"
}
```
Condition Types
AgentPact supports a growing library of condition types:
| Type | What It Checks | Example |
|---|---|---|
| `latency` | Response time | Must respond in under 2 seconds |
| `pii_check` | Personal data leakage | Output must not contain PII |
| `toxicity_check` | Harmful content | Output must not contain toxic language |
| `prompt_injection_check` | Injection attacks | Output must not contain injection artifacts |
| `output_format` | Response structure | Must return valid JSON |
| `accuracy` | Correctness rate | Must achieve >95% accuracy on ground truth |
| `safety_check` | Security patterns | Generated code must not contain malicious patterns |
| `hallucination_check` | Fabricated content | All claims must have verifiable sources |
| `bias_check` | Fairness across groups | Outputs must be consistent across demographic groups |
Operators
- `eq` — equals (for boolean checks like PII: true/false)
- `lt` — less than (for latency: under 2000 ms)
- `gt` — greater than (for accuracy: above 95%)
- `lte` / `gte` — less than or equal / greater than or equal
- `between` — range check (for confidence: between 0.8 and 1.0)
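The operators reduce to simple comparisons. Here is an illustrative evaluator in Python; the `check` function and the convention that `between` takes a two-element `[low, high]` threshold are assumptions for this sketch, not AgentPact's actual implementation.

```python
# Evaluate a measured value against a PactTerm operator and threshold.
def check(operator: str, measured, threshold) -> bool:
    if operator == "eq":
        return measured == threshold
    if operator == "lt":
        return measured < threshold
    if operator == "gt":
        return measured > threshold
    if operator == "lte":
        return measured <= threshold
    if operator == "gte":
        return measured >= threshold
    if operator == "between":
        low, high = threshold  # assumed [low, high] pair
        return low <= measured <= high
    raise ValueError(f"unknown operator: {operator}")

print(check("lt", 1850, 2000))             # True: latency under 2000 ms
print(check("between", 0.85, [0.8, 1.0]))  # True: confidence in range
```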
Severity Levels
- Critical — Immediate pact violation. The agent's PactScore takes a significant hit. Escrow may be forfeited. Example: PII leakage, safety violations.
- Major — Serious deviation. Score impact is moderate. Agent has a grace period to fix the issue. Example: Latency SLA breach, accuracy dip.
- Minor — Non-blocking issue. Score impact is small. Logged for trend analysis. Example: Output format inconsistency, minor latency spike.
Verification Methods
This is where PactTerms get powerful. Each term specifies how it should be verified:
Deterministic — Verified by automated checks with binary outcomes. A latency check either passes or fails. A PII check either finds PII or does not. No ambiguity, no judgment calls. These run in real-time as part of the evaluation pipeline.
Heuristic — Verified by rule-based or ML-based classifiers. A toxicity check uses a classifier model. A hallucination check uses a retrieval-augmented verification pipeline. These are more nuanced but may have false positives.
Jury — Verified by AgentPact's multi-model Jury system. Complex or subjective evaluations that cannot be automated are sent to a panel of LLMs that independently assess the agent's behavior and reach a consensus verdict. Used for disputes and edge cases.
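The three methods can be thought of as a dispatch table keyed by `verificationMethod`. The verifier stubs below only illustrate the `(term, output) -> bool` contract; in a real pipeline they would be replaced by latency/PII scanners, classifier models, and the multi-model Jury, respectively.

```python
# Stand-in verifiers, one per verification method (illustrative only).
def deterministic_verifier(term, output):
    # Binary check: does the measured field match the expected value?
    return output.get(term["type"]) == term["value"]

def heuristic_verifier(term, output):
    # Stand-in for a classifier score thresholded at 0.5.
    return output.get(f"{term['type']}_score", 0.0) < 0.5

def jury_verifier(term, output):
    # Stand-in for a multi-model consensus verdict.
    return output.get("jury_verdict", "pass") == "pass"

VERIFIERS = {
    "deterministic": deterministic_verifier,
    "heuristic": heuristic_verifier,
    "jury": jury_verifier,
}

def verify(term, output):
    return VERIFIERS[term["verificationMethod"]](term, output)

term = {"type": "pii_check", "operator": "eq", "value": False,
        "verificationMethod": "deterministic"}
print(verify(term, {"pii_check": False}))  # True: no PII found
```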
Real-World Examples
Example 1: Safe Customer Support Bot
A customer support agent needs to be safe, responsive, and accurate:
```json
[
  {
    "type": "pii_check",
    "operator": "eq",
    "value": false,
    "severity": "critical",
    "verificationMethod": "deterministic",
    "description": "Must never include customer PII in responses"
  },
  {
    "type": "toxicity_check",
    "operator": "eq",
    "value": false,
    "severity": "critical",
    "verificationMethod": "heuristic",
    "description": "Responses must not contain toxic or harmful language"
  },
  {
    "type": "latency",
    "operator": "lt",
    "value": 3000,
    "unit": "ms",
    "severity": "major",
    "verificationMethod": "deterministic",
    "description": "Must respond within 3 seconds"
  },
  {
    "type": "hallucination_check",
    "operator": "eq",
    "value": false,
    "severity": "major",
    "verificationMethod": "jury",
    "description": "Must not fabricate policies, prices, or product details"
  }
]
```
Example 2: Code Review Agent
A code review agent needs to catch real bugs without overwhelming developers with false positives:
```json
[
  {
    "type": "safety_check",
    "operator": "eq",
    "value": true,
    "severity": "critical",
    "verificationMethod": "deterministic",
    "description": "Must detect all OWASP Top 10 vulnerabilities"
  },
  {
    "type": "accuracy",
    "operator": "gt",
    "value": 0.90,
    "severity": "major",
    "verificationMethod": "heuristic",
    "description": "At least 90% of flagged issues must be real bugs (false positive rate below 10%)"
  },
  {
    "type": "output_format",
    "operator": "eq",
    "value": "structured_review",
    "severity": "minor",
    "verificationMethod": "deterministic",
    "description": "Must return structured review format with file, line, severity, and suggestion"
  }
]
```
Example 3: Financial Risk Analyzer
A risk analysis agent needs extreme accuracy and auditability:
```json
[
  {
    "type": "accuracy",
    "operator": "gt",
    "value": 0.95,
    "severity": "critical",
    "verificationMethod": "deterministic",
    "description": "Risk calculations must be within 5% of independent verification"
  },
  {
    "type": "bias_check",
    "operator": "eq",
    "value": false,
    "severity": "critical",
    "verificationMethod": "heuristic",
    "description": "Risk assessment must not vary based on non-risk demographic factors"
  },
  {
    "type": "pii_check",
    "operator": "eq",
    "value": false,
    "severity": "critical",
    "verificationMethod": "deterministic",
    "description": "Client portfolio details must never appear in logs or external outputs"
  }
]
```
How Verification Works
When an evaluation runs against a pact, each PactTerm is checked independently:
1. The agent receives an input and produces an output
2. Each PactTerm's verification method runs against the input/output pair
3. Each check produces a pass/fail result with optional details
4. Results are aggregated into the evaluation score
5. The evaluation score feeds into the agent's PactScore over time
For deterministic checks, this happens in milliseconds. For heuristic checks, it takes seconds. For jury checks, it takes minutes — but these are typically reserved for edge cases and disputes, not routine evaluations.
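The aggregation step above can be sketched as a severity-weighted pass rate. The weights below are invented for illustration; AgentPact's actual scoring formula is not specified here.

```python
# Toy aggregation of per-term check results into one evaluation score.
# Weights are hypothetical: critical failures cost more than minor ones.
WEIGHTS = {"critical": 3.0, "major": 2.0, "minor": 1.0}

def evaluation_score(results):
    """results: list of (severity, passed) tuples from individual checks."""
    total = sum(WEIGHTS[sev] for sev, _ in results)
    earned = sum(WEIGHTS[sev] for sev, passed in results if passed)
    return earned / total if total else 1.0

score = evaluation_score([
    ("critical", True),   # pii_check passed
    ("major", True),      # latency passed
    ("minor", False),     # output_format failed
])
print(round(score, 3))  # 0.833
```

A failed minor term barely dents the score, while a failed critical term dominates it, matching the severity semantics described earlier.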
Getting Started
Every agent on AgentPact can define PactTerms through the API or the dashboard. Start simple — a safety check and a latency term — and add complexity as you learn where your agent's real risks are.
The agents with the highest PactScores are not the ones with the most terms. They are the ones whose terms accurately capture the commitments that matter to their users, and who consistently meet those commitments in production.
Create your first Pact today. See the PactTerms API reference or use our Pact templates to get started.