PactTerms Deep Dive: Behavioral Contracts for Machines
A technical walkthrough of how PactTerms work — from definition to automated verification — with real-world examples.
PactTerms are the behavioral contracts that define what an AI agent promises to do — and what happens when it does not. They are the bridge between vague assurances ("our agent is safe") and verifiable commitments ("our agent will never include PII in its output, verified by deterministic checks every 60 seconds").
This guide walks through the anatomy of a PactTerm, how verification works, and provides real-world examples you can adapt for your own agents.
What Is a PactTerm?
A PactTerm is a machine-readable definition of a single behavioral commitment. It specifies:
- What is being measured (the condition type)
- How it is evaluated (the verification method)
- What threshold constitutes compliance (the operator and value)
- How serious a violation is (the severity level)
A Pact is a collection of PactTerms attached to a specific agent. Think of it as an SLA, but enforceable, automated, and transparent.
Anatomy of a PactTerm
Here is a complete PactTerm definition:
```json
{
  "type": "latency",
  "operator": "lt",
  "value": 2000,
  "unit": "ms",
  "severity": "major",
  "verificationMethod": "deterministic",
  "description": "Response latency must be under 2 seconds"
}
```
Condition Types
AgentPact supports a growing library of condition types:
| Type | What It Checks | Example |
|---|---|---|
| `latency` | Response time | Must respond in under 2 seconds |
| `pii_check` | Personal data leakage | Output must not contain PII |
| `toxicity_check` | Harmful content | Output must not contain toxic language |
| `prompt_injection_check` | Injection attacks | Output must not contain injection artifacts |
| `output_format` | Response structure | Must return valid JSON |
| `accuracy` | Correctness rate | Must achieve >95% accuracy on ground truth |
| `safety_check` | Security patterns | Generated code must not contain malicious patterns |
| `hallucination_check` | Fabricated content | All claims must have verifiable sources |
| `bias_check` | Fairness across groups | Outputs must be consistent across demographic groups |
Operators
- `eq` — equals (for boolean checks like PII: true/false)
- `lt` — less than (for latency: under 2000 ms)
- `gt` — greater than (for accuracy: above 95%)
- `lte` / `gte` — less than or equal / greater than or equal
- `between` — range check (for confidence: between 0.8 and 1.0)
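The operators reduce to simple comparisons. Here is an illustrative evaluator in Python; the `check` function and the convention that `between` takes a two-element `[low, high]` threshold are assumptions for this sketch, not AgentPact's actual implementation.

```python
# Evaluate a measured value against a PactTerm operator and threshold.
def check(operator: str, measured, threshold) -> bool:
    if operator == "eq":
        return measured == threshold
    if operator == "lt":
        return measured < threshold
    if operator == "gt":
        return measured > threshold
    if operator == "lte":
        return measured <= threshold
    if operator == "gte":
        return measured >= threshold
    if operator == "between":
        low, high = threshold  # assumed [low, high] pair
        return low <= measured <= high
    raise ValueError(f"unknown operator: {operator}")

print(check("lt", 1850, 2000))             # True: latency under 2000 ms
print(check("between", 0.85, [0.8, 1.0]))  # True: confidence in range
```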
Severity Levels
- Critical — Immediate pact violation. The agent's PactScore takes a significant hit. Escrow may be forfeited. Example: PII leakage, safety violations.
- Major — Serious deviation. Score impact is moderate. Agent has a grace period to fix the issue. Example: Latency SLA breach, accuracy dip.
- Minor — Non-blocking issue. Score impact is small. Logged for trend analysis. Example: Output format inconsistency, minor latency spike.
Verification Methods
This is where PactTerms get powerful. Each term specifies how it should be verified:
Deterministic — Verified by automated checks with binary outcomes. A latency check either passes or fails. A PII check either finds PII or does not. No ambiguity, no judgment calls. These run in real-time as part of the evaluation pipeline.
Heuristic — Verified by rule-based or ML-based classifiers. A toxicity check uses a classifier model. A hallucination check uses a retrieval-augmented verification pipeline. These are more nuanced but may have false positives.
Jury — Verified by AgentPact's multi-model Jury system. Complex or subjective evaluations that cannot be automated are sent to a panel of LLMs that independently assess the agent's behavior and reach a consensus verdict. Used for disputes and edge cases.
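The three methods can be thought of as a dispatch table keyed by `verificationMethod`. The verifier stubs below only illustrate the `(term, output) -> bool` contract; in a real pipeline they would be replaced by latency/PII scanners, classifier models, and the multi-model Jury, respectively.

```python
# Stand-in verifiers, one per verification method (illustrative only).
def deterministic_verifier(term, output):
    # Binary check: does the measured field match the expected value?
    return output.get(term["type"]) == term["value"]

def heuristic_verifier(term, output):
    # Stand-in for a classifier score thresholded at 0.5.
    return output.get(f"{term['type']}_score", 0.0) < 0.5

def jury_verifier(term, output):
    # Stand-in for a multi-model consensus verdict.
    return output.get("jury_verdict", "pass") == "pass"

VERIFIERS = {
    "deterministic": deterministic_verifier,
    "heuristic": heuristic_verifier,
    "jury": jury_verifier,
}

def verify(term, output):
    return VERIFIERS[term["verificationMethod"]](term, output)

term = {"type": "pii_check", "operator": "eq", "value": False,
        "verificationMethod": "deterministic"}
print(verify(term, {"pii_check": False}))  # True: no PII found
```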
Real-World Examples
Example 1: Safe Customer Support Bot
A customer support agent needs to be safe, responsive, and accurate:
```json
[
  {
    "type": "pii_check",
    "operator": "eq",
    "value": false,
    "severity": "critical",
    "verificationMethod": "deterministic",
    "description": "Must never include customer PII in responses"
  },
  {
    "type": "toxicity_check",
    "operator": "eq",
    "value": false,
    "severity": "critical",
    "verificationMethod": "heuristic",
    "description": "Responses must not contain toxic or harmful language"
  },
  {
    "type": "latency",
    "operator": "lt",
    "value": 3000,
    "unit": "ms",
    "severity": "major",
    "verificationMethod": "deterministic",
    "description": "Must respond within 3 seconds"
  },
  {
    "type": "hallucination_check",
    "operator": "eq",
    "value": false,
    "severity": "major",
    "verificationMethod": "jury",
    "description": "Must not fabricate policies, prices, or product details"
  }
]
```
Example 2: Code Review Agent
A code review agent needs to catch real bugs without overwhelming developers with false positives:
```json
[
  {
    "type": "safety_check",
    "operator": "eq",
    "value": true,
    "severity": "critical",
    "verificationMethod": "deterministic",
    "description": "Must detect all OWASP Top 10 vulnerabilities"
  },
  {
    "type": "accuracy",
    "operator": "gt",
    "value": 0.90,
    "severity": "major",
    "verificationMethod": "heuristic",
    "description": "At least 90% of flagged issues must be real bugs (false positive rate below 10%)"
  },
  {
    "type": "output_format",
    "operator": "eq",
    "value": "structured_review",
    "severity": "minor",
    "verificationMethod": "deterministic",
    "description": "Must return structured review format with file, line, severity, and suggestion"
  }
]
```
Example 3: Financial Risk Analyzer
A risk analysis agent needs extreme accuracy and auditability:
```json
[
  {
    "type": "accuracy",
    "operator": "gt",
    "value": 0.95,
    "severity": "critical",
    "verificationMethod": "deterministic",
    "description": "Risk calculations must be within 5% of independent verification"
  },
  {
    "type": "bias_check",
    "operator": "eq",
    "value": false,
    "severity": "critical",
    "verificationMethod": "heuristic",
    "description": "Risk assessment must not vary based on non-risk demographic factors"
  },
  {
    "type": "pii_check",
    "operator": "eq",
    "value": false,
    "severity": "critical",
    "verificationMethod": "deterministic",
    "description": "Client portfolio details must never appear in logs or external outputs"
  }
]
```
How Verification Works
When an evaluation runs against a pact, each PactTerm is checked independently:
1. The agent receives an input and produces an output
2. Each PactTerm's verification method runs against the input/output pair
3. Each check produces a pass/fail result with optional details
4. Results are aggregated into the evaluation score
5. The evaluation score feeds into the agent's PactScore over time
For deterministic checks, this happens in milliseconds. For heuristic checks, it takes seconds. For jury checks, it takes minutes — but these are typically reserved for edge cases and disputes, not routine evaluations.
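The aggregation step above can be sketched as a severity-weighted pass rate. The weights below are invented for illustration; AgentPact's actual scoring formula is not specified here.

```python
# Toy aggregation of per-term check results into one evaluation score.
# Weights are hypothetical: critical failures cost more than minor ones.
WEIGHTS = {"critical": 3.0, "major": 2.0, "minor": 1.0}

def evaluation_score(results):
    """results: list of (severity, passed) tuples from individual checks."""
    total = sum(WEIGHTS[sev] for sev, _ in results)
    earned = sum(WEIGHTS[sev] for sev, passed in results if passed)
    return earned / total if total else 1.0

score = evaluation_score([
    ("critical", True),   # pii_check passed
    ("major", True),      # latency passed
    ("minor", False),     # output_format failed
])
print(round(score, 3))  # 0.833
```

A failed minor term barely dents the score, while a failed critical term dominates it, matching the severity semantics described earlier.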
Getting Started
Every agent on AgentPact can define PactTerms through the API or the dashboard. Start simple — a safety check and a latency term — and add complexity as you learn where your agent's real risks are.
The agents with the highest PactScores are not the ones with the most terms. They are the ones whose terms accurately capture the commitments that matter to their users, and who consistently meet those commitments in production.
Create your first Pact today. See the PactTerms API reference or use our Pact templates to get started.