Verifier Registry
38 verifiers across 19 domains, organised into three tiers: HARD (deterministic state probes), SOFT (LLM-judged rubrics), and AGENTIC (multi-step tool-use checks). Browse below or filter by tier and domain.
aiv.calendar.event_created
Verifies that a calendar event was created with correct title, date, and participants via HTTP API query.
aiv.email.sent_folder_confirmed
Opens an IMAP connection to the Sent folder and searches for a matching message by recipient and subject fragment. Demonstrates the 'latent state' insight: a UI confirmation does not guarantee that the message was actually sent.
aiv.shell.state_probe
Executes a sandboxed read-only shell command and compares stdout to an expected value, probing whether an agent action actually changed system state.
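As a rough illustration of the compare-stdout idea behind aiv.shell.state_probe, the sketch below runs a command and checks its trimmed output against an expected value. The function name `state_probe`, its parameters, and the exit-code requirement are assumptions made for this example, not the registry's actual interface, and the sandboxing and read-only enforcement described above are not shown.

```python
import subprocess
import sys
from typing import Sequence


def state_probe(command: Sequence[str], expected: str, timeout: float = 5.0) -> bool:
    """Hypothetical helper: True if the command exits 0 and its trimmed stdout equals `expected`."""
    result = subprocess.run(
        list(command),
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,            # decode output as str
        timeout=timeout,      # raise TimeoutExpired if the probe hangs
        check=False,          # inspect the exit code ourselves
    )
    return result.returncode == 0 and result.stdout.strip() == expected


if __name__ == "__main__":
    # A portable stand-in for a real probe such as reading a config value.
    probe = [sys.executable, "-c", "print('ready')"]
    print(state_probe(probe, "ready"))
```

Passing the command as an argument list rather than a shell string avoids shell interpolation, which seems a sensible default for a probe that is meant to observe state without changing it.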
api.http.header_present
Verifies that an HTTP response contains expected headers.
api.http.response_matches
Verifies that an HTTP response body contains expected substrings.
api.http.status_ok
Verifies that an HTTP endpoint returns the expected status code.
code.python.lint_ruff
Verifies that agent-generated Python code passes ruff lint checks with configurable violation thresholds.
code.python.tests_pass
Writes agent-generated Python code to a temp directory and runs pytest via the sandbox runner. Score is based on the fraction of tests that pass.
database.row.exists
Verifies that a row matching given criteria exists in a database table.
database.row.updated
Verifies that a database row was updated to contain expected values.
database.table.row_count
Verifies that a database table has the expected number of rows.
document.csv.row_count
Verifies that a CSV file has the expected number of data rows.
document.json.valid
Verifies that a file contains valid JSON with optional type and key checks.
document.pdf.page_count
Verifies that a PDF has the expected number of pages.
document.text.contains
Verifies that a text file contains all expected substrings.
document.yaml.valid
Verifies that a file contains valid YAML with optional key checks.
filesystem.file_created
Verifies that a file was created at the expected path with optional size and content hash checks.
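The existence, size, and content-hash checks described for filesystem.file_created could look roughly like the sketch below. The function name, the `min_size` and `sha256` parameters, and the choice of SHA-256 are assumptions for illustration; the registry entry does not specify the hash algorithm or parameter names.

```python
import hashlib
from pathlib import Path
from typing import Optional


def file_created(path: str, min_size: Optional[int] = None, sha256: Optional[str] = None) -> bool:
    """Hypothetical check: file exists, and optionally meets a size floor and a content hash."""
    target = Path(path)
    if not target.is_file():
        return False
    # Optional size check (assumed here to be a minimum size in bytes).
    if min_size is not None and target.stat().st_size < min_size:
        return False
    # Optional content check (assumed here to be a hex-encoded SHA-256 digest).
    if sha256 is not None:
        digest = hashlib.sha256(target.read_bytes()).hexdigest()
        if digest != sha256.lower():
            return False
    return True
```

Checking the hash as well as the path guards against an agent that creates an empty or wrong file at the right location.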
git.commit_present
Verifies that a specific git commit exists in a repository by SHA prefix or message substring.
rubric.code.logic_correct
LLM-judge rubric verifier scoring code logic correctness on 4 criteria: algorithm, edge cases, logic errors, requirements.
rubric.email.tone_professional
Four-component rubric scored by an LLM judge: greeting, formality, key info, no inappropriate content. Score = sum/4. MUST be composed with vr/aiv.email.sent_folder_confirmed.
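The "Score = sum/4" rule and the composition requirement can be sketched as follows. The component keys, the function names, and in particular the gating behaviour (the rubric score counts only when the hard verifier passed) are assumptions for this example; the entry states that composition is required but not how the results are combined.

```python
from typing import Dict

# Hypothetical keys for the four rubric components named in the entry.
COMPONENTS = ("greeting", "formality", "key_info", "no_inappropriate_content")


def tone_score(component_scores: Dict[str, float]) -> float:
    """Average the four component scores (each assumed to lie in [0, 1])."""
    return sum(component_scores[name] for name in COMPONENTS) / len(COMPONENTS)


def composed_score(sent_confirmed: bool, component_scores: Dict[str, float]) -> float:
    """Assumed composition: the soft score is zeroed unless the hard check passed."""
    return tone_score(component_scores) if sent_confirmed else 0.0


if __name__ == "__main__":
    scores = {"greeting": 1.0, "formality": 1.0, "key_info": 0.0, "no_inappropriate_content": 1.0}
    print(composed_score(True, scores))   # 0.75
    print(composed_score(False, scores))  # 0.0
```

Gating on the hard check reflects the idea that a well-written email earns nothing if it was never actually sent.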
rubric.summary.faithful
Scores summary faithfulness to source text via a 3-component LLM rubric: factual accuracy, key points coverage, no hallucinations.
tau2.airline.rebooking_correct
Queries airline API to confirm flight rebooking fields match expected values (date, cabin class, passengers).
tau2.policy.constraint_not_violated
Pure-logic verifier checking agent action traces against domain policy rules. Works for any domain with codifiable constraints.
tau2.retail.inventory_updated
Queries retail API to confirm that a product SKU has the expected quantity in inventory after an agent action.
tau2.retail.order_cancelled
Queries retail API to confirm an order is in cancelled state with matching reason code.
tau2.retail.refund_processed
Queries a mock or real retail API to confirm that a refund has been processed with the expected amount and status. Catches agents that claim a refund was issued when the actual state shows otherwise.
tau2.telecom.plan_changed
Verifies that a customer's telecom plan was changed to the expected plan with correct effective date via CRM API.
web.browser.element_visible
Navigates to a URL via headless browser and checks whether an element matching a specific CSS selector is present in the DOM. Catches agents that claim UI actions succeeded but never actually modified the page.
web.browser.screenshot_match
Captures a live screenshot via browser automation and compares it to a reference image using SSIM (Structural Similarity Index).
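To make the SSIM comparison concrete, the sketch below computes a simplified, single-window SSIM over two equally sized grayscale images given as flat sequences of pixel values. This is a simplification: standard SSIM averages the same formula over local sliding windows, and the verifier's actual implementation and pass threshold are not stated in this listing. The function name `global_ssim` is hypothetical; the constants 0.01 and 0.03 are the conventional SSIM stabilisers.

```python
from typing import Sequence


def global_ssim(x: Sequence[float], y: Sequence[float], dynamic_range: float = 255.0) -> float:
    """Single-window SSIM over whole images (a simplification of windowed SSIM)."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("images must be non-empty and the same size")
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    # Population variances and covariance of the pixel intensities.
    var_x = sum((a - mu_x) * (a - mu_x) for a in x) / n
    var_y = sum((b - mu_y) * (b - mu_y) for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Small constants that stabilise the ratio when means or variances are near zero.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

Identical images score 1 (up to floating-point rounding), and structurally inverted images score below zero, so a verifier of this kind would typically compare the result against a similarity threshold rather than require exact pixel equality.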
web.ecommerce.order_placed
Verifies that an e-commerce order was placed with correct items and total via HTTP API query.
ci.github.workflow_passed
Verifies that a specific GitHub Actions workflow completed successfully.
git.ci.passed
Verifies that all CI check runs passed for a given commit SHA.
git.pr.merged
Verifies that a GitHub Pull Request was merged to the target branch.
messaging.slack.message_sent
Verifies that a message containing expected text exists in a Slack channel.
messaging.slack.reaction_added
Verifies that a specific reaction was added to a Slack message.
payment.stripe.charge_succeeded
Verifies that a Stripe charge was completed successfully.
payment.stripe.refund_processed
Verifies that a Stripe refund was processed successfully.
project.jira.ticket_transitioned
Verifies that a Jira ticket has been transitioned to the expected status.
Missing a verifier?
Tell us what domain or task you need verified and we'll prioritize it. You can also build your own or browse verifier ideas.