Rulepacks define the rules AISentinel uses to audit candidates. A rulepack is a YAML file with a top-level rules key containing a list of rule objects.
Each rule has the following fields:
- name: A unique identifier for the rule (string)
- when: The condition that triggers the rule (string)
- action: What to do when the condition is met: block, warn, require_approval, suggest_alternative, redact_output, quarantine, escalate, or auto_fix
- message: A human-readable message explaining the violation
- severity: The severity level: low, medium, or high
- phase: When to check the rule: pre (before execution), post (after execution), or final (on final output)
- tags: Optional list of tags for categorization, e.g., ["pii", "high"]
- recommendation: Optional suggested remediation steps (string, supports template variables)
- remediation_config: Optional configuration for automated remediation

The when field is a string that specifies the condition using a simple DSL:
[not] dotpath op value
- not: Optional negation
- dotpath: Path to the field in the candidate object, e.g., args.url, estimate.risk
- op: Operator: equals, regex, contains, not_regex (same as not regex), not (for other ops)
- value: The value to compare against

Examples:
- args.url regex ^https://safe\.
- not output regex \[source \d+\]
- tool equals calc
- estimate.cost > 1.0 (numeric comparison)
- estimate.risk between 0.1,0.8 (range check)
- args.query len_gt 100 (length check)
- tool in calc,web,search (set membership)
- args.email endswith @company.com (string operations)
- args.data is_empty (type checking)

Supported operators:
- equals: Exact string match
- contains: Substring check
- icontains: Case-insensitive substring check
- startswith: String starts with value
- endswith: String ends with value
- regex: Regular expression match
- not_regex: Negated regex (same as not regex)
- >, gt: Greater than
- <, lt: Less than
- >=, gte: Greater than or equal
- <=, lte: Less than or equal
- between: Range check (format: "min,max")
- len_gt: Length greater than
- len_lt: Length less than
- len_gte: Length greater than or equal
- len_lte: Length less than or equal
- len_eq: Length equals
- in: Value is in comma-separated list
- not_in: Value is not in comma-separated list
- is_string: Target is a string
- is_number: Target is numeric
- is_list: Target is a list/tuple
- is_empty: Target is empty or null

Logical operators:
- AND: Logical AND (both conditions must be true)
- OR: Logical OR (at least one condition must be true)
- not: Logical NOT (negates the following condition)
- (, ): Parentheses for grouping expressions

You can combine multiple conditions using logical operators:
# Multiple conditions with AND
when: 'tool equals web AND args.url contains malicious'
# Either/or conditions
when: 'estimate.cost > 5.0 OR estimate.risk > 0.8'
# Complex nested conditions
when: '(tool equals email AND args.to endswith @external.com) OR tool equals api_call'
# Negation with complex conditions
when: 'not (tool in calc,search AND args.query len_gt 100)'
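Putting the field list and the condition DSL together, here is a minimal single-rule sketch. The rule name, the HTTPS-only policy, the message, and the tags are illustrative choices, not AISentinel defaults:

rules:
  - name: https_only_web             # illustrative rule name
    when: 'tool equals web AND not args.url startswith https://'
    action: block
    message: "Web tool may only fetch HTTPS URLs"
    severity: medium
    phase: pre
    tags: ["web", "https"]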
A complete example rulepack:

rules:
# Cost management
- name: expensive_operation
when: 'estimate.cost > 10.0'
action: block
message: "Operation too expensive"
severity: high
# Risk assessment
- name: high_risk_approval
when: 'estimate.risk between 0.7,1.0'
action: require_approval
message: "High risk operation needs approval"
# Content validation
- name: empty_query_block
when: 'args.query is_empty'
action: block
message: "Empty queries not allowed"
# Tool allowlist
- name: allowed_tools_only
when: 'tool not_in calc,web,search,email,slack_api'
action: block
message: "Tool not in approved list"
# Email security
- name: company_email_only
when: 'tool equals email AND args.to not endswith @company.com'
action: block
message: "Only company emails allowed"
# Numeric validation for search results
- name: reasonable_search_count
when: 'args.k > 10'
action: warn
message: "Large result count may be expensive"
severity: low
phase: pre
# Complex logical conditions
- name: high_risk_web_or_expensive
when: '(tool equals web AND estimate.risk > 0.8) OR estimate.cost > 15.0'
action: require_approval
message: "High-risk web operation or very expensive operation"
severity: high
phase: pre
# Multi-tool security check
- name: sensitive_data_protection
when: 'tool in email,api_call,slack_api AND (args.body icontains password OR args.data icontains ssn)'
action: block
message: "Sensitive data detected in external communication"
severity: high
phase: pre
# Length and content validation
- name: reasonable_code_length
when: 'tool equals python_exec AND args.code len_gt 1000'
action: require_approval
message: "Large code blocks require approval"
severity: medium
phase: pre
# Domain and path restrictions
- name: restricted_domains
when: 'tool equals web AND (args.url startswith https://internal. OR args.url contains /admin/)'
action: block
message: "Access to internal/admin resources blocked"
severity: high
phase: pre
The candidate object has these fields (see the example below):
- type: The kind of candidate, e.g. tool_call
- tool: The name of the tool being invoked
- args: The arguments passed to the tool
- estimate: Machine-generated guidance about the candidate (described next)
- evidence_expect: Expected evidence (list of strings, described below)

The estimate object provides machine-generated guidance about the candidate. Common fields:
- cost: Estimated cost of the operation
- risk: Estimated risk (the examples in this document treat it as a 0.0-1.0 score)
- steps_left: Remaining effort, either an integer (e.g. 3) meaning three discrete steps remaining, or a value between 0.0 and 1.0 (e.g. 0.25) representing a normalized portion of remaining effort
- confidence: Confidence in the estimate

Guidance: rules can reference estimate fields directly, for example estimate.steps_left < 1 or estimate.risk > 0.8.
Example candidate snippet:
{
"type": "tool_call",
"tool": "web",
"args": {"url": "https://example.com/search?q=foo"},
"estimate": {"cost": 0.12, "risk": 0.05, "steps_left": 2, "confidence": 0.85}
}
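As a sketch of how estimate fields can drive rules (the rule name and message below are illustrative), a rulepack could warn when the remaining-effort estimate drops low:

- name: low_steps_remaining          # illustrative
  when: 'estimate.steps_left < 1'
  action: warn
  message: "Few steps remaining - consider finalizing the answer"
  severity: low
  phase: pre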
evidence_expect: Expected evidence (list of strings)
The evidence_expect field is an advisory list of tokens that indicate what kind of evidence the candidate expects from a tool. It helps adapters, the LLM service, and rulepacks interpret and validate tool outputs.
Common tokens and their intended meaning:
- results — Structured results array (for example search results). Use when the tool returns a list of items with metadata.
- result — A single structured result object.
- text — Plain textual output; no structured items expected.
- citation — Expect citation or source metadata alongside textual output.

How evidence_expect is used:
- Adapters attach tool evidence matching evidence_expect so the governor can evaluate them.
- Rulepacks can reference evidence_expect to require or check for certain evidence types (example: when: 'evidence_expect contains citation').

Example candidate showing evidence_expect:
{
"type": "tool_call",
"tool": "web",
"args": {"url": "https://example.com/search?q=foo"},
"estimate": {},
"evidence_expect": ["results"]
}
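Building on the evidence_expect contains citation example above, here is a hedged sketch of a rule that warns when a web call does not expect citation evidence; the rule name and message are illustrative:

- name: web_should_expect_citations  # illustrative
  when: 'tool equals web AND not evidence_expect contains citation'
  action: warn
  message: "Web tool call does not expect citation evidence"
  severity: low
  phase: pre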
Supported actions:
- block: Prevent the candidate from executing
- warn: Allow but log a warning
- require_approval: Require manual approval before proceeding
- suggest_alternative: Suggest a safer alternative approach
- redact_output: Automatically redact sensitive information
- quarantine: Allow execution but isolate the results
- escalate: Route to human review for complex cases
- auto_fix: Apply automated remediation

Rule phases:
- pre: Checked before tool execution
- post: Checked after tool execution
- final: Checked on the final output of the run
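Post-phase rules run after the tool executes. A minimal sketch, assuming the tool output is exposed to post-phase conditions under the same output dotpath used elsewhere in this document; the rule name, pattern, and message are illustrative:

- name: flag_uncited_output          # illustrative
  when: 'not output regex \[source \d+\]'   # assumes 'output' is available at the post phase
  action: warn
  message: "Tool output does not include [source N] citations"
  severity: low
  phase: post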
AISentinel supports automated remediation for detected violations. Rules can include remediation guidance and configuration for automatic fixes.

The optional recommendation field provides human-readable guidance for addressing violations:
- name: pii_email_detection
when: 'output regex [A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}'
action: warn
message: "Email address found in output"
severity: medium
phase: final
recommendation: "Email detected in output. Consider using user IDs or anonymized identifiers."
Recommendations support template variables that are populated with violation context:
- {detected_email} - The actual email found
- {tool} - The tool that triggered the violation
- {severity} - The violation severity level

The optional remediation_config field enables automated remediation:
- name: pii_email_detection
when: 'output regex [A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}'
action: warn
message: "Email address found in output"
severity: medium
phase: final
recommendation: "Email detected in output. Consider using user IDs or anonymized identifiers."
remediation_config:
auto_redact: true
redaction_pattern: '[REDACTED_EMAIL]'
requires_approval: false
Remediation Config Options:
- auto_redact: Automatically apply redaction patterns (boolean)
- redaction_pattern: Pattern to replace sensitive data (string)
- requires_approval: Whether remediation needs manual approval (boolean)
- auto_suggest: Provide alternative suggestions (boolean)
- suggestion_type: Type of suggestion to provide (string)
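As a sketch of how the suggestion-oriented options could be combined; the suggestion_type value cheaper_tool is a hypothetical placeholder, not a value documented here:

- name: suggest_cheaper_search       # illustrative
  when: 'tool equals web AND estimate.cost > 5.0'
  action: suggest_alternative
  message: "Expensive web call - a cheaper tool may suffice"
  severity: medium
  phase: pre
  remediation_config:
    auto_suggest: true
    suggestion_type: cheaper_tool    # hypothetical value; supported types depend on your deployment
    requires_approval: true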
AISentinel includes built-in remediation for common violation types:
- PII Violations (pii_* rules): detected values are replaced with [REDACTED_*] placeholders.
- Cost Violations (cost_* rules)
Remediation information appears in:
- The /api/remediation/options endpoint

A full example combining recommendation and remediation_config:

rules:
- name: comprehensive_pii_protection
when: 'output regex [A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}'
action: redact_output
message: "Email addresses detected in output - automatically redacting"
severity: medium
phase: final
tags: ["pii", "privacy", "automated"]
recommendation: |
Email address {detected_email} found in output.
Automatically redacted with pattern: [REDACTED_EMAIL]
Consider using anonymized user identifiers instead.
remediation_config:
auto_redact: true
redaction_pattern: '[REDACTED_EMAIL]'
requires_approval: false
The tags field is an optional list of strings used for categorization; for example, ["pii", "high"] marks a PII-related rule with high risk.