Rulepacks

Rulepacks are YAML files that define the rules AISentinel uses to audit candidates.

Structure

A rulepack is a YAML file with a top-level rules key containing a list of rule objects.

Each rule has the following fields:

  • name: A unique identifier for the rule (string)
  • when: The condition that triggers the rule (string)
  • action: What to do when the condition is met: block, warn, require_approval, suggest_alternative, redact_output, quarantine, escalate, or auto_fix
  • message: A human-readable message explaining the violation
  • severity: The severity level: low, medium, or high
  • phase: When to check the rule: pre (before execution), post (after execution), or final (on final output)
  • tags: Optional list of tags for categorization, e.g., ["pii", "high"]
  • recommendation: Optional suggested remediation steps (string, supports template variables)
  • remediation_config: Optional configuration for automated remediation
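
The field list above can be checked mechanically before a rulepack is used. The following is a minimal sketch, not AISentinel's own validator: the validate_rule helper is hypothetical, and treating name, when, action, and message as required is an assumption, since the source does not say which fields are mandatory.

```python
# Allowed values copied from the field list above.
ACTIONS = {"block", "warn", "require_approval", "suggest_alternative",
           "redact_output", "quarantine", "escalate", "auto_fix"}
SEVERITIES = {"low", "medium", "high"}
PHASES = {"pre", "post", "final"}
# Assumption: the source does not state which fields are mandatory.
REQUIRED = ("name", "when", "action", "message")


def validate_rule(rule: dict) -> list[str]:
    """Return human-readable problems found in one rule (empty list if none)."""
    problems = []
    for field in REQUIRED:
        value = rule.get(field)
        if not isinstance(value, str) or not value:
            problems.append(f"missing or non-string field: {field}")
    for field, allowed in (("action", ACTIONS),
                           ("severity", SEVERITIES),
                           ("phase", PHASES)):
        value = rule.get(field)
        # Absent fields are skipped here; present ones must be a known string.
        if value is not None and (not isinstance(value, str) or value not in allowed):
            problems.append(f"invalid {field}: {value!r}")
    tags = rule.get("tags", [])
    if not isinstance(tags, list) or not all(isinstance(t, str) for t in tags):
        problems.append("tags must be a list of strings")
    return problems
```

Calling validate_rule on each entry of the rules list after loading the YAML would surface typos such as an unknown action before any candidate is audited.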

Conditions

The when field is a string that specifies the condition using a simple DSL:

[not] dotpath op value

  • not: Optional negation
  • dotpath: Path to the field in the candidate object, e.g., args.url, estimate.risk
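  • op: Operator, e.g., equals, regex, or contains; the Operators section lists all of them. not_regex is shorthand for not combined with regex; negate any other operator with the not prefix
  • value: The value to compare against (omitted for type-checking operators such as is_empty)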

Examples:

  • args.url regex ^https://safe\.
  • not output regex \[source \d+\]
  • tool equals calc
  • estimate.cost > 1.0 (numeric comparison)
  • estimate.risk between 0.1,0.8 (range check)
  • args.query len_gt 100 (length check)
  • tool in calc,web,search (set membership)
  • args.email endswith @company.com (string operations)
  • args.data is_empty (type checking)
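
To make the [not] dotpath op value form concrete, here is a minimal sketch of how such a condition could be evaluated. It is not AISentinel's evaluator: the resolve and check helpers are hypothetical, only three operators are covered, and two behaviors are assumptions (a missing path is treated as an empty string, and regex uses search rather than full-match semantics).

```python
import re


def resolve(candidate: dict, dotpath: str):
    """Follow a dotted path such as 'args.url' through nested dicts."""
    current = candidate
    for key in dotpath.split("."):
        if not isinstance(current, dict) or key not in current:
            return None  # assumption: a missing path resolves to None
        current = current[key]
    return current


def check(candidate: dict, condition: str) -> bool:
    """Evaluate one '[not] dotpath op value' condition (three operators only)."""
    head, _, rest = condition.partition(" ")
    negate = head == "not"
    if negate:
        condition = rest.strip()
    # Split into exactly three parts so the value may contain spaces.
    dotpath, op, value = condition.split(None, 2)
    target = resolve(candidate, dotpath)
    text = "" if target is None else str(target)
    if op == "equals":
        result = text == value
    elif op == "contains":
        result = value in text
    elif op == "regex":
        result = re.search(value, text) is not None
    else:
        raise ValueError(f"unsupported operator in this sketch: {op}")
    # XOR with the negation flag.
    return result != negate
```

With a candidate such as {"tool": "calc", "args": {"url": "https://safe.example.com"}}, the condition tool equals calc holds, and so does the args.url regex example from the list above.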

Operators

String and Basic Operators

  • equals: Exact string match
  • contains: Substring check
  • icontains: Case-insensitive substring check
  • startswith: String starts with value
  • endswith: String ends with value
  • regex: Regular expression match
  • not_regex: Negated regex (same as not regex)

Numeric Operators

  • >, gt: Greater than
  • <, lt: Less than
  • >=, gte: Greater than or equal
  • <=, lte: Less than or equal
  • between: Range check (format: "min,max")

Length Operators

  • len_gt: Length greater than
  • len_lt: Length less than
  • len_gte: Length greater than or equal
  • len_lte: Length less than or equal
  • len_eq: Length equals

Set and List Operators

  • in: Value is in comma-separated list
  • not_in: Value is not in comma-separated list

Type Checking Operators

  • is_string: Target is a string
  • is_number: Target is numeric
  • is_list: Target is a list/tuple
  • is_empty: Target is empty or null
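
The numeric, length, set, and type-checking operators above map naturally onto a small dispatch function. The sketch below is illustrative only: apply_op is a hypothetical helper, and the inclusive bounds for between, the whitespace trimming of list items, and the exclusion of booleans from is_number are assumptions the source does not confirm.

```python
import operator

NUMERIC = {">": operator.gt, "gt": operator.gt, "<": operator.lt, "lt": operator.lt,
           ">=": operator.ge, "gte": operator.ge, "<=": operator.le, "lte": operator.le}
LENGTH = {"len_gt": operator.gt, "len_lt": operator.lt, "len_gte": operator.ge,
          "len_lte": operator.le, "len_eq": operator.eq}


def apply_op(target, op: str, value: str = "") -> bool:
    """Apply one operator from the tables above to an already-resolved target."""
    if op in NUMERIC:
        return NUMERIC[op](float(target), float(value))
    if op == "between":
        low, high = (float(part) for part in value.split(","))
        return low <= float(target) <= high  # assumption: bounds are inclusive
    if op in LENGTH:
        return LENGTH[op](len(target), int(value))
    if op in ("in", "not_in"):
        members = {item.strip() for item in value.split(",")}
        return (str(target) in members) == (op == "in")
    if op == "is_string":
        return isinstance(target, str)
    if op == "is_number":
        # bool is a subclass of int in Python; excluding it is an assumption.
        return isinstance(target, (int, float)) and not isinstance(target, bool)
    if op == "is_list":
        return isinstance(target, (list, tuple))
    if op == "is_empty":
        return target is None or (hasattr(target, "__len__") and len(target) == 0)
    raise ValueError(f"unsupported operator in this sketch: {op}")
```

For instance, apply_op(0.5, "between", "0.1,0.8") corresponds to the estimate.risk between 0.1,0.8 example, and apply_op("web", "in", "calc,web,search") to the set-membership example.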

Logical Operators

  • AND: Logical AND (both conditions must be true)
  • OR: Logical OR (at least one condition must be true)
  • not: Logical NOT (negates the following condition)
  • (, ): Parentheses for grouping expressions

Complex Conditions

You can combine multiple conditions using logical operators:

# Multiple conditions with AND
when: 'tool equals web AND args.url contains malicious'

# Either/or conditions  
when: 'estimate.cost > 5.0 OR estimate.risk > 0.8'

# Complex nested conditions
when: '(tool equals email AND args.to endswith @external.com) OR tool equals api_call'

# Negation with complex conditions
when: 'not (tool in calc,search AND args.query len_gt 100)'
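
The grouping behavior shown above can be illustrated with a small recursive-descent evaluator. This is a sketch under stated assumptions, not AISentinel's parser: evaluate is a hypothetical helper, AND is assumed to bind tighter than OR, not is assumed to apply to the single condition or group that follows it, and values containing parentheses or the bare words AND/OR are not supported. The atom argument is any function that decides one simple condition string.

```python
import re


def evaluate(expr: str, atom) -> bool:
    """Evaluate AND / OR / not / parentheses; `atom` decides each simple condition."""
    # Parentheses become their own tokens; everything else splits on whitespace.
    tokens = re.findall(r"\(|\)|[^\s()]+", expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def parse_or():
        result = parse_and()
        while peek() == "OR":
            take()
            right = parse_and()
            result = result or right
        return result

    def parse_and():
        result = parse_unary()
        while peek() == "AND":
            take()
            right = parse_unary()
            result = result and right
        return result

    def parse_unary():
        if peek() == "not":
            take()
            return not parse_unary()
        if peek() == "(":
            take()
            result = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            take()
            return result
        # Collect the words of one simple condition, e.g. 'tool equals web'.
        words = []
        while peek() not in (None, "AND", "OR", ")"):
            words.append(take())
        if not words:
            raise ValueError("expected a condition")
        return atom(" ".join(words))

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token: {tokens[pos]}")
    return result
```

Passing a lookup such as atom=lambda c: c in known_true_conditions is enough to experiment with how a compound expression groups before wiring in a real single-condition evaluator.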

Advanced Examples

rules:
# Cost management
- name: expensive_operation
  when: 'estimate.cost > 10.0'
  action: block
  message: "Operation too expensive"
  severity: high

# Risk assessment  
- name: high_risk_approval
  when: 'estimate.risk between 0.7,1.0'
  action: require_approval
  message: "High risk operation needs approval"

# Content validation
- name: empty_query_block
  when: 'args.query is_empty'
  action: block
  message: "Empty queries not allowed"

# Tool allowlist
- name: allowed_tools_only
  when: 'tool not_in calc,web,search,email,slack_api'
  action: block
  message: "Tool not in approved list"

# Email security
- name: company_email_only
  when: 'tool equals email AND not args.to endswith @company.com'
  action: block
  message: "Only company emails allowed"

# Numeric validation for search results
- name: reasonable_search_count
  when: 'args.k > 10'
  action: warn
  message: "Large result count may be expensive"
  severity: low
  phase: pre

# Complex logical conditions
- name: high_risk_web_or_expensive
  when: '(tool equals web AND estimate.risk > 0.8) OR estimate.cost > 15.0'
  action: require_approval
  message: "High-risk web operation or very expensive operation"
  severity: high
  phase: pre

# Multi-tool security check
- name: sensitive_data_protection
  when: 'tool in email,api_call,slack_api AND (args.body icontains password OR args.data icontains ssn)'
  action: block
  message: "Sensitive data detected in external communication"
  severity: high
  phase: pre

# Length and content validation
- name: reasonable_code_length
  when: 'tool equals python_exec AND args.code len_gt 1000'
  action: require_approval
  message: "Large code blocks require approval"
  severity: medium
  phase: pre

# Domain and path restrictions
- name: restricted_domains
  when: 'tool equals web AND (args.url startswith https://internal. OR args.url contains /admin/)'
  action: block
  message: "Access to internal/admin resources blocked"
  severity: high
  phase: pre

Field Reference

Candidate Object

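The candidate object carries the fields that rule conditions reference. The examples in this document use type, tool, args, estimate, output, and evidence_expect.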

Estimate object

The estimate object provides machine-generated guidance about the candidate. The examples in this document use cost, risk, steps_left, and confidence. The steps_left field can hold either:

  • an integer count (e.g. 3) meaning three discrete steps remaining, or
  • a fraction between 0.0 and 1.0 (e.g. 0.25) representing a normalized portion of remaining effort.

Guidance: rules can reference estimate fields directly, for example estimate.steps_left < 1 or estimate.risk > 0.8.

Example candidate snippet:

{
  "type": "tool_call",
  "tool": "web",
  "args": {"url": "https://example.com/search?q=foo"},
  "estimate": {"cost": 0.12, "risk": 0.05, "steps_left": 2, "confidence": 0.85}
}

evidence_expect: Expected evidence (list of strings)

evidence_expect (details)

The evidence_expect field is an advisory list of tokens that indicate what kind of evidence the candidate expects from a tool. It helps adapters, the LLM service, and rulepacks interpret and validate tool outputs.

Common tokens and their intended meaning:

  • results — Structured results array (for example search results). Use when the tool returns a list of items with metadata.
  • result — A single structured result object.
  • text — Plain textual output; no structured items expected.
  • citation — Expect citation or source metadata alongside textual output.

How evidence_expect is used:

  • Adapters parse and map tool outputs according to the tokens in evidence_expect so the governor can evaluate them.
  • Rulepacks can reference evidence_expect to require or check for certain evidence types (example: when: 'evidence_expect contains citation').
  • The field is advisory and optional; omit it or leave it empty when no particular evidence format is required.

Example candidate showing evidence_expect:

{
  "type": "tool_call",
  "tool": "web",
  "args": {"url": "https://example.com/search?q=foo"},
  "estimate": {},
  "evidence_expect": ["results"]
}

Actions

  • block: Prevent the candidate from executing
  • warn: Allow but log a warning
  • require_approval: Require manual approval before proceeding
  • suggest_alternative: Suggest a safer alternative approach
  • redact_output: Automatically redact sensitive information
  • quarantine: Allow execution but isolate the results
  • escalate: Route to human review for complex cases
  • auto_fix: Apply automated remediation

Phases

  • pre: Checked before tool execution
  • post: Checked after tool execution
  • final: Checked on the final output of the run

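Tags

Tags are optional labels for categorizing rules. A rule can carry several, e.g., tags: ["pii", "privacy", "automated"].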

Remediation Features

AISentinel supports automated remediation for detected violations. Rules can include remediation guidance and configuration for automatic fixes.

Recommendation Field

The optional recommendation field provides human-readable guidance for addressing violations:

- name: pii_email_detection
  when: 'output regex [A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}'
  action: warn
  message: "Email address found in output"
  severity: medium
  phase: final
  recommendation: "Email detected in output. Consider using user IDs or anonymized identifiers."

Recommendations support template variables that are populated with violation context:

  • {detected_email} - The actual email found
  • {tool} - The tool that triggered the violation
  • {severity} - The violation severity level
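
Template substitution of this kind can be sketched with Python's built-in string formatting. The render_recommendation helper below is hypothetical, and leaving unknown variables in place (rather than raising an error) is an assumption about the desired behavior, not documented AISentinel behavior.

```python
class _KeepUnknown(dict):
    """Mapping that leaves '{name}' untouched when no value is supplied."""

    def __missing__(self, key):
        return "{" + key + "}"


def render_recommendation(template: str, context: dict) -> str:
    """Fill {detected_email}, {tool}, {severity}, etc. from the violation context."""
    # format_map consults __missing__ for absent keys, unlike str.format(**context).
    return template.format_map(_KeepUnknown(context))
```

Note that a template containing literal braces, such as a regex quantifier, would need them doubled ({{ and }}) for this approach to work.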

Remediation Configuration

The optional remediation_config field enables automated remediation:

- name: pii_email_detection
  when: 'output regex [A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}'
  action: warn
  message: "Email address found in output"
  severity: medium
  phase: final
  recommendation: "Email detected in output. Consider using user IDs or anonymized identifiers."
  remediation_config:
    auto_redact: true
    redaction_pattern: '[REDACTED_EMAIL]'
    requires_approval: false

Remediation Config Options:

  • auto_redact: Automatically apply redaction patterns (boolean)
  • redaction_pattern: Replacement text substituted for the sensitive data, e.g., '[REDACTED_EMAIL]' (string)
  • requires_approval: Whether remediation needs manual approval (boolean)
  • auto_suggest: Provide alternative suggestions (boolean)
  • suggestion_type: Type of suggestion to provide (string)
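
As an illustration of how auto_redact and redaction_pattern might interact, the sketch below redacts email addresses using the same regex as the rule examples. The apply_redaction helper is hypothetical, the fallback placeholder '[REDACTED]' is an assumption, and the other options (requires_approval, auto_suggest, suggestion_type) are not modeled.

```python
import re

# Same email pattern used in the rule examples in this document.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def apply_redaction(output: str, config: dict) -> str:
    """Replace email addresses in output when auto_redact is enabled."""
    if not config.get("auto_redact"):
        return output
    # Assumption: fall back to a generic placeholder if none is configured.
    placeholder = config.get("redaction_pattern", "[REDACTED]")
    # A function replacement keeps the placeholder literal (no backslash escapes).
    return EMAIL_RE.sub(lambda _match: placeholder, output)
```

With the remediation_config from the example above, the output "Contact bob@example.com now" would become "Contact [REDACTED_EMAIL] now".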

Built-in Remediation

AISentinel includes built-in remediation for common violation types:

PII Violations (pii_* rules):

  • Automatically redacts emails, phone numbers, and SSNs
  • Replaces sensitive data with [REDACTED_*] placeholders

Cost Violations (cost_* rules):

  • Suggests lower-cost alternatives
  • Provides optimization recommendations
  • May require approval for significant changes

Remediation in Reports

Remediation information appears in:

  • Security Audit Reports (SAR): HTML reports show remediation actions and results
  • Dashboard Reports: Interactive remediation panels with one-click fixes
  • API Responses: Remediation options via /api/remediation/options

Example Complete Rule

rules:
  - name: comprehensive_pii_protection
    when: 'output regex [A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}'
    action: redact_output
    message: "Email addresses detected in output - automatically redacting"
    severity: medium
    phase: final
    tags: ["pii", "privacy", "automated"]
    recommendation: |
      Email address {detected_email} found in output.
      Automatically redacted with pattern: [REDACTED_EMAIL]
      Consider using anonymized user identifiers instead.
    remediation_config:
      auto_redact: true
      redaction_pattern: '[REDACTED_EMAIL]'
      requires_approval: false

Tags are optional lists used for categorization, such as ["pii", "high"] indicating a PII-related rule with high risk.