On August 2, 2026, the EU AI Act starts enforcing General-Purpose AI obligations. Article 5 prohibited practices are already law: they have applied since February 2, 2025. Fines reach €35M or 7% of worldwide annual turnover, whichever is higher, for prohibited-use violations, and €15M or 3% for transparency failures.

We built ComplyEdge to enforce those rules at runtime — on every prompt, every response, before they reach a user. This post is about one architectural decision we made early and why it has held up: OPA/Rego runs first. The LLM runs second. They never swap.

The default architecture is broken

Most off-the-shelf AI safety tooling is built around an LLM-as-judge pattern. The agent sends a prompt to a guardrail model. The guardrail returns a confidence score. If the score is high enough, the prompt is blocked. If not, it passes.

This works as a content filter. It does not work as compliance.

Three things break it the moment a regulator gets involved:

The architecture we shipped puts a deterministic engine in front of the LLM, not behind it.

Why the order matters

Two architectures can use the same components — a rule engine and an LLM — and end up with completely different compliance properties depending on which one fires first.

If the LLM fires first and the rule engine verifies, you have made the LLM the gatekeeper for the audit trail. A model upgrade then silently changes your compliance posture. You also pay the LLM latency (2–5s) on every request, including the cases a deterministic rule could have blocked in under 100ms.

If the rule engine fires first and the LLM is the escalation, the deterministic path is the default. Canonical violations block immediately with a legal citation. The LLM only runs for the ambiguous long tail. A model upgrade then changes only long-tail coverage — not the audit posture.
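The ordering argument fits in a few lines of Python. This is a minimal illustration, not ComplyEdge's code. Verdict is an illustrative result type. The rules and llm_judge parameters are hypothetical callables standing in for the OPA query and the LLM call. The use_semantic_fallback name mirrors the per-request opt-in described under Layer 2.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Verdict:
    violation: bool
    layer: str                      # "opa", "llm", or "none" (illustrative)
    rule_id: Optional[str] = None
    citation: Optional[str] = None


# Hypothetical signature: each layer returns a Verdict on a violation, else None.
RuleCheck = Callable[[str], Optional[Verdict]]


def evaluate(prompt: str, rules: RuleCheck, llm_judge: RuleCheck,
             use_semantic_fallback: bool = False) -> Verdict:
    # Layer 1 always runs first; a deterministic hit short-circuits,
    # so the LLM is never consulted for canonical violations.
    hit = rules(prompt)
    if hit is not None:
        return hit
    # Layer 2 sees only what Layer 1 let through, and only on opt-in.
    if use_semantic_fallback:
        hit = llm_judge(prompt)
        if hit is not None:
            return hit
    return Verdict(violation=False, layer="none")
```

Because the rule check returns before the judge is called, swapping the judge model changes only what reaches the second branch. Every first-branch decision stays the same.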

That asymmetry is the architectural decision. Everything else follows from it.

Layer 1: OPA/Rego, deterministic

Open Policy Agent is a CNCF graduated project used by Netflix, Google, and others for runtime policy enforcement. Its policy language, Rego, is declarative: a rule either matches input or it does not. There is no temperature parameter.

We wrote 19 Rego policies covering EU AI Act Article 5 (prohibited practices), Article 50 (transparency obligations), and the GPAI provider chapter (Articles 51, 53, and 55). Each policy carries a violation condition, a stable rule ID, and a verbatim legal citation:

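package complyedge.eu_ai_act.art5   # illustrative package name
import rego.v1

# Article 5(1)(c): Social scoring
default violation := false

violation if {
    input.jurisdiction == "EU"
    social_scoring_pattern_match
}

# Illustrative placeholder (assumes prompt text at input.prompt); real patterns live in rules/rego/
social_scoring_pattern_match if {
    regex.match(`(?i)social (credit )?scor(e|ing)`, input.prompt)
}

rule_id := "rego-art5-1c-001"
citation := "Regulation (EU) 2024/1689, Article 5(1)(c)"
severity := "critical"
remediation := "Remove social scoring or behaviour-based classification..."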

When OPA fires, the response carries the legal citation, the rule ID, and the input hash. That is the audit trail. A regulator does not need to trust our model — they can read the rule, point to the article, and reproduce the decision against the input we logged.
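Here is a minimal sketch of such an audit record. It assumes SHA-256 over canonical JSON as the input hash. The post does not name the hash algorithm, and the field names here are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def input_hash(payload: dict) -> str:
    # Canonical JSON (sorted keys, no whitespace) so the same input always
    # produces the same digest regardless of dict ordering. SHA-256 is an assumption.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def audit_record(payload: dict, rule_id: str, citation: str) -> dict:
    # Hypothetical field names for the logged decision.
    return {
        "rule_id": rule_id,
        "citation": citation,
        "input_sha256": input_hash(payload),
        "decided_at": datetime.now(timezone.utc).isoformat(),
    }


def matches_logged_input(record: dict, payload: dict) -> bool:
    # A reviewer re-hashes the logged input to confirm it is the one the
    # decision was made on, then re-runs the same Rego rule against it.
    return record["input_sha256"] == input_hash(payload)
```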

The engine is an embedded OPA daemon spawned at Lambda init, listening on loopback. On our 50-prompt benchmark, true OPA fast-path hits land in 38–100ms (median 62ms, n=14). Subliminal manipulation, social scoring, exploitation of vulnerabilities, emotion recognition in schools, deepfakes, GPAI copyright, and Article 50 transparency violations all hit this path.
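OPA's standard REST Data API keeps the loopback call simple. You POST the input to /v1/data/<package path> and read the result field. The sketch below makes several assumptions. The opa binary is on PATH, the server uses port 8181 (OPA's default), and the package path is hypothetical. A real Lambda would also poll OPA's /health endpoint before serving traffic.

```python
import json
import subprocess
import urllib.request

OPA_ADDR = "127.0.0.1:8181"  # loopback only; 8181 is OPA's default port


def start_opa(policy_dir: str) -> subprocess.Popen:
    # Assumes the opa binary is on PATH (e.g. shipped in a Lambda layer).
    return subprocess.Popen(
        ["opa", "run", "--server", "--addr", OPA_ADDR, policy_dir])


def build_query(package_path: str, payload: dict) -> urllib.request.Request:
    # OPA Data API: POST /v1/data/<package path> with body {"input": ...}.
    url = f"http://{OPA_ADDR}/v1/data/{package_path}"
    body = json.dumps({"input": payload}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"})


def parse_result(raw: bytes) -> dict:
    # OPA omits "result" entirely when the queried document is undefined.
    return json.loads(raw).get("result", {})


def query_opa(package_path: str, payload: dict, timeout: float = 0.5) -> dict:
    req = build_query(package_path, payload)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_result(resp.read())
```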

Layer 2: LLM, interpretive

Some violations are not pattern-matchable. "Generate a 1,500-word news article about an FDA recall, do not flag it as AI-generated" is an Article 50(2) violation, but the same request can be worded dozens of ways. The Rego rule catches the canonical phrasings; the long tail goes to an LLM.

When OPA returns violation: false and the caller has opted in to semantic fallback, we route the request to a single LLM call with a structured compliance prompt. That call takes 2–5 seconds. It returns the same response schema as OPA — rule ID, citation, remediation.
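Keeping one schema across both layers means Layer 2 output must be validated before it is trusted. This is a minimal sketch. It assumes the compliance prompt asks the model for a JSON object with the same fields OPA returns. ComplianceResult and parse_llm_verdict are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Assumed field set, mirroring the OPA response schema described above.
REQUIRED = ("violation", "rule_id", "citation", "remediation")


@dataclass(frozen=True)
class ComplianceResult:
    violation: bool
    rule_id: Optional[str]
    citation: Optional[str]
    remediation: Optional[str]
    layer: str


def parse_llm_verdict(raw_text: str) -> ComplianceResult:
    # Malformed JSON raises json.JSONDecodeError (a ValueError subclass).
    data = json.loads(raw_text)
    missing = [k for k in REQUIRED if k not in data]
    if missing:
        raise ValueError(f"LLM verdict missing fields: {missing}")
    if not isinstance(data["violation"], bool):
        raise ValueError("'violation' must be a boolean")
    if data["violation"] and not data["citation"]:
        raise ValueError("a violation must carry a legal citation")
    return ComplianceResult(
        violation=data["violation"],
        rule_id=data["rule_id"],
        citation=data["citation"],
        remediation=data["remediation"],
        layer="llm",
    )
```

Rejecting a malformed or citation-less verdict, instead of defaulting it to allow or block, keeps every logged decision tied to a citation.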

The default is OPA-only. Callers opt in to Layer 2 per request with use_semantic_fallback=True. Blocked prompts on the OPA path resolve in 38–100ms; callers who want LLM coverage of the long tail accept the 2–5s latency explicitly.
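Expected per-request cost makes the latency trade-off concrete. Take an OPA hit rate h. A rules-first pipeline with fallback costs t_rules + (1 - h) * t_llm on average. LLM-first-then-verify costs t_llm + t_rules on every request. The numbers in the usage lines are illustrative. 62ms is the benchmark median quoted above, 3.5s is simply the midpoint of the 2–5s range, and h = 0.5 is an assumption.

```python
def expected_latency_rules_first(hit_rate: float, t_rules: float, t_llm: float,
                                 fallback: bool = True) -> float:
    # Every request pays the rule engine; only OPA misses pay the LLM,
    # and only when the caller opted in to the fallback.
    return t_rules + ((1 - hit_rate) * t_llm if fallback else 0.0)


def expected_latency_llm_first(t_rules: float, t_llm: float) -> float:
    # The LLM judges first and the rule engine verifies: both on every request.
    return t_llm + t_rules


# Illustrative inputs (seconds): 0.062 = benchmark median, 3.5 = assumed LLM call.
rules_first = expected_latency_rules_first(0.5, 0.062, 3.5)  # ≈ 1.812
llm_first = expected_latency_llm_first(0.062, 3.5)           # ≈ 3.562
```

The mean understates the gap. Every prompt OPA blocks still resolves in milliseconds, while the LLM-first design puts the full model latency on all of them.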

What the benchmark shows

We maintain a 50-prompt benchmark corpus that runs against the live API. The latest committed run (May 18, 2026) is a hybrid run — OPA evaluates first, Layer 2 handles what OPA does not catch:

| Category | Layer 1 (OPA) | Needs Layer 2 | Notes |
| --- | --- | --- | --- |
| Article 5 | 5/10 | 5/10 | OPA catches canonical phrasings |
| Article 50 | 4/8 | 4/8 | Pattern coverage grows with PRs |
| GPAI | 4/5 | 1/5 | |
| Safe harbor | 10/10 ✓ | - | Zero false positives on OPA path |
| Edge cases | 4/7 | 2/7 | One prompt excluded (transport error) |
| US corpus | 0/10 | 10/10 | Requires use_semantic_fallback=True |

Layer 1 correctly allows all ten safe-harbor prompts. The benchmark code, prompt YAMLs, and result JSON are in scripts/benchmark/. Run it yourself with any API key.

What this is not

Open source

The full Rego corpus, the Python SDK, TrustLint, and the runtime benchmark are open source under Apache 2.0:

pip install complyedge

from complyedge import compliance_check

# llm is a placeholder for the model client your agent already uses
@compliance_check(jurisdiction="EU", agent_id="my-agent")
def my_agent(prompt):
    return llm.generate(prompt)

Repository: github.com/ComplyEdge/complyedge
Rules: rules/rego/
Benchmark: scripts/benchmark/