On August 2, 2026, the EU AI Act's fines for general-purpose AI (GPAI) providers become enforceable, documentation gaps included. The Article 5 prohibitions already apply. Fines reach up to €15M or 3% of global annual turnover for GPAI transparency failures and up to €35M or 7% for prohibited-practice violations.

We read the public documentation of every major GPAI provider with material EU exposure and graded it against the six obligation categories the Act creates for model providers. Each obligation is scored from 0 to 3. Across 12 providers, the average aggregate score is 49.6% of the available points. Google (DeepMind) leads at 83.3%; xAI (Grok) trails at 11.1%. No provider scores the maximum on every obligation. The leaderboard, provider evidence files, and scoring code are open source, so anyone can re-run the benchmark and dispute a score with a public URL we missed.

What we measured

The EU AI Act creates six obligation categories for general-purpose AI model providers:

| Article | Category | Question |
|---|---|---|
| 50(2) | Content disclosure | Is generated content marked in a machine-readable format? |
| 51 | Model classification | Has the provider self-classified and notified the AI Office? |
| 52 + 53(1)(a-b) | Technical documentation | Does a model card meet Annex XI completeness? |
| 53(1)(c) | Copyright transparency | Is there a published policy and an opt-out mechanism? |
| 53(1)(d-e) | Downstream obligations | Is integration documentation sufficient for deployers? |
| 55 | Systemic risk | Are red-teaming, incident reporting, and weight security documented? Applies to models at or above the 10²⁵ FLOPs threshold. |

Each obligation is scored 0 (no evidence), 1 (partial), 2 (adequate), or 3 (exceeds). Only public documentation counts: model cards, terms of service, API docs, transparency reports.

Article 55 is marked N/A for providers below the systemic-risk threshold (Mistral, Hugging Face SmolLM, Stability AI, Cohere, AI21 Labs, Inflection AI, Amazon Titan/Nova). Those providers are scored out of 15 instead of 18.

| Score | Label | Standard |
|---|---|---|
| 0 | No evidence | Provider does not address the obligation publicly. |
| 1 | Partial | Mentioned without verifiable detail. |
| 2 | Adequate | Specific, sourced documentation exists. |
| 3 | Exceeds | Machine-readable, regularly updated, audited. |
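
The aggregate arithmetic described above can be sketched in a few lines of Python. This is an illustration only, not the code in benchmark_runner.py: the function name `aggregate`, the dictionary layout, and the use of `None` for N/A are all assumptions. The sample scores are the Mistral AI and Google (DeepMind) rows from the leaderboard.

```python
from typing import Optional

MAX_PER_OBLIGATION = 3  # top of the 0-to-3 rubric


def aggregate(scores: dict[str, Optional[int]]) -> tuple[int, int, float]:
    """Return (points, max_points, percent). None marks an N/A obligation."""
    applicable = [s for s in scores.values() if s is not None]
    points = sum(applicable)
    # N/A obligations shrink the denominator: 5 obligations -> 15, 6 -> 18.
    max_points = MAX_PER_OBLIGATION * len(applicable)
    return points, max_points, round(100 * points / max_points, 1)


# Hypothetical in-memory layout; values copied from the leaderboard rows.
mistral = {"50(2)": 1, "51": 2, "52/53": 3, "53(c)": 1, "53(d-e)": 3, "55": None}
google = {"50(2)": 2, "51": 2, "52/53": 3, "53(c)": 2, "53(d-e)": 3, "55": 3}

print(aggregate(mistral))  # (10, 15, 66.7)
print(aggregate(google))   # (15, 18, 83.3)
```

Dropping N/A obligations from the denominator, rather than scoring them as zero, keeps providers below the systemic-risk threshold from being penalized for an obligation that does not apply to them.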

What this is not

These are not verified scores. All twelve carry verification_status: needs_review in the working dataset, and we publish the rankings with that flag visible. To dispute a score, open a PR with a public URL we missed.
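
One way to keep the review flag attached to each score is sketched below. Only the field name `verification_status` and the value `needs_review` come from the dataset description above. The `ScoreRecord` class, its other fields, the `verified` status value, and the example.com URLs are hypothetical, and the actual schema of the working dataset may differ.

```python
from dataclasses import dataclass


@dataclass
class ScoreRecord:
    # Hypothetical record shape; only verification_status is named in the text.
    provider: str
    obligation: str
    score: int
    evidence_url: str
    verification_status: str = "needs_review"


def pending_review(records: list[ScoreRecord]) -> list[ScoreRecord]:
    """Return the records that still carry the needs_review flag."""
    return [r for r in records if r.verification_status == "needs_review"]


records = [
    # Placeholder URLs; scores taken from the leaderboard.
    ScoreRecord("Google (DeepMind)", "52/53", 3, "https://example.com/model-card"),
    ScoreRecord("xAI (Grok)", "51", 0, "https://example.com/none-found"),
]

# A dispute PR could supply a new URL; a reviewer would then update the status.
records[1].evidence_url = "https://example.com/newly-found-notice"
records[1].verification_status = "verified"  # hypothetical status value

print([r.provider for r in pending_review(records)])  # ['Google (DeepMind)']
```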

The leaderboard

Data as of May 9, 2026. Generated from benchmark_latest.json.

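| Rank | Provider | Class | 50(2) | 51 | 52/53 | 53(c) | 53(d-e) | 55 | Aggregate | % |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Google (DeepMind) | closed_api | 2 | 2 | 3 | 2 | 3 | 3 | 15 / 18 | 83.3% |
| 2 | Meta (Llama) | open_weights | 1 | 2 | 3 | 2 | 3 | 3 | 14 / 18 | 77.8% |
| 3 | Anthropic | closed_api | 0 | 2 | 3 | 1 | 3 | 3 | 12 / 18 | 66.7% |
| 4 | Mistral AI | open_weights | 1 | 2 | 3 | 1 | 3 | N/A | 10 / 15 | 66.7% |
| 5 | Hugging Face (SmolLM) | open_weights | 1 | 0 | 3 | 3 | 2 | N/A | 9 / 15 | 60.0% |
| 6 | OpenAI | closed_api | 1 | 1 | 2 | 1 | 2 | 2 | 9 / 18 | 50.0% |
| 7 | Stability AI | open_weights | 1 | 0 | 2 | 1 | 3 | N/A | 7 / 15 | 46.7% |
| 8 | Cohere | closed_api | 1 | 0 | 2 | 0 | 3 | N/A | 6 / 15 | 40.0% |
| 9 | AI21 Labs | open_weights | 1 | 0 | 2 | 0 | 2 | N/A | 5 / 15 | 33.3% |
| 10 | Inflection AI | closed_api | 1 | 0 | 1 | 0 | 3 | N/A | 5 / 15 | 33.3% |
| 11 | Amazon (Titan/Nova) | closed_api | 1 | 0 | 1 | 0 | 2 | N/A | 4 / 15 | 26.7% |
| 12 | xAI (Grok) | closed_api | 0 | 0 | 1 | 0 | 1 | 0 | 2 / 18 | 11.1% |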
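
The headline average can be recomputed from the Aggregate column alone. The sketch below hard-codes the twelve (points, maximum) pairs from the leaderboard and averages the per-provider percentages. The variable names are illustrative, and the repository's own scripts may compute the figure differently.

```python
# (points, max_points) per provider, copied from the Aggregate column.
AGGREGATES = {
    "Google (DeepMind)": (15, 18),
    "Meta (Llama)": (14, 18),
    "Anthropic": (12, 18),
    "Mistral AI": (10, 15),
    "Hugging Face (SmolLM)": (9, 15),
    "OpenAI": (9, 18),
    "Stability AI": (7, 15),
    "Cohere": (6, 15),
    "AI21 Labs": (5, 15),
    "Inflection AI": (5, 15),
    "Amazon (Titan/Nova)": (4, 15),
    "xAI (Grok)": (2, 18),
}

# Each provider's percentage is taken against its own maximum (15 or 18).
percents = {name: 100 * pts / mx for name, (pts, mx) in AGGREGATES.items()}

# Unweighted mean of the twelve percentages.
average = sum(percents.values()) / len(percents)

print(f"average: {average:.1f}%")                # average: 49.6%
print(f"gap to full score: {100 - average:.1f}")  # gap to full score: 50.4
print(max(percents, key=percents.get))            # Google (DeepMind)
print(min(percents, key=percents.get))            # xAI (Grok)
```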

What the data shows

  1. Article 51 self-classification is the industry floor. Seven of twelve providers have zero public evidence of Article 51 self-classification or AI Office notification. Even the top scorers stop at 2 (adequate); no provider reaches 3.
  2. Machine-readable text disclosure is functionally absent. Article 50(2) averages 0.92 / 3. Image generation has converged on C2PA credentials. Text outputs are a different story — Anthropic and xAI score zero on content disclosure.
  3. The distribution is bimodal. Five providers score zero on copyright transparency. xAI is the only provider scoring zero on four obligations simultaneously.
  4. Open-weights does not predict lower scores. Meta scores 3/3 on Article 55. Most open-weights vendors are below the FLOPs threshold, so Article 55 is N/A.
  5. Technical documentation is the high-water mark. Average 2.17 / 3 — five providers score the maximum.
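
The per-obligation figures in this list can be checked directly against the leaderboard. The sketch below copies the twelve score rows into a dictionary, with `None` standing in for N/A, and recomputes the column means and zero counts. The helper names `column`, `mean`, and `zero_count` are illustrative and are not taken from the repository.

```python
from typing import Optional

COLUMNS = ("50(2)", "51", "52/53", "53(c)", "53(d-e)", "55")

# Rows copied from the leaderboard; None marks an N/A Article 55 score.
SCORES: dict[str, tuple[Optional[int], ...]] = {
    "Google (DeepMind)": (2, 2, 3, 2, 3, 3),
    "Meta (Llama)": (1, 2, 3, 2, 3, 3),
    "Anthropic": (0, 2, 3, 1, 3, 3),
    "Mistral AI": (1, 2, 3, 1, 3, None),
    "Hugging Face (SmolLM)": (1, 0, 3, 3, 2, None),
    "OpenAI": (1, 1, 2, 1, 2, 2),
    "Stability AI": (1, 0, 2, 1, 3, None),
    "Cohere": (1, 0, 2, 0, 3, None),
    "AI21 Labs": (1, 0, 2, 0, 2, None),
    "Inflection AI": (1, 0, 1, 0, 3, None),
    "Amazon (Titan/Nova)": (1, 0, 1, 0, 2, None),
    "xAI (Grok)": (0, 0, 1, 0, 1, 0),
}


def column(name: str) -> list[int]:
    """All applicable (non-N/A) scores for one obligation."""
    idx = COLUMNS.index(name)
    return [row[idx] for row in SCORES.values() if row[idx] is not None]


def mean(name: str) -> float:
    values = column(name)
    return round(sum(values) / len(values), 2)


def zero_count(name: str) -> int:
    return column(name).count(0)


# Providers with four or more zero scores across applicable obligations.
many_zeros = [p for p, row in SCORES.items() if sum(s == 0 for s in row) >= 4]

print(zero_count("51"), max(column("51")))     # 7 2
print(mean("50(2)"))                           # 0.92
print(zero_count("53(c)"), many_zeros)         # 5 ['xAI (Grok)']
print(mean("52/53"), column("52/53").count(3)) # 2.17 5
```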

Why we publish this

ComplyEdge's product is a runtime compliance engine — OPA/Rego on the hot path, legal citation on every block. The benchmark is not the product. It is a measurement of the documentation gap the product operates in.

  1. Deployers. Obligations cascade from the model provider to the deployer. The deployer cannot inherit what was never published, and the industry average sits 50.4 percentage points below a full score.
  2. Acquirers. The average GPAI provider scores 49.6% on public documentation; no provider exceeds 84%.
  3. Regulators. Public data shows where industry stands before the AI Office finalizes evaluation protocols — Article 51 self-classification is the gap to expect post-August 2026.

Reproduce it

git clone https://github.com/ComplyEdge/complyedge
cd complyedge
python scripts/benchmark/benchmark_runner.py
python scripts/benchmark/leaderboard_renderer.py
cat scripts/benchmark/results/leaderboard.md

Repository: github.com/ComplyEdge/complyedge
Provider evidence: providers/
Benchmark code: scripts/benchmark/
Leaderboard JSON: benchmark_latest.json