We graded 12 GPAI providers on EU AI Act documentation

From August 2, 2026, the European Commission can fine general-purpose AI (GPAI) providers for documentation gaps under the EU AI Act. The Article 5 prohibitions already apply. Fines reach up to €15M or 3% of worldwide annual turnover, whichever is higher, for GPAI transparency failures, and up to €35M or 7% for prohibited-practice violations.

We read the public documentation of every major GPAI provider with material EU exposure and graded it against the six obligation categories the Act creates for model providers. Each obligation is scored from 0 to 3. Across 12 providers, the average aggregate score is 49.6% of the available points. Google (DeepMind) leads at 83.3%; xAI (Grok) trails at 11.1%. No provider scores the maximum on every obligation. The leaderboard, provider evidence files, and scoring code are open source: anyone can re-run the benchmark and dispute a score with a public URL we missed.

What we measured

The EU AI Act creates six obligation categories for general-purpose AI model providers:

Article	Category	Question
50(2)	Content disclosure	Is generated content marked in a machine-readable format?
51	Model classification	Has the provider self-classified and notified the AI Office?
52 + 53(1)(a-b)	Technical documentation	Does a model card meet Annex XI completeness?
53(1)(c)	Copyright transparency	Is there a published policy and an opt-out mechanism?
53(1)(d-e)	Downstream obligations	Is integration documentation sufficient for deployers?
55	Systemic risk	Are red-teaming, incident reporting, and weight security documented? Applies only to models at or above the 10²⁵ FLOPs threshold.

Each obligation is scored 0 (no evidence), 1 (partial), 2 (adequate), or 3 (exceeds). Only public documentation counts: model cards, terms of service, API docs, transparency reports.

Article 55 is marked N/A for providers below the systemic-risk threshold (Mistral, Hugging Face SmolLM, Stability AI, Cohere, AI21 Labs, Inflection AI, Amazon Titan/Nova). Those providers are scored out of 15 instead of 18.

Score	Label	Standard
0	No evidence	Provider does not address the obligation publicly.
1	Partial	Mentioned without verifiable detail.
2	Adequate	Specific, sourced documentation exists.
3	Exceeds	Machine-readable, regularly updated, audited.
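The scoring arithmetic above can be sketched in a few lines. This is a minimal illustration, not the code in scripts/benchmark/: the function name `aggregate` and the use of `None` to represent an N/A obligation are assumptions made for this example.

```python
from typing import Optional, Sequence, Tuple

MAX_PER_OBLIGATION = 3  # top of the 0-to-3 rubric


def aggregate(scores: Sequence[Optional[int]]) -> Tuple[int, int, float]:
    """Return (points, max_points, percent) for one provider.

    `None` stands for an N/A obligation (hypothetical encoding) and is
    dropped from both the numerator and the denominator, so a provider
    below the systemic-risk threshold is scored out of 15 instead of 18.
    """
    applicable = [s for s in scores if s is not None]
    if not applicable:
        raise ValueError("at least one obligation must be applicable")
    if any(s not in (0, 1, 2, 3) for s in applicable):
        raise ValueError("each score must be 0, 1, 2, or 3")
    points = sum(applicable)
    max_points = MAX_PER_OBLIGATION * len(applicable)
    percent = round(100 * points / max_points, 1)
    return points, max_points, percent


if __name__ == "__main__":
    # Six applicable obligations: scored out of 18.
    print(aggregate([2, 2, 3, 2, 3, 3]))
    # Article 55 marked N/A: scored out of 15.
    print(aggregate([1, 2, 3, 1, 3, None]))
```

Excluding N/A from the denominator, rather than counting it as zero, keeps providers below the threshold from being penalized for an obligation that does not apply to them.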

What this is not

Not a runtime test. We are not running model outputs through ComplyEdge's compliance engine. That is a different, complementary measurement — see our runtime benchmark.
Not a legal verdict. Until August 2, 2026, GPAI documentation obligations are not yet enforceable with fines. We are scoring readiness, not adjudicating non-compliance.
Not exhaustive. Twelve providers in scope — every vendor with material EU GPAI exposure we could verify from public sources.

All twelve scores carry verification_status: needs_review in the working dataset. We publish the rankings with that flag visible. Dispute a score by opening a PR with a public URL we missed.

The leaderboard

Data as of May 9, 2026. Generated from benchmark_latest.json.

Rank	Provider	Class	50(2)	51	52/53	53(c)	53(d-e)	55	Aggregate	%
1	Google (DeepMind)	closed_api	2	2	3	2	3	3	15 / 18	83.3%
2	Meta (Llama)	open_weights	1	2	3	2	3	3	14 / 18	77.8%
3	Anthropic	closed_api	0	2	3	1	3	3	12 / 18	66.7%
4	Mistral AI	open_weights	1	2	3	1	3	N/A	10 / 15	66.7%
5	Hugging Face (SmolLM)	open_weights	1	0	3	3	2	N/A	9 / 15	60.0%
6	OpenAI	closed_api	1	1	2	1	2	2	9 / 18	50.0%
7	Stability AI	open_weights	1	0	2	1	3	N/A	7 / 15	46.7%
8	Cohere	closed_api	1	0	2	0	3	N/A	6 / 15	40.0%
9	AI21 Labs	open_weights	1	0	2	0	2	N/A	5 / 15	33.3%
10	Inflection AI	closed_api	1	0	1	0	3	N/A	5 / 15	33.3%
11	Amazon (Titan/Nova)	closed_api	1	0	1	0	2	N/A	4 / 15	26.7%
12	xAI (Grok)	closed_api	0	0	1	0	1	0	2 / 18	11.1%
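The headline figures can be checked directly from the table. The sketch below copies the per-obligation scores from the leaderboard and recomputes each provider's percentage and the industry average. It is an independent check, not the repository's runner; the names `SCORES`, `percent`, and `INDUSTRY_AVERAGE` are hypothetical, and `None` is an assumed encoding for N/A.

```python
from statistics import mean

# Scores copied from the leaderboard, in column order
# 50(2), 51, 52/53, 53(c), 53(d-e), 55. None marks N/A (assumed encoding).
SCORES = {
    "Google (DeepMind)": (2, 2, 3, 2, 3, 3),
    "Meta (Llama)": (1, 2, 3, 2, 3, 3),
    "Anthropic": (0, 2, 3, 1, 3, 3),
    "Mistral AI": (1, 2, 3, 1, 3, None),
    "Hugging Face (SmolLM)": (1, 0, 3, 3, 2, None),
    "OpenAI": (1, 1, 2, 1, 2, 2),
    "Stability AI": (1, 0, 2, 1, 3, None),
    "Cohere": (1, 0, 2, 0, 3, None),
    "AI21 Labs": (1, 0, 2, 0, 2, None),
    "Inflection AI": (1, 0, 1, 0, 3, None),
    "Amazon (Titan/Nova)": (1, 0, 1, 0, 2, None),
    "xAI (Grok)": (0, 0, 1, 0, 1, 0),
}


def percent(scores):
    """Share of available points, ignoring N/A obligations."""
    applicable = [s for s in scores if s is not None]
    return 100 * sum(applicable) / (3 * len(applicable))


PERCENTS = {name: percent(s) for name, s in SCORES.items()}

# Unweighted mean of the per-provider percentages.
INDUSTRY_AVERAGE = round(mean(list(PERCENTS.values())), 1)

# Rank by percentage, highest first; ties keep table order (stable sort).
RANKING = sorted(PERCENTS, key=PERCENTS.get, reverse=True)

if __name__ == "__main__":
    for rank, name in enumerate(RANKING, start=1):
        print(f"{rank:2d}  {name:24s} {PERCENTS[name]:5.1f}%")
    print(f"Industry average: {INDUSTRY_AVERAGE}%")
```

The 49.6% average matches the unweighted mean of the twelve provider percentages. Pooling all points instead (98 of 195 available) would give roughly 50.3%, so the choice of averaging method matters when comparing against other summaries.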

What the data shows

Article 51 self-classification is the industry floor. Seven of twelve providers have zero public evidence of Article 51 self-classification or AI Office notification, and the category average of 0.75 / 3 is the lowest of the six. Even the top scorers stop at adequate (2); none reaches 3.
Machine-readable text disclosure is functionally absent. Article 50(2) averages 0.92 / 3. Image generation has converged on C2PA credentials. Text outputs are a different story — Anthropic and xAI score zero on content disclosure.
The distribution is bimodal. Five providers score zero on copyright transparency. xAI is the only provider scoring zero on four obligations simultaneously.
Open-weights does not predict lower scores. Meta scores 3/3 on Article 55. Most open-weights vendors are below the FLOPs threshold, so Article 55 is N/A.
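Technical documentation is a strong point, though not the peak. It averages 2.17 / 3, with five providers scoring the maximum. Downstream documentation averages higher still, at 2.50 / 3, with seven providers at the maximum.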
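The per-obligation figures in this section can be recomputed from the leaderboard columns. The sketch below is a minimal check under stated assumptions: the column values are copied from the table in rank order, `None` is an assumed encoding for N/A, and the names `COLUMNS`, `column_stats`, and `STATS` are hypothetical rather than taken from the repository.

```python
# Column values copied from the leaderboard, top row to bottom row.
# None marks an N/A entry (Article 55 below the systemic-risk threshold).
COLUMNS = {
    "50(2)":   [2, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0],
    "51":      [2, 2, 2, 2, 0, 1, 0, 0, 0, 0, 0, 0],
    "52/53":   [3, 3, 3, 3, 3, 2, 2, 2, 2, 1, 1, 1],
    "53(c)":   [2, 2, 1, 1, 3, 1, 1, 0, 0, 0, 0, 0],
    "53(d-e)": [3, 3, 3, 3, 2, 2, 3, 3, 2, 3, 2, 1],
    "55":      [3, 3, 3, None, None, 2, None, None, None, None, None, 0],
}


def column_stats(values):
    """Summarize one obligation column, skipping N/A entries."""
    applicable = [v for v in values if v is not None]
    return {
        "mean": round(sum(applicable) / len(applicable), 2),
        "zeros": applicable.count(0),        # providers with no evidence
        "max_scores": applicable.count(3),   # providers scoring "exceeds"
        "scored": len(applicable),           # providers the column applies to
    }


STATS = {name: column_stats(values) for name, values in COLUMNS.items()}

if __name__ == "__main__":
    for name, s in STATS.items():
        print(
            f"{name:8s} mean={s['mean']:.2f} zeros={s['zeros']} "
            f"max={s['max_scores']} n={s['scored']}"
        )
```

Run against the table, this reproduces the 0.92 average for Article 50(2), the seven zeros on Article 51, the five zeros on copyright transparency, and the 2.17 average with five maximum scores for technical documentation.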

Why we publish this

ComplyEdge's product is a runtime compliance engine — OPA/Rego on the hot path, legal citation on every block. The benchmark is not the product. It is a measurement of the documentation gap the product operates in.

Deployers. Obligations cascade from the model provider to the deployer, and the deployer cannot inherit what was never published. The industry average sits 50.4 percentage points below a full score.
Acquirers. The average GPAI provider scores 49.6% on public documentation, and no provider exceeds 84%.
Regulators. Public data shows where industry stands before the AI Office finalizes evaluation protocols — Article 51 self-classification is the gap to expect post-August 2026.

Reproduce it

git clone https://github.com/ComplyEdge/complyedge
cd complyedge
python scripts/benchmark/benchmark_runner.py
python scripts/benchmark/leaderboard_renderer.py
cat scripts/benchmark/results/leaderboard.md

Repository: github.com/ComplyEdge/complyedge
Provider evidence: providers/
Benchmark code: scripts/benchmark/
Leaderboard JSON: benchmark_latest.json