Did the chair come back beige? In frame? Brand-safe?
batch_137.png · pink fabric on a beige product
color_fidelity: 0.12 · subject_match: 0.41
image-gen → eval → cdn
AI pipelines aren’t deterministic. The same flow that nailed your last batch will quietly fail when an input drifts off-distribution. Eval scores every output against your rubric - automatically - and flags the ones that shouldn’t ship.
AZ Design Std
Each square is one output, scored against a 5-dimension rubric. The reds didn’t fail because the model crashed - they failed because the chair came back wrong.
3 outputs failed silently - including batch_137.png (fabric mismatch, score 34%)
// would have shipped to the CDN without Eval. anton@ notified by email.
A profile is a rubric: name your dimensions, weight them, set a pass threshold. The judge does the rest.
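How a verdict falls out of a rubric, as a minimal sketch - the function name and the weighted-average combining rule are assumptions, not Eval’s internals; the weights mirror profile.yaml below:
# scoring_sketch.py - hypothetical illustration, not Eval's code
def verdict(dim_scores: dict[str, float], weights: dict[str, float],
            threshold: float) -> str:
    """Weighted average of per-dimension scores (0-1) vs. a 0-100 threshold."""
    total = 100 * sum(weights[d] * dim_scores[d] for d in weights) / sum(weights.values())
    return "PASS" if total >= threshold else "FAIL"

# batch_137.png's two judged dimensions from the hero example:
print(verdict({"subject_match": 0.41, "color_fidelity": 0.12},
              {"subject_match": 0.30, "color_fidelity": 0.15},
              threshold=70))  # -> "FAIL" (≈31 on these two dims alone)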
no_hallucinations down 2pt
github actions on push
new posts only
Point Eval at a URL or RSS feed. It re-scores on every change, every day, every push - and emails you the moment something dips below threshold. No cron jobs to wire up. No CI pipeline to maintain.
target                status    score
/products/chair-117   PASS      100%
/products/chair-118   PASS       95%
/products/chair-119   FAIL       34%
blog/feed.xml         SKIPPED      -
  └ 3 new posts       PASS       93%
anton@                SENT         -
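Under the hood, a watch is just a diff-and-score loop. A minimal sketch of that idea, assuming a "url" form field on /eval/run - only the endpoint and the "profile" field appear in the curl example below; everything else here is hypothetical:
# watcher_sketch.py - hypothetical, not the hosted watcher
import feedparser, requests   # pip install feedparser requests

seen: set[str] = set()

def rescore(feed_url: str, api_key: str) -> None:
    for entry in feedparser.parse(feed_url).entries:
        if entry.id in seen:                  # "new posts only"
            continue
        seen.add(entry.id)
        r = requests.post(
            "https://api.apiai.me/eval/run",
            headers={"Authorization": f"Bearer {api_key}"},
            data={"profile": "az_std", "url": entry.link},  # "url" field is assumed
        ).json()
        if r["verdict"] == "FAIL":
            print(f"below threshold: {entry.link} ({r['score']}%)")  # Eval emails you instead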
Pick dimensions, weights, and a pass threshold. Add good/bad criteria so the LLM judge has examples to anchor on.
# profile.yaml
name: "AZ Design Std"
threshold: 70            # pass mark, 0-100
dims:                    # two of the rubric's five dimensions shown
  - subject_match: .30   # weight
  - color_fidelity: .15
Attach to a pipeline as a step, watch a URL or RSS feed, or hit the eval endpoint directly from CI.
curl -X POST \
  https://api.apiai.me/eval/run \
  -H "Authorization: Bearer $KEY" \
  -F "profile=az_std"
Every output gets a verdict and a stored audit trail. Failures email the team. Pipelines auto-block on FAIL.
# response
verdict: "PASS"
score: 95
latency_ms: 2099
audit_url: ".../r/9d2"
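That response is what makes CI gating a one-liner. A sketch of a gate step, assuming the same endpoint and fields as above - the exit-code convention here is an illustration, not a documented behavior:
# ci_gate_sketch.py - hypothetical gate step
import os, sys, requests

r = requests.post(
    "https://api.apiai.me/eval/run",
    headers={"Authorization": f"Bearer {os.environ['KEY']}"},
    data={"profile": "az_std"},
).json()

print(f"{r['verdict']} ({r['score']}%) · {r['audit_url']}")
sys.exit(0 if r["verdict"] == "PASS" else 1)   # non-zero exit blocks the pipeline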
Same API key. Same dashboard. New gate on every output.