LIVE DEMO · TDA CART SHOPBOT

Same question. Different model. Different answer.

TDA Cart added a simple support chatbot. Swap the model behind it and the answer drifts — the refund policy goes vague, the JSON breaks, the bot guesses instead of asking. EvalDog runs the same checks on every model and catches the drift automatically.

See it running in a real store — TDA Cart

1 · Pick a question

Asserts: must state the exact 30-day window (contains, not-empty).
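For a sense of what the two assertion types named above could look like, here is a minimal sketch. The function names, the "30-day" needle, and the sample answers are hypothetical illustrations, not EvalDog's actual API or real model output.

```python
def not_empty(answer: str) -> bool:
    """Pass when the answer has any non-whitespace content."""
    return bool(answer.strip())


def contains(answer: str, needle: str) -> bool:
    """Pass when the needle appears in the answer, ignoring case."""
    return needle.lower() in answer.lower()


def grade(answer: str, needle: str = "30-day") -> dict:
    """Run both checks and report the fraction that passed (assumed scoring)."""
    checks = {
        "not-empty": not_empty(answer),
        "contains": contains(answer, needle),
    }
    score = sum(checks.values()) / len(checks)
    return {**checks, "score": score}


# Hypothetical answers from two different models to the same refund question.
print(grade("Refunds are accepted within a 30-day window.")["score"])  # 1.0
print(grade("We have a flexible refund policy.")["score"])             # 0.5
```

The vague second answer still passes not-empty, which is why the contains check on the exact window is what catches the drift.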

2 · Pick models (max 4)

3 · OpenRouter key (optional — for live calls)

Used only for this request — never stored. Get one at openrouter.ai/keys.

Pick a question and up to four models, then run. EvalDog grades every model’s answer.

What this proves

The exact same prompt and assertions, run across models, produce different scores. That gap is model drift, and it is the kind of change that slips through silently when a provider ships an update. EvalDog turns it into a number you can gate on.
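One way to read "a number you can gate on" is sketched below. The model names, scores, and the 1.0 threshold are made-up placeholders, and the helper functions are hypothetical rather than part of EvalDog.

```python
from typing import Dict, List


def drift_gap(scores: Dict[str, float]) -> float:
    """Spread between the best and worst score for the same test case."""
    return max(scores.values()) - min(scores.values())


def failing_models(scores: Dict[str, float], min_score: float = 1.0) -> List[str]:
    """Models whose score falls below the gate threshold, sorted by name."""
    return sorted(name for name, score in scores.items() if score < min_score)


# Placeholder scores for one question graded with the same assertions.
scores = {"model-a": 1.0, "model-b": 0.5, "model-c": 1.0}

print(drift_gap(scores))       # 0.5
print(failing_models(scores))  # ['model-b']

# In a CI job, a non-empty list could fail the build, for example:
# if failing_models(scores): raise SystemExit(1)
```

Gating on a per-model threshold rather than on the gap alone keeps the check meaningful even when every model regresses together.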

Grade your own cases