LIVE DEMO · TDA CART SHOPBOT
Same question. Different model. Different answer.
TDA Cart added a simple support chatbot. Swap the model behind it and the answer drifts — the refund policy goes vague, the JSON breaks, the bot guesses instead of asking. EvalDog runs the same checks on every model and catches the drift automatically.
1 · Pick a question
Asserts: Must state the exact 30-day window. (contains, not-empty)
2 · Pick models (max 4)
3 · OpenRouter key (optional — for live calls)
Used only for this request — never stored. Get one at openrouter.ai/keys.
Pick a question and one or more models, then run. EvalDog grades every model’s answer.
What this proves
The same prompt and the same assertions, run across different models, produce different scores. That gap is model drift, and it is what breaks silently when a provider ships an update. EvalDog turns it into a number you can gate on.
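To make the idea concrete, here is a minimal sketch of grading one question across several models with the two assertion types named above (contains, not-empty). It is not EvalDog's actual API: the names `Assertion`, `score`, `grade_all`, and `failing_models` are hypothetical, the model labels are placeholders, and the answers are invented for illustration rather than real model output.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Assertion:
    """A named check applied to one model answer (hypothetical type)."""
    name: str
    check: Callable[[str], bool]


def not_empty() -> Assertion:
    # Passes when the answer has any non-whitespace text.
    return Assertion("not-empty", lambda answer: bool(answer.strip()))


def contains(needle: str) -> Assertion:
    # Passes when the answer mentions the needle, ignoring case.
    return Assertion(
        f"contains:{needle}",
        lambda answer: needle.lower() in answer.lower(),
    )


def score(answer: str, assertions: list[Assertion]) -> float:
    # Fraction of assertions that pass, between 0.0 and 1.0.
    passed = sum(1 for a in assertions if a.check(answer))
    return passed / len(assertions)


def grade_all(answers: dict[str, str], assertions: list[Assertion]) -> dict[str, float]:
    # Apply the identical assertion set to every model's answer.
    return {model: score(text, assertions) for model, text in answers.items()}


def failing_models(scores: dict[str, float], threshold: float = 1.0) -> list[str]:
    # Models whose score falls below the gate threshold.
    return sorted(m for m, s in scores.items() if s < threshold)


# The refund-policy check: the answer must be non-empty and state the 30-day window.
ASSERTIONS = [not_empty(), contains("30-day")]

# Invented answers standing in for three models' replies to the same question.
ANSWERS = {
    "model-a": "You can return any item within our 30-day window for a full refund.",
    "model-b": "Refunds are usually available for a limited time after purchase.",
    "model-c": "",
}

scores = grade_all(ANSWERS, ASSERTIONS)
print(scores)                  # {'model-a': 1.0, 'model-b': 0.5, 'model-c': 0.0}
print(failing_models(scores))  # ['model-b', 'model-c']
```

In a CI job, a non-empty result from `failing_models` could fail the build, which is one way to read "a number you can gate on". The threshold of 1.0 is an assumption; a team might tolerate a lower pass rate on softer checks.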
Grade your own cases