Silent Model Updates
Models drift because they retrain. Each retrain shifts the worldview. PromptPit shows the shift.
Hosted AI models can change behind the same API name without notice. PromptPit runs fixed probe prompts against each hosted model every 6 hours and tracks tokenization, response length, latency, format adherence, and refusal posture. When we detect a statistically significant behavioral shift and an admin has verified and published it, it appears below.
These entries are behavioral shifts. They are strongly suggestive of a silent model update, but we never claim that as fact. The underlying statistics are shown for each entry so you can draw your own conclusion.
deepseek
- CRITICAL · DeepSeek: DeepSeek V3 · CN · Closed · deepseek-chat
Latency p50 shifted (z=-3.50)
5/17/2026, 10:36:41 PM
- Metric: latency_p50_ms
- Baseline: 1280.667 ms
- Current: 705.000 ms
- Δ: -45.0%
- z-score: -3.50
- CRITICAL · DeepSeek: DeepSeek V3 · CN · Closed · deepseek-chat
Response length shifted -21.4% (z=-3.42) vs prior 3-day window
5/17/2026, 10:36:41 PM
- Metric: length_mean
- Baseline: 108.944
- Current: 85.638
- Δ: -21.4%
- z-score: -3.42
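The Δ figures in the entries above can be checked by hand. The following is a minimal sketch that assumes Δ is the relative change of the current value against the baseline; the function name `relative_change_pct` is hypothetical and is not part of PromptPit. The z-scores cannot be reproduced this way because the baseline spread is not published on this page.

```python
def relative_change_pct(baseline: float, current: float) -> float:
    """Percent change of `current` relative to `baseline` (assumed definition of Δ)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (current - baseline) / baseline * 100.0


# Values copied from the two deepseek-chat entries above.
latency_delta = relative_change_pct(1280.667, 705.000)
length_delta = relative_change_pct(108.944, 85.638)

print(f"latency_p50_ms: {latency_delta:.1f}%")  # -45.0%
print(f"length_mean:    {length_delta:.1f}%")   # -21.4%
```

Both results match the published Δ values, which is consistent with the assumed definition.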
Methodology. Every 6 hours we run a fixed set of 20 probe prompts against each hosted model at temperature 0. Results are aggregated daily into a fingerprint: tokenizer hash, output length distribution, latency p50/p95, format adherence, refusal rate. The current day is compared against a rolling 7-day baseline. Alerts require either a tokenizer-hash change or ≥2 simultaneous metric shifts exceeding configured thresholds.
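The alert rule described above can be sketched as follows. This is an illustration under stated assumptions, not PromptPit's implementation: the `Fingerprint` class, the `should_alert` function, the use of a sample-standard-deviation z-score, and the threshold of 3.0 are all hypothetical, since the page only says thresholds are "configured".

```python
import math
from dataclasses import dataclass
from statistics import mean, stdev

Z_THRESHOLD = 3.0  # hypothetical; the real configured thresholds are not published


@dataclass
class Fingerprint:
    """One day's aggregate for one model (hypothetical structure)."""
    tokenizer_hash: str
    metrics: dict[str, float]  # e.g. {"latency_p50_ms": 705.0, "length_mean": 85.6}


def z_score(history: list[float], current: float) -> float:
    """Distance of `current` from the baseline mean, in baseline standard deviations."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation; needs at least 2 points
    if sigma == 0:
        # A flat baseline: any deviation at all is treated as infinitely surprising.
        return 0.0 if current == mu else math.copysign(math.inf, current - mu)
    return (current - mu) / sigma


def should_alert(baseline_days: list[Fingerprint], today: Fingerprint,
                 z_threshold: float = Z_THRESHOLD) -> tuple[bool, list[str]]:
    """Alert on a tokenizer-hash change or on >= 2 metrics shifting at once."""
    # Assumption: "change" means differing from the most recent baseline day.
    tokenizer_changed = today.tokenizer_hash != baseline_days[-1].tokenizer_hash
    shifted = [
        name for name, value in today.metrics.items()
        if abs(z_score([d.metrics[name] for d in baseline_days], value)) >= z_threshold
    ]
    return tokenizer_changed or len(shifted) >= 2, shifted


# Illustrative 7-day baseline (made-up numbers, not PromptPit data).
baseline = [
    Fingerprint("abc", {"latency_p50_ms": lat, "length_mean": length})
    for lat, length in zip([1200, 1250, 1300, 1280, 1320, 1260, 1290],
                           [108, 110, 109, 107, 111, 108, 110])
]
today = Fingerprint("abc", {"latency_p50_ms": 705.0, "length_mean": 85.6})
alert, shifted_metrics = should_alert(baseline, today)
print(alert, shifted_metrics)
```

Requiring two simultaneous shifts means a single noisy metric, such as a latency drop on its own, does not raise an alert, while a tokenizer-hash change is treated as sufficient evidence by itself.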
Provider responses are welcome. Contact security@vulnex.com.