bullshit-benchmark

petergpt

BullshitBench, created by Peter Gostev, measures whether AI models challenge nonsensical prompts instead of confidently answering them.

AI Summary

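BullshitBench is a benchmark for evaluating how well large language models recognize and refuse nonsensical prompts. Using 100 hand-built absurd questions spanning five domains (software, finance, law, medicine, and physics), it quantifies whether a model explicitly points out the flaw (Clear Pushback), partially questions it (Partial Challenge), or wrongly accepts it (Accepted Nonsense). It also supports analysis along multiple dimensions, including domain, release date, reasoning cost, and parameter count. It is suited to AI safety research, model robustness evaluation, and responsible AI development.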
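As a rough illustration of the per-domain breakdown described above, the following sketch tallies the three outcome labels by domain. The `summarize` function, the `(domain, label)` input format, and the sample data are all hypothetical and are not taken from the BullshitBench repository; the project's actual data layout and scoring code may differ.

```python
from collections import Counter, defaultdict

# The three outcome labels named in the summary above.
LABELS = ("Clear Pushback", "Partial Challenge", "Accepted Nonsense")


def summarize(results):
    """Return, per domain, the fraction of responses under each label.

    `results` is a hypothetical list of (domain, label) pairs; the real
    benchmark's data format may differ.
    """
    counts = defaultdict(Counter)
    for domain, label in results:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
        counts[domain][label] += 1
    return {
        domain: {label: c[label] / sum(c.values()) for label in LABELS}
        for domain, c in counts.items()
    }


# Made-up illustrative data, not real benchmark results.
sample = [
    ("software", "Clear Pushback"),
    ("software", "Accepted Nonsense"),
    ("finance", "Partial Challenge"),
    ("finance", "Clear Pushback"),
    ("finance", "Clear Pushback"),
    ("finance", "Clear Pushback"),
]
rates = summarize(sample)
print(rates["software"]["Clear Pushback"])  # 0.5
```

The same grouping could be keyed on release date or parameter count instead of domain, assuming those fields are available for each response.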

Python

MIT License


Stars: 1.8k


Star growth

Today: 0

Last 7 days: 0

Last 30 days: +19

Overall score: 55.44

Default branch: main
