[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80753":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":14,"stars7d":15,"stars30d":16,"stars90d":14,"forks30d":14,"starsTrendScore":14,"compositeScore":17,"rankGlobal":9,"rankLanguage":9,"license":18,"archived":19,"fork":19,"defaultBranch":20,"hasWiki":21,"hasPages":19,"topics":22,"createdAt":9,"pushedAt":9,"updatedAt":23,"readmeContent":24,"aiSummary":25,"trendingCount":14,"starSnapshotCount":14,"syncStatus":16,"lastSyncTime":26,"discoverSource":27},80753,"research-chatgpt-guesses-between-1-and-100","exmergo\u002Fresearch-chatgpt-guesses-between-1-and-100","exmergo","When asked to pick a random number between 1 and 100, ChatGPT does not follow a random uniform distribution",null,"Python",43,3,41,0,1,2,42.51,"MIT License",false,"main",true,[],"2026-06-12 04:01:29","# GPT Guesses Between 1 and 100\n\n\u003Cimg width=\"1372\" height=\"731\" alt=\"Exmergo Viz - I asked GPT to pick a random number between 1 and 100 (sample 10k)\" src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F0d51eb37-fdd9-4f8b-87c8-db44c551818e\" \u002F>\n\n\nAn interesting thing about humans is that they are not good random number generators.  \nIf you ask a person to \"pick a random number between 1 and 100\", they are\nremarkably predictable. Answers cluster on 37 and 73, on \"messy\" numbers, and\non memes like 42 and 69, while round numbers are quietly avoided. 
A true random\ngenerator would instead produce a flat, **uniform** distribution.\n\n**This project asks `gpt-4.1` the same question 10,000 times** and\ncharacterizes the distribution it produces, measured against a uniform baseline.\nDoes an LLM, which is trained on human text, behave like a fair die, or does it inherit\nthe lumpy human pattern?\n\nFull design and methodology: [`docs\u002FLLM Random Bias Experiment SDD.md`](docs\u002FLLM%20Random%20Bias%20Experiment%20SDD.md).\n\n## Inspiration\n\nThis experiment is an LLM-focused follow-up to two well-known explorations of *human* number-picking bias.\n\n- r\u002Fdataisbeautiful — [\"[OC] I asked 100 people to pick a number between 1 and 100\"](https:\u002F\u002Fwww.reddit.com\u002Fr\u002Fdataisbeautiful\u002Fcomments\u002Fiiafkd\u002Foc_i_asked_100_people_to_pick_a_number_between\u002F)\n- Veritasium — [Why is this number everywhere?](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=d6iQrh2TK98)\n\n## Methodology\n\nFull experimental design is in the\n[SDD](docs\u002FLLM%20Random%20Bias%20Experiment%20SDD.md); the essentials:\n\n- **Model.** `gpt-4.1` (OpenAI), called via the Responses API. It is a\n  *non-reasoning* model. It emits a direct answer rather than deliberating; what we're measuring is\n  its raw output distribution, not a reasoning strategy. The exact\n  model string is recorded in every raw-CSV row (`Model` column) and in\n  `data\u002Fraw\u002Frun_metadata.json`, so the dataset is self-describing.\n- **Sample size.** N = 10,000 independent calls — enough for a chi-square\n  goodness-of-fit test and per-number proportions stable to ~±0.5 pp.\n- **Sampling.** `temperature = 1.0`, so the model exercises its full sampling\n  distribution. This is the experiment: at low temperature it would just repeat\n  one number.\n- **Prompt.** A fixed system prompt instructs the model to output only one\n  integer between 1 and 100; the user prompt requests the number and carries a\n  unique `uuid4`. 
(The UUID is request-tracing hygiene, *not* cache-busting — at\n  temperature 1.0 every call should sample independently regardless.)\n- **Baseline.** The result is compared against a **uniform** distribution — what\n  a fair generator would produce — not against human data (see *Assumptions*).\n- **Pipeline.** Four stages — `collect → clean → transform → stats`, detailed\n  below. Cleaning validates every answer is an integer in [1, 100] and reports\n  the rejection rate.\n\n### Assumptions & Limitations\n\nThis is an illustrative probe, not a definitive study. Key caveats — see the\n[SDD's Limitations section](docs\u002FLLM%20Random%20Bias%20Experiment%20SDD.md) for\nthe formal treatment:\n\n- **Single model.** Results describe `gpt-4.1` only and do not generalize to\n  other models or providers.\n- **\"Randomness\" is a sampling artifact.** The model is not a random number\n  generator; it samples a learned token distribution. We characterize that\n  distribution — we do not claim the model is *trying* to be random.\n- **Prompt- and temperature-dependent.** A different prompt wording or sampling\n  temperature could shift the distribution. Both are fixed and documented.\n- **Not \"ChatGPT the product.\"** This tests a model through the API at a fixed\n  temperature — not the consumer ChatGPT app, which adds routing, tools, and a\n  system prompt outside our control.\n\n## Results\n\n**gpt-4.1 is emphatically not a uniform random generator.** A chi-square\ngoodness-of-fit test against a uniform distribution (N = 10,000, df = 99) returns\n**χ² = 15,604, p ≈ 0** — the deviation is so large it underflows any\nsignificance threshold. Asked for a random number, the model produces a lumpy,\ndistinctly human-shaped distribution.\n\n### It reproduced the classic human spikes\n\n| Number | Picked vs. 
uniform chance | Human reputation |\n| --- | --- | --- |\n| 37 | **4.0×** | \"the most random number\" |\n| 42 | **4.0×** | *Hitchhiker's Guide* meme |\n| 73 | **3.4×** | the other well-known spike |\n\nThe five most-picked numbers overall — `47, 57, 72, 37, 42` — lean heavily on\nnumbers ending in 7 (three of the five), the same \"number that feels random\" pull seen in\nhumans.\n\n### It avoids round numbers even harder than humans\n\n**All multiples of 10, except for 10 itself, were picked exactly 0 times in 10,000 calls**.\n10 was picked exactly once. Humans avoid round numbers — gpt-4.1 essentially refuses them.\n\n### The exception: 69\n\nOne number breaks the human pattern. 69 is a meme number humans *over*-pick.\ngpt-4.1 **under**-picks it (0.29× expected: ~29 occurrences against ~100). The\nmodel inherited the \"smart\" meme (42) and not the crude one. Our hypothesis is that \nthis is a product of safety guardrails during pre-training and post-training. \nIt is the most interesting aspect in the dataset: the model's\nbias is not a raw copy of human bias but a *moderated* version of it.\n\n### Takeaway\n\nThe hypothesis holds. An LLM trained on human text, asked to be random,\nreproduces human random-number bias: the pull toward 37 and 73, the meme spike\nat 42, the aversion to round numbers — with one guardrail-likely exception. The\ninteractive [distribution chart](https:\u002F\u002Fviz.exmergo.com\u002Fshare\u002Feea2a7b6-82d4-4333-8853-e909d9dabd49)\nshows the full 1–100 shape.\n\n*All figures from [`data\u002Fprocessed\u002Fstats_summary.csv`](data\u002Fprocessed\u002Fstats_summary.csv).*\n\n## The pipeline\n\n`collect → clean → transform → stats`. 
Each stage reads the previous stage's\ncommitted CSV, so any stage can be re-run on its own.\n\n| Stage | Module | Output |\n| --- | --- | --- |\n| Collect | `llm_random_bias.collect` | `data\u002Fraw\u002Fchatgpt_random_results.csv` |\n| Clean | `llm_random_bias.clean` | `data\u002Fprocessed\u002Fchatgpt_random_clean.csv` |\n| Transform | `llm_random_bias.transform` | `data\u002Fprocessed\u002Fdistribution.csv` |\n| Stats | `llm_random_bias.stats` | `data\u002Fprocessed\u002Fstats_summary.csv` |\n\n## Setup\n\nThis project uses [uv](https:\u002F\u002Fdocs.astral.sh\u002Fuv\u002F) for everything.\n\n```sh\nuv sync\n```\n\n## Path 1 — Analysis only (free, no API key)\n\nThe raw dataset is committed to this repo, so you can reproduce the entire\nanalysis without spending a cent:\n\n```sh\nuv run python -m llm_random_bias.clean\nuv run python -m llm_random_bias.transform\nuv run python -m llm_random_bias.stats\n```\n\n## Path 2 — Fresh data collection (needs an OpenAI API key)\n\n```sh\ncp .env.example .env          # then edit .env and add your OPENAI_API_KEY\nuv run python -m llm_random_bias.collect\n# then run clean \u002F transform \u002F stats as in Path 1\n```\n\n**Cost & runtime:** ~10,000 short calls to `gpt-4.1` cost roughly US$2 and\nfinish in a few minutes at the default concurrency. The collector refuses to\noverwrite an existing raw CSV — delete it first to re-collect.\n\n## Visualization\n\nThe distribution bar chart is built in **Exmergo Viz** (our AI dashboard agent) directly from\n`data\u002Fprocessed\u002Fdistribution.csv`. 
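The fully interactive data viz can be viewed [here](https:\u002F\u002Fviz.exmergo.com\u002Fshare\u002Feea2a7b6-82d4-4333-8853-e909d9dabd49).\n\n## Development\n\n```sh\nuv run ruff check .\nuv run ruff format .\nuv run mypy src\nuv run pytest\n```\n\nSee [`CONTRIBUTING.md`](CONTRIBUTING.md).\n\n## License\n\nMIT; see [`LICENSE`](LICENSE).\n","This project studies how ChatGPT's answers are distributed when it is asked to pick a random number between 1 and 100. By sending 10,000 requests to the gpt-4.1 model and analyzing the responses, it explores whether the language model produces a uniform distribution like a true random number generator or inherits the human tendency to favor certain numbers. It is implemented in Python and uses statistical methods such as the chi-square test to evaluate the consistency of the data. It is a useful reference for researchers and developers interested in AI behavior, especially in how large language models handle seemingly simple tasks.","2026-06-11 04:01:53","CREATED_QUERY"]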