[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-78245":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":8,"htmlUrl":8,"language":9,"languages":8,"totalLinesOfCode":8,"stars":10,"forks":11,"watchers":12,"openIssues":13,"contributorsCount":13,"subscribersCount":13,"size":13,"stars1d":14,"stars7d":15,"stars30d":16,"stars90d":13,"forks30d":13,"starsTrendScore":17,"compositeScore":18,"rankGlobal":8,"rankLanguage":8,"license":8,"archived":19,"fork":19,"defaultBranch":20,"hasWiki":21,"hasPages":21,"topics":22,"createdAt":8,"pushedAt":8,"updatedAt":23,"readmeContent":24,"aiSummary":25,"trendingCount":13,"starSnapshotCount":13,"syncStatus":26,"lastSyncTime":27,"discoverSource":28},78245,"tokenspeed","MikeVeerman\u002Ftokenspeed","MikeVeerman",null,"HTML",158,4,104,0,1,7,54,3,2.1,false,"master",true,[],"2026-06-12 02:03:46","# tokenspeed\n\n**How fast is 30 tokens per second, really?**\n\nEvery LLM benchmark, local or hosted, reports throughput: \"47 tok\u002Fs on an M3,\" \"180 tok\u002Fs on a 4090,\" \"500 tok\u002Fs on Groq.\" But unless you've actually watched tokens stream at those speeds, the numbers are hard to internalize.\n\n`tokenspeed` is a tiny terminal toy that streams fake tokens at any rate you set, so you can see what those numbers actually look like.\n\nThree modes:\n\n- **`code`** — syntax-highlighted pseudo-code (Python\u002FRust\u002FJS), the most common thing you watch stream out of an LLM\n- **`text`** — Wikipedia prose, for the chat\u002Fanswer case\n- **`think`** — dim-italic reasoning sentences alternating with code, mimicking a reasoning model thinking out loud\n\nIf you don't pick a mode on the command line, it'll ask.\n\n## Run it\n\n```bash\npython3 tokenspeed.py                  # 30 tok\u002Fs, prompts for mode\npython3 tokenspeed.py 60               # 60 tok\u002Fs, prompts for mode\npython3 tokenspeed.py --mode code      # skip the prompt\npython3 tokenspeed.py 120 --mode think # both at once\n```\n\nNo 
dependencies — just Python 3 and a real terminal.\n\n## What to try\n\nStart at the default `30` and read along. Then hit `1` (5 tok\u002Fs) — Raspberry-Pi-class local model. Then `5` (60 tok\u002Fs) — typical hosted Claude or GPT. Then `7` (200 tok\u002Fs) — Groq territory. Then `9` (800 tok\u002Fs) — Cerebras-class, where the bottleneck is your eyeballs.\n\nNow rerun at the same rate, first with `--mode code` and then with `--mode text`. The difference is striking and intentional — see below.\n\n## Controls\n\n| Key       | Action                                                              |\n| --------- | ------------------------------------------------------------------- |\n| `+` \u002F `-` | Raise \u002F lower the rate by a factor of 1.25                          |\n| `1`–`9`   | Jump to a preset: 5, 10, 20, 30, 60, 100, 200, 400, 800 tok\u002Fs       |\n| `space`   | Pause \u002F resume                                                      |\n| `q`       | Quit                                                                |\n\n## What counts as a token\n\n`tokenspeed` approximates BPE-style tokenization. It is **not** trying to exactly reproduce `tiktoken`, Claude's tokenizer, or any vendor-specific encoder, and those encoders disagree with each other in the details anyway.\n\nRoughly: short words are often one token; longer identifiers frequently split into multiple chunks (e.g., `processUserInput` → `process` + `User` + `Input`, `calculate_score` → `calculate` + `_score`); and punctuation and operators usually count as tokens too.\n\nThe point worth internalizing: code is more token-dense than prose (more tokens per visible character), so the same tok\u002Fs can feel very different depending on what's streaming. 30 tok\u002Fs of code puts far less visible content on screen each second than 30 tok\u002Fs of English. 
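The benchmark number is honest; the perceptual effect just varies a lot by content type, which is exactly the gap this tool exists to expose.\n\n(English prose averages ~1.3 tokens per word, so 30 tok\u002Fs ≈ 23 words\u002Fs.)\n","tokenspeed is a small terminal tool that simulates token generation at different rates, helping users get an intuitive feel for the throughput figures reported for local large language models (LLMs). Its core feature is streaming tokens at a user-specified speed in one of three modes (pseudo-code, prose text, or a reasoning trace), with keyboard shortcuts for adjusting the rate in real time. It needs no extra dependencies, only a Python 3 environment. It is aimed at researchers and developers who want a better sense of how LLMs perform on different hardware platforms, especially for applications with specific output-speed requirements, such as optimizing chatbot response times.",2,"2026-06-11 03:56:38","CREATED_QUERY"]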