[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-78161":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":13,"stars7d":15,"stars30d":16,"stars90d":14,"forks30d":14,"starsTrendScore":17,"compositeScore":18,"rankGlobal":9,"rankLanguage":9,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":22,"hasPages":20,"topics":23,"createdAt":9,"pushedAt":9,"updatedAt":24,"readmeContent":25,"aiSummary":26,"trendingCount":14,"starSnapshotCount":14,"syncStatus":27,"lastSyncTime":28,"discoverSource":29},78161,"TransitLM","HotTricker\u002FTransitLM","HotTricker","TransitLM: A Large-Scale Dataset and Benchmark for Map-Free Transit Route Generation",null,"Python",124,4,1,0,8,19,3,2.1,"Apache License 2.0",false,"main",true,[],"2026-06-12 02:03:46","# TransitLM Route Evaluation\n\nThis repository provides the companion evaluation code for the paper *TransitLM: A Large-Scale Dataset and Benchmark for Map-Free Transit Route Generation*, and offers a unified funnel-style pipeline for systematically evaluating route-planning model outputs.\n\nIt supports the following four representative settings:\n\n- Single-route planning\n- Preference-aware planning\n- Multi-route diversity\n- General-purpose LLM evaluation through a remote route-eval API\n\nRather than producing only a single aggregate score, the evaluator decomposes quality into several dimensions, including reachability, station grounding, structural consistency, and the plausibility of distance, time, and fare estimates.\n\n## Paper And Resources\n\n- Paper: [`TransitLM: A Large-Scale Dataset and Benchmark for Map-Free Transit Route Generation`](https:\u002F\u002Fhuggingface.co\u002Fpapers\u002F2605.22355)\n- Dataset: 
[`GD-ML\u002FTransitLM`](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FGD-ML\u002FTransitLM)\n- This repository: the evaluation code accompanying the TransitLM paper, intended for reproducing the evaluation pipeline and metrics used in the benchmark\n\n## Why This Repo\n\n- **Scenario coverage**: one codebase for benchmarks 1, 2, and 3, plus general LLM evaluation\n- **Layered evaluation**: reachability, grounding, overlap, and estimation accuracy are reported separately\n- **Lightweight setup**: Python standard library only\n- **Fast validation**: bundled example CSVs let you run every evaluator immediately\n\n## Quick Start\n\nRequirements:\n\n- Python 3.8+\n\nRun the built-in examples:\n\n```bash\npython3 single_route\u002Fevaluate.py --input_field generate_results\npython3 personalized\u002Fevaluate.py --input_field generate_results\npython3 diversity\u002Fevaluate.py --input_field generate_results\npython3 general_llm\u002Fevaluate.py\n```\n\nEach script's `--input` already defaults to the corresponding example CSV.\n\n## What Each Evaluator Covers\n\n| Scenario | Script | Extra Logic |\n|---|---|---|\n| Single-route planning | `single_route\u002Fevaluate.py` | 4-round core funnel |\n| Personalized planning | `personalized\u002Fevaluate.py` | Adds Round 5: Preference Compliance |\n| Route diversity | `diversity\u002Fevaluate.py` | Adds Best-Match and Route Diversity (RD) |\n| General LLM | `general_llm\u002Fevaluate.py` | Uses remote route-eval API |\n\n### Personalized Planning\n\nRound 5 verifies whether the prediction satisfies user preference constraints such as:\n\n- fewer transfers\n- no subway\n- subway first\n- shorter travel time\n\n### Diversity Planning\n\nAdditional metrics include:\n\n- **Best-Match**: if the `first` route is not an exact match, the evaluator searches `second` and `third` for a reachable exact alternative\n- **Route Diversity (RD)**: `mean(1 - IoU(L_i, L_j))` taken over pairs of predicted routes `i` and `j`, with transfer modes included in the route signature\n\n### General LLM 
Evaluation\n\n`general_llm\u002Fevaluate.py` additionally supports:\n\n- `--api_url`: defaults to the `TRANSIT_LM_API` environment variable and falls back to `http:\u002F\u002Ftransit-lm.amap.com` when it is unset\n- `--batch_size`: default `50`\n\nExample command:\n\n```bash\npython3 general_llm\u002Fevaluate.py \\\n  --api_url http:\u002F\u002Ftransit-lm.amap.com \\\n  --batch_size 50\n```\n\nIn this setting, the evaluated field is fixed to `generate_results`, and predicted station sequences should use station names rather than `stop_id` values.\n\n## Evaluation Pipeline\n\nAll scenarios share the following four core rounds:\n\n| Round | Check | What It Verifies |\n|---|---|---|\n| 1 | Reachability | Adjacent stations are connected in `next_stop_ids` |\n| 2 | Station Grounding & Distance Plausibility | Verifies start\u002Fend grounding and the plausibility of transfer distances |\n| 3 | Line IoU + Station IoU + Expert Score + Transfer Mode Consistency | Verifies structural consistency between prediction and reference |\n| 4 | Estimation Accuracy | Verifies the accuracy of distance, time, fare, and transfer estimates |\n\nRound 2 maximum transfer distances, by transfer mode:\n\n- walk: 3 km\n- bike: 5 km\n- taxi: 10 km\n\nRound 4 tolerances:\n\n- distance: `±10%` or `0.5 km`\n- time: `±10%` or `5 min`\n- fare: `±10%` or `1 CNY`\n- transfer distance: `±0.5 km`\n\nExpert score formula used in Round 3:\n\n```text\nS = T_sec \u002F 300 + (N_lines + cycling_segments) + fare\n```\n\nHere `T_sec` is the travel time in seconds, `N_lines` is the number of transit lines, `cycling_segments` is the number of cycling segments, and `fare` is the total fare.\n\n## Input Contract\n\n### CSV Fields\n\nCommon fields are as follows:\n\n| Field | Description |\n|---|---|\n| `index_id` | Sample identifier |\n| `sft_prompt` | Prompt JSON string, usually including query \u002F start \u002F end \u002F city |\n| `sft_label` | Ground-truth route JSON |\n| `generate_results` | Model output JSON |\n\nBenchmark 2 also requires:\n\n| Field | Description |\n|---|---|\n| `req_type` | Preference type, such as `2`, `5`, `7`, `8` |\n\n### Route JSON\n\nSingle-route tasks expect a JSON object of the following form:\n\n```json\n{\n  \"station_sequence\": 
[\"stop_id_1\", \"stop_id_2\", \"stop_id_3\"],\n  \"line_sequence\": [\"Line A\", \"Line B\"],\n  \"total_distance\": \"12.5\",\n  \"total_time\": \"35\",\n  \"total_fare\": \"4\",\n  \"start_transfer_mode\": \"步行\",\n  \"start_transfer_distance\": \"0.8\",\n  \"end_transfer_mode\": \"步行\",\n  \"end_transfer_distance\": \"0.6\"\n}\n```\n\nDiversity tasks expect a multi-route object of the following form:\n\n```json\n{\n  \"first\": {\n    \"station_sequence\": [\"stop_id_1\", \"stop_id_2\"],\n    \"line_sequence\": [\"Line A\"],\n    \"total_distance\": \"8.2\",\n    \"total_time\": \"24\",\n    \"total_fare\": \"3\"\n  },\n  \"second\": {\n    \"station_sequence\": [\"stop_id_3\", \"stop_id_4\"],\n    \"line_sequence\": [\"Line B\"],\n    \"total_distance\": \"9.0\",\n    \"total_time\": \"27\",\n    \"total_fare\": \"4\"\n  },\n  \"third\": {\n    \"station_sequence\": [\"stop_id_5\", \"stop_id_6\"],\n    \"line_sequence\": [\"Line C\"],\n    \"total_distance\": \"10.5\",\n    \"total_time\": \"30\",\n    \"total_fare\": \"4\"\n  }\n}\n```\n\nFor the general LLM setting:\n\n- labels still use `station_sequence`\n- prediction-side station sequences should contain station names\n- both `station_sequence` and `station_name` are accepted\n- the remote evaluator primarily relies on boarding \u002F alighting stations and normalized transfer structure\n\n## Key Output Signals\n\nThe scripts print round-by-round summaries to stdout. 
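In practice, the following signals are usually the most informative:\n\n- reachability pass rate\n- number of samples with `station_iou == 1`\n- overall accuracy\n- preference compliance for benchmark 2\n- best-match hits and route diversity for benchmark 3\n\n## Repository Layout\n\n```text\n.\n├── common.py\n├── single_route\u002Fevaluate.py\n├── personalized\u002Fevaluate.py\n├── diversity\u002Fevaluate.py\n├── general_llm\u002Fevaluate.py\n└── data\u002F\n    ├── station_info.csv\n    ├── benchmark1_single_route_example.csv\n    ├── benchmark2_personalized_example.csv\n    ├── benchmark3_diversity_example.csv\n    └── general_llm_example.csv\n```\n\n## Notes\n\n- `common.py` automatically converts `coord_x` \u002F `coord_y` into internal longitude \u002F latitude values\n- `next_hop_stations` is normalized into `next_stop_ids`\n- if a `station_name` column exists in `station_info.csv`, station names are loaded automatically\n","TransitLM is a large-scale dataset and benchmark project for map-free transit route generation. It provides a unified, funnel-style evaluation pipeline for systematically evaluating route-planning model outputs, and supports several scenarios: single-route planning, preference-aware planning, multi-route diversity, and general-purpose large language model evaluation through a remote API. Its core feature is a multi-dimensional decomposition of quality, covering reachability, station accuracy, structural consistency, and the plausibility of distance, time, and fare estimates. It is lightweight (it depends only on the Python standard library), supports fast validation (the bundled example CSV files run out of the box), and uses a layered evaluation scheme. It suits applications that need to generate or evaluate transit routes without map data, such as intelligent transportation system development and urban mobility service optimization.",2,"2026-06-11 03:56:30","CREATED_QUERY"]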