[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-76202":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":15,"stars7d":15,"stars30d":16,"stars90d":15,"forks30d":15,"starsTrendScore":15,"compositeScore":17,"rankGlobal":9,"rankLanguage":9,"license":9,"archived":18,"fork":18,"defaultBranch":19,"hasWiki":20,"hasPages":18,"topics":21,"createdAt":9,"pushedAt":9,"updatedAt":22,"readmeContent":23,"aiSummary":24,"trendingCount":15,"starSnapshotCount":15,"syncStatus":14,"lastSyncTime":25,"discoverSource":26},76202,"SheetasToken","xiaoqi-7\u002FSheetasToken","xiaoqi-7","Implementation and resources for Sheet as Token, a graph-enhanced framework for multi-sheet spreadsheet understanding and retrieval.",null,"Python",612,109,65,2,0,528,10.12,false,"main",true,[],"2026-06-12 02:03:40","# SheetAgent Paper Repository\n\nThis repository contains the code for our two-stage spreadsheet retrieval pipeline, including the Stage 1 sheet encoder, the Stage 2 graph retriever, and the experiment scripts used in the paper.\n\n## Overview\n\nOur system separates spreadsheet understanding into two stages:\n\n- **Stage 1: Sheet Token Encoder**\n  - Learns reusable sheet-level representations from pairwise sheet supervision.\n  - Supports two main variants:\n    - `with_example`: sheet serialization includes column examples\n    - `wo_example`: sheet serialization excludes column examples\n\n- **Stage 2: Graph Retriever**\n  - Performs query-conditioned cross-sheet retrieval over a candidate workspace.\n  - Supports two main variants:\n    - `baseline`: shallower graph retriever\n    - `enhanced`: graph-enhanced retriever with stronger relational composition\n\nThe final paper model uses:\n\n- **Stage 1 with examples**\n- **Stage 2 enhanced**\n- **frozen Stage 1 encoder during 
Stage 2 training**\n\n---\n\n## Repository Structure\n\n```text\n.\n├── api\u002F                                  # Optional API serving code\n├── configs\u002F                              # Configuration files\n├── data\u002F                                 # Training \u002F evaluation data\n├── docs\u002F                                 # Notes or documentation\n├── models\u002F\n│   ├── stage1\u002F\n│   │   ├── biencoder_model.py            # Legacy Stage 1 baseline (reference only)\n│   │   ├── biencoder_model_with_example.py\n│   │   └── biencoder_model_wo_example.py\n│   └── stage2\u002F\n│       ├── stage2_gtn_baseline.py\n│       └── stage2_gtn_v2.py\n├── scripts\u002F\n│   ├── stage1\u002F\n│   │   ├── train_with_example.sh\n│   │   └── train_wo_example.sh\n│   └── stage2\u002F\n│       ├── train_baseline_freeze.sh\n│       └── train_enhanced_freeze.sh\n├── utils\u002F                                # Utility functions\n├── requirements.txt\n└── README.md\n```\n\n---\n\n## Main Files\n\n### Stage 1\n- `models\u002Fstage1\u002Fbiencoder_model_with_example.py`  \n  Stage 1 encoder using example-enhanced sheet serialization.\n\n- `models\u002Fstage1\u002Fbiencoder_model_wo_example.py`  \n  Stage 1 encoder without column examples.\n\n- `models\u002Fstage1\u002Fbiencoder_model.py`  \n  Legacy \u002F early Stage 1 baseline, kept for reference only.  
\n  Current paper experiments use the two variants above.\n\n### Stage 2\n- `models\u002Fstage2\u002Fstage2_gtn_baseline.py`  \n  Shallow graph retriever used as the architecture ablation \u002F shadow model.\n\n- `models\u002Fstage2\u002Fstage2_gtn_v2.py`  \n  Enhanced graph retriever used as the full model.\n\n---\n\n## Data Format\n\nThe code expects the dataset under `data\u002F`.\n\nTypical files include:\n\n- `data\u002Fsheets.json`  \n  Sheet metadata and serialized sheet content.\n\n- `data\u002Ftrain.json`  \n  Pairwise Stage 1 supervision data.\n\n- `data\u002Fquery.json`  \n  Query-conditioned Stage 2 retrieval data.\n\nAdjust paths if your local setup differs.\n\n---\n\n## Environment Setup\n\nInstall dependencies first:\n\n```bash\npip install -r requirements.txt\n```\n\nThe scripts default to the Hugging Face model name `bert-base-uncased`.\n\nIf you want to use a local pretrained model snapshot, you can override `MODEL_NAME` when running a script.\n\nExample:\n\n```bash\nMODEL_NAME=\u002Fpath\u002Fto\u002Flocal\u002Fmodel bash scripts\u002Fstage2\u002Ftrain_enhanced_freeze.sh\n```\n\n---\n\n## Training Scripts\n\n### Stage 1\n\nTrain Stage 1 with example-enhanced serialization:\n\n```bash\nbash scripts\u002Fstage1\u002Ftrain_with_example.sh\n```\n\nTrain Stage 1 without column examples:\n\n```bash\nbash scripts\u002Fstage1\u002Ftrain_wo_example.sh\n```\n\n### Stage 2\n\nTrain the Stage 2 baseline retriever with frozen Stage 1:\n\n```bash\nbash scripts\u002Fstage2\u002Ftrain_baseline_freeze.sh\n```\n\nTrain the Stage 2 enhanced retriever with frozen Stage 1:\n\n```bash\nbash scripts\u002Fstage2\u002Ftrain_enhanced_freeze.sh\n```\n\n---\n\n## Optional Script Overrides\n\nThe shell scripts support environment-variable overrides.\n\nCommon overrides include:\n\n- `MODEL_NAME`\n- `DATA_DIR`\n- `STAGE1_CKPT`\n- `OUTPUT_DIR`\n- `TB_DIR`\n- `BEST_MODEL_DIR`\n- `FINAL_MODEL_DIR`\n\nExample:\n\n```bash\nMODEL_NAME=\u002Fpath\u002Fto\u002Flocal\u002Fmodel 
\\\nSTAGE1_CKPT=best_model_with_example\u002Fclassifier.pt \\\nbash scripts\u002Fstage2\u002Ftrain_enhanced_freeze.sh\n```\n\nThis makes the scripts usable on both local machines and remote servers without hardcoding machine-specific paths.\n\n---\n\n## Paper Experiment Mapping\n\n### Full Model\n- Stage 1: `with_example`\n- Stage 2: `enhanced`\n- Stage 1 encoder frozen during Stage 2 training\n\n### Architecture Ablation\n- Stage 1: `with_example`\n- Stage 2: `baseline`\n- Stage 1 encoder frozen during Stage 2 training\n\n### Feature Ablation\n- Stage 1: `wo_example`\n- Stage 2: `enhanced`\n- Stage 1 encoder frozen during Stage 2 training\n\n---\n\n## Outputs\n\nTraining scripts typically write outputs to:\n\n- `runs\u002F...` for TensorBoard logs\n- `outputs\u002F...` for experiment outputs\n- `best_model_*` \u002F `final_model_*` for Stage 1 checkpoints\n\nThese training artifacts are local experiment outputs and should generally not be committed to Git.\n\n---\n\n## Recommended Git Ignore\n\nA typical `.gitignore` should include at least:\n\n```gitignore\nbest_model\u002F\nbest_model_with_example\u002F\nbest_model_wo_example\u002F\nfinal_model\u002F\nfinal_model_with_example\u002F\nfinal_model_wo_example\u002F\noutputs\u002F\nruns\u002F\n*.log\n__pycache__\u002F\n```\n\nYou can expand this as needed for your environment.\n\n\n## Citation\n\nIf you use this repository, please cite the associated paper:\n\n```\n@misc{lei2026sheet,\n  title={Sheet as Token: A Graph-Enhanced Representation for Multi-Sheet Spreadsheet Understanding},\n  author={Lei, Yiming and Wang, Yiqi and Zhang, Yujia and Guan, Bo and Zhu, Depei and Wang, Chunhui and Hao, Zhuonan and Shi, Tianyu},\n  year={2026},\n  eprint={2605.05811},\n  archivePrefix={arXiv},\n  primaryClass={cs.AI},\n  url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.05811}\n}\n```\n\n\n\n\n\n## Contact\n\nIf you have any questions about this repository or the project, please contact:\n\n- Yiqi Wang: 
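yiqi.wang.jennie@gmail.com\n- Zhuonan Hao: znhao@g.ucla.edu\n- Tianyu Shi: tianyu.shi3@mcgill.ca\n","SheetasToken is a graph-enhanced framework for multi-sheet spreadsheet understanding and retrieval. Its core functionality is split into two stages. Stage 1 is a sheet encoder that learns reusable sheet-level representations from pairwise sheet supervision and supports two variants, with or without column examples. Stage 2 is a graph retriever that performs query-conditioned cross-sheet retrieval over a candidate workspace and supports a baseline mode and an enhanced mode. The project is written in Python, and the final model combines the Stage 1 encoder with examples and the enhanced Stage 2 retriever. The tool suits scenarios that require efficiently extracting information from complex multi-sheet data, such as enterprise data analysis and organizing research materials.","2026-06-11 03:54:46","CREATED_QUERY"]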