[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80789":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":14,"stars7d":14,"stars30d":15,"stars90d":14,"forks30d":14,"starsTrendScore":14,"compositeScore":16,"rankGlobal":9,"rankLanguage":9,"license":17,"archived":18,"fork":18,"defaultBranch":19,"hasWiki":20,"hasPages":18,"topics":21,"createdAt":9,"pushedAt":9,"updatedAt":22,"readmeContent":23,"aiSummary":24,"trendingCount":14,"starSnapshotCount":14,"syncStatus":25,"lastSyncTime":26,"discoverSource":27},80789,"UniRank","salmon1802\u002FUniRank","salmon1802","UniRank: A Ranking Model Benchmark for Unified Sequential Modeling and Feature Interaction",null,"Python",42,11,39,0,3,43.54,"Apache License 2.0",false,"main",true,[],"2026-06-12 04:01:30","# UniRank \u003Csub>v0.1.0, work in progress\u003C\u002Fsub>\n\n**A Ranking Model Benchmark for Unified Sequential Modeling and Feature Interaction**\n\nUniRank is an open PyTorch benchmark for large-scale recommendation ranking models. It focuses on a practical setting that is increasingly common in industrial recommender systems: ranking models must jointly learn from heterogeneous non-sequential features, target item features, and long user behavior sequences under multi-feedback objectives such as click, follow, like, share, comment, long-view, and conversion.\n\nThe project is built to make modern unified ranking architectures easier to compare, reproduce, and extend. 
It provides standardized dataset configurations, model implementations, distributed training utilities, mixed precision support, blocked data loading for large datasets, and sparse attention acceleration for long-sequence models.\n\n## Why UniRank?\n\nModern ranking research is moving from isolated feature interaction or sequence pooling modules toward unified architectures that model feature fields and user behavior tokens together. However, many strong ranking models are released from industrial systems where data, implementations, and infrastructure are not fully available. This makes it difficult to answer basic research questions:\n\n- Which architecture works best under the same data split, sequence length, and metric protocol?\n- How should feature interaction and sequential modeling be combined?\n- How do models behave across different feedback tasks rather than only CTR?\n- What engineering support is needed to train ranking models on industrial-scale data?\n\nUniRank addresses these gaps by collecting representative ranking models, unified data processing logic, and reproducible experiment settings in one benchmark.\n\n## Architecture Design\n\nUniRank follows a unified ranking pipeline. Raw user, item, context, and action features are embedded, converted into model-specific tokens, passed through feature interaction or sequence interaction layers, and finally predicted by task-specific towers.\n\n\u003Cp align=\"center\">\n  \u003Cimg width=\"900\" alt=\"Traditional New Impression Only Paradigm\" src=\".\u002Fassets\u002Ffigures\u002Fnew_impression_only_paradigm.png\">\n\u003C\u002Fp>\n\n**Figure 1. Traditional New Impression Only Paradigm.** Most conventional ranking systems train on the latest impressed target item only. 
Historical positive feedback is used as auxiliary behavior context, usually through target attention, pooling, or aggregation, before being combined with the target item, user profile, and context features in a feature interaction layer. This paradigm is efficient, but it treats each target impression as an independent sample and does not fully exploit the step-by-step evolution of user behavior.\n\n\u003Cp align=\"center\">\n  \u003Cimg width=\"1200\" alt=\"UniRank Auto-Regressive Paradigm\" src=\".\u002Fassets\u002Ffigures\u002Fauto_regressive_paradigm.png\">\n\u003C\u002Fp>\n\n**Figure 2. UniRank Auto-Regressive Paradigm.** UniRank reorganizes user histories as sequential training samples. Each behavior step can be represented with action-aware sequential tokens, target item, and non-sequential feature tokens. Instead of predicting only the latest impression, the model learns from the chronological behavior sequence and supports multi-task prediction at different positions. This design better matches long user histories and enables unified sequence-feature interaction.\n\nFollowing the paper, UniRank organizes representative unified ranking models into two architectural paradigms:\n\n| Paradigm | Description | Representative Models |\n|:--|:--|:--|\n| Unified Interaction after Sequence Pooling and Non-sequence Tokenization | Behavior sequences are first pooled or aggregated into compact sequential representations. These representations are then tokenized together with non-sequential features into a **unified token** space for subsequent interaction modeling. | HiFormer, RankMixer, Zenith, UniMixer, HeMix |\n| Layer-wise Unified Interaction | Sequence tokens and non-sequence tokens are both kept inside the interaction layers, allowing behavior tokens, field tokens, and target tokens to exchange information throughout the **unified interaction network**. 
| OneTrans, HyFormer, MixFormer, INFNet, EST, SORT, TokenFormer, LONGER, UltraHSTU |\n\nDesign choices in this repository are intentionally practical:\n\n- **Multi-feedback ranking**: each dataset can define multiple binary feedback tasks and evaluate AUC\u002FgAUC per task.\n- **Auto-regressive \u002F user-centric training support**: long behavior histories can be represented as structured action sequences rather than only a latest-impression sample.\n- **Distributed training**: `torchrun` + DDP are supported through `run_expid.py`.\n- **Large data loading**: blocked parquet loading is supported for large datasets such as TencentGR-10M.\n- **Mixed precision and operator acceleration**: bf16 training and sparse\u002Fflex attention paths are available for compatible models.\n\n## Repository Structure\n\n```text\nUniRank\u002F\n+-- config\u002F\n|   +-- dataset_config.yaml      # Dataset paths, feature schemas, labels, and blocked-loading options\n|   +-- model_config.yaml        # Experiment ids and hyperparameters\n+-- data\u002F\n|   +-- QK_Video_Action\u002F\n|   +-- KuaiRand_Video_Action\u002F\n|   +-- TencentGR_10M_Action_Blocked\u002F\n+-- fuxictr\u002F                     # Training, feature, metric, and layer utilities based on FuxiCTR\n+-- model_zoo\u002F                   # Ranking model implementations\n+-- checkpoints\u002F                 # Saved models and experiment logs\n+-- test\u002F                        # Metric and utility tests\n+-- UniRank_Dataloader.py        # UniRank-specific sequence\u002Faction dataloader\n+-- run_expid.py                 # Run one experiment\n+-- run_all.sh                   # Run a list of experiments\n+-- run_param_tuner.py           # Hyperparameter tuning entry\n+-- autotuner.py                 # Tuning utilities\n+-- requirements.txt\n+-- README.md\n```\n\n## Datasets\n\n### Raw Datasets\n\n- [QK-Video](https:\u002F\u002Fstatic.qblv.qq.com\u002Fqblv\u002Fh5\u002Falgo-frontend\u002Ftenrec_dataset.html)\n- 
[KuaiRand](https:\u002F\u002Fkuairand.com\u002F)\n- [TAAC2025 \u002F TencentGR-10M](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTAAC2025\u002FTencentGR-10M)\n\n### Preprocessed Datasets\n\n- [QK_Video_Action](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fsalmon1802\u002FQK_Video_Action)\n- [KuaiRand_Video_Action](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fsalmon1802\u002FKuaiRand_Video_Action)\n- [TencentGR_10M_Action](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fsalmon1802\u002FTencentGR_10M_Action_Blocked)\n\nPlace the downloaded preprocessed datasets under `.\u002Fdata\u002F` using the same directory names as the dataset ids in `config\u002Fdataset_config.yaml`.\n\n## Models\n\n| No. | Model | Publication |\n|:--:|:--|:--|\n| 1 | [DIN](.\u002Fmodel_zoo\u002FDIN.py) | [Deep Interest Network for Click-Through Rate Prediction](https:\u002F\u002Farxiv.org\u002Fabs\u002F1706.06978) |\n| 2 | [HiFormer](.\u002Fmodel_zoo\u002FHiFormer.py) | [Hiformer: Heterogeneous Feature Interactions Learning with Transformers for Recommender Systems](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2311.05884) |\n| 3 | [RankMixer](.\u002Fmodel_zoo\u002FRankMixer.py) | [RankMixer: Scaling Up Ranking Models in Industrial Recommenders](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.15551) |\n| 4 | [Zenith](.\u002Fmodel_zoo\u002FZenith.py) | [Zenith: Scaling up Ranking Models for Billion-scale Livestreaming Recommendation](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2601.21285) |\n| 5 | [UniMixer](.\u002Fmodel_zoo\u002FUniMixer.py) | [UniMixer: A Unified Architecture for Scaling Laws in Recommendation Systems](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2604.00590) |\n| 6 | [HeMix](.\u002Fmodel_zoo\u002FHeMix.py) | [Query-Mixed Interest Extraction and Heterogeneous Interaction: A Scalable CTR Model for Industrial Recommender Systems](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2602.09387) |\n| 7 | [LONGER](.\u002Fmodel_zoo\u002FLONGER.py) | [LONGER: Scaling Up 
Long Sequence Modeling in Industrial Recommenders](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.04421) |\n| 8 | [OneTrans](.\u002Fmodel_zoo\u002FOneTrans.py) | [OneTrans: Unified Feature Interaction and Sequence Modeling with One Transformer in Industrial Recommender](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.26104) |\n| 9 | [HyFormer](.\u002Fmodel_zoo\u002FHyFormer.py) | [HyFormer: Revisiting the Roles of Sequence Modeling and Feature Interaction in CTR Prediction](https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.12681) |\n| 10 | [MixFormer](.\u002Fmodel_zoo\u002FMixFormer.py) | [MixFormer: Co-Scaling Up Dense and Sequence in Industrial Recommenders](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.14110) |\n| 11 | [INFNet](.\u002Fmodel_zoo\u002FINFNet.py) | [INFNet: A Task-aware Information Flow Network for Large-Scale Recommendation Systems](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2508.11565v1) |\n| 12 | [EST](.\u002Fmodel_zoo\u002FEST.py) | [EST: Towards Efficient Scaling Laws in Click-Through Rate Prediction via Unified Modeling](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2602.10811) |\n| 13 | [SORT](.\u002Fmodel_zoo\u002FSORT.py) | [SORT: A Systematically Optimized Ranking Transformer for Industrial-scale Recommenders](https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.03988) |\n| 14 | [TokenFormer](.\u002Fmodel_zoo\u002FTokenFormer.py) | [TokenFormer: Unify the Multi-Field and Sequential Recommendation Worlds](https:\u002F\u002Farxiv.org\u002Fabs\u002F2604.13737) |\n| 15 | [UltraHSTU](.\u002Fmodel_zoo\u002FUltraHSTU.py) | [Bending the Scaling Law Curve in Large-Scale Recommendation Systems](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2602.16986) |\n\nAdditional experimental or auxiliary implementations may also appear in `model_zoo\u002F`.\n\n## Benchmark\n\nThe table below reports the preliminary benchmarking results under a fixed sequence length of 100. For a fair comparison, all models are configured with three layers. 
The token dimension is set to 128 for QK-Video and 256 for KuaiRand and TAAC-25.\n\n\u003Cp align=\"center\">\n  \u003Cimg width=\"1548\" alt=\"Preliminary UniRank benchmark results\" src=\".\u002Fassets\u002Ffigures\u002Fpreliminary_benchmark_results.png\">\n\u003C\u002Fp>\n\n**Figure 3. Preliminary Benchmark Results.** The benchmark evaluates 15 ranking models on QK-Video, KuaiRand, and TAAC-25 under AUC and gAUC. Results are reported for multiple feedback tasks, including click, follow, like, share, comment, long view, and conversion. Bold values indicate top-performing results for each task-metric pair.\n\n## Installation\n\n```bash\nconda create -n UniRank python=3.9\nconda activate UniRank\n\npip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu118\npip install -r requirements.txt\n```\n\n## How to Use\n\n### 1. Download datasets\n\nDownload the preprocessed datasets from Hugging Face and place them under `.\u002Fdata\u002F`:\n\n```text\ndata\u002F\n+-- QK_Video_Action\u002F\n+-- KuaiRand_Video_Action\u002F\n+-- TencentGR_10M_Action_Blocked\u002F\n```\n\nCheck `config\u002Fdataset_config.yaml` if you want to change paths, feature schemas, labels, or blocked-loading settings.\n\n### 2. Run one experiment\n\nSingle GPU:\n\n```bash\npython run_expid.py --config .\u002Fconfig --expid DIN_KuaiRand_Video_Action --gpu 0\n```\n\nMulti-GPU DDP:\n\n```bash\ntorchrun --standalone --nproc_per_node=2 run_expid.py \\\n  --config .\u002Fconfig \\\n  --expid DIN_KuaiRand_Video_Action \\\n  --gpu 0,1\n```\n\nExperiment ids are defined in `config\u002Fmodel_config.yaml` and usually follow:\n\n```text\n\u003CModel>_\u003CDataset>\n```\n\nExamples:\n\n```text\nUltraHSTU_QK_Video_Action\nTokenFormer_KuaiRand_Video_Action\nLONGER_TencentGR_10M_Action\n```\n\n### 3. 
Run a batch of experiments\n\nEdit `run_all.sh` to uncomment the experiments you want, then run:\n\n```bash\nchmod +x run_all.sh\n.\u002Frun_all.sh\n```\n\nLogs and checkpoints are written to `.\u002Fcheckpoints\u002F` and `.\u002Flogs\u002F` when enabled by the running script\u002Fconfiguration.\n\n### 4. Add a new model\n\n1. Add the model implementation to `model_zoo\u002FYourModel.py`.\n2. Export it in `model_zoo\u002F__init__.py`.\n3. Add an experiment block to `config\u002Fmodel_config.yaml`.\n4. Reuse `UniRank_Dataloader.py` unless the model needs a custom input format.\n5. Run `python run_expid.py --config .\u002Fconfig --expid YourModel_Dataset --gpu 0`.\n\n## Configuration Notes\n\n- `dataset_config.yaml` defines feature columns, label columns, parquet paths, sequence length metadata, and blocked data loading.\n- `model_config.yaml` defines model hyperparameters, batch size, optimizer, task list, metrics, monitor rule, and sequence length.\n- `run_expid.py` initializes feature encoders, builds dataloaders, sets up DDP, constructs the model from `model_zoo`, trains, validates, and optionally evaluates on the test split.\n- `UniRank_Dataloader.py` handles action-aware sequence construction and large blocked parquet loading.\n\n## Acknowledgement\n\nUniRank is built on top of, and deeply inspired by, the excellent [FuxiCTR](https:\u002F\u002Fgithub.com\u002Freczoo\u002FFuxiCTR) project. We sincerely thank the FuxiCTR authors and contributors for their open-source work on reproducible CTR and ranking model research.\n","UniRank is a PyTorch-based benchmark project for ranking models in large-scale recommender systems. It aims to handle heterogeneous non-sequential features, target item features, and long user behavior sequences in a unified way, and supports multi-feedback objectives such as click, follow, and like. By providing standardized dataset configurations, model implementations, distributed training tools, mixed precision support, blocked loading for large datasets, and sparse attention acceleration for long-sequence models, the project simplifies comparing, reproducing, and extending modern unified ranking architectures. UniRank is particularly suited to industrial recommender-system scenarios, which require jointly considering different types of feature interaction and sequence modeling while handling diverse user feedback tasks. It also helps researchers answer questions such as which architecture performs best under the same data split, sequence length, and metric protocol.",2,"2026-06-11 04:02:20","CREATED_QUERY"]