[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80743":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":11,"openIssues":12,"contributorsCount":12,"subscribersCount":12,"size":12,"stars1d":12,"stars7d":12,"stars30d":12,"stars90d":12,"forks30d":12,"starsTrendScore":12,"compositeScore":12,"rankGlobal":9,"rankLanguage":9,"license":13,"archived":14,"fork":14,"defaultBranch":15,"hasWiki":14,"hasPages":14,"topics":16,"createdAt":9,"pushedAt":9,"updatedAt":17,"readmeContent":18,"aiSummary":19,"trendingCount":12,"starSnapshotCount":12,"syncStatus":20,"lastSyncTime":21,"discoverSource":22},80743,"CLIF-Co-Orchestrating-LLM-Inference-Serving-and-Fine-tuning.","hsy23\u002FCLIF-Co-Orchestrating-LLM-Inference-Serving-and-Fine-tuning.","hsy23","CLIF: Continuous Learning and Inference Framework for PEFT serving",null,"Python",41,0,"MIT License",false,"main",[],"2026-06-12 02:04:06","# CLIF\n\nCLIF is a system for running PEFT fine-tuning alongside online LLM inference. It keeps inference replicas serving requests, detects when there is enough serving slack, and uses that slack to run federated adapter updates.\n\n## What CLIF Provides\n\n- A stateful replica pool with `SERVING`, `IDLE`, and `COMBINED` modes.\n- A proactive dispatcher that routes requests and maintains inference batch bounds.\n- A fine-tuning launcher that checks service pressure before admitting replicas into PEFT rounds.\n- A coordinator that sets training and inference batch sizes for replicas running in `COMBINED`.\n- A dual-adapter replica path that trains a shadow adapter while serving from the active adapter.\n- Structured metrics for requests, serving batches, training rounds, replica states, and GPU usage.\n\n## System Overview\n\n![CLIF system overview](docs\u002Fimages\u002Fclif_overview.png)\n\nAt runtime, requests enter the dispatcher and are assigned to replicas. 
The launcher observes replica states and service pressure, then decides whether a new PEFT round can run. Replicas admitted to `COMBINED` continue serving while local adapter updates are executed.\n\n![CLIF inference-training coordinator](docs\u002Fimages\u002Fit_coordinator.png)\n\nThe coordinator is used only for replicas in `COMBINED`. It uses recent runtime metrics to keep inference batch sizes within safe bounds while allocating a training batch size for local PEFT updates.\n\n## Repository Layout\n\n- `run.py`: CLI entry point.\n- `main.py`: runtime assembly and metric export.\n- `core\u002Fdispatcher.py`: CLIF request dispatcher.\n- `core\u002Ffl_launcher.py`: PEFT round admission, execution, and aggregation.\n- `core\u002Fcoordinator.py`: combined-state batch-size coordinator.\n- `core\u002Freplica.py`: replica and dual-adapter execution logic.\n- `common\u002F`: data loading, training utilities, state tracking, request generation, and monitoring.\n- `scripts\u002Frun_smoke_fixed.sh`: small fixed-load smoke test.\n\n## Installation\n\n```bash\nconda create -n clif python=3.12 -y\nconda activate clif\npip install -r requirements.txt\n```\n\nUse a CUDA-enabled PyTorch build that matches your machine. Model access and GPU memory requirements depend on the model configured at runtime.\n\n## Environment\n\nCLIF is intended for GPU server and research-cluster environments rather than CPU-only execution. The public artifact assumes:\n\n- NVIDIA GPUs with a CUDA-compatible PyTorch installation.\n- Enough GPU memory to host one or more LLM replicas and LoRA adapters.\n- Explicit multi-GPU mapping through `--replica_gpus`, for example `[[0],[1],[2],[3]]` for one replica per GPU or `[[0,1],[2,3]]` for model-parallel replicas.\n- A shared filesystem or pre-synchronized model and dataset cache when running on multiple nodes.\n\n## Quick Smoke Test\n\nThe smoke test is a lightweight wiring check for the public artifact. 
It uses a fixed request stream and a small public model setting. It is not intended to reproduce paper-scale experiments.\n\n```bash\nbash scripts\u002Frun_smoke_fixed.sh\n```\n\nBefore running, adjust `MODEL_NAME` and `REPLICA_GPUS` if needed:\n\n```bash\nMODEL_NAME=meta-llama\u002FLlama-3.2-1B REPLICA_GPUS='[[0],[1]]' bash scripts\u002Frun_smoke_fixed.sh\n```\n\n## Output Files\n\nRuns write metrics under `output\u002F`:\n\n- `serve_metrics.xlsx`: served-batch latency, success, token, and quality metrics.\n- `train_metrics.xlsx`: local PEFT round metrics.\n- `train_step_metrics.xlsx`: per-update training metrics.\n- `dispatch_metrics.xlsx`: dispatcher queueing and routing records.\n- `request_gen_metrics.xlsx`: generated request records.\n- `state_metrics.xlsx`: replica state transitions.\n- `fl_round_metrics.xlsx`: federated round summaries.\n- `gpu_monitor.xlsx`: GPU utilization and memory metrics when available.\n- `summary.xlsx`: aggregate run summary.\n","CLIF is a system for running PEFT fine-tuning alongside online large language model inference. It maintains a stateful replica pool with `SERVING`, `IDLE`, and `COMBINED` modes and uses idle resources to run federated adapter updates, so that fine-tuning tasks run without affecting serving. The system includes a proactive dispatcher, a fine-tuning launcher, a coordinator, and a dual-adapter path, and supports structured performance metrics. It is suited to GPU server or research-cluster environments that need continuous learning and inference to run in parallel, and it performs particularly well where a model is optimized in real time while efficient service response is maintained.",2,"2026-06-11 04:01:51","CREATED_QUERY"]