[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80164":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":15,"forks30d":15,"starsTrendScore":19,"compositeScore":20,"rankGlobal":9,"rankLanguage":9,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":22,"topics":25,"createdAt":9,"pushedAt":9,"updatedAt":26,"readmeContent":27,"aiSummary":28,"trendingCount":15,"starSnapshotCount":15,"syncStatus":14,"lastSyncTime":29,"discoverSource":30},80164,"PRISM-VL","kepengxu\u002FPRISM-VL","kepengxu","PRISM-VL studies measurement-grounded VLM learning with RAW-derived Meas.-XYZ inputs, camera-conditioned grounding, and exposure-bracketed supervision transfer.",null,"Python",423,15,5,2,0,17,127,249,87,3.61,"Apache License 2.0",false,"main",true,[],"2026-06-12 02:03:58","\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Fprsimvl_logo.svg\" alt=\"PRSIMVL logo\" width=\"156\">\n\u003C\u002Fp>\n\n\u003Ch1 align=\"center\">PRSIMVL\u003C\u002Fh1>\n\n\u003Cp align=\"center\">\n  \u003Cstrong>Allegory of the Cave: Measurement-Grounded Vision-Language Learning\u003C\u002Fstrong>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fkepengxu.github.io\u002F\">Kepeng Xu\u003C\u002Fa> · Li Xu · Gang He · Wenxin Yu\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fkepengxu.github.io\u002Fprojects\u002Fprism-vl\u002F\">Project Page\u003C\u002Fa> ·\n  \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.11727\">arXiv:2605.11727\u003C\u002Fa> ·\n  \u003Ca href=\"https:\u002F\u002Fopenreview.net\u002Fforum?id=fsCtGojL2R\">Synthetic RAW precursor\u003C\u002Fa> ·\n  \u003Ca 
href=\"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fkepeng\u002FMeasL-Bench-V1\">MeasL-Bench-V1\u003C\u002Fa> ·\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fkepeng\u002FMeasL-150K-V1\">MeasL-150K-V1\u003C\u002Fa> ·\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fkepeng\u002FPRSIMVL-LoRA-V1\">Weights\u003C\u002Fa> ·\n  \u003Ca href=\"README_CN.md\">中文\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Cimg alt=\"Method\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FMethod-PRSIMVL-111827\">\n  \u003Cimg alt=\"Input\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FInput-Meas.--XYZ-0a7ea4\">\n  \u003Cimg alt=\"Benchmark\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FBenchmark-MeasL--Bench--V1-2ea44f\">\n  \u003Cimg alt=\"Training\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FTraining-MeasL--150K--V1-f0883e\">\n  \u003Cimg alt=\"Backbone\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FBase-Qwen3--VL-1f6feb\">\n  \u003Cimg alt=\"Adapters\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLoRA-2B%20%7C%204B%20%7C%208B-8957e5\">\n\u003C\u002Fp>\n\n**PRSIMVL** is a research release for asking a simple but under-tested question: when the RGB image has already lost sensor evidence, can a vision-language model reason better from measurement-domain observations?\n\nPRSIMVL keeps the familiar Qwen3-VL training and inference workflow, but changes the visual interface from post-ISP RGB to RAW-derived **Meas.-XYZ** plus camera metadata. 
The release includes the benchmark, training corpus, evaluation pipeline, service demo, and LoRA checkpoints needed to reproduce the core findings.\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Fprsimvl_framework.png\" alt=\"PRSIMVL framework\" width=\"92%\">\n\u003C\u002Fp>\n\n## The 30-Second Version\n\n| What | Release |\n|---|---|\n| Core idea | Use RAW-derived Meas.-XYZ and capture metadata when RGB rendering clips, denoises, tone maps, or quantizes away evidence. |\n| Benchmark | [MeasL-Bench-V1](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fkepeng\u002FMeasL-Bench-V1), 2,183 held-out matched examples over 14 measurement-sensitive capability slices. |\n| Training data | [MeasL-150K-V1](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fkepeng\u002FMeasL-150K-V1), 152,517 instruction-tuning examples with 48,000 release images. |\n| Model family | Qwen3-VL 2B, 4B, and 8B with released PRSIMVL LoRA adapters hosted on [Hugging Face](https:\u002F\u002Fhuggingface.co\u002Fkepeng\u002FPRSIMVL-LoRA-V1). |\n| Headline result | PRSIMVL-8B improves over RGB Qwen3-VL-8B by **+0.1074 BLEU**, **+0.1071 ROUGE-L**, and **+4.46 LLM-Judge points** on MeasL-Bench. |\n\n## Start Here\n\n| Goal | Entry point | What you get |\n|---|---|---|\n| Ask one image question | [`inference\u002FREADME.md`](inference\u002FREADME.md) | Start `swift deploy`, send a local image with `ask_service.py`, inspect the answer. |\n| Run the benchmark | [`eval\u002FREADME.md`](eval\u002FREADME.md) | Evaluate Meas.-XYZ or matched RGB with the packaged wrapper. |\n| Inspect benchmark data | [`eval_data\u002FREADME.md`](eval_data\u002FREADME.md) | Dataset card, taxonomy, schema, HF loading snippet, path rules. |\n| Inspect training data | [`training_data\u002FREADME.md`](training_data\u002FREADME.md) | Dataset card, release contents, quality checks, registry aliases. 
|\n| Understand release scope | [`RELEASE_MANIFEST.md`](RELEASE_MANIFEST.md) | What is included, pruned, and expected as large artifacts. |\n\n## Quick Start\n\nInstall the release snapshot:\n\n```bash\ngit clone \u003Crepo-url> PRSIMVL\ncd PRSIMVL\nbash install_editable.sh\n```\n\nRun a dry-run check without launching model inference:\n\n```bash\nMODEL_SIZE=2b CUDA_VISIBLE_DEVICES=0 bash eval\u002Frun_infer_and_eval.sh --dry-run\n```\n\nRun one PRSIMVL adapter on the default Meas.-XYZ benchmark split:\n\n```bash\nMODEL_SIZE=2b CUDA_VISIBLE_DEVICES=0 bash eval\u002Frun_infer_and_eval.sh\n```\n\nLarge artifacts are expected at these release-local paths:\n\n```text\neval_data\u002F       # MeasL-Bench-V1 JSONL + image\u002F\ntraining_data\u002F   # MeasL-150K-V1 JSONL + image\u002F\nexps\u002F            # released LoRA adapters\n```\n\nDownload the public data from Hugging Face: [MeasL-Bench-V1](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fkepeng\u002FMeasL-Bench-V1) for evaluation and [MeasL-150K-V1](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fkepeng\u002FMeasL-150K-V1) for training. Released LoRA weights are hosted at [kepeng\u002FPRSIMVL-LoRA-V1](https:\u002F\u002Fhuggingface.co\u002Fkepeng\u002FPRSIMVL-LoRA-V1); restore them under `exps\u002F` before running adapter inference or full evaluation.\n\n## Why Measurement Grounding Matters\n\nRGB is a display-oriented product of an image signal processor. It is useful, compact, and familiar, but it may remove the evidence that a downstream model needs. 
PRSIMVL treats the camera measurement as a first-class observation: Meas.-XYZ preserves a linear, three-channel view derived from RAW measurements, and metadata supplies capture context such as ISO, exposure time, and aperture.\n\nThe examples below show low-illumination text cases where RGB rendering exposes misleading evidence while Meas.-XYZ keeps the answer region recoverable.\n\n\n| Case | RGB Observation | Meas.-XYZ Observation |\n|---|---|---|\n| Illuminated shop name | \u003Cimg src=\"assets\u002Fexample_ler_shop_rgb.png\" width=\"260\">\u003Cbr>RGB answer: **Hua Tian Hua** (wrong) | \u003Cimg src=\"assets\u002Fexample_ler_shop_measxyz.png\" width=\"260\">\u003Cbr>PRSIMVL answer: **正美口腔** |\n| Yellow sign text | \u003Cimg src=\"assets\u002Fexample_ler_sign_rgb.png\" width=\"260\">\u003Cbr>RGB answer: **diamond** (wrong) | \u003Cimg src=\"assets\u002Fexample_ler_sign_measxyz.png\" width=\"260\">\u003Cbr>PRSIMVL answer: **BLACK** |\n\nZoomed evidence crops:\n\n| RGB crop | Meas.-XYZ crop | RGB crop | Meas.-XYZ crop |\n|---|---|---|---|\n| \u003Cimg src=\"assets\u002Fexample_ler_shop_crop_rgb.png\" width=\"180\"> | \u003Cimg src=\"assets\u002Fexample_ler_shop_crop_measxyz.png\" width=\"180\"> | \u003Cimg src=\"assets\u002Fexample_ler_sign_crop_rgb.png\" width=\"180\"> | \u003Cimg src=\"assets\u002Fexample_ler_sign_crop_measxyz.png\" width=\"180\"> |\n\n## Earlier Synthetic-RAW Version\n\nThis release builds on our earlier synthetic-RAW prototype, [**End-to-End RAW Synergy for Elevated Vision-Language Reasoning**](https:\u002F\u002Fopenreview.net\u002Fforum?id=fsCtGojL2R), which introduced Raw-VLM with a learnable ISP frontend and RAW-tokenization for VLM reasoning. 
That first version used synthetic RAW data to study whether RAW sensor information can improve captioning, VQA, and hallucination behavior.\n\nPRSIMVL extends that direction toward a release centered on measurement-grounded inputs: RAW-derived **Meas.-XYZ**, camera metadata grounding, MeasL-Bench-V1, MeasL-150K-V1, and released Qwen3-VL LoRA adapters.\n\n## Main Results\n\nThe table reports the held-out MeasL-Bench protocol. BLEU and ROUGE-L are lexical metrics; LLM-Judge is reported as accuracy percentage.\n\n| Model | Visual Input | BLEU | ROUGE-L | LLM-Judge |\n|---|---|---:|---:|---:|\n| Qwen3-VL-2B | RGB | 0.3407 | 0.3171 | 69.54 |\n| Qwen3-VL-4B | RGB | 0.4442 | 0.3453 | 77.37 |\n| Qwen3-VL-8B | RGB | 0.5046 | 0.3500 | 78.20 |\n| **PRSIMVL-2B** | **Meas.-XYZ + metadata** | **0.5865** | **0.4244** | **77.99** |\n| **PRSIMVL-4B** | **Meas.-XYZ + metadata** | **0.6021** | **0.4465** | **80.83** |\n| **PRSIMVL-8B** | **Meas.-XYZ + metadata** | **0.6120** | **0.4571** | **82.66** |\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Fprsimvl_radar.png\" alt=\"Capability radar\" width=\"82%\">\n\u003C\u002Fp>\n\n### Where It Helps Most\n\n| Capability | RGB Qwen3-VL-8B BLEU \u002F ROUGE-L | PRSIMVL-2B BLEU \u002F ROUGE-L |\n|---|---:|---:|\n| HDR Evidence Recovery (HER) | 0.5343 \u002F 0.3614 | **0.6066 \u002F 0.4533** |\n| Low-Illumination Evidence Recovery (LER) | 0.3470 \u002F 0.2851 | **0.5174 \u002F 0.4249** |\n| Scene Text Recognition (STR) | 0.3719 \u002F 0.3604 | **0.5084 \u002F 0.4669** |\n| General Visual Grounding (GVG) | 0.5109 \u002F 0.3644 | **0.6117 \u002F 0.4505** |\n| Agent and Entity Identification (AEI) | 0.5304 \u002F 0.4332 | **0.6210 \u002F 0.5307** |\n| Binary Visual Verification (BVV) | 0.5367 \u002F 0.3580 | **0.6186 \u002F 0.3732** |\n\n## What Is In This Release\n\n| Artifact | Location | Notes |\n|---|---|---|\n| Benchmark | [`eval_data\u002F`](eval_data\u002F) and 
[HF](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fkepeng\u002FMeasL-Bench-V1) | Matched Meas.-XYZ\u002FRGB JSONL files and 3,812 images. |\n| Training data | [`training_data\u002F`](training_data\u002F) and [HF](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fkepeng\u002FMeasL-150K-V1) | 152,517 instruction-tuning examples and 48,000 images. |\n| Demo inference | [`inference\u002F`](inference\u002F) | OpenAI-compatible `swift deploy` service demo for local images. |\n| Evaluation wrapper | [`eval\u002F`](eval\u002F) | Reproducible MeasL-Bench inference and offline evaluation entrypoint. |\n| Training configs | [`configs\u002Fqwen3_vl_150k_llmmeta_vit_proxy\u002F`](configs\u002Fqwen3_vl_150k_llmmeta_vit_proxy\u002F) | Launch scripts and SFT configs for 2B, 4B, and 8B. |\n| Released adapters | [`exps\u002F`](exps\u002F) and [HF](https:\u002F\u002Fhuggingface.co\u002Fkepeng\u002FPRSIMVL-LoRA-V1) | LoRA checkpoints for Qwen3-VL 2B, 4B, and 8B. |\n\n## Benchmark And Data\n\n### MeasL-Bench-V1\n\n[MeasL-Bench-V1](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fkepeng\u002FMeasL-Bench-V1) is the held-out benchmark for measurement-grounded language-vision evaluation.\n\n| File | Rows \u002F Files | Purpose |\n|---|---:|---|\n| `eval_data\u002Ftest-raw-measl-bench.jsonl` | 2,183 rows | Main Meas.-XYZ benchmark. |\n| `eval_data\u002Ftest-rgb-measl-bench.jsonl` | 2,183 rows | Matched RGB benchmark. |\n| `eval_data\u002Fimage\u002F` | 3,812 files | Local image assets referenced by both JSONL files. |\n\nAll JSONL image paths use the release-local form `eval_data\u002Fimage\u002F...`. 
When reading directly from the Hugging Face dataset repository root, remove the leading `eval_data\u002F` prefix.\n\n\u003Cdetails>\n\u003Csummary>Capability taxonomy\u003C\u002Fsummary>\n\n| Label | Capability | Count |\n|---|---|---:|\n| CAG | Chromatic Attribute Grounding | 150 |\n| NG | Numerosity Grounding | 150 |\n| DSG | Descriptive Scene Grounding | 150 |\n| HER | HDR Evidence Recovery | 150 |\n| LER | Low-Illumination Evidence Recovery | 233 |\n| STR | Scene Text Recognition | 150 |\n| GVG | General Visual Grounding | 150 |\n| CVR | Compositional Visual Reasoning | 150 |\n| SRU | Spatial Relation Understanding | 150 |\n| MSQ | Manner and State Queries | 150 |\n| EAQ | Entity and Attribute Queries | 150 |\n| DS | Discriminative Selection | 150 |\n| AEI | Agent and Entity Identification | 150 |\n| BVV | Binary Visual Verification | 150 |\n\n\u003C\u002Fdetails>\n\n### MeasL-150K-V1\n\n[MeasL-150K-V1](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fkepeng\u002FMeasL-150K-V1) is the released instruction-tuning corpus.\n\n| File | Rows \u002F Files | Purpose |\n|---|---:|---|\n| `training_data\u002Ftrain-measl-150k-v1.jsonl` | 152,517 rows | Final instruction-tuning set. |\n| `training_data\u002Fimage\u002F` | 48,000 files | Release image subset referenced by the JSONL file. 
|\n\nThe corpus was built from approximately 700K auto-annotated candidates, filtered to 518,433 post-scoring records, balanced by source and question structure, and decontaminated against MeasL-Bench before release.\n\n## Demo Inference\n\nStart an OpenAI-compatible service with a released adapter:\n\n```bash\nconda activate msswiftv1_service\nCUDA_VISIBLE_DEVICES=0 swift deploy \\\n  --model Qwen\u002FQwen3-VL-2B-Instruct \\\n  --adapters exps\u002FBANALCED_150K_META_VIT_PROXY\u002Foutput-Qwen3-VL-2B-Instruct\u002Fv8-20260421-133546\u002Fcheckpoint-95000\n```\n\nAsk a question from another terminal:\n\n```bash\nconda activate msswiftv1_service\npython inference\u002Fask_service.py \\\n  --image inference\u002Fdemo_data\u002Fimages\u002Fdemo1_pole_color.png \\\n  --question \"This is a linear Image with Metadata: ISO: 250, Exposure Time: 1\u002F640, Aperture: f\u002F9. What is the color of the vertical pole visible through the windshield?\"\n```\n\nSee [`inference\u002FREADME.md`](inference\u002FREADME.md) for demo images, request options, and troubleshooting.\n\n## Evaluation\n\nRun the default Meas.-XYZ benchmark:\n\n```bash\nMODEL_SIZE=4b CUDA_VISIBLE_DEVICES=0 bash eval\u002Frun_infer_and_eval.sh\n```\n\nRun the matched RGB benchmark:\n\n```bash\nDATASET=rgb MODEL_SIZE=4b CUDA_VISIBLE_DEVICES=0 bash eval\u002Frun_infer_and_eval.sh\n```\n\nEnable LLM-as-judge through an OpenAI-compatible endpoint:\n\n```bash\nexport JUDGE_API_KEY=YOUR_KEY\nJUDGE_URL=https:\u002F\u002Fopenrouter.ai\u002Fapi\u002Fv1 \\\nJUDGE_MODEL=openai\u002Fgpt-5 \\\nMODEL_SIZE=2b CUDA_VISIBLE_DEVICES=0 \\\nbash eval\u002Frun_infer_and_eval.sh\n```\n\nThe evaluation entrypoint defaults to `eval_data\u002Ftest-raw-measl-bench.jsonl` or `eval_data\u002Ftest-rgb-measl-bench.jsonl`. Use `DATASET_FILE` and `IMAGE_ROOT` only for external datasets or non-standard image locations. 
Full options are documented in [`eval\u002FREADME.md`](eval\u002FREADME.md).\n\n## Training\n\nFinal training configs are under [`configs\u002Fqwen3_vl_150k_llmmeta_vit_proxy\u002F`](configs\u002Fqwen3_vl_150k_llmmeta_vit_proxy\u002F).\n\n```bash\nbash configs\u002Fqwen3_vl_150k_llmmeta_vit_proxy\u002Ftrain_prsimvl_2b.sh\nbash configs\u002Fqwen3_vl_150k_llmmeta_vit_proxy\u002Ftrain_prsimvl_4b.sh\nbash configs\u002Fqwen3_vl_150k_llmmeta_vit_proxy\u002Ftrain_prsimvl_8b.sh\n```\n\nThe corresponding config files are:\n\n- `sft_qwen3_vl_2b_prsimvl_v1.yaml`\n- `sft_qwen3_vl_4b_prsimvl_v1.yaml`\n- `sft_qwen3_vl_8b_prsimvl_v1.yaml`\n\n## Released Weights\n\nReleased PRSIMVL LoRA weights are hosted on Hugging Face: [kepeng\u002FPRSIMVL-LoRA-V1](https:\u002F\u002Fhuggingface.co\u002Fkepeng\u002FPRSIMVL-LoRA-V1). The local release expects the same checkpoint layout under `exps\u002FBANALCED_150K_META_VIT_PROXY\u002F`.\n\n| Size | Base Model | Local LoRA Checkpoint |\n|---|---|---|\n| 2B | `Qwen\u002FQwen3-VL-2B-Instruct` | `exps\u002FBANALCED_150K_META_VIT_PROXY\u002Foutput-Qwen3-VL-2B-Instruct\u002Fv8-20260421-133546\u002Fcheckpoint-95000` |\n| 4B | `Qwen\u002FQwen3-VL-4B-Instruct` | `exps\u002FBANALCED_150K_META_VIT_PROXY\u002Foutput-Qwen3-VL-4B-Instruct\u002Fv12-20260425-113029\u002Fcheckpoint-85000` |\n| 8B | `Qwen\u002FQwen3-VL-8B-Instruct` | `exps\u002FBANALCED_150K_META_VIT_PROXY\u002Foutput-Qwen3-VL-8B-Instruct\u002Fv2-20260423-205317\u002Fcheckpoint-95000` |\n\n## Repository Layout\n\n```text\nPRSIMVL\u002F\n├── assets\u002F                       README figures and qualitative examples\n├── eval\u002F                         Benchmark inference and evaluation entrypoint\n├── inference\u002F                    Service-based VQA demo\n├── eval_data\u002F                    MeasL-Bench-V1 artifact folder\n├── training_data\u002F                MeasL-150K-V1 artifact folder\n├── exps\u002F                         Released LoRA adapters\n├── 
configs\u002Fqwen3_vl_150k_llmmeta_vit_proxy\u002F\n│   └── PRSIMVL v1 training configs and launch scripts\n├── scripts\u002Ftest_raw_eval_pipeline_opt\u002F\n│   └── shared inference\u002Fevaluation implementation\n└── swift\u002F, libs\u002F                 Training and inference code snapshot\n```\n\n## Release Notes\n\n- Dataset license: CC BY-NC 4.0 for non-commercial research and education; citation is required.\n- Evaluation outputs are written under `eval\u002Foutput_benchmark\u002F` by default.\n- Generated scratch outputs, conversion utilities, and local environment checks are pruned from this public release snapshot.\n- Contribution and release hygiene notes are available in [`CONTRIBUTING.md`](CONTRIBUTING.md), [`CODE_OF_CONDUCT.md`](CODE_OF_CONDUCT.md), and [`RELEASE_CHECKLIST.md`](RELEASE_CHECKLIST.md).\n\n## Citation\n\nMain paper: [Allegory of the Cave: Measurement-Grounded Vision-Language Learning](https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.11727)  \nEarlier synthetic-RAW version: [End-to-End RAW Synergy for Elevated Vision-Language Reasoning](https:\u002F\u002Fopenreview.net\u002Fforum?id=fsCtGojL2R)  \nProject page: \u003Chttps:\u002F\u002Fkepengxu.github.io\u002Fprojects\u002Fprism-vl\u002F>  \nAuthor homepage: \u003Chttps:\u002F\u002Fkepengxu.github.io\u002F>\n\n```bibtex\n@misc{xu2026allegory,\n  title         = {Allegory of the Cave: Measurement-Grounded Vision-Language Learning},\n  author        = {Xu, Kepeng and Xu, Li and He, Gang and Yu, Wenxin},\n  year          = {2026},\n  eprint        = {2605.11727},\n  archivePrefix = {arXiv},\n  url           = {https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.11727}\n}\n\n@inproceedings{xu2025rawvlm,\n  title     = {End-to-End RAW Synergy for Elevated Vision-Language Reasoning},\n  author    = {Xu, Kepeng and Qiao, Tong and Liu, Zhenyang and Xu, Li and He, Gang},\n  booktitle = {IJCAI 2025 Workshop on Multimodal Knowledge and Language Modeling (MKLM)},\n  year      = {2025},\n  url       = 
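{https:\u002F\u002Fopenreview.net\u002Fforum?id=fsCtGojL2R}\n}\n```\n","PRISM-VL is a research project that explores vision-language model learning from measurement-domain data, such as RAW-derived Meas.-XYZ inputs. Its core features include camera-conditioned grounding and exposure-bracketed supervision transfer to improve model reasoning. Technically, it builds on the Qwen3-VL framework and changes the visual interface from conventional post-ISP RGB images to more primitive measurement-domain data plus camera metadata. It suits applications that need more precise visual understanding, especially when RGB rendering has discarded sensor evidence. The project also provides a benchmark, a training corpus, an evaluation pipeline, and LoRA weights so the results can be reproduced.","2026-06-11 03:59:30","CREATED_QUERY"]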