Vision-OPD


Vision-OPD is a regional-to-global on-policy self-distillation framework that transfers a model's own privileged crop-conditioned perception to its full-image policy, enabling fine-grained visual understanding in a single forward pass without external teachers, labels, or verifiers.

AI Summary

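Vision-OPD is a regional-to-global on-policy self-distillation framework for multimodal large language models (MLLMs), designed to improve a model's understanding of fine-grained image content. It needs no external teacher model, human annotations, or verifier: the model's own "privileged perception" of locally cropped regions is the only source of knowledge transferred to its full-image policy, so high-resolution details can be recognized in a single forward pass at inference time. Technically, it uses an on-policy self-distillation mechanism and supports end-to-end training and efficient deployment (for example, vLLM inference serving). It suits scenarios that require high-precision visual understanding, such as medical image analysis, industrial quality inspection, remote sensing image interpretation, and complex image-text question answering.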
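The summary above does not say which loss Vision-OPD uses, so the following is only a minimal sketch of one common choice in on-policy distillation: a per-token reverse KL divergence between the student distribution (full-image pass) and the teacher distribution (crop-conditioned pass of the same model), averaged over a response sampled from the student. The function names and the plain-list logit inputs are hypothetical and chosen for illustration; they are not taken from the project's code.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def reverse_kl(student_logits, teacher_logits):
    """KL(student || teacher) at a single token position."""
    s = log_softmax(student_logits)
    t = log_softmax(teacher_logits)
    return sum(math.exp(ls) * (ls - lt) for ls, lt in zip(s, t))


def self_distill_loss(student_steps, teacher_steps):
    """Mean per-token reverse KL over one student-sampled response.

    student_steps: per-token logits from the full-image pass (assumed input).
    teacher_steps: logits for the same tokens from the crop-conditioned pass
                   (assumed input); a real training loop would treat these
                   as constants and not backpropagate through them.
    """
    if not student_steps or len(student_steps) != len(teacher_steps):
        raise ValueError("need two non-empty sequences of equal length")
    kls = [reverse_kl(s, t) for s, t in zip(student_steps, teacher_steps)]
    return sum(kls) / len(kls)
```

In this sketch the loss is zero when the full-image pass already matches the crop-conditioned pass, and grows as the two distributions diverge on the tokens the student actually sampled, which is the property that makes the signal on-policy.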

Python

Apache License 2.0


Stars: 176


Star growth: today 0, last 7 days 0, last 30 days +25

Overall score: 2.33

Default branch: main
