# Odyseus Spatial VLM

![LLM vs Spatial-VLM](https://github.com/MercuriusTech/Odyseus-Spatial-VLM/blob/main/media/SpatialVLM-demo2-low-res.gif)

I've recently been fascinated by the possibilities opened up by advances in monocular depth estimation models, so I decided to experiment with combining one with a capable VLM. Below is an example demo that gets 3D outputs from a VLM, which can be more useful for a physical AI agent.

Quick Live Demo 👉 [app.odyseus.xyz](https://app.odyseus.xyz)

Or follow the setup in this repo for a custom deployment.

## Setup

This repo is currently set up primarily for Linux.

When cloning, pull the external Depth Anything 3 (DA3) dependency as a submodule:

```bash
git clone --recurse-submodules https://github.com/MercuriusTech/Odyseus-Spatial-VLM.git
cd Odyseus-Spatial-VLM
```

If you already cloned without submodules:

```bash
git submodule update --init --recursive
```

If you are packaging this repo yourself, note that `Depth-Anything-3/` is intended to track the upstream project as a submodule.

Set up the VLM environment:

```bash
./setup-vlm.sh
```

Set up the depth demo environment:

```bash
./setup.sh
```

## Run

Start the VLM server:

```bash
./run-vlm.sh
```

Start the depth demo:

```bash
./run.sh
```

Then open:

```text
http://localhost:8080
```

## Hosted Demo

A hosted version runs at [app.odyseus.xyz](https://app.odyseus.xyz). The local repo remains the reference implementation for running and modifying the demo yourself.

## Use

1. Upload an image.
2. Enter a prompt such as `select the chair near the desk and the closest door`.
3. Click `Run Demo`.
4. Inspect:
   - the 2D target overlay
   - the 3D point cloud
   - the labeled 3D targets
   - the camera frustum and guide vectors

## Flow

```mermaid
flowchart LR
    A[User Prompt + Image] --> B[VLM]
    B --> C[2D Target Coordinates]
    A --> D[DA3 Metric Depth]
    C --> E[Depth Sampling]
    D --> E
    E --> F[3D Projection]
    F --> G[Three.js Viewer]
```

## Notes

- Linux is the best-supported platform right now.
- Help with PowerShell / Windows setup is welcome. Contributions that improve `setup-vlm.ps1` or add fuller Windows support are encouraged.
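The Depth Sampling and 3D Projection steps in the Flow diagram can be sketched in plain Python. This is a minimal, hypothetical sketch and not the repo's actual implementation. It rests on three assumptions. First, the VLM's 2D targets have already been converted to pixel coordinates at the depth map's resolution. Second, the DA3 metric depth is a per-pixel z-depth in meters, measured along the optical axis rather than along the ray. Third, pinhole intrinsics (`fx`, `fy`, `cx`, `cy`) are available. The names `Intrinsics`, `sample_depth`, `backproject`, `guide_vector` and `lift_targets` are illustrative only. Depth is taken as the median of the valid values in a small window around each target, so that a single noisy or missing pixel at an object edge does not skew the 3D position.

```python
# Hypothetical sketch of "Depth Sampling" + "3D Projection"; not the repo's code.
# Assumptions: targets are pixel coords at the depth map's resolution, depth is
# metric z-depth in meters, and pinhole intrinsics are known.
import math
from dataclasses import dataclass
from statistics import median
from typing import Dict, List, Optional, Sequence, Tuple

Point3 = Tuple[float, float, float]


@dataclass
class Intrinsics:
    """Pinhole intrinsics in pixels (assumed known or estimated)."""
    fx: float
    fy: float
    cx: float
    cy: float


def sample_depth(depth: Sequence[Sequence[float]], u: int, v: int,
                 radius: int = 2) -> Optional[float]:
    """Median of valid depths (finite and > 0) in a window around pixel (u, v).

    The map is indexed depth[row][col], i.e. depth[v][u]. Returns None when
    no valid depth exists near the target.
    """
    height, width = len(depth), len(depth[0])
    values: List[float] = []
    for row in range(max(0, v - radius), min(height, v + radius + 1)):
        for col in range(max(0, u - radius), min(width, u + radius + 1)):
            d = depth[row][col]
            if math.isfinite(d) and d > 0:
                values.append(d)
    return median(values) if values else None


def backproject(u: float, v: float, z: float, k: Intrinsics) -> Point3:
    """Lift pixel (u, v) at z-depth z into camera space (x right, y down, z forward)."""
    x = (u - k.cx) * z / k.fx
    y = (v - k.cy) * z / k.fy
    return (x, y, z)


def guide_vector(p: Point3) -> Point3:
    """Unit direction from the camera origin toward p (p must not be the origin)."""
    norm = math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2)
    return (p[0] / norm, p[1] / norm, p[2] / norm)


def lift_targets(targets: Dict[str, Tuple[float, float]],
                 depth: Sequence[Sequence[float]],
                 k: Intrinsics) -> Dict[str, Point3]:
    """Map {label: (u, v)} pixel targets to {label: (x, y, z)}; skip targets without depth."""
    points: Dict[str, Point3] = {}
    for label, (u, v) in targets.items():
        z = sample_depth(depth, int(round(u)), int(round(v)))
        if z is not None:
            points[label] = backproject(u, v, z, k)
    return points


if __name__ == "__main__":
    # Synthetic 6x8 depth map of a flat wall 2.5 m away (made-up values).
    depth_map = [[2.5] * 8 for _ in range(6)]
    cam = Intrinsics(fx=500.0, fy=500.0, cx=4.0, cy=3.0)
    for label, p in lift_targets({"chair": (6.0, 3.0)}, depth_map, cam).items():
        print(label, p, guide_vector(p))
```

The sketch uses the OpenCV-style camera convention (x right, y down, z forward). Three.js cameras look down -Z with +Y up, so a Three.js viewer would negate y and z before plotting these points.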