[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-83577":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":17,"stars30d":17,"stars90d":16,"forks30d":16,"starsTrendScore":18,"compositeScore":19,"rankGlobal":10,"rankLanguage":10,"license":20,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":23,"hasPages":21,"topics":24,"createdAt":10,"pushedAt":10,"updatedAt":37,"readmeContent":38,"aiSummary":10,"trendingCount":16,"starSnapshotCount":16,"syncStatus":39,"lastSyncTime":40,"discoverSource":41},83577,"CyberVerse","Lynpoint\u002FCyberVerse","Lynpoint","Self hosted, real-time digital human agent platform. Build voice-first AI agents with WebRTC, persona memory, tools, RAG, and optional digital-human video.","https:\u002F\u002Fwww.cyberverse.cc",null,"Python",1149,161,7,6,0,29,87,19.63,"GNU General Public License v3.0",false,"main",true,[25,26,27,28,29,30,31,32,33,34,35,36],"ai-agents","ai-companion","digital-human","digital-life","jarvis-assistant","lip-sync","streaming","talking-avatar","talking-head","voice-agent","voice-assistant","webrtc","2026-06-12 02:04:35","\u003Ch1 align=\"center\">CyberVerse\u003C\u002Fh1>\n\u003Cp align=\"center\">\u003Cem>CyberVerse is an open-source \u003Cstrong>real-time digital-human Agent framework\u003C\u002Fstrong>. 
It uses WebRTC, persona memory, tools, RAG, and optional digital-human video capabilities to help you build AI agents centered on voice interaction.\u003C\u002Fem>\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"README.md\">\u003Cstrong>English\u003C\u002Fstrong>\u003C\u002Fa> · \u003Ca href=\"README.zh-CN.md\">简体中文\u003C\u002Fa> · \u003Ca href=\"README.ja.md\">日本語\u003C\u002Fa> · \u003Ca href=\"README.ko.md\">한국어\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"LICENSE\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-GPL%20v3-blue.svg\" alt=\"License: GPL v3\"\u002F>\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fdsd2077\u002FCyberVerse\u002Fpulls\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPRs-welcome-brightgreen.svg\" alt=\"PRs Welcome\"\u002F>\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Foosmetrics.com\u002Frepo\u002Fdsd2077\u002FCyberVerse\">\u003Cimg src=\"https:\u002F\u002Fapi.oosmetrics.com\u002Fapi\u002Fv1\u002Fbadge\u002Fachievement\u002F4795438a-70e7-4997-bd8a-93e7a13c8d81.svg\" alt=\"oosmetrics: Top 1 in Streaming by velocity - 2026-05-12\"\u002F>\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"docs\u002Fassets\u002Flogo.png\">\u003Cimg src=\"docs\u002Fassets\u002Flogo.png\" alt=\"CyberVerse logo\" width=\"100%\"\u002F>\u003C\u002Fa>\n\u003C\u002Fp>\n\n---\n\n### One Photo. A Living Digital Human.\n\n> Ever dreamed of having your own J.A.R.V.I.S. — an AI that truly sees you, hears you, and talks back in real time?\n>\n> Want to see someone you've lost again, hear their voice, watch them smile at you?\n>\n> Or maybe there's a character you've always wished you could bring to life?\n>\n> **Just one photo. 
CyberVerse makes them alive.**\n\n## What is a Digital-Human Agent?\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"docs\u002Fassets\u002Fdigital-human-agent.jpeg\">\u003Cimg src=\"docs\u002Fassets\u002Fdigital-human-agent.jpeg\" alt=\"CyberVerse digital-human Agent\" width=\"100%\"\u002F>\u003C\u002Fa>\n\u003C\u002Fp>\n\n## Demo\n\u003Cp align=\"center\">\u003Cem>The following characters are demo examples only. They are not bundled with CyberVerse and are not provided for commercial use.\u003C\u002Fem>\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"docs\u002Fassets\u002Fcharacter1.png\">\u003Cimg src=\"docs\u002Fassets\u002Fcharacter1.png\" alt=\"CyberVerse character selection gallery\" width=\"100%\"\u002F>\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"docs\u002Fassets\u002Fcharacter2.png\">\u003Cimg src=\"docs\u002Fassets\u002Fcharacter2.png\" alt=\"CyberVerse character gallery examples\" width=\"100%\"\u002F>\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cdiv align=\"center\">\n\n| [![](docs\u002Fassets\u002F爱丽丝.mov.png)](https:\u002F\u002Fyoutu.be\u002FLk88sew2x4o) | [![](docs\u002Fassets\u002F丽娜.mov.png)](https:\u002F\u002Fyoutu.be\u002F8jdQ3ThcwgA) |\n|:---:|:---:|\n| [**Alice — watch on YouTube**](https:\u002F\u002Fyoutu.be\u002FLk88sew2x4o) | [**Lina — watch on YouTube**](https:\u002F\u002Fyoutu.be\u002F8jdQ3ThcwgA) |\n\n| [![](docs\u002Fassets\u002F小龙女.mov.png)](https:\u002F\u002Fyoutu.be\u002FWjEHUYZx5Gs) |\n|:---:|\n| [**Xiaolongnü — watch on YouTube**](https:\u002F\u002Fyoutu.be\u002FWjEHUYZx5Gs) |\n\n\u003C\u002Fdiv>\n\n## Features\n\n### Realtime Voice Agent\n\nVoice is CyberVerse's default interaction mode, designed for low-latency realtime conversations that can run for long sessions. 
Users can continuously talk with an Agent through a microphone, interrupt the model while it is speaking, and mix voice and text input in the same conversation turn.\n\nEach character can have its own voice, welcome message, and personality configuration, and voice cloning is supported. Conversations support pause and resume; when `inference.avatar.enabled` is set to `false`, the platform runs in pure voice mode, publishes only the audio stream, requires no local Avatar GPU, and keeps the core voice experience intact.\n\n### Audio\u002FVideo over WebRTC\n\nThe session pipeline is built on WebRTC and can choose direct P2P (embedded TURN \u002F NAT traversal) or LiveKit SFU mode based on the deployment scenario, balancing low latency with connectivity in complex network environments.\n\nIn standard mode and supported omni sessions, the Agent can also receive user camera frames or screen-sharing frames as visual input, enabling face-to-face interaction that can listen and see instead of being limited to plain text context.\n\n### PersonaAgent + SubAgent Tasks\n\nCyberVerse uses a multi-agent architecture: PersonaAgent stays in the foreground to maintain fluid conversation, respond quickly to interruptions, and handle context switches; long-running work such as search, research, material organization, summarization, and HTML report generation is delegated to background SubAgents asynchronously.\n\nThis keeps complex tasks from slowing down voice turns. Users can keep speaking, ask follow-up questions, or adjust direction, and PersonaAgent can return the SubAgent result once it is ready.\n\n### Character Memory and RAG\n\nEach character's conversation history is persisted to local disk and automatically loaded when you re-enter a conversation, preserving continuity across sessions. 
You can also import knowledge bases, documents, and biographical material for a character; the system indexes them for retrieval-augmented generation, making answers better aligned with the character's background and persona.\n\n### Optional Digital Human Video\n\nWhen you have GPU resources and want the Agent to be visible, enable avatar inference: a single character reference image can drive realtime facial animation, lip-sync, and cached idle video playback through configurable backends such as FlashHead and LiveAct. If you do not have a GPU or do not need video yet, disable it to return to a pure voice Agent; the same character and persona configuration continues to work.\n\n### Plugin-Based Stack\n\nBrain, voice, hearing, tools, memory, and face are all replaceable modules. You can combine omni models, LLMs, TTS, ASR, embeddings, RAG, tool calls, and Avatar backends in `cyberverse_config.yaml`, then configure different vendors' API keys and service endpoints in the web UI at **`\u002Fsettings`** to switch providers and model combinations by scenario.\n\n## Quick Start\n\n### Prerequisites\n\n- Node 18+\n- Go 1.25 (required: `protoc-gen-go`, `protoc-gen-go-grpc`)\n- Conda\n- Python 3.10+\n- FFmpeg\n- libopus-dev, libopusfile-dev, libsoxr-dev, pkg-config\n\n> For pure voice sessions, no local avatar GPU is required. 
Runtime cost depends on the realtime voice\u002Fomni\u002FLLM\u002FTTS\u002FASR providers you configure.\n\nTo verify, use:\n\n```bash\nnode --version\ngo version\nprotoc --version\nffmpeg -version\nconda --version\n```\n\n### Step 1: Clone\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fdsd2077\u002FCyberVerse.git\ncd CyberVerse\n```\n\n### Step 2: Create Python environment\n\n```bash\nconda create -n cyberverse python=3.10\nconda activate cyberverse\n```\n\n### Step 3: Configure environment variables\n\n```bash\ncp infra\u002F.env.example .env\n```\n\nEdit `.env` and fill in the supported API keys:\n\nAlibaba Cloud Qwen-series models:\n\n```env\nDASHSCOPE_API_KEY=your_dashscope_api_key\n```\n\nOr Volcengine Doubao-series models:\n\n```env\nDOUBAO_ACCESS_TOKEN=your_doubao_access_token\nDOUBAO_APP_ID=your_doubao_app_id\n```\n\nDoubao Voice: follow the [Volcengine quick start](https:\u002F\u002Fwww.volcengine.com\u002Fdocs\u002F6561\u002F2119699?lang=zh) to get **App ID** \u002F **API Key**, then fill in `DOUBAO_APP_ID` \u002F `DOUBAO_ACCESS_TOKEN`.\n\nAfter the stack is running, you can change API keys and service endpoints from the web UI at **`\u002Fsettings`** instead of editing `.env` only.\n\n### Step 4: Create local config and enable voice-only mode\n\n```bash\ncp infra\u002Fcyberverse_config.example.yaml cyberverse_config.yaml\n```\n\nEdit `cyberverse_config.yaml`:\n\n```yaml\ninference:\n  avatar:\n    enabled: false\n```\n\nWith `enabled: false`, CyberVerse runs as a pure voice agent assistant.\n\n\n### Step 5: Install project dependencies\n\n```bash\nmake setup\n```\n\nThis installs the base editable package (`[dev,inference]`), generates gRPC stubs, and installs frontend dependencies.\n\nInstall the voice-agent extras used by the default config:\n\n```bash\n# all optional groups at once\npip install -e \".[all]\"\n```\n\n### Step 6: Start services (3 terminals)\n\n**Terminal 1** — Python inference server:\n\n```bash\nconda activate 
cyberverse\nmake inference\n```\n\n**Terminal 2** — Go API server:\n\n```bash\nmake server\n```\n\n**Terminal 3** — Frontend:\n\n```bash\nmake frontend\n```\n\n### Step 7: Verify\n\n```bash\n# Check API health\ncurl -s http:\u002F\u002Flocalhost:8080\u002Fapi\u002Fv1\u002Fhealth\n```\n\nOpen http:\u002F\u002Flocalhost:5173 in your browser.\n\n## Optional: Full Digital-Human Video\n\nIf you want to drive realtime Avatar video with FlashHead or LiveAct, follow the steps below.\n\n### Additional Requirements\n\n- GPU with CUDA 12.8+\n- PyTorch 2.8 (CUDA 12.8)\n- FFmpeg with `libvpx` for video encoding\n- Avatar model weights\n\nInstall PyTorch (CUDA 12.8):\n\n```bash\npip3 install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu128\n```\n\nInstall vllm if you use LiveAct:\n\n```bash\npip install vllm==0.11.0\n```\n\n### Download Model Weights\n\nCyberVerse currently supports **FlashHead** and **LiveAct**; download only what you need. 
More models will continue to be added.\n\n```bash\npip install \"huggingface_hub[cli]\"\n```\n\n#### FlashHead (SoulX-FlashHead)\n\n| Model Component | Description | Link |\n| :--- | :--- | :--- |\n| `SoulX-FlashHead-1_3B` | 1.3B FlashHead weights | [Hugging Face](https:\u002F\u002Fhuggingface.co\u002FSoul-AILab\u002FSoulX-FlashHead-1_3B), [ModelScope](https:\u002F\u002Fmodelscope.cn\u002Fmodels\u002FSoul-AILab\u002FSoulX-FlashHead-1_3B) |\n| `wav2vec2-base-960h` | Audio feature extractor | [Hugging Face](https:\u002F\u002Fhuggingface.co\u002Ffacebook\u002Fwav2vec2-base-960h), [ModelScope](https:\u002F\u002Fmodelscope.cn\u002Fmodels\u002Ffacebook\u002Fwav2vec2-base-960h) |\n\n```bash\n# If you are in mainland China, you can use a mirror first:\n# export HF_ENDPOINT=https:\u002F\u002Fhf-mirror.com\n\nhf download Soul-AILab\u002FSoulX-FlashHead-1_3B \\\n  --local-dir .\u002Fcheckpoints\u002FSoulX-FlashHead-1_3B\n\nhf download facebook\u002Fwav2vec2-base-960h \\\n  --local-dir .\u002Fcheckpoints\u002Fwav2vec2-base-960h\n```\n\n#### LiveAct (SoulX-LiveAct)\n\n| ModelName | Download |\n|-----------|----------|\n| SoulX-LiveAct | [Hugging Face](https:\u002F\u002Fhuggingface.co\u002FSoul-AILab\u002FLiveAct), [ModelScope](https:\u002F\u002Fmodelscope.cn\u002Fmodels\u002FSoul-AILab\u002FLiveAct) |\n| chinese-wav2vec2-base | [Hugging Face](https:\u002F\u002Fhuggingface.co\u002FTencentGameMate\u002Fchinese-wav2vec2-base), [ModelScope](https:\u002F\u002Fmodelscope.cn\u002Fmodels\u002FTencentGameMate\u002Fchinese-wav2vec2-base) |\n\n```bash\nhf download Soul-AILab\u002FLiveAct \\\n  --local-dir .\u002Fcheckpoints\u002FLiveAct\n\nhf download TencentGameMate\u002Fchinese-wav2vec2-base \\\n  --local-dir .\u002Fcheckpoints\u002Fchinese-wav2vec2-base\n```\n\n### Configure Avatar Inference\n\nSet `enabled: true`, then update the model paths to match your local checkpoints:\n\n```yaml\ninference:\n  avatar:\n    enabled: true\n    default: \"flash_head\"               # selects which 
avatar model to start; if set to live_act, fill the live_act section below\n    runtime:\n      cuda_visible_devices: 0      # shared GPU ID(s), e.g. 0,1 for multi-GPU\n      world_size: 1                # shared GPU count, set to 2 for dual-GPU\n    flash_head:\n      checkpoint_dir: \".\u002Fcheckpoints\u002FSoulX-FlashHead-1_3B\"  # ← your path\n      wav2vec_dir: \".\u002Fcheckpoints\u002Fwav2vec2-base-960h\"        # ← your path\n      model_type: \"lite\"           # \"pro\" for higher quality (needs more GPU)\n      compile_model: true\n      compile_vae: true\n      dist_worker_main_thread: true\n      infer_params:\n        frame_num: 33\n        motion_frames_latent_num: 2\n        tgt_fps: 20\n        sample_rate: 16000\n        sample_shift: 5\n        color_correction_strength: 1.0\n        cached_audio_duration: 8\n        num_heads: 12\n        height: 512\n        width: 512\n    live_act:\n      ckpt_dir: \".\u002Fcheckpoints\u002FLiveAct\"                     # ← your path\n      wav2vec_dir: \".\u002Fcheckpoints\u002Fchinese-wav2vec2-base\"   # ← your path\n      seed: 42\n      fp8_gemm: true\n      fp4_gemm: false\n      compile_wan_model: false\n      compile_vae_decode: false\n      dist_worker_main_thread: true\n      default_prompt: \"一个人在说话\"\n      infer_params:\n        size: \"320*480\"\n        fps: 20\n        audio_cfg: 1.0\n```\n\nYou can also adjust these options later in the web UI.\n\n### LiveAct FP4 GEMM (Optional)\n\nFP4 acceleration requires building and installing `lightx2v_kernel` from [LightX2V](https:\u002F\u002Fgithub.com\u002FModelTC\u002FLightX2V). 
Use PyTorch **2.7+** and a CUTLASS checkout on the build machine.\n\n#### Preparation\n\n```bash\npip install scikit_build_core uv\n```\n\n#### Build wheel\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FNVIDIA\u002Fcutlass.git\ngit clone https:\u002F\u002Fgithub.com\u002FModelTC\u002FLightX2V.git\ncd LightX2V\u002Flightx2v_kernel\n# Replace \u002Fpath\u002Fto\u002Fcutlass with the absolute path to your cutlass clone.\nMAX_JOBS=$(nproc) && CMAKE_BUILD_PARALLEL_LEVEL=$(nproc) \\\nuv build --wheel \\\n    -Cbuild-dir=build . \\\n    -Ccmake.define.CUTLASS_PATH=\u002Fpath\u002Fto\u002Fcutlass \\\n    --verbose \\\n    --color=always \\\n    --no-build-isolation\n```\n\n#### Install wheel\n\n```bash\npip install dist\u002F*.whl --force-reinstall --no-deps\n```\n\n#### Enable in CyberVerse\n\nIn `cyberverse_config.yaml` (or the web UI), under `inference.avatar.live_act`:\n\n```yaml\nfp8_gemm: false\nfp4_gemm: true\n```\n\nRestart the inference service after changing these flags.\n\n### SageAttention & FlashAttention (Optional)\n\n```bash\n# SageAttention (source build)\ngit clone https:\u002F\u002Fgithub.com\u002Fthu-ml\u002FSageAttention.git\ncd SageAttention\nexport EXT_PARALLEL=4 NVCC_APPEND_FLAGS=\"--threads 8\" MAX_JOBS=32 # Optional\npython setup.py install\n```\n\n```bash\n# FlashAttention (optional)\nwget -O flash_attn-2.8.1+cu12torch2.8cxx11abiTRUE-cp312-cp312-linux_x86_64.whl \\\n  \"https:\u002F\u002Fgithub.com\u002FDao-AILab\u002Fflash-attention\u002Freleases\u002Fdownload\u002Fv2.8.1\u002Fflash_attn-2.8.1%2Bcu12torch2.8cxx11abiTRUE-cp312-cp312-linux_x86_64.whl\"\n\npip install flash_attn-2.8.1+cu12torch2.8cxx11abiTRUE-cp312-cp312-linux_x86_64.whl\n```\n\n### Avatar Hardware Benchmarks\n\nRealtime digital-human video requires GPU acceleration. Below are benchmarks for FlashHead and LiveAct avatar models:\n\n| Model | Quality | GPU | Count | Resolution | FPS | Real-time? 
|\n|-------|---------|-----|-------|------------|-----|------------|\n| FlashHead 1.3B | Pro | RTX 5090 | 2 | 512×512 | 25+ | ✅ Yes |\n| FlashHead 1.3B | Pro | RTX 5090 | 1 | 464×464 | 20 | ✅ Yes |\n| FlashHead 1.3B | Pro | RTX PRO 6000 | 1 | 512×512 | 20 | ✅ Yes |\n| FlashHead 1.3B | Pro | RTX 4090 | 1 | 512×512 | ~10.8 | ❌ No |\n| FlashHead 1.3B | Lite | RTX 4090 | 1 | 512×512 | 25+ | ✅ Yes |\n| LiveAct 18B | — | RTX PRO 6000 | 2 | 320×480 | 20 | ✅ Yes |\n| LiveAct 18B | — | RTX PRO 6000 | 1 | 256×417 | 20 | ✅ Yes |\n\n> **Pro** favors visual quality; **Lite** favors speed. The table reflects typical **quality–compute** balances: more GPU headroom lets you push higher quality, while tighter hardware calls for lower settings (resolution, **Pro** vs **Lite**, etc.) to stay realtime.\n\nWhen avatar inference is enabled, `make inference` reads `inference.avatar.default` from `cyberverse_config.yaml` and initializes exactly that one avatar model in the current inference process. Wait until you see:\n\n- `Active avatar model initialized: \u003Cmodel_name>`\n- `CyberVerse Inference Server started on port 50051`\n\n## QA — Self-Check\n\nUse this section when avatar video **stutters, freezes, or falls behind** audio. The first step is to confirm whether inference can keep up with playback.\n\n### Check RTP from inference logs\n\n**RTP** (real-time performance factor) compares how long a chunk took to generate versus how long that chunk lasts at the configured FPS:\n\n```text\nRTP = elapsed \u002F (frames \u002F fps)\n```\n\n| RTP | Meaning |\n|-----|---------|\n| **&lt; 1** | Inference is faster than playback, leaving headroom for realtime streaming |\n| **= 1** | Exactly realtime |\n| **&gt; 1** | Inference is slower than playback: frames are produced more slowly than they are consumed, so video will lag or stutter |\n\nWatch the inference terminal (`make inference`) while the character is speaking. 
Look for **LiveAct** or **FlashHead** chunk lines.\n\n**LiveAct example (RTP &gt; 1 — cannot keep realtime):**\n\n```text\nINFO:inference.plugins.avatar.live_act_plugin:LiveAct chunk: idx=2 frames=32 320x480 fps=20 iter=2 elapsed=1.870s is_final=False\n```\n\n- Playback duration: `32 \u002F 20 = 1.6` s  \n- RTP: `1.870 \u002F 1.6 ≈ 1.17` (**&gt; 1** → too slow for 320×480 @ 20 fps on this GPU)\n\n**FlashHead** logs use the same idea (`elapsed` vs `num_frames` \u002F `fps`):\n\n```text\nINFO:...FlashHead video chunk generated: chunk_index=1 num_frames=33 512x512 fps=20 ... elapsed=2.100s\n```\n\nHere RTP = `2.100 \u002F (33\u002F20) ≈ 1.27` — also above realtime.\n\n### What to do when RTP &gt; 1\n\n1. **Lower resolution or quality** — e.g. LiveAct `infer_params.size`, FlashHead `height` \u002F `width`, or FlashHead `model_type: \"lite\"` instead of `\"pro\"`.\n2. **Add compute** — more GPUs (`runtime.world_size`, `cuda_visible_devices`), enable FP8\u002FFP4 GEMM or compile options where supported, or use a faster GPU.\n3. **Match the benchmark table** — pick a resolution\u002FFPS\u002FGPU row marked **Yes** under **Real-time?** in [Avatar Hardware Benchmarks](#avatar-hardware-benchmarks) above.\n\nPure voice mode (`inference.avatar.enabled: false`) does not use avatar RTP; stutter there is usually network\u002FWebRTC or upstream voice latency — see [Remote Access Notes](#remote-access-notes).\n\n## Remote Access Notes\n\nWhen `streaming_mode: direct` uses the embedded TURN server, the browser must be able to reach the server's `8443\u002FTCP`. If the page loads but audio\u002Fvideo never connects, or the server logs show `ICE connection state: failed` or `publish timeout waiting for connection`, first check whether your machine can reach port `8443` on the server:\n\n```bash\nnc -vz \u003Cserver-ip> 8443\n```\n\nIf `8443` is not reachable, the usual cause is a cloud security group, firewall, or NAT restriction. 
In that case, you can forward your local `8443` to the server through an SSH tunnel:\n\n```bash\nssh -L 8443:127.0.0.1:8443 user@host -p port\n```\n\nAfter the tunnel is established, the browser will access the remote TURN service through local `127.0.0.1:8443`.\n\nIf you want the browser to connect to the remote server directly instead of through an SSH tunnel, set `pipeline.ice_public_ip` in `cyberverse_config.yaml` to the server's public IP or domain. If you are using an SSH tunnel, you can keep the default value (`127.0.0.1`).\n\n## Roadmap\n\n### 1. **Realtime Audio\u002FVideo Agent Platform**\n\nMake voice-first realtime agents easy to run, customize, and embed.\n\n- [x] Character CRUD with multiple reference images, active image, fixed\u002Frandom display mode, optional face crop, tags, voice fields, personality, welcome message, and system prompt\n- [x] Realtime voice sessions over WebRTC — direct P2P (embedded TURN) or LiveKit SFU\n- [x] Pure voice sessions with `inference.avatar.enabled: false`\n- [x] Pluggable modules (omni model, LLM, TTS, ASR, embedding, RAG, avatar); configure different vendors' API keys via YAML and UI settings\n- [x] Session management: per-character chat history persisted to disk and loaded when a conversation starts\n- [x] Voice cloning: supports Doubao voice cloning\n- [x] Hybrid input: supports both voice and text in the same conversation\n- [x] Voice interruption while the model is speaking, plus session pause and resume\n- [x] User camera input and screen-sharing visual frames in standard mode and supported omni sessions\n- [x] PersonaAgent and background SubAgent task execution\n- [x] Import knowledge, documents, and biographical material for character-grounded RAG Q&A\n- [ ] Embeddable for developers (Web component or SDK) to integrate self-hosted instances into their own sites\n- [ ] Live streaming: audio\u002Fvideo output for broadcast-style use cases\n\n### 2. 
**Realtime Digital-Human Calls**\n\nWhen Avatar GPU resources are available, turn the voice Agent into a realtime video call.\n\n- [x] Realtime avatar video driven from reference images via configurable avatar plugins (e.g. FlashHead, LiveAct)\n- [x] Cached idle video playback for character presence\n- [x] Audio\u002Fvideo synchronization for realtime speaking segments\n- [ ] More avatar backends with different quality\u002Flatency\u002Fcost tradeoffs\n- [ ] Better avatar deployment profiles for consumer GPU, workstation GPU, and cloud GPU environments\n\n### 3. **Agent Network**\n\nConnect multiple agents so they can communicate, collaborate, and form networks.\n\n- [ ] Enable agent-to-agent communication\n- [ ] Enable multi-agent collaboration and delegation\n- [ ] Enable shared memory and shared knowledge between agents\n- [ ] Build an open network of connected agents\n\n## Community\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"docs\u002Fassets\u002Fwechat_group.jpg\">\u003Cimg src=\"docs\u002Fassets\u002Fwechat_group.jpg\" alt=\"CyberVerse WeChat group QR code\" width=\"320\"\u002F>\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">If the QR code has expired, add the maintainer on WeChat: \u003Cstrong>wx_dsd2077\u003C\u002Fstrong>. 
Please mention \u003Cstrong>CyberVerse\u003C\u002Fstrong> in your friend request; we will invite you to the group.\u003C\u002Fp>\n\n## Star History\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fstar-history.com\u002F#dsd2077\u002FCyberVerse&Date\">\n    \u003Cimg src=\"https:\u002F\u002Fapi.star-history.com\u002Fsvg?repos=dsd2077\u002FCyberVerse&type=Date\" alt=\"Star History Chart\" width=\"100%\"\u002F>\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\n## License\n\nGNU General Public License v3.0 — see [LICENSE](LICENSE).\n\n## Acknowledgements\n\n- [SoulX-FlashHead](https:\u002F\u002Fgithub.com\u002FSoul-AILab\u002FSoulX-FlashHead) — Avatar model by Soul AI Lab\n- [SoulX-LiveAct](https:\u002F\u002Fgithub.com\u002FSoul-AILab\u002FSoulX-LiveAct) - Avatar model by Soul AI Lab\n- [MuseTalk](https:\u002F\u002Fgithub.com\u002FTMElyralab\u002FMuseTalk) — Real-time lip-sync model by TME Lyra Lab\n- [Pion](https:\u002F\u002Fgithub.com\u002Fpion\u002Fwebrtc) — Go WebRTC implementation\n- [Linux.do](https:\u002F\u002Flinux.do\u002F)\n",2,"2026-06-11 04:11:24","high_star"]