[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-73995":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":15,"forks30d":15,"starsTrendScore":19,"compositeScore":20,"rankGlobal":9,"rankLanguage":9,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":22,"topics":25,"createdAt":9,"pushedAt":9,"updatedAt":26,"readmeContent":27,"aiSummary":28,"trendingCount":15,"starSnapshotCount":15,"syncStatus":29,"lastSyncTime":30,"discoverSource":31},73995,"personaplex","NVIDIA\u002Fpersonaplex","NVIDIA","PersonaPlex code.",null,"Python",9981,1395,98,35,0,14,44,166,42,115.43,"MIT License",false,"main",true,[],"2026-06-12 04:01:12","# PersonaPlex: Voice and Role Control for Full Duplex Conversational Speech Models\n\n[![Weights](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🤗-Weights-yellow)](https:\u002F\u002Fhuggingface.co\u002Fnvidia\u002Fpersonaplex-7b-v1)\n[![Paper](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F📄-Paper-blue)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.06053)\n[![Demo](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🎮-Demo-green)](https:\u002F\u002Fresearch.nvidia.com\u002Flabs\u002Fadlr\u002Fpersonaplex\u002F)\n[![Discord](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDiscord-Join-purple?logo=discord)](https:\u002F\u002Fdiscord.gg\u002F5jAXrrbwRb)\n\nPersonaPlex is a real-time, full-duplex speech-to-speech conversational model that enables persona control through text-based role prompts and audio-based voice conditioning. Trained on a combination of synthetic and real conversations, it produces natural, low-latency spoken interactions with a consistent persona. 
PersonaPlex is based on the [Moshi](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.00037) architecture and weights.\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Farchitecture_diagram.png\" alt=\"PersonaPlex Model Architecture\">\n  \u003Cbr>\n  \u003Cem>PersonaPlex Architecture\u003C\u002Fem>\n\u003C\u002Fp>\n\n## Usage\n\n### Prerequisites\n\nInstall the [Opus audio codec](https:\u002F\u002Fgithub.com\u002Fxiph\u002Fopus) development library:\n```bash\n# Ubuntu\u002FDebian\nsudo apt install libopus-dev\n\n# Fedora\u002FRHEL\nsudo dnf install opus-devel\n```\n\n### Installation\n\nDownload this repository and install with:\n```bash\npip install moshi\u002F.\n```\n\nExtra step for Blackwell-based GPUs, as suggested in [issue #2](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002Fpersonaplex\u002Fissues\u002F2):\n```bash\npip install torch torchvision torchaudio --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu130\n```\n\n\n### Accept Model License\nLog in to your Hugging Face account and accept the PersonaPlex model license [here](https:\u002F\u002Fhuggingface.co\u002Fnvidia\u002Fpersonaplex-7b-v1). \u003Cbr>\nThen set up your Hugging Face authentication:\n```bash\nexport HF_TOKEN=\u003CYOUR_HUGGINGFACE_TOKEN>\n```\n\n### Launch Server\n\nLaunch the server for live interaction (this uses temporary SSL certificates for HTTPS):\n```bash\nSSL_DIR=$(mktemp -d); python -m moshi.server --ssl \"$SSL_DIR\"\n```\n\n**CPU Offload:** If your GPU has insufficient memory, use the `--cpu-offload` flag to offload model layers to CPU. 
This requires the `accelerate` package (`pip install accelerate`):\n```bash\nSSL_DIR=$(mktemp -d); python -m moshi.server --ssl \"$SSL_DIR\" --cpu-offload\n```\n\nAccess the Web UI from a browser at `localhost:8998` if running locally; otherwise, look for the access link printed by the script:\n```\nAccess the Web UI directly at https:\u002F\u002F11.54.401.33:8998\n```\n\n### Offline Evaluation\n\nFor offline evaluation, use the offline script, which streams in an input WAV file and writes an output WAV file from the captured output stream. The output file has the same duration as the input file.\n\nAdd `--cpu-offload` to any command below if your GPU has insufficient memory (requires the `accelerate` package). Alternatively, install CPU-only PyTorch to run offline evaluation entirely on CPU.\n\n**Assistant example:**\n```bash\nHF_TOKEN=\u003CTOKEN> \\\npython -m moshi.offline \\\n  --voice-prompt \"NATF2.pt\" \\\n  --input-wav \"assets\u002Ftest\u002Finput_assistant.wav\" \\\n  --seed 42424242 \\\n  --output-wav \"output.wav\" \\\n  --output-text \"output.json\"\n```\n\n**Service example:**\n```bash\nHF_TOKEN=\u003CTOKEN> \\\npython -m moshi.offline \\\n  --voice-prompt \"NATM1.pt\" \\\n  --text-prompt \"$(cat assets\u002Ftest\u002Fprompt_service.txt)\" \\\n  --input-wav \"assets\u002Ftest\u002Finput_service.wav\" \\\n  --seed 42424242 \\\n  --output-wav \"output.wav\" \\\n  --output-text \"output.json\"\n```\n\n## Voices\n\nPersonaPlex supports a wide range of voices; we pre-package embeddings for voices that sound more natural and conversational (NAT) and others that are more varied (VAR). 
The fixed set of voices is labeled:\n```\nNatural(female): NATF0, NATF1, NATF2, NATF3\nNatural(male):   NATM0, NATM1, NATM2, NATM3\nVariety(female): VARF0, VARF1, VARF2, VARF3, VARF4\nVariety(male):   VARM0, VARM1, VARM2, VARM3, VARM4\n```\n\n## Prompting Guide\n\nThe model is trained on synthetic conversations for a fixed assistant role and varying customer service roles.\n\n### Assistant Role\n\nThe assistant role has the prompt:\n```\nYou are a wise and friendly teacher. Answer questions or provide advice in a clear and engaging way.\n```\n\nUse this prompt for the QA-assistant-focused \"User Interruption\" evaluation category in [FullDuplexBench](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.04721).\n\n### Customer Service Roles\n\nThe customer service roles support a variety of prompts. The following examples illustrate the prompting style:\n```\nYou work for CitySan Services which is a waste management and your name is Ayelen Lucero. Information: Verify customer name Omar Torres. Current schedule: every other week. Upcoming pickup: April 12th. Compost bin service available for $8\u002Fmonth add-on.\n```\n```\nYou work for Jerusalem Shakshuka which is a restaurant and your name is Owen Foster. Information: There are two shakshuka options: Classic (poached eggs, $9.50) and Spicy (scrambled eggs with jalapenos, $10.25). Sides include warm pita ($2.50) and Israeli salad ($3). No combo offers. Available for drive-through until 9 PM.\n```\n```\nYou work for AeroRentals Pro which is a drone rental company and your name is Tomaz Novak. Information: AeroRentals Pro has the following availability: PhoenixDrone X ($65\u002F4 hours, $110\u002F8 hours), and the premium SpectraDrone 9 ($95\u002F4 hours, $160\u002F8 hours). 
Deposit required: $150 for standard models, $300 for premium.\n```\n\n### Casual Conversations\n\nThe model is also trained on real conversations from the [Fisher English Corpus](https:\u002F\u002Fcatalog.ldc.upenn.edu\u002FLDC2004T19) with LLM-labeled prompts for open-ended conversations. Here are some example prompts for casual conversations:\n```\nYou enjoy having a good conversation.\n```\n```\nYou enjoy having a good conversation. Have a casual discussion about eating at home versus dining out.\n```\n```\nYou enjoy having a good conversation. Have an empathetic discussion about the meaning of family amid uncertainty.\n```\n```\nYou enjoy having a good conversation. Have a reflective conversation about career changes and feeling of home. You have lived in California for 21 years and consider San Francisco your home. You work as a teacher and have traveled a lot. You dislike meetings.\n```\n```\nYou enjoy having a good conversation. Have a casual conversation about favorite foods and cooking experiences. You are David Green, a former baker now living in Boston. You enjoy cooking diverse international dishes and appreciate many ethnic restaurants.\n```\n\nUse the prompt `You enjoy having a good conversation.` for the \"Pause Handling\", \"Backchannel\" and \"Smooth Turn Taking\" evaluation categories of FullDuplexBench.\n\n## Generalization\n\nPersonaPlex fine-tunes Moshi and benefits from the generalization capabilities of the underlying [Helium](https:\u002F\u002Fkyutai.org\u002Fblog\u002F2025-04-30-helium) LLM. Thanks to the broad training corpus of the backbone, we find that the model responds plausibly to out-of-distribution prompts, which can lead to unexpected or fun conversations. We encourage experimentation with different prompts to test the model's emergent ability to handle scenarios outside its training distribution. As inspiration, we feature the following astronaut prompt in the WebUI:\n```\nYou enjoy having a good conversation. 
Have a technical discussion about fixing a reactor core on a spaceship to Mars. You are an astronaut on a Mars mission. Your name is Alex. You are already dealing with a reactor core meltdown on a Mars mission. Several ship systems are failing, and continued instability will lead to catastrophic failure. You explain what is happening and you urgently ask for help thinking through how to stabilize the reactor.\n```\n\n## License\n\nThe present code is provided under the MIT license. The weights for the models are released under the NVIDIA Open Model license.\n\n## Citation\n\nIf you use PersonaPlex in your research, please cite our paper:\n```bibtex\n@misc{roy2026personaplexvoicerolecontrol,\n      title={PersonaPlex: Voice and Role Control for Full Duplex Conversational Speech Models}, \n      author={Rajarshi Roy and Jonathan Raiman and Sang-gil Lee and Teodor-Dumitru Ene and Robert Kirby and Sungwon Kim and Jaehyeon Kim and Bryan Catanzaro},\n      year={2026},\n      eprint={2602.06053},\n      archivePrefix={arXiv},\n      primaryClass={cs.CL},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.06053}, \n}\n```\n","PersonaPlex is a real-time, full-duplex conversational speech model that controls persona through text-based role prompts and audio-based voice conditioning. The project is written in Python; its core capabilities include low-latency natural conversation generation and consistent persona behavior, and it is built on the Moshi architecture. Technically, it supports flexible customization of role and voice, and it was trained on both synthetic and real conversation datasets. It suits applications that need personalized voice interaction, such as virtual assistants and game character voice-over. The project is released under the MIT license, with active community support and detailed documentation that help developers get started quickly and build on it.",2,"2026-06-11 03:48:18","high_star"]