[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-2667":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":25,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":37,"readmeContent":38,"aiSummary":39,"trendingCount":16,"starSnapshotCount":16,"syncStatus":40,"lastSyncTime":41,"discoverSource":42},2667,"NeMo","NVIDIA-NeMo\u002FNeMo","NVIDIA-NeMo","A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)","https:\u002F\u002Fdocs.nvidia.com\u002Fnemo\u002Fspeech\u002Fnightly\u002Findex.html",null,"Python",17348,3432,232,82,0,10,48,153,45,120,"Apache License 2.0",false,"main",true,[27,28,29,30,31,32,33,34,35,36],"asr","deeplearning","generative-ai","machine-translation","neural-networks","speaker-diariazation","speaker-recognition","speech-synthesis","speech-translation","tts","2026-06-12 04:00:15","[![Project Status: Active -- The project has reached a stable, usable state and is being actively 
developed.](http:\u002F\u002Fwww.repostatus.org\u002Fbadges\u002Flatest\u002Factive.svg)](http:\u002F\u002Fwww.repostatus.org\u002F#active)\n[![Documentation](https:\u002F\u002Freadthedocs.com\u002Fprojects\u002Fnvidia-nemo\u002Fbadge\u002F?version=main)](https:\u002F\u002Fdocs.nvidia.com\u002Fnemo\u002Fspeech\u002Fnightly\u002F)\n[![CodeQL](https:\u002F\u002Fgithub.com\u002Fnvidia\u002Fnemo\u002Factions\u002Fworkflows\u002Fcodeql.yml\u002Fbadge.svg?branch=main&event=push)](https:\u002F\u002Fgithub.com\u002Fnvidia\u002Fnemo\u002Factions\u002Fworkflows\u002Fcodeql.yml)\n[![NeMo core license and license for collections in this repo](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-Apache%202.0-brightgreen.svg)](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FNeMo\u002Fblob\u002Fmaster\u002FLICENSE)\n[![Release version](https:\u002F\u002Fbadge.fury.io\u002Fpy\u002Fnemo-toolkit.svg)](https:\u002F\u002Fbadge.fury.io\u002Fpy\u002Fnemo-toolkit)\n[![Python version](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fpyversions\u002Fnemo-toolkit.svg)](https:\u002F\u002Fbadge.fury.io\u002Fpy\u002Fnemo-toolkit)\n[![PyPI total downloads](https:\u002F\u002Fstatic.pepy.tech\u002Fpersonalized-badge\u002Fnemo-toolkit?period=total&units=international_system&left_color=grey&right_color=brightgreen&left_text=downloads)](https:\u002F\u002Fpepy.tech\u002Fproject\u002Fnemo-toolkit)\n[![Code style: black](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fcode%20style-black-000000.svg)](https:\u002F\u002Fgithub.com\u002Fpsf\u002Fblack)\n\n# **NVIDIA NeMo Speech**\nCheck out our [HuggingFace🤗 collection](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fnvidia\u002Fnemotron-speech) for the latest\nopen-weight checkpoints and demos!\n\n## Updates\n\n> The first release of NeMo Speech after the NeMo repository split is scheduled for June 2026, as the repo undergoes transformation.\n> For the latest stable release, please use [the 26.02 NGC 
container](https:\u002F\u002Fcatalog.ngc.nvidia.com\u002Forgs\u002Fnvidia\u002Fcontainers\u002Fnemo?version=26.02).\n\n- 2026-04: [Parakeet-unified-en-0.6b](https:\u002F\u002Fhuggingface.co\u002Fnvidia\u002Fparakeet-unified-en-0.6b) has been released, offering high-quality offline and streaming inference (minimum latency of 160 ms) in a single English model with punctuation and capitalization support.\n- 2026-03: [Nemotron 3 VoiceChat](https:\u002F\u002Fbuild.nvidia.com\u002Fnvidia\u002Fnemotron-voicechat\u002Fmodelcard) is now available in Early Access. Built on the Nemotron Nano v2 LLM backbone with Nemotron speech and TTS decoder, VoiceChat delivers full-duplex, natural, interruptible conversations with low latency. Try out [the demo](https:\u002F\u002Fbuild.nvidia.com\u002Fnvidia\u002Fnemotron-voicechat) and apply for [early access](https:\u002F\u002Fdeveloper.nvidia.com\u002Fnemotron-voicechat-early-access).\n- 2026-03: [Nemotron-Speech-Streaming v2603](https:\u002F\u002Fhuggingface.co\u002Fnvidia\u002Fnemotron-speech-streaming-en-0.6b) has been\n    updated. It has been trained on a larger and more diverse corpus, resulting in lower WER across all latency modes.\n    Try out [the demo](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fnvidia\u002Fnemotron-speech-streaming-en-0.6b) and check out\n    [the NIM](https:\u002F\u002Fbuild.nvidia.com\u002Fnvidia\u002Fnemotron-asr-streaming).\n- 2026-03: [MagpieTTS v2602](https:\u002F\u002Fhuggingface.co\u002Fnvidia\u002Fmagpie_tts_multilingual_357m) has been released with support\n    for 9 languages (En, Es, De, Fr, Vi, It, Zh, Hi, Ja). 
Try out\n    [the demo](https:\u002F\u002Fhuggingface.co\u002Fnvidia\u002Fmagpie_tts_multilingual_357m) and check out\n    [the NIM](https:\u002F\u002Fbuild.nvidia.com\u002Fnvidia\u002Fmagpie-tts-multilingual).\n- 2026-01: Nemotron-Speech-Streaming was released: one checkpoint that lets users pick their optimal point\n    on the latency-accuracy Pareto curve!\n- 2026-01: MagpieTTS was released.\n- 2026: This repo has pivoted to focus on audio, speech, and multimodal LLMs. For the last NeMo release with support for more\n    modalities, see [v2.7.0](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FNeMo\u002Freleases\u002Ftag\u002Fv2.7.0).\n- 2025-08: [Parakeet V3](https:\u002F\u002Fhuggingface.co\u002Fnvidia\u002Fparakeet-tdt-0.6b-v3) and\n    [Canary V2](https:\u002F\u002Fhuggingface.co\u002Fnvidia\u002Fcanary-1b-v2) have been released with speech recognition and translation\n    support for 25 European languages.\n- 2025-06: [Canary-Qwen-2.5B](https:\u002F\u002Fhuggingface.co\u002Fnvidia\u002Fcanary-qwen-2.5b) has been released with a record-setting\n    5.63% WER on the English Open ASR Leaderboard.\n\n## Introduction\n\nNVIDIA NeMo Speech is built for researchers and PyTorch developers working on speech models, including Automatic Speech\nRecognition (ASR), Text-to-Speech (TTS), and Speech LLMs. It is designed to help you efficiently create, customize, and\ndeploy new AI models by leveraging existing code and pre-trained model checkpoints.\n\nFor technical documentation, please see the\n[NeMo Framework User Guide](https:\u002F\u002Fdocs.nvidia.com\u002Fnemo\u002Fspeech\u002Fnightly\u002F).\n\n## Requirements\n\n- Python 3.12 or above\n- PyTorch 2.6 or above\n- NVIDIA GPU (if you intend to do model training)\n\nAs of [PyTorch 2.6](https:\u002F\u002Fdocs.pytorch.org\u002Fdocs\u002Fstable\u002Fnotes\u002Fserialization.html#torch-load-with-weights-only-true),\n`torch.load` defaults to using `weights_only=True`. 
Some model checkpoints may require using `weights_only=False`.\nIn this case, you can set the env var `TORCH_FORCE_NO_WEIGHTS_ONLY_LOAD=1` before running code that uses `torch.load`.\nHowever, this should only be done with trusted files. Loading anything beyond weights from an untrusted source\ncarries a risk of arbitrary code execution.\n\n## Developer Documentation\n\n| Version | Status                                                                                                                                                              | Description                                                                                                                    |\n| ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------ |\n| Latest  | [![Documentation Status](https:\u002F\u002Freadthedocs.com\u002Fprojects\u002Fnvidia-nemo\u002Fbadge\u002F?version=main)](https:\u002F\u002Fdocs.nvidia.com\u002Fdeeplearning\u002Fnemo\u002Fuser-guide\u002Fdocs\u002Fen\u002Fmain\u002F)     | [Documentation of the latest (i.e. main) branch.](https:\u002F\u002Fdocs.nvidia.com\u002Fnemo\u002Fspeech\u002Fnightly\u002F)          |\n| Stable  | [![Documentation Status](https:\u002F\u002Freadthedocs.com\u002Fprojects\u002Fnvidia-nemo\u002Fbadge\u002F?version=stable)](https:\u002F\u002Fdocs.nvidia.com\u002Fdeeplearning\u002Fnemo\u002Fuser-guide\u002Fdocs\u002Fen\u002Fstable\u002F) | Documentation of the stable (i.e. 
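most recent release) - To be added |\n\n## Install NeMo Speech\n\nNeMo Speech is installable via pip: `pip install 'nemo-toolkit[all]'`\nTo install with extra dependencies for CUDA 12.x or 13.x, use `pip install 'nemo-toolkit[all,cu12]'`\nor `pip install 'nemo-toolkit[all,cu13]'` respectively.\n\n## Contribute to NeMo\n\nWe welcome community contributions! Please refer to\n[CONTRIBUTING.md](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FNeMo\u002Fblob\u002Fmain\u002FCONTRIBUTING.md) for the process.\n\n## Licenses\n\nNeMo is licensed under the [Apache License 2.0](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FNeMo?tab=Apache-2.0-1-ov-file).\n","NVIDIA NeMo is a scalable generative AI framework designed for researchers and developers, focused on large language models, multimodal models, and speech AI (including automatic speech recognition and text-to-speech). The project is written in Python, supports a range of deep learning tasks such as machine translation and neural network construction, and offers strong speech synthesis and recognition capabilities. NeMo is particularly suited to applications that need high-performance speech or natural language processing, such as intelligent assistants, voice chatbots, and multilingual service systems. In addition, NeMo provides a rich set of pre-trained models and easy-to-use APIs that help users quickly build prototypes and carry out custom development.",2,"2026-06-11 02:50:41","top_language"]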