[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-77859":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":15,"forks30d":15,"starsTrendScore":19,"compositeScore":20,"rankGlobal":9,"rankLanguage":9,"license":9,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":23,"hasPages":23,"topics":24,"createdAt":9,"pushedAt":9,"updatedAt":25,"readmeContent":26,"aiSummary":27,"trendingCount":15,"starSnapshotCount":15,"syncStatus":16,"lastSyncTime":28,"discoverSource":29},77859,"Mega-ASR","xzf-thu\u002FMega-ASR","xzf-thu","First foundation ASR built for the real world - 7 atomic acoustic conditions, 54 compound scenarios, 2.6M samples, and up to ~30% gains over SOTA where every other model falls apart. **You'll come back to MEGA-ASR, after the rest fail in the wild. ⭐**",null,"Python",976,63,16,20,0,2,29,951,12,9.42,false,"main",true,[],"2026-06-12 02:03:45","\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Ffigures\u002Fmega_asr_logo.png\" alt=\"Mega-ASR Logo\" width=\"15%\">\n\u003C\u002Fp>\n\n\n\u003Ch1 align=\"center\">Mega-ASR: Towards In-the-Wild^2 Speech Recognition via Scaling Up Real-world Acoustic Simulation\u003C\u002Fh1>\n\nWe introduce **MEGA-ASR**, the first foundation ASR model to target **full-scenario robust speech recognition in the wild** through systematic training on **7 atomic acoustic conditions** and **54 compound acoustic scenarios**. Built on **2.6M training samples** covering **noise, far-field speech, obstruction, echo and reverberation, recording artifacts, electronic distortion, and transmission dropout**, MEGA-ASR uses **A2S-SFT** and **DG-WGPO based RL** to achieve **up to nearly 30% gains** over leading open and closed source SOTA models in challenging acoustic environments. 
If you find Mega-ASR useful, please give us a star ✨.\n\n\u003Cp align=\"center\">\u003Cu>\u003Cem>You’ll come back to Mega-ASR — after finding the rest fail in the real world.\u003C\u002Fem>\u003C\u002Fu>\u003C\u002Fp>\n\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.19833\">Technical Report 📖\u003C\u002Fa> \u002F\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fzhifeixie\u002FVoices-in-the-Wild-2M\">Voices-in-the-Wild-2M 🤗\u003C\u002Fa> \u002F\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fzhifeixie\u002FMega-ASR\">Mega-ASR Weights 🤗\u003C\u002Fa> \u002F\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fxzf-thu\u002FVoices-in-the-Wild-Bench\">Voices-in-the-Wild-Bench 🏆\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fxzf-thu\u002FMega-ASR\u002Fraw\u002Fmain\u002Fassets\u002Fwechat.jpg\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FWeChat-Join%20Group-07C160?logo=wechat&logoColor=white\" alt=\"WeChat\">\u003C\u002Fa>&nbsp;\u003Ca href=\"https:\u002F\u002Fxzf-thu.github.io\u002FMega-ASR\u002F\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject-Page-blue\" alt=\"Project Page\">\u003C\u002Fa>&nbsp;\u003Ca href=\"https:\u002F\u002Fx.com\u002FXieZhifei14110\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FX-@XieZhifei14110-black?logo=x&logoColor=white\" alt=\"X\">\u003C\u002Fa>\n\u003C\u002Fp>\n\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"\u002Fdocs\u002Fassets\u002Fdataset.png\" alt=\"Dataset overview\" width=\"100%\">\n\u003C\u002Fp>\n\n### Comparison with SOTA open-source and closed-source models\n\n#### Sample 1\n\n\u003Cdiv align=\"center\">\n  \u003Cvideo 
src=\"https:\u002F\u002Fprivate-user-images.githubusercontent.com\u002F201621992\u002F594835233-2d847f22-a6d4-4d84-9bec-79a39001f9ca.mp4?jwt=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3NzkyMDU0NDYsIm5iZiI6MTc3OTIwNTE0NiwicGF0aCI6Ii8yMDE2MjE5OTIvNTk0ODM1MjMzLTJkODQ3ZjIyLWE2ZDQtNGQ4NC05YmVjLTc5YTM5MDAxZjljYS5tcDQ_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjYwNTE5JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI2MDUxOVQxNTM5MDZaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1mODgyYWRlZGI3OThjZWZmNzg1ZDhmNDRiNDMxZjYzZmE0Njk5OWJjYWJkZTVhZmM0OTM0OTI4MWI3ZmEzMGI0JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZyZXNwb25zZS1jb250ZW50LXR5cGU9dmlkZW8lMkZtcDQifQ.qJS-ALDMknvRYFY73hGYmJ-WLzwtC4LRHJnHXlkpyyU\" controls width=\"300\">\u003C\u002Fvideo>\n\u003C\u002Fdiv>\n\n\n\u003Ctable>\n  \u003Ctr>\n    \u003Cth valign=\"top\">Ground Truth\u003C\u002Fth>\n    \u003Cth valign=\"top\">Mega-ASR (Ours)\u003C\u002Fth>\n    \u003Cth valign=\"top\">Qwen3-ASR\u003C\u002Fth>\n    \u003Cth valign=\"top\">Gemini-3-Pro\u003C\u002Fth>\n    \u003Cth valign=\"top\">Seed-ASR\u003C\u002Fth>\n    \u003Cth valign=\"top\">Whisper\u003C\u002Fth>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd valign=\"top\">...and said to him let us go and eat some honey. Whose honey? inquired Kobay cautiously. My father's, Soongoora replied. Oh, all right, I'm with you, said the tortoise eagerly, and away they went.\u003Cbr>\u003Cbr>\u003Cstrong>Reference\u003C\u002Fstrong>\u003C\u002Ftd>\n    \u003Ctd valign=\"top\">\u003Cspan style=\"color:#ef4444\">He\u003C\u002Fspan> said to him \u003Cspan style=\"color:#ef4444\">let's\u003C\u002Fspan> go and eat some honey. \u003Cspan style=\"color:#ef4444\">It's\u003C\u002Fspan> honey? inquired \u003Cspan style=\"color:#ef4444\">very\u003C\u002Fspan> cautiously. 
My father \u003Cspan style=\"color:#ef4444\">is Superabundant\u003C\u002Fspan> — oh, all right, \u003Cspan style=\"color:#ef4444\">I will\u003C\u002Fspan>, said \u003Cspan style=\"color:#ef4444\">to her\u003C\u002Fspan> eagerly, and away they went.\u003Cbr>\u003Cbr>\u003Cstrong>WER: \u003Cspan style=\"color:#22c55e\">47.1\u003C\u002Fspan> ✅\u003C\u002Fstrong>\u003C\u002Ftd>\n    \u003Ctd valign=\"top\">\u003Cspan style=\"color:#ef4444\">&lt;empty&gt;\u003C\u002Fspan>\u003Cbr>\u003Cbr>\u003Cstrong>WER: \u003Cspan style=\"color:#ef4444\">100.0\u003C\u002Fspan> 🔴\u003C\u002Fstrong>\u003C\u002Ftd>\n    \u003Ctd valign=\"top\">\u003Cspan style=\"color:#ef4444\">But tell me, that's how she met\u003C\u002Fspan> my father\u003Cspan style=\"color:#ef4444\">'s sister\u003C\u002Fspan>. Oh, all right. \u003Cspan style=\"color:#ef4444\">I wish... I really...\u003C\u002Fspan>\u003Cbr>\u003Cbr>\u003Cstrong>WER: \u003Cspan style=\"color:#ef4444\">86.1\u003C\u002Fspan> 🔴\u003C\u002Fstrong>\u003C\u002Ftd>\n    \u003Ctd valign=\"top\">My father \u003Cspan style=\"color:#ef4444\">is\u003C\u002Fspan>. Oh, all right, \u003Cspan style=\"color:#ef4444\">I wish you can\u003C\u002Fspan>.\u003Cbr>\u003Cbr>\u003Cstrong>WER: \u003Cspan style=\"color:#ef4444\">85.3\u003C\u002Fspan> 🔴\u003C\u002Fstrong>\u003C\u002Ftd>\n    \u003Ctd valign=\"top\">...to him... some honey... 
\u003Cspan style=\"color:#ef4444\">oh yeah\u003C\u002Fspan>...\u003Cbr>\u003Cbr>\u003Cstrong>WER: \u003Cspan style=\"color:#ef4444\">92.5\u003C\u002Fspan> 🔴\u003C\u002Fstrong>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n\u003C\u002Ftable>\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>More examples (Sample 2 – 6)\u003C\u002Fstrong>\u003C\u002Fsummary>\n\n\u003Cbr>\n\n#### Sample 2\n\n\u003Cdiv align=\"center\">\n  \u003Cvideo src=\"https:\u002F\u002Fprivate-user-images.githubusercontent.com\u002F201621992\u002F594835233-2d847f22-a6d4-4d84-9bec-79a39001f9ca.mp4?jwt=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3NzkyMDU0NDYsIm5iZiI6MTc3OTIwNTE0NiwicGF0aCI6Ii8yMDE2MjE5OTIvNTk0ODM1MjMzLTJkODQ3ZjIyLWE2ZDQtNGQ4NC05YmVjLTc5YTM5MDAxZjljYS5tcDQ_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjYwNTE5JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI2MDUxOVQxNTM5MDZaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1mODgyYWRlZGI3OThjZWZmNzg1ZDhmNDRiNDMxZjYzZmE0Njk5OWJjYWJkZTVhZmM0OTM0OTI4MWI3ZmEzMGI0JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZyZXNwb25zZS1jb250ZW50LXR5cGU9dmlkZW8lMkZtcDQifQ.qJS-ALDMknvRYFY73hGYmJ-WLzwtC4LRHJnHXlkpyyU\" controls width=\"300\">\u003C\u002Fvideo>\n\u003C\u002Fdiv>\n\n\u003Ctable>\n  \u003Ctr>\n    \u003Cth valign=\"top\">Ground Truth\u003C\u002Fth>\n    \u003Cth valign=\"top\">Mega-ASR (Ours)\u003C\u002Fth>\n    \u003Cth valign=\"top\">Qwen3-ASR\u003C\u002Fth>\n    \u003Cth valign=\"top\">Gemini-3-Pro\u003C\u002Fth>\n    \u003Cth valign=\"top\">Seed-ASR\u003C\u002Fth>\n    \u003Cth valign=\"top\">Whisper\u003C\u002Fth>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd valign=\"top\">To waste, I skip forty years, said the baker in tears, and proceed without further remark to the day when you took me aboard your ship to help you in hunting the 
snark.\u003Cbr>\u003Cbr>\u003Cstrong>Reference\u003C\u002Fstrong>\u003C\u002Ftd>\n    \u003Ctd valign=\"top\">\u003Cspan style=\"color:#ef4444\">To witness,\u003C\u002Fspan> I skip forty years, said the baker in tears, and proceed without further remark to the day when you took me aboard \u003Cspan style=\"color:#ef4444\">of\u003C\u002Fspan> your ship to help you in hunting the snark.\u003Cbr>\u003Cbr>\u003Cstrong>WER: \u003Cspan style=\"color:#22c55e\">5.9\u003C\u002Fspan> ✅\u003C\u002Fstrong>\u003C\u002Ftd>\n    \u003Ctd valign=\"top\">\u003Cspan style=\"color:#ef4444\">I skipped 40 years. Second day in here. Ever since you left, I've been a monk...\u003C\u002Fspan>\u003Cbr>\u003Cbr>\u003Cstrong>WER: \u003Cspan style=\"color:#ef4444\">64.7\u003C\u002Fspan> 🟠\u003C\u002Fstrong>\u003C\u002Ftd>\n    \u003Ctd valign=\"top\">\u003Cspan style=\"color:#ef4444\">I spent forty years at sea and never seen a rougher than\u003C\u002Fspan> the day \u003Cspan style=\"color:#ef4444\">that\u003C\u002Fspan> you took me aboard your ship...\u003Cbr>\u003Cbr>\u003Cstrong>WER: \u003Cspan style=\"color:#ef4444\">64.7\u003C\u002Fspan> 🟠\u003C\u002Fstrong>\u003C\u002Ftd>\n    \u003Ctd valign=\"top\">\u003Cspan style=\"color:#ef4444\">To wait.\u003C\u002Fspan> I skip forty years. \u003Cspan style=\"color:#ef4444\">Saturday and years.\u003C\u002Fspan> And proceed without further remark...\u003Cbr>\u003Cbr>\u003Cstrong>WER: \u003Cspan style=\"color:#ef4444\">38.2\u003C\u002Fspan> 🟡\u003C\u002Fstrong>\u003C\u002Ftd>\n    \u003Ctd valign=\"top\">I skip forty years... to the day you took me \u003Cspan style=\"color:#ef4444\">on a ship\u003C\u002Fspan>... 
to hunt the \u003Cspan style=\"color:#ef4444\">shark\u003C\u002Fspan>.\u003Cbr>\u003Cbr>\u003Cstrong>WER: \u003Cspan style=\"color:#ef4444\">71.5\u003C\u002Fspan> 🟠\u003C\u002Fstrong>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n\u003C\u002Ftable>\n\n#### Sample 3\n\n\u003Cdiv align=\"center\">\n  \u003Cvideo src=\"https:\u002F\u002Fprivate-user-images.githubusercontent.com\u002F201621992\u002F594835233-2d847f22-a6d4-4d84-9bec-79a39001f9ca.mp4?jwt=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3NzkyMDU0NDYsIm5iZiI6MTc3OTIwNTE0NiwicGF0aCI6Ii8yMDE2MjE5OTIvNTk0ODM1MjMzLTJkODQ3ZjIyLWE2ZDQtNGQ4NC05YmVjLTc5YTM5MDAxZjljYS5tcDQ_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjYwNTE5JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI2MDUxOVQxNTM5MDZaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1mODgyYWRlZGI3OThjZWZmNzg1ZDhmNDRiNDMxZjYzZmE0Njk5OWJjYWJkZTVhZmM0OTM0OTI4MWI3ZmEzMGI0JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZyZXNwb25zZS1jb250ZW50LXR5cGU9dmlkZW8lMkZtcDQifQ.qJS-ALDMknvRYFY73hGYmJ-WLzwtC4LRHJnHXlkpyyU\" controls width=\"300\">\u003C\u002Fvideo>\n\u003C\u002Fdiv>\n\n\u003Ctable>\n  \u003Ctr>\n    \u003Cth valign=\"top\">Ground Truth\u003C\u002Fth>\n    \u003Cth valign=\"top\">Mega-ASR (Ours)\u003C\u002Fth>\n    \u003Cth valign=\"top\">Qwen3-ASR\u003C\u002Fth>\n    \u003Cth valign=\"top\">Gemini-3-Pro\u003C\u002Fth>\n    \u003Cth valign=\"top\">Seed-ASR\u003C\u002Fth>\n    \u003Cth valign=\"top\">Whisper\u003C\u002Fth>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd valign=\"top\">The friendly gang left the drug store.\u003Cbr>\u003Cbr>\u003Cstrong>Reference\u003C\u002Fstrong>\u003C\u002Ftd>\n    \u003Ctd valign=\"top\">\u003Cspan style=\"color:#22c55e\">The friendly gang left the drug store.\u003C\u002Fspan>\u003Cbr>\u003Cbr>\u003Cstrong>WER: \u003Cspan style=\"color:#22c55e\">8.0\u003C\u002Fspan> 
✅\u003C\u002Fstrong>\u003C\u002Ftd>\n    \u003Ctd valign=\"top\">\u003Cspan style=\"color:#ef4444\">It's a\u003C\u002Fspan> friendly gang. \u003Cspan style=\"color:#ef4444\">That's the drug gang.\u003C\u002Fspan>\u003Cbr>\u003Cbr>\u003Cstrong>WER: \u003Cspan style=\"color:#ef4444\">57.1\u003C\u002Fspan> 🟠\u003C\u002Fstrong>\u003C\u002Ftd>\n    \u003Ctd valign=\"top\">\u003Cspan style=\"color:#ef4444\">Friendly\u003C\u002Fspan> gang left the \u003Cspan style=\"color:#ef4444\">drugs\u003C\u002Fspan>.\u003Cbr>\u003Cbr>\u003Cstrong>WER: \u003Cspan style=\"color:#ef4444\">42.9\u003C\u002Fspan> 🟡\u003C\u002Fstrong>\u003C\u002Ftd>\n    \u003Ctd valign=\"top\">The friendly gang left the \u003Cspan style=\"color:#ef4444\">drugstore\u003C\u002Fspan>.\u003Cbr>\u003Cbr>\u003Cstrong>WER: \u003Cspan style=\"color:#22c55e\">28.6\u003C\u002Fspan> 🟢\u003C\u002Fstrong>\u003C\u002Ftd>\n    \u003Ctd valign=\"top\">\u003Cspan style=\"color:#ef4444\">A\u003C\u002Fspan> friendly \u003Cspan style=\"color:#ef4444\">young man\u003C\u002Fspan> left the drug store.\u003Cbr>\u003Cbr>\u003Cstrong>WER: \u003Cspan style=\"color:#ef4444\">62.3\u003C\u002Fspan> 🟠\u003C\u002Fstrong>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n\u003C\u002Ftable>\n\n#### Sample 4\n\n\u003Cdiv align=\"center\">\n  \u003Cvideo 
src=\"https:\u002F\u002Fprivate-user-images.githubusercontent.com\u002F201621992\u002F594835233-2d847f22-a6d4-4d84-9bec-79a39001f9ca.mp4?jwt=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3NzkyMDU0NDYsIm5iZiI6MTc3OTIwNTE0NiwicGF0aCI6Ii8yMDE2MjE5OTIvNTk0ODM1MjMzLTJkODQ3ZjIyLWE2ZDQtNGQ4NC05YmVjLTc5YTM5MDAxZjljYS5tcDQ_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjYwNTE5JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI2MDUxOVQxNTM5MDZaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1mODgyYWRlZGI3OThjZWZmNzg1ZDhmNDRiNDMxZjYzZmE0Njk5OWJjYWJkZTVhZmM0OTM0OTI4MWI3ZmEzMGI0JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZyZXNwb25zZS1jb250ZW50LXR5cGU9dmlkZW8lMkZtcDQifQ.qJS-ALDMknvRYFY73hGYmJ-WLzwtC4LRHJnHXlkpyyU\" controls width=\"300\">\u003C\u002Fvideo>\n\u003C\u002Fdiv>\n\n\u003Ctable>\n  \u003Ctr>\n    \u003Cth valign=\"top\">Ground Truth\u003C\u002Fth>\n    \u003Cth valign=\"top\">Mega-ASR (Ours)\u003C\u002Fth>\n    \u003Cth valign=\"top\">Qwen3-ASR\u003C\u002Fth>\n    \u003Cth valign=\"top\">Gemini-3-Pro\u003C\u002Fth>\n    \u003Cth valign=\"top\">Seed-ASR\u003C\u002Fth>\n    \u003Cth valign=\"top\">Whisper\u003C\u002Fth>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd valign=\"top\">The set of china hit the floor with a crash.\u003Cbr>\u003Cbr>\u003Cstrong>Reference\u003C\u002Fstrong>\u003C\u002Ftd>\n    \u003Ctd valign=\"top\">\u003Cspan style=\"color:#22c55e\">The set of china hit the floor with a crash.\u003C\u002Fspan>\u003Cbr>\u003Cbr>\u003Cstrong>WER: \u003Cspan style=\"color:#22c55e\">8.0\u003C\u002Fspan> ✅\u003C\u002Fstrong>\u003C\u002Ftd>\n    \u003Ctd valign=\"top\">The \u003Cspan style=\"color:#ef4444\">bed is fine. 
It\u003C\u002Fspan> hit the floor with a crash.\u003Cbr>\u003Cbr>\u003Cstrong>WER: \u003Cspan style=\"color:#ef4444\">40.0\u003C\u002Fspan> 🟡\u003C\u002Fstrong>\u003C\u002Ftd>\n    \u003Ctd valign=\"top\">\u003Cspan style=\"color:#ef4444\">He said it's fine I\u003C\u002Fspan> hit the \u003Cspan style=\"color:#ef4444\">forward slash\u003C\u002Fspan>.\u003Cbr>\u003Cbr>\u003Cstrong>WER: \u003Cspan style=\"color:#ef4444\">100.0\u003C\u002Fspan> 🔴\u003C\u002Fstrong>\u003C\u002Ftd>\n    \u003Ctd valign=\"top\">The \u003Cspan style=\"color:#ef4444\">sound\u003C\u002Fspan> of china \u003Cspan style=\"color:#ef4444\">hits\u003C\u002Fspan> the floor with a crash.\u003Cbr>\u003Cbr>\u003Cstrong>WER: \u003Cspan style=\"color:#22c55e\">20.0\u003C\u002Fspan> 🟢\u003C\u002Fstrong>\u003C\u002Ftd>\n    \u003Ctd valign=\"top\">The \u003Cspan style=\"color:#ef4444\">chef\u003C\u002Fspan> of \u003Cspan style=\"color:#ef4444\">China\u003C\u002Fspan> hit the floor with a \u003Cspan style=\"color:#ef4444\">clash\u003C\u002Fspan>.\u003Cbr>\u003Cbr>\u003Cstrong>WER: \u003Cspan style=\"color:#ef4444\">55.0\u003C\u002Fspan> 🟠\u003C\u002Fstrong>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n\u003C\u002Ftable>\n\n#### Sample 5\n\n\u003Cdiv align=\"center\">\n  \u003Cvideo 
src=\"https:\u002F\u002Fprivate-user-images.githubusercontent.com\u002F201621992\u002F594835233-2d847f22-a6d4-4d84-9bec-79a39001f9ca.mp4?jwt=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3NzkyMDU0NDYsIm5iZiI6MTc3OTIwNTE0NiwicGF0aCI6Ii8yMDE2MjE5OTIvNTk0ODM1MjMzLTJkODQ3ZjIyLWE2ZDQtNGQ4NC05YmVjLTc5YTM5MDAxZjljYS5tcDQ_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjYwNTE5JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI2MDUxOVQxNTM5MDZaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1mODgyYWRlZGI3OThjZWZmNzg1ZDhmNDRiNDMxZjYzZmE0Njk5OWJjYWJkZTVhZmM0OTM0OTI4MWI3ZmEzMGI0JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZyZXNwb25zZS1jb250ZW50LXR5cGU9dmlkZW8lMkZtcDQifQ.qJS-ALDMknvRYFY73hGYmJ-WLzwtC4LRHJnHXlkpyyU\" controls width=\"300\">\u003C\u002Fvideo>\n\u003C\u002Fdiv>\n\u003Ctable>\n  \u003Ctr>\n    \u003Cth valign=\"top\">Ground Truth\u003C\u002Fth>\n    \u003Cth valign=\"top\">Mega-ASR (Ours)\u003C\u002Fth>\n    \u003Cth valign=\"top\">Qwen3-ASR\u003C\u002Fth>\n    \u003Cth valign=\"top\">Gemini-3-Pro\u003C\u002Fth>\n    \u003Cth valign=\"top\">Seed-ASR\u003C\u002Fth>\n    \u003Cth valign=\"top\">Whisper\u003C\u002Fth>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd valign=\"top\">Among export-led electrical and computer makers, Japan Victor Company fell fifty to two thousand three hundred twenty.\u003Cbr>\u003Cbr>\u003Cstrong>Reference\u003C\u002Fstrong>\u003C\u002Ftd>\n    \u003Ctd valign=\"top\">Among export-led \u003Cspan style=\"color:#ef4444\">(missing: electrical and)\u003C\u002Fspan> computer makers, Japan Victor Company fell fifty to two thousand three hundred twenty.\u003Cbr>\u003Cbr>\u003Cstrong>WER: \u003Cspan style=\"color:#22c55e\">11.1\u003C\u002Fspan> ✅\u003C\u002Fstrong>\u003C\u002Ftd>\n    \u003Ctd valign=\"top\">Among export-led \u003Cspan style=\"color:#ef4444\">(missing: electrical and)\u003C\u002Fspan> 
computer makers, Japan \u003Cspan style=\"color:#ef4444\">VictorNet sold fifty-two thousand three hundred fifty\u003C\u002Fspan>.\u003Cbr>\u003Cbr>\u003Cstrong>WER: \u003Cspan style=\"color:#ef4444\">38.9\u003C\u002Fspan> 🟡\u003C\u002Fstrong>\u003C\u002Ftd>\n    \u003Ctd valign=\"top\">Among export-led \u003Cspan style=\"color:#ef4444\">(missing: electrical and)\u003C\u002Fspan> computer makers, Japan Victor \u003Cspan style=\"color:#ef4444\">Co.\u003C\u002Fspan> fell \u003Cspan style=\"color:#ef4444\">50\u003C\u002Fspan> to \u003Cspan style=\"color:#ef4444\">2,350 yen\u003C\u002Fspan>.\u003Cbr>\u003Cbr>\u003Cstrong>WER: \u003Cspan style=\"color:#ef4444\">35.7\u003C\u002Fspan> 🟡\u003C\u002Fstrong>\u003C\u002Ftd>\n    \u003Ctd valign=\"top\">Among export-led \u003Cspan style=\"color:#ef4444\">in\u003C\u002Fspan> computer makers, Japan Victor Company \u003Cspan style=\"color:#ef4444\">sell 50 to 2300 unit\u003C\u002Fspan>.\u003Cbr>\u003Cbr>\u003Cstrong>WER: \u003Cspan style=\"color:#ef4444\">50.0\u003C\u002Fspan> 🟠\u003C\u002Fstrong>\u003C\u002Ftd>\n    \u003Ctd valign=\"top\">Among \u003Cspan style=\"color:#ef4444\">exporters,\u003C\u002Fspan> computer makers \u003Cspan style=\"color:#ef4444\">in Japan victor companies sold\u003C\u002Fspan> fifty...\u003Cbr>\u003Cbr>\u003Cstrong>WER: \u003Cspan style=\"color:#ef4444\">66.7\u003C\u002Fspan> 🟠\u003C\u002Fstrong>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n\u003C\u002Ftable>\n\n#### Sample 6\n\n\u003Cdiv align=\"center\">\n  \u003Cvideo 
src=\"https:\u002F\u002Fprivate-user-images.githubusercontent.com\u002F201621992\u002F594835233-2d847f22-a6d4-4d84-9bec-79a39001f9ca.mp4?jwt=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3NzkyMDU0NDYsIm5iZiI6MTc3OTIwNTE0NiwicGF0aCI6Ii8yMDE2MjE5OTIvNTk0ODM1MjMzLTJkODQ3ZjIyLWE2ZDQtNGQ4NC05YmVjLTc5YTM5MDAxZjljYS5tcDQ_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjYwNTE5JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI2MDUxOVQxNTM5MDZaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1mODgyYWRlZGI3OThjZWZmNzg1ZDhmNDRiNDMxZjYzZmE0Njk5OWJjYWJkZTVhZmM0OTM0OTI4MWI3ZmEzMGI0JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZyZXNwb25zZS1jb250ZW50LXR5cGU9dmlkZW8lMkZtcDQifQ.qJS-ALDMknvRYFY73hGYmJ-WLzwtC4LRHJnHXlkpyyU\" controls width=\"300\">\u003C\u002Fvideo>\n\u003C\u002Fdiv>\n\u003Ctable>\n  \u003Ctr>\n    \u003Cth valign=\"top\">Ground Truth\u003C\u002Fth>\n    \u003Cth valign=\"top\">Mega-ASR (Ours)\u003C\u002Fth>\n    \u003Cth valign=\"top\">Qwen3-ASR\u003C\u002Fth>\n    \u003Cth valign=\"top\">Gemini-3-Pro\u003C\u002Fth>\n    \u003Cth valign=\"top\">Seed-ASR\u003C\u002Fth>\n    \u003Cth valign=\"top\">Whisper\u003C\u002Fth>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd valign=\"top\">Has exposure really been reduced?\u003Cbr>\u003Cbr>\u003Cstrong>Reference\u003C\u002Fstrong>\u003C\u002Ftd>\n    \u003Ctd valign=\"top\">\u003Cspan style=\"color:#22c55e\">Has exposure really been reduced\u003C\u002Fspan>\u003Cspan style=\"color:#ef4444\">.\u003C\u002Fspan>\u003Cbr>\u003Cbr>\u003Cstrong>WER: \u003Cspan style=\"color:#22c55e\">8.0\u003C\u002Fspan> ✅\u003C\u002Fstrong>\u003C\u002Ftd>\n    \u003Ctd valign=\"top\">Has exposure really \u003Cspan style=\"color:#ef4444\">done you?\u003C\u002Fspan>\u003Cbr>\u003Cbr>\u003Cstrong>WER: \u003Cspan style=\"color:#ef4444\">40.0\u003C\u002Fspan> 🟡\u003C\u002Fstrong>\u003C\u002Ftd>\n    \u003Ctd 
valign=\"top\">Has \u003Cspan style=\"color:#ef4444\">the closure\u003C\u002Fspan> really \u003Cspan style=\"color:#ef4444\">affected you?\u003C\u002Fspan>\u003Cbr>\u003Cbr>\u003Cstrong>WER: \u003Cspan style=\"color:#ef4444\">80.0\u003C\u002Fspan> 🔴\u003C\u002Fstrong>\u003C\u002Ftd>\n    \u003Ctd valign=\"top\">Has exposure \u003Cspan style=\"color:#ef4444\">to beauty products.\u003C\u002Fspan>\u003Cbr>\u003Cbr>\u003Cstrong>WER: \u003Cspan style=\"color:#ef4444\">60.0\u003C\u002Fspan> 🟠\u003C\u002Fstrong>\u003C\u002Ftd>\n    \u003Ctd valign=\"top\">\u003Cspan style=\"color:#ef4444\">Have those who\u003C\u002Fspan> really \u003Cspan style=\"color:#ef4444\">been refused?\u003C\u002Fspan>\u003Cbr>\u003Cbr>\u003Cstrong>WER: \u003Cspan style=\"color:#ef4444\">78.5\u003C\u002Fspan> 🔴\u003C\u002Fstrong>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n\u003C\u002Ftable>\n\n\u003C\u002Fdetails>\n\n\n\n\n\n## 🔥News\n\n- [Coming]: We are going to release RL code and optimize WebUI.\n- 
[Coming]: Dataset and benchmark will be reformatted to be clearer.\n- [Coming]: We will release the full data processing pipeline.\n- **May 20, 2026**: 🔥 We release **Voices-in-the-Wild-Bench**, a benchmark for in-the-wild ASR robustness evaluation.\n- **May 20, 2026**: 🔥 We release **Voices-in-the-Wild-2M**.\n- **May 20, 2026**: 🔥 We release the **Mega-ASR Inference and Training Codebase**.\n- **May 19, 2026**: 🔥 **Mega-ASR** model weights are now available on Hugging Face.\n- **May 19, 2026**: 🔥 We release the **Mega-ASR Technical Report**.\n\n## Overview\n\n\n* **[Quick Start](#quick-start)**\n* **[Introduction](#introduction)**\n* **[Inference and deployment](#inference)**\n* **[Finetuning](#finetuning)**\n* **[Evaluation](#evaluation)**\n* **[Citation and licence](#licence-citation-and-stars)**\n\n## Quick Start\n\nMega-ASR is trained on a large volume of inherently high-WER data, which slightly degrades its basic recognition capability. To address this, **we equip the system with a router** that decides, for each audio input, whether to activate Mega-ASR by mounting the LoRA weights.\n\n\n**Installation**\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fxzf-thu\u002FMega-ASR.git\ncd Mega-ASR\n\nconda create -n mega-asr python=3.10 -y\nconda activate mega-asr\npip install -r requirements.txt\n```\n\n**Download Weights**\n```bash\npython scripts\u002Fdownload.py\n```\n\n**Offline Inference**\n```bash\n# Infer with the default audio:\nbash scripts\u002Finference.sh\n\n# Use your own audio:\nbash scripts\u002Finference.sh --audio \u002Fpath\u002Fto\u002Faudio.wav\n```\n\n\n## Introduction\n\n\n**MEGA-ASR** is purpose-built for **full-scenario robust ASR in the wild**, especially excelling at **semantic recovery** and **local keyword reconstruction** under severe acoustic degradation. 
It substantially reduces common failure modes such as **hallucinations**, **empty outputs**, and **dropped utterances**, making speech recognition reliable in truly challenging real-world environments.\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Ffigures\u002Fradar_results.png\" alt=\"Results\" width=\"100%\">\n\u003C\u002Fp>\n\n### Features\n✅ **One model for the messy real world**: Covers **7 atomic acoustic conditions** and **54 compound acoustic scenarios** in a single model.\n\n✅ **Stronger recovery under severe distortion**: Excels at **semantic recovery** and **local keyword reconstruction**, greatly reducing **hallucinations**, **empty outputs**, and **dropped utterances**.\n\n✅ **SOTA robust ASR performance**: Achieves **gains of up to nearly 30%** over leading open- and closed-source SOTA models in challenging acoustic environments.\n\n\n\n\n\n## Finetuning\n\nYou can further fine-tune Mega-ASR on your own scenarios and data. You can also use our repository to directly train Qwen3-ASR.\n\n### A2S-SFT\n\n\n`src\u002FMegaASR\u002FA2S-SFT` contains the core training code for Mega-ASR A2S-SFT. 
\n\n```text\nsrc\u002FMegaASR\u002FA2S-SFT\u002F\n├── arguments.py      # Defines command-line arguments and training hyperparameters.\n├── checkpointing.py  # Saves base-model metadata and required processor\u002Ftokenizer files for LoRA reuse.\n├── dataloader.py     # Loads JSONL data, reads audio, builds model inputs, and masks non-target tokens.\n├── finetune.py       # Main entry point for launching A2S-SFT training.\n├── modeling.py       # Loads Qwen3-ASR and defines LoRA injection scopes.\n└── trainer.py        # Defines MegaASRTrainer with adapter-only saving and module-wise learning rates.\n```\n\n\nTraining data is in JSONL format:\n\n```json\n{\n  \"audio\": \"...\u002Fwavs\u002Ftest-clean\u002F61\u002F70968\u002F61-70968-0000.wav\",\n  \"text\": \"language English\u003Casr_text>THE TRANSCRIPT TEXT\",\n  \"prompt\": \"\"\n}\n```\n\nLaunch training with the following command:\n\n```bash\ntorchrun --nproc_per_node=2 A2S_SFT\u002Ffinetune.py \\\n  --model_path Qwen3-ASR-1.7B --train_file ${TRAIN_JSONL} \\\n  --eval_file ${VAL_JSONL} --output_dir ${OUT_DIR} \\\n  --batch_size 8 --grad_acc 8 \\\n  --lr 1e-6 --lr_encoder 1e-6 --lr_aligner 1e-6 --lr_llm 1e-6 \\\n  --epochs 2 --save_steps 200 --save_total_limit 300 --use_lora 1 \\\n  --lora_scope all --lora_r 8 --lora_alpha 16 --lora_dropout 0.05 \\\n  --warmup_ratio 0.05 --max_grad_norm 1.0 --weight_decay 0.01 \\\n  --run_name ${RUN_NAME} --report_to wandb \\\n  2>&1 | tee -a ${LOG_FILE}\n```\n\nThe DG-WGPO reinforcement learning module will be released in a future update.\n\n## Evaluation\n\n\nWe provide a simple evaluation script for running Mega-ASR inference and computing WER\u002FCER. The input file should be a JSONL file. 
Each line only needs two required fields:\n\n```json\n{\"audio\": \"examples\u002Faudio\u002Fnoise.wav\", \"answer\": \"I usually take the quieter road home because the main street gets crowded after work.\"}\n```\n\n\nThe script will keep all original fields and append the following fields to the output JSONL:\n\n```text\nprediction  # model transcription\nmetric      # \"wer\" for English samples, \"cer\" for Chinese samples\nwer         # WER\u002FCER score value; CER is also stored in this field for compatibility\nnum_edits   # edit distance between prediction and ground truth\nref_len     # number of reference words or characters\n```\nThe script reuses the same Mega-ASR wrapper as `infer.py`, loading the base model, LoRA, and router from `ckpt\u002FMega-ASR`.\n\n```bash\npython src\u002FMegaASR\u002Feval\u002Fevaluate_wer.py \\\n  --ckpt_dir ckpt\u002FMega-ASR \\\n  --input_jsonl examples\u002Ftest.jsonl \\\n  --output_jsonl outputs\u002Fpred_with_wer.jsonl\n```\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"\u002Fdocs\u002Fassets\u002Ftraining.png\" alt=\"Mega-ASR Training\" width=\"100%\">\n\u003C\u002Fp>\n\n**Mega-ASR** is trained with an acoustic-to-semantic progressive supervised fine-tuning strategy: it first curriculum-trains the encoder and aligner on increasingly difficult samples from WER\u003C30% to WER\u003C50% and then WER\u003C70%, then fine-tunes the LLM on WER\u003C70% data to strengthen semantic recovery, and finally jointly fine-tunes the full encoder-aligner-LLM stack for end-to-end alignment.\n\nOn top of Mega-ASR-Base, DG-WGPO further optimizes the model with WER-gated policy learning: low-WER samples emphasize token-level acoustic refinement, while high-WER samples emphasize sentence-level semantic reconstruction to reduce hallucinations, omissions, and off-audio outputs. 
The final reward combines a static WER-based accuracy signal with an anti-repetition gate and a dynamic dual-granularity reward, using fixed hyperparameters τ=0.3, αs=0.4, and αdyn=0.6.\n\n\nRun Mega-ASR inference without routing if you want to force the LoRA on every sample:\n\n```bash\npython src\u002FMegaASR\u002Feval\u002Fevaluate_wer.py \\\n  --ckpt_dir ckpt\u002FMega-ASR \\\n  --input_jsonl examples\u002Ftest.jsonl \\\n  --output_jsonl outputs\u002Fpred_with_wer.jsonl \\\n  --no-routing\n```\n\nEach input line requires `audio` or `audio_path`, plus `answer` as the ground-truth transcription.\n\n**Mega-ASR** is evaluated across three benchmark families — classical academic test sets, robustness benchmarks, and our own in-the-wild compound benchmark.\n\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"\u002Fassets\u002Ftables\u002Fnoisy_robust_asr_benchmarks.png\" alt=\"Mega-ASR Results\" width=\"100%\">\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"\u002Fassets\u002Ftables\u002Fvoices_in_the_wild_breakdown.png\" alt=\"Mega-ASR Results\" width=\"100%\">\n\u003C\u002Fp>\n\n\n## Acknowledgements\n\nWe sincerely thank the creators, maintainers, and contributors of the public datasets used in this work, including MUSAN, DNS Challenge, ESC-50, UrbanSound8K, LibriSpeech, Common Voice, WenetSpeech, and AISHELL-1.\n\nWe also sincerely thank the Qwen3-ASR Team for developing such an excellent foundation model, which provides a strong backbone for this work.\n\n## Licence, Citation and stars\nThis project will be released under the **Apache-2.0 License**. You can do everything with Mega-ASR 🎉\n\n\n**Citation**: You can cite Mega-ASR using the following BibTeX entry. 
Thank you for your kindness 🙂\n\n```bibtex\n@misc{xie2026megaasrinthewild2speechrecognition,\n      title={Mega-ASR: Towards In-the-wild^2 Speech Recognition via Scaling up Real-world Acoustic Simulation},\n      author={Zhifei Xie and Kaiyu Pang and Haobin Zhang and Deheng Ye and Xiaobin Hu and Shuicheng Yan and Chunyan Miao},\n      year={2026},\n      eprint={2605.19833},\n      archivePrefix={arXiv},\n      primaryClass={cs.SD},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.19833},\n}\n```\n\u003Ca href=\"https:\u002F\u002Fwww.star-history.com\u002F?repos=gpt-omni%2Fmini-omi%2Cxzf-thu%2FMega-ASR&type=date&legend=bottom-right\">\n \u003Cpicture>\n   \u003Csource media=\"(prefers-color-scheme: dark)\" srcset=\"https:\u002F\u002Fapi.star-history.com\u002Fchart?repos=gpt-omni\u002Fmini-omi%2Cxzf-thu\u002FMega-ASR&type=date&theme=dark&legend=bottom-right\" \u002F>\n   \u003Csource media=\"(prefers-color-scheme: light)\" srcset=\"https:\u002F\u002Fapi.star-history.com\u002Fchart?repos=gpt-omni\u002Fmini-omi%2Cxzf-thu\u002FMega-ASR&type=date&legend=bottom-right\" \u002F>\n   \u003Cimg alt=\"Star History Chart\" src=\"https:\u002F\u002Fapi.star-history.com\u002Fchart?repos=gpt-omni\u002Fmini-omi%2Cxzf-thu\u002FMega-ASR&type=date&legend=bottom-right\" \u002F>\n \u003C\u002Fpicture>\n\u003C\u002Fa>\n","Mega-ASR 是一个面向全场景鲁棒语音识别的基础模型，通过在7种基本声学条件和54种复合声学场景上进行系统训练来实现。该项目基于260万条包含噪声、远场语音、遮挡、回声和混响、录音伪影、电子失真及传输中断等多样音频样本构建，采用A2S-SFT与DG-WGPO强化学习技术，在复杂声学环境中相较于现有领先开源及闭源模型性能提升近30%。适用于需要高精度语音识别的现实世界应用，如智能家居、车载系统、远程会议等复杂声音环境下的语音处理任务。","2026-06-11 03:56:10","CREATED_QUERY"]