[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-1982":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":15,"stars7d":16,"stars30d":17,"stars90d":15,"forks30d":15,"starsTrendScore":18,"compositeScore":19,"rankGlobal":9,"rankLanguage":9,"license":20,"archived":21,"fork":22,"defaultBranch":23,"hasWiki":22,"hasPages":22,"topics":24,"createdAt":9,"pushedAt":9,"updatedAt":28,"readmeContent":29,"aiSummary":30,"trendingCount":15,"starSnapshotCount":15,"syncStatus":18,"lastSyncTime":31,"discoverSource":32},1982,"fairseq","facebookresearch\u002Ffairseq","facebookresearch","Facebook AI Research Sequence-to-Sequence Toolkit written in Python.",null,"Python",32231,6680,421,1201,0,5,25,2,45,"MIT License",true,false,"main",[25,26,27],"artificial-intelligence","python","pytorch","2026-06-12 02:00:35","\u003Cp align=\"center\">\n  \u003Cimg src=\"docs\u002Ffairseq_logo.png\" width=\"150\">\n  \u003Cbr \u002F>\n  \u003Cbr \u002F>\n  \u003Ca href=\"https:\u002F\u002Fopensource.fb.com\u002Fsupport-ukraine\">\u003Cimg alt=\"Support Ukraine\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FSupport-Ukraine-FFD500?style=flat&labelColor=005BBB\" \u002F>\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fpytorch\u002Ffairseq\u002Fblob\u002Fmain\u002FLICENSE\">\u003Cimg alt=\"MIT License\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-MIT-blue.svg\" \u002F>\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fpytorch\u002Ffairseq\u002Freleases\">\u003Cimg alt=\"Latest Release\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Frelease\u002Fpytorch\u002Ffairseq.svg\" \u002F>\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fpytorch\u002Ffairseq\u002Factions?query=workflow:build\">\u003Cimg 
alt=\"Build Status\" src=\"https:\u002F\u002Fgithub.com\u002Fpytorch\u002Ffairseq\u002Fworkflows\u002Fbuild\u002Fbadge.svg\" \u002F>\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Ffairseq.readthedocs.io\u002Fen\u002Flatest\u002F?badge=latest\">\u003Cimg alt=\"Documentation Status\" src=\"https:\u002F\u002Freadthedocs.org\u002Fprojects\u002Ffairseq\u002Fbadge\u002F?version=latest\" \u002F>\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fapp.circleci.com\u002Fpipelines\u002Fgithub\u002Ffacebookresearch\u002Ffairseq\u002F\">\u003Cimg alt=\"CircleCI Status\" src=\"https:\u002F\u002Fcircleci.com\u002Fgh\u002Ffacebookresearch\u002Ffairseq.svg?style=shield\" \u002F>\u003C\u002Fa>\n\u003C\u002Fp>\n\n--------------------------------------------------------------------------------\n\nFairseq(-py) is a sequence modeling toolkit that allows researchers and\ndevelopers to train custom models for translation, summarization, language\nmodeling and other text generation tasks.\n\nWe provide reference implementations of various sequence modeling papers:\n\n\u003Cdetails>\u003Csummary>List of implemented papers\u003C\u002Fsummary>\u003Cp>\n\n* **Convolutional Neural Networks (CNN)**\n  + [Language Modeling with Gated Convolutional Networks (Dauphin et al., 2017)](examples\u002Flanguage_model\u002Fconv_lm\u002FREADME.md)\n  + [Convolutional Sequence to Sequence Learning (Gehring et al., 2017)](examples\u002Fconv_seq2seq\u002FREADME.md)\n  + [Classical Structured Prediction Losses for Sequence to Sequence Learning (Edunov et al., 2018)](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Ffairseq\u002Ftree\u002Fclassic_seqlevel)\n  + [Hierarchical Neural Story Generation (Fan et al., 2018)](examples\u002Fstories\u002FREADME.md)\n  + [wav2vec: Unsupervised Pre-training for Speech Recognition (Schneider et al., 2019)](examples\u002Fwav2vec\u002FREADME.md)\n* **LightConv and DynamicConv models**\n  + [Pay Less Attention with Lightweight and Dynamic Convolutions (Wu et al., 
2019)](examples\u002Fpay_less_attention_paper\u002FREADME.md)\n* **Long Short-Term Memory (LSTM) networks**\n  + Effective Approaches to Attention-based Neural Machine Translation (Luong et al., 2015)\n* **Transformer (self-attention) networks**\n  + Attention Is All You Need (Vaswani et al., 2017)\n  + [Scaling Neural Machine Translation (Ott et al., 2018)](examples\u002Fscaling_nmt\u002FREADME.md)\n  + [Understanding Back-Translation at Scale (Edunov et al., 2018)](examples\u002Fbacktranslation\u002FREADME.md)\n  + [Adaptive Input Representations for Neural Language Modeling (Baevski and Auli, 2018)](examples\u002Flanguage_model\u002FREADME.adaptive_inputs.md)\n  + [Lexically constrained decoding with dynamic beam allocation (Post & Vilar, 2018)](examples\u002Fconstrained_decoding\u002FREADME.md)\n  + [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (Dai et al., 2019)](examples\u002Ftruncated_bptt\u002FREADME.md)\n  + [Adaptive Attention Span in Transformers (Sukhbaatar et al., 2019)](examples\u002Fadaptive_span\u002FREADME.md)\n  + [Mixture Models for Diverse Machine Translation: Tricks of the Trade (Shen et al., 2019)](examples\u002Ftranslation_moe\u002FREADME.md)\n  + [RoBERTa: A Robustly Optimized BERT Pretraining Approach (Liu et al., 2019)](examples\u002Froberta\u002FREADME.md)\n  + [Facebook FAIR's WMT19 News Translation Task Submission (Ng et al., 2019)](examples\u002Fwmt19\u002FREADME.md)\n  + [Jointly Learning to Align and Translate with Transformer Models (Garg et al., 2019)](examples\u002Fjoint_alignment_translation\u002FREADME.md)\n  + [Multilingual Denoising Pre-training for Neural Machine Translation (Liu et al., 2020)](examples\u002Fmbart\u002FREADME.md)\n  + [Neural Machine Translation with Byte-Level Subwords (Wang et al., 2020)](examples\u002Fbyte_level_bpe\u002FREADME.md)\n  + [Unsupervised Quality Estimation for Neural Machine Translation (Fomicheva et al., 
2020)](examples\u002Funsupervised_quality_estimation\u002FREADME.md)\n  + [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations (Baevski et al., 2020)](examples\u002Fwav2vec\u002FREADME.md)\n  + [Generating Medical Reports from Patient-Doctor Conversations Using Sequence-to-Sequence Models (Enarvi et al., 2020)](examples\u002Fpointer_generator\u002FREADME.md)\n  + [Linformer: Self-Attention with Linear Complexity (Wang et al., 2020)](examples\u002Flinformer\u002FREADME.md)\n  + [Cross-lingual Retrieval for Iterative Self-Supervised Training (Tran et al., 2020)](examples\u002Fcriss\u002FREADME.md)\n  + [Deep Transformers with Latent Depth (Li et al., 2020)](examples\u002Flatent_depth\u002FREADME.md)\n  + [Unsupervised Cross-lingual Representation Learning for Speech Recognition (Conneau et al., 2020)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2006.13979)\n  + [Self-training and Pre-training are Complementary for Speech Recognition (Xu et al., 2020)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2010.11430)\n  + [Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training (Hsu et al., 2021)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2104.01027)\n  + [Unsupervised Speech Recognition (Baevski et al., 2021)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2105.11084)\n  + [Simple and Effective Zero-shot Cross-lingual Phoneme Recognition (Xu et al., 2021)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2109.11680)\n  + [VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding (Xu et al., 2021)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2109.14084.pdf)\n  + [VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding (Xu et al., 2021)](https:\u002F\u002Faclanthology.org\u002F2021.findings-acl.370.pdf)\n  + [NormFormer: Improved Transformer Pretraining with Extra Normalization (Shleifer et 
al, 2021)](examples\u002Fnormformer\u002FREADME.md)\n* **Non-autoregressive Transformers**\n  + Non-Autoregressive Neural Machine Translation (Gu et al., 2017)\n  + Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement (Lee et al. 2018)\n  + Insertion Transformer: Flexible Sequence Generation via Insertion Operations (Stern et al. 2019)\n  + Mask-Predict: Parallel Decoding of Conditional Masked Language Models (Ghazvininejad et al., 2019)\n  + [Levenshtein Transformer (Gu et al., 2019)](examples\u002Fnonautoregressive_translation\u002FREADME.md)\n* **Finetuning**\n  + [Better Fine-Tuning by Reducing Representational Collapse (Aghajanyan et al. 2020)](examples\u002Frxf\u002FREADME.md)\n\n\u003C\u002Fp>\u003C\u002Fdetails>\n\n### What's New:\n* May 2023 [Released models for Scaling Speech Technology to 1,000+ Languages  (Pratap, et al., 2023)](examples\u002Fmms\u002FREADME.md)\n* June 2022 [Released code for wav2vec-U 2.0 from Towards End-to-end Unsupervised Speech Recognition (Liu, et al., 2022)](examples\u002Fwav2vec\u002Funsupervised\u002FREADME.md)\n* May 2022 [Integration with xFormers](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fxformers)\n* December 2021 [Released Direct speech-to-speech translation code](examples\u002Fspeech_to_speech\u002FREADME.md)\n* October 2021 [Released VideoCLIP and VLM models](examples\u002FMMPT\u002FREADME.md)\n* October 2021 [Released multilingual finetuned XLSR-53 model](examples\u002Fwav2vec\u002FREADME.md)\n* September 2021 [`master` branch renamed to `main`](https:\u002F\u002Fgithub.com\u002Fgithub\u002Frenaming).\n* July 2021 [Released DrNMT code](examples\u002Fdiscriminative_reranking_nmt\u002FREADME.md)\n* July 2021 [Released Robust wav2vec 2.0 model](examples\u002Fwav2vec\u002FREADME.md)\n* June 2021 [Released XLMR-XL and XLMR-XXL models](examples\u002Fxlmr\u002FREADME.md)\n* May 2021 [Released Unsupervised Speech Recognition 
code](examples\u002Fwav2vec\u002Funsupervised\u002FREADME.md)\n* March 2021 [Added full parameter and optimizer state sharding + CPU offloading](examples\u002Ffully_sharded_data_parallel\u002FREADME.md)\n* February 2021 [Added LASER training code](examples\u002Flaser\u002FREADME.md)\n* December 2020: [Added Adaptive Attention Span code](examples\u002Fadaptive_span\u002FREADME.md)\n* December 2020: [GottBERT model and code released](examples\u002Fgottbert\u002FREADME.md)\n* November 2020: Adopted the [Hydra](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fhydra) configuration framework\n  * [see documentation explaining how to use it for new and existing projects](docs\u002Fhydra_integration.md)\n* November 2020: [fairseq 0.10.0 released](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Ffairseq\u002Freleases\u002Ftag\u002Fv0.10.0)\n* October 2020: [Added R3F\u002FR4F (Better Fine-Tuning) code](examples\u002Frxf\u002FREADME.md)\n* October 2020: [Deep Transformer with Latent Depth code released](examples\u002Flatent_depth\u002FREADME.md)\n* October 2020: [Added CRISS models and code](examples\u002Fcriss\u002FREADME.md)\n\n\u003Cdetails>\u003Csummary>Previous updates\u003C\u002Fsummary>\u003Cp>\n\n* September 2020: [Added Linformer code](examples\u002Flinformer\u002FREADME.md)\n* September 2020: [Added pointer-generator networks](examples\u002Fpointer_generator\u002FREADME.md)\n* August 2020: [Added lexically constrained decoding](examples\u002Fconstrained_decoding\u002FREADME.md)\n* August 2020: [wav2vec2 models and code released](examples\u002Fwav2vec\u002FREADME.md)\n* July 2020: [Unsupervised Quality Estimation code released](examples\u002Funsupervised_quality_estimation\u002FREADME.md)\n* May 2020: [Follow fairseq on Twitter](https:\u002F\u002Ftwitter.com\u002Ffairseq)\n* April 2020: [Monotonic Multihead Attention code released](examples\u002Fsimultaneous_translation\u002FREADME.md)\n* April 2020: [Quant-Noise code 
released](examples\u002Fquant_noise\u002FREADME.md)\n* April 2020: [Initial model parallel support and 11B parameters unidirectional LM released](examples\u002Fmegatron_11b\u002FREADME.md)\n* March 2020: [Byte-level BPE code released](examples\u002Fbyte_level_bpe\u002FREADME.md)\n* February 2020: [mBART model and code released](examples\u002Fmbart\u002FREADME.md)\n* February 2020: [Added tutorial for back-translation](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Ffairseq\u002Ftree\u002Fmain\u002Fexamples\u002Fbacktranslation#training-your-own-model-wmt18-english-german)\n* December 2019: [fairseq 0.9.0 released](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Ffairseq\u002Freleases\u002Ftag\u002Fv0.9.0)\n* November 2019: [VizSeq released (a visual analysis toolkit for evaluating fairseq models)](https:\u002F\u002Ffacebookresearch.github.io\u002Fvizseq\u002Fdocs\u002Fgetting_started\u002Ffairseq_example)\n* November 2019: [CamemBERT model and code released](examples\u002Fcamembert\u002FREADME.md)\n* November 2019: [BART model and code released](examples\u002Fbart\u002FREADME.md)\n* November 2019: [XLM-R models and code released](examples\u002Fxlmr\u002FREADME.md)\n* September 2019: [Nonautoregressive translation code released](examples\u002Fnonautoregressive_translation\u002FREADME.md)\n* August 2019: [WMT'19 models released](examples\u002Fwmt19\u002FREADME.md)\n* July 2019: fairseq relicensed under MIT license\n* July 2019: [RoBERTa models and code released](examples\u002Froberta\u002FREADME.md)\n* June 2019: [wav2vec models and code released](examples\u002Fwav2vec\u002FREADME.md)\n\n\u003C\u002Fp>\u003C\u002Fdetails>\n\n### Features:\n\n* multi-GPU training on one machine or across multiple machines (data and model parallel)\n* fast generation on both CPU and GPU with multiple search algorithms implemented:\n  + beam search\n  + Diverse Beam Search ([Vijayakumar et al., 2016](https:\u002F\u002Farxiv.org\u002Fabs\u002F1610.02424))\n  + sampling (unconstrained, top-k 
and top-p\u002Fnucleus)\n  + [lexically constrained decoding](examples\u002Fconstrained_decoding\u002FREADME.md) (Post & Vilar, 2018)\n* [gradient accumulation](https:\u002F\u002Ffairseq.readthedocs.io\u002Fen\u002Flatest\u002Fgetting_started.html#large-mini-batch-training-with-delayed-updates) enables training with large mini-batches even on a single GPU\n* [mixed precision training](https:\u002F\u002Ffairseq.readthedocs.io\u002Fen\u002Flatest\u002Fgetting_started.html#training-with-half-precision-floating-point-fp16) (trains faster with less GPU memory on [NVIDIA tensor cores](https:\u002F\u002Fdeveloper.nvidia.com\u002Ftensor-cores))\n* [extensible](https:\u002F\u002Ffairseq.readthedocs.io\u002Fen\u002Flatest\u002Foverview.html): easily register new models, criterions, tasks, optimizers and learning rate schedulers\n* [flexible configuration](docs\u002Fhydra_integration.md) based on [Hydra](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fhydra) allowing a combination of code, command-line and file based configuration\n* [full parameter and optimizer state sharding](examples\u002Ffully_sharded_data_parallel\u002FREADME.md)\n* [offloading parameters to CPU](examples\u002Ffully_sharded_data_parallel\u002FREADME.md)\n\nWe also provide [pre-trained models for translation and language modeling](#pre-trained-models-and-examples)\nwith a convenient `torch.hub` interface:\n\n``` python\nen2de = torch.hub.load('pytorch\u002Ffairseq', 'transformer.wmt19.en-de.single_model')\nen2de.translate('Hello world', beam=5)\n# 'Hallo Welt'\n```\n\nSee the PyTorch Hub tutorials for [translation](https:\u002F\u002Fpytorch.org\u002Fhub\u002Fpytorch_fairseq_translation\u002F)\nand [RoBERTa](https:\u002F\u002Fpytorch.org\u002Fhub\u002Fpytorch_fairseq_roberta\u002F) for more examples.\n\n# Requirements and Installation\n\n* [PyTorch](http:\u002F\u002Fpytorch.org\u002F) version >= 1.10.0\n* Python version >= 3.8\n* For training new models, you'll also need an NVIDIA GPU and 
[NCCL](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002Fnccl)\n* **To install fairseq** and develop locally:\n\n``` bash\ngit clone https:\u002F\u002Fgithub.com\u002Fpytorch\u002Ffairseq\ncd fairseq\npip install --editable .\u002F\n\n# on MacOS:\n# CFLAGS=\"-stdlib=libc++\" pip install --editable .\u002F\n\n# to install the latest stable release (0.10.x)\n# pip install fairseq\n```\n\n* **For faster training** install NVIDIA's [apex](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002Fapex) library:\n\n``` bash\ngit clone https:\u002F\u002Fgithub.com\u002FNVIDIA\u002Fapex\ncd apex\npip install -v --no-cache-dir --global-option=\"--cpp_ext\" --global-option=\"--cuda_ext\" \\\n  --global-option=\"--deprecated_fused_adam\" --global-option=\"--xentropy\" \\\n  --global-option=\"--fast_multihead_attn\" .\u002F\n```\n\n* **For large datasets** install [PyArrow](https:\u002F\u002Farrow.apache.org\u002Fdocs\u002Fpython\u002Finstall.html#using-pip): `pip install pyarrow`\n* If you use Docker make sure to increase the shared memory size either with `--ipc=host` or `--shm-size`\n as command line options to `nvidia-docker run` .\n\n# Getting Started\n\nThe [full documentation](https:\u002F\u002Ffairseq.readthedocs.io\u002F) contains instructions\nfor getting started, training new models and extending fairseq with new model\ntypes and tasks.\n\n# Pre-trained models and examples\n\nWe provide pre-trained models and pre-processed, binarized test sets for several tasks listed below,\nas well as example training and evaluation commands.\n\n* [Translation](examples\u002Ftranslation\u002FREADME.md): convolutional and transformer models are available\n* [Language Modeling](examples\u002Flanguage_model\u002FREADME.md): convolutional and transformer models are available\n\nWe also have more detailed READMEs to reproduce results from specific papers:\n\n* [XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale (Babu et al., 
2021)](examples\u002Fwav2vec\u002Fxlsr\u002FREADME.md)\n* [Cross-lingual Retrieval for Iterative Self-Supervised Training (Tran et al., 2020)](examples\u002Fcriss\u002FREADME.md)\n* [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations (Baevski et al., 2020)](examples\u002Fwav2vec\u002FREADME.md)\n* [Unsupervised Quality Estimation for Neural Machine Translation (Fomicheva et al., 2020)](examples\u002Funsupervised_quality_estimation\u002FREADME.md)\n* [Training with Quantization Noise for Extreme Model Compression ({Fan*, Stock*} et al., 2020)](examples\u002Fquant_noise\u002FREADME.md)\n* [Neural Machine Translation with Byte-Level Subwords (Wang et al., 2020)](examples\u002Fbyte_level_bpe\u002FREADME.md)\n* [Multilingual Denoising Pre-training for Neural Machine Translation (Liu et al., 2020)](examples\u002Fmbart\u002FREADME.md)\n* [Reducing Transformer Depth on Demand with Structured Dropout (Fan et al., 2019)](examples\u002Flayerdrop\u002FREADME.md)\n* [Jointly Learning to Align and Translate with Transformer Models (Garg et al., 2019)](examples\u002Fjoint_alignment_translation\u002FREADME.md)\n* [Levenshtein Transformer (Gu et al., 2019)](examples\u002Fnonautoregressive_translation\u002FREADME.md)\n* [Facebook FAIR's WMT19 News Translation Task Submission (Ng et al., 2019)](examples\u002Fwmt19\u002FREADME.md)\n* [RoBERTa: A Robustly Optimized BERT Pretraining Approach (Liu et al., 2019)](examples\u002Froberta\u002FREADME.md)\n* [wav2vec: Unsupervised Pre-training for Speech Recognition (Schneider et al., 2019)](examples\u002Fwav2vec\u002FREADME.md)\n* [Mixture Models for Diverse Machine Translation: Tricks of the Trade (Shen et al., 2019)](examples\u002Ftranslation_moe\u002FREADME.md)\n* [Pay Less Attention with Lightweight and Dynamic Convolutions (Wu et al., 2019)](examples\u002Fpay_less_attention_paper\u002FREADME.md)\n* [Understanding Back-Translation at Scale (Edunov et al., 2018)](examples\u002Fbacktranslation\u002FREADME.md)\n* 
[Classical Structured Prediction Losses for Sequence to Sequence Learning (Edunov et al., 2018)](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Ffairseq\u002Ftree\u002Fclassic_seqlevel)\n* [Hierarchical Neural Story Generation (Fan et al., 2018)](examples\u002Fstories\u002FREADME.md)\n* [Scaling Neural Machine Translation (Ott et al., 2018)](examples\u002Fscaling_nmt\u002FREADME.md)\n* [Convolutional Sequence to Sequence Learning (Gehring et al., 2017)](examples\u002Fconv_seq2seq\u002FREADME.md)\n* [Language Modeling with Gated Convolutional Networks (Dauphin et al., 2017)](examples\u002Flanguage_model\u002FREADME.conv.md)\n\n# Join the fairseq community\n\n* Twitter: https:\u002F\u002Ftwitter.com\u002Ffairseq\n* Facebook page: https:\u002F\u002Fwww.facebook.com\u002Fgroups\u002Ffairseq.users\n* Google group: https:\u002F\u002Fgroups.google.com\u002Fforum\u002F#!forum\u002Ffairseq-users\n\n# License\n\nfairseq(-py) is MIT-licensed.\nThe license applies to the pre-trained models as well.\n\n# Citation\n\nPlease cite as:\n\n``` bibtex\n@inproceedings{ott2019fairseq,\n  title = {fairseq: A Fast, Extensible Toolkit for Sequence Modeling},\n  author = {Myle Ott and Sergey Edunov and Alexei Baevski and Angela Fan and Sam Gross and Nathan Ng and David Grangier and Michael Auli},\n  booktitle = {Proceedings of NAACL-HLT 2019: Demonstrations},\n  year = {2019},\n}\n```\n","fairseq is a Python toolkit for sequence modeling developed by Facebook AI Research. It supports text generation tasks such as translation, summarization, and language modeling. Built on PyTorch, it provides implementations of many state-of-the-art sequence-to-sequence models, including convolutional neural networks (CNNs), lightweight and dynamic convolution models, long short-term memory (LSTM) networks, and self-attention-based Transformer networks. By providing reference implementations of these models, fairseq lets researchers quickly reproduce experimental results from papers and build on them. The toolkit suits a wide range of natural language processing scenarios involving sequence data, such as machine translation, text summarization, and speech recognition, making it a good fit for both academic research and industrial applications.","2026-06-11 02:47:10","top_all"]