[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-9724":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":18,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":22,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":32,"readmeContent":33,"aiSummary":34,"trendingCount":16,"starSnapshotCount":16,"syncStatus":35,"lastSyncTime":36,"discoverSource":37},9724,"attention-is-all-you-need-pytorch","jadore801120\u002Fattention-is-all-you-need-pytorch","jadore801120","A PyTorch implementation of the Transformer model in \"Attention is All You Need\".","",null,"Python",9736,2093,91,66,0,1,3,21,71.56,"MIT License",false,"master",true,[26,27,28,29,30,31],"attention","attention-is-all-you-need","deep-learning","natural-language-processing","nlp","pytorch","2026-06-12 04:00:46","# Attention is all you need: A PyTorch Implementation\n\nThis is a PyTorch implementation of the Transformer model in \"[Attention is All You Need](https:\u002F\u002Farxiv.org\u002Fabs\u002F1706.03762)\" (Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, arXiv, 2017).\n\n\nThis novel sequence-to-sequence framework relies on the **self-attention mechanism** instead of convolutional or recurrent structures, and achieved state-of-the-art performance on the **WMT 2014 English-to-German translation task**. 
(2017\u002F06\u002F12)\n\n> The official TensorFlow implementation can be found in: [tensorflow\u002Ftensor2tensor](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Ftensor2tensor\u002Fblob\u002Fmaster\u002Ftensor2tensor\u002Fmodels\u002Ftransformer.py).\n\n> To learn more about the self-attention mechanism, you could read \"[A Structured Self-attentive Sentence Embedding](https:\u002F\u002Farxiv.org\u002Fabs\u002F1703.03130)\".\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"http:\u002F\u002Fimgur.com\u002F1krF2R6.png\" width=\"250\">\n\u003C\u002Fp>\n\n\nThe project now supports training, as well as translation with a trained model.\n\nNote that this project is still a work in progress.\n\n**BPE-related parts are not yet fully tested.**\n\n\nIf you have any suggestions or spot an error, feel free to open an issue to let me know. :)\n\n\n# Usage\n\n## WMT'16 Multimodal Translation: de-en\n\nAn example of training for the WMT'16 Multimodal Translation task (http:\u002F\u002Fwww.statmt.org\u002Fwmt16\u002Fmultimodal-task.html).\n\n### 0) Download the spaCy language models.\n```bash\n# conda install -c conda-forge spacy \npython -m spacy download en\npython -m spacy download de\n```\n\n### 1) Preprocess the data with torchtext and spaCy.\n```bash\npython preprocess.py -lang_src de -lang_trg en -share_vocab -save_data m30k_deen_shr.pkl\n```\n\n### 2) Train the model\n```bash\npython train.py -data_pkl m30k_deen_shr.pkl -log m30k_deen_shr -embs_share_weight -proj_share_weight -label_smoothing -output_dir output -b 256 -warmup 128000 -epoch 400\n```\n\n### 3) Test the model\n```bash\npython translate.py -data_pkl m30k_deen_shr.pkl -model trained.chkpt -output prediction.txt\n```\n\n## [(WIP)] WMT'17 Multimodal Translation: de-en w\u002F BPE \n### 1) Download and preprocess the data with BPE:\n\n> Since the interfaces are not yet unified, you need to switch the main function call from `main_wo_bpe` to `main`.\n\n```bash\npython preprocess.py -raw_dir \u002Ftmp\u002Fraw_deen -data_dir .\u002Fbpe_deen 
-save_data bpe_vocab.pkl -codes codes.txt -prefix deen\n```\n\n### 2) Train the model\n```bash\npython train.py -data_pkl .\u002Fbpe_deen\u002Fbpe_vocab.pkl -train_path .\u002Fbpe_deen\u002Fdeen-train -val_path .\u002Fbpe_deen\u002Fdeen-val -log deen_bpe -embs_share_weight -proj_share_weight -label_smoothing -output_dir output -b 256 -warmup 128000 -epoch 400\n```\n\n### 3) Test the model (not ready)\n- TODO:\n\t- Load vocabulary.\n\t- Perform decoding after translation.\n---\n# Performance\n## Training\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002FS2EVtJx.png\" width=\"400\">\n\u003Cimg src=\"https:\u002F\u002Fi.imgur.com\u002FIZQmUKO.png\" width=\"400\">\n\u003C\u002Fp>\n\n- Parameter settings:\n  - batch size 256 \n  - warmup steps 4000 \n  - epoch 200 \n  - lr_mul 0.5\n  - label smoothing \n  - no BPE and no shared vocabulary\n  - target embedding \u002F pre-softmax linear layer weight sharing. \n \n  \n## Testing \n- Coming soon.\n---\n# TODO\n  - Evaluation of the generated text.\n  - Attention weight plot.\n---\n# Acknowledgement\n- The byte pair encoding parts are borrowed from [subword-nmt](https:\u002F\u002Fgithub.com\u002Frsennrich\u002Fsubword-nmt\u002F).\n- The project structure, some scripts and the dataset preprocessing steps are heavily borrowed from [OpenNMT\u002FOpenNMT-py](https:\u002F\u002Fgithub.com\u002FOpenNMT\u002FOpenNMT-py).\n- Thanks for the suggestions from @srush, @iamalbert, @Zessay, @JulesGM, @ZiJianZhao, and @huanghoujing.\n","This project is a PyTorch-based implementation of the Transformer model from the paper \"Attention is All You Need\". It uses a self-attention mechanism, rather than traditional convolutional or recurrent structures, to handle sequence-to-sequence tasks, and achieved state-of-the-art performance on the WMT 2014 English-German translation task. The project supports training and translation with trained models, and is particularly suited to machine translation in natural language processing. Although the BPE-related parts are still being tested, this implementation gives researchers and developers a powerful tool for exploring the self-attention mechanism and its potential applications in deep learning.",2,"2026-06-11 03:24:24","top_topic"]