[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-2664":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":24,"defaultBranch":25,"hasWiki":24,"hasPages":23,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":32,"readmeContent":33,"aiSummary":34,"trendingCount":16,"starSnapshotCount":16,"syncStatus":17,"lastSyncTime":35,"discoverSource":36},2664,"tensor2tensor","tensorflow\u002Ftensor2tensor","tensorflow","Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.","",null,"Python",17331,3745,3,574,0,2,14,81,9,89.1,"Apache License 2.0",true,false,"master",[27,28,29,30,31],"deep-learning","machine-learning","machine-translation","reinforcement-learning","tpu","2026-06-12 04:00:15","# 
Tensor2Tensor\n\n[![PyPI\nversion](https:\u002F\u002Fbadge.fury.io\u002Fpy\u002Ftensor2tensor.svg)](https:\u002F\u002Fbadge.fury.io\u002Fpy\u002Ftensor2tensor)\n[![GitHub\nIssues](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fissues\u002Ftensorflow\u002Ftensor2tensor.svg)](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Ftensor2tensor\u002Fissues)\n[![Contributions\nwelcome](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fcontributions-welcome-brightgreen.svg)](CONTRIBUTING.md)\n[![Gitter](https:\u002F\u002Fimg.shields.io\u002Fgitter\u002Froom\u002Fnwjs\u002Fnw.js.svg)](https:\u002F\u002Fgitter.im\u002Ftensor2tensor\u002FLobby)\n[![License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-Apache%202.0-brightgreen.svg)](https:\u002F\u002Fopensource.org\u002Flicenses\u002FApache-2.0)\n[![Travis](https:\u002F\u002Fimg.shields.io\u002Ftravis\u002Ftensorflow\u002Ftensor2tensor.svg)](https:\u002F\u002Ftravis-ci.org\u002Ftensorflow\u002Ftensor2tensor)\n[![Run on FH](https:\u002F\u002Fstatic.floydhub.com\u002Fbutton\u002Fbutton-small.svg)](https:\u002F\u002Ffloydhub.com\u002Frun)\n\n[Tensor2Tensor](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Ftensor2tensor), or\n[T2T](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Ftensor2tensor) for short, is a library\nof deep learning models and datasets designed to make deep learning more\naccessible and [accelerate ML\nresearch](https:\u002F\u002Fresearch.googleblog.com\u002F2017\u002F06\u002Faccelerating-deep-learning-research.html).\n\n\nT2T was developed by researchers and engineers in the\n[Google Brain team](https:\u002F\u002Fresearch.google.com\u002Fteams\u002Fbrain\u002F) and a community\nof users. 
It is now deprecated &mdash; we keep it running and welcome\nbug-fixes, but encourage users to use the successor library [Trax](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Ftrax).\n\n### Quick Start\n\n[This iPython notebook](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Ftensorflow\u002Ftensor2tensor\u002Fblob\u002Fmaster\u002Ftensor2tensor\u002Fnotebooks\u002Fhello_t2t.ipynb)\nexplains T2T and runs in your browser using a free VM from Google,\nno installation needed. Alternatively, here is a one-command version that\ninstalls T2T, downloads MNIST, trains a model and evaluates it:\n\n```\npip install tensor2tensor && t2t-trainer \\\n  --generate_data \\\n  --data_dir=~\u002Ft2t_data \\\n  --output_dir=~\u002Ft2t_train\u002Fmnist \\\n  --problem=image_mnist \\\n  --model=shake_shake \\\n  --hparams_set=shake_shake_quick \\\n  --train_steps=1000 \\\n  --eval_steps=100\n```\n\n### Contents\n\n* [Suggested Datasets and Models](#suggested-datasets-and-models)\n  * [Mathematical Language Understanding](#mathematical-language-understanding)\n  * [Story, Question and Answer](#story-question-and-answer)\n  * [Image Classification](#image-classification)\n  * [Image Generation](#image-generation)\n  * [Language Modeling](#language-modeling)\n  * [Sentiment Analysis](#sentiment-analysis)\n  * [Speech Recognition](#speech-recognition)\n  * [Summarization](#summarization)\n  * [Translation](#translation)\n* [Basics](#basics)\n  * [Walkthrough](#walkthrough)\n  * [Installation](#installation)\n  * [Features](#features)\n* [T2T Overview](#t2t-overview)\n  * [Datasets](#datasets)\n  * [Problems and Modalities](#problems-and-modalities)\n  * [Models](#models)\n  * [Hyperparameter Sets](#hyperparameter-sets)\n  * [Trainer](#trainer)\n* [Adding your own components](#adding-your-own-components)\n* [Adding a dataset](#adding-a-dataset)\n* [Papers](#papers)\n* [Run on FloydHub](#run-on-floydhub)\n\n## Suggested Datasets and Models\n\nBelow we list a number of tasks that 
can be solved with T2T when\nyou train the appropriate model on the appropriate problem.\nWe give the problem and model below and we suggest a setting of\nhyperparameters that we know works well in our setup. We usually\nrun either on Cloud TPUs or on 8-GPU machines; you might need\nto modify the hyperparameters if you run on a different setup.\n\n### Mathematical Language Understanding\n\nFor evaluating mathematical expressions at the character level involving addition, subtraction and multiplication of both positive and negative decimal numbers with variable digits assigned to symbolic variables, use\n\n* the [MLU](https:\u002F\u002Fart.wangperawong.com\u002Fmathematical_language_understanding_train.tar.gz) data-set:\n `--problem=algorithmic_math_two_variables`\n\nYou can try solving the problem with different transformer models and hyperparameters as described in the [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F1812.02825):\n* Standard transformer:\n`--model=transformer`\n`--hparams_set=transformer_tiny`\n* Universal transformer:\n`--model=universal_transformer`\n`--hparams_set=universal_transformer_tiny`\n* Adaptive universal transformer:\n`--model=universal_transformer`\n`--hparams_set=adaptive_universal_transformer_tiny`\n\n### Story, Question and Answer\n\nFor answering questions based on a story, use\n\n* the [bAbi](https:\u002F\u002Fresearch.fb.com\u002Fdownloads\u002Fbabi\u002F) data-set:\n `--problem=babi_qa_concat_task1_1k`\n\nYou can choose the bAbi task from the range [1,20] and the subset from 1k or\n10k. 
To combine test data from all tasks into a single test set, use\n`--problem=babi_qa_concat_all_tasks_10k`\n\n### Image Classification\n\nFor image classification, we have a number of standard data-sets:\n\n* ImageNet (a large data-set): `--problem=image_imagenet`, or one\n   of the re-scaled versions (`image_imagenet224`, `image_imagenet64`,\n   `image_imagenet32`)\n* CIFAR-10: `--problem=image_cifar10` (or\n    `--problem=image_cifar10_plain` to turn off data augmentation)\n* CIFAR-100: `--problem=image_cifar100`\n* MNIST: `--problem=image_mnist`\n\nFor ImageNet, we suggest using ResNet or Xception, i.e.,\nuse `--model=resnet --hparams_set=resnet_50` or\n`--model=xception --hparams_set=xception_base`.\nResNet should reach above 76% top-1 accuracy on ImageNet.\n\nFor CIFAR and MNIST, we suggest trying the shake-shake model:\n`--model=shake_shake --hparams_set=shakeshake_big`.\nThis setting, trained for `--train_steps=700000`, should yield\nclose to 97% accuracy on CIFAR-10.\n\n### Image Generation\n\nFor (un)conditional image generation, we have a number of standard data-sets:\n\n* CelebA: `--problem=img2img_celeba` for image-to-image translation, namely,\n    superresolution from 8x8 to 32x32.\n* CelebA-HQ: `--problem=image_celeba256_rev` for a downsampled 256x256.\n* CIFAR-10: `--problem=image_cifar10_plain_gen_rev` for class-conditional\n    32x32 generation.\n* LSUN Bedrooms: `--problem=image_lsun_bedrooms_rev`\n* MS-COCO: `--problem=image_text_ms_coco_rev` for text-to-image generation.\n* Small ImageNet (a large data-set): `--problem=image_imagenet32_gen_rev` for\n    32x32 or `--problem=image_imagenet64_gen_rev` for 64x64.\n\nWe suggest using the Image Transformer, i.e., `--model=imagetransformer`, or\nthe Image Transformer Plus, i.e., `--model=imagetransformerpp`, which uses a\ndiscretized mixture of logistics, or a variational auto-encoder, i.e.,\n`--model=transformer_ae`.\nFor CIFAR-10, using `--hparams_set=imagetransformer_cifar10_base` 
or\n`--hparams_set=imagetransformer_cifar10_base_dmol` yields 2.90 bits per\ndimension. For ImageNet-32, using\n`--hparams_set=imagetransformer_imagenet32_base` yields 3.77 bits per dimension.\n\n### Language Modeling\n\nFor language modeling, we have these data-sets in T2T:\n\n* PTB (a small data-set): `--problem=languagemodel_ptb10k` for\n    word-level modeling and `--problem=languagemodel_ptb_characters`\n    for character-level modeling.\n* LM1B (a billion-word corpus): `--problem=languagemodel_lm1b32k` for\n    subword-level modeling and `--problem=languagemodel_lm1b_characters`\n    for character-level modeling.\n\nWe suggest starting with `--model=transformer` on this task and using\n`--hparams_set=transformer_small` for PTB and\n`--hparams_set=transformer_base` for LM1B.\n\n### Sentiment Analysis\n\nFor the task of recognizing the sentiment of a sentence, use\n\n* the IMDB data-set: `--problem=sentiment_imdb`\n\nWe suggest using `--model=transformer_encoder` here; since it is\na small data-set, try `--hparams_set=transformer_tiny` and train for\na few steps (e.g., `--train_steps=2000`).\n\n### Speech Recognition\n\nFor speech-to-text, we have these data-sets in T2T:\n\n* Librispeech (US English): `--problem=librispeech` for\n    the whole set and `--problem=librispeech_clean` for a smaller\n    but nicely filtered part.\n\n* Mozilla Common Voice (US English): `--problem=common_voice` for the whole set\n    and `--problem=common_voice_clean` for a quality-checked subset.\n\n### Summarization\n\nFor summarizing a longer text into a shorter one, we have these data-sets:\n\n* CNN\u002FDailyMail articles summarized into a few sentences:\n  `--problem=summarize_cnn_dailymail32k`\n\nWe suggest using `--model=transformer` and\n`--hparams_set=transformer_prepend` for this task.\nThis yields good ROUGE scores.\n\n### Translation\n\nThere are a number of translation data-sets in T2T:\n\n* English-German: `--problem=translate_ende_wmt32k`\n* English-French: 
`--problem=translate_enfr_wmt32k`\n* English-Czech: `--problem=translate_encs_wmt32k`\n* English-Chinese: `--problem=translate_enzh_wmt32k`\n* English-Vietnamese: `--problem=translate_envi_iwslt32k`\n* English-Spanish: `--problem=translate_enes_wmt32k`\n\nYou can get translations in the other direction by appending `_rev` to\nthe problem name, e.g., for German-English use\n`--problem=translate_ende_wmt32k_rev`\n(note that you still need to download the original data with t2t-datagen\n`--problem=translate_ende_wmt32k`).\n\nFor all translation problems, we suggest trying the Transformer model:\n`--model=transformer`. At first it is best to try the base setting,\n`--hparams_set=transformer_base`. When trained on 8 GPUs for 300K steps\nthis should reach a BLEU score of about 28 on the English-German data-set,\nwhich is close to state-of-the-art. If training on a single GPU, try the\n`--hparams_set=transformer_base_single_gpu` setting. For very good results\nor larger data-sets (e.g., for English-French), try the big model\nwith `--hparams_set=transformer_big`.\n\nSee this [example](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Ftensor2tensor\u002Fblob\u002Fmaster\u002Ftensor2tensor\u002Fnotebooks\u002FTransformer_translate.ipynb) to see how translation works.\n\n## Basics\n\n### Walkthrough\n\nHere's a walkthrough training a good English-to-German translation\nmodel using the Transformer model from [*Attention Is All You\nNeed*](https:\u002F\u002Farxiv.org\u002Fabs\u002F1706.03762) on WMT data.\n\n```\npip install tensor2tensor\n\n# See what problems, models, and hyperparameter sets are available.\n# You can easily swap between them (and add new ones).\nt2t-trainer --registry_help\n\nPROBLEM=translate_ende_wmt32k\nMODEL=transformer\nHPARAMS=transformer_base_single_gpu\n\nDATA_DIR=$HOME\u002Ft2t_data\nTMP_DIR=\u002Ftmp\u002Ft2t_datagen\nTRAIN_DIR=$HOME\u002Ft2t_train\u002F$PROBLEM\u002F$MODEL-$HPARAMS\n\nmkdir -p $DATA_DIR $TMP_DIR $TRAIN_DIR\n\n# Generate 
data\nt2t-datagen \\\n  --data_dir=$DATA_DIR \\\n  --tmp_dir=$TMP_DIR \\\n  --problem=$PROBLEM\n\n# Train\n# *  If you run out of memory, add --hparams='batch_size=1024'.\nt2t-trainer \\\n  --data_dir=$DATA_DIR \\\n  --problem=$PROBLEM \\\n  --model=$MODEL \\\n  --hparams_set=$HPARAMS \\\n  --output_dir=$TRAIN_DIR\n\n# Decode\n\nDECODE_FILE=$DATA_DIR\u002Fdecode_this.txt\necho \"Hello world\" >> $DECODE_FILE\necho \"Goodbye world\" >> $DECODE_FILE\necho -e 'Hallo Welt\\nAuf Wiedersehen Welt' > ref-translation.de\n\nBEAM_SIZE=4\nALPHA=0.6\n\nt2t-decoder \\\n  --data_dir=$DATA_DIR \\\n  --problem=$PROBLEM \\\n  --model=$MODEL \\\n  --hparams_set=$HPARAMS \\\n  --output_dir=$TRAIN_DIR \\\n  --decode_hparams=\"beam_size=$BEAM_SIZE,alpha=$ALPHA\" \\\n  --decode_from_file=$DECODE_FILE \\\n  --decode_to_file=translation.en\n\n# See the translations\ncat translation.en\n\n# Evaluate the BLEU score\n# Note: Report this BLEU score in papers, not the internal approx_bleu metric.\nt2t-bleu --translation=translation.en --reference=ref-translation.de\n```\n\n### Installation\n\n\n```\n# Assumes tensorflow or tensorflow-gpu installed\npip install tensor2tensor\n\n# Installs with tensorflow-gpu requirement\npip install tensor2tensor[tensorflow_gpu]\n\n# Installs with tensorflow (cpu) requirement\npip install tensor2tensor[tensorflow]\n```\n\nBinaries:\n\n```\n# Data generator\nt2t-datagen\n\n# Trainer\nt2t-trainer --registry_help\n```\n\nLibrary usage:\n\n```\npython -c \"from tensor2tensor.models.transformer import Transformer\"\n```\n\n### Features\n\n* Many state of the art and baseline models are built-in and new models can be\n  added easily (open an issue or pull request!).\n* Many datasets across modalities - text, audio, image - available for\n  generation and use, and new ones can be added easily (open an issue or pull\n  request for public datasets!).\n* Models can be used with any dataset and input mode (or even multiple); all\n  modality-specific processing (e.g. 
embedding lookups for text tokens) is done\n  with `bottom` and `top` transformations, which are specified per-feature in the\n  model.\n* Support for multi-GPU machines and synchronous (1 master, many workers) and\n  asynchronous (independent workers synchronizing through a parameter server)\n  [distributed training](https:\u002F\u002Ftensorflow.github.io\u002Ftensor2tensor\u002Fdistributed_training.html).\n* Easily swap amongst datasets and models by command-line flag with the data\n  generation script `t2t-datagen` and the training script `t2t-trainer`.\n* Train on [Google Cloud ML](https:\u002F\u002Ftensorflow.github.io\u002Ftensor2tensor\u002Fcloud_mlengine.html) and [Cloud TPUs](https:\u002F\u002Ftensorflow.github.io\u002Ftensor2tensor\u002Fcloud_tpu.html).\n\n## T2T overview\n\n### Problems\n\n**Problems** consist of features such as inputs and targets, and metadata such\nas each feature's modality (e.g. symbol, image, audio) and vocabularies. Problem\nfeatures are given by a dataset, which is stored as a `TFRecord` file with\n`tensorflow.Example` protocol buffers. All\nproblems are imported in\n[`all_problems.py`](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Ftensor2tensor\u002Ftree\u002Fmaster\u002Ftensor2tensor\u002Fdata_generators\u002Fall_problems.py)\nor are registered with `@registry.register_problem`. Run\n[`t2t-datagen`](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Ftensor2tensor\u002Ftree\u002Fmaster\u002Ftensor2tensor\u002Fbin\u002Ft2t-datagen)\nto see the list of available problems and download them.\n\n### Models\n\n**`T2TModel`s** define the core tensor-to-tensor computation. They apply a\ndefault transformation to each input and output so that models may deal with\nmodality-independent tensors (e.g. embeddings at the input; and a linear\ntransform at the output to produce logits for a softmax over classes). 
All\nmodels are imported in the\n[`models` subpackage](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Ftensor2tensor\u002Ftree\u002Fmaster\u002Ftensor2tensor\u002Fmodels\u002F__init__.py),\ninherit from [`T2TModel`](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Ftensor2tensor\u002Ftree\u002Fmaster\u002Ftensor2tensor\u002Futils\u002Ft2t_model.py),\nand are registered with\n[`@registry.register_model`](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Ftensor2tensor\u002Ftree\u002Fmaster\u002Ftensor2tensor\u002Futils\u002Fregistry.py).\n\n### Hyperparameter Sets\n\n**Hyperparameter sets** are encoded in\n[`HParams`](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Ftensor2tensor\u002Ftree\u002Fmaster\u002Ftensor2tensor\u002Futils\u002Fhparam.py)\nobjects, and are registered with\n[`@registry.register_hparams`](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Ftensor2tensor\u002Ftree\u002Fmaster\u002Ftensor2tensor\u002Futils\u002Fregistry.py).\nEvery model and problem has an `HParams` object. A basic set of hyperparameters is\ndefined in\n[`common_hparams.py`](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Ftensor2tensor\u002Ftree\u002Fmaster\u002Ftensor2tensor\u002Flayers\u002Fcommon_hparams.py)\nand hyperparameter set functions can compose other hyperparameter set functions.\n\n### Trainer\n\nThe **trainer** binary is the entrypoint for training, evaluation, and\ninference. Users can easily switch between problems, models, and hyperparameter\nsets by using the `--model`, `--problem`, and `--hparams_set` flags. Specific\nhyperparameters can be overridden with the `--hparams` flag. 
`--schedule` and\nrelated flags control local and distributed training\u002Fevaluation\n([distributed training documentation](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Ftensor2tensor\u002Ftree\u002Fmaster\u002Fdocs\u002Fdistributed_training.md)).\n\n## Adding your own components\n\nT2T's components are registered using a central registration mechanism that\nenables easily adding new ones and easily swapping amongst them by command-line\nflag. You can add your own components without editing the T2T codebase by\nspecifying the `--t2t_usr_dir` flag in `t2t-trainer`.\n\nYou can do so for models, hyperparameter sets, modalities, and problems. Please\ndo submit a pull request if your component might be useful to others.\n\nSee the [`example_usr_dir`](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Ftensor2tensor\u002Ftree\u002Fmaster\u002Ftensor2tensor\u002Ftest_data\u002Fexample_usr_dir)\nfor an example user directory.\n\n## Adding a dataset\n\nTo add a new dataset, subclass\n[`Problem`](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Ftensor2tensor\u002Ftree\u002Fmaster\u002Ftensor2tensor\u002Fdata_generators\u002Fproblem.py)\nand register it with `@registry.register_problem`. See\n[`TranslateEndeWmt8k`](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Ftensor2tensor\u002Ftree\u002Fmaster\u002Ftensor2tensor\u002Fdata_generators\u002Ftranslate_ende.py)\nfor an example. Also see the [data generators\nREADME](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Ftensor2tensor\u002Ftree\u002Fmaster\u002Ftensor2tensor\u002Fdata_generators\u002FREADME.md).\n\n## Run on FloydHub\n\n[![Run on FloydHub](https:\u002F\u002Fstatic.floydhub.com\u002Fbutton\u002Fbutton.svg)](https:\u002F\u002Ffloydhub.com\u002Frun)\n\nClick this button to open a [Workspace](https:\u002F\u002Fblog.floydhub.com\u002Fworkspaces\u002F) on [FloydHub](https:\u002F\u002Fwww.floydhub.com\u002F?utm_medium=readme&utm_source=tensor2tensor&utm_campaign=jul_2018). 
You can use the workspace to develop and test your code on a fully configured cloud GPU machine.\n\nTensor2Tensor comes preinstalled in the environment, so you can simply open a [Terminal](https:\u002F\u002Fdocs.floydhub.com\u002Fguides\u002Fworkspace\u002F#using-terminal) and run your code.\n\n```bash\n# Test the quick-start on a Workspace's Terminal with this command\nt2t-trainer \\\n  --generate_data \\\n  --data_dir=.\u002Ft2t_data \\\n  --output_dir=.\u002Ft2t_train\u002Fmnist \\\n  --problem=image_mnist \\\n  --model=shake_shake \\\n  --hparams_set=shake_shake_quick \\\n  --train_steps=1000 \\\n  --eval_steps=100\n```\n\nNote: Ensure compliance with the FloydHub [Terms of Service](https:\u002F\u002Fwww.floydhub.com\u002Fabout\u002Fterms).\n\n## Papers\n\nWhen referencing Tensor2Tensor, please cite [this\npaper](https:\u002F\u002Farxiv.org\u002Fabs\u002F1803.07416).\n\n```\n@article{tensor2tensor,\n  author    = {Ashish Vaswani and Samy Bengio and Eugene Brevdo and\n    Francois Chollet and Aidan N. Gomez and Stephan Gouws and Llion Jones and\n    \\L{}ukasz Kaiser and Nal Kalchbrenner and Niki Parmar and Ryan Sepassi and\n    Noam Shazeer and Jakob Uszkoreit},\n  title     = {Tensor2Tensor for Neural Machine Translation},\n  journal   = {CoRR},\n  volume    = {abs\u002F1803.07416},\n  year      = {2018},\n  url       = {http:\u002F\u002Farxiv.org\u002Fabs\u002F1803.07416},\n}\n```\n\nTensor2Tensor was used to develop a number of state-of-the-art models\nand deep learning methods. 
Here we list some papers that were based on T2T\nfrom the start and benefited from its features and architecture in ways\ndescribed in the [Google Research Blog post introducing\nT2T](https:\u002F\u002Fresearch.googleblog.com\u002F2017\u002F06\u002Faccelerating-deep-learning-research.html).\n\n* [Attention Is All You Need](https:\u002F\u002Farxiv.org\u002Fabs\u002F1706.03762)\n* [Depthwise Separable Convolutions for Neural Machine\n   Translation](https:\u002F\u002Farxiv.org\u002Fabs\u002F1706.03059)\n* [One Model To Learn Them All](https:\u002F\u002Farxiv.org\u002Fabs\u002F1706.05137)\n* [Discrete Autoencoders for Sequence Models](https:\u002F\u002Farxiv.org\u002Fabs\u002F1801.09797)\n* [Generating Wikipedia by Summarizing Long\n   Sequences](https:\u002F\u002Farxiv.org\u002Fabs\u002F1801.10198)\n* [Image Transformer](https:\u002F\u002Farxiv.org\u002Fabs\u002F1802.05751)\n* [Training Tips for the Transformer Model](https:\u002F\u002Farxiv.org\u002Fabs\u002F1804.00247)\n* [Self-Attention with Relative Position Representations](https:\u002F\u002Farxiv.org\u002Fabs\u002F1803.02155)\n* [Fast Decoding in Sequence Models using Discrete Latent Variables](https:\u002F\u002Farxiv.org\u002Fabs\u002F1803.03382)\n* [Adafactor: Adaptive Learning Rates with Sublinear Memory Cost](https:\u002F\u002Farxiv.org\u002Fabs\u002F1804.04235)\n* [Universal Transformers](https:\u002F\u002Farxiv.org\u002Fabs\u002F1807.03819)\n* [Attending to Mathematical Language with Transformers](https:\u002F\u002Farxiv.org\u002Fabs\u002F1812.02825)\n* [The Evolved Transformer](https:\u002F\u002Farxiv.org\u002Fabs\u002F1901.11117)\n* [Model-Based Reinforcement Learning for Atari](https:\u002F\u002Farxiv.org\u002Fabs\u002F1903.00374)\n* [VideoFlow: A Flow-Based Generative Model for Video](https:\u002F\u002Farxiv.org\u002Fabs\u002F1903.01434)\n\n*NOTE: This is not an official Google 
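product.*\n","Tensor2Tensor is a library of models and datasets intended to simplify deep learning and accelerate machine learning research. It provides a variety of predefined deep learning models, datasets, and hyperparameter configurations, supports many tasks including image classification, text generation, and machine translation, and can use TPUs for efficient training. Although the project has been officially marked as no longer actively maintained, with its successor Trax recommended instead, Tensor2Tensor remains suitable for researchers and developers who need to quickly set up a TensorFlow-based deep learning experimentation environment, especially when they want to reduce basic setup work while exploring new algorithms or models.","2026-06-11 02:50:41","top_language"]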