PyTorch 2.2 中文官方教程（六）

本文介绍: 音频音频 I/Opytorch.org/tutorials/beginner/audio_io_tutorial.html此教程已移至pytorch.org/audio/stable/tutorials/audio_io_tutorial.html3 秒后将重定向。音频重采样原文：pytorch.org/tutorials/beginner/audio_resampling_tutorial.html译者：飞龙协议：CC BY-NC-SA 4.0这个教程已经迁移到a new loc

批处理使处理更具并行性。但是，批处理意味着模型独立处理每一列；例如，上面示例中G和F的依赖关系无法学习。

from torchtext.datasets import WikiText2
from torchtext.data.utils import get_tokenizer
from torchtext.vocab import build_vocab_from_iterator

train_iter = WikiText2(split='train')
tokenizer = get_tokenizer('basic_english')
vocab = build_vocab_from_iterator(map(tokenizer, train_iter), specials=['<unk>'])
vocab.set_default_index(vocab['<unk>'])

def data_process(raw_text_iter: dataset.IterableDataset) -> Tensor:
  """Converts raw text into a flat Tensor."""
    data = [torch.tensor(vocab(tokenizer(item)), dtype=torch.long) for item in raw_text_iter]
    return torch.cat(tuple(filter(lambda t: t.numel() > 0, data)))

# ``train_iter`` was "consumed" by the process of building the vocab,
# so we have to create it again
train_iter, val_iter, test_iter = WikiText2()
train_data = data_process(train_iter)
val_data = data_process(val_iter)
test_data = data_process(test_iter)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

def batchify(data: Tensor, bsz: int) -> Tensor:
  """Divides the data into ``bsz`` separate sequences, removing extra elements
 that wouldn't cleanly fit.

 Arguments:
 data: Tensor, shape ``[N]``
 bsz: int, batch size

 Returns:
 Tensor of shape ``[N // bsz, bsz]``
 """
    seq_len = data.size(0) // bsz
    data = data[:seq_len * bsz]
    data = data.view(bsz, seq_len).t().contiguous()
    return data.to(device)

batch_size = 20
eval_batch_size = 10
train_data = batchify(train_data, batch_size)  # shape ``[seq_len, batch_size]``
val_data = batchify(val_data, eval_batch_size)
test_data = batchify(test_data, eval_batch_size)

get_batch()为 transformer 模型生成一对输入-目标序列。它将源数据细分为长度为bptt的块。对于语言建模任务，模型需要以下单词作为Target。例如，对于bptt值为 2，我们会得到i=0 时的以下两个变量：

../_images/transformer_input_target.png

值得注意的是，分块沿着维度 0，与 Transformer 模型中的S维度一致。批处理维度N沿着维度 1。

bptt = 35
def get_batch(source: Tensor, i: int) -> Tuple[Tensor, Tensor]:
  """
 Args:
 source: Tensor, shape ``[full_seq_len, batch_size]``
 i: int

 Returns:
 tuple (data, target), where data has shape ``[seq_len, batch_size]`` and
 target has shape ``[seq_len * batch_size]``
 """
    seq_len = min(bptt, len(source) - 1 - i)
    data = source[i:i+seq_len]
    target = source[i+1:i+1+seq_len].reshape(-1)
    return data, target

模型超参数如下所定义。vocab大小等于词汇对象的长度。

ntokens = len(vocab)  # size of vocabulary
emsize = 200  # embedding dimension
d_hid = 200  # dimension of the feedforward network model in ``nn.TransformerEncoder``
nlayers = 2  # number of ``nn.TransformerEncoderLayer`` in ``nn.TransformerEncoder``
nhead = 2  # number of heads in ``nn.MultiheadAttention``
dropout = 0.2  # dropout probability
model = TransformerModel(ntokens, emsize, nhead, d_hid, nlayers, dropout).to(device)

/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:286: UserWarning:

enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance)

import time

criterion = nn.CrossEntropyLoss()
lr = 5.0  # learning rate
optimizer = torch.optim.SGD(model.parameters(), lr=lr)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 1.0, gamma=0.95)

def train(model: nn.Module) -> None:
    model.train()  # turn on train mode
    total_loss = 0.
    log_interval = 200
    start_time = time.time()

    num_batches = len(train_data) // bptt
    for batch, i in enumerate(range(0, train_data.size(0) - 1, bptt)):
        data, targets = get_batch(train_data, i)
        output = model(data)
        output_flat = output.view(-1, ntokens)
        loss = criterion(output_flat, targets)

        optimizer.zero_grad()
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 0.5)
        optimizer.step()

        total_loss += loss.item()
        if batch % log_interval == 0 and batch > 0:
            lr = scheduler.get_last_lr()[0]
            ms_per_batch = (time.time() - start_time) * 1000 / log_interval
            cur_loss = total_loss / log_interval
            ppl = math.exp(cur_loss)
            print(f'| epoch {epoch:3d} | {batch:5d}/{num_batches:5d} batches | '
                  f'lr {lr:02.2f} | ms/batch {ms_per_batch:5.2f} | '
                  f'loss {cur_loss:5.2f} | ppl {ppl:8.2f}')
            total_loss = 0
            start_time = time.time()

def evaluate(model: nn.Module, eval_data: Tensor) -> float:
    model.eval()  # turn on evaluation mode
    total_loss = 0.
    with torch.no_grad():
        for i in range(0, eval_data.size(0) - 1, bptt):
            data, targets = get_batch(eval_data, i)
            seq_len = data.size(0)
            output = model(data)
            output_flat = output.view(-1, ntokens)
            total_loss += seq_len * criterion(output_flat, targets).item()
    return total_loss / (len(eval_data) - 1)

best_val_loss = float('inf')
epochs = 3

with TemporaryDirectory() as tempdir:
    best_model_params_path = os.path.join(tempdir, "best_model_params.pt")

    for epoch in range(1, epochs + 1):
        epoch_start_time = time.time()
        train(model)
        val_loss = evaluate(model, val_data)
        val_ppl = math.exp(val_loss)
        elapsed = time.time() - epoch_start_time
        print('-' * 89)
        print(f'| end of epoch {epoch:3d} | time: {elapsed:5.2f}s | '
            f'valid loss {val_loss:5.2f} | valid ppl {val_ppl:8.2f}')
        print('-' * 89)

        if val_loss < best_val_loss:
            best_val_loss = val_loss
            torch.save(model.state_dict(), best_model_params_path)

        scheduler.step()
    model.load_state_dict(torch.load(best_model_params_path)) # load best model states

| epoch   1 |   200/ 2928 batches | lr 5.00 | ms/batch 31.93 | loss  8.19 | ppl  3613.91
| epoch   1 |   400/ 2928 batches | lr 5.00 | ms/batch 28.57 | loss  6.88 | ppl   970.94
| epoch   1 |   600/ 2928 batches | lr 5.00 | ms/batch 28.31 | loss  6.43 | ppl   621.40
| epoch   1 |   800/ 2928 batches | lr 5.00 | ms/batch 28.48 | loss  6.30 | ppl   542.89
| epoch   1 |  1000/ 2928 batches | lr 5.00 | ms/batch 28.46 | loss  6.18 | ppl   484.73
| epoch   1 |  1200/ 2928 batches | lr 5.00 | ms/batch 28.32 | loss  6.15 | ppl   467.52
| epoch   1 |  1400/ 2928 batches | lr 5.00 | ms/batch 28.53 | loss  6.11 | ppl   450.65
| epoch   1 |  1600/ 2928 batches | lr 5.00 | ms/batch 28.45 | loss  6.11 | ppl   450.73
| epoch   1 |  1800/ 2928 batches | lr 5.00 | ms/batch 28.43 | loss  6.02 | ppl   410.39
| epoch   1 |  2000/ 2928 batches | lr 5.00 | ms/batch 28.56 | loss  6.01 | ppl   409.43
| epoch   1 |  2200/ 2928 batches | lr 5.00 | ms/batch 28.47 | loss  5.89 | ppl   361.18
| epoch   1 |  2400/ 2928 batches | lr 5.00 | ms/batch 28.57 | loss  5.97 | ppl   393.23
| epoch   1 |  2600/ 2928 batches | lr 5.00 | ms/batch 28.53 | loss  5.95 | ppl   383.85
| epoch   1 |  2800/ 2928 batches | lr 5.00 | ms/batch 28.59 | loss  5.88 | ppl   357.86
-----------------------------------------------------------------------------------------
| end of epoch   1 | time: 87.20s | valid loss  5.78 | valid ppl   324.74
-----------------------------------------------------------------------------------------
| epoch   2 |   200/ 2928 batches | lr 4.75 | ms/batch 28.74 | loss  5.86 | ppl   349.96
| epoch   2 |   400/ 2928 batches | lr 4.75 | ms/batch 28.60 | loss  5.85 | ppl   348.22
| epoch   2 |   600/ 2928 batches | lr 4.75 | ms/batch 28.49 | loss  5.66 | ppl   286.86
| epoch   2 |   800/ 2928 batches | lr 4.75 | ms/batch 28.39 | loss  5.70 | ppl   297.60
| epoch   2 |  1000/ 2928 batches | lr 4.75 | ms/batch 28.55 | loss  5.64 | ppl   282.01
| epoch   2 |  1200/ 2928 batches | lr 4.75 | ms/batch 28.56 | loss  5.67 | ppl   290.49
| epoch   2 |  1400/ 2928 batches | lr 4.75 | ms/batch 28.58 | loss  5.68 | ppl   292.36
| epoch   2 |  1600/ 2928 batches | lr 4.75 | ms/batch 28.64 | loss  5.70 | ppl   299.93
| epoch   2 |  1800/ 2928 batches | lr 4.75 | ms/batch 28.58 | loss  5.64 | ppl   282.54
| epoch   2 |  2000/ 2928 batches | lr 4.75 | ms/batch 28.50 | loss  5.66 | ppl   288.23
| epoch   2 |  2200/ 2928 batches | lr 4.75 | ms/batch 28.46 | loss  5.54 | ppl   254.44
| epoch   2 |  2400/ 2928 batches | lr 4.75 | ms/batch 28.58 | loss  5.65 | ppl   282.92
| epoch   2 |  2600/ 2928 batches | lr 4.75 | ms/batch 28.64 | loss  5.64 | ppl   282.54
| epoch   2 |  2800/ 2928 batches | lr 4.75 | ms/batch 28.57 | loss  5.58 | ppl   263.76
-----------------------------------------------------------------------------------------
| end of epoch   2 | time: 86.73s | valid loss  5.65 | valid ppl   282.95
-----------------------------------------------------------------------------------------
| epoch   3 |   200/ 2928 batches | lr 4.51 | ms/batch 28.69 | loss  5.60 | ppl   270.97
| epoch   3 |   400/ 2928 batches | lr 4.51 | ms/batch 28.55 | loss  5.62 | ppl   276.79
| epoch   3 |   600/ 2928 batches | lr 4.51 | ms/batch 28.65 | loss  5.42 | ppl   226.33
| epoch   3 |   800/ 2928 batches | lr 4.51 | ms/batch 28.59 | loss  5.48 | ppl   239.30
| epoch   3 |  1000/ 2928 batches | lr 4.51 | ms/batch 28.51 | loss  5.44 | ppl   229.71
| epoch   3 |  1200/ 2928 batches | lr 4.51 | ms/batch 28.55 | loss  5.48 | ppl   238.78
| epoch   3 |  1400/ 2928 batches | lr 4.51 | ms/batch 28.51 | loss  5.50 | ppl   243.54
| epoch   3 |  1600/ 2928 batches | lr 4.51 | ms/batch 28.56 | loss  5.52 | ppl   248.47
| epoch   3 |  1800/ 2928 batches | lr 4.51 | ms/batch 28.44 | loss  5.46 | ppl   235.26
| epoch   3 |  2000/ 2928 batches | lr 4.51 | ms/batch 28.37 | loss  5.48 | ppl   240.24
| epoch   3 |  2200/ 2928 batches | lr 4.51 | ms/batch 28.43 | loss  5.38 | ppl   217.29
| epoch   3 |  2400/ 2928 batches | lr 4.51 | ms/batch 28.44 | loss  5.47 | ppl   236.64
| epoch   3 |  2600/ 2928 batches | lr 4.51 | ms/batch 28.46 | loss  5.47 | ppl   237.76
| epoch   3 |  2800/ 2928 batches | lr 4.51 | ms/batch 28.49 | loss  5.40 | ppl   220.67
-----------------------------------------------------------------------------------------
| end of epoch   3 | time: 86.51s | valid loss  5.61 | valid ppl   273.90
-----------------------------------------------------------------------------------------

test_loss = evaluate(model, test_data)
test_ppl = math.exp(test_loss)
print('=' * 89)
print(f'| End of training | test loss {test_loss:5.2f} | '
      f'test ppl {test_ppl:8.2f}')
print('=' * 89)

=========================================================================================
| End of training | test loss  5.52 | test ppl   249.27
=========================================================================================

下载 Python 源代码：transformer_tutorial.py

下载 Jupyter 笔记本：transformer_tutorial.ipynb

本教程将 Better Transformer（BT）作为 PyTorch 1.12 版本的一部分进行介绍。在本教程中，我们展示了如何使用 Better Transformer 进行 torchtext 的生产推理。Better Transformer 是一个生产就绪的快速路径，可加速在 CPU 和 GPU 上部署具有高性能的 Transformer 模型。快速路径功能对基于 PyTorch 核心nn.module或 torchtext 的模型透明地工作。

可以通过 Better Transformer 快速路径执行加速的模型是使用以下 PyTorch 核心torch.nn.module类TransformerEncoder、TransformerEncoderLayer和MultiHeadAttention的模型。此外，torchtext 已更新为使用核心库模块以从快速路径加速中受益。（未来可能会启用其他模块以进行快速路径执行。）

import torch
import torch.nn as nn

print(f"torch version: {torch.__version__}")

DEVICE = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

print(f"torch cuda available: {torch.cuda.is_available()}")

import torch, torchtext
from torchtext.models import RobertaClassificationHead
from torchtext.functional import to_tensor
xlmr_large = torchtext.models.XLMR_LARGE_ENCODER
classifier_head = torchtext.models.RobertaClassificationHead(num_classes=2, input_dim = 1024)
model = xlmr_large.get_model(head=classifier_head)
transform = xlmr_large.transform()

small_input_batch = [
               "Hello world",
               "How are you!"
]
big_input_batch = [
               "Hello world",
               "How are you!",
  """`Well, Prince, so Genoa and Lucca are now just family estates of the
Buonapartes. But I warn you, if you don't tell me that this means war,
if you still try to defend the infamies and horrors perpetrated by
that Antichrist- I really believe he is Antichrist- I will have
nothing more to do with you and you are no longer my friend, no longer
my 'faithful slave,' as you call yourself! But how do you do? I see
I have frightened you- sit down and tell me all the news.`

It was in July, 1805, and the speaker was the well-known Anna
Pavlovna Scherer, maid of honor and favorite of the Empress Marya
Fedorovna. With these words she greeted Prince Vasili Kuragin, a man
of high rank and importance, who was the first to arrive at her
reception. Anna Pavlovna had had a cough for some days. She was, as
she said, suffering from la grippe; grippe being then a new word in
St. Petersburg, used only by the elite."""
]

input_batch=big_input_batch

model_input = to_tensor(transform(input_batch), padding_value=1)
output = model(model_input)
output.shape

ITERATIONS=10

print("slow path:")
print("==========")
with torch.autograd.profiler.profile(use_cuda=False) as prof:
  for i in range(ITERATIONS):
    output = model(model_input)
print(prof)

model.eval()

print("fast path:")
print("==========")
with torch.autograd.profiler.profile(use_cuda=False) as prof:
  with torch.no_grad():
    for i in range(ITERATIONS):
      output = model(model_input)
print(prof)

model.encoder.transformer.layers.enable_nested_tensor

model.encoder.transformer.layers.enable_nested_tensor=False

model.to(DEVICE)
model_input = model_input.to(DEVICE)

print("slow path:")
print("==========")
with torch.autograd.profiler.profile(use_cuda=True) as prof:
  for i in range(ITERATIONS):
    output = model(model_input)
print(prof)

model.eval()

print("fast path:")
print("==========")
with torch.autograd.profiler.profile(use_cuda=True) as prof:
  with torch.no_grad():
    for i in range(ITERATIONS):
      output = model(model_input)
print(prof)

model.encoder.transformer.layers.enable_nested_tensor = True

model.to(DEVICE)
model_input = model_input.to(DEVICE)

print("slow path:")
print("==========")
with torch.autograd.profiler.profile(use_cuda=True) as prof:
  for i in range(ITERATIONS):
    output = model(model_input)
print(prof)

model.eval()

print("fast path:")
print("==========")
with torch.autograd.profiler.profile(use_cuda=True) as prof:
  with torch.no_grad():
    for i in range(ITERATIONS):
      output = model(model_input)
print(prof)

$  python  predict.py  Hinton
(-0.47)  Scottish
(-1.52)  English
(-3.57)  Irish

$  python  predict.py  Schmidhuber
(-0.19)  German
(-2.48)  Czech
(-2.68)  Dutch

data/names目录中包含 18 个名为[Language].txt的文本文件。每个文件包含一堆名称，每行一个名称，大多数是罗马化的（但我们仍然需要从 Unicode 转换为 ASCII）。

我们最终会得到一个字典，其中包含每种语言的名称列表，{language: [names ...]}。通用变量“category”和“line”（在我们的案例中用于语言和名称）用于以后的可扩展性。

from io import open
import glob
import os

def findFiles(path): return glob.glob(path)

print(findFiles('data/names/*.txt'))

import unicodedata
import string

all_letters = string.ascii_letters + " .,;'"
n_letters = len(all_letters)

# Turn a Unicode string to plain ASCII, thanks to https://stackoverflow.com/a/518232/2809427
def unicodeToAscii(s):
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn'
        and c in all_letters
    )

print(unicodeToAscii('Ślusàrski'))

# Build the category_lines dictionary, a list of names per language
category_lines = {}
all_categories = []

# Read a file and split into lines
def readLines(filename):
    lines = open(filename, encoding='utf-8').read().strip().split('n')
    return [unicodeToAscii(line) for line in lines]

for filename in findFiles('data/names/*.txt'):
    category = os.path.splitext(os.path.basename(filename))[0]
    all_categories.append(category)
    lines = readLines(filename)
    category_lines[category] = lines

n_categories = len(all_categories)

['data/names/Arabic.txt', 'data/names/Chinese.txt', 'data/names/Czech.txt', 'data/names/Dutch.txt', 'data/names/English.txt', 'data/names/French.txt', 'data/names/German.txt', 'data/names/Greek.txt', 'data/names/Irish.txt', 'data/names/Italian.txt', 'data/names/Japanese.txt', 'data/names/Korean.txt', 'data/names/Polish.txt', 'data/names/Portuguese.txt', 'data/names/Russian.txt', 'data/names/Scottish.txt', 'data/names/Spanish.txt', 'data/names/Vietnamese.txt']
Slusarski

现在我们有category_lines，一个将每个类别（语言）映射到一系列行（名称）的字典。我们还跟踪了all_categories（只是一个语言列表）和n_categories以供以后参考。

print(category_lines['Italian'][:5])

['Abandonato', 'Abatangelo', 'Abatantuono', 'Abate', 'Abategiovanni']

为了表示单个字母，我们使用大小为<1 x n_letters>的“one-hot 向量”。一个 one-hot 向量除了当前字母的索引处为 1 之外，其他位置都填充为 0，例如，"b" = <0 1 0 0 0 ...>。

为了构成一个单词，我们将其中的一堆连接成一个 2D 矩阵<line_length x 1 x n_letters>。

import torch

# Find letter index from all_letters, e.g. "a" = 0
def letterToIndex(letter):
    return all_letters.find(letter)

# Just for demonstration, turn a letter into a <1 x n_letters> Tensor
def letterToTensor(letter):
    tensor = torch.zeros(1, n_letters)
    tensor[0][letterToIndex(letter)] = 1
    return tensor

# Turn a line into a <line_length x 1 x n_letters>,
# or an array of one-hot letter vectors
def lineToTensor(line):
    tensor = torch.zeros(len(line), 1, n_letters)
    for li, letter in enumerate(line):
        tensor[li][0][letterToIndex(letter)] = 1
    return tensor

print(letterToTensor('J'))

print(lineToTensor('Jones').size())

tensor([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.,
         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
         0., 0., 0.]])
torch.Size([5, 1, 57])

这个 RNN 模块（主要是从PyTorch for Torch 用户教程中复制的）只是在输入和隐藏状态上操作的 2 个线性层，输出后是一个LogSoftmax层。

import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNN, self).__init__()

        self.hidden_size = hidden_size

        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        self.h2o = nn.Linear(hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        combined = torch.cat((input, hidden), 1)
        hidden = self.i2h(combined)
        output = self.h2o(hidden)
        output = self.softmax(output)
        return output, hidden

    def initHidden(self):
        return torch.zeros(1, self.hidden_size)

n_hidden = 128
rnn = RNN(n_letters, n_hidden, n_categories)

input = letterToTensor('A')
hidden = torch.zeros(1, n_hidden)

output, next_hidden = rnn(input, hidden)

为了提高效率，我们不希望为每一步创建一个新的张量，因此我们将使用lineToTensor代替letterToTensor并使用切片。这可以通过预先计算批量张量来进一步优化。

input = lineToTensor('Albert')
hidden = torch.zeros(1, n_hidden)

output, next_hidden = rnn(input[0], hidden)
print(output)

tensor([[-2.9083, -2.9270, -2.9167, -2.9590, -2.9108, -2.8332, -2.8906, -2.8325,
         -2.8521, -2.9279, -2.8452, -2.8754, -2.8565, -2.9733, -2.9201, -2.8233,
         -2.9298, -2.8624]], grad_fn=<LogSoftmaxBackward0>)

如您所见，输出是一个<1 x n_categories>张量，其中每个项目都是该类别的可能性（可能性越高，越可能）。

在进行训练之前，我们应该编写一些辅助函数。第一个是解释网络输出的函数，我们知道它是每个类别的可能性。我们可以使用Tensor.topk来获取最大值的索引：

def categoryFromOutput(output):
    top_n, top_i = output.topk(1)
    category_i = top_i[0].item()
    return all_categories[category_i], category_i

print(categoryFromOutput(output))

('Scottish', 15)

import random

def randomChoice(l):
    return l[random.randint(0, len(l) - 1)]

def randomTrainingExample():
    category = randomChoice(all_categories)
    line = randomChoice(category_lines[category])
    category_tensor = torch.tensor([all_categories.index(category)], dtype=torch.long)
    line_tensor = lineToTensor(line)
    return category, line, category_tensor, line_tensor

for i in range(10):
    category, line, category_tensor, line_tensor = randomTrainingExample()
    print('category =', category, '/ line =', line)

category = Chinese / line = Hou
category = Scottish / line = Mckay
category = Arabic / line = Cham
category = Russian / line = V'Yurkov
category = Irish / line = O'Keeffe
category = French / line = Belrose
category = Spanish / line = Silva
category = Japanese / line = Fuchida
category = Greek / line = Tsahalis
category = Korean / line = Chang

对于损失函数，nn.NLLLoss是合适的，因为 RNN 的最后一层是nn.LogSoftmax。

criterion = nn.NLLLoss()

learning_rate = 0.005 # If you set this too high, it might explode. If too low, it might not learn

def train(category_tensor, line_tensor):
    hidden = rnn.initHidden()

    rnn.zero_grad()

    for i in range(line_tensor.size()[0]):
        output, hidden = rnn(line_tensor[i], hidden)

    loss = criterion(output, category_tensor)
    loss.backward()

    # Add parameters' gradients to their values, multiplied by learning rate
    for p in rnn.parameters():
        p.data.add_(p.grad.data, alpha=-learning_rate)

    return output, loss.item()

现在我们只需运行一堆示例。由于train函数返回输出和损失，我们可以打印其猜测并跟踪损失以绘图。由于有成千上万的示例，我们仅打印每print_every个示例，并计算损失的平均值。

import time
import math

n_iters = 100000
print_every = 5000
plot_every = 1000

# Keep track of losses for plotting
current_loss = 0
all_losses = []

def timeSince(since):
    now = time.time()
    s = now - since
    m = math.floor(s / 60)
    s -= m * 60
    return '%dm %ds' % (m, s)

start = time.time()

for iter in range(1, n_iters + 1):
    category, line, category_tensor, line_tensor = randomTrainingExample()
    output, loss = train(category_tensor, line_tensor)
    current_loss += loss

    # Print ``iter`` number, loss, name and guess
    if iter % print_every == 0:
        guess, guess_i = categoryFromOutput(output)
        correct = '✓' if guess == category else '✗ (%s)' % category
        print('%d  %d%% (%s) %.4f  %s / %s  %s' % (iter, iter / n_iters * 100, timeSince(start), loss, line, guess, correct))

    # Add current loss avg to list of losses
    if iter % plot_every == 0:
        all_losses.append(current_loss / plot_every)
        current_loss = 0

5000 5% (0m 29s) 2.6379 Horigome / Japanese ✓
10000 10% (0m 58s) 2.0172 Miazga / Japanese ✗ (Polish)
15000 15% (1m 29s) 0.2680 Yukhvidov / Russian ✓
20000 20% (1m 58s) 1.8239 Mclaughlin / Irish ✗ (Scottish)
25000 25% (2m 29s) 0.6978 Banh / Vietnamese ✓
30000 30% (2m 58s) 1.7433 Machado / Japanese ✗ (Portuguese)
35000 35% (3m 28s) 0.0340 Fotopoulos / Greek ✓
40000 40% (3m 58s) 1.4637 Quirke / Irish ✓
45000 45% (4m 28s) 1.9018 Reier / French ✗ (German)
50000 50% (4m 57s) 0.9174 Hou / Chinese ✓
55000 55% (5m 27s) 1.0506 Duan / Vietnamese ✗ (Chinese)
60000 60% (5m 57s) 0.9617 Giang / Vietnamese ✓
65000 65% (6m 27s) 2.4557 Cober / German ✗ (Czech)
70000 70% (6m 57s) 0.8502 Mateus / Portuguese ✓
75000 75% (7m 26s) 0.2750 Hamilton / Scottish ✓
80000 80% (7m 56s) 0.7515 Maessen / Dutch ✓
85000 85% (8m 26s) 0.0912 Gan / Chinese ✓
90000 90% (8m 55s) 0.1190 Bellomi / Italian ✓
95000 95% (9m 25s) 0.0137 Vozgov / Russian ✓
100000 100% (9m 55s) 0.7808 Tong / Vietnamese ✓

绘制all_losses中的历史损失可以显示网络的学习情况：

import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

plt.figure()
plt.plot(all_losses)

[<matplotlib.lines.Line2D object at 0x7f4d28129de0>]

为了查看网络在不同类别上的表现如何，我们将创建一个混淆矩阵，指示对于每种实际语言（行），网络猜测的是哪种语言（列）。为了计算混淆矩阵，一堆样本通过网络运行evaluate()，这与train()相同，但没有反向传播。

# Keep track of correct guesses in a confusion matrix
confusion = torch.zeros(n_categories, n_categories)
n_confusion = 10000

# Just return an output given a line
def evaluate(line_tensor):
    hidden = rnn.initHidden()

    for i in range(line_tensor.size()[0]):
        output, hidden = rnn(line_tensor[i], hidden)

    return output

# Go through a bunch of examples and record which are correctly guessed
for i in range(n_confusion):
    category, line, category_tensor, line_tensor = randomTrainingExample()
    output = evaluate(line_tensor)
    guess, guess_i = categoryFromOutput(output)
    category_i = all_categories.index(category)
    confusion[category_i][guess_i] += 1

# Normalize by dividing every row by its sum
for i in range(n_categories):
    confusion[i] = confusion[i] / confusion[i].sum()

# Set up plot
fig = plt.figure()
ax = fig.add_subplot(111)
cax = ax.matshow(confusion.numpy())
fig.colorbar(cax)

# Set up axes
ax.set_xticklabels([''] + all_categories, rotation=90)
ax.set_yticklabels([''] + all_categories)

# Force label at every tick
ax.xaxis.set_major_locator(ticker.MultipleLocator(1))
ax.yaxis.set_major_locator(ticker.MultipleLocator(1))

# sphinx_gallery_thumbnail_number = 2
plt.show()

/var/lib/jenkins/workspace/intermediate_source/char_rnn_classification_tutorial.py:445: UserWarning:

set_ticklabels() should only be used with a fixed number of ticks, i.e. after set_ticks() or using a FixedLocator.

/var/lib/jenkins/workspace/intermediate_source/char_rnn_classification_tutorial.py:446: UserWarning:

set_ticklabels() should only be used with a fixed number of ticks, i.e. after set_ticks() or using a FixedLocator.

def predict(input_line, n_predictions=3):
    print('n> %s' % input_line)
    with torch.no_grad():
        output = evaluate(lineToTensor(input_line))

        # Get top N categories
        topv, topi = output.topk(n_predictions, 1, True)
        predictions = []

        for i in range(n_predictions):
            value = topv[0][i].item()
            category_index = topi[0][i].item()
            print('(%.2f) %s' % (value, all_categories[category_index]))
            predictions.append([value, all_categories[category_index]])

predict('Dovesky')
predict('Jackson')
predict('Satoshi')

> Dovesky
(-0.57) Czech
(-0.97) Russian
(-3.43) English

> Jackson
(-1.02) Scottish
(-1.49) Russian
(-1.96) English

> Satoshi
(-0.42) Japanese
(-1.70) Polish
(-2.74) Italian

运行train.py以训练并保存网络。

运行predict.py并输入一个名称以查看预测：

$  python  predict.py  Hazaki
(-0.42)  Japanese
(-1.39)  Polish
(-3.51)  Czech

运行server.py并访问localhost:5533/Yourname以获取预测的 JSON 输出。

下载 Python 源代码：char_rnn_classification_tutorial.py

下载 Jupyter 笔记本：char_rnn_classification_tutorial.ipynb

>  python  sample.py  Russian  RUS
Rovakov
Uantov
Shavakov

>  python  sample.py  German  GER
Gerren
Ereng
Rosher

>  python  sample.py  Spanish  SPA
Salla
Parer
Allan

>  python  sample.py  Chinese  CHI
Chan
Hang
Iun

有关此过程的更多详细信息，请参阅上一个教程。简而言之，有一堆纯文本文件data/names/[Language].txt，每行一个名字。我们将行拆分为数组，将 Unicode 转换为 ASCII，最终得到一个字典{language: [names ...]}。

from io import open
import glob
import os
import unicodedata
import string

all_letters = string.ascii_letters + " .,;'-"
n_letters = len(all_letters) + 1 # Plus EOS marker

def findFiles(path): return glob.glob(path)

# Turn a Unicode string to plain ASCII, thanks to https://stackoverflow.com/a/518232/2809427
def unicodeToAscii(s):
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn'
        and c in all_letters
    )

# Read a file and split into lines
def readLines(filename):
    with open(filename, encoding='utf-8') as some_file:
        return [unicodeToAscii(line.strip()) for line in some_file]

# Build the category_lines dictionary, a list of lines per category
category_lines = {}
all_categories = []
for filename in findFiles('data/names/*.txt'):
    category = os.path.splitext(os.path.basename(filename))[0]
    all_categories.append(category)
    lines = readLines(filename)
    category_lines[category] = lines

n_categories = len(all_categories)

if n_categories == 0:
    raise RuntimeError('Data not found. Make sure that you downloaded data '
        'from https://download.pytorch.org/tutorial/data.zip and extract it to '
        'the current directory.')

print('# categories:', n_categories, all_categories)
print(unicodeToAscii("O'Néàl"))

# categories: 18 ['Arabic', 'Chinese', 'Czech', 'Dutch', 'English', 'French', 'German', 'Greek', 'Irish', 'Italian', 'Japanese', 'Korean', 'Polish', 'Portuguese', 'Russian', 'Scottish', 'Spanish', 'Vietnamese']
O'Neal

我添加了第二个线性层o2o（在隐藏和输出合并后）以增加其处理能力。还有一个 dropout 层，它会随机将其输入的部分置零以给定的概率（这里是 0.1），通常用于模糊输入以防止过拟合。在网络末尾使用它是为了故意增加一些混乱并增加采样的多样性。

import torch
import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size

        self.i2h = nn.Linear(n_categories + input_size + hidden_size, hidden_size)
        self.i2o = nn.Linear(n_categories + input_size + hidden_size, output_size)
        self.o2o = nn.Linear(hidden_size + output_size, output_size)
        self.dropout = nn.Dropout(0.1)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, category, input, hidden):
        input_combined = torch.cat((category, input, hidden), 1)
        hidden = self.i2h(input_combined)
        output = self.i2o(input_combined)
        output_combined = torch.cat((hidden, output), 1)
        output = self.o2o(output_combined)
        output = self.dropout(output)
        output = self.softmax(output)
        return output, hidden

    def initHidden(self):
        return torch.zeros(1, self.hidden_size)

import random

# Random item from a list
def randomChoice(l):
    return l[random.randint(0, len(l) - 1)]

# Get a random category and random line from that category
def randomTrainingPair():
    category = randomChoice(all_categories)
    line = randomChoice(category_lines[category])
    return category, line

对于每个时间步（即训练单词中的每个字母），网络的输入将是(类别，当前字母，隐藏状态)，输出将是(下一个字母，下一个隐藏状态)。因此，对于每个训练集，我们需要类别、一组输入字母和一组输出/目标字母。

由于我们正在预测每个时间步的下一个字母，所以字母对是来自行中连续字母的组 – 例如对于"ABCD<EOS>"，我们将创建(“A”, “B”), (“B”, “C”), (“C”, “D”), (“D”, “EOS”)。

类别张量是一个大小为<1 x n_categories>的独热张量。在训练时，我们在每个时间步将其馈送到网络中 – 这是一个设计选择，它可以作为初始隐藏状态的一部分或其他策略的一部分。

# One-hot vector for category
def categoryTensor(category):
    li = all_categories.index(category)
    tensor = torch.zeros(1, n_categories)
    tensor[0][li] = 1
    return tensor

# One-hot matrix of first to last letters (not including EOS) for input
def inputTensor(line):
    tensor = torch.zeros(len(line), 1, n_letters)
    for li in range(len(line)):
        letter = line[li]
        tensor[li][0][all_letters.find(letter)] = 1
    return tensor

# ``LongTensor`` of second letter to end (EOS) for target
def targetTensor(line):
    letter_indexes = [all_letters.find(line[li]) for li in range(1, len(line))]
    letter_indexes.append(n_letters - 1) # EOS
    return torch.LongTensor(letter_indexes)

为了方便训练，我们将创建一个randomTrainingExample函数，该函数获取一个随机的（类别，行）对，并将它们转换为所需的（类别，输入，目标）张量。

# Make category, input, and target tensors from a random category, line pair
def randomTrainingExample():
    category, line = randomTrainingPair()
    category_tensor = categoryTensor(category)
    input_line_tensor = inputTensor(line)
    target_line_tensor = targetTensor(line)
    return category_tensor, input_line_tensor, target_line_tensor

criterion = nn.NLLLoss()

learning_rate = 0.0005

def train(category_tensor, input_line_tensor, target_line_tensor):
    target_line_tensor.unsqueeze_(-1)
    hidden = rnn.initHidden()

    rnn.zero_grad()

    loss = torch.Tensor([0]) # you can also just simply use ``loss = 0``

    for i in range(input_line_tensor.size(0)):
        output, hidden = rnn(category_tensor, input_line_tensor[i], hidden)
        l = criterion(output, target_line_tensor[i])
        loss += l

    loss.backward()

    for p in rnn.parameters():
        p.data.add_(p.grad.data, alpha=-learning_rate)

    return output, loss.item() / input_line_tensor.size(0)

为了跟踪训练需要多长时间，我添加了一个timeSince(timestamp)函数，它返回一个可读的字符串：

import time
import math

def timeSince(since):
    now = time.time()
    s = now - since
    m = math.floor(s / 60)
    s -= m * 60
    return '%dm %ds' % (m, s)

训练就像往常一样 – 多次调用 train 并等待几分钟，每print_every个示例打印当前时间和损失，并在all_losses中保留每plot_every个示例的平均损失以供稍后绘图。

rnn = RNN(n_letters, 128, n_letters)

n_iters = 100000
print_every = 5000
plot_every = 500
all_losses = []
total_loss = 0 # Reset every ``plot_every`` ``iters``

start = time.time()

for iter in range(1, n_iters + 1):
    output, loss = train(*randomTrainingExample())
    total_loss += loss

    if iter % print_every == 0:
        print('%s (%d  %d%%) %.4f' % (timeSince(start), iter, iter / n_iters * 100, loss))

    if iter % plot_every == 0:
        all_losses.append(total_loss / plot_every)
        total_loss = 0

0m 9s (5000 5%) 3.1506
0m 18s (10000 10%) 2.5070
0m 28s (15000 15%) 3.3047
0m 37s (20000 20%) 2.4247
0m 47s (25000 25%) 2.6406
0m 56s (30000 30%) 2.0266
1m 5s (35000 35%) 2.6520
1m 15s (40000 40%) 2.4261
1m 24s (45000 45%) 2.2302
1m 33s (50000 50%) 1.6496
1m 43s (55000 55%) 2.7101
1m 52s (60000 60%) 2.5396
2m 1s (65000 65%) 2.5978
2m 11s (70000 70%) 1.6029
2m 20s (75000 75%) 0.9634
2m 29s (80000 80%) 3.0950
2m 39s (85000 85%) 2.0512
2m 48s (90000 90%) 2.5302
2m 57s (95000 95%) 3.2365
3m 7s (100000 100%) 1.7113

从all_losses中绘制历史损失显示网络学习：

import matplotlib.pyplot as plt

plt.figure()
plt.plot(all_losses)

[<matplotlib.lines.Line2D object at 0x7fcf36ff04c0>]

max_length = 20

# Sample from a category and starting letter
def sample(category, start_letter='A'):
    with torch.no_grad():  # no need to track history in sampling
        category_tensor = categoryTensor(category)
        input = inputTensor(start_letter)
        hidden = rnn.initHidden()

        output_name = start_letter

        for i in range(max_length):
            output, hidden = rnn(category_tensor, input[0], hidden)
            topv, topi = output.topk(1)
            topi = topi[0][0]
            if topi == n_letters - 1:
                break
            else:
                letter = all_letters[topi]
                output_name += letter
            input = inputTensor(letter)

        return output_name

# Get multiple samples from one category and multiple starting letters
def samples(category, start_letters='ABC'):
    for start_letter in start_letters:
        print(sample(category, start_letter))

samples('Russian', 'RUS')

samples('German', 'GER')

samples('Spanish', 'SPA')

samples('Chinese', 'CHI')

Rovaki
Uarinovev
Shinan
Gerter
Eeren
Roune
Santera
Paneraz
Allan
Chin
Han
Ion

下载 Python 源代码：char_rnn_generation_tutorial.py

下载 Jupyter 笔记本：char_rnn_generation_tutorial.ipynb

[KEY:  >  input,  =  target,  <  output]

>  il  est  en  train  de  peindre  un  tableau  .
=  he  is  painting  a  picture  .
<  he  is  painting  a  picture  .

>  pourquoi  ne  pas  essayer  ce  vin  delicieux  ?
=  why  not  try  that  delicious  wine  ?
<  why  not  try  that  delicious  wine  ?

>  elle  n  est  pas  poete  mais  romanciere  .
=  she  is  not  a  poet  but  a  novelist  .
<  she  not  not  a  poet  but  a  novelist  .

>  vous  etes  trop  maigre  .
=  you  re  too  skinny  .
<  you  re  all  alone  .

from __future__ import unicode_literals, print_function, division
from io import open
import unicodedata
import re
import random

import torch
import torch.nn as nn
from torch import optim
import torch.nn.functional as F

import numpy as np
from torch.utils.data import TensorDataset, DataLoader, RandomSampler

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

英语到法语的翻译对太大，无法包含在存储库中，请在继续之前下载到data/eng-fra.txt。该文件是一个制表符分隔的翻译对列表：

I am cold.    J'ai froid.

我们将需要每个单词的唯一索引，以便稍后用作网络的输入和目标。为了跟踪所有这些，我们将使用一个名为Lang的辅助类，其中包含单词→索引（word2index）和索引→单词（index2word）字典，以及每个单词的计数word2count，稍后将用于替换稀有单词。

SOS_token = 0
EOS_token = 1

class Lang:
    def __init__(self, name):
        self.name = name
        self.word2index = {}
        self.word2count = {}
        self.index2word = {0: "SOS", 1: "EOS"}
        self.n_words = 2  # Count SOS and EOS

    def addSentence(self, sentence):
        for word in sentence.split(' '):
            self.addWord(word)

    def addWord(self, word):
        if word not in self.word2index:
            self.word2index[word] = self.n_words
            self.word2count[word] = 1
            self.index2word[self.n_words] = word
            self.n_words += 1
        else:
            self.word2count[word] += 1

# Turn a Unicode string to plain ASCII, thanks to
# https://stackoverflow.com/a/518232/2809427
def unicodeToAscii(s):
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn'
    )

# Lowercase, trim, and remove non-letter characters
def normalizeString(s):
    s = unicodeToAscii(s.lower().strip())
    s = re.sub(r"([.!?])", r" 1", s)
    s = re.sub(r"[^a-zA-Z!?]+", r" ", s)
    return s.strip()

为了读取数据文件，我们将文件拆分成行，然后将行拆分成对。所有文件都是英语→其他语言，因此如果我们想要从其他语言→英语翻译，我添加了reverse标志以反转对。

def readLangs(lang1, lang2, reverse=False):
    print("Reading lines...")

    # Read the file and split into lines
    lines = open('data/%s-%s.txt' % (lang1, lang2), encoding='utf-8').
        read().strip().split('n')

    # Split every line into pairs and normalize
    pairs = [[normalizeString(s) for s in l.split('t')] for l in lines]

    # Reverse pairs, make Lang instances
    if reverse:
        pairs = [list(reversed(p)) for p in pairs]
        input_lang = Lang(lang2)
        output_lang = Lang(lang1)
    else:
        input_lang = Lang(lang1)
        output_lang = Lang(lang2)

    return input_lang, output_lang, pairs

MAX_LENGTH = 10

eng_prefixes = (
    "i am ", "i m ",
    "he is", "he s ",
    "she is", "she s ",
    "you are", "you re ",
    "we are", "we re ",
    "they are", "they re "
)

def filterPair(p):
    return len(p[0].split(' ')) < MAX_LENGTH and 
        len(p[1].split(' ')) < MAX_LENGTH and 
        p[1].startswith(eng_prefixes)

def filterPairs(pairs):
    return [pair for pair in pairs if filterPair(pair)]

def prepareData(lang1, lang2, reverse=False):
    input_lang, output_lang, pairs = readLangs(lang1, lang2, reverse)
    print("Read %s sentence pairs" % len(pairs))
    pairs = filterPairs(pairs)
    print("Trimmed to %s sentence pairs" % len(pairs))
    print("Counting words...")
    for pair in pairs:
        input_lang.addSentence(pair[0])
        output_lang.addSentence(pair[1])
    print("Counted words:")
    print(input_lang.name, input_lang.n_words)
    print(output_lang.name, output_lang.n_words)
    return input_lang, output_lang, pairs

input_lang, output_lang, pairs = prepareData('eng', 'fra', True)
print(random.choice(pairs))

Reading lines...
Read 135842 sentence pairs
Trimmed to 11445 sentence pairs
Counting words...
Counted words:
fra 4601
eng 2991
['tu preches une convaincue', 'you re preaching to the choir']

考虑句子Je ne suis pas le chat noir → I am not the black cat。输入句子中的大多数单词在输出句子中有直接的翻译，但顺序略有不同，例如chat noir和black cat。由于ne/pas结构，在输入句子中还有一个单词。直接从输入单词序列中产生正确的翻译将会很困难。

class EncoderRNN(nn.Module):
    def __init__(self, input_size, hidden_size, dropout_p=0.1):
        super(EncoderRNN, self).__init__()
        self.hidden_size = hidden_size

        self.embedding = nn.Embedding(input_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.dropout = nn.Dropout(dropout_p)

    def forward(self, input):
        embedded = self.dropout(self.embedding(input))
        output, hidden = self.gru(embedded)
        return output, hidden

在解码的每一步，解码器都会得到一个输入标记和隐藏状态。初始输入标记是起始字符串<SOS>标记，第一个隐藏状态是上下文向量（编码器的最后一个隐藏状态）。

class DecoderRNN(nn.Module):
    def __init__(self, hidden_size, output_size):
        super(DecoderRNN, self).__init__()
        self.embedding = nn.Embedding(output_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, output_size)

    def forward(self, encoder_outputs, encoder_hidden, target_tensor=None):
        batch_size = encoder_outputs.size(0)
        decoder_input = torch.empty(batch_size, 1, dtype=torch.long, device=device).fill_(SOS_token)
        decoder_hidden = encoder_hidden
        decoder_outputs = []

        for i in range(MAX_LENGTH):
            decoder_output, decoder_hidden  = self.forward_step(decoder_input, decoder_hidden)
            decoder_outputs.append(decoder_output)

            if target_tensor is not None:
                # Teacher forcing: Feed the target as the next input
                decoder_input = target_tensor[:, i].unsqueeze(1) # Teacher forcing
            else:
                # Without teacher forcing: use its own predictions as the next input
                _, topi = decoder_output.topk(1)
                decoder_input = topi.squeeze(-1).detach()  # detach from history as input

        decoder_outputs = torch.cat(decoder_outputs, dim=1)
        decoder_outputs = F.log_softmax(decoder_outputs, dim=-1)
        return decoder_outputs, decoder_hidden, None # We return `None` for consistency in the training loop

    def forward_step(self, input, hidden):
        output = self.embedding(input)
        output = F.relu(output)
        output, hidden = self.gru(output, hidden)
        output = self.out(output)
        return output, hidden

注意力允许解码器网络在每一步解码器自身输出的不同部分上“聚焦”编码器的输出。首先我们计算一组注意力权重。这些将与编码器输出向量相乘，以创建加权组合。结果（代码中称为attn_applied）应该包含关于输入序列的特定部分的信息，从而帮助解码器选择正确的输出单词。

计算注意力权重是通过另一个前馈层attn完成的，使用解码器的输入和隐藏状态作为输入。由于训练数据中存在各种大小的句子，为了实际创建和训练这一层，我们必须选择一个最大句子长度（输入长度，用于编码器输出）来应用。最大长度的句子将使用所有的注意力权重，而较短的句子将只使用前几个。

class BahdanauAttention(nn.Module):
    def __init__(self, hidden_size):
        super(BahdanauAttention, self).__init__()
        self.Wa = nn.Linear(hidden_size, hidden_size)
        self.Ua = nn.Linear(hidden_size, hidden_size)
        self.Va = nn.Linear(hidden_size, 1)

    def forward(self, query, keys):
        scores = self.Va(torch.tanh(self.Wa(query) + self.Ua(keys)))
        scores = scores.squeeze(2).unsqueeze(1)

        weights = F.softmax(scores, dim=-1)
        context = torch.bmm(weights, keys)

        return context, weights

class AttnDecoderRNN(nn.Module):
    def __init__(self, hidden_size, output_size, dropout_p=0.1):
        super(AttnDecoderRNN, self).__init__()
        self.embedding = nn.Embedding(output_size, hidden_size)
        self.attention = BahdanauAttention(hidden_size)
        self.gru = nn.GRU(2 * hidden_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, output_size)
        self.dropout = nn.Dropout(dropout_p)

    def forward(self, encoder_outputs, encoder_hidden, target_tensor=None):
        batch_size = encoder_outputs.size(0)
        decoder_input = torch.empty(batch_size, 1, dtype=torch.long, device=device).fill_(SOS_token)
        decoder_hidden = encoder_hidden
        decoder_outputs = []
        attentions = []

        for i in range(MAX_LENGTH):
            decoder_output, decoder_hidden, attn_weights = self.forward_step(
                decoder_input, decoder_hidden, encoder_outputs
            )
            decoder_outputs.append(decoder_output)
            attentions.append(attn_weights)

            if target_tensor is not None:
                # Teacher forcing: Feed the target as the next input
                decoder_input = target_tensor[:, i].unsqueeze(1) # Teacher forcing
            else:
                # Without teacher forcing: use its own predictions as the next input
                _, topi = decoder_output.topk(1)
                decoder_input = topi.squeeze(-1).detach()  # detach from history as input

        decoder_outputs = torch.cat(decoder_outputs, dim=1)
        decoder_outputs = F.log_softmax(decoder_outputs, dim=-1)
        attentions = torch.cat(attentions, dim=1)

        return decoder_outputs, decoder_hidden, attentions

    def forward_step(self, input, hidden, encoder_outputs):
        embedded =  self.dropout(self.embedding(input))

        query = hidden.permute(1, 0, 2)
        context, attn_weights = self.attention(query, encoder_outputs)
        input_gru = torch.cat((embedded, context), dim=2)

        output, hidden = self.gru(input_gru, hidden)
        output = self.out(output)

        return output, hidden, attn_weights

def indexesFromSentence(lang, sentence):
    return [lang.word2index[word] for word in sentence.split(' ')]

def tensorFromSentence(lang, sentence):
    indexes = indexesFromSentence(lang, sentence)
    indexes.append(EOS_token)
    return torch.tensor(indexes, dtype=torch.long, device=device).view(1, -1)

def tensorsFromPair(pair):
    input_tensor = tensorFromSentence(input_lang, pair[0])
    target_tensor = tensorFromSentence(output_lang, pair[1])
    return (input_tensor, target_tensor)

def get_dataloader(batch_size):
    input_lang, output_lang, pairs = prepareData('eng', 'fra', True)

    n = len(pairs)
    input_ids = np.zeros((n, MAX_LENGTH), dtype=np.int32)
    target_ids = np.zeros((n, MAX_LENGTH), dtype=np.int32)

    for idx, (inp, tgt) in enumerate(pairs):
        inp_ids = indexesFromSentence(input_lang, inp)
        tgt_ids = indexesFromSentence(output_lang, tgt)
        inp_ids.append(EOS_token)
        tgt_ids.append(EOS_token)
        input_ids[idx, :len(inp_ids)] = inp_ids
        target_ids[idx, :len(tgt_ids)] = tgt_ids

    train_data = TensorDataset(torch.LongTensor(input_ids).to(device),
                               torch.LongTensor(target_ids).to(device))

    train_sampler = RandomSampler(train_data)
    train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=batch_size)
    return input_lang, output_lang, train_dataloader

为了训练，我们将输入句子通过编码器，并跟踪每个输出和最新的隐藏状态。然后解码器将得到<SOS>标记作为其第一个输入，编码器的最后隐藏状态作为其第一个隐藏状态。

由于 PyTorch 的自动求导给了我们自由，我们可以随机选择是否使用强制教师，只需使用简单的 if 语句。将teacher_forcing_ratio调高以更多地使用它。

def train_epoch(dataloader, encoder, decoder, encoder_optimizer,
          decoder_optimizer, criterion):

    total_loss = 0
    for data in dataloader:
        input_tensor, target_tensor = data

        encoder_optimizer.zero_grad()
        decoder_optimizer.zero_grad()

        encoder_outputs, encoder_hidden = encoder(input_tensor)
        decoder_outputs, _, _ = decoder(encoder_outputs, encoder_hidden, target_tensor)

        loss = criterion(
            decoder_outputs.view(-1, decoder_outputs.size(-1)),
            target_tensor.view(-1)
        )
        loss.backward()

        encoder_optimizer.step()
        decoder_optimizer.step()

        total_loss += loss.item()

    return total_loss / len(dataloader)

import time
import math

def asMinutes(s):
    m = math.floor(s / 60)
    s -= m * 60
    return '%dm %ds' % (m, s)

def timeSince(since, percent):
    now = time.time()
    s = now - since
    es = s / (percent)
    rs = es - s
    return '%s (- %s)' % (asMinutes(s), asMinutes(rs))

然后我们多次调用train，偶尔打印进度（示例的百分比，到目前为止的时间，估计时间）和平均损失。

def train(train_dataloader, encoder, decoder, n_epochs, learning_rate=0.001,
               print_every=100, plot_every=100):
    start = time.time()
    plot_losses = []
    print_loss_total = 0  # Reset every print_every
    plot_loss_total = 0  # Reset every plot_every

    encoder_optimizer = optim.Adam(encoder.parameters(), lr=learning_rate)
    decoder_optimizer = optim.Adam(decoder.parameters(), lr=learning_rate)
    criterion = nn.NLLLoss()

    for epoch in range(1, n_epochs + 1):
        loss = train_epoch(train_dataloader, encoder, decoder, encoder_optimizer, decoder_optimizer, criterion)
        print_loss_total += loss
        plot_loss_total += loss

        if epoch % print_every == 0:
            print_loss_avg = print_loss_total / print_every
            print_loss_total = 0
            print('%s (%d  %d%%) %.4f' % (timeSince(start, epoch / n_epochs),
                                        epoch, epoch / n_epochs * 100, print_loss_avg))

        if epoch % plot_every == 0:
            plot_loss_avg = plot_loss_total / plot_every
            plot_losses.append(plot_loss_avg)
            plot_loss_total = 0

    showPlot(plot_losses)

绘图是用 matplotlib 完成的，使用在训练时保存的损失值数组plot_losses。

import matplotlib.pyplot as plt
plt.switch_backend('agg')
import matplotlib.ticker as ticker
import numpy as np

def showPlot(points):
    plt.figure()
    fig, ax = plt.subplots()
    # this locator puts ticks at regular intervals
    loc = ticker.MultipleLocator(base=0.2)
    ax.yaxis.set_major_locator(loc)
    plt.plot(points)

def evaluate(encoder, decoder, sentence, input_lang, output_lang):
    with torch.no_grad():
        input_tensor = tensorFromSentence(input_lang, sentence)

        encoder_outputs, encoder_hidden = encoder(input_tensor)
        decoder_outputs, decoder_hidden, decoder_attn = decoder(encoder_outputs, encoder_hidden)

        _, topi = decoder_outputs.topk(1)
        decoded_ids = topi.squeeze()

        decoded_words = []
        for idx in decoded_ids:
            if idx.item() == EOS_token:
                decoded_words.append('<EOS>')
                break
            decoded_words.append(output_lang.index2word[idx.item()])
    return decoded_words, decoder_attn

def evaluateRandomly(encoder, decoder, n=10):
    for i in range(n):
        pair = random.choice(pairs)
        print('>', pair[0])
        print('=', pair[1])
        output_words, _ = evaluate(encoder, decoder, pair[0], input_lang, output_lang)
        output_sentence = ' '.join(output_words)
        print('<', output_sentence)
        print('')

如果您运行此笔记本，您可以训练，中断内核，评估，并稍后继续训练。注释掉初始化编码器和解码器的行，并再次运行trainIters。

hidden_size = 128
batch_size = 32

input_lang, output_lang, train_dataloader = get_dataloader(batch_size)

encoder = EncoderRNN(input_lang.n_words, hidden_size).to(device)
decoder = AttnDecoderRNN(hidden_size, output_lang.n_words).to(device)

train(train_dataloader, encoder, decoder, 80, print_every=5, plot_every=5)

Reading lines...
Read 135842 sentence pairs
Trimmed to 11445 sentence pairs
Counting words...
Counted words:
fra 4601
eng 2991
0m 27s (- 6m 53s) (5 6%) 1.5304
0m 54s (- 6m 21s) (10 12%) 0.6776
1m 21s (- 5m 52s) (15 18%) 0.3528
1m 48s (- 5m 25s) (20 25%) 0.1946
2m 15s (- 4m 57s) (25 31%) 0.1205
2m 42s (- 4m 30s) (30 37%) 0.0841
3m 9s (- 4m 3s) (35 43%) 0.0639
3m 36s (- 3m 36s) (40 50%) 0.0521
4m 2s (- 3m 8s) (45 56%) 0.0452
4m 29s (- 2m 41s) (50 62%) 0.0395
4m 56s (- 2m 14s) (55 68%) 0.0377
5m 23s (- 1m 47s) (60 75%) 0.0349
5m 50s (- 1m 20s) (65 81%) 0.0324
6m 17s (- 0m 53s) (70 87%) 0.0316
6m 44s (- 0m 26s) (75 93%) 0.0298
7m 11s (- 0m 0s) (80 100%) 0.0291

将 dropout 层设置为eval模式

encoder.eval()
decoder.eval()
evaluateRandomly(encoder, decoder)

> il est si mignon !
= he s so cute
< he s so cute <EOS>

> je vais me baigner
= i m going to take a bath
< i m going to take a bath <EOS>

> c est un travailleur du batiment
= he s a construction worker
< he s a construction worker <EOS>

> je suis representant de commerce pour notre societe
= i m a salesman for our company
< i m a salesman for our company <EOS>

> vous etes grande
= you re big
< you are big <EOS>

> tu n es pas normale
= you re not normal
< you re not normal <EOS>

> je n en ai pas encore fini avec vous
= i m not done with you yet
< i m not done with you yet <EOS>

> je suis desole pour ce malentendu
= i m sorry about my mistake
< i m sorry about my mistake <EOS>

> nous ne sommes pas impressionnes
= we re not impressed
< we re not impressed <EOS>

> tu as la confiance de tous
= you are trusted by every one of us
< you are trusted by every one of us <EOS>

你可以简单地运行plt.matshow(attentions)来查看注意力输出显示为矩阵。为了获得更好的查看体验，我们将额外添加坐标轴和标签：

def showAttention(input_sentence, output_words, attentions):
    fig = plt.figure()
    ax = fig.add_subplot(111)
    cax = ax.matshow(attentions.cpu().numpy(), cmap='bone')
    fig.colorbar(cax)

    # Set up axes
    ax.set_xticklabels([''] + input_sentence.split(' ') +
                       ['<EOS>'], rotation=90)
    ax.set_yticklabels([''] + output_words)

    # Show label at every tick
    ax.xaxis.set_major_locator(ticker.MultipleLocator(1))
    ax.yaxis.set_major_locator(ticker.MultipleLocator(1))

    plt.show()

def evaluateAndShowAttention(input_sentence):
    output_words, attentions = evaluate(encoder, decoder, input_sentence, input_lang, output_lang)
    print('input =', input_sentence)
    print('output =', ' '.join(output_words))
    showAttention(input_sentence, output_words, attentions[0, :len(output_words), :])

evaluateAndShowAttention('il n est pas aussi grand que son pere')

evaluateAndShowAttention('je suis trop fatigue pour conduire')

evaluateAndShowAttention('je suis desole si c est une question idiote')

evaluateAndShowAttention('je suis reellement fiere de vous')

input = il n est pas aussi grand que son pere
output = he is not as tall as his father <EOS>
/var/lib/jenkins/workspace/intermediate_source/seq2seq_translation_tutorial.py:823: UserWarning:

set_ticklabels() should only be used with a fixed number of ticks, i.e. after set_ticks() or using a FixedLocator.

/var/lib/jenkins/workspace/intermediate_source/seq2seq_translation_tutorial.py:825: UserWarning:

set_ticklabels() should only be used with a fixed number of ticks, i.e. after set_ticks() or using a FixedLocator.

input = je suis trop fatigue pour conduire
output = i m too tired to drive <EOS>
/var/lib/jenkins/workspace/intermediate_source/seq2seq_translation_tutorial.py:823: UserWarning:

set_ticklabels() should only be used with a fixed number of ticks, i.e. after set_ticks() or using a FixedLocator.

/var/lib/jenkins/workspace/intermediate_source/seq2seq_translation_tutorial.py:825: UserWarning:

set_ticklabels() should only be used with a fixed number of ticks, i.e. after set_ticks() or using a FixedLocator.

input = je suis desole si c est une question idiote
output = i m sorry if this is a stupid question <EOS>
/var/lib/jenkins/workspace/intermediate_source/seq2seq_translation_tutorial.py:823: UserWarning:

set_ticklabels() should only be used with a fixed number of ticks, i.e. after set_ticks() or using a FixedLocator.

/var/lib/jenkins/workspace/intermediate_source/seq2seq_translation_tutorial.py:825: UserWarning:

set_ticklabels() should only be used with a fixed number of ticks, i.e. after set_ticks() or using a FixedLocator.

input = je suis reellement fiere de vous
output = i m really proud of you guys <EOS>
/var/lib/jenkins/workspace/intermediate_source/seq2seq_translation_tutorial.py:823: UserWarning:

set_ticklabels() should only be used with a fixed number of ticks, i.e. after set_ticks() or using a FixedLocator.

/var/lib/jenkins/workspace/intermediate_source/seq2seq_translation_tutorial.py:825: UserWarning:

set_ticklabels() should only be used with a fixed number of ticks, i.e. after set_ticks() or using a FixedLocator.

下载 Python 源代码：seq2seq_translation_tutorial.py

下载 Jupyter 笔记本：seq2seq_translation_tutorial.ipynb

显示所有内容

声明：本站所有文章，如无特殊说明或标注，均为本站原创发布。任何个人或组织，在未征得本站同意时，禁止复制、盗用、采集、发布本站内容到任何网站、书籍等各类媒体平台。如若本站内容侵犯了原著者的合法权益，可联系我们进行处理。

生成输入和目标序列的函数

初始化一个实例

运行模型

在测试数据集上评估最佳模型

使用 Better Transformer 进行快速 Transformer 推理

本教程中的 Better Transformer 功能

附加信息

总结

从头开始的自然语言处理：使用字符级 RNN 对名称进行分类

推荐准备工作

准备数据

将名称转换为张量

创建网络

训练

为训练做准备

训练网络

绘制结果

评估结果

在用户输入上运行

练习

从零开始的 NLP：使用字符级 RNN 生成名字

准备数据

创建网络

训练

准备训练

训练网络

绘制损失

对网络进行采样

练习

NLP 从头开始：使用序列到序列网络和注意力进行翻译

加载数据文件

Seq2Seq 模型

编码器

解码器

简单解码器

注意力解码器

训练

准备训练数据

训练模型

绘制结果

评估

训练和评估

可视化注意力

练习

相关文章

发表回复 取消回复

发表回复取消回复