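encoder_inputs: a list in which each element is the input at time step t. Each time step holds batch_size inputs (words or tokens), and each token is represented by a vector of input_size dimensions (its embedding). The argument is therefore a list of [batch_size, input_size] tensors.
decoder_inputs: same as above, but the two lists may differ in length: the encoder list is as long as encoder_max_length, the decoder list as long as decoder_max_length.
Outputs:
outputs: same shape as decoder_inputs, except that the last dimension is output_size rather than input_size (why?).
state: still the cell state at the last time step, [batch_size, cell.state_size].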
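Outputs:
outputs: if output_projection=None, no projection is applied and the cell emits num_symbols values directly, so outputs is a list of [batch_size, num_symbols]. If it is not None, the outputs still have to be projected, so outputs is a list of [batch_size, cell.output_size].
state: same as above.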
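To make the "list over time steps, each element covering the whole batch" layout concrete, here is a minimal pure-Python sketch. The helper name `to_time_major` and the `pad_id` value are assumptions made for illustration; they are not part of the TensorFlow API.

```python
def to_time_major(sequences, max_length, pad_id=0):
    """Turn per-example id sequences into a list with one entry per time step."""
    # Pad (or truncate) every example to the same length first.
    padded = [seq[:max_length] + [pad_id] * (max_length - len(seq))
              for seq in sequences]
    # Element t holds the t-th token of every example: shape [batch_size].
    return [[row[t] for row in padded] for t in range(max_length)]


batch = [[4, 7, 9], [5, 2], [8]]          # batch_size = 3, variable lengths
encoder_inputs = to_time_major(batch, max_length=4)
print(len(encoder_inputs), len(encoder_inputs[0]))
```

For basic_rnn_seq2seq each id would additionally be replaced by an input_size-dimensional vector, giving a list of [batch_size, input_size] entries.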
embedding_rnn_seq2seq
def embedding_rnn_seq2seq(encoder_inputs,
                          decoder_inputs,
                          cell,
                          num_encoder_symbols,
                          num_decoder_symbols,
                          embedding_size,
                          output_projection=None,
                          feed_previous=False,
                          dtype=None,
                          scope=None):
  """Embedding RNN sequence-to-sequence model.

  This model first embeds encoder_inputs by a newly created embedding (of shape
  [num_encoder_symbols x input_size]). Then it runs an RNN to encode
  embedded encoder_inputs into a state vector. Next, it embeds decoder_inputs
  by another newly created embedding (of shape [num_decoder_symbols x
  input_size]). Then it runs RNN decoder, initialized with the last
  encoder state, on embedded decoder_inputs.

  Args:
    encoder_inputs: A list of 1D int32 Tensors of shape [batch_size].
    decoder_inputs: A list of 1D int32 Tensors of shape [batch_size].
    cell: core_rnn_cell.RNNCell defining the cell function and size.
    num_encoder_symbols: Integer; number of symbols on the encoder side.
    num_decoder_symbols: Integer; number of symbols on the decoder side.
    embedding_size: Integer, the length of the embedding vector for each symbol.
    output_projection: None or a pair (W, B) of output projection weights and
      biases; W has shape [output_size x num_decoder_symbols] and B has
      shape [num_decoder_symbols]; if provided and feed_previous=True, each
      fed previous output will first be multiplied by W and added B.
    feed_previous: Boolean or scalar Boolean Tensor; if True, only the first
      of decoder_inputs will be used (the "GO" symbol), and all other decoder
      inputs will be taken from previous outputs (as in embedding_rnn_decoder).
      If False, decoder_inputs are used as given (the standard decoder case).
    dtype: The dtype of the initial state for both the encoder and decoder
      rnn cells (default: tf.float32).
    scope: VariableScope for the created subgraph; defaults to
      "embedding_rnn_seq2seq"

  Returns:
    A tuple of the form (outputs, state), where:
      outputs: A list of the same length as decoder_inputs of 2D Tensors. The
        output is of shape [batch_size x cell.output_size] when
        output_projection is not None (and represents the dense representation
        of predicted tokens). It is of shape [batch_size x num_decoder_symbols]
        when output_projection is None.
      state: The state of each decoder cell in each time-step. This is a list
        with length len(decoder_inputs) -- one item for each time-step.
        It is a 2D Tensor of shape [batch_size x cell.state_size].
  """
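Since there is an embedding_rnn_decoder, there is a matching embedding_rnn_seq2seq. Earlier we covered basic_rnn_seq2seq(encoder_inputs, decoder_inputs, cell, dtype=dtypes.float32, scope=None).
inputs: as noted before, the embedding is handled internally, so the inputs are a list of [batch_size] tensors in which every position is just a token id. Internally an embedding wrapper does the lookup and produces a list of [batch_size, embedding_size].
Compared with basic_rnn_seq2seq, there are several extra parameters:
num_encoder_symbols: put simply, the vocab_size on the encoder side. The encoder and decoder vocabularies differ mainly in translation tasks between different languages; for pure Chinese-to-Chinese generation the two sides have the same vocabulary size.
num_decoder_symbols: same as above, for the decoder side.
embedding_size: the number of dimensions of the vector that represents each vocabulary entry.
output_projection=None: None or a pair (W, B) of projection weights and biases, as described in the docstring.
feed_previous=False: if feed_previous is a plain Python True or False, the function returns the result of embedding_rnn_decoder directly. It can also be a boolean tensor (not needed here for now).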
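The two ideas in this section, the embedding lookup and the feed_previous switch, can be illustrated without TensorFlow. The sketch below is a toy, not the library implementation: `embed`, `toy_decode` and `step_fn` are hypothetical names, and `step_fn` stands in for "embed, run the RNN cell, project, take the argmax".

```python
def embed(ids, embedding):
    """Look up one row per token id: [batch_size] -> [batch_size, embedding_size]."""
    return [embedding[i] for i in ids]


def toy_decode(decoder_inputs, step_fn, feed_previous):
    """Run a decoder over a time-major list of [batch_size] id lists."""
    outputs = []
    current = decoder_inputs[0]              # the "GO" symbols
    for t in range(len(decoder_inputs)):
        if t > 0 and not feed_previous:
            current = decoder_inputs[t]      # use the given input as is
        predicted = step_fn(current)
        outputs.append(predicted)
        if feed_previous:
            current = predicted              # feed the previous output back in
    return outputs


table = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1]]   # vocab of 3, embedding_size 2
embedded = embed([2, 0], table)

step_fn = lambda ids: [(i + 1) % 5 for i in ids]   # toy "model"
decoder_inputs = [[1, 1], [3, 4], [0, 2]]
teacher_forced = toy_decode(decoder_inputs, step_fn, feed_previous=False)
free_running = toy_decode(decoder_inputs, step_fn, feed_previous=True)
```

With feed_previous=True only the first element of decoder_inputs influences the result, which matches the docstring's description of the "GO" symbol.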
attention_decoder
def attention_decoder(decoder_inputs,
                      initial_state,
                      attention_states,
                      cell,
                      output_size=None,
                      num_heads=1,
                      loop_function=None,
                      dtype=None,
                      scope=None,
                      initial_state_attention=False):
  """RNN decoder with attention for the sequence-to-sequence model.

  In this context "attention" means that, during decoding, the RNN can look up
  information in the additional tensor attention_states, and it does this by
  focusing on a few entries from the tensor. This model has proven to yield
  especially good results in a number of sequence-to-sequence tasks. This
  implementation is based on http://arxiv.org/abs/1412.7449 (see below for
  details). It is recommended for complex sequence-to-sequence tasks.

  Args:
    decoder_inputs: A list of 2D Tensors [batch_size x input_size].
    initial_state: 2D Tensor [batch_size x cell.state_size].
    attention_states: 3D Tensor [batch_size x attn_length x attn_size].
    cell: core_rnn_cell.RNNCell defining the cell function and size.
    output_size: Size of the output vectors; if None, we use cell.output_size.
    num_heads: Number of attention heads that read from attention_states.
    loop_function: If not None, this function will be applied to i-th output
      in order to generate i+1-th input, and decoder_inputs will be ignored,
      except for the first element ("GO" symbol). This can be used for decoding,
      but also for training to emulate http://arxiv.org/abs/1506.03099.
      Signature -- loop_function(prev, i) = next
        * prev is a 2D Tensor of shape [batch_size x output_size],
        * i is an integer, the step number (when advanced control is needed),
        * next is a 2D Tensor of shape [batch_size x input_size].
    dtype: The dtype to use for the RNN initial state (default: tf.float32).
    scope: VariableScope for the created subgraph; default: "attention_decoder".
    initial_state_attention: If False (default), initial attentions are zero.
      If True, initialize the attentions from the initial state and attention
      states -- useful when we wish to resume decoding from a previously
      stored decoder state and attention states.

  Returns:
    A tuple of the form (outputs, state), where:
      outputs: A list of the same length as decoder_inputs of 2D Tensors of
        shape [batch_size x output_size]. These represent the generated outputs.
        Output i is computed from input i (which is either the i-th element
        of decoder_inputs or loop_function(output {i-1}, i)) as follows.
          First, we run the cell on a combination of the input and previous
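Inputs of sequence_loss_by_example (note: is all the data in one batch padded to the same length, so that every example has the same time_length?):
logits: a list holding the outputs of successive time steps. Each time step's output covers the whole batch, and for every input the output is a score over the entire vocabulary, i.e. num_decoder_symbols values. logits is therefore a list of [batch_size, num_decoder_symbols].
targets: a list of the targets for all time steps, in order. Each time step has batch_size inputs and hence batch_size targets, so this is a list of [batch_size].
weights: every example carries a weight for its own token at each time step, so this is a list of [batch_size].
Open question: what are the weights for? Why assign a weight to every token?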
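Reading the code: it first computes crossent with shape [batch_size] and multiplies it by weights, which again gives [batch_size]: the weighted cross-entropy of each example at time step t (batch_size values). This is appended to log_perp_list (in the end a list of [batch_size]). Once the loop over all time steps is done, the entries are summed over time, giving a variable of shape [batch_size] called log_perps.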
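The computation described above can be sketched in plain Python. This is a toy re-implementation on nested lists, not the TensorFlow code; the division by the summed weights when average_across_timesteps is set, and the small 1e-12 term guarding against division by zero, are assumptions about the library's behaviour. The example also shows one common use of weights that may answer the question above: giving padded positions a weight of 0 so they do not contribute to the loss.

```python
import math


def softmax_cross_entropy(logit_row, target):
    """Cross-entropy of one logit vector against one integer target id."""
    m = max(logit_row)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logit_row))
    return log_sum_exp - logit_row[target]


def sequence_loss_by_example(logits, targets, weights,
                             average_across_timesteps=True):
    """Weighted cross-entropy per example; returns a [batch_size] list."""
    batch_size = len(targets[0])
    log_perps = [0.0] * batch_size
    total_weight = [0.0] * batch_size
    for logit_t, target_t, weight_t in zip(logits, targets, weights):
        for b in range(batch_size):
            crossent = softmax_cross_entropy(logit_t[b], target_t[b])
            log_perps[b] += crossent * weight_t[b]   # weight masks or scales
            total_weight[b] += weight_t[b]
    if average_across_timesteps:
        log_perps = [lp / (tw + 1e-12)
                     for lp, tw in zip(log_perps, total_weight)]
    return log_perps


# Two time steps, batch of 2, vocabulary of 2, uniform logits.
logits = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
targets = [[0, 1], [1, 0]]
weights = [[1.0, 1.0], [1.0, 0.0]]   # example 1 is padding at step 2
averaged = sequence_loss_by_example(logits, targets, weights)
summed = sequence_loss_by_example(logits, targets, weights,
                                  average_across_timesteps=False)
```

With uniform logits over two symbols each unmasked step costs log 2, so the summed loss differs between the two examples while the per-step average does not.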
sequence_loss
def sequence_loss(logits,
                  targets,
                  weights,
                  average_across_timesteps=True,
                  average_across_batch=True,
                  softmax_loss_function=None,
                  name=None):
  """Weighted cross-entropy loss for a sequence of logits, batch-collapsed.

  Args:
    logits: List of 2D Tensors of shape [batch_size x num_decoder_symbols].
    targets: List of 1D batch-sized int32 Tensors of the same length as logits.
    weights: List of 1D batch-sized float-Tensors of the same length as logits.
    average_across_timesteps: If set, divide the returned cost by the total
      label weight.
    average_across_batch: If set, divide the returned cost by the batch size.
    softmax_loss_function: Function (labels-batch, inputs-batch) -> loss-batch
      to be used instead of the standard softmax (the default if this is None).
    name: Optional name for this operation, defaults to "sequence_loss".

  Returns:
    A scalar float Tensor: The average log-perplexity per symbol (weighted).

  Raises:
    ValueError: If len(logits) is different from len(targets) or len(weights).
  """
  with ops.name_scope(name, "sequence_loss", logits + targets + weights):
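The core is still the sequence_loss_by_example discussed above; the only difference is that its [batch_size] result is summed. With the default average_across_batch, the function returns sum / batch_size, the average log perplexity per sequence. If averaging is turned off, it returns the sum of the log perplexities over the whole batch.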
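The batch reduction on its own is tiny. A sketch, assuming the per-example values are already available as a plain list; the function name `reduce_over_batch` is hypothetical:

```python
def reduce_over_batch(log_perps, average_across_batch=True):
    """Collapse a [batch_size] list of per-example losses into one scalar."""
    total = sum(log_perps)
    if average_across_batch:
        return total / len(log_perps)   # average log perplexity per sequence
    return total                        # sum over the whole batch


per_example = [0.5, 1.5, 1.0]
mean_loss = reduce_over_batch(per_example)
sum_loss = reduce_over_batch(per_example, average_across_batch=False)
```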
model_with_buckets
def model_with_buckets(encoder_inputs,
                       decoder_inputs,
                       targets,
                       weights,
                       buckets,
                       seq2seq,
                       softmax_loss_function=None,
                       per_example_loss=False,
                       name=None):
  """Create a sequence-to-sequence model with support for bucketing.

  The seq2seq argument is a function that defines a sequence-to-sequence model,
  e.g., seq2seq = lambda x, y: basic_rnn_seq2seq(
      x, y, core_rnn_cell.GRUCell(24))

  Args:
    encoder_inputs: A list of Tensors to feed the encoder; first seq2seq input.
    decoder_inputs: A list of Tensors to feed the decoder; second seq2seq input.
    targets: A list of 1D batch-sized int32 Tensors (desired output sequence).
    weights: List of 1D batch-sized float-Tensors to weight the targets.
    buckets: A list of pairs of (input size, output size) for each bucket.
    seq2seq: A sequence-to-sequence model function; it takes 2 input that
      agree with encoder_inputs and decoder_inputs, and returns a pair
      consisting of outputs and states (as, e.g., basic_rnn_seq2seq).
    softmax_loss_function: Function (labels-batch, inputs-batch) -> loss-batch
      to be used instead of the standard softmax (the default if this is None).
    per_example_loss: Boolean. If set, the returned loss will be a batch-sized
      tensor of losses for each sequence in the batch. If unset, it will be
      a scalar with the averaged loss from all examples.
    name: Optional name for this operation, defaults to "model_with_buckets".

  Returns:
    A tuple of the form (outputs, losses), where:
      outputs: The outputs for each bucket. Its j'th element consists of a list
        of 2D Tensors. The shape of output tensors can be either
        [batch_size x output_size] or [batch_size x num_decoder_symbols]
        depending on the seq2seq model used.
      losses: List of scalar Tensors, representing losses for each bucket, or,
        if per_example_loss is set, a list of 1D batch-sized float Tensors.

  Raises:
    ValueError: If length of encoder_inputs, targets, or weights is smaller
      than the largest (last) bucket.
  """
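Parameters:
encoder_inputs: at first I wondered whether these inputs are token ids or input_size vectors. On reflection, the concrete shape depends on the function passed as seq2seq. That function normally takes just two arguments, x and y, which correspond to encoder_inputs and decoder_inputs; any other parameter a particular seq2seq model needs has to be supplied inside this user-defined function. So if we use the embedding seq2seq, the inputs are ids; otherwise they are input_size vectors.
targets: a list, because every time step has a target and each time step takes batch_size inputs. Each time step's target is therefore [batch_size], which makes targets a list of [batch_size].
buckets: a list of (input_size, output_size) pairs. Here the sizes are sequence lengths on the encoder and decoder side, not embedding dimensions.
per_example_loss: defaults to False, in which case each bucket's loss is a scalar, as returned by sequence_loss. If set to True, each loss is [batch_size], as returned by sequence_loss_by_example discussed earlier.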
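To show how the pieces fit together, here is a toy pure-Python sketch of a per-bucket loop. It is not the TensorFlow implementation: `model_with_buckets_sketch`, `make_toy_seq2seq` and `toy_loss` are hypothetical names, and slicing the inputs to each bucket's lengths is an assumption about how the buckets are applied. It does illustrate the point above that seq2seq takes only (x, y) and captures every other parameter itself.

```python
def make_toy_seq2seq(vocab_size):
    """Build a two-argument seq2seq; vocab_size is captured by the closure."""
    def seq2seq(x, y):
        # Toy "model": ignore x and shift every decoder id by one.
        outputs = [[(tok + 1) % vocab_size for tok in step] for step in y]
        return outputs, None   # (outputs, state)
    return seq2seq


def toy_loss(outputs, targets, weights):
    """Weighted count of wrong predictions, standing in for a real loss."""
    total = 0.0
    for out_t, tgt_t, w_t in zip(outputs, targets, weights):
        for o, t, w in zip(out_t, tgt_t, w_t):
            total += w * (0.0 if o == t else 1.0)
    return total


def model_with_buckets_sketch(encoder_inputs, decoder_inputs, targets,
                              weights, buckets, seq2seq, loss_fn):
    outputs, losses = [], []
    for enc_len, dec_len in buckets:
        # Each bucket only sees the first enc_len / dec_len time steps.
        bucket_outputs, _ = seq2seq(encoder_inputs[:enc_len],
                                    decoder_inputs[:dec_len])
        outputs.append(bucket_outputs)
        losses.append(loss_fn(bucket_outputs, targets[:dec_len],
                              weights[:dec_len]))
    return outputs, losses


encoder_inputs = [[1], [2], [3], [4]]     # 4 time steps, batch_size 1
decoder_inputs = [[0], [1], [2]]
targets = [[1], [2], [0]]
weights = [[1.0], [1.0], [1.0]]
buckets = [(2, 2), (4, 3)]
outputs, losses = model_with_buckets_sketch(
    encoder_inputs, decoder_inputs, targets, weights, buckets,
    make_toy_seq2seq(vocab_size=5), toy_loss)
```

The result has one entry per bucket in both outputs and losses, mirroring the (outputs, losses) pair described in the docstring.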
Implementation:
for j, bucket in enumerate(buckets):
  with variable_scope.variable_scope(
      variable_scope.get_variable_scope(), reuse=True if j > 0 else None):