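defect, and once again the cause lies in the activation function. Generally speaking, with saturating activation functions (such as sigmoid or tanh) a neural network can only be trained to about 6 layers deep, because the error propagated backward becomes smaller and smaller as the number of layers grows. In an RNN, the error is propagated not only between layers but also across the time steps of the sequence within each layer, so a plain RNN cannot learn features that span very long sequences.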
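A small numeric sketch makes the shrinking error concrete. It uses a hypothetical one-unit RNN h_t = tanh(w * h_{t-1} + x) with a constant input. By the chain rule, the gradient of the last state with respect to the first is a product of one factor w * (1 - h_t^2) per time step. With |w| < 1 every factor is below 1, so the product decays roughly geometrically as the sequence gets longer.

```python
import math

def grad_through_time(steps, w=0.9, x=0.1):
    """d h_T / d h_0 for a hypothetical scalar RNN h_t = tanh(w * h_{t-1} + x)."""
    h, grad = 0.0, 1.0
    for _ in range(steps):
        h = math.tanh(w * h + x)
        # chain rule: multiply by dh_t/dh_{t-1} = w * (1 - h_t^2), which is below w here
        grad *= w * (1.0 - h * h)
    return grad

for T in (5, 20, 50):
    print(T, grad_through_time(T))
```

The printed values drop by several orders of magnitude between 5 and 50 steps. The same effect keeps error signals from distant time steps from reaching the early part of the sequence.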
To address this, many variants of the RNN have been developed that let a model learn features from longer sequences.
Peephole LSTM
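Peephole connections were introduced to remedy a shortcoming of the forget-gate design. Without them, the current cell state cannot influence the outputs of the Input Gate and Forget Gate at the next time step, so the cell loses part of the information from processing the previous step of the sequence. The peephole connections are the dashed lines in the figure below, and the computation proceeds in this order:
(1) The cell state from the previous time step is fed into the Input Gate and Forget Gate together with the input of the current time step.
(2) The outputs of the input gate and the forget gate are both fed into the cell.
(3) The cell state is fed into the Output Gate of the current time step, and also into the input gate and forget gate of the next time step.
(4) The output of the Output Gate, multiplied by the activated (tanh) cell state, forms the output of the whole Block.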
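The four steps can be traced in a minimal scalar sketch of one peephole-LSTM time step. The parameter names in `p` are hypothetical: w* is an input weight, u* a recurrent weight, p* a peephole weight and b* a bias. All values are scalars for readability. The sketch only shows the data flow:

* the input and forget gates see the previous cell state;
* the output gate sees the current cell state;
* the block output is the output gate times tanh of the cell state.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def peephole_lstm_step(x, h_prev, c_prev, p):
    """One scalar peephole-LSTM step; p holds hypothetical scalar weights."""
    # (1) input and forget gates also look at the previous cell state c_prev
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["pi"] * c_prev + p["bi"])
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["pf"] * c_prev + p["bf"])
    # (2) the gates update the cell state
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])
    c = f * c_prev + i * g
    # (3) the output gate looks at the *current* cell state c
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["po"] * c + p["bo"])
    # (4) block output: output gate times the activated cell state
    h = o * math.tanh(c)
    return h, c

params = {k: 0.5 for k in ("wi", "ui", "pi", "bi", "wf", "uf", "pf", "bf",
                           "wg", "ug", "bg", "wo", "uo", "po", "bo")}
h, c = peephole_lstm_step(1.0, 0.0, 0.0, params)
```

Setting the three peephole weights (pi, pf, po) to zero recovers an ordinary LSTM step.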
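A bidirectional RNN (Bi-RNN) runs two RNNs over the sequence, one forward and one backward, and combines (typically concatenates) their outputs at each time step.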
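Here is a minimal pure-Python sketch of the idea, assuming a hypothetical one-unit tanh RNN with fixed weights:

1. One pass reads the sequence forward.
2. A second pass reads it reversed.
3. The backward results are flipped back into the original time order.
4. The two hidden states are paired at each time step.

```python
import math

def run_rnn(xs, w_x=0.5, w_h=0.5):
    """Hypothetical scalar tanh RNN; returns the hidden state at every step."""
    h, hs = 0.0, []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h)
        hs.append(h)
    return hs

def bi_rnn(xs):
    fw = run_rnn(xs)              # forward: step t has seen x_0 .. x_t
    bw = run_rnn(xs[::-1])[::-1]  # backward, re-aligned to the original time order
    return list(zip(fw, bw))      # pair (concatenate) both directions per time step

out = bi_rnn([1.0, 0.0, -1.0])
```

At every step the pair carries context from both the past (forward state) and the future (backward state).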
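Connectionist Temporal Classification (CTC) is a key technique in speech recognition. It adds an extra symbol representing NULL (the blank) to solve the problem of repeated characters.
The method mainly concerns how the loss is computed. Where the label sequence does not line up with the output sequence, blanks (empty labels) are inserted. This aligns the predicted outputs with the given labels along the time axis. The loss is then the negative log-likelihood (a cross-entropy) of the label sequence, summed over all valid alignments.
For example, in speech recognition each utterance has a sequence of values and a corresponding transcript. The CTC loss function computes the loss between the model output and the label, and the optimizer then trains the model by iteratively reducing that loss.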
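The alignment idea can be made concrete with a small pure-Python sketch. It assumes index 0 is the blank symbol; this is a convention chosen here, not one stated in the text. The sketch shows:

* how a frame-level path collapses to a label sequence (merge repeats, then drop blanks);
* how the CTC loss is computed with the standard forward (alpha) recursion, which sums the probability of every path that collapses to the label.

It works in plain probabilities rather than log space, so it is only suitable for short toy inputs.

```python
import math

BLANK = 0  # assumption: symbol index 0 is the blank

def ctc_collapse(path):
    """Map a frame-level path to a label sequence: merge repeats, then drop blanks."""
    out, prev = [], None
    for s in path:
        if s != prev and s != BLANK:
            out.append(s)
        prev = s
    return out

def ctc_loss(probs, label):
    """-log P(label | probs) summed over all alignments; probs is T x K softmax output."""
    ext = [BLANK]
    for l in label:
        ext += [l, BLANK]  # blank-interleaved label, e.g. [_, a, _, b, _]
    S, T = len(ext), len(probs)
    alpha = [[0.0] * S for _ in range(T)]
    alpha[0][0] = probs[0][ext[0]]
    if S > 1:
        alpha[0][1] = probs[0][ext[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1][s]              # stay on the same symbol
            if s >= 1:
                a += alpha[t - 1][s - 1]     # advance by one
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                a += alpha[t - 1][s - 2]     # skip a blank between different labels
            alpha[t][s] = a * probs[t][ext[s]]
    total = alpha[T - 1][S - 1] + (alpha[T - 1][S - 2] if S > 1 else 0.0)
    return -math.log(total)

# uniform toy output: T = 2 frames, K = 3 symbols (blank, "a", "b")
probs = [[1 / 3] * 3 for _ in range(2)]
print(ctc_collapse([1, 1, 0, 1, 2, 2, 0]))  # -> [1, 1, 2]
print(ctc_loss(probs, [1]))
```

In the collapse example, the blank between the two 1s keeps them as two separate characters; this is how CTC handles repeated characters. For the label [1], the valid two-frame paths are (1, 1), (blank, 1) and (1, blank). Each has probability 1/9, so the loss is -log(3/9) = log 3.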
RNNs in TensorFlow
After the cell objects are defined, they still need to be connected to form an RNN.
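1. Building a static RNN: static_rnn(cell, inputs, initial_state=None, dtype=None, sequence_length=None, scope=None)
- cell: the cell object created earlier
- inputs: the input data. It must be a list of 2-D tensors, each of shape [batch_size, input_size]. The list order is the time order, with one element per time step.
- initial_state: the initial cell state
- dtype: the data type of the initial state and the expected output (required if initial_state is not given)
- sequence_length: the length of each input sequence
- scope: the variable scope
- Return value: a pair (outputs, state). outputs contains one element per input time step; state is the final cell state.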
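The list layout that static_rnn expects can be illustrated without TensorFlow. The sketch below uses a hypothetical helper, unstack_time, on nested Python lists. It performs the same rearrangement as tf.unstack(x, n_steps, 1) in the MNIST example: a [batch_size, max_time, n_input] batch becomes a list of max_time slices, each of shape [batch_size, n_input].

```python
def unstack_time(batch):
    """Hypothetical helper: [batch_size, max_time, n_input] nested list ->
    list of max_time slices, each [batch_size, n_input] (the static_rnn input layout)."""
    max_time = len(batch[0])
    return [[sample[t] for sample in batch] for t in range(max_time)]

batch = [  # 2 samples, 3 time steps, 2 features each
    [[1, 2], [3, 4], [5, 6]],
    [[7, 8], [9, 10], [11, 12]],
]
steps = unstack_time(batch)
print(steps[0])  # -> [[1, 2], [7, 8]]
```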
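2. Building a dynamic RNN: dynamic_rnn(cell, inputs, sequence_length=None, initial_state=None, dtype=None, parallel_iterations=None, swap_memory=False, time_major=False, scope=None)
- cell: the cell object created earlier
- inputs: the input data as a tensor, usually 3-D: [batch_size, max_time, ...]
- initial_state: the initial cell state
- dtype: the data type of the initial state and the expected output
- sequence_length: the length of each input sequence, a vector of shape [batch_size]
- time_major: False by default, meaning the input shape is [batch_size, max_time, ...]. If True, the shape is [max_time, batch_size, ...].
- scope: the variable scope
- Return value: a pair (outputs, state). outputs has shape [batch_size, max_time, ...] (or [max_time, batch_size, ...] when time_major=True); state is the final cell state.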
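By default dynamic_rnn returns batch-major outputs. To take the last time step you can index outputs[:, -1]. Alternatively, transpose to time-major first and take outputs[-1], which is what the MNIST example does with tf.transpose(outputs, [1, 0, 2]). The pure-Python sketch below checks that both routes give the same values. to_time_major is a hypothetical helper, and the numbers are made up for illustration.

```python
def to_time_major(batch):
    """Hypothetical helper: [batch_size, max_time, ...] -> [max_time, batch_size, ...]."""
    return [list(step) for step in zip(*batch)]

outputs = [  # batch-major outputs: 2 samples x 3 time steps x 1 unit
    [[0.1], [0.2], [0.3]],
    [[0.4], [0.5], [0.6]],
]
last_batch_major = [sample[-1] for sample in outputs]  # like outputs[:, -1]
last_time_major = to_time_major(outputs)[-1]           # like outputs[-1] after the transpose
print(last_batch_major, last_time_major)
```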
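3. Building a bidirectional RNN: four functions are available: tf.contrib.rnn.static_bidirectional_rnn, tf.nn.bidirectional_dynamic_rnn, tf.contrib.rnn.stack_bidirectional_rnn and tf.contrib.rnn.stack_bidirectional_dynamic_rnn.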
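4. Handling variable-length sequences with a dynamic RNN
A more advanced feature of the dynamic RNN is that it can handle variable-length sequences. When preparing the samples, also collect each sample's length and pass the lengths as the sequence_length argument when creating the dynamic RNN. For steps beyond a sample's length, the outputs are zero and the state is carried through unchanged.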
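The effect of sequence_length can be mimicked in a short sketch. It assumes a hypothetical scalar tanh RNN; masked_rnn is an illustrative name, not a TensorFlow function. It follows what the TensorFlow 1.x documentation describes for dynamic_rnn: steps past a sample's length produce zero output and simply carry the state through. The returned state is therefore the state at the sample's last valid step, not at the padded end.

```python
import math

def masked_rnn(xs, length, w_x=0.5, w_h=0.5):
    """Hypothetical scalar tanh RNN imitating dynamic_rnn's sequence_length handling."""
    h, outputs = 0.0, []
    for t, x in enumerate(xs):
        if t < length:
            h = math.tanh(w_x * x + w_h * h)
            outputs.append(h)
        else:
            outputs.append(0.0)  # padded step: zero output, state carried through unchanged
    return outputs, h

# two zero-padded samples whose true lengths are 3 and 1
batch = [[1.0, 2.0, 3.0], [4.0, 0.0, 0.0]]
lengths = [3, 1]
results = [masked_rnn(xs, n) for xs, n in zip(batch, lengths)]
```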
Example: classifying MNIST with an RNN
import tensorflow as tf
# Import the MNIST dataset
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/data/", one_hot=True)
n_input = 28 # MNIST data input (img shape: 28*28)
n_steps = 28 # timesteps
n_hidden = 128 # hidden layer num of features
n_classes = 10 # MNIST classes (digits 0-9, 10 in total)
tf.reset_default_graph()
# tf Graph input
x = tf.placeholder("float", [None, n_steps, n_input])
y = tf.placeholder("float", [None, n_classes])
# Split x into a list of n_steps tensors of shape [batch_size, n_input] for static_rnn
x1 = tf.unstack(x, n_steps, 1)
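# Pick one of the variants #1-#4 below; comment out the others
#1 BasicLSTMCell
lstm_cell = tf.contrib.rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)
outputs, states = tf.contrib.rnn.static_rnn(lstm_cell, x1, dtype=tf.float32)
#2 LSTMCell
#lstm_cell = tf.contrib.rnn.LSTMCell(n_hidden, forget_bias=1.0)
#outputs, states = tf.contrib.rnn.static_rnn(lstm_cell, x1, dtype=tf.float32)
#3 GRU (static_rnn returns (outputs, state), so unpack both)
#gru = tf.contrib.rnn.GRUCell(n_hidden)
#outputs, states = tf.contrib.rnn.static_rnn(gru, x1, dtype=tf.float32)
#4 Dynamic RNN: needs the gru cell from #3 and takes x directly, not the unstacked x1
#outputs,_ = tf.nn.dynamic_rnn(gru,x,dtype=tf.float32)
# [batch_size, max_time, n_hidden] -> [max_time, batch_size, n_hidden], so outputs[-1] is the last time step
#outputs = tf.transpose(outputs, [1, 0, 2])
# Fully connected output layer applied to the output of the last time step
pred = tf.contrib.layers.fully_connected(outputs[-1],n_classes,activation_fn = None)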
learning_rate = 0.001
training_iters = 100000
batch_size = 128
display_step = 10
# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
# Evaluate model
correct_pred = tf.equal(tf.argmax(pred,1), tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
# Launch the session
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    step = 1
    # Keep training until reach max iterations
    while step * batch_size < training_iters:
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # Reshape data to get 28 seq of 28 elements
        batch_x = batch_x.reshape((batch_size, n_steps, n_input))
        # Run optimization op (backprop)
        sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
        if step % display_step == 0:
            # Compute accuracy on the current batch
            acc = sess.run(accuracy, feed_dict={x: batch_x, y: batch_y})
            # Calculate batch loss
            loss = sess.run(cost, feed_dict={x: batch_x, y: batch_y})
            print ("Iter " + str(step*batch_size) + ", Minibatch Loss= " + \
                  "{:.6f}".format(loss) + ", Training Accuracy= " + \
                  "{:.5f}".format(acc))
        step += 1
    print (" Finished!")
    # Compute accuracy on 128 MNIST test images
    test_len = 128
    test_data = mnist.test.images[:test_len].reshape((-1, n_steps, n_input))
    test_label = mnist.test.labels[:test_len]
    print ("Testing Accuracy:", \
        sess.run(accuracy, feed_dict={x: test_data, y: test_label}))
Optimizing RNNs
There are many optimization techniques for RNNs. This section introduces two methods that are specific to RNNs.
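1. Dropout: RNNs have their own dropout wrapper, for example lstm_cell = tf.nn.rnn_cell.DropoutWrapper(lstm_cell, output_keep_prob=keep_prob). Pass the keep probability by keyword: the second positional parameter of DropoutWrapper is input_keep_prob, so a positional value would drop the cell's inputs instead of its outputs.
No dropout is applied to the memory carried from time t-1 to time t. Dropout is applied only when information passes between stacked cells within the same time step t. Accordingly, RNN dropout has two settings: input_keep_prob (the keep probability for data entering the cell) and output_keep_prob (the keep probability for data leaving the cell).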
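The placement rule can be illustrated with a pure-Python sketch of two stacked cells. The cell, the weights and the function names are hypothetical, and inverted dropout (rescaling survivors by 1/keep_prob) is assumed. Masks are applied only on the vertical path: into a cell and out of a cell at the same time step. The recurrent states h1 and h2 are passed on to the next step untouched.

```python
import math
import random

def dropout(values, keep_prob, rng):
    """Inverted dropout: keep each value with probability keep_prob, rescale by 1/keep_prob."""
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]

def cell(x, h):
    """Hypothetical element-wise tanh cell: h_t = tanh(0.5 * x_t + 0.5 * h_{t-1})."""
    return [math.tanh(0.5 * a + 0.5 * b) for a, b in zip(x, h)]

def stacked_rnn(xs, input_keep_prob=1.0, output_keep_prob=1.0, seed=0):
    """Two stacked cells; dropout touches only the layer-to-layer connections."""
    rng = random.Random(seed)
    h1 = [0.0] * len(xs[0])
    h2 = [0.0] * len(xs[0])
    tops = []
    for x in xs:
        h1 = cell(dropout(x, input_keep_prob, rng), h1)    # h1 goes to t+1 undropped
        mid = dropout(h1, output_keep_prob, rng)           # layer 1 -> layer 2, same t
        h2 = cell(dropout(mid, input_keep_prob, rng), h2)  # h2 goes to t+1 undropped
        tops.append(dropout(h2, output_keep_prob, rng))
    return tops

xs = [[1.0, -1.0], [0.5, 0.5], [0.0, 2.0]]
print(stacked_rnn(xs, input_keep_prob=0.8, output_keep_prob=0.8))
```

With both keep probabilities at 1.0 the result is deterministic and identical to running without dropout. This mirrors the default of 1.0 for the wrapper's keep probabilities.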
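2. Layer normalization (LN): because of the special structure of RNNs, their inputs differ from those of the fully connected and convolutional networks covered earlier.
In batch normalization (BN), each layer's input only depends on the current batch of samples (or their transformed values).
In an RNN, however, each layer's input contains not only the transformed current batch but also the output from the previous step of the sequence. Mini-batch statistics cannot cover all of this input, so BN is not used to normalize RNNs. Instead, each layer is normalized across its own units, separately for every sample; this is layer normalization.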
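Here is a pure-Python sketch of the normalization itself. For simplicity it uses a single scalar scale and shift; the ln function in the code below learns one scale and one shift per unit. Each sample is normalized with its own mean and variance, so unlike BN the result does not depend on which other samples share the batch.

```python
import math

def layer_norm(row, scale=1.0, shift=0.0, epsilon=1e-5):
    """Normalize one sample's feature vector with its own mean and variance."""
    m = sum(row) / len(row)
    v = sum((a - m) ** 2 for a in row) / len(row)
    return [(a - m) / math.sqrt(v + epsilon) * scale + shift for a in row]

def layer_norm_batch(batch, **kw):
    # every row is normalized independently: no statistics are shared across the batch
    return [layer_norm(row, **kw) for row in batch]

batch = [[1.0, 2.0, 3.0, 4.0], [10.0, 0.0, -10.0, 5.0]]
normed = layer_norm_batch(batch)
```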
import numpy as np
import tensorflow as tf
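# Import the MNIST dataset
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/data/", one_hot=True)
# The following imports are private TensorFlow 1.x internals whose module paths differ
# between releases; check tf.__version__ (printed below) if one of them fails
from tensorflow.python.ops.rnn_cell_impl import _RNNCell as RNNCell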
from tensorflow.python.ops.math_ops import sigmoid
from tensorflow.python.ops.math_ops import tanh
from tensorflow.python.ops import variable_scope as vs
from tensorflow.python.ops import array_ops
from tensorflow.contrib.rnn.python.ops.core_rnn_cell_impl import _linear
print(tf.__version__)
tf.reset_default_graph()
def ln(tensor, scope = None, epsilon = 1e-5):
    """ Layer normalizes a 2D tensor along its second axis """
    assert(len(tensor.get_shape()) == 2)
    m, v = tf.nn.moments(tensor, [1], keep_dims=True)
    if not isinstance(scope, str):
        scope = ''
    with tf.variable_scope(scope + 'layer_norm'):
        scale = tf.get_variable('scale',
                                shape=[tensor.get_shape()[1]],
                                initializer=tf.constant_initializer(1))
        shift = tf.get_variable('shift',
                                shape=[tensor.get_shape()[1]],
                                initializer=tf.constant_initializer(0))
    LN_initial = (tensor - m) / tf.sqrt(v + epsilon)
    return LN_initial * scale + shift
class LNGRUCell(RNNCell):
    """Gated Recurrent Unit cell (cf. http://arxiv.org/abs/1406.1078)."""
    def __init__(self, num_units, input_size=None, activation=tanh):
        if input_size is not None:
            print("%s: The input_size parameter is deprecated." % self)
        self._num_units = num_units
        self._activation = activation
    @property
    def state_size(self):
        return self._num_units
    @property
    def output_size(self):
        return self._num_units
    def __call__(self, inputs, state):
        """Gated recurrent unit (GRU) with nunits cells."""
        with vs.variable_scope("Gates"):  # Reset gate and update gate.
            # We start with bias of 1.0 to not reset and not update.
            value =_linear([inputs, state], 2 * self._num_units, True, 1.0)
            r, u = array_ops.split(value=value, num_or_size_splits=2, axis=1)
            r = ln(r, scope = 'r/')
            u = ln(u, scope = 'u/')
            r, u = sigmoid(r), sigmoid(u)
        with vs.variable_scope("Candidate"):
            # with vs.variable_scope("Layer_Parameters"):
            Cand = _linear([inputs, r *state], self._num_units, True)
            c_pre = ln(Cand, scope = 'new_h/')
            c = self._activation(c_pre)
        new_h = u * state + (1 - u) * c
        return new_h, new_h
n_input = 28 # MNIST data input (img shape: 28*28)
n_steps = 28 # timesteps
n_hidden = 128 # hidden layer num of features
n_classes = 10 # MNIST classes (digits 0-9, 10 in total)
tf.reset_default_graph()
# tf Graph input
x = tf.placeholder("float", [None, n_steps, n_input])
y = tf.placeholder("float", [None, n_classes])
x1 = tf.unstack(x, n_steps, 1)
#1 BasicLSTMCell
#lstm_cell = tf.contrib.rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)
#outputs, states = tf.contrib.rnn.static_rnn(lstm_cell, x1, dtype=tf.float32)
#2 LSTMCell
#lstm_cell = tf.contrib.rnn.LSTMCell(n_hidden, forget_bias=1.0)
#outputs, states = tf.contrib.rnn.static_rnn(lstm_cell, x1, dtype=tf.float32)
#3 GRU: the built-in GRUCell is replaced by the layer-normalized LNGRUCell defined above
#gru = tf.contrib.rnn.GRUCell(n_hidden)
gru = LNGRUCell(n_hidden)
#outputs, states = tf.contrib.rnn.static_rnn(gru, x1, dtype=tf.float32)
#4 Dynamic RNN
outputs,_ = tf.nn.dynamic_rnn(gru,x,dtype=tf.float32)
# [batch_size, max_time, n_hidden] -> [max_time, batch_size, n_hidden], so outputs[-1] is the last time step
outputs = tf.transpose(outputs, [1, 0, 2])
pred = tf.contrib.layers.fully_connected(outputs[-1],n_classes,activation_fn = None)
learning_rate = 0.001
training_iters = 100000
batch_size = 128
display_step = 10
# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
# Evaluate model
correct_pred = tf.equal(tf.argmax(pred,1), tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
# Launch the session
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    step = 1
    # Keep training until reach max iterations
    while step * batch_size < training_iters:
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # Reshape data to get 28 seq of 28 elements
        batch_x = batch_x.reshape((batch_size, n_steps, n_input))
        # Run optimization op (backprop)
        sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
        if step % display_step == 0:
            # Compute accuracy on the current batch
            acc = sess.run(accuracy, feed_dict={x: batch_x, y: batch_y})
            # Calculate batch loss
            loss = sess.run(cost, feed_dict={x: batch_x, y: batch_y})
            print ("Iter " + str(step*batch_size) + ", Minibatch Loss= " + \
                  "{:.6f}".format(loss) + ", Training Accuracy= " + \
                  "{:.5f}".format(acc))
        step += 1
    print (" Finished!")
    # Compute accuracy on 128 MNIST test images
    test_len = 128
    test_data = mnist.test.images[:test_len].reshape((-1, n_steps, n_input))
    test_label = mnist.test.labels[:test_len]
    print ("Testing Accuracy:", \
        sess.run(accuracy, feed_dict={x: test_data, y: test_label}))
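The arithmetic inside LNGRUCell.__call__ can be followed without TensorFlow in the pure-Python sketch below. The weight layout, the random initialization and the helper names (layer_norm_vec, linear, ln_gru_step) are assumptions for illustration. Only the order of operations follows the cell above:

1. Linear map of [inputs, state], then split into r and u.
2. Layer-normalize each gate, then apply sigmoid.
3. Compute the candidate from [inputs, r * state], layer-normalize it, then apply tanh.
4. Interpolate between the old state and the candidate with u.

```python
import math
import random

def layer_norm_vec(vec, eps=1e-5):
    """Layer-normalize one vector with its own mean and variance (no scale/shift)."""
    m = sum(vec) / len(vec)
    v = sum((a - m) ** 2 for a in vec) / len(vec)
    return [(a - m) / math.sqrt(v + eps) for a in vec]

def linear(inputs, weights, bias):
    """Affine map W [inputs] + b; weights is a list of rows, one per output unit."""
    return [sum(w * a for w, a in zip(row, inputs)) + b for row, b in zip(weights, bias)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def ln_gru_step(x, h, Wg, bg, Wc, bc):
    """One step of a layer-normalized GRU on plain lists (hypothetical helper)."""
    n = len(h)
    gates = linear(x + h, Wg, bg)                          # 2n values from [x; h]
    r = [sigmoid(a) for a in layer_norm_vec(gates[:n])]    # reset gate: LN, then sigmoid
    u = [sigmoid(a) for a in layer_norm_vec(gates[n:])]    # update gate: LN, then sigmoid
    cand = linear(x + [ri * hi for ri, hi in zip(r, h)], Wc, bc)  # candidate from [x; r*h]
    c = [math.tanh(a) for a in layer_norm_vec(cand)]       # LN, then tanh
    return [ui * hi + (1.0 - ui) * ci for ui, hi, ci in zip(u, h, c)]

rng = random.Random(0)
n_in, n_hidden = 3, 4
Wg = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + n_hidden)] for _ in range(2 * n_hidden)]
bg = [1.0] * (2 * n_hidden)  # gate bias starts at 1.0, as in the TensorFlow cell
Wc = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + n_hidden)] for _ in range(n_hidden)]
bc = [0.0] * n_hidden
h = [0.0] * n_hidden
for x in ([1.0, 0.0, -1.0], [0.5, 0.5, 0.5]):
    h = ln_gru_step(x, h, Wg, bg, Wc, bc)
print(h)
```

Because the new state is a convex combination of the old state and a tanh output, it stays in (-1, 1). The normalization only changes the scale of the pre-activations, not this bound.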