ESIM (Enhanced Sequential Inference Model) for Text Matching

Reposted

智能创新者 2024-05-09 10:55:58

Contents

The ESIM Model

1. Input Encoding
2. Local Inference Modelling
3. Enhancement of Local Inference Information
4. Others
5. Keras Implementation

The ESIM Model

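ESIM has three main parts: input encoding, local inference modeling, and inference composition.

First, what is text matching? Simply put, it means deciding whether two sentences stand in some relation. For example, given a question and a candidate answer, we need to decide whether the answer matches the question, so the task can be framed as binary classification (the output is yes or no). Work in this area mainly uses the SNLI and MultiNLI corpora, in which each example consists of two sentences, a premise and a hypothesis, plus a label describing the relation between them. This article explains how ESIM tackles this problem.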

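1. Input Encoding

Since the input is two sentences, the first step is word embedding. There are many ways to do this; here we assume pretrained GloVe vectors are used to turn each sentence into a matrix. Note that vectors taken directly from pretrained GloVe do not reflect the context a word appears in, so we pass them through a BiLSTM to obtain the final input representations:
$\bar{a}_i = \mathrm{BiLSTM}(a, i),\ \forall i \in [1, \ldots, \ell_a]; \quad \bar{b}_j = \mathrm{BiLSTM}(b, j),\ \forall j \in [1, \ldots, \ell_b]$
Here $\bar{a}_i$ and $\bar{b}_j$ are the representations of a single word in the premise and in the hypothesis, respectively, and $\ell_a$ and $\ell_b$ are the sentence lengths. This step is the input encoding stage of ESIM.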
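To make the motivation for the BiLSTM concrete, here is a minimal, dependency-free sketch of a static embedding lookup. The vocabulary, the 3-dimensional vectors, and the `embed` helper are all made up for illustration; they are not taken from GloVe or from the article's code. It shows that a static lookup returns the same vector for a word wherever it appears, which is the gap the context-aware encoder is meant to fill.

```python
# Hypothetical toy vocabulary with made-up 3-dimensional "pretrained" vectors.
embeddings = {
    "<unk>": [0.0, 0.0, 0.0],
    "the": [0.1, 0.3, -0.2],
    "movie": [0.7, -0.1, 0.4],
    "is": [0.0, 0.2, 0.1],
    "good": [0.5, 0.6, -0.3],
}


def embed(sentence):
    """Map a whitespace-tokenised sentence to a list of word vectors."""
    unk = embeddings["<unk>"]
    return [embeddings.get(token, unk) for token in sentence.lower().split()]


premise = embed("The movie is good")
hypothesis = embed("The film is good")  # "film" is out of vocabulary here

# Each sentence becomes a (length x dimension) matrix.
print(len(premise), len(premise[0]))
# "good" gets the identical vector in both sentences, whatever its context.
print(premise[3] == hypothesis[3])
```

In a real pipeline the lookup table would come from the pretrained vectors, and the resulting matrices would then be passed through the BiLSTM.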

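2. Local Inference Modelling

The next step is to analyse how the two sentences relate to each other. Note first that the word representations we now have combine each word's own meaning with its context. The more closely two words are related, the smaller the distance and angle between their vectors. For example, (1, 0) and (0, 1) are orthogonal, with a dot product of 0, so they are less related than (0.5, 0.5) and (0.5, 0.5), which are identical and have a dot product of 0.5. With that in mind, let us look at how ESIM does the analysis.

First, take the dot product between the word vectors of the two sentences:
$e_{ij} = \bar{a}_i^{\top} \bar{b}_j$
As noted above, the more closely two word vectors are related, the larger this product. Then:
$\tilde{a}_i = \sum_{j=1}^{\ell_b} \frac{\exp(e_{ij})}{\sum_{k=1}^{\ell_b} \exp(e_{ik})} \bar{b}_j,\ \forall i \in [1, \ldots, \ell_a]; \quad \tilde{b}_j = \sum_{i=1}^{\ell_a} \frac{\exp(e_{ij})}{\sum_{k=1}^{\ell_a} \exp(e_{kj})} \bar{a}_i,\ \forall j \in [1, \ldots, \ell_b]$
These two formulas can be read as follows. Suppose the premise contains the word "good". We first measure how strongly this word relates to every word in the other sentence, normalise the resulting scores $e_{ij}$ into weights, and then represent "good" as the weighted sum of the other sentence's word vectors. Doing this word by word yields a new sequence. This process is called Local Inference Modelling.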
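The soft alignment described above can be sketched in a few lines of plain Python. This is an illustrative sketch only: the helper names (`dot`, `softmax`, `soft_align`) and the toy 2-dimensional vectors are assumptions, and a real model would operate on batched tensors rather than lists.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_align(a, b):
    """For each vector in a, return the attention-weighted sum of the vectors in b."""
    dim = len(b[0])
    aligned = []
    for a_i in a:
        # Similarity of this word to every word of the other sentence, normalised to weights.
        weights = softmax([dot(a_i, b_j) for b_j in b])
        aligned.append([sum(w * b_j[k] for w, b_j in zip(weights, b)) for k in range(dim)])
    return aligned


a = [[1.0, 0.0], [0.0, 1.0]]              # toy encoded premise, 2 words
b = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy encoded hypothesis, 3 words

a_tilde = soft_align(a, b)  # one vector per premise word, built from b
b_tilde = soft_align(b, a)  # one vector per hypothesis word, built from a
print(len(a_tilde), len(b_tilde))
```

Each aligned sequence has the same length as the sentence it describes, so it can be compared position by position with the original encoding.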

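3. Enhancement of Local Inference Information

Next we measure the difference between $\tilde{a}_i$, the version of a word built from the other sentence's words $\bar{b}_j$, and the actual $\bar{a}_i$, in order to judge whether the two sentences are related closely enough. ESIM computes the element-wise difference and product of the old and new sequences and concatenates all of this information into a single sequence:
$m_a = [\bar{a}; \tilde{a}; \bar{a} - \tilde{a}; \bar{a} \odot \tilde{a}], \quad m_b = [\bar{b}; \tilde{b}; \bar{b} - \tilde{b}; \bar{b} \odot \tilde{b}]$
This process is called Enhancement of local inference information.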
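As a small worked example of this enhancement step, the sketch below concatenates each original vector, its aligned counterpart, their element-wise difference, and their element-wise product. The function name `enhance` and the toy values are assumptions made for illustration.

```python
def enhance(encoded, aligned):
    """Concatenate [x; x~; x - x~; x * x~] for every position in the sequence."""
    enhanced = []
    for x, x_tilde in zip(encoded, aligned):
        diff = [p - q for p, q in zip(x, x_tilde)]
        prod = [p * q for p, q in zip(x, x_tilde)]
        enhanced.append(x + x_tilde + diff + prod)  # list concatenation
    return enhanced


a_bar = [[1.0, 0.0], [0.5, 0.5]]    # toy encoded sequence
a_tilde = [[0.8, 0.2], [0.5, 0.5]]  # toy aligned sequence
m_a = enhance(a_bar, a_tilde)

# Every position now has 4x the original dimension.
print(len(m_a[0]))
# Where the aligned vector equals the original, the difference block is all zeros.
print(m_a[1])
```

The difference and product blocks make agreement or disagreement between the two views explicit, instead of leaving the next layer to discover it from the raw vectors.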

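4. Others

Why store all the information in the sequences $m_a$ and $m_b$? Because ESIM still has to combine everything for a global analysis, which again uses a BiLSTM over the two sequences:
$v_{a,t} = \mathrm{BiLSTM}(F(m_a), t), \quad v_{b,t} = \mathrm{BiLSTM}(F(m_b), t)$
Note that F is a single-layer neural network with ReLU activation, used mainly to reduce the number of model parameters and avoid overfitting. The subscript $t$ denotes the BiLSTM output at time step $t$.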
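The projection F can be illustrated as a per-position linear map followed by ReLU. The sketch below uses made-up weights that map 8-dimensional vectors down to 2 dimensions; in a trained model these weights would be learned. Note that the Keras code in this article appears to feed the concatenated sequence straight into the BiLSTM without such a projection, so treat this only as an illustration of the formula.

```python
def relu_projection(sequence, weights, bias):
    """Apply y = ReLU(W x + b) independently to every vector x in the sequence."""
    projected = []
    for x in sequence:
        y = [
            max(0.0, sum(w * x_k for w, x_k in zip(row, x)) + b)
            for row, b in zip(weights, bias)
        ]
        projected.append(y)
    return projected


# Hypothetical weights: 2 output units, 8 input features.
W = [
    [0.5, 0.0, 0.5, 0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, -1.0, 0.0, -1.0, 0.0, 0.0, 0.0, 0.0],
]
b = [0.0, 0.1]

m_a = [[0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.25, 0.25]]  # one enhanced position
out = relu_projection(m_a, W, b)
print(out)  # [[0.5, 0.0]]: the second unit is negative before ReLU and is clipped to 0
```

Shrinking the feature dimension before the second BiLSTM reduces the size of that layer's input weight matrices, which is where the parameter saving comes from.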

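Because sentences differ in length, the resulting sequences v differ in length as well. To simplify the final step, the sequences are pooled into a fixed-length vector. Since summation is sensitive to sequence length, which would make the model less robust, ESIM applies both average pooling and max pooling to each of the two sequences and concatenates the results into one vector:

$v_{a,\mathrm{ave}} = \sum_{t=1}^{\ell_a} \frac{v_{a,t}}{\ell_a}, \quad v_{a,\max} = \max_{t=1}^{\ell_a} v_{a,t}, \quad v_{b,\mathrm{ave}} = \sum_{t=1}^{\ell_b} \frac{v_{b,t}}{\ell_b}, \quad v_{b,\max} = \max_{t=1}^{\ell_b} v_{b,t}, \quad v = [v_{a,\mathrm{ave}}; v_{a,\max}; v_{b,\mathrm{ave}}; v_{b,\max}]$
Finally, the vector v is fed into a multilayer perceptron classifier with a softmax output layer. (For binary classification, use a sigmoid instead, which outputs a probability between 0 and 1.)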
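The pooling and output steps can be sketched as follows. The sequences, logits, and helper names (`pool`, `softmax`, `sigmoid`) are illustrative assumptions; the point is that sequences of different lengths are reduced to a vector of fixed size before classification.

```python
import math


def pool(sequence):
    """Return [average pooling; max pooling] over the time axis of a sequence of vectors."""
    dim = len(sequence[0])
    avg = [sum(v[k] for v in sequence) / len(sequence) for k in range(dim)]
    top = [max(v[k] for v in sequence) for k in range(dim)]
    return avg + top


def softmax(logits):
    """Numerically stable softmax for multi-class outputs."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    """Logistic function for a single binary output."""
    return 1.0 / (1.0 + math.exp(-z))


v_a = [[1.0, 0.0], [3.0, 2.0]]              # 2 time steps
v_b = [[0.0, 1.0], [2.0, 1.0], [4.0, 1.0]]  # 3 time steps

# 4 pooled blocks of dimension 2: the result has 8 entries whatever the lengths are.
v = pool(v_a) + pool(v_b)
print(v)

class_probs = softmax([2.0, 0.5, -1.0])  # hypothetical logits for a 3-class task
match_prob = sigmoid(0.0)                # hypothetical logit for a binary task
```

With three relation labels the softmax output is the natural choice; with a single match/no-match label, one sigmoid unit is enough.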

That is the complete ESIM model.

5. Keras Implementation

import warnings
warnings.filterwarnings('ignore')
from keras.layers import *
from keras.activations import softmax
from keras.models import Model
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
from bulid_input import *
from load_data import *
import matplotlib.pyplot as plt

def StaticEmbedding(embedding_matrix):
    # Frozen embedding layer initialised from the pretrained embedding matrix
    in_dim, out_dim = embedding_matrix.shape
    return Embedding(in_dim, out_dim, weights=[embedding_matrix], trainable=False)


def subtract(input_1, input_2):
    minus_input_2 = Lambda(lambda x: -x)(input_2)
    return add([input_1, minus_input_2])


def aggregate(input_1, input_2, num_dense=300, dropout_rate=0.5):
    feat1 = concatenate([GlobalAvgPool1D()(input_1), GlobalMaxPool1D()(input_1)])
    feat2 = concatenate([GlobalAvgPool1D()(input_2), GlobalMaxPool1D()(input_2)])
    x = concatenate([feat1, feat2])
    x = BatchNormalization()(x)
    x = Dense(num_dense, activation='relu')(x)
    x = BatchNormalization()(x)
    x = Dropout(dropout_rate)(x)
    x = Dense(num_dense, activation='relu')(x)
    x = BatchNormalization()(x)
    x = Dropout(dropout_rate)(x)
    return x


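def align(input_1, input_2):
    # Similarity between every pair of positions: shape (batch, len_1, len_2)
    attention = Dot(axes=-1, name='attention-layer')([input_1, input_2])
    # Normalise over input_1 positions (axis 1) and over input_2 positions (axis 2)
    w_att_1 = Lambda(lambda x: softmax(x, axis=1))(attention)
    w_att_2 = Permute((2, 1))(Lambda(lambda x: softmax(x, axis=2))(attention))
    # in1_aligned summarises input_1 for each position of input_2, and vice versa
    in1_aligned = Dot(axes=1)([w_att_1, input_1])
    in2_aligned = Dot(axes=1)([w_att_2, input_2])
    return in1_aligned, in2_aligned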


def build_model(embedding_matrix, num_class=1, max_length=30, lstm_dim=300):
    q1 = Input(shape=(max_length,))
    q2 = Input(shape=(max_length,))

    # Embedding
    embedding = StaticEmbedding(embedding_matrix)
    q1_embed = BatchNormalization(axis=2)(embedding(q1))
    q2_embed = BatchNormalization(axis=2)(embedding(q2))

    # Encoding
    encode = Bidirectional(LSTM(lstm_dim, return_sequences=True))
    q1_encoded = encode(q1_embed)
    q2_encoded = encode(q2_embed)

    # Alignment
    q1_aligned, q2_aligned = align(q1_encoded, q2_encoded)

    # Compare
    q1_combined = concatenate(
        [q1_encoded, q2_aligned, subtract(q1_encoded, q2_aligned), multiply([q1_encoded, q2_aligned])])
    q2_combined = concatenate(
        [q2_encoded, q1_aligned, subtract(q2_encoded, q1_aligned), multiply([q2_encoded, q1_aligned])])
    compare = Bidirectional(LSTM(lstm_dim, return_sequences=True))
    q1_compare = compare(q1_combined)
    q2_compare = compare(q2_combined)

    # Aggregate
    x = aggregate(q1_compare, q2_compare)
    x = Dense(num_class, activation='sigmoid')(x)
    model = Model(inputs=[q1, q2], outputs=x)
    model.compile(loss='binary_crossentropy',
                  optimizer='nadam',
                  metrics=['accuracy'])
    model.summary()
    return model

def draw_train(history):
    '''Plot the training curves.'''
    # Plot training & validation accuracy values
    plt.plot(history.history['accuracy'])
    plt.plot(history.history['val_accuracy'])
    plt.title('Model accuracy')
    plt.ylabel('Accuracy')
    plt.xlabel('Epoch')
    plt.legend(['Train', 'Validation'], loc='upper left')
    plt.show()

    # Plot training & validation loss values
    plt.plot(history.history['loss'])
    plt.plot(history.history['val_loss'])
    plt.title('Model loss')
    plt.ylabel('Loss')
    plt.xlabel('Epoch')
    plt.legend(['Train', 'Validation'], loc='upper left')
    plt.savefig("model/result_esim.png")
    plt.show()


if __name__ == "__main__":
    # Hyperparameters
    BATCH_SIZE = 512
    EMBEDDING_DIM = 100
    EPOCHS = 20
    model_path = 'model/tokenvec_esim_model.h5'

    # Data preparation
    train = read_bq('data/bq_corpus/train.tsv', ['line_num', 'q1', 'q2', 'label'])

    MAX_LENGTH = select_best_length(train)
    datas, word_dict = build_data(train)
    train_w2v(datas)
    VOCAB_SIZE = len(word_dict)
    embeddings_dict = load_pretrained_embedding()
    embedding_matrix = build_embedding_matrix(word_dict, embeddings_dict,
                                              VOCAB_SIZE, EMBEDDING_DIM)
    left_x_train, right_x_train, y_train = convert_data(datas, word_dict, MAX_LENGTH)
    model = build_model(embedding_matrix, max_length=MAX_LENGTH, lstm_dim=128)
    # Public import path; keras.utils.vis_utils is not available in newer Keras releases
    from keras.utils import plot_model
    plot_model(model, to_file='model/model_esim.png', show_shapes=True)
    history = model.fit(
        x=[left_x_train, right_x_train],
        y=y_train,
        validation_split=0.2,
        batch_size=BATCH_SIZE,
        epochs=EPOCHS,
    )
    draw_train(history)
    model.save(model_path)

More detailed code is available in the author's GitHub repository:
An ESIM network implemented in Keras.
