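The sequential nature of text makes the recurrent, step-by-step pattern of an RNN the obvious choice. But if we treat the encoded text (batch × sequence × embedding) as an image, extracting information from it with convolutions is just as natural. That is the motivation behind TextCNN.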

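TextCNN is an efficient convolutional algorithm for text. It captures local structure between adjacent tokens, and because a convolution has no step-to-step dependency, it can be computed in parallel. On text classification its performance is comparable to TextRNN, so it is widely used.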

So how can text encoding be viewed sensibly from the perspective of image encoding? Here are two views:

View 1: a strip-shaped image of width 1, where the embedding size plays the role of the image's channel count.

[Figure 1: CNN text classification model diagram]


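For this view, a 1D convolution layer can be applied directly to extract information.

View 2: a single-channel image (channel = 1), whose height and width correspond to sequence and embedding respectively.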

[Figure 2: CNN text classification model diagram]

This view requires a 2D convolution to extract information. The effect is shown in the figure below.

[Figure 3: CNN text classification model diagram]
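
The two views describe the same computation when the 2D kernel spans the full embedding width. The following is a minimal sketch of that claim, not code from the original article: the tensor sizes are arbitrary illustrative values, and the weights of the 1D layer are copied into the 2D layer so that their outputs can be compared directly.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes only (assumptions, not taken from the article).
batch, seq_len, embed, out_ch, ker = 2, 7, 5, 3, 3
x = torch.randn(batch, seq_len, embed)        # stand-in for embedded text

# View 1: embedding as channels, 1D convolution along the sequence.
conv1d = nn.Conv1d(in_channels=embed, out_channels=out_ch, kernel_size=ker)
# View 2: single-channel image, 2D kernel covering the whole embedding width.
conv2d = nn.Conv2d(in_channels=1, out_channels=out_ch, kernel_size=(ker, embed))

# Copy the 1D parameters into the 2D layer so both compute the same function.
# conv1d.weight: (out_ch, embed, ker) -> conv2d.weight: (out_ch, 1, ker, embed)
with torch.no_grad():
    conv2d.weight.copy_(conv1d.weight.permute(0, 2, 1).unsqueeze(1))
    conv2d.bias.copy_(conv1d.bias)

y1 = conv1d(x.permute(0, 2, 1))               # (batch, out_ch, seq_len - ker + 1)
y2 = conv2d(x.unsqueeze(1)).squeeze(-1)       # (batch, out_ch, seq_len - ker + 1)

same = torch.allclose(y1, y2, atol=1e-5)      # equal up to floating-point error
print(y1.shape, y2.shape, same)
```

With identical weights, both layers produce feature maps of shape (batch, out_ch, seq_len - ker + 1), so choosing between the views is mostly a matter of how the tensor is laid out.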


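From either view, the TextCNN kernels scan along the token direction of the text (the sequence direction). Each group of kernels can have its own parameters (number of kernels, window size, stride, and so on), so the resulting feature maps differ in size. A 1D max pooling over the sequence dimension then reduces each kernel's output to a single value, so the outputs of all kernels can be concatenated and passed to the final fully connected layer for classification.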
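
To make the size mismatch concrete: with stride 1 and no padding, a kernel of size k applied to a sequence of length L yields a feature map of length L - k + 1. The sketch below uses illustrative sizes (assumptions, not values from the article) to show three kernel sizes producing three different lengths, and max pooling over the sequence axis bringing them all to the same shape before concatenation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

batch, embed, seq_len = 2, 100, 20           # illustrative sizes
x = torch.randn(batch, embed, seq_len)       # already in (batch, channel, sequence) layout

lengths, pooled = [], []
for ker in (2, 3, 4):
    conv = nn.Conv1d(in_channels=embed, out_channels=8, kernel_size=ker)
    fmap = F.relu(conv(x))                   # (batch, 8, seq_len - ker + 1)
    lengths.append(fmap.size(-1))
    # Max over the whole sequence axis: one value per kernel, whatever the length.
    pooled.append(fmap.max(dim=-1).values)   # (batch, 8)

features = torch.cat(pooled, dim=-1)         # (batch, 24), ready for the FC layer
print(lengths, features.shape)
```

Taking `fmap.max(dim=-1).values` gives the same result as `F.max_pool1d(fmap, fmap.size(-1))` followed by a squeeze of the last dimension.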

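Putting this together, the overall TextCNN architecture is as follows (activation functions, dropout, and batch normalization are omitted):
Embedding -> 1D-CNN/2D-CNN -> 1D-MaxPooling -> Channel Merge -> FC -> classification result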

The network architectures based on 1D and 2D convolution are given below. The shape changes between layers are noted in the code comments:

  • 1D-CNN example
import torch
import torch.nn as nn
import torch.nn.functional as F


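Config = {"vob_size": 5000,         # vocabulary size
          "ebd_size": 100,            # word embedding dimension
          "conv1D_out": [8, 8, 8],          # output-channel list of the 1D-conv layers
          "conv1D_ker": [2, 3, 4],        # kernel-size list of the 1D-conv layers
          "fc_cla": 4,              # number of output classes of the fully connected layer
          "dropout": 0.5            # dropout probability
}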


class Text1DCNN(nn.Module):
    def __init__(self):
        super(Text1DCNN, self).__init__()
        self.embedding = nn.Embedding(num_embeddings=Config['vob_size'], embedding_dim=Config['ebd_size'])
        self.conv1D = nn.ModuleList([nn.Conv1d(in_channels=Config['ebd_size'], out_channels=out, kernel_size=ker)
                                     for out, ker in zip(Config['conv1D_out'], Config['conv1D_ker'])])
        self.dropout = nn.Dropout(p=Config['dropout'])
        self.fc = nn.Linear(sum(Config['conv1D_out']), Config["fc_cla"])

    def forward(self, x):           # x: (batch, sequence)
        x = self.embedding(x)      # x: (batch, sequence, embed)
        x = x.permute(0, 2, 1)      # x: (batch, embed, sequence), embed is treated as in_channel so that 1D convolution can be applied
        x = [F.relu(conv1D(x)) for conv1D in self.conv1D]    # [(batch, out_channel, L_out)]
        x = [F.max_pool1d(i, i.size(-1)) for i in x]   # [(batch, out_channel, 1)], max pooling over the last dimension
        x = [torch.squeeze(i, dim=-1) for i in x]    # [(batch, out_channel)], drop the pooled dimension
        x = torch.cat(x, dim=-1)      # (batch, total_out_channel), concatenate along the out_channel dimension
        x = self.dropout(x)
        out = self.fc(x)
        return out
  • 2D-CNN example
import torch
import torch.nn as nn
import torch.nn.functional as F

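Config = {"vob_size": 5000,         # vocabulary size
          "ebd_size": 100,            # word embedding dimension
          "conv1D_out": [8, 8, 8],          # output-channel list of the conv layers (key name kept from the 1D example)
          "conv1D_ker": [2, 3, 4],        # kernel heights of the conv layers (key name kept from the 1D example)
          "fc_cla": 4,              # number of output classes of the fully connected layer
          "dropout": 0.5            # dropout probability
}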

class Text2DCNN(nn.Module):
    def __init__(self):
        super(Text2DCNN, self).__init__()
        self.embedding = nn.Embedding(num_embeddings=Config['vob_size'], embedding_dim=Config['ebd_size'])
        self.conv2D = nn.ModuleList([nn.Conv2d(in_channels=1, out_channels=out, kernel_size=(ker, Config['ebd_size']))
                                     for out, ker in zip(Config['conv1D_out'], Config['conv1D_ker'])])
        self.dropout = nn.Dropout(p=Config['dropout'])
        self.fc = nn.Linear(sum(Config['conv1D_out']), Config["fc_cla"])

    def forward(self, x):    # x: (batch, sequence)
        x = self.embedding(x)    # x: (batch, sequence, embed)
        x = torch.unsqueeze(x, dim=1)   # x: (batch, 1, sequence, embed), add the in_channel dimension; embed is now treated as the width
        x = [F.relu(conv2D(x)) for conv2D in self.conv2D]    # x: [(batch, out_channel, height_out, 1)]
        x = [torch.squeeze(i, dim=-1) for i in x]    # x: [(batch, out_channel, height_out)], drop the width dimension
        x = [F.max_pool1d(i, i.size(-1)) for i in x]    # x: [(batch, out_channel, 1)], 1D max pooling over the height_out dimension
        x = [torch.squeeze(i, dim=-1) for i in x]   # x: [(batch, out_channel)], drop the pooled height_out dimension
        x = torch.cat(x, dim=-1)   # x: (batch, total_out_channel), concatenate along the out_channel dimension
        x = self.dropout(x)
        out = self.fc(x)
        return out
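
As a quick sanity check of the shapes described above, the following sketch runs one forward pass and one optimization step on random data. It is only an illustration under stated assumptions: `TinyTextCNN` is a hypothetical compact stand-in for the 1D-conv model (dropout is left out for brevity), and the token ids, labels, optimizer, and learning rate are arbitrary choices rather than anything specified in the article.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyTextCNN(nn.Module):
    """Hypothetical compact stand-in for the 1D-conv model above (no dropout)."""

    def __init__(self, vocab=5000, embed=100, outs=(8, 8, 8), kers=(2, 3, 4), classes=4):
        super().__init__()
        self.embedding = nn.Embedding(vocab, embed)
        self.convs = nn.ModuleList([nn.Conv1d(embed, o, k) for o, k in zip(outs, kers)])
        self.fc = nn.Linear(sum(outs), classes)

    def forward(self, x):                                   # x: (batch, sequence)
        x = self.embedding(x).permute(0, 2, 1)              # (batch, embed, sequence)
        x = [F.relu(c(x)).max(dim=-1).values for c in self.convs]   # each (batch, out)
        return self.fc(torch.cat(x, dim=-1))                # (batch, classes)


torch.manual_seed(0)
model = TinyTextCNN()
tokens = torch.randint(0, 5000, (16, 30))    # random token ids: batch=16, sequence=30
labels = torch.randint(0, 4, (16,))          # random class labels, for illustration only

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # arbitrary optimizer choice
logits = model(tokens)                       # (16, 4)
loss = F.cross_entropy(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(logits.shape, loss.item())
```

One practical constraint follows from the convolution arithmetic: the input sequence must be at least as long as the largest kernel (4 here), so shorter texts would need to be padded before being fed to the model.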