Converting PyTorch to ncnn: porting a PyTorch model to Android via ncnn

Reposted

技术博客达人 2023-08-23 16:51:23


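Preface: PyTorch-based deep learning is being applied more and more widely, and lightweight deployment (inference) engines are gaining popularity along with it. These engines fall into two groups. One-to-one engines such as tflite and torchlite mainly support the training framework of their own ecosystem. Many-to-one engines, chiefly onnxruntime, paddle, ncnn and mnn, accept models from several different training frameworks; they are backed by large companies (paddle by Baidu, ncnn by Tencent, mnn by Alibaba, onnxruntime by Microsoft). Even so, the path for deploying a model produced on a PC to Android is still not clear-cut, and there are many ways to do it: PyTorch's official android-demo-app, for example, or going through ONNX as an intermediate format and then converting with ncnn or a similar tool before deployment.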

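The mainstream workflow today is as follows. A model trained on an x86 machine is first converted to another format, such as the ONNX mentioned above or Google's TFLite. The converted model no longer carries the backward pass, so it can be used directly for inference. The inference model can then be quantized to speed it up (other acceleration techniques include compression, pruning and distillation). After that, each inference framework may require further steps depending on how it represents models. Some need the graph structure and the weights split apart: ncnn, for instance, splits the model into a binary weight file (.bin) and a network description file (.param). With tflite and similar frameworks the model is used as is, and the native code is built through JNI (the Java Native Interface, which lets Java and C++ call each other) to produce the APK.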

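In summary: the model produced on the PC is converted into a format that inference frameworks recognize, ONNX being the most widely used (Caffe, PyTorch and Keras models can all be exported to ONNX), and the chosen inference framework then converts it into its own model and network files. For deployment, the model has to be built into a C++ program, so for a custom model you must write the C++ pre- and post-processing yourself and then build the APK, .so and other files that deployment requires.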

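Open problems: most demos shipped with these frameworks are end-to-end detection tasks. Multi-model demos are rare, and tasks such as face recognition or image captioning come with no complete inference code, so developers have to write C++ programs (including pre- and post-processing) that fit their own projects. Re-implementing a Python algorithm in C++ is hard, and deploying to Android also involves CMake builds and some interface glue, which makes getting a PyTorch model onto Android harder still.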

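My own requirement is a multi-model face recognition task (based on a retinaface + facenet structure), and I will reproduce it along this route: pytorch >> onnx >> ncnn >> android. For background on ncnn, the original post linked to its author's Zhihu page, and for ONNX it linked to an introduction. For PyTorch, ncnn also provides pnnx, a converter that turns a PyTorch model directly into ncnn's bin and param files, but to keep the many-to-one goal I use the ONNX route here. If the ONNX route gives you incorrect model outputs, you can try pnnx instead.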

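I. Model conversion

(1) Forward inference export (convert the trained PyTorch model to ONNX format using torch.onnx)

Note that before conversion you must first instantiate the model to be converted and build a random input image. The input must go through the same preprocessing as in training (the mean is subtracted below; in my own tests, feeding raw random values without this step produced ONNX outputs that did not match the original model).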

import torch
import torch.onnx
import onnx

import numpy as np
import torch.nn as nn

from nets_retinaface.retinaface import RetinaFace
from nets.facenet import Facenet

model1 = RetinaFace().eval()
model2 = Facenet().eval()

# # Load the weights from a file (.pth usually)
weights_path1 = './model_data/Retinaface_mobilenet0.25.pth'
weights_path2 = './model_data/facenet_mobilenet.pth'

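# map_location='cpu' lets weights saved on a GPU load on a CPU-only machine
state_dict1 = torch.load(weights_path1, map_location='cpu')
state_dict2 = torch.load(weights_path2, map_location='cpu')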

# Load the weights now into a model net architecture defined by our class
model1.load_state_dict(state_dict1)
model2.load_state_dict(state_dict2, strict=False)

input1 = np.random.randint(0, 255, size=(640, 640, 3), dtype=np.int32)
input1 = input1.astype(np.float32)  # the np.float alias was removed in NumPy 1.24
input1 -= np.array((104, 117, 123), np.float32)  # same per-channel mean as in training
input1 = torch.from_numpy(input1.transpose(2, 0, 1)).unsqueeze(0).type(torch.FloatTensor)
input_names1 = [ "RetinaFace_input" ] 
output_names1 = [ "RetinaFace_output_%d" % i for i in range(3) ]

input2 = np.random.randint(0, 255, size=(160, 160, 3), dtype=np.uint8)/255
input2 = np.expand_dims(input2.transpose(2, 0, 1),0)
input2 = torch.from_numpy(input2).type(torch.FloatTensor)

input_names2 = [ "Facenet_input" ] 
output_names2 = [ "Facenet_output" ]

torch.onnx.export(model1, input1, "RetinaFace.onnx", keep_initializers_as_inputs=False, verbose=True,input_names=input_names1, output_names=output_names1, opset_version=11)
torch.onnx.export(model2, input2, "Facenet.onnx", keep_initializers_as_inputs=False, verbose=True,input_names=input_names2, output_names=output_names2, opset_version=11)

print('----------Done!!!----------Done!!!-----------')

# # Load the ONNX model
# model = onnx.load("RetinaFace.onnx")

# # Check that the IR is well formed
# onnx.checker.check_model(model)

# # Print a human readable representation of the graph
# onnx.helper.printable_graph(model.graph)

# model1 = onnx.load("Facenet.onnx")

# # Check that the IR is well formed
# onnx.checker.check_model(model1)

# # Print a human readable representation of the graph
# onnx.helper.printable_graph(model1.graph)
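The preprocessing applied to the RetinaFace dummy input above (subtract a per-channel mean, reorder HWC to CHW, add a batch dimension) can be factored into one helper, so that exactly the same function feeds the PyTorch model and the exported ONNX model. This is only a sketch: the function name `preprocess_retinaface` is hypothetical, and the mean values are simply the ones used in the export script.

```python
import numpy as np

# Per-channel mean taken from the export script above
RETINAFACE_MEAN = np.array((104, 117, 123), dtype=np.float32)

def preprocess_retinaface(image_hwc):
    """Hypothetical helper: HWC image -> float32 NCHW batch with the mean subtracted."""
    x = np.asarray(image_hwc).astype(np.float32)
    x = x - RETINAFACE_MEAN        # broadcasts over height and width
    x = x.transpose(2, 0, 1)       # HWC -> CHW
    return np.expand_dims(x, 0)    # add the batch dimension -> NCHW

# Random stand-in for a real 640x640 image
dummy = np.random.randint(0, 255, size=(640, 640, 3), dtype=np.uint8)
batch = preprocess_retinaface(dummy)
print(batch.shape, batch.dtype)
```

The resulting array can be wrapped with torch.from_numpy for the PyTorch model or passed directly to onnxruntime.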

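(2) Model verification (check the generated ONNX model with onnxruntime)

The generated ONNX model can then be tested against the original pth model on the same input: if the outputs match, the conversion is correct. Alternatively, open the model in netron and check that the parameters match.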

import onnxruntime as ort

# `image` is the preprocessed torch tensor that was also fed to the PyTorch model
image1 = image.numpy()
ort_session = ort.InferenceSession('./RetinaFace.onnx')
input_name = ort_session.get_inputs()[0].name
# To fetch a single output by name instead:
# outputs_0 = ort_session.get_outputs()[0].name
# out = ort_session.run([outputs_0], input_feed={input_name: image1})
outs = ort_session.run(None, input_feed={input_name: image1})  # None returns every output
print('out_0:', outs[0])
print('out_1:', outs[1])
print('out_2:', outs[2])
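The "same input, same output" check described above can be made explicit with a small comparison helper. The following is a minimal sketch, not part of the original workflow: `outputs_match` and its tolerances are assumptions, and the arrays are stand-ins for the PyTorch outputs (reference) and the onnxruntime outputs (test).

```python
import numpy as np

def outputs_match(ref_outputs, test_outputs, rtol=1e-3, atol=1e-5):
    """Compare two equally ordered lists of arrays.

    Returns (ok, max_abs_diffs); the tolerances are illustrative assumptions.
    """
    if len(ref_outputs) != len(test_outputs):
        return False, []
    ok = True
    max_diffs = []
    for ref, test in zip(ref_outputs, test_outputs):
        ref = np.asarray(ref, dtype=np.float32)
        test = np.asarray(test, dtype=np.float32)
        if ref.shape != test.shape:
            return False, max_diffs
        max_diffs.append(float(np.max(np.abs(ref - test))))
        ok = ok and bool(np.allclose(ref, test, rtol=rtol, atol=atol))
    return ok, max_diffs

# Stand-in data; in practice ref comes from the PyTorch model and test from onnxruntime
ref = [np.ones((1, 4, 2), dtype=np.float32), np.zeros((1, 4), dtype=np.float32)]
close = [r + 1e-6 for r in ref]   # tiny numerical noise
far = [r + 0.5 for r in ref]      # a genuinely different result

print(outputs_match(ref, close)[0])
print(outputs_match(ref, far)[0])
```

Reporting the maximum absolute difference per output, rather than a bare pass/fail, makes it easier to see which head of a multi-output model such as RetinaFace diverges.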

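(3) Model optimization (remove redundant glue ops with onnxsim)

ONNX's low-level ops are not yet fully optimized, so an exported model contains many glue ops (as I understand it, "glue ops" means that structures such as convolution + activation are decomposed into chains of many small ops in ONNX). The generated ONNX model therefore needs to be simplified: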

python -m onnxsim RetinaFace.onnx RetinaFace-sim.onnx

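This produces a new file, RetinaFace-sim.onnx.

(4) Building the ncnn framework

ncnn currently supports converting models from many training frameworks, such as caffe, darknet and mxnet.

Download and build ncnn by running the following commands in order.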

git clone https://github.com/Tencent/ncnn.git

cd ncnn

mkdir -p build

cd build

cmake ..

make -j16

(5) ncnn model conversion

In /ncnn/build/tools/onnx, generate the ncnn network file (param) and weight file (bin) from the ONNX model.

./onnx2ncnn RetinaFace-sim.onnx RetinaFace.param RetinaFace.bin

Next, use ncnnoptimize to store the weights in fp16, which reduces the model size; the final argument 65536 selects fp16 storage.

./ncnnoptimize RetinaFace.param RetinaFace.bin RetinaFace-opt.param RetinaFace-opt.bin 65536
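To get a feel for what fp16 storage trades away, the snippet below casts some random stand-in weights from float32 to float16 and measures the size ratio and the rounding error. It only illustrates the storage format with NumPy; it is not how ncnnoptimize is implemented, and the weight values are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a layer's weights; real values would come from the model
weights_fp32 = rng.standard_normal(10_000).astype(np.float32)
weights_fp16 = weights_fp32.astype(np.float16)

# fp16 uses 2 bytes per value instead of 4, so size_ratio is 0.5
size_ratio = weights_fp16.nbytes / weights_fp32.nbytes
# Largest rounding error introduced by the narrower format
max_abs_err = float(np.max(np.abs(weights_fp32 - weights_fp16.astype(np.float32))))

print(size_ratio)
print(max_abs_err)
```

The rounding error is small for weights of ordinary magnitude, which is why it is still worth re-checking the model outputs after this step rather than assuming they are unchanged.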

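(6) (Optional) Model encryption

Use ncnn2mem to "encrypt" the model: it converts the plain-text param file into a binary .param.bin and generates an .id.h header of layer and blob IDs (plus a .mem.h header), so the names are no longer readable as plain text.

See ncnn's GitHub repository for details.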

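II. Model deployment

The model produced by the ncnn conversion contains only the network built in the torch nn.Module; a complete deployment also needs pre- and post-processing. Because ncnn is implemented in C++, this part has to be written in C++: whatever preprocessing was done in pytorch + python must be reproduced identically in C++. The examples directory of the ncnn repository already covers most mainstream models, such as the yolo series, squeezenet and the mobilenet series, among other detection models. For complex multi-model tasks, however, you still have to write the *.cpp forward inference yourself according to your own network structure.

To load multiple models in ncnn, first declare each network with ncnn::Net, load it with load_param and load_model, then create an extractor with create_extractor, after which inference can run. The main function below shows the basic flow. I did run into problems during conversion, though: from pth to onnx the same input gave the same output, but after converting onnx to ncnn the same input produced different results, and converting with pnnx later ran into problems as well. When converting, prefer networks built in a simple, straightforward way and avoid overly complex constructions. In my case the ncnn param file contained MemoryData layers (written "datamemory" in the original post), which are caused by external constants being pulled into the graph; simplifying the network appropriately resolves this.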

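// FaceObject, detect_face, face_attributes, sex_age and draw_faceobjects
// are defined elsewhere in the project and are not shown here.
#include "net.h"  // ncnn
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <cstdio>
#include <vector>

int main(int argc, char** argv)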
{
    if (argc != 2)
    {
        fprintf(stderr, "Usage: %s [img_path]\n", argv[0]);
        return -1;
    }
    const char* img_path = argv[1];
    cv::Mat m = cv::imread(img_path, 1);  // 1 = color; m.type() is 16 (CV_8UC3), see the table below

    if (m.empty())
    {
        fprintf(stderr, "cv::imread %s failed\n", img_path);
        return -1;
    }
    std::vector<FaceObject> faceobjects;

    ncnn::Net retina, facenet, AgeGenderEstimator;
    retina.opt.use_vulkan_compute = false;    
    AgeGenderEstimator.opt.use_vulkan_compute = false;   


    retina.load_param("/home/kw/ncnn/build/tools/onnx/Retinaface-Facenet/Retinaface_sim.param"); 
    retina.load_model("/home/kw/ncnn/build/tools/onnx/Retinaface-Facenet/Retinaface_sim.bin");

    AgeGenderEstimator.load_param("/home/kw/ncnn/build/tools/onnx/sex_age/full-sim.param"); 
    AgeGenderEstimator.load_model("/home/kw/ncnn/build/tools/onnx/sex_age/full-sim.bin");

    ncnn::Extractor ex = retina.create_extractor();
    
    detect_face(m, faceobjects, ex);
    std::vector<cv::Mat> crop_images;
    face_attributes(m, faceobjects, crop_images);

    ncnn::Extractor ex1 = AgeGenderEstimator.create_extractor();
    sex_age(crop_images, ex1);
    draw_faceobjects(m, faceobjects);
    return 0;
}
// +--------+----+----+----+----+------+------+------+------+
// |        | C1 | C2 | C3 | C4 | C(5) | C(6) | C(7) | C(8) |
// +--------+----+----+----+----+------+------+------+------+
// | CV_8U  |  0 |  8 | 16 | 24 |   32 |   40 |   48 |   56 |
// | CV_8S  |  1 |  9 | 17 | 25 |   33 |   41 |   49 |   57 |
// | CV_16U |  2 | 10 | 18 | 26 |   34 |   42 |   50 |   58 |
// | CV_16S |  3 | 11 | 19 | 27 |   35 |   43 |   51 |   59 |
// | CV_32S |  4 | 12 | 20 | 28 |   36 |   44 |   52 |   60 |
// | CV_32F |  5 | 13 | 21 | 29 |   37 |   45 |   53 |   61 |
// | CV_64F |  6 | 14 | 22 | 30 |   38 |   46 |   54 |   62 |
// +--------+----+----+----+----+------+------+------+------+
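The MemoryData problem mentioned earlier can be spotted without running the model, because the plain-text param file lists every layer. The sketch below parses that text format (a magic number line, a "layer_count blob_count" line, then one layer per line as type, name, input count, output count, blob names and key=value parameters) and reports any MemoryData layers. The function name and the sample network are made up for illustration; a real file would be read from disk instead.

```python
from collections import Counter

NCNN_PARAM_MAGIC = "7767517"  # first line of a plain-text ncnn param file

def parse_ncnn_param(text):
    """Parse plain-text ncnn param content into (layer_count, blob_count, layers)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) < 2 or lines[0] != NCNN_PARAM_MAGIC:
        raise ValueError("not a plain-text ncnn .param file")
    layer_count, blob_count = (int(v) for v in lines[1].split())
    layers = []
    for ln in lines[2:]:
        tok = ln.split()
        n_in, n_out = int(tok[2]), int(tok[3])
        layers.append({
            "type": tok[0],
            "name": tok[1],
            "inputs": tok[4:4 + n_in],
            "outputs": tok[4 + n_in:4 + n_in + n_out],
            "params": tok[4 + n_in + n_out:],
        })
    return layer_count, blob_count, layers

# Made-up three-layer network: an input, a constant, and their sum
SAMPLE_PARAM = """7767517
3 3
Input       input   0 1 input
MemoryData  const0  0 1 const0 0=4
BinaryOp    add     2 1 input const0 out 0=0
"""

layer_count, blob_count, layers = parse_ncnn_param(SAMPLE_PARAM)
type_histogram = Counter(layer["type"] for layer in layers)
memory_layers = [layer["name"] for layer in layers if layer["type"] == "MemoryData"]

print(type_histogram)
print(memory_layers)
```

The names reported for MemoryData layers point at the constants that were baked into the graph, which helps locate the part of the PyTorch model to simplify.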

We now have the complete ncnn forward-inference C++ code and the converted models; what remains is deploying them on Android.

To be continued...

——————————————————————————————————————————

Original content below

Porting to Android

Create a new android-ncnn project (the original post linked a reference here).

Alternatively, download a ready-built project directly: https://github.com/chehongshu/ncnnforandroid_objectiondetection_Mobilenetssd/tree/master/MobileNetSSD_demo_single. We use this project as the example and modify it for our own model.

First copy your model files age.param.bin and age.bin and the label file label.txt (one label name per line) into ncnnforandroid_objectiondetection_Mobilenetssd/MobileNetSSD_demo_single/app/src/main/assets/.

Copy age.id.h into ncnnforandroid_objectiondetection_Mobilenetssd/MobileNetSSD_demo_single/app/src/main/cpp/

Edit the file ncnnforandroid_objectiondetection_Mobilenetssd/MobileNetSSD_demo_single/app/src/main/cpp/MobileNetssd.cpp:

Change #include "MobileNetSSD_deploy.id.h" to #include "age.id.h"

#### a. Input

My input images are read directly with cv2.imread('tupian.jpg'), which yields BGR, so I changed the input to:

in = ncnn::Mat::from_pixels((const unsigned char*)indata, ncnn::Mat::PIXEL_RGBA2BGR, width, height);

Because I do not normalize the input, comment out the following lines:

const float mean_vals[3] = {127.5f, 127.5f, 127.5f};
const float scale[3] = {0.007843f, 0.007843f, 0.007843f};
in.substract_mean_normalize(mean_vals, scale); // normalization
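For reference, substract_mean_normalize computes (pixel - mean) * scale per channel, so the demo's default values map the 0..255 range to roughly -1..1. The short sketch below reproduces that arithmetic in plain Python to show what is being switched off; `normalize_pixel` is a hypothetical helper, not an ncnn function.

```python
# Values from the demo's mean_vals and scale arrays
MEAN_VALS = (127.5, 127.5, 127.5)
SCALE = (0.007843, 0.007843, 0.007843)

def normalize_pixel(bgr, mean_vals=MEAN_VALS, scale=SCALE):
    """Hypothetical helper: apply (value - mean) * scale to each channel."""
    return tuple((v - m) * s for v, m, s in zip(bgr, mean_vals, scale))

darkest = normalize_pixel((0, 0, 0))
brightest = normalize_pixel((255, 255, 255))
middle = normalize_pixel((127.5, 127.5, 127.5))

print(darkest)
print(brightest)
print(middle)
```

Whether to keep or remove this step depends only on what the model saw during training: the C++ side has to match the Python preprocessing exactly.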

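#### b. Changing the model input and output names

Following the input and output names in age.id.h, change the input and output:

// Without encryption, use ex.input("data", in);

// BLOB_data can be found in the id.h file; it is the ID of the network's data input layer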

ex.input(age_param_id::BLOB_input, in);

// Without encryption, use ex.extract("prob", out);

// BLOB_detection_out can be found in the id.h file; it is the ID of the network's output layer, which holds the detection results

ex.extract(age_param_id::BLOB_output, out);

At this point the model can run predictions normally.
