Deploying YOLOv8 to Android: Testing

Switching the OpenCV Version

We need to test ncnn inference of YOLO in VS2019. Although we installed OpenCV-Mobile in the previous post, OpenCV-Mobile strips out some functionality, so for easier testing we will switch back to the regular Windows build of OpenCV. Fortunately the setup is not complicated: I downloaded version 4.8.0, extracted it, set the environment variables, and then configured it in VS2019 much as in the previous post. Quite some time has passed and I no longer remember the exact steps, so please refer to other tutorials for this part.

opencv modules included

| module | comment |
| --- | --- |
| opencv_core | Mat, matrix operations, etc |
| opencv_imgproc | resize, cvtColor, warpAffine, etc |
| opencv_highgui | imread, imwrite |
| opencv_features2d | keypoint feature and matcher, etc (not included in opencv 2.x package) |
| opencv_photo | inpaint, etc |
| opencv_video | opticalflow, etc |

opencv modules discarded

| module | comment |
| --- | --- |
| opencv_androidcamera | use android Camera api instead |
| opencv_calib3d | camera calibration, rare uses on mobile |
| opencv_contrib | experimental functions, build part of the source externally if you need |
| opencv_dnn | very slow on mobile, try ncnn for neural network inference on mobile |
| opencv_dynamicuda | no cuda on mobile |
| opencv_flann | feature matching, rare uses on mobile, build the source externally if you need |
| opencv_gapi | graph based image processing, little gain on mobile |
| opencv_gpu | no cuda/opencl on mobile |
| opencv_imgcodecs | link with opencv_highgui instead |
| opencv_java | wrap your c++ code with jni |
| opencv_js | write native code on mobile |
| opencv_legacy | various good-old cv routines, build part of the source externally if you need |
| opencv_ml | train your ML algorithm on powerful pc or server |
| opencv_nonfree | the SIFT and SURF, use ORB which is faster and better |
| opencv_objdetect | HOG, cascade detector, use deep learning detector which is faster and better |
| opencv_ocl | no opencl on mobile |
| opencv_python | no python on mobile |
| opencv_shape | shape matching, rare uses on mobile, build the source externally if you need |
| opencv_stitching | image stitching, rare uses on mobile, build the source externally if you need |
| opencv_superres | do video super-resolution on powerful pc or server |
| opencv_ts | test modules, useless in production anyway |
| opencv_videoio | use android MediaCodec or ios AVFoundation api instead |
| opencv_videostab | do video stabilization on powerful pc or server |
| opencv_viz | vtk is not available on mobile, write your own data visualization routines |

Model Conversion (using the official Ultralytics COCO detection pretrained weights as an example)

This was the step where I hit the most pitfalls. If you run into problems, consult the Wiki and the Issues; they answer most questions. Because I stumbled so often during the whole conversion, many details have slipped my mind, so I cannot guarantee that my steps will succeed; treat them as a reference only.

Installing YOLOv8

I already covered this step in a previous post, and there are plenty of tutorials online, so simply repeating them would add little. Instead I will record the pitfalls I ran into, in the hope that they are instructive.

First: choosing between pip install ultralytics and git clone

If you have read the Wiki and Issues mentioned above, you know that the source code has to be modified before converting the model. In principle the git clone approach makes those modifications easier, so I initially went with git clone as well, but ...

On the model-conversion parameters

With the test code I found, only models exported with opset=13 showed the full-screen detection box problem; opset=12 and opset=11 were both fine. I am not sure how other parameters affect the result.
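For reference, this is roughly the export call I mean, as a hedged sketch of the Ultralytics Python API (the weight filename is an example; the import is kept inside the function so the sketch parses even without the package installed):

```python
# Sketch: export YOLOv8 weights to ONNX with a chosen opset.
# In my tests opset=12 (or 11) worked; opset=13 gave full-screen boxes after ncnn conversion.
def export_to_onnx(weights="yolov8n.pt", opset=12):
    from ultralytics import YOLO  # lazy import: ultralytics must be installed to actually run this
    model = YOLO(weights)
    return model.export(format="onnx", opset=opset)  # writes e.g. yolov8n.onnx next to the weights
```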

On the output dimensions of the converted model

The final output should be 1×144×8400, not 1×84×8400. Given the changes made in ultralytics/nn/modules/head.py, the forward output is cut off partway through: we effectively take YOLO's intermediate result and finish the remaining decoding ourselves, so the output dimensions of the model handed to ncnn are not the final output dimensions.
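The arithmetic behind those two numbers, assuming a 640×640 input, 80 COCO classes, and YOLOv8's reg_max=16 DFL bins:

```python
reg_max = 16          # DFL bins per box side
num_classes = 80      # COCO
channels = 4 * reg_max + num_classes                 # 64 box-distribution channels + 80 class channels
points = sum((640 // s) ** 2 for s in (8, 16, 32))   # anchor points at the three output strides
print(channels, points)  # 144 8400
```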

def forward(self, x):
    """Concatenates and returns predicted bounding boxes and class probabilities."""
    for i in range(self.nl):
        x[i] = torch.cat((self.cv2[i](x[i]), self.cv3[i](x[i])), 1)
    if self.training:  # Training path
        return x

    # Inference path
    shape = x[0].shape  # BCHW

    if self.dynamic or self.shape != shape:
        self.anchors, self.strides = (x.transpose(0, 1) for x in make_anchors(x, self.stride, 0.5))
        self.shape = shape
    pred = torch.cat([xi.view(shape[0], self.no, -1) for xi in x], 2).permute(0, 2, 1)
    return pred

    # x_cat = torch.cat([xi.view(shape[0], self.no, -1) for xi in x], 2)
    # if self.export and self.format in {"saved_model", "pb", "tflite", "edgetpu", "tfjs"}:  # avoid TF FlexSplitV ops
    #     box = x_cat[:, : self.reg_max * 4]
    #     cls = x_cat[:, self.reg_max * 4 :]
    # else:
    #     box, cls = x_cat.split((self.reg_max * 4, self.nc), 1)
    #
    # if self.export and self.format in {"tflite", "edgetpu"}:
    #     # Precompute normalization factor to increase numerical stability
    #     # See https://github.com/ultralytics/ultralytics/issues/7371
    #     grid_h = shape[2]
    #     grid_w = shape[3]
    #     grid_size = torch.tensor([grid_w, grid_h, grid_w, grid_h], device=box.device).reshape(1, 4, 1)
    #     norm = self.strides / (self.stride[0] * grid_size)
    #     dbox = self.decode_bboxes(self.dfl(box) * norm, self.anchors.unsqueeze(0) * norm[:, :2])
    # else:
    #     dbox = self.decode_bboxes(self.dfl(box), self.anchors.unsqueeze(0)) * self.strides
    #
    # y = torch.cat((dbox, cls.sigmoid()), 1)
    # return y if self.export else (y, x)
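The commented-out tail above (DFL softmax plus box decode) is exactly what now has to happen on the inference side. As a sketch of what DFL decoding does for a single anchor point (a numpy stand-in, not the ultralytics implementation):

```python
import numpy as np

def dfl_decode(box_logits, reg_max=16):
    """box_logits: (4, reg_max) distribution logits for one anchor point.
    Returns the expected left/top/right/bottom offsets in stride units."""
    e = np.exp(box_logits - box_logits.max(axis=1, keepdims=True))  # numerically stable softmax
    probs = e / e.sum(axis=1, keepdims=True)
    # expected value over the bin indices 0..reg_max-1
    return (probs * np.arange(reg_max)).sum(axis=1)

# uniform logits -> expected value is the mean bin index: 7.5 for 16 bins
print(dfl_decode(np.zeros((4, 16))))
```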

On the structure of the converted model

Open the converted .param file to inspect the model structure. The main thing to check is whether the last layer is a Permute.

7767517
181 217
Input images 0 1 images
Convolution /model.0/conv/Conv 1 1 images /model.0/conv/Conv_output_0 0=16 1=3 3=2 4=1 5=1 6=432
Swish /model.0/act/Mul 1 1 /model.0/conv/Conv_output_0 /model.0/act/Mul_output_0
Convolution /model.1/conv/Conv 1 1 /model.0/act/Mul_output_0 /model.1/conv/Conv_output_0 0=32 1=3 3=2 4=1 5=1 6=4608
Swish /model.1/act/Mul 1 1 /model.1/conv/Conv_output_0 /model.1/act/Mul_output_0
Convolution /model.2/cv1/conv/Conv 1 1 /model.1/act/Mul_output_0 /model.2/cv1/conv/Conv_output_0 0=32 1=1 5=1 6=1024
Swish /model.2/cv1/act/Mul 1 1 /model.2/cv1/conv/Conv_output_0 /model.2/cv1/act/Mul_output_0
Slice /model.2/Split 1 2 /model.2/cv1/act/Mul_output_0 /model.2/Split_output_0 /model.2/Split_output_1 -23300=2,16,-233
Split splitncnn_0 1 3 /model.2/Split_output_1 /model.2/Split_output_1_splitncnn_0 /model.2/Split_output_1_splitncnn_1 /model.2/Split_output_1_splitncnn_2
Convolution /model.2/m.0/cv1/conv/Conv 1 1 /model.2/Split_output_1_splitncnn_2 /model.2/m.0/cv1/conv/Conv_output_0 0=16 1=3 4=1 5=1 6=2304
Swish /model.2/m.0/cv1/act/Mul 1 1 /model.2/m.0/cv1/conv/Conv_output_0 /model.2/m.0/cv1/act/Mul_output_0
Convolution /model.2/m.0/cv2/conv/Conv 1 1 /model.2/m.0/cv1/act/Mul_output_0 /model.2/m.0/cv2/conv/Conv_output_0 0=16 1=3 4=1 5=1 6=2304
Swish /model.2/m.0/cv2/act/Mul 1 1 /model.2/m.0/cv2/conv/Conv_output_0 /model.2/m.0/cv2/act/Mul_output_0
BinaryOp /model.2/m.0/Add 2 1 /model.2/Split_output_1_splitncnn_1 /model.2/m.0/cv2/act/Mul_output_0 /model.2/m.0/Add_output_0
Concat /model.2/Concat 3 1 /model.2/Split_output_0 /model.2/Split_output_1_splitncnn_0 /model.2/m.0/Add_output_0 /model.2/Concat_output_0
Convolution /model.2/cv2/conv/Conv 1 1 /model.2/Concat_output_0 /model.2/cv2/conv/Conv_output_0 0=32 1=1 5=1 6=1536
Swish /model.2/cv2/act/Mul 1 1 /model.2/cv2/conv/Conv_output_0 /model.2/cv2/act/Mul_output_0
Convolution /model.3/conv/Conv 1 1 /model.2/cv2/act/Mul_output_0 /model.3/conv/Conv_output_0 0=64 1=3 3=2 4=1 5=1 6=18432
Swish /model.3/act/Mul 1 1 /model.3/conv/Conv_output_0 /model.3/act/Mul_output_0
Convolution /model.4/cv1/conv/Conv 1 1 /model.3/act/Mul_output_0 /model.4/cv1/conv/Conv_output_0 0=64 1=1 5=1 6=4096
Swish /model.4/cv1/act/Mul 1 1 /model.4/cv1/conv/Conv_output_0 /model.4/cv1/act/Mul_output_0
Slice /model.4/Split 1 2 /model.4/cv1/act/Mul_output_0 /model.4/Split_output_0 /model.4/Split_output_1 -23300=2,32,-233
Split splitncnn_1 1 3 /model.4/Split_output_1 /model.4/Split_output_1_splitncnn_0 /model.4/Split_output_1_splitncnn_1 /model.4/Split_output_1_splitncnn_2
Convolution /model.4/m.0/cv1/conv/Conv 1 1 /model.4/Split_output_1_splitncnn_2 /model.4/m.0/cv1/conv/Conv_output_0 0=32 1=3 4=1 5=1 6=9216
Swish /model.4/m.0/cv1/act/Mul 1 1 /model.4/m.0/cv1/conv/Conv_output_0 /model.4/m.0/cv1/act/Mul_output_0
Convolution /model.4/m.0/cv2/conv/Conv 1 1 /model.4/m.0/cv1/act/Mul_output_0 /model.4/m.0/cv2/conv/Conv_output_0 0=32 1=3 4=1 5=1 6=9216
Swish /model.4/m.0/cv2/act/Mul 1 1 /model.4/m.0/cv2/conv/Conv_output_0 /model.4/m.0/cv2/act/Mul_output_0
BinaryOp /model.4/m.0/Add 2 1 /model.4/Split_output_1_splitncnn_1 /model.4/m.0/cv2/act/Mul_output_0 /model.4/m.0/Add_output_0
Split splitncnn_2 1 3 /model.4/m.0/Add_output_0 /model.4/m.0/Add_output_0_splitncnn_0 /model.4/m.0/Add_output_0_splitncnn_1 /model.4/m.0/Add_output_0_splitncnn_2
Convolution /model.4/m.1/cv1/conv/Conv 1 1 /model.4/m.0/Add_output_0_splitncnn_2 /model.4/m.1/cv1/conv/Conv_output_0 0=32 1=3 4=1 5=1 6=9216
Swish /model.4/m.1/cv1/act/Mul 1 1 /model.4/m.1/cv1/conv/Conv_output_0 /model.4/m.1/cv1/act/Mul_output_0
Convolution /model.4/m.1/cv2/conv/Conv 1 1 /model.4/m.1/cv1/act/Mul_output_0 /model.4/m.1/cv2/conv/Conv_output_0 0=32 1=3 4=1 5=1 6=9216
Swish /model.4/m.1/cv2/act/Mul 1 1 /model.4/m.1/cv2/conv/Conv_output_0 /model.4/m.1/cv2/act/Mul_output_0
BinaryOp /model.4/m.1/Add 2 1 /model.4/m.0/Add_output_0_splitncnn_1 /model.4/m.1/cv2/act/Mul_output_0 /model.4/m.1/Add_output_0
Concat /model.4/Concat 4 1 /model.4/Split_output_0 /model.4/Split_output_1_splitncnn_0 /model.4/m.0/Add_output_0_splitncnn_0 /model.4/m.1/Add_output_0 /model.4/Concat_output_0
Convolution /model.4/cv2/conv/Conv 1 1 /model.4/Concat_output_0 /model.4/cv2/conv/Conv_output_0 0=64 1=1 5=1 6=8192
Swish /model.4/cv2/act/Mul 1 1 /model.4/cv2/conv/Conv_output_0 /model.4/cv2/act/Mul_output_0
Split splitncnn_3 1 2 /model.4/cv2/act/Mul_output_0 /model.4/cv2/act/Mul_output_0_splitncnn_0 /model.4/cv2/act/Mul_output_0_splitncnn_1
Convolution /model.5/conv/Conv 1 1 /model.4/cv2/act/Mul_output_0_splitncnn_1 /model.5/conv/Conv_output_0 0=128 1=3 3=2 4=1 5=1 6=73728
Swish /model.5/act/Mul 1 1 /model.5/conv/Conv_output_0 /model.5/act/Mul_output_0
Convolution /model.6/cv1/conv/Conv 1 1 /model.5/act/Mul_output_0 /model.6/cv1/conv/Conv_output_0 0=128 1=1 5=1 6=16384
Swish /model.6/cv1/act/Mul 1 1 /model.6/cv1/conv/Conv_output_0 /model.6/cv1/act/Mul_output_0
Slice /model.6/Split 1 2 /model.6/cv1/act/Mul_output_0 /model.6/Split_output_0 /model.6/Split_output_1 -23300=2,64,-233
Split splitncnn_4 1 3 /model.6/Split_output_1 /model.6/Split_output_1_splitncnn_0 /model.6/Split_output_1_splitncnn_1 /model.6/Split_output_1_splitncnn_2
Convolution /model.6/m.0/cv1/conv/Conv 1 1 /model.6/Split_output_1_splitncnn_2 /model.6/m.0/cv1/conv/Conv_output_0 0=64 1=3 4=1 5=1 6=36864
Swish /model.6/m.0/cv1/act/Mul 1 1 /model.6/m.0/cv1/conv/Conv_output_0 /model.6/m.0/cv1/act/Mul_output_0
Convolution /model.6/m.0/cv2/conv/Conv 1 1 /model.6/m.0/cv1/act/Mul_output_0 /model.6/m.0/cv2/conv/Conv_output_0 0=64 1=3 4=1 5=1 6=36864
Swish /model.6/m.0/cv2/act/Mul 1 1 /model.6/m.0/cv2/conv/Conv_output_0 /model.6/m.0/cv2/act/Mul_output_0
BinaryOp /model.6/m.0/Add 2 1 /model.6/Split_output_1_splitncnn_1 /model.6/m.0/cv2/act/Mul_output_0 /model.6/m.0/Add_output_0
Split splitncnn_5 1 3 /model.6/m.0/Add_output_0 /model.6/m.0/Add_output_0_splitncnn_0 /model.6/m.0/Add_output_0_splitncnn_1 /model.6/m.0/Add_output_0_splitncnn_2
Convolution /model.6/m.1/cv1/conv/Conv 1 1 /model.6/m.0/Add_output_0_splitncnn_2 /model.6/m.1/cv1/conv/Conv_output_0 0=64 1=3 4=1 5=1 6=36864
Swish /model.6/m.1/cv1/act/Mul 1 1 /model.6/m.1/cv1/conv/Conv_output_0 /model.6/m.1/cv1/act/Mul_output_0
Convolution /model.6/m.1/cv2/conv/Conv 1 1 /model.6/m.1/cv1/act/Mul_output_0 /model.6/m.1/cv2/conv/Conv_output_0 0=64 1=3 4=1 5=1 6=36864
Swish /model.6/m.1/cv2/act/Mul 1 1 /model.6/m.1/cv2/conv/Conv_output_0 /model.6/m.1/cv2/act/Mul_output_0
BinaryOp /model.6/m.1/Add 2 1 /model.6/m.0/Add_output_0_splitncnn_1 /model.6/m.1/cv2/act/Mul_output_0 /model.6/m.1/Add_output_0
Concat /model.6/Concat 4 1 /model.6/Split_output_0 /model.6/Split_output_1_splitncnn_0 /model.6/m.0/Add_output_0_splitncnn_0 /model.6/m.1/Add_output_0 /model.6/Concat_output_0
Convolution /model.6/cv2/conv/Conv 1 1 /model.6/Concat_output_0 /model.6/cv2/conv/Conv_output_0 0=128 1=1 5=1 6=32768
Swish /model.6/cv2/act/Mul 1 1 /model.6/cv2/conv/Conv_output_0 /model.6/cv2/act/Mul_output_0
Split splitncnn_6 1 2 /model.6/cv2/act/Mul_output_0 /model.6/cv2/act/Mul_output_0_splitncnn_0 /model.6/cv2/act/Mul_output_0_splitncnn_1
Convolution /model.7/conv/Conv 1 1 /model.6/cv2/act/Mul_output_0_splitncnn_1 /model.7/conv/Conv_output_0 0=256 1=3 3=2 4=1 5=1 6=294912
Swish /model.7/act/Mul 1 1 /model.7/conv/Conv_output_0 /model.7/act/Mul_output_0
Convolution /model.8/cv1/conv/Conv 1 1 /model.7/act/Mul_output_0 /model.8/cv1/conv/Conv_output_0 0=256 1=1 5=1 6=65536
Swish /model.8/cv1/act/Mul 1 1 /model.8/cv1/conv/Conv_output_0 /model.8/cv1/act/Mul_output_0
Slice /model.8/Split 1 2 /model.8/cv1/act/Mul_output_0 /model.8/Split_output_0 /model.8/Split_output_1 -23300=2,128,-233
Split splitncnn_7 1 3 /model.8/Split_output_1 /model.8/Split_output_1_splitncnn_0 /model.8/Split_output_1_splitncnn_1 /model.8/Split_output_1_splitncnn_2
Convolution /model.8/m.0/cv1/conv/Conv 1 1 /model.8/Split_output_1_splitncnn_2 /model.8/m.0/cv1/conv/Conv_output_0 0=128 1=3 4=1 5=1 6=147456
Swish /model.8/m.0/cv1/act/Mul 1 1 /model.8/m.0/cv1/conv/Conv_output_0 /model.8/m.0/cv1/act/Mul_output_0
Convolution /model.8/m.0/cv2/conv/Conv 1 1 /model.8/m.0/cv1/act/Mul_output_0 /model.8/m.0/cv2/conv/Conv_output_0 0=128 1=3 4=1 5=1 6=147456
Swish /model.8/m.0/cv2/act/Mul 1 1 /model.8/m.0/cv2/conv/Conv_output_0 /model.8/m.0/cv2/act/Mul_output_0
BinaryOp /model.8/m.0/Add 2 1 /model.8/Split_output_1_splitncnn_1 /model.8/m.0/cv2/act/Mul_output_0 /model.8/m.0/Add_output_0
Concat /model.8/Concat 3 1 /model.8/Split_output_0 /model.8/Split_output_1_splitncnn_0 /model.8/m.0/Add_output_0 /model.8/Concat_output_0
Convolution /model.8/cv2/conv/Conv 1 1 /model.8/Concat_output_0 /model.8/cv2/conv/Conv_output_0 0=256 1=1 5=1 6=98304
Swish /model.8/cv2/act/Mul 1 1 /model.8/cv2/conv/Conv_output_0 /model.8/cv2/act/Mul_output_0
Convolution /model.9/cv1/conv/Conv 1 1 /model.8/cv2/act/Mul_output_0 /model.9/cv1/conv/Conv_output_0 0=128 1=1 5=1 6=32768
Swish /model.9/cv1/act/Mul 1 1 /model.9/cv1/conv/Conv_output_0 /model.9/cv1/act/Mul_output_0
Split splitncnn_8 1 2 /model.9/cv1/act/Mul_output_0 /model.9/cv1/act/Mul_output_0_splitncnn_0 /model.9/cv1/act/Mul_output_0_splitncnn_1
Pooling /model.9/m/MaxPool 1 1 /model.9/cv1/act/Mul_output_0_splitncnn_1 /model.9/m/MaxPool_output_0 1=5 3=2 5=1
Split splitncnn_9 1 2 /model.9/m/MaxPool_output_0 /model.9/m/MaxPool_output_0_splitncnn_0 /model.9/m/MaxPool_output_0_splitncnn_1
Pooling /model.9/m_1/MaxPool 1 1 /model.9/m/MaxPool_output_0_splitncnn_1 /model.9/m_1/MaxPool_output_0 1=5 3=2 5=1
Split splitncnn_10 1 2 /model.9/m_1/MaxPool_output_0 /model.9/m_1/MaxPool_output_0_splitncnn_0 /model.9/m_1/MaxPool_output_0_splitncnn_1
Pooling /model.9/m_2/MaxPool 1 1 /model.9/m_1/MaxPool_output_0_splitncnn_1 /model.9/m_2/MaxPool_output_0 1=5 3=2 5=1
Concat /model.9/Concat 4 1 /model.9/cv1/act/Mul_output_0_splitncnn_0 /model.9/m/MaxPool_output_0_splitncnn_0 /model.9/m_1/MaxPool_output_0_splitncnn_0 /model.9/m_2/MaxPool_output_0 /model.9/Concat_output_0
Convolution /model.9/cv2/conv/Conv 1 1 /model.9/Concat_output_0 /model.9/cv2/conv/Conv_output_0 0=256 1=1 5=1 6=131072
Swish /model.9/cv2/act/Mul 1 1 /model.9/cv2/conv/Conv_output_0 /model.9/cv2/act/Mul_output_0
Split splitncnn_11 1 2 /model.9/cv2/act/Mul_output_0 /model.9/cv2/act/Mul_output_0_splitncnn_0 /model.9/cv2/act/Mul_output_0_splitncnn_1
Interp /model.10/Resize 1 1 /model.9/cv2/act/Mul_output_0_splitncnn_1 /model.10/Resize_output_0 0=1 1=2.000000e+00 2=2.000000e+00
Concat /model.11/Concat 2 1 /model.10/Resize_output_0 /model.6/cv2/act/Mul_output_0_splitncnn_0 /model.11/Concat_output_0
Convolution /model.12/cv1/conv/Conv 1 1 /model.11/Concat_output_0 /model.12/cv1/conv/Conv_output_0 0=128 1=1 5=1 6=49152
Swish /model.12/cv1/act/Mul 1 1 /model.12/cv1/conv/Conv_output_0 /model.12/cv1/act/Mul_output_0
Slice /model.12/Split 1 2 /model.12/cv1/act/Mul_output_0 /model.12/Split_output_0 /model.12/Split_output_1 -23300=2,64,-233
Split splitncnn_12 1 2 /model.12/Split_output_1 /model.12/Split_output_1_splitncnn_0 /model.12/Split_output_1_splitncnn_1
Convolution /model.12/m.0/cv1/conv/Conv 1 1 /model.12/Split_output_1_splitncnn_1 /model.12/m.0/cv1/conv/Conv_output_0 0=64 1=3 4=1 5=1 6=36864
Swish /model.12/m.0/cv1/act/Mul 1 1 /model.12/m.0/cv1/conv/Conv_output_0 /model.12/m.0/cv1/act/Mul_output_0
Convolution /model.12/m.0/cv2/conv/Conv 1 1 /model.12/m.0/cv1/act/Mul_output_0 /model.12/m.0/cv2/conv/Conv_output_0 0=64 1=3 4=1 5=1 6=36864
Swish /model.12/m.0/cv2/act/Mul 1 1 /model.12/m.0/cv2/conv/Conv_output_0 /model.12/m.0/cv2/act/Mul_output_0
Concat /model.12/Concat 3 1 /model.12/Split_output_0 /model.12/Split_output_1_splitncnn_0 /model.12/m.0/cv2/act/Mul_output_0 /model.12/Concat_output_0
Convolution /model.12/cv2/conv/Conv 1 1 /model.12/Concat_output_0 /model.12/cv2/conv/Conv_output_0 0=128 1=1 5=1 6=24576
Swish /model.12/cv2/act/Mul 1 1 /model.12/cv2/conv/Conv_output_0 /model.12/cv2/act/Mul_output_0
Split splitncnn_13 1 2 /model.12/cv2/act/Mul_output_0 /model.12/cv2/act/Mul_output_0_splitncnn_0 /model.12/cv2/act/Mul_output_0_splitncnn_1
Interp /model.13/Resize 1 1 /model.12/cv2/act/Mul_output_0_splitncnn_1 /model.13/Resize_output_0 0=1 1=2.000000e+00 2=2.000000e+00
Concat /model.14/Concat 2 1 /model.13/Resize_output_0 /model.4/cv2/act/Mul_output_0_splitncnn_0 /model.14/Concat_output_0
Convolution /model.15/cv1/conv/Conv 1 1 /model.14/Concat_output_0 /model.15/cv1/conv/Conv_output_0 0=64 1=1 5=1 6=12288
Swish /model.15/cv1/act/Mul 1 1 /model.15/cv1/conv/Conv_output_0 /model.15/cv1/act/Mul_output_0
Slice /model.15/Split 1 2 /model.15/cv1/act/Mul_output_0 /model.15/Split_output_0 /model.15/Split_output_1 -23300=2,32,-233
Split splitncnn_14 1 2 /model.15/Split_output_1 /model.15/Split_output_1_splitncnn_0 /model.15/Split_output_1_splitncnn_1
Convolution /model.15/m.0/cv1/conv/Conv 1 1 /model.15/Split_output_1_splitncnn_1 /model.15/m.0/cv1/conv/Conv_output_0 0=32 1=3 4=1 5=1 6=9216
Swish /model.15/m.0/cv1/act/Mul 1 1 /model.15/m.0/cv1/conv/Conv_output_0 /model.15/m.0/cv1/act/Mul_output_0
Convolution /model.15/m.0/cv2/conv/Conv 1 1 /model.15/m.0/cv1/act/Mul_output_0 /model.15/m.0/cv2/conv/Conv_output_0 0=32 1=3 4=1 5=1 6=9216
Swish /model.15/m.0/cv2/act/Mul 1 1 /model.15/m.0/cv2/conv/Conv_output_0 /model.15/m.0/cv2/act/Mul_output_0
Concat /model.15/Concat 3 1 /model.15/Split_output_0 /model.15/Split_output_1_splitncnn_0 /model.15/m.0/cv2/act/Mul_output_0 /model.15/Concat_output_0
Convolution /model.15/cv2/conv/Conv 1 1 /model.15/Concat_output_0 /model.15/cv2/conv/Conv_output_0 0=64 1=1 5=1 6=6144
Swish /model.15/cv2/act/Mul 1 1 /model.15/cv2/conv/Conv_output_0 /model.15/cv2/act/Mul_output_0
Split splitncnn_15 1 3 /model.15/cv2/act/Mul_output_0 /model.15/cv2/act/Mul_output_0_splitncnn_0 /model.15/cv2/act/Mul_output_0_splitncnn_1 /model.15/cv2/act/Mul_output_0_splitncnn_2
Convolution /model.16/conv/Conv 1 1 /model.15/cv2/act/Mul_output_0_splitncnn_2 /model.16/conv/Conv_output_0 0=64 1=3 3=2 4=1 5=1 6=36864
Swish /model.16/act/Mul 1 1 /model.16/conv/Conv_output_0 /model.16/act/Mul_output_0
Concat /model.17/Concat 2 1 /model.16/act/Mul_output_0 /model.12/cv2/act/Mul_output_0_splitncnn_0 /model.17/Concat_output_0
Convolution /model.18/cv1/conv/Conv 1 1 /model.17/Concat_output_0 /model.18/cv1/conv/Conv_output_0 0=128 1=1 5=1 6=24576
Swish /model.18/cv1/act/Mul 1 1 /model.18/cv1/conv/Conv_output_0 /model.18/cv1/act/Mul_output_0
Slice /model.18/Split 1 2 /model.18/cv1/act/Mul_output_0 /model.18/Split_output_0 /model.18/Split_output_1 -23300=2,64,-233
Split splitncnn_16 1 2 /model.18/Split_output_1 /model.18/Split_output_1_splitncnn_0 /model.18/Split_output_1_splitncnn_1
Convolution /model.18/m.0/cv1/conv/Conv 1 1 /model.18/Split_output_1_splitncnn_1 /model.18/m.0/cv1/conv/Conv_output_0 0=64 1=3 4=1 5=1 6=36864
Swish /model.18/m.0/cv1/act/Mul 1 1 /model.18/m.0/cv1/conv/Conv_output_0 /model.18/m.0/cv1/act/Mul_output_0
Convolution /model.18/m.0/cv2/conv/Conv 1 1 /model.18/m.0/cv1/act/Mul_output_0 /model.18/m.0/cv2/conv/Conv_output_0 0=64 1=3 4=1 5=1 6=36864
Swish /model.18/m.0/cv2/act/Mul 1 1 /model.18/m.0/cv2/conv/Conv_output_0 /model.18/m.0/cv2/act/Mul_output_0
Concat /model.18/Concat 3 1 /model.18/Split_output_0 /model.18/Split_output_1_splitncnn_0 /model.18/m.0/cv2/act/Mul_output_0 /model.18/Concat_output_0
Convolution /model.18/cv2/conv/Conv 1 1 /model.18/Concat_output_0 /model.18/cv2/conv/Conv_output_0 0=128 1=1 5=1 6=24576
Swish /model.18/cv2/act/Mul 1 1 /model.18/cv2/conv/Conv_output_0 /model.18/cv2/act/Mul_output_0
Split splitncnn_17 1 3 /model.18/cv2/act/Mul_output_0 /model.18/cv2/act/Mul_output_0_splitncnn_0 /model.18/cv2/act/Mul_output_0_splitncnn_1 /model.18/cv2/act/Mul_output_0_splitncnn_2
Convolution /model.19/conv/Conv 1 1 /model.18/cv2/act/Mul_output_0_splitncnn_2 /model.19/conv/Conv_output_0 0=128 1=3 3=2 4=1 5=1 6=147456
Swish /model.19/act/Mul 1 1 /model.19/conv/Conv_output_0 /model.19/act/Mul_output_0
Concat /model.20/Concat 2 1 /model.19/act/Mul_output_0 /model.9/cv2/act/Mul_output_0_splitncnn_0 /model.20/Concat_output_0
Convolution /model.21/cv1/conv/Conv 1 1 /model.20/Concat_output_0 /model.21/cv1/conv/Conv_output_0 0=256 1=1 5=1 6=98304
Swish /model.21/cv1/act/Mul 1 1 /model.21/cv1/conv/Conv_output_0 /model.21/cv1/act/Mul_output_0
Slice /model.21/Split 1 2 /model.21/cv1/act/Mul_output_0 /model.21/Split_output_0 /model.21/Split_output_1 -23300=2,128,-233
Split splitncnn_18 1 2 /model.21/Split_output_1 /model.21/Split_output_1_splitncnn_0 /model.21/Split_output_1_splitncnn_1
Convolution /model.21/m.0/cv1/conv/Conv 1 1 /model.21/Split_output_1_splitncnn_1 /model.21/m.0/cv1/conv/Conv_output_0 0=128 1=3 4=1 5=1 6=147456
Swish /model.21/m.0/cv1/act/Mul 1 1 /model.21/m.0/cv1/conv/Conv_output_0 /model.21/m.0/cv1/act/Mul_output_0
Convolution /model.21/m.0/cv2/conv/Conv 1 1 /model.21/m.0/cv1/act/Mul_output_0 /model.21/m.0/cv2/conv/Conv_output_0 0=128 1=3 4=1 5=1 6=147456
Swish /model.21/m.0/cv2/act/Mul 1 1 /model.21/m.0/cv2/conv/Conv_output_0 /model.21/m.0/cv2/act/Mul_output_0
Concat /model.21/Concat 3 1 /model.21/Split_output_0 /model.21/Split_output_1_splitncnn_0 /model.21/m.0/cv2/act/Mul_output_0 /model.21/Concat_output_0
Convolution /model.21/cv2/conv/Conv 1 1 /model.21/Concat_output_0 /model.21/cv2/conv/Conv_output_0 0=256 1=1 5=1 6=98304
Swish /model.21/cv2/act/Mul 1 1 /model.21/cv2/conv/Conv_output_0 /model.21/cv2/act/Mul_output_0
Split splitncnn_19 1 2 /model.21/cv2/act/Mul_output_0 /model.21/cv2/act/Mul_output_0_splitncnn_0 /model.21/cv2/act/Mul_output_0_splitncnn_1
Convolution /model.22/cv2.0/cv2.0.0/conv/Conv 1 1 /model.15/cv2/act/Mul_output_0_splitncnn_1 /model.22/cv2.0/cv2.0.0/conv/Conv_output_0 0=64 1=3 4=1 5=1 6=36864
Swish /model.22/cv2.0/cv2.0.0/act/Mul 1 1 /model.22/cv2.0/cv2.0.0/conv/Conv_output_0 /model.22/cv2.0/cv2.0.0/act/Mul_output_0
Convolution /model.22/cv2.0/cv2.0.1/conv/Conv 1 1 /model.22/cv2.0/cv2.0.0/act/Mul_output_0 /model.22/cv2.0/cv2.0.1/conv/Conv_output_0 0=64 1=3 4=1 5=1 6=36864
Swish /model.22/cv2.0/cv2.0.1/act/Mul 1 1 /model.22/cv2.0/cv2.0.1/conv/Conv_output_0 /model.22/cv2.0/cv2.0.1/act/Mul_output_0
Convolution /model.22/cv2.0/cv2.0.2/Conv 1 1 /model.22/cv2.0/cv2.0.1/act/Mul_output_0 /model.22/cv2.0/cv2.0.2/Conv_output_0 0=64 1=1 5=1 6=4096
Convolution /model.22/cv3.0/cv3.0.0/conv/Conv 1 1 /model.15/cv2/act/Mul_output_0_splitncnn_0 /model.22/cv3.0/cv3.0.0/conv/Conv_output_0 0=80 1=3 4=1 5=1 6=46080
Swish /model.22/cv3.0/cv3.0.0/act/Mul 1 1 /model.22/cv3.0/cv3.0.0/conv/Conv_output_0 /model.22/cv3.0/cv3.0.0/act/Mul_output_0
Convolution /model.22/cv3.0/cv3.0.1/conv/Conv 1 1 /model.22/cv3.0/cv3.0.0/act/Mul_output_0 /model.22/cv3.0/cv3.0.1/conv/Conv_output_0 0=80 1=3 4=1 5=1 6=57600
Swish /model.22/cv3.0/cv3.0.1/act/Mul 1 1 /model.22/cv3.0/cv3.0.1/conv/Conv_output_0 /model.22/cv3.0/cv3.0.1/act/Mul_output_0
Convolution /model.22/cv3.0/cv3.0.2/Conv 1 1 /model.22/cv3.0/cv3.0.1/act/Mul_output_0 /model.22/cv3.0/cv3.0.2/Conv_output_0 0=80 1=1 5=1 6=6400
Concat /model.22/Concat 2 1 /model.22/cv2.0/cv2.0.2/Conv_output_0 /model.22/cv3.0/cv3.0.2/Conv_output_0 /model.22/Concat_output_0
Convolution /model.22/cv2.1/cv2.1.0/conv/Conv 1 1 /model.18/cv2/act/Mul_output_0_splitncnn_1 /model.22/cv2.1/cv2.1.0/conv/Conv_output_0 0=64 1=3 4=1 5=1 6=73728
Swish /model.22/cv2.1/cv2.1.0/act/Mul 1 1 /model.22/cv2.1/cv2.1.0/conv/Conv_output_0 /model.22/cv2.1/cv2.1.0/act/Mul_output_0
Convolution /model.22/cv2.1/cv2.1.1/conv/Conv 1 1 /model.22/cv2.1/cv2.1.0/act/Mul_output_0 /model.22/cv2.1/cv2.1.1/conv/Conv_output_0 0=64 1=3 4=1 5=1 6=36864
Swish /model.22/cv2.1/cv2.1.1/act/Mul 1 1 /model.22/cv2.1/cv2.1.1/conv/Conv_output_0 /model.22/cv2.1/cv2.1.1/act/Mul_output_0
Convolution /model.22/cv2.1/cv2.1.2/Conv 1 1 /model.22/cv2.1/cv2.1.1/act/Mul_output_0 /model.22/cv2.1/cv2.1.2/Conv_output_0 0=64 1=1 5=1 6=4096
Convolution /model.22/cv3.1/cv3.1.0/conv/Conv 1 1 /model.18/cv2/act/Mul_output_0_splitncnn_0 /model.22/cv3.1/cv3.1.0/conv/Conv_output_0 0=80 1=3 4=1 5=1 6=92160
Swish /model.22/cv3.1/cv3.1.0/act/Mul 1 1 /model.22/cv3.1/cv3.1.0/conv/Conv_output_0 /model.22/cv3.1/cv3.1.0/act/Mul_output_0
Convolution /model.22/cv3.1/cv3.1.1/conv/Conv 1 1 /model.22/cv3.1/cv3.1.0/act/Mul_output_0 /model.22/cv3.1/cv3.1.1/conv/Conv_output_0 0=80 1=3 4=1 5=1 6=57600
Swish /model.22/cv3.1/cv3.1.1/act/Mul 1 1 /model.22/cv3.1/cv3.1.1/conv/Conv_output_0 /model.22/cv3.1/cv3.1.1/act/Mul_output_0
Convolution /model.22/cv3.1/cv3.1.2/Conv 1 1 /model.22/cv3.1/cv3.1.1/act/Mul_output_0 /model.22/cv3.1/cv3.1.2/Conv_output_0 0=80 1=1 5=1 6=6400
Concat /model.22/Concat_1 2 1 /model.22/cv2.1/cv2.1.2/Conv_output_0 /model.22/cv3.1/cv3.1.2/Conv_output_0 /model.22/Concat_1_output_0
Convolution /model.22/cv2.2/cv2.2.0/conv/Conv 1 1 /model.21/cv2/act/Mul_output_0_splitncnn_1 /model.22/cv2.2/cv2.2.0/conv/Conv_output_0 0=64 1=3 4=1 5=1 6=147456
Swish /model.22/cv2.2/cv2.2.0/act/Mul 1 1 /model.22/cv2.2/cv2.2.0/conv/Conv_output_0 /model.22/cv2.2/cv2.2.0/act/Mul_output_0
Convolution /model.22/cv2.2/cv2.2.1/conv/Conv 1 1 /model.22/cv2.2/cv2.2.0/act/Mul_output_0 /model.22/cv2.2/cv2.2.1/conv/Conv_output_0 0=64 1=3 4=1 5=1 6=36864
Swish /model.22/cv2.2/cv2.2.1/act/Mul 1 1 /model.22/cv2.2/cv2.2.1/conv/Conv_output_0 /model.22/cv2.2/cv2.2.1/act/Mul_output_0
Convolution /model.22/cv2.2/cv2.2.2/Conv 1 1 /model.22/cv2.2/cv2.2.1/act/Mul_output_0 /model.22/cv2.2/cv2.2.2/Conv_output_0 0=64 1=1 5=1 6=4096
Convolution /model.22/cv3.2/cv3.2.0/conv/Conv 1 1 /model.21/cv2/act/Mul_output_0_splitncnn_0 /model.22/cv3.2/cv3.2.0/conv/Conv_output_0 0=80 1=3 4=1 5=1 6=184320
Swish /model.22/cv3.2/cv3.2.0/act/Mul 1 1 /model.22/cv3.2/cv3.2.0/conv/Conv_output_0 /model.22/cv3.2/cv3.2.0/act/Mul_output_0
Convolution /model.22/cv3.2/cv3.2.1/conv/Conv 1 1 /model.22/cv3.2/cv3.2.0/act/Mul_output_0 /model.22/cv3.2/cv3.2.1/conv/Conv_output_0 0=80 1=3 4=1 5=1 6=57600
Swish /model.22/cv3.2/cv3.2.1/act/Mul 1 1 /model.22/cv3.2/cv3.2.1/conv/Conv_output_0 /model.22/cv3.2/cv3.2.1/act/Mul_output_0
Convolution /model.22/cv3.2/cv3.2.2/Conv 1 1 /model.22/cv3.2/cv3.2.1/act/Mul_output_0 /model.22/cv3.2/cv3.2.2/Conv_output_0 0=80 1=1 5=1 6=6400
Concat /model.22/Concat_2 2 1 /model.22/cv2.2/cv2.2.2/Conv_output_0 /model.22/cv3.2/cv3.2.2/Conv_output_0 /model.22/Concat_2_output_0
Reshape /model.22/Reshape 1 1 /model.22/Concat_output_0 /model.22/Reshape_output_0 0=-1 1=144
Reshape /model.22/Reshape_1 1 1 /model.22/Concat_1_output_0 /model.22/Reshape_1_output_0 0=-1 1=144
Reshape /model.22/Reshape_2 1 1 /model.22/Concat_2_output_0 /model.22/Reshape_2_output_0 0=-1 1=144
Concat /model.22/Concat_3 3 1 /model.22/Reshape_output_0 /model.22/Reshape_1_output_0 /model.22/Reshape_2_output_0 /model.22/Concat_3_output_0 0=1
Permute /model.22/Transpose 1 1 /model.22/Concat_3_output_0 output0 0=1
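A quick way to check this without eyeballing the whole file is a small hypothetical helper: in the ncnn .param format, after the magic number and the layer/blob count lines, each layer line begins with its layer type, so the first token of the last line is what we want.

```python
def last_layer_type(param_path):
    """Return the layer type of the last layer in an ncnn .param file."""
    with open(param_path) as f:
        rows = [line.split() for line in f if line.strip()]
    return rows[-1][0]  # first token of the last non-empty line

# for the model above, last_layer_type("yolov8n.param") should print "Permute"
```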

Summary

In my experience, as long as the exported ONNX model uses opset=12 or 11, its output dimensions are 1×144×8400, and the last layer of the param file after ncnn conversion is a Permute layer, everything should work.

Code Testing

There are several reference code bases online. Of the three I tried, the first targets segmentation tasks and the second produced scattered boxes in my tests, so I adopted the third author's code (with a few small tweaks). The code is below. If you followed my steps exactly it should work as-is, but don't rush to read it; look at the modification notes below first.

yolo.h

#ifndef YOLO_H
#define YOLO_H

#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <net.h>
#include <vector>

struct Object
{
    cv::Rect_<float> rect;
    int label;
    float prob;
};

struct GridAndStride
{
    int grid0;
    int grid1;
    int stride;
};

class Yolo
{
public:
    Yolo();

    int load(int target_size, const float* mean_vals, const float* norm_vals, bool use_gpu = false);

    int detect(const cv::Mat& rgb, std::vector<Object>& objects, float prob_threshold = 0.4f, float nms_threshold = 0.5f);

    int draw(cv::Mat& rgb, const std::vector<Object>& objects);

private:
    ncnn::Net yolo;
    int target_size;
    float mean_vals[3];
    float norm_vals[3];
    ncnn::UnlockedPoolAllocator blob_pool_allocator;
    ncnn::PoolAllocator workspace_pool_allocator;
};

#endif

yolo.cpp

#include "yolo.h"

#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>

//#include "cpu.h"
#include <iostream>

static float fast_exp(float x)
{
union {
uint32_t i;
float f;
} v{};
v.i = (1 << 23) * (1.4426950409 * x + 126.93490512f);
return v.f;
}

static float sigmoid(float x)
{
return 1.0f / (1.0f + fast_exp(-x));
}
static float intersection_area(const Object& a, const Object& b)
{
cv::Rect_<float> inter = a.rect & b.rect;
return inter.area();
}

static void qsort_descent_inplace(std::vector<Object>& faceobjects, int left, int right)
{
int i = left;
int j = right;
float p = faceobjects[(left + right) / 2].prob;

while (i <= j)
{
while (faceobjects[i].prob > p)
i++;

while (faceobjects[j].prob < p)
j--;

if (i <= j)
{
// swap
std::swap(faceobjects[i], faceobjects[j]);

i++;
j--;
}
}

// #pragma omp parallel sections
{
// #pragma omp section
{
if (left < j) qsort_descent_inplace(faceobjects, left, j);
}
// #pragma omp section
{
if (i < right) qsort_descent_inplace(faceobjects, i, right);
}
}
}

static void qsort_descent_inplace(std::vector<Object>& faceobjects)
{
if (faceobjects.empty())
return;

qsort_descent_inplace(faceobjects, 0, faceobjects.size() - 1);
}

static void nms_sorted_bboxes(const std::vector<Object>& faceobjects, std::vector<int>& picked, float nms_threshold)
{
picked.clear();

const int n = faceobjects.size();

std::vector<float> areas(n);
for (int i = 0; i < n; i++)
{
areas[i] = faceobjects[i].rect.width * faceobjects[i].rect.height;
}

for (int i = 0; i < n; i++)
{
const Object& a = faceobjects[i];

int keep = 1;
for (int j = 0; j < (int)picked.size(); j++)
{
const Object& b = faceobjects[picked[j]];

// intersection over union
float inter_area = intersection_area(a, b);
float union_area = areas[i] + areas[picked[j]] - inter_area;
// float IoU = inter_area / union_area
if (inter_area / union_area > nms_threshold)
keep = 0;
}

if (keep)
picked.push_back(i);
}
}
static void generate_grids_and_stride(const int target_w, const int target_h, std::vector<int>& strides, std::vector<GridAndStride>& grid_strides)
{
for (int i = 0; i < (int)strides.size(); i++)
{
int stride = strides[i];
int num_grid_w = target_w / stride;
int num_grid_h = target_h / stride;
for (int g1 = 0; g1 < num_grid_h; g1++)
{
for (int g0 = 0; g0 < num_grid_w; g0++)
{
GridAndStride gs;
gs.grid0 = g0;
gs.grid1 = g1;
gs.stride = stride;
grid_strides.push_back(gs);
}
}
}
}
static void generate_proposals(std::vector<GridAndStride> grid_strides, const ncnn::Mat& pred, float prob_threshold, std::vector<Object>& objects)
{
const int num_points = grid_strides.size();
const int num_class = 80; // <-- set this to your model's class count
const int reg_max_1 = 16;

for (int i = 0; i < num_points; i++)
{
const float* scores = pred.row(i) + 4 * reg_max_1;

// find label with max score
int label = -1;
float score = -FLT_MAX;
for (int k = 0; k < num_class; k++)
{
float confidence = scores[k];
if (confidence > score)
{
label = k;
score = confidence;
}
}
float box_prob = sigmoid(score);
if (box_prob >= prob_threshold)
{
ncnn::Mat bbox_pred(reg_max_1, 4, (void*)pred.row(i));
{
ncnn::Layer* softmax = ncnn::create_layer("Softmax");

ncnn::ParamDict pd;
pd.set(0, 1); // axis
pd.set(1, 1);
softmax->load_param(pd);

ncnn::Option opt;
opt.num_threads = 1;
opt.use_packing_layout = false;

softmax->create_pipeline(opt);

softmax->forward_inplace(bbox_pred, opt);

softmax->destroy_pipeline(opt);

delete softmax;
}

float pred_ltrb[4];
for (int k = 0; k < 4; k++)
{
float dis = 0.f;
const float* dis_after_sm = bbox_pred.row(k);
for (int l = 0; l < reg_max_1; l++)
{
dis += l * dis_after_sm[l];
}

pred_ltrb[k] = dis * grid_strides[i].stride;
}

float pb_cx = (grid_strides[i].grid0 + 0.5f) * grid_strides[i].stride;
float pb_cy = (grid_strides[i].grid1 + 0.5f) * grid_strides[i].stride;

float x0 = pb_cx - pred_ltrb[0];
float y0 = pb_cy - pred_ltrb[1];
float x1 = pb_cx + pred_ltrb[2];
float y1 = pb_cy + pred_ltrb[3];

Object obj;
obj.rect.x = x0;
obj.rect.y = y0;
obj.rect.width = x1 - x0;
obj.rect.height = y1 - y0;
obj.label = label;
obj.prob = box_prob;

objects.push_back(obj);
}
}
}

Yolo::Yolo()
{
blob_pool_allocator.set_size_compare_ratio(0.f);
workspace_pool_allocator.set_size_compare_ratio(0.f);
}


int Yolo::load(int _target_size, const float* _mean_vals, const float* _norm_vals, bool use_gpu)
{
/* yolo.clear();
blob_pool_allocator.clear();
workspace_pool_allocator.clear()*/;

// ncnn::set_cpu_powersave(2);
// ncnn::set_omp_num_threads(ncnn::get_big_cpu_count());

yolo.opt = ncnn::Option();

//#if NCNN_VULKAN
// yolo.opt.use_vulkan_compute = use_gpu;
//#endif

//yolo.opt.num_threads = ncnn::get_big_cpu_count();
//yolo.opt.blob_allocator = &blob_pool_allocator;
//yolo.opt.workspace_allocator = &workspace_pool_allocator;



//QFile::copy("assets:/pic/yolov8s_opt.param", "yolov8s_opt.param");
//QFile::copy("assets:/pic/yolov8s_opt.bin", "yolov8s_opt.bin");
yolo.load_param("yolov8n-op11.param");
yolo.load_model("yolov8n-op11.bin");


// yolo.load_param("./model/yolov8s_opt.param");
// yolo.load_model("./model/yolov8s_opt.bin");

target_size = _target_size;
mean_vals[0] = _mean_vals[0];
mean_vals[1] = _mean_vals[1];
mean_vals[2] = _mean_vals[2];
norm_vals[0] = _norm_vals[0];
norm_vals[1] = _norm_vals[1];
norm_vals[2] = _norm_vals[2];

return 0;
}

int Yolo::detect(const cv::Mat& rgb, std::vector<Object>& objects, float prob_threshold, float nms_threshold)
{
int width = rgb.cols;
int height = rgb.rows;

// pad to multiple of 32
int w = width;
int h = height;
float scale = 1.f;
if (w > h)
{
scale = (float)target_size / w;
w = target_size;
h = h * scale;
}
else
{
scale = (float)target_size / h;
h = target_size;
w = w * scale;
}

ncnn::Mat in = ncnn::Mat::from_pixels_resize(rgb.data, ncnn::Mat::PIXEL_RGB2BGR, width, height, w, h);

// pad to target_size rectangle
int wpad = (w + 31) / 32 * 32 - w;
int hpad = (h + 31) / 32 * 32 - h;
ncnn::Mat in_pad;
ncnn::copy_make_border(in, in_pad, hpad / 2, hpad - hpad / 2, wpad / 2, wpad - wpad / 2, ncnn::BORDER_CONSTANT, 0.f);

in_pad.substract_mean_normalize(0, norm_vals);

ncnn::Extractor ex = yolo.create_extractor();

ex.input("images", in_pad);

std::vector<Object> proposals;

ncnn::Mat out;
ex.extract("output0", out);

std::cout << "---------YES, " << out.dims << " width," << out.w << " height," << out.h << " , " << out.c << std::endl;
// parse output0 into proposals, then apply NMS

std::vector<int> strides = { 8, 16, 32 }; // might have stride=64
std::vector<GridAndStride> grid_strides;
generate_grids_and_stride(in_pad.w, in_pad.h, strides, grid_strides);
generate_proposals(grid_strides, out, prob_threshold, proposals);

// sort all proposals by score from highest to lowest
qsort_descent_inplace(proposals);

// apply nms with nms_threshold
std::vector<int> picked;
nms_sorted_bboxes(proposals, picked, nms_threshold);

int count = picked.size();

objects.resize(count);
for (int i = 0; i < count; i++)
{
objects[i] = proposals[picked[i]];

// adjust offset to original unpadded
float x0 = (objects[i].rect.x - (wpad / 2)) / scale;
float y0 = (objects[i].rect.y - (hpad / 2)) / scale;
float x1 = (objects[i].rect.x + objects[i].rect.width - (wpad / 2)) / scale;
float y1 = (objects[i].rect.y + objects[i].rect.height - (hpad / 2)) / scale;

// clip
x0 = std::max(std::min(x0, (float)(width - 1)), 0.f);
y0 = std::max(std::min(y0, (float)(height - 1)), 0.f);
x1 = std::max(std::min(x1, (float)(width - 1)), 0.f);
y1 = std::max(std::min(y1, (float)(height - 1)), 0.f);

objects[i].rect.x = x0;
objects[i].rect.y = y0;
objects[i].rect.width = x1 - x0;
objects[i].rect.height = y1 - y0;
}

// sort objects by area
struct
{
bool operator()(const Object& a, const Object& b) const
{
return a.rect.area() > b.rect.area();
}
} objects_area_greater;
std::sort(objects.begin(), objects.end(), objects_area_greater);

return 0;
}

int Yolo::draw(cv::Mat& rgb, const std::vector<Object>& objects)
{
static const char* class_names[] = {
"person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat", "traffic light",
"fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow",
"elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee",
"skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard",
"tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple",
"sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch",
"potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote", "keyboard", "cell phone",
"microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear",
"hair drier", "toothbrush"
};
// static const char* class_names[] = {"blur", "phone", "reflectLight", "reflection"};
static const unsigned char colors[81][3] = {
{56, 0, 255},
{226, 255, 0},
{0, 94, 255},
{0, 37, 255},
{0, 255, 94},
{255, 226, 0},
{0, 18, 255},
{255, 151, 0},
{170, 0, 255},
{0, 255, 56},
{255, 0, 75},
{0, 75, 255},
{0, 255, 169},
{255, 0, 207},
{75, 255, 0},
{207, 0, 255},
{37, 0, 255},
{0, 207, 255},
{94, 0, 255},
{0, 255, 113},
{255, 18, 0},
{255, 0, 56},
{18, 0, 255},
{0, 255, 226},
{170, 255, 0},
{255, 0, 245},
{151, 255, 0},
{132, 255, 0},
{75, 0, 255},
{151, 0, 255},
{0, 151, 255},
{132, 0, 255},
{0, 255, 245},
{255, 132, 0},
{226, 0, 255},
{255, 37, 0},
{207, 255, 0},
{0, 255, 207},
{94, 255, 0},
{0, 226, 255},
{56, 255, 0},
{255, 94, 0},
{255, 113, 0},
{0, 132, 255},
{255, 0, 132},
{255, 170, 0},
{255, 0, 188},
{113, 255, 0},
{245, 0, 255},
{113, 0, 255},
{255, 188, 0},
{0, 113, 255},
{255, 0, 0},
{0, 56, 255},
{255, 0, 113},
{0, 255, 188},
{255, 0, 94},
{255, 0, 18},
{18, 255, 0},
{0, 255, 132},
{0, 188, 255},
{0, 245, 255},
{0, 169, 255},
{37, 255, 0},
{255, 0, 151},
{188, 0, 255},
{0, 255, 37},
{0, 255, 0},
{255, 0, 170},
{255, 0, 37},
{255, 75, 0},
{0, 0, 255},
{255, 207, 0},
{255, 0, 226},
{255, 245, 0},
{188, 255, 0},
{0, 255, 18},
{0, 255, 75},
{0, 255, 151},
{255, 56, 0},
{245, 255, 0}
};

int color_index = 0;

for (size_t i = 0; i < objects.size(); i++)
{
const Object& obj = objects[i];

// fprintf(stderr, "%d = %.5f at %.2f %.2f %.2f x %.2f\n", obj.label, obj.prob,
// obj.rect.x, obj.rect.y, obj.rect.width, obj.rect.height);

const unsigned char* color = colors[color_index % 81]; // cycle through all 81 colors
color_index++;

cv::Scalar cc(color[0], color[1], color[2]);

cv::rectangle(rgb, obj.rect, cc, 2);

char text[256];
sprintf_s(text, "%s %.1f%%", class_names[obj.label], obj.prob * 100);
//std::string text = class_names[obj.label] + std::to_string(round(obj.prob * 10000)/100);

int baseLine = 0;
cv::Size label_size = cv::getTextSize(text, cv::FONT_HERSHEY_SIMPLEX, 0.5, 1, &baseLine);

int x = obj.rect.x;
int y = obj.rect.y - label_size.height - baseLine;
if (y < 0)
y = 0;
if (x + label_size.width > rgb.cols)
x = rgb.cols - label_size.width;

cv::rectangle(rgb, cv::Rect(cv::Point(x, y), cv::Size(label_size.width, label_size.height + baseLine)), cc, -1);

cv::Scalar textcc = (color[0] + color[1] + color[2] >= 381) ? cv::Scalar(0, 0, 0) : cv::Scalar(255, 255, 255);

cv::putText(rgb, text, cv::Point(x, y + label_size.height), cv::FONT_HERSHEY_SIMPLEX, 0.5, textcc, 1);
}
cv::imshow("image", rgb);
//cv::imwrite("demo.png", image);
cv::waitKey(0);
return 0;
}

main.cpp

#include "yolo.h"
#include "net.h"

#if defined(USE_NCNN_SIMPLEOCV)
#include "simpleocv.h"
#else
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#endif

#include <stdlib.h>
#include <float.h>
#include <stdio.h>
#include <vector>
#include <iostream>

int main()
{
Yolo *yolov8 = new Yolo();
cv::Mat m = cv::imread("image1.png", 1);
int target_size = 640;
float norm_vals[3] = { 1 / 255.f, 1 / 255.f, 1 / 255.f };
float mean_vals[3] = { 103.53f, 116.28f, 123.675f };

yolov8->load(target_size, mean_vals, norm_vals);
std::vector<Object> objects;
yolov8->detect(m, objects);
std::cout << objects.size();
yolov8->draw(m, objects);

return 0;
}

What to modify (only yolo.cpp changes; the code above already reflects the modifications)

Model paths

Around line 256 (the load_param/load_model calls in Yolo::load), change the model files to your own converted .param/.bin pair. If you do not have Qt installed, also comment out the QFile::copy calls above them.

Class count and class labels

Around line 148 (num_class in generate_proposals), set the class count to match your model; for the COCO detection model used in this tutorial it is 80.

The class labels must be updated to match as well, around line 370 (class_names in Yolo::draw).

Summary

Three things must change: the model paths, the class count, and the class labels.

Test results

Using the official yolov8n.pt model, exported with opset=12 and simplify=True (and all three options checked when converting on the online site), the converted model tests as follows:

Summary

The export parameter opset must not be 13. A correctly converted model has an output of shape 1×144×8400, and the last layer of the model graph should be a Permute. If you have modified the ultralytics source but conversion keeps failing, it is probably an environment conflict: the ultralytics package you are importing is not the one you edited.

Several versions of the test code circulate online. If the model was converted correctly they will all run, but their predictions are not necessarily correct, so it is worth trying a few different versions.

Walking through the inference code

The Object struct

struct Object
{
cv::Rect_<float> rect;
int label;
float prob;
cv::Mat mask;
std::vector<float> mask_feat;
};

The GridAndStride struct

struct GridAndStride
{
int grid0;
int grid1;
int stride;
};

main

int main(int argc, char** argv)
{
cv::Mat m = cv::imread("20240416111118.png", 1);
std::vector<Object> objects;
detect_yolov8(m, objects);
draw_objects(m, objects);
return 0;
}

The main member variables of OpenCV's Mat class are:

  1. int flags: flags describing the matrix, such as its element type and channel count.
  2. int dims: the number of dimensions.
  3. int rows: the number of rows (−1 when dims > 2).
  4. int cols: the number of columns (−1 when dims > 2).
  5. uchar* data: pointer to the matrix data.
  6. MatAllocator* allocator: a plain pointer to a custom memory allocator (not a smart pointer).
  7. MatSize size: the size of each dimension.
  8. MatStep step: the step (stride in bytes) of each dimension.

detect_yolov8

static int detect_yolov8(const cv::Mat& bgr, std::vector<Object>& objects)
{
ncnn::Net yolov8; // define the network

yolov8.load_param("yolov8s-op13.param"); // load the network structure
yolov8.load_model("yolov8s-op13.bin"); // load the weights

int width = bgr.cols; // image width
int height = bgr.rows; // image height

const int target_size = 640; // network input size
const float prob_threshold = 0.4f;
const float nms_threshold = 0.5f;

// letterbox: resize keeping the aspect ratio, then pad the borders
int w = width;
int h = height;
float scale = 1.f;
/*
1.f is a float literal, so the initializer already has the variable's type.
1.0 is a double literal that would be narrowed to float on assignment, and 1
is an int that would be converted. All three compile, but 1.f states the
intended single-precision type explicitly, so it is the clearest choice.
*/

// scale so the longer side equals target_size (aspect ratio preserved)
if (w > h)
{
scale = (float)target_size / w;
w = target_size;
h = h * scale;
}
else
{
scale = (float)target_size / h;
h = target_size;
w = w * scale;
}

ncnn::Mat in = ncnn::Mat::from_pixels_resize(bgr.data, ncnn::Mat::PIXEL_BGR2RGB, width, height, w, h);

// pad to target_size rectangle
int wpad = (w + 31) / 32 * 32 - w; // pixels needed to round the width up to a multiple of 32
int hpad = (h + 31) / 32 * 32 - h; // pixels needed to round the height up to a multiple of 32
ncnn::Mat in_pad;
ncnn::copy_make_border(in, in_pad, hpad / 2, hpad - hpad / 2, wpad / 2, wpad - wpad / 2, ncnn::BORDER_CONSTANT, 0.f);

const float norm_vals[3] = { 1 / 255.f, 1 / 255.f, 1 / 255.f };
in_pad.substract_mean_normalize(0, norm_vals);


ncnn::Extractor ex = yolov8.create_extractor();
ex.input("images", in_pad);

ncnn::Mat out;
ex.extract("output0", out);

ncnn::Mat mask_proto;
ex.extract("output1", mask_proto);

std::vector<int> strides = { 8, 16, 32 };
std::vector<GridAndStride> grid_strides;
generate_grids_and_stride(in_pad.w, in_pad.h, strides, grid_strides);

std::vector<Object> proposals;
std::vector<Object> objects8;
generate_proposals(grid_strides, out, prob_threshold, objects8);

proposals.insert(proposals.end(), objects8.begin(), objects8.end());

// sort all proposals by score from highest to lowest
qsort_descent_inplace(proposals);

// apply nms with nms_threshold
std::vector<int> picked;
nms_sorted_bboxes(proposals, picked, nms_threshold);

int count = picked.size();

ncnn::Mat mask_feat = ncnn::Mat(32, count, sizeof(float));
for (int i = 0; i < count; i++) {
float* mask_feat_ptr = mask_feat.row(i);
std::memcpy(mask_feat_ptr, proposals[picked[i]].mask_feat.data(), sizeof(float) * proposals[picked[i]].mask_feat.size());
}

ncnn::Mat mask_pred_result;
decode_mask(mask_feat, width, height, mask_proto, in_pad, wpad, hpad, mask_pred_result);

objects.resize(count);
for (int i = 0; i < count; i++)
{
objects[i] = proposals[picked[i]];

// adjust offset to original unpadded
float x0 = (objects[i].rect.x - (wpad / 2)) / scale;
float y0 = (objects[i].rect.y - (hpad / 2)) / scale;
float x1 = (objects[i].rect.x + objects[i].rect.width - (wpad / 2)) / scale;
float y1 = (objects[i].rect.y + objects[i].rect.height - (hpad / 2)) / scale;

// clip
x0 = std::max(std::min(x0, (float)(width - 1)), 0.f);
y0 = std::max(std::min(y0, (float)(height - 1)), 0.f);
x1 = std::max(std::min(x1, (float)(width - 1)), 0.f);
y1 = std::max(std::min(y1, (float)(height - 1)), 0.f);

objects[i].rect.x = x0;
objects[i].rect.y = y0;
objects[i].rect.width = x1 - x0;
objects[i].rect.height = y1 - y0;

objects[i].mask = cv::Mat::zeros(height, width, CV_32FC1);
cv::Mat mask = cv::Mat(height, width, CV_32FC1, (float*)mask_pred_result.channel(i));
mask(objects[i].rect).copyTo(objects[i].mask(objects[i].rect));
}

return 0;
}