Deploying YOLOv8 to Android: Testing

Switching the OpenCV Version

We need to test ncnn inference of YOLO in VS2019. Although we installed OpenCV-Mobile in the previous post, OpenCV-Mobile strips out some functionality, so for easier testing we will switch back to the regular Windows build of OpenCV. Fortunately the setup is not complicated: I downloaded version 4.8.0, extracted it, set the environment variables, and then configured it in VS2019 much as in the previous post. Quite some time has passed and I no longer remember the exact steps, so please refer to other tutorials for this part.

opencv modules included

| module | comment |
| --- | --- |
| opencv_core | Mat, matrix operations, etc |
| opencv_imgproc | resize, cvtColor, warpAffine, etc |
| opencv_highgui | imread, imwrite |
| opencv_features2d | keypoint feature and matcher, etc (not included in opencv 2.x package) |
| opencv_photo | inpaint, etc |
| opencv_video | opticalflow, etc |

opencv modules discarded

| module | comment |
| --- | --- |
| opencv_androidcamera | use android Camera api instead |
| opencv_calib3d | camera calibration, rare uses on mobile |
| opencv_contrib | experimental functions, build part of the source externally if you need |
| opencv_dnn | very slow on mobile, try ncnn for neural network inference on mobile |
| opencv_dynamicuda | no cuda on mobile |
| opencv_flann | feature matching, rare uses on mobile, build the source externally if you need |
| opencv_gapi | graph based image processing, little gain on mobile |
| opencv_gpu | no cuda/opencl on mobile |
| opencv_imgcodecs | link with opencv_highgui instead |
| opencv_java | wrap your c++ code with jni |
| opencv_js | write native code on mobile |
| opencv_legacy | various good-old cv routines, build part of the source externally if you need |
| opencv_ml | train your ML algorithm on powerful pc or server |
| opencv_nonfree | the SIFT and SURF, use ORB which is faster and better |
| opencv_objdetect | HOG, cascade detector, use deep learning detector which is faster and better |
| opencv_ocl | no opencl on mobile |
| opencv_python | no python on mobile |
| opencv_shape | shape matching, rare uses on mobile, build the source externally if you need |
| opencv_stitching | image stitching, rare uses on mobile, build the source externally if you need |
| opencv_superres | do video super-resolution on powerful pc or server |
| opencv_ts | test modules, useless in production anyway |
| opencv_videoio | use android MediaCodec or ios AVFoundation api instead |
| opencv_videostab | do video stabilization on powerful pc or server |
| opencv_viz | vtk is not available on mobile, write your own data visualization routines |

Model Conversion (using the official Ultralytics COCO detection pretrained weights as an example)

This was the step where I hit the most pitfalls. If you run into problems, consult the Wiki and the Issues; they answer most questions. Because I stumbled so often during the whole conversion, many details have slipped my mind, so I cannot guarantee that my steps will succeed; treat them as a reference only.

Installing YOLOv8

I already covered this step in a previous post, and there are plenty of tutorials online, so simply repeating them would add little. Instead I will record the pitfalls I ran into, in the hope that they are instructive.

First: choosing between pip install ultralytics and git clone

If you have read the Wiki and Issues mentioned above, you know that the source code has to be modified before converting the model. In principle the git clone approach makes those modifications easier, so I initially went with git clone as well, but ...

On the model-conversion parameters

With the test code I found, only models exported with opset=13 showed the full-screen detection box problem; opset=12 and opset=11 were both fine. I am not sure how other parameters affect the result.
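For reference, this is roughly the export call I mean, as a hedged sketch of the Ultralytics Python API (the weight filename is an example; the import is kept inside the function so the sketch parses even without the package installed):

```python
# Sketch: export YOLOv8 weights to ONNX with a chosen opset.
# In my tests opset=12 (or 11) worked; opset=13 gave full-screen boxes after ncnn conversion.
def export_to_onnx(weights="yolov8n.pt", opset=12):
    from ultralytics import YOLO  # lazy import: ultralytics must be installed to actually run this
    model = YOLO(weights)
    return model.export(format="onnx", opset=opset)  # writes e.g. yolov8n.onnx next to the weights
```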

On the output dimensions of the converted model

The final output should be 1×144×8400, not 1×84×8400. Given the changes made in ultralytics/nn/modules/head.py, the forward output is cut off partway through: we effectively take YOLO's intermediate result and finish the remaining decoding ourselves, so the output dimensions of the model handed to ncnn are not the final output dimensions.
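The arithmetic behind those two numbers, assuming a 640×640 input, 80 COCO classes, and YOLOv8's reg_max=16 DFL bins:

```python
reg_max = 16          # DFL bins per box side
num_classes = 80      # COCO
channels = 4 * reg_max + num_classes                 # 64 box-distribution channels + 80 class channels
points = sum((640 // s) ** 2 for s in (8, 16, 32))   # anchor points at the three output strides
print(channels, points)  # 144 8400
```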

def forward(self, x):
    """Concatenates and returns predicted bounding boxes and class probabilities."""
    for i in range(self.nl):
        x[i] = torch.cat((self.cv2[i](x[i]), self.cv3[i](x[i])), 1)
    if self.training:  # Training path
        return x

    # Inference path
    shape = x[0].shape  # BCHW

    if self.dynamic or self.shape != shape:
        self.anchors, self.strides = (x.transpose(0, 1) for x in make_anchors(x, self.stride, 0.5))
        self.shape = shape
    pred = torch.cat([xi.view(shape[0], self.no, -1) for xi in x], 2).permute(0, 2, 1)
    return pred

    # x_cat = torch.cat([xi.view(shape[0], self.no, -1) for xi in x], 2)
    # if self.export and self.format in {"saved_model", "pb", "tflite", "edgetpu", "tfjs"}:  # avoid TF FlexSplitV ops
    #     box = x_cat[:, : self.reg_max * 4]
    #     cls = x_cat[:, self.reg_max * 4 :]
    # else:
    #     box, cls = x_cat.split((self.reg_max * 4, self.nc), 1)
    #
    # if self.export and self.format in {"tflite", "edgetpu"}:
    #     # Precompute normalization factor to increase numerical stability
    #     # See https://github.com/ultralytics/ultralytics/issues/7371
    #     grid_h = shape[2]
    #     grid_w = shape[3]
    #     grid_size = torch.tensor([grid_w, grid_h, grid_w, grid_h], device=box.device).reshape(1, 4, 1)
    #     norm = self.strides / (self.stride[0] * grid_size)
    #     dbox = self.decode_bboxes(self.dfl(box) * norm, self.anchors.unsqueeze(0) * norm[:, :2])
    # else:
    #     dbox = self.decode_bboxes(self.dfl(box), self.anchors.unsqueeze(0)) * self.strides
    #
    # y = torch.cat((dbox, cls.sigmoid()), 1)
    # return y if self.export else (y, x)
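The commented-out tail above (DFL softmax plus box decode) is exactly what now has to happen on the inference side. As a sketch of what DFL decoding does for a single anchor point (a numpy stand-in, not the ultralytics implementation):

```python
import numpy as np

def dfl_decode(box_logits, reg_max=16):
    """box_logits: (4, reg_max) distribution logits for one anchor point.
    Returns the expected left/top/right/bottom offsets in stride units."""
    e = np.exp(box_logits - box_logits.max(axis=1, keepdims=True))  # numerically stable softmax
    probs = e / e.sum(axis=1, keepdims=True)
    # expected value over the bin indices 0..reg_max-1
    return (probs * np.arange(reg_max)).sum(axis=1)

# uniform logits -> expected value is the mean bin index: 7.5 for 16 bins
print(dfl_decode(np.zeros((4, 16))))
```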

On the structure of the converted model

Open the converted .param file to inspect the model structure. The main thing to check is whether the last layer is a Permute.

7767517
181 217
Input images 0 1 images
Convolution /model.0/conv/Conv 1 1 images /model.0/conv/Conv_output_0 0=16 1=3 3=2 4=1 5=1 6=432
Swish /model.0/act/Mul 1 1 /model.0/conv/Conv_output_0 /model.0/act/Mul_output_0
Convolution /model.1/conv/Conv 1 1 /model.0/act/Mul_output_0 /model.1/conv/Conv_output_0 0=32 1=3 3=2 4=1 5=1 6=4608
Swish /model.1/act/Mul 1 1 /model.1/conv/Conv_output_0 /model.1/act/Mul_output_0
Convolution /model.2/cv1/conv/Conv 1 1 /model.1/act/Mul_output_0 /model.2/cv1/conv/Conv_output_0 0=32 1=1 5=1 6=1024
Swish /model.2/cv1/act/Mul 1 1 /model.2/cv1/conv/Conv_output_0 /model.2/cv1/act/Mul_output_0
Slice /model.2/Split 1 2 /model.2/cv1/act/Mul_output_0 /model.2/Split_output_0 /model.2/Split_output_1 -23300=2,16,-233
Split splitncnn_0 1 3 /model.2/Split_output_1 /model.2/Split_output_1_splitncnn_0 /model.2/Split_output_1_splitncnn_1 /model.2/Split_output_1_splitncnn_2
Convolution /model.2/m.0/cv1/conv/Conv 1 1 /model.2/Split_output_1_splitncnn_2 /model.2/m.0/cv1/conv/Conv_output_0 0=16 1=3 4=1 5=1 6=2304
Swish /model.2/m.0/cv1/act/Mul 1 1 /model.2/m.0/cv1/conv/Conv_output_0 /model.2/m.0/cv1/act/Mul_output_0
Convolution /model.2/m.0/cv2/conv/Conv 1 1 /model.2/m.0/cv1/act/Mul_output_0 /model.2/m.0/cv2/conv/Conv_output_0 0=16 1=3 4=1 5=1 6=2304
Swish /model.2/m.0/cv2/act/Mul 1 1 /model.2/m.0/cv2/conv/Conv_output_0 /model.2/m.0/cv2/act/Mul_output_0
BinaryOp /model.2/m.0/Add 2 1 /model.2/Split_output_1_splitncnn_1 /model.2/m.0/cv2/act/Mul_output_0 /model.2/m.0/Add_output_0
Concat /model.2/Concat 3 1 /model.2/Split_output_0 /model.2/Split_output_1_splitncnn_0 /model.2/m.0/Add_output_0 /model.2/Concat_output_0
Convolution /model.2/cv2/conv/Conv 1 1 /model.2/Concat_output_0 /model.2/cv2/conv/Conv_output_0 0=32 1=1 5=1 6=1536
Swish /model.2/cv2/act/Mul 1 1 /model.2/cv2/conv/Conv_output_0 /model.2/cv2/act/Mul_output_0
Convolution /model.3/conv/Conv 1 1 /model.2/cv2/act/Mul_output_0 /model.3/conv/Conv_output_0 0=64 1=3 3=2 4=1 5=1 6=18432
Swish /model.3/act/Mul 1 1 /model.3/conv/Conv_output_0 /model.3/act/Mul_output_0
Convolution /model.4/cv1/conv/Conv 1 1 /model.3/act/Mul_output_0 /model.4/cv1/conv/Conv_output_0 0=64 1=1 5=1 6=4096
Swish /model.4/cv1/act/Mul 1 1 /model.4/cv1/conv/Conv_output_0 /model.4/cv1/act/Mul_output_0
Slice /model.4/Split 1 2 /model.4/cv1/act/Mul_output_0 /model.4/Split_output_0 /model.4/Split_output_1 -23300=2,32,-233
Split splitncnn_1 1 3 /model.4/Split_output_1 /model.4/Split_output_1_splitncnn_0 /model.4/Split_output_1_splitncnn_1 /model.4/Split_output_1_splitncnn_2
Convolution /model.4/m.0/cv1/conv/Conv 1 1 /model.4/Split_output_1_splitncnn_2 /model.4/m.0/cv1/conv/Conv_output_0 0=32 1=3 4=1 5=1 6=9216
Swish /model.4/m.0/cv1/act/Mul 1 1 /model.4/m.0/cv1/conv/Conv_output_0 /model.4/m.0/cv1/act/Mul_output_0
Convolution /model.4/m.0/cv2/conv/Conv 1 1 /model.4/m.0/cv1/act/Mul_output_0 /model.4/m.0/cv2/conv/Conv_output_0 0=32 1=3 4=1 5=1 6=9216
Swish /model.4/m.0/cv2/act/Mul 1 1 /model.4/m.0/cv2/conv/Conv_output_0 /model.4/m.0/cv2/act/Mul_output_0
BinaryOp /model.4/m.0/Add 2 1 /model.4/Split_output_1_splitncnn_1 /model.4/m.0/cv2/act/Mul_output_0 /model.4/m.0/Add_output_0
Split splitncnn_2 1 3 /model.4/m.0/Add_output_0 /model.4/m.0/Add_output_0_splitncnn_0 /model.4/m.0/Add_output_0_splitncnn_1 /model.4/m.0/Add_output_0_splitncnn_2
Convolution /model.4/m.1/cv1/conv/Conv 1 1 /model.4/m.0/Add_output_0_splitncnn_2 /model.4/m.1/cv1/conv/Conv_output_0 0=32 1=3 4=1 5=1 6=9216
Swish /model.4/m.1/cv1/act/Mul 1 1 /model.4/m.1/cv1/conv/Conv_output_0 /model.4/m.1/cv1/act/Mul_output_0
Convolution /model.4/m.1/cv2/conv/Conv 1 1 /model.4/m.1/cv1/act/Mul_output_0 /model.4/m.1/cv2/conv/Conv_output_0 0=32 1=3 4=1 5=1 6=9216
Swish /model.4/m.1/cv2/act/Mul 1 1 /model.4/m.1/cv2/conv/Conv_output_0 /model.4/m.1/cv2/act/Mul_output_0
BinaryOp /model.4/m.1/Add 2 1 /model.4/m.0/Add_output_0_splitncnn_1 /model.4/m.1/cv2/act/Mul_output_0 /model.4/m.1/Add_output_0
Concat /model.4/Concat 4 1 /model.4/Split_output_0 /model.4/Split_output_1_splitncnn_0 /model.4/m.0/Add_output_0_splitncnn_0 /model.4/m.1/Add_output_0 /model.4/Concat_output_0
Convolution /model.4/cv2/conv/Conv 1 1 /model.4/Concat_output_0 /model.4/cv2/conv/Conv_output_0 0=64 1=1 5=1 6=8192
Swish /model.4/cv2/act/Mul 1 1 /model.4/cv2/conv/Conv_output_0 /model.4/cv2/act/Mul_output_0
Split splitncnn_3 1 2 /model.4/cv2/act/Mul_output_0 /model.4/cv2/act/Mul_output_0_splitncnn_0 /model.4/cv2/act/Mul_output_0_splitncnn_1
Convolution /model.5/conv/Conv 1 1 /model.4/cv2/act/Mul_output_0_splitncnn_1 /model.5/conv/Conv_output_0 0=128 1=3 3=2 4=1 5=1 6=73728
Swish /model.5/act/Mul 1 1 /model.5/conv/Conv_output_0 /model.5/act/Mul_output_0
Convolution /model.6/cv1/conv/Conv 1 1 /model.5/act/Mul_output_0 /model.6/cv1/conv/Conv_output_0 0=128 1=1 5=1 6=16384
Swish /model.6/cv1/act/Mul 1 1 /model.6/cv1/conv/Conv_output_0 /model.6/cv1/act/Mul_output_0
Slice /model.6/Split 1 2 /model.6/cv1/act/Mul_output_0 /model.6/Split_output_0 /model.6/Split_output_1 -23300=2,64,-233
Split splitncnn_4 1 3 /model.6/Split_output_1 /model.6/Split_output_1_splitncnn_0 /model.6/Split_output_1_splitncnn_1 /model.6/Split_output_1_splitncnn_2
Convolution /model.6/m.0/cv1/conv/Conv 1 1 /model.6/Split_output_1_splitncnn_2 /model.6/m.0/cv1/conv/Conv_output_0 0=64 1=3 4=1 5=1 6=36864
Swish /model.6/m.0/cv1/act/Mul 1 1 /model.6/m.0/cv1/conv/Conv_output_0 /model.6/m.0/cv1/act/Mul_output_0
Convolution /model.6/m.0/cv2/conv/Conv 1 1 /model.6/m.0/cv1/act/Mul_output_0 /model.6/m.0/cv2/conv/Conv_output_0 0=64 1=3 4=1 5=1 6=36864
Swish /model.6/m.0/cv2/act/Mul 1 1 /model.6/m.0/cv2/conv/Conv_output_0 /model.6/m.0/cv2/act/Mul_output_0
BinaryOp /model.6/m.0/Add 2 1 /model.6/Split_output_1_splitncnn_1 /model.6/m.0/cv2/act/Mul_output_0 /model.6/m.0/Add_output_0
Split splitncnn_5 1 3 /model.6/m.0/Add_output_0 /model.6/m.0/Add_output_0_splitncnn_0 /model.6/m.0/Add_output_0_splitncnn_1 /model.6/m.0/Add_output_0_splitncnn_2
Convolution /model.6/m.1/cv1/conv/Conv 1 1 /model.6/m.0/Add_output_0_splitncnn_2 /model.6/m.1/cv1/conv/Conv_output_0 0=64 1=3 4=1 5=1 6=36864
Swish /model.6/m.1/cv1/act/Mul 1 1 /model.6/m.1/cv1/conv/Conv_output_0 /model.6/m.1/cv1/act/Mul_output_0
Convolution /model.6/m.1/cv2/conv/Conv 1 1 /model.6/m.1/cv1/act/Mul_output_0 /model.6/m.1/cv2/conv/Conv_output_0 0=64 1=3 4=1 5=1 6=36864
Swish /model.6/m.1/cv2/act/Mul 1 1 /model.6/m.1/cv2/conv/Conv_output_0 /model.6/m.1/cv2/act/Mul_output_0
BinaryOp /model.6/m.1/Add 2 1 /model.6/m.0/Add_output_0_splitncnn_1 /model.6/m.1/cv2/act/Mul_output_0 /model.6/m.1/Add_output_0
Concat /model.6/Concat 4 1 /model.6/Split_output_0 /model.6/Split_output_1_splitncnn_0 /model.6/m.0/Add_output_0_splitncnn_0 /model.6/m.1/Add_output_0 /model.6/Concat_output_0
Convolution /model.6/cv2/conv/Conv 1 1 /model.6/Concat_output_0 /model.6/cv2/conv/Conv_output_0 0=128 1=1 5=1 6=32768
Swish /model.6/cv2/act/Mul 1 1 /model.6/cv2/conv/Conv_output_0 /model.6/cv2/act/Mul_output_0
Split splitncnn_6 1 2 /model.6/cv2/act/Mul_output_0 /model.6/cv2/act/Mul_output_0_splitncnn_0 /model.6/cv2/act/Mul_output_0_splitncnn_1
Convolution /model.7/conv/Conv 1 1 /model.6/cv2/act/Mul_output_0_splitncnn_1 /model.7/conv/Conv_output_0 0=256 1=3 3=2 4=1 5=1 6=294912
Swish /model.7/act/Mul 1 1 /model.7/conv/Conv_output_0 /model.7/act/Mul_output_0
Convolution /model.8/cv1/conv/Conv 1 1 /model.7/act/Mul_output_0 /model.8/cv1/conv/Conv_output_0 0=256 1=1 5=1 6=65536
Swish /model.8/cv1/act/Mul 1 1 /model.8/cv1/conv/Conv_output_0 /model.8/cv1/act/Mul_output_0
Slice /model.8/Split 1 2 /model.8/cv1/act/Mul_output_0 /model.8/Split_output_0 /model.8/Split_output_1 -23300=2,128,-233
Split splitncnn_7 1 3 /model.8/Split_output_1 /model.8/Split_output_1_splitncnn_0 /model.8/Split_output_1_splitncnn_1 /model.8/Split_output_1_splitncnn_2
Convolution /model.8/m.0/cv1/conv/Conv 1 1 /model.8/Split_output_1_splitncnn_2 /model.8/m.0/cv1/conv/Conv_output_0 0=128 1=3 4=1 5=1 6=147456
Swish /model.8/m.0/cv1/act/Mul 1 1 /model.8/m.0/cv1/conv/Conv_output_0 /model.8/m.0/cv1/act/Mul_output_0
Convolution /model.8/m.0/cv2/conv/Conv 1 1 /model.8/m.0/cv1/act/Mul_output_0 /model.8/m.0/cv2/conv/Conv_output_0 0=128 1=3 4=1 5=1 6=147456
Swish /model.8/m.0/cv2/act/Mul 1 1 /model.8/m.0/cv2/conv/Conv_output_0 /model.8/m.0/cv2/act/Mul_output_0
BinaryOp /model.8/m.0/Add 2 1 /model.8/Split_output_1_splitncnn_1 /model.8/m.0/cv2/act/Mul_output_0 /model.8/m.0/Add_output_0
Concat /model.8/Concat 3 1 /model.8/Split_output_0 /model.8/Split_output_1_splitncnn_0 /model.8/m.0/Add_output_0 /model.8/Concat_output_0
Convolution /model.8/cv2/conv/Conv 1 1 /model.8/Concat_output_0 /model.8/cv2/conv/Conv_output_0 0=256 1=1 5=1 6=98304
Swish /model.8/cv2/act/Mul 1 1 /model.8/cv2/conv/Conv_output_0 /model.8/cv2/act/Mul_output_0
Convolution /model.9/cv1/conv/Conv 1 1 /model.8/cv2/act/Mul_output_0 /model.9/cv1/conv/Conv_output_0 0=128 1=1 5=1 6=32768
Swish /model.9/cv1/act/Mul 1 1 /model.9/cv1/conv/Conv_output_0 /model.9/cv1/act/Mul_output_0
Split splitncnn_8 1 2 /model.9/cv1/act/Mul_output_0 /model.9/cv1/act/Mul_output_0_splitncnn_0 /model.9/cv1/act/Mul_output_0_splitncnn_1
Pooling /model.9/m/MaxPool 1 1 /model.9/cv1/act/Mul_output_0_splitncnn_1 /model.9/m/MaxPool_output_0 1=5 3=2 5=1
Split splitncnn_9 1 2 /model.9/m/MaxPool_output_0 /model.9/m/MaxPool_output_0_splitncnn_0 /model.9/m/MaxPool_output_0_splitncnn_1
Pooling /model.9/m_1/MaxPool 1 1 /model.9/m/MaxPool_output_0_splitncnn_1 /model.9/m_1/MaxPool_output_0 1=5 3=2 5=1
Split splitncnn_10 1 2 /model.9/m_1/MaxPool_output_0 /model.9/m_1/MaxPool_output_0_splitncnn_0 /model.9/m_1/MaxPool_output_0_splitncnn_1
Pooling /model.9/m_2/MaxPool 1 1 /model.9/m_1/MaxPool_output_0_splitncnn_1 /model.9/m_2/MaxPool_output_0 1=5 3=2 5=1
Concat /model.9/Concat 4 1 /model.9/cv1/act/Mul_output_0_splitncnn_0 /model.9/m/MaxPool_output_0_splitncnn_0 /model.9/m_1/MaxPool_output_0_splitncnn_0 /model.9/m_2/MaxPool_output_0 /model.9/Concat_output_0
Convolution /model.9/cv2/conv/Conv 1 1 /model.9/Concat_output_0 /model.9/cv2/conv/Conv_output_0 0=256 1=1 5=1 6=131072
Swish /model.9/cv2/act/Mul 1 1 /model.9/cv2/conv/Conv_output_0 /model.9/cv2/act/Mul_output_0
Split splitncnn_11 1 2 /model.9/cv2/act/Mul_output_0 /model.9/cv2/act/Mul_output_0_splitncnn_0 /model.9/cv2/act/Mul_output_0_splitncnn_1
Interp /model.10/Resize 1 1 /model.9/cv2/act/Mul_output_0_splitncnn_1 /model.10/Resize_output_0 0=1 1=2.000000e+00 2=2.000000e+00
Concat /model.11/Concat 2 1 /model.10/Resize_output_0 /model.6/cv2/act/Mul_output_0_splitncnn_0 /model.11/Concat_output_0
Convolution /model.12/cv1/conv/Conv 1 1 /model.11/Concat_output_0 /model.12/cv1/conv/Conv_output_0 0=128 1=1 5=1 6=49152
Swish /model.12/cv1/act/Mul 1 1 /model.12/cv1/conv/Conv_output_0 /model.12/cv1/act/Mul_output_0
Slice /model.12/Split 1 2 /model.12/cv1/act/Mul_output_0 /model.12/Split_output_0 /model.12/Split_output_1 -23300=2,64,-233
Split splitncnn_12 1 2 /model.12/Split_output_1 /model.12/Split_output_1_splitncnn_0 /model.12/Split_output_1_splitncnn_1
Convolution /model.12/m.0/cv1/conv/Conv 1 1 /model.12/Split_output_1_splitncnn_1 /model.12/m.0/cv1/conv/Conv_output_0 0=64 1=3 4=1 5=1 6=36864
Swish /model.12/m.0/cv1/act/Mul 1 1 /model.12/m.0/cv1/conv/Conv_output_0 /model.12/m.0/cv1/act/Mul_output_0
Convolution /model.12/m.0/cv2/conv/Conv 1 1 /model.12/m.0/cv1/act/Mul_output_0 /model.12/m.0/cv2/conv/Conv_output_0 0=64 1=3 4=1 5=1 6=36864
Swish /model.12/m.0/cv2/act/Mul 1 1 /model.12/m.0/cv2/conv/Conv_output_0 /model.12/m.0/cv2/act/Mul_output_0
Concat /model.12/Concat 3 1 /model.12/Split_output_0 /model.12/Split_output_1_splitncnn_0 /model.12/m.0/cv2/act/Mul_output_0 /model.12/Concat_output_0
Convolution /model.12/cv2/conv/Conv 1 1 /model.12/Concat_output_0 /model.12/cv2/conv/Conv_output_0 0=128 1=1 5=1 6=24576
Swish /model.12/cv2/act/Mul 1 1 /model.12/cv2/conv/Conv_output_0 /model.12/cv2/act/Mul_output_0
Split splitncnn_13 1 2 /model.12/cv2/act/Mul_output_0 /model.12/cv2/act/Mul_output_0_splitncnn_0 /model.12/cv2/act/Mul_output_0_splitncnn_1
Interp /model.13/Resize 1 1 /model.12/cv2/act/Mul_output_0_splitncnn_1 /model.13/Resize_output_0 0=1 1=2.000000e+00 2=2.000000e+00
Concat /model.14/Concat 2 1 /model.13/Resize_output_0 /model.4/cv2/act/Mul_output_0_splitncnn_0 /model.14/Concat_output_0
Convolution /model.15/cv1/conv/Conv 1 1 /model.14/Concat_output_0 /model.15/cv1/conv/Conv_output_0 0=64 1=1 5=1 6=12288
Swish /model.15/cv1/act/Mul 1 1 /model.15/cv1/conv/Conv_output_0 /model.15/cv1/act/Mul_output_0
Slice /model.15/Split 1 2 /model.15/cv1/act/Mul_output_0 /model.15/Split_output_0 /model.15/Split_output_1 -23300=2,32,-233
Split splitncnn_14 1 2 /model.15/Split_output_1 /model.15/Split_output_1_splitncnn_0 /model.15/Split_output_1_splitncnn_1
Convolution /model.15/m.0/cv1/conv/Conv 1 1 /model.15/Split_output_1_splitncnn_1 /model.15/m.0/cv1/conv/Conv_output_0 0=32 1=3 4=1 5=1 6=9216
Swish /model.15/m.0/cv1/act/Mul 1 1 /model.15/m.0/cv1/conv/Conv_output_0 /model.15/m.0/cv1/act/Mul_output_0
Convolution /model.15/m.0/cv2/conv/Conv 1 1 /model.15/m.0/cv1/act/Mul_output_0 /model.15/m.0/cv2/conv/Conv_output_0 0=32 1=3 4=1 5=1 6=9216
Swish /model.15/m.0/cv2/act/Mul 1 1 /model.15/m.0/cv2/conv/Conv_output_0 /model.15/m.0/cv2/act/Mul_output_0
Concat /model.15/Concat 3 1 /model.15/Split_output_0 /model.15/Split_output_1_splitncnn_0 /model.15/m.0/cv2/act/Mul_output_0 /model.15/Concat_output_0
Convolution /model.15/cv2/conv/Conv 1 1 /model.15/Concat_output_0 /model.15/cv2/conv/Conv_output_0 0=64 1=1 5=1 6=6144
Swish /model.15/cv2/act/Mul 1 1 /model.15/cv2/conv/Conv_output_0 /model.15/cv2/act/Mul_output_0
Split splitncnn_15 1 3 /model.15/cv2/act/Mul_output_0 /model.15/cv2/act/Mul_output_0_splitncnn_0 /model.15/cv2/act/Mul_output_0_splitncnn_1 /model.15/cv2/act/Mul_output_0_splitncnn_2
Convolution /model.16/conv/Conv 1 1 /model.15/cv2/act/Mul_output_0_splitncnn_2 /model.16/conv/Conv_output_0 0=64 1=3 3=2 4=1 5=1 6=36864
Swish /model.16/act/Mul 1 1 /model.16/conv/Conv_output_0 /model.16/act/Mul_output_0
Concat /model.17/Concat 2 1 /model.16/act/Mul_output_0 /model.12/cv2/act/Mul_output_0_splitncnn_0 /model.17/Concat_output_0
Convolution /model.18/cv1/conv/Conv 1 1 /model.17/Concat_output_0 /model.18/cv1/conv/Conv_output_0 0=128 1=1 5=1 6=24576
Swish /model.18/cv1/act/Mul 1 1 /model.18/cv1/conv/Conv_output_0 /model.18/cv1/act/Mul_output_0
Slice /model.18/Split 1 2 /model.18/cv1/act/Mul_output_0 /model.18/Split_output_0 /model.18/Split_output_1 -23300=2,64,-233
Split splitncnn_16 1 2 /model.18/Split_output_1 /model.18/Split_output_1_splitncnn_0 /model.18/Split_output_1_splitncnn_1
Convolution /model.18/m.0/cv1/conv/Conv 1 1 /model.18/Split_output_1_splitncnn_1 /model.18/m.0/cv1/conv/Conv_output_0 0=64 1=3 4=1 5=1 6=36864
Swish /model.18/m.0/cv1/act/Mul 1 1 /model.18/m.0/cv1/conv/Conv_output_0 /model.18/m.0/cv1/act/Mul_output_0
Convolution /model.18/m.0/cv2/conv/Conv 1 1 /model.18/m.0/cv1/act/Mul_output_0 /model.18/m.0/cv2/conv/Conv_output_0 0=64 1=3 4=1 5=1 6=36864
Swish /model.18/m.0/cv2/act/Mul 1 1 /model.18/m.0/cv2/conv/Conv_output_0 /model.18/m.0/cv2/act/Mul_output_0
Concat /model.18/Concat 3 1 /model.18/Split_output_0 /model.18/Split_output_1_splitncnn_0 /model.18/m.0/cv2/act/Mul_output_0 /model.18/Concat_output_0
Convolution /model.18/cv2/conv/Conv 1 1 /model.18/Concat_output_0 /model.18/cv2/conv/Conv_output_0 0=128 1=1 5=1 6=24576
Swish /model.18/cv2/act/Mul 1 1 /model.18/cv2/conv/Conv_output_0 /model.18/cv2/act/Mul_output_0
Split splitncnn_17 1 3 /model.18/cv2/act/Mul_output_0 /model.18/cv2/act/Mul_output_0_splitncnn_0 /model.18/cv2/act/Mul_output_0_splitncnn_1 /model.18/cv2/act/Mul_output_0_splitncnn_2
Convolution /model.19/conv/Conv 1 1 /model.18/cv2/act/Mul_output_0_splitncnn_2 /model.19/conv/Conv_output_0 0=128 1=3 3=2 4=1 5=1 6=147456
Swish /model.19/act/Mul 1 1 /model.19/conv/Conv_output_0 /model.19/act/Mul_output_0
Concat /model.20/Concat 2 1 /model.19/act/Mul_output_0 /model.9/cv2/act/Mul_output_0_splitncnn_0 /model.20/Concat_output_0
Convolution /model.21/cv1/conv/Conv 1 1 /model.20/Concat_output_0 /model.21/cv1/conv/Conv_output_0 0=256 1=1 5=1 6=98304
Swish /model.21/cv1/act/Mul 1 1 /model.21/cv1/conv/Conv_output_0 /model.21/cv1/act/Mul_output_0
Slice /model.21/Split 1 2 /model.21/cv1/act/Mul_output_0 /model.21/Split_output_0 /model.21/Split_output_1 -23300=2,128,-233
Split splitncnn_18 1 2 /model.21/Split_output_1 /model.21/Split_output_1_splitncnn_0 /model.21/Split_output_1_splitncnn_1
Convolution /model.21/m.0/cv1/conv/Conv 1 1 /model.21/Split_output_1_splitncnn_1 /model.21/m.0/cv1/conv/Conv_output_0 0=128 1=3 4=1 5=1 6=147456
Swish /model.21/m.0/cv1/act/Mul 1 1 /model.21/m.0/cv1/conv/Conv_output_0 /model.21/m.0/cv1/act/Mul_output_0
Convolution /model.21/m.0/cv2/conv/Conv 1 1 /model.21/m.0/cv1/act/Mul_output_0 /model.21/m.0/cv2/conv/Conv_output_0 0=128 1=3 4=1 5=1 6=147456
Swish /model.21/m.0/cv2/act/Mul 1 1 /model.21/m.0/cv2/conv/Conv_output_0 /model.21/m.0/cv2/act/Mul_output_0
Concat /model.21/Concat 3 1 /model.21/Split_output_0 /model.21/Split_output_1_splitncnn_0 /model.21/m.0/cv2/act/Mul_output_0 /model.21/Concat_output_0
Convolution /model.21/cv2/conv/Conv 1 1 /model.21/Concat_output_0 /model.21/cv2/conv/Conv_output_0 0=256 1=1 5=1 6=98304
Swish /model.21/cv2/act/Mul 1 1 /model.21/cv2/conv/Conv_output_0 /model.21/cv2/act/Mul_output_0
Split splitncnn_19 1 2 /model.21/cv2/act/Mul_output_0 /model.21/cv2/act/Mul_output_0_splitncnn_0 /model.21/cv2/act/Mul_output_0_splitncnn_1
Convolution /model.22/cv2.0/cv2.0.0/conv/Conv 1 1 /model.15/cv2/act/Mul_output_0_splitncnn_1 /model.22/cv2.0/cv2.0.0/conv/Conv_output_0 0=64 1=3 4=1 5=1 6=36864
Swish /model.22/cv2.0/cv2.0.0/act/Mul 1 1 /model.22/cv2.0/cv2.0.0/conv/Conv_output_0 /model.22/cv2.0/cv2.0.0/act/Mul_output_0
Convolution /model.22/cv2.0/cv2.0.1/conv/Conv 1 1 /model.22/cv2.0/cv2.0.0/act/Mul_output_0 /model.22/cv2.0/cv2.0.1/conv/Conv_output_0 0=64 1=3 4=1 5=1 6=36864
Swish /model.22/cv2.0/cv2.0.1/act/Mul 1 1 /model.22/cv2.0/cv2.0.1/conv/Conv_output_0 /model.22/cv2.0/cv2.0.1/act/Mul_output_0
Convolution /model.22/cv2.0/cv2.0.2/Conv 1 1 /model.22/cv2.0/cv2.0.1/act/Mul_output_0 /model.22/cv2.0/cv2.0.2/Conv_output_0 0=64 1=1 5=1 6=4096
Convolution /model.22/cv3.0/cv3.0.0/conv/Conv 1 1 /model.15/cv2/act/Mul_output_0_splitncnn_0 /model.22/cv3.0/cv3.0.0/conv/Conv_output_0 0=80 1=3 4=1 5=1 6=46080
Swish /model.22/cv3.0/cv3.0.0/act/Mul 1 1 /model.22/cv3.0/cv3.0.0/conv/Conv_output_0 /model.22/cv3.0/cv3.0.0/act/Mul_output_0
Convolution /model.22/cv3.0/cv3.0.1/conv/Conv 1 1 /model.22/cv3.0/cv3.0.0/act/Mul_output_0 /model.22/cv3.0/cv3.0.1/conv/Conv_output_0 0=80 1=3 4=1 5=1 6=57600
Swish /model.22/cv3.0/cv3.0.1/act/Mul 1 1 /model.22/cv3.0/cv3.0.1/conv/Conv_output_0 /model.22/cv3.0/cv3.0.1/act/Mul_output_0
Convolution /model.22/cv3.0/cv3.0.2/Conv 1 1 /model.22/cv3.0/cv3.0.1/act/Mul_output_0 /model.22/cv3.0/cv3.0.2/Conv_output_0 0=80 1=1 5=1 6=6400
Concat /model.22/Concat 2 1 /model.22/cv2.0/cv2.0.2/Conv_output_0 /model.22/cv3.0/cv3.0.2/Conv_output_0 /model.22/Concat_output_0
Convolution /model.22/cv2.1/cv2.1.0/conv/Conv 1 1 /model.18/cv2/act/Mul_output_0_splitncnn_1 /model.22/cv2.1/cv2.1.0/conv/Conv_output_0 0=64 1=3 4=1 5=1 6=73728
Swish /model.22/cv2.1/cv2.1.0/act/Mul 1 1 /model.22/cv2.1/cv2.1.0/conv/Conv_output_0 /model.22/cv2.1/cv2.1.0/act/Mul_output_0
Convolution /model.22/cv2.1/cv2.1.1/conv/Conv 1 1 /model.22/cv2.1/cv2.1.0/act/Mul_output_0 /model.22/cv2.1/cv2.1.1/conv/Conv_output_0 0=64 1=3 4=1 5=1 6=36864
Swish /model.22/cv2.1/cv2.1.1/act/Mul 1 1 /model.22/cv2.1/cv2.1.1/conv/Conv_output_0 /model.22/cv2.1/cv2.1.1/act/Mul_output_0
Convolution /model.22/cv2.1/cv2.1.2/Conv 1 1 /model.22/cv2.1/cv2.1.1/act/Mul_output_0 /model.22/cv2.1/cv2.1.2/Conv_output_0 0=64 1=1 5=1 6=4096
Convolution /model.22/cv3.1/cv3.1.0/conv/Conv 1 1 /model.18/cv2/act/Mul_output_0_splitncnn_0 /model.22/cv3.1/cv3.1.0/conv/Conv_output_0 0=80 1=3 4=1 5=1 6=92160
Swish /model.22/cv3.1/cv3.1.0/act/Mul 1 1 /model.22/cv3.1/cv3.1.0/conv/Conv_output_0 /model.22/cv3.1/cv3.1.0/act/Mul_output_0
Convolution /model.22/cv3.1/cv3.1.1/conv/Conv 1 1 /model.22/cv3.1/cv3.1.0/act/Mul_output_0 /model.22/cv3.1/cv3.1.1/conv/Conv_output_0 0=80 1=3 4=1 5=1 6=57600
Swish /model.22/cv3.1/cv3.1.1/act/Mul 1 1 /model.22/cv3.1/cv3.1.1/conv/Conv_output_0 /model.22/cv3.1/cv3.1.1/act/Mul_output_0
Convolution /model.22/cv3.1/cv3.1.2/Conv 1 1 /model.22/cv3.1/cv3.1.1/act/Mul_output_0 /model.22/cv3.1/cv3.1.2/Conv_output_0 0=80 1=1 5=1 6=6400
Concat /model.22/Concat_1 2 1 /model.22/cv2.1/cv2.1.2/Conv_output_0 /model.22/cv3.1/cv3.1.2/Conv_output_0 /model.22/Concat_1_output_0
Convolution /model.22/cv2.2/cv2.2.0/conv/Conv 1 1 /model.21/cv2/act/Mul_output_0_splitncnn_1 /model.22/cv2.2/cv2.2.0/conv/Conv_output_0 0=64 1=3 4=1 5=1 6=147456
Swish /model.22/cv2.2/cv2.2.0/act/Mul 1 1 /model.22/cv2.2/cv2.2.0/conv/Conv_output_0 /model.22/cv2.2/cv2.2.0/act/Mul_output_0
Convolution /model.22/cv2.2/cv2.2.1/conv/Conv 1 1 /model.22/cv2.2/cv2.2.0/act/Mul_output_0 /model.22/cv2.2/cv2.2.1/conv/Conv_output_0 0=64 1=3 4=1 5=1 6=36864
Swish /model.22/cv2.2/cv2.2.1/act/Mul 1 1 /model.22/cv2.2/cv2.2.1/conv/Conv_output_0 /model.22/cv2.2/cv2.2.1/act/Mul_output_0
Convolution /model.22/cv2.2/cv2.2.2/Conv 1 1 /model.22/cv2.2/cv2.2.1/act/Mul_output_0 /model.22/cv2.2/cv2.2.2/Conv_output_0 0=64 1=1 5=1 6=4096
Convolution /model.22/cv3.2/cv3.2.0/conv/Conv 1 1 /model.21/cv2/act/Mul_output_0_splitncnn_0 /model.22/cv3.2/cv3.2.0/conv/Conv_output_0 0=80 1=3 4=1 5=1 6=184320
Swish /model.22/cv3.2/cv3.2.0/act/Mul 1 1 /model.22/cv3.2/cv3.2.0/conv/Conv_output_0 /model.22/cv3.2/cv3.2.0/act/Mul_output_0
Convolution /model.22/cv3.2/cv3.2.1/conv/Conv 1 1 /model.22/cv3.2/cv3.2.0/act/Mul_output_0 /model.22/cv3.2/cv3.2.1/conv/Conv_output_0 0=80 1=3 4=1 5=1 6=57600
Swish /model.22/cv3.2/cv3.2.1/act/Mul 1 1 /model.22/cv3.2/cv3.2.1/conv/Conv_output_0 /model.22/cv3.2/cv3.2.1/act/Mul_output_0
Convolution /model.22/cv3.2/cv3.2.2/Conv 1 1 /model.22/cv3.2/cv3.2.1/act/Mul_output_0 /model.22/cv3.2/cv3.2.2/Conv_output_0 0=80 1=1 5=1 6=6400
Concat /model.22/Concat_2 2 1 /model.22/cv2.2/cv2.2.2/Conv_output_0 /model.22/cv3.2/cv3.2.2/Conv_output_0 /model.22/Concat_2_output_0
Reshape /model.22/Reshape 1 1 /model.22/Concat_output_0 /model.22/Reshape_output_0 0=-1 1=144
Reshape /model.22/Reshape_1 1 1 /model.22/Concat_1_output_0 /model.22/Reshape_1_output_0 0=-1 1=144
Reshape /model.22/Reshape_2 1 1 /model.22/Concat_2_output_0 /model.22/Reshape_2_output_0 0=-1 1=144
Concat /model.22/Concat_3 3 1 /model.22/Reshape_output_0 /model.22/Reshape_1_output_0 /model.22/Reshape_2_output_0 /model.22/Concat_3_output_0 0=1
Permute /model.22/Transpose 1 1 /model.22/Concat_3_output_0 output0 0=1
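A quick way to check this without eyeballing the whole file is a small hypothetical helper: in the ncnn .param format, after the magic number and the layer/blob count lines, each layer line begins with its layer type, so the first token of the last line is what we want.

```python
def last_layer_type(param_path):
    """Return the layer type of the last layer in an ncnn .param file."""
    with open(param_path) as f:
        rows = [line.split() for line in f if line.strip()]
    return rows[-1][0]  # first token of the last non-empty line

# for the model above, last_layer_type("yolov8n.param") should print "Permute"
```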

Summary

In my experience, as long as the exported ONNX model uses opset=12 or 11, its output dimensions are 1×144×8400, and the last layer of the param file after ncnn conversion is a Permute layer, everything should work.

Code Testing

There are several reference code bases online. Of the three I tried, the first targets segmentation tasks and the second produced scattered boxes in my tests, so I adopted the third author's code (with a few small tweaks). The code is below. If you followed my steps exactly it should work as-is, but don't rush to read it; look at the modification notes below first.

yolo.h

#ifndef YOLO_H
#define YOLO_H

#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <net.h>
#include <vector>

struct Object
{
    cv::Rect_<float> rect;
    int label;
    float prob;
};

struct GridAndStride
{
    int grid0;
    int grid1;
    int stride;
};

class Yolo
{
public:
    Yolo();

    int load(int target_size, const float* mean_vals, const float* norm_vals, bool use_gpu = false);

    int detect(const cv::Mat& rgb, std::vector<Object>& objects, float prob_threshold = 0.4f, float nms_threshold = 0.5f);

    int draw(cv::Mat& rgb, const std::vector<Object>& objects);

private:
    ncnn::Net yolo;
    int target_size;
    float mean_vals[3];
    float norm_vals[3];
    ncnn::UnlockedPoolAllocator blob_pool_allocator;
    ncnn::PoolAllocator workspace_pool_allocator;
};

#endif

yolo.cpp

#include "yolo.h"

#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>

//#include "cpu.h"
#include <iostream>

static float fast_exp(float x)
{
union {
uint32_t i;
float f;
} v{};
v.i = (1 << 23) * (1.4426950409 * x + 126.93490512f);
return v.f;
}

static float sigmoid(float x)
{
return 1.0f / (1.0f + fast_exp(-x));
}
static float intersection_area(const Object& a, const Object& b)
{
cv::Rect_<float> inter = a.rect & b.rect;
return inter.area();
}

static void qsort_descent_inplace(std::vector<Object>& faceobjects, int left, int right)
{
int i = left;
int j = right;
float p = faceobjects[(left + right) / 2].prob;

while (i <= j)
{
while (faceobjects[i].prob > p)
i++;

while (faceobjects[j].prob < p)
j--;

if (i <= j)
{
// swap
std::swap(faceobjects[i], faceobjects[j]);

i++;
j--;
}
}

// #pragma omp parallel sections
{
// #pragma omp section
{
if (left < j) qsort_descent_inplace(faceobjects, left, j);
}
// #pragma omp section
{
if (i < right) qsort_descent_inplace(faceobjects, i, right);
}
}
}

static void qsort_descent_inplace(std::vector<Object>& faceobjects)
{
if (faceobjects.empty())
return;

qsort_descent_inplace(faceobjects, 0, faceobjects.size() - 1);
}

static void nms_sorted_bboxes(const std::vector<Object>& faceobjects, std::vector<int>& picked, float nms_threshold)
{
picked.clear();

const int n = faceobjects.size();

std::vector<float> areas(n);
for (int i = 0; i < n; i++)
{
areas[i] = faceobjects[i].rect.width * faceobjects[i].rect.height;
}

for (int i = 0; i < n; i++)
{
const Object& a = faceobjects[i];

int keep = 1;
for (int j = 0; j < (int)picked.size(); j++)
{
const Object& b = faceobjects[picked[j]];

// intersection over union
float inter_area = intersection_area(a, b);
float union_area = areas[i] + areas[picked[j]] - inter_area;
// float IoU = inter_area / union_area
if (inter_area / union_area > nms_threshold)
keep = 0;
}

if (keep)
picked.push_back(i);
}
}
static void generate_grids_and_stride(const int target_w, const int target_h, std::vector<int>& strides, std::vector<GridAndStride>& grid_strides)
{
for (int i = 0; i < (int)strides.size(); i++)
{
int stride = strides[i];
int num_grid_w = target_w / stride;
int num_grid_h = target_h / stride;
for (int g1 = 0; g1 < num_grid_h; g1++)
{
for (int g0 = 0; g0 < num_grid_w; g0++)
{
GridAndStride gs;
gs.grid0 = g0;
gs.grid1 = g1;
gs.stride = stride;
grid_strides.push_back(gs);
}
}
}
}
static void generate_proposals(std::vector<GridAndStride> grid_strides, const ncnn::Mat& pred, float prob_threshold, std::vector<Object>& objects)
{
const int num_points = grid_strides.size();
const int num_class = 80; // <-- set this to your model's class count
const int reg_max_1 = 16;

for (int i = 0; i < num_points; i++)
{
const float* scores = pred.row(i) + 4 * reg_max_1;

// find label with max score
int label = -1;
float score = -FLT_MAX;
for (int k = 0; k < num_class; k++)
{
float confidence = scores[k];
if (confidence > score)
{
label = k;
score = confidence;
}
}
float box_prob = sigmoid(score);
if (box_prob >= prob_threshold)
{
ncnn::Mat bbox_pred(reg_max_1, 4, (void*)pred.row(i));
{
ncnn::Layer* softmax = ncnn::create_layer("Softmax");

ncnn::ParamDict pd;
pd.set(0, 1); // axis
pd.set(1, 1);
softmax->load_param(pd);

ncnn::Option opt;
opt.num_threads = 1;
opt.use_packing_layout = false;

softmax->create_pipeline(opt);

softmax->forward_inplace(bbox_pred, opt);

softmax->destroy_pipeline(opt);

delete softmax;
}

float pred_ltrb[4];
for (int k = 0; k < 4; k++)
{
float dis = 0.f;
const float* dis_after_sm = bbox_pred.row(k);
for (int l = 0; l < reg_max_1; l++)
{
dis += l * dis_after_sm[l];
}

pred_ltrb[k] = dis * grid_strides[i].stride;
}

float pb_cx = (grid_strides[i].grid0 + 0.5f) * grid_strides[i].stride;
float pb_cy = (grid_strides[i].grid1 + 0.5f) * grid_strides[i].stride;

float x0 = pb_cx - pred_ltrb[0];
float y0 = pb_cy - pred_ltrb[1];
float x1 = pb_cx + pred_ltrb[2];
float y1 = pb_cy + pred_ltrb[3];

Object obj;
obj.rect.x = x0;
obj.rect.y = y0;
obj.rect.width = x1 - x0;
obj.rect.height = y1 - y0;
obj.label = label;
obj.prob = box_prob;

objects.push_back(obj);
}
}
}

Yolo::Yolo()
{
blob_pool_allocator.set_size_compare_ratio(0.f);
workspace_pool_allocator.set_size_compare_ratio(0.f);
}


int Yolo::load(int _target_size, const float* _mean_vals, const float* _norm_vals, bool use_gpu)
{
/* yolo.clear();
blob_pool_allocator.clear();
workspace_pool_allocator.clear()*/;

// ncnn::set_cpu_powersave(2);
// ncnn::set_omp_num_threads(ncnn::get_big_cpu_count());

yolo.opt = ncnn::Option();

//#if NCNN_VULKAN
// yolo.opt.use_vulkan_compute = use_gpu;
//#endif

//yolo.opt.num_threads = ncnn::get_big_cpu_count();
//yolo.opt.blob_allocator = &blob_pool_allocator;
//yolo.opt.workspace_allocator = &workspace_pool_allocator;



//QFile::copy("assets:/pic/yolov8s_opt.param", "yolov8s_opt.param");
//QFile::copy("assets:/pic/yolov8s_opt.bin", "yolov8s_opt.bin");
yolo.load_param("yolov8n-op11.param");
yolo.load_model("yolov8n-op11.bin");


// yolo.load_param("./model/yolov8s_opt.param");
// yolo.load_model("./model/yolov8s_opt.bin");

target_size = _target_size;
mean_vals[0] = _mean_vals[0];
mean_vals[1] = _mean_vals[1];
mean_vals[2] = _mean_vals[2];
norm_vals[0] = _norm_vals[0];
norm_vals[1] = _norm_vals[1];
norm_vals[2] = _norm_vals[2];

return 0;
}

int Yolo::detect(const cv::Mat& rgb, std::vector<Object>& objects, float prob_threshold, float nms_threshold)
{
int width = rgb.cols;
int height = rgb.rows;

// pad to multiple of 32
int w = width;
int h = height;
float scale = 1.f;
if (w > h)
{
scale = (float)target_size / w;
w = target_size;
h = h * scale;
}
else
{
scale = (float)target_size / h;
h = target_size;
w = w * scale;
}

ncnn::Mat in = ncnn::Mat::from_pixels_resize(rgb.data, ncnn::Mat::PIXEL_RGB2BGR, width, height, w, h);

// pad to target_size rectangle
int wpad = (w + 31) / 32 * 32 - w;
int hpad = (h + 31) / 32 * 32 - h;
ncnn::Mat in_pad;
ncnn::copy_make_border(in, in_pad, hpad / 2, hpad - hpad / 2, wpad / 2, wpad - wpad / 2, ncnn::BORDER_CONSTANT, 0.f);

in_pad.substract_mean_normalize(0, norm_vals);

ncnn::Extractor ex = yolo.create_extractor();

ex.input("images", in_pad);

std::vector<Object> proposals;

ncnn::Mat out;
ex.extract("output0", out);

std::cout << "---------YES, " << out.dims << " width," << out.w << " height," << out.h << " , " << out.c << std::endl;
// parse output0 into proposals, then apply NMS

std::vector<int> strides = { 8, 16, 32 }; // might have stride=64
std::vector<GridAndStride> grid_strides;
generate_grids_and_stride(in_pad.w, in_pad.h, strides, grid_strides);
generate_proposals(grid_strides, out, prob_threshold, proposals);

// sort all proposals by score from highest to lowest
qsort_descent_inplace(proposals);

// apply nms with nms_threshold
std::vector<int> picked;
nms_sorted_bboxes(proposals, picked, nms_threshold);

int count = picked.size();

objects.resize(count);
for (int i = 0; i < count; i++)
{
objects[i] = proposals[picked[i]];

// adjust offset to original unpadded
float x0 = (objects[i].rect.x - (wpad / 2)) / scale;
float y0 = (objects[i].rect.y - (hpad / 2)) / scale;
float x1 = (objects[i].rect.x + objects[i].rect.width - (wpad / 2)) / scale;
float y1 = (objects[i].rect.y + objects[i].rect.height - (hpad / 2)) / scale;

// clip
x0 = std::max(std::min(x0, (float)(width - 1)), 0.f);
y0 = std::max(std::min(y0, (float)(height - 1)), 0.f);
x1 = std::max(std::min(x1, (float)(width - 1)), 0.f);
y1 = std::max(std::min(y1, (float)(height - 1)), 0.f);

objects[i].rect.x = x0;
objects[i].rect.y = y0;
objects[i].rect.width = x1 - x0;
objects[i].rect.height = y1 - y0;
}

// sort objects by area
struct
{
bool operator()(const Object& a, const Object& b) const
{
return a.rect.area() > b.rect.area();
}
} objects_area_greater;
std::sort(objects.begin(), objects.end(), objects_area_greater);

return 0;
}

int Yolo::draw(cv::Mat& rgb, const std::vector<Object>& objects)
{
static const char* class_names[] = {
"person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat", "traffic light",
"fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow",
"elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee",
"skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard",
"tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple",
"sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch",
"potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote", "keyboard", "cell phone",
"microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear",
"hair drier", "toothbrush"
};
// static const char* class_names[] = {"blur", "phone", "reflectLight", "reflection"};
static const unsigned char colors[81][3] = {
{56, 0, 255},
{226, 255, 0},
{0, 94, 255},
{0, 37, 255},
{0, 255, 94},
{255, 226, 0},
{0, 18, 255},
{255, 151, 0},
{170, 0, 255},
{0, 255, 56},
{255, 0, 75},
{0, 75, 255},
{0, 255, 169},
{255, 0, 207},
{75, 255, 0},
{207, 0, 255},
{37, 0, 255},
{0, 207, 255},
{94, 0, 255},
{0, 255, 113},
{255, 18, 0},
{255, 0, 56},
{18, 0, 255},
{0, 255, 226},
{170, 255, 0},
{255, 0, 245},
{151, 255, 0},
{132, 255, 0},
{75, 0, 255},
{151, 0, 255},
{0, 151, 255},
{132, 0, 255},
{0, 255, 245},
{255, 132, 0},
{226, 0, 255},
{255, 37, 0},
{207, 255, 0},
{0, 255, 207},
{94, 255, 0},
{0, 226, 255},
{56, 255, 0},
{255, 94, 0},
{255, 113, 0},
{0, 132, 255},
{255, 0, 132},
{255, 170, 0},
{255, 0, 188},
{113, 255, 0},
{245, 0, 255},
{113, 0, 255},
{255, 188, 0},
{0, 113, 255},
{255, 0, 0},
{0, 56, 255},
{255, 0, 113},
{0, 255, 188},
{255, 0, 94},
{255, 0, 18},
{18, 255, 0},
{0, 255, 132},
{0, 188, 255},
{0, 245, 255},
{0, 169, 255},
{37, 255, 0},
{255, 0, 151},
{188, 0, 255},
{0, 255, 37},
{0, 255, 0},
{255, 0, 170},
{255, 0, 37},
{255, 75, 0},
{0, 0, 255},
{255, 207, 0},
{255, 0, 226},
{255, 245, 0},
{188, 255, 0},
{0, 255, 18},
{0, 255, 75},
{0, 255, 151},
{255, 56, 0},
{245, 255, 0}
};

int color_index = 0;

for (size_t i = 0; i < objects.size(); i++)
{
const Object& obj = objects[i];

// fprintf(stderr, "%d = %.5f at %.2f %.2f %.2f x %.2f\n", obj.label, obj.prob,
// obj.rect.x, obj.rect.y, obj.rect.width, obj.rect.height);

const unsigned char* color = colors[color_index % 81]; // cycle through all 81 colors
color_index++;

cv::Scalar cc(color[0], color[1], color[2]);

cv::rectangle(rgb, obj.rect, cc, 2);

char text[256];
sprintf_s(text, "%s %.1f%%", class_names[obj.label], obj.prob * 100);
//std::string text = class_names[obj.label] + std::to_string(round(obj.prob * 10000)/100);

int baseLine = 0;
cv::Size label_size = cv::getTextSize(text, cv::FONT_HERSHEY_SIMPLEX, 0.5, 1, &baseLine);

int x = obj.rect.x;
int y = obj.rect.y - label_size.height - baseLine;
if (y < 0)
y = 0;
if (x + label_size.width > rgb.cols)
x = rgb.cols - label_size.width;

cv::rectangle(rgb, cv::Rect(cv::Point(x, y), cv::Size(label_size.width, label_size.height + baseLine)), cc, -1);

cv::Scalar textcc = (color[0] + color[1] + color[2] >= 381) ? cv::Scalar(0, 0, 0) : cv::Scalar(255, 255, 255);

cv::putText(rgb, text, cv::Point(x, y + label_size.height), cv::FONT_HERSHEY_SIMPLEX, 0.5, textcc, 1);
}
cv::imshow("image", rgb);
//cv::imwrite("demo.png", image);
cv::waitKey(0);
return 0;
}

main.cpp

#include "yolo.h"
#include "net.h"

#if defined(USE_NCNN_SIMPLEOCV)
#include "simpleocv.h"
#else
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#endif

#include <stdlib.h>
#include <float.h>
#include <stdio.h>
#include <vector>
#include <iostream>

int main()
{
Yolo *yolov8 = new Yolo();
cv::Mat m = cv::imread("image1.png", 1);
int target_size = 640;
float norm_vals[3] = { 1 / 255.f, 1 / 255.f, 1 / 255.f };
float mean_vals[3] = { 103.53f, 116.28f, 123.675f };

yolov8->load(target_size, mean_vals, norm_vals);
std::vector<Object> objects;
yolov8->detect(m, objects);
std::cout << objects.size();
yolov8->draw(m, objects);

return 0;
}

What to modify (only yolo.cpp changes; the code above already reflects the modifications)

Model paths

Around line 256 (the load_param/load_model calls in Yolo::load), change the model files to your own converted .param/.bin pair. If you do not have Qt installed, also comment out the QFile::copy calls above them.

Class count and class labels

Around line 148 (num_class in generate_proposals), set the class count to match your model; for the COCO detection model used in this tutorial it is 80.

The class labels must be updated to match as well, around line 370 (class_names in Yolo::draw).

Summary

Three things must change: the model paths, the class count, and the class labels.

Test results

Using the official yolov8n.pt model, exported with opset=12 and simplify=True (and all three options checked when converting on the online site), the converted model tests as follows:

Summary

The export parameter opset must not be 13. A correctly converted model has an output of shape 1×144×8400, and the last layer of the model graph should be a Permute. If you have modified the ultralytics source but conversion keeps failing, it is probably an environment conflict: the ultralytics package you are importing is not the one you edited.

Several versions of the test code circulate online. If the model was converted correctly they will all run, but their predictions are not necessarily correct, so it is worth trying a few different versions.

Walking through the inference code

The Object struct

struct Object
{
cv::Rect_<float> rect;
int label;
float prob;
cv::Mat mask;
std::vector<float> mask_feat;
};

The GridAndStride struct

struct GridAndStride
{
int grid0;
int grid1;
int stride;
};

main

int main(int argc, char** argv)
{
cv::Mat m = cv::imread("20240416111118.png", 1);
std::vector<Object> objects;
detect_yolov8(m, objects);
draw_objects(m, objects);
return 0;
}

The main member variables of OpenCV's Mat class are:

  1. int flags: flags describing the matrix, such as its element type and channel count.
  2. int dims: the number of dimensions.
  3. int rows: the number of rows (−1 when dims > 2).
  4. int cols: the number of columns (−1 when dims > 2).
  5. uchar* data: pointer to the matrix data.
  6. MatAllocator* allocator: a plain pointer to a custom memory allocator (not a smart pointer).
  7. MatSize size: the size of each dimension.
  8. MatStep step: the step (stride in bytes) of each dimension.

detect_yolov8

static int detect_yolov8(const cv::Mat& bgr, std::vector<Object>& objects)
{
ncnn::Net yolov8; // define the network

yolov8.load_param("yolov8s-op13.param"); // load the network structure
yolov8.load_model("yolov8s-op13.bin"); // load the weights

int width = bgr.cols; // image width
int height = bgr.rows; // image height

const int target_size = 640; // network input size
const float prob_threshold = 0.4f;
const float nms_threshold = 0.5f;

// letterbox: resize keeping the aspect ratio, then pad the borders
int w = width;
int h = height;
float scale = 1.f;
/*
1.f is a float literal, so the initializer already has the variable's type.
1.0 is a double literal that would be narrowed to float on assignment, and 1
is an int that would be converted. All three compile, but 1.f states the
intended single-precision type explicitly, so it is the clearest choice.
*/

// scale so the longer side equals target_size (aspect ratio preserved)
if (w > h)
{
scale = (float)target_size / w;
w = target_size;
h = h * scale;
}
else
{
scale = (float)target_size / h;
h = target_size;
w = w * scale;
}

ncnn::Mat in = ncnn::Mat::from_pixels_resize(bgr.data, ncnn::Mat::PIXEL_BGR2RGB, width, height, w, h);

// pad to target_size rectangle
int wpad = (w + 31) / 32 * 32 - w; // pixels needed to round the width up to a multiple of 32
int hpad = (h + 31) / 32 * 32 - h; // pixels needed to round the height up to a multiple of 32
ncnn::Mat in_pad;
ncnn::copy_make_border(in, in_pad, hpad / 2, hpad - hpad / 2, wpad / 2, wpad - wpad / 2, ncnn::BORDER_CONSTANT, 0.f);

const float norm_vals[3] = { 1 / 255.f, 1 / 255.f, 1 / 255.f };
in_pad.substract_mean_normalize(0, norm_vals);


ncnn::Extractor ex = yolov8.create_extractor();
ex.input("images", in_pad);

ncnn::Mat out;
ex.extract("output0", out);

ncnn::Mat mask_proto;
ex.extract("output1", mask_proto);

std::vector<int> strides = { 8, 16, 32 };
std::vector<GridAndStride> grid_strides;
generate_grids_and_stride(in_pad.w, in_pad.h, strides, grid_strides);

std::vector<Object> proposals;
std::vector<Object> objects8;
generate_proposals(grid_strides, out, prob_threshold, objects8);

proposals.insert(proposals.end(), objects8.begin(), objects8.end());

// sort all proposals by score from highest to lowest
qsort_descent_inplace(proposals);

// apply nms with nms_threshold
std::vector<int> picked;
nms_sorted_bboxes(proposals, picked, nms_threshold);

int count = picked.size();

ncnn::Mat mask_feat = ncnn::Mat(32, count, sizeof(float));
for (int i = 0; i < count; i++) {
float* mask_feat_ptr = mask_feat.row(i);
std::memcpy(mask_feat_ptr, proposals[picked[i]].mask_feat.data(), sizeof(float) * proposals[picked[i]].mask_feat.size());
}

ncnn::Mat mask_pred_result;
decode_mask(mask_feat, width, height, mask_proto, in_pad, wpad, hpad, mask_pred_result);

objects.resize(count);
for (int i = 0; i < count; i++)
{
objects[i] = proposals[picked[i]];

// adjust offset to original unpadded
float x0 = (objects[i].rect.x - (wpad / 2)) / scale;
float y0 = (objects[i].rect.y - (hpad / 2)) / scale;
float x1 = (objects[i].rect.x + objects[i].rect.width - (wpad / 2)) / scale;
float y1 = (objects[i].rect.y + objects[i].rect.height - (hpad / 2)) / scale;

// clip
x0 = std::max(std::min(x0, (float)(width - 1)), 0.f);
y0 = std::max(std::min(y0, (float)(height - 1)), 0.f);
x1 = std::max(std::min(x1, (float)(width - 1)), 0.f);
y1 = std::max(std::min(y1, (float)(height - 1)), 0.f);

objects[i].rect.x = x0;
objects[i].rect.y = y0;
objects[i].rect.width = x1 - x0;
objects[i].rect.height = y1 - y0;

objects[i].mask = cv::Mat::zeros(height, width, CV_32FC1);
cv::Mat mask = cv::Mat(height, width, CV_32FC1, (float*)mask_pred_result.channel(i));
mask(objects[i].rect).copyTo(objects[i].mask(objects[i].rect));
}

return 0;
}