Table of Contents
Overview
1. Environment setup
  Obtaining the YOLOv5 ONNX model
  Installing the opencv-python module
2. Key code
  2.1 Model loading
  2.2 Image preprocessing
  2.3 Model inference
  2.4 Post-processing of the inference results
    2.4.1 NMS
    2.4.2 Filtering by score_threshold
    2.4.3 Bounding-box coordinate conversion and rescaling
3. Example code (runnable)
  3.1 Unwrapped version
  3.2 Wrapped as a class

Overview
This document describes how to use the dnn (deep neural network) module of opencv-python to run inference with a YOLOv5 model in Python.
The document covers the following topics:
- Installing the opencv-python module
- Notes on the YOLOv5 model format
- Loading a model in ONNX format
- Image preprocessing
- Model inference
- Post-processing of the inference results, including NMS and converting cxcywh coordinates to xyxy coordinates, with the relevant function calls and parameter descriptions
- Complete example code
1. Environment setup
Obtaining the YOLOv5 ONNX model
The official pre-trained YOLOv5 models (in .pt format) can be downloaded from the official link (download link). The official YOLOv5 project also provides a script for converting a .pt model to ONNX format (project link).
Model export command:
```bash
python export.py --weights yolov5s.pt --include onnx
```

Note: the environment needed to run the export script can be installed and configured by following the official project's README; it is not repeated here.

Installing the opencv-python module

Create and activate a virtual environment:

```bash
conda create -n opencv python=3.8 -y
conda activate opencv
```

Install the opencv-python module with pip:

```bash
pip install opencv-python
```

Note: opencv-python installed via pip only supports CPU inference by default. GPU inference requires building from source, which is fairly involved and not covered here.
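To confirm which build is installed, a quick check of the version and of CUDA visibility can look like the following (a minimal sketch; the printed version depends on the installed wheel):

```python
import cv2

# Print the installed OpenCV version
print(cv2.__version__)

# A pip-installed opencv-python wheel is built without CUDA, so this normally prints 0,
# meaning cv2.dnn can only run inference on the CPU in this environment
print(cv2.cuda.getCudaEnabledDeviceCount())
```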
2. Key code
2.1 Model loading
The opencv-python module provides the readNetFromONNX method for loading a model in ONNX format.
```python
import cv2

model = cv2.dnn.readNetFromONNX(model_path)
```
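As a quick sanity check after loading, the returned network object can be queried for its layers (a minimal sketch; the model path is an assumption for illustration):

```python
import cv2

# Load the exported YOLOv5 ONNX model (path assumed for illustration)
model = cv2.dnn.readNetFromONNX("weights/yolov5s.onnx")

# Listing a few layer names and the output node(s) confirms the graph was parsed
print(model.getLayerNames()[:5])
print(model.getUnconnectedOutLayersNames())
```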
2.2 Image preprocessing

Preprocessing includes resizing, normalization, color-channel conversion (BGR to RGB), and rearranging the data into NCHW layout.

Before resizing, a very common trick is used to handle non-square images: take the longer side of the image, create a square canvas of that size, paste the original image into its top-left corner, and fill the rest with black. This keeps both the aspect ratio and the content of the original image unchanged.

```python
# Image preprocessing: pad the frame into a square instead of distorting it
row, col, _ = frame.shape                                 # height and width of the original frame
_max = max(row, col)                                      # length of the longer side
input_image = np.zeros((_max, _max, 3), dtype=np.uint8)   # black square canvas of that size
input_image[:row, :col, :] = frame                        # paste the frame into the top-left corner
```
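Wrapped as a small reusable helper (a sketch; the function name is illustrative), the same padding step looks like this:

```python
import numpy as np

def pad_to_square(frame: np.ndarray) -> np.ndarray:
    """Paste the frame into the top-left corner of a black square canvas of side max(h, w)."""
    row, col, _ = frame.shape
    side = max(row, col)
    canvas = np.zeros((side, side, 3), dtype=np.uint8)
    canvas[:row, :col, :] = frame
    return canvas
```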
After the padding is done, the resizing, normalization, and color-channel conversion are performed in a single call:

```python
blob = cv2.dnn.blobFromImage(image, scalefactor=1 / 255.0, size=(640, 640), swapRB=True, crop=False)
```

- image: input image as a numpy.ndarray with shape (H, W, C), channels in BGR order.
- scalefactor: normalization factor for the pixel values, usually 1/255.0.
- size: target size required by the model input, here (640, 640).
- swapRB: whether to swap the color channels (BGR to RGB). OpenCV reads images in BGR order while YOLOv5 expects RGB, so the channels need to be swapped here (True swaps, False does not).
- crop: whether to crop the image; False means no cropping.
blobFromImage returns a four-dimensional Mat in NCHW order; the resulting data shape is (1, 3, 640, 640).
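For reference, the same preprocessing can be written out with plain NumPy, which makes the individual steps explicit (a sketch equivalent to the blobFromImage call above; it mirrors the commented-out alternative in the example code of section 3):

```python
import cv2
import numpy as np

def preprocess_manual(image: np.ndarray, size: int = 640) -> np.ndarray:
    """Equivalent of blobFromImage(scalefactor=1/255, swapRB=True): returns an NCHW float32 blob."""
    resized = cv2.resize(image, (size, size))        # resize to the model input size
    rgb = cv2.cvtColor(resized, cv2.COLOR_BGR2RGB)   # BGR -> RGB
    normalized = rgb.astype(np.float32) / 255.0      # scale pixel values to [0, 1]
    chw = normalized.transpose(2, 0, 1)              # HWC -> CHW
    return np.expand_dims(chw, axis=0)               # add batch dimension -> (1, 3, size, size)
```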
2.3 Model inference

Set the inference backend and target. After the model is loaded, the device used for inference has to be selected; in the usual case this is the CPU:

```python
model.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)
model.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)
```

If the opencv-python build in the environment supports GPU inference, the GPU can be selected instead:

```python
model.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
model.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
```

Note: whether the opencv-python module supports GPU inference can be checked with cv2.cuda.getCudaEnabledDeviceCount(); a return value greater than 0 means GPU inference is supported, otherwise it is not.

Set the model input, where blob is the preprocessed data from the previous step:

```python
model.setInput(blob)
```

Run the forward pass:

```python
outputs = model.forward()
```

outputs is the raw model output with shape (1, 25200, 5+nc), where 25200 is the number of candidate grid cells and 5+nc is the number of values predicted per cell: 5 for x, y, w, h, conf plus nc class scores, nc being the number of classes.
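Putting loading, preprocessing, and the forward pass together, a minimal end-to-end sketch looks like this (the model and image paths are assumptions; for a COCO-trained yolov5s export, nc = 80, so the printed shape is (1, 25200, 85)):

```python
import cv2
import numpy as np

model = cv2.dnn.readNetFromONNX("weights/yolov5s.onnx")   # assumed model path
model.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)
model.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)

frame = cv2.imread("test.jpg")                            # assumed test image
row, col, _ = frame.shape
side = max(row, col)
square = np.zeros((side, side, 3), dtype=np.uint8)
square[:row, :col, :] = frame                             # pad to a square, as in section 2.2

blob = cv2.dnn.blobFromImage(square, 1 / 255.0, (640, 640), swapRB=True, crop=False)
model.setInput(blob)
outputs = model.forward()
print(outputs.shape)                                      # e.g. (1, 25200, 85) for an nc=80 model
```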
2.4 Post-processing of the inference results
The raw output contains a large number of overlapping boxes, so NMS is applied first; the remaining boxes are then filtered using each box's confidence and a user-defined confidence threshold, which yields the final boxes together with their classes and confidences.
2.4.1 NMS
The opencv-python module provides the NMSBoxes method for performing NMS.
```python
cv2.dnn.NMSBoxes(bboxes, scores, score_threshold, nms_threshold[, eta[, top_k]])
```

- bboxes: list of boxes with shape (N, 4), where N is the number of boxes and the four values are x, y, w, h.
- scores: list of confidences, one per box.
- score_threshold: confidence threshold; boxes with a lower score are filtered out.
- nms_threshold: IoU threshold used by NMS.

NMSBoxes returns the indices of the boxes that are kept, with shape (M,), where M is the number of remaining boxes.
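A tiny self-contained example of the call (the boxes and scores are made-up values for illustration):

```python
import cv2

# Two heavily overlapping boxes and one separate box, in (x, y, w, h) format
boxes = [[10, 10, 100, 100], [12, 12, 100, 100], [300, 300, 50, 50]]
scores = [0.9, 0.8, 0.75]

# Arguments: boxes, scores, score_threshold, nms_threshold
indices = cv2.dnn.NMSBoxes(boxes, scores, 0.5, 0.4)
print(indices)  # indices of the kept boxes; the second box is suppressed by the first
```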
2.4.2 Filtering by score_threshold
Using the index list produced by NMS, boxes whose confidence is below score_threshold are then filtered out.
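With NumPy this filtering is a simple boolean mask (a sketch matching the approach used in the example code of section 3):

```python
import numpy as np

def filter_by_score(output_data: np.ndarray, indices, score_threshold: float):
    """Keep only the NMS-selected rows whose confidence exceeds the threshold."""
    kept = output_data[indices]                 # rows selected by cv2.dnn.NMSBoxes
    mask = kept[:, 4] > score_threshold         # column 4 holds the objectness confidence
    boxes = kept[:, 0:4][mask]                  # (cx, cy, w, h) of the surviving boxes
    confidences = kept[:, 4][mask]
    class_scores = kept[:, 5:][mask]            # per-class scores; argmax gives the class id
    return boxes, confidences, class_scores
```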
2.4.3 Bounding-box coordinate conversion and rescaling
YOLOv5 outputs box coordinates in cxcywh format, which need to be converted to xyxy format. In addition, because the image was resized earlier, the coordinates have to be rescaled back to the size of the (padded) original image. The conversion is done as follows:
```python
# Size of the original (padded) image; numpy's shape is (height, width, channels)
image_height, image_width, _ = input_image.shape
# Scale factors between the padded image and the model input
x_factor = image_width / INPUT_WIDTH    # INPUT_WIDTH = 640
y_factor = image_height / INPUT_HEIGHT  # INPUT_HEIGHT = 640

# Convert cxcywh coordinates to xyxy and rescale them to the padded image
x1 = int((x - w / 2) * x_factor)
y1 = int((y - h / 2) * y_factor)
w = int(w * x_factor)
h = int(h * y_factor)
x2 = x1 + w
y2 = y1 + h
```

x1, y1, x2, y2 are the xyxy coordinates of the box.
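The same conversion can also be applied to a whole (N, 4) array of boxes at once; a vectorized sketch (an alternative to the per-box arithmetic above, not the form used in the example code):

```python
import numpy as np

def cxcywh_to_xyxy_scaled(boxes: np.ndarray, x_factor: float, y_factor: float) -> np.ndarray:
    """Convert an (N, 4) array of (cx, cy, w, h) boxes to rescaled (x1, y1, x2, y2)."""
    cx, cy, w, h = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    x1 = (cx - w / 2) * x_factor
    y1 = (cy - h / 2) * y_factor
    x2 = (cx + w / 2) * x_factor
    y2 = (cy + h / 2) * y_factor
    return np.stack([x1, y1, x2, y2], axis=1).astype(int)
```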
3. Example code (runnable)
There are two versions of the source code: one is a plain composition of functions, which is convenient for debugging; the other is wrapped as a class, which is easier to integrate into other projects.
3.1 Unwrapped version

```python
"""Run ONNX model inference with the OpenCV dnn module."""
from typing import List
from pathlib import Path
import time

import cv2
import numpy as np


def build_model(model_path: str) -> cv2.dnn_Net:
    """Build the model with the OpenCV dnn module.

    Args:
        model_path: path of the model; the model should be in ONNX format.

    Returns:
        The model object.
    """
    # check that the model file exists
    if not Path(model_path).exists():
        raise FileNotFoundError(f"model file {model_path} not found")
    model = cv2.dnn.readNetFromONNX(model_path)
    # check whether the opencv-python build in this environment supports CUDA
    cuda_available = cv2.cuda.getCudaEnabledDeviceCount() > 0
    if cuda_available:  # if CUDA is available, use it
        model.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
        model.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
    else:  # otherwise fall back to the CPU
        model.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)
        model.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)
    return model


def inference(image: np.ndarray, model: cv2.dnn_Net) -> np.ndarray:
    """Run the model on the input image.

    Args:
        image: input image as a numpy array of shape (height, width, channel),
            with the color channels in BGR order, like the original OpenCV image format.
        model: the model object.

    Returns:
        The output data of the model with shape (1, 25200, 5 + nc), nc being the number of classes.
    """
    # image preprocessing: resize, normalization, channel swap (BGR to RGB) and conversion to blob format;
    # this returns a 4-dimensional Mat in NCHW dimension order
    blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (INPUT_WIDTH, INPUT_HEIGHT), swapRB=True, crop=False)
    # an alternative way to get the blob:
    # rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    # input_image = cv2.resize(src=rgb, dsize=(INPUT_WIDTH, INPUT_HEIGHT))
    # blob_img = np.float32(input_image) / 255.0
    # input_x = blob_img.transpose((2, 0, 1))
    # blob = np.expand_dims(input_x, 0)

    if cv2.cuda.getCudaEnabledDeviceCount() > 0:
        model.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
        model.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
    else:
        model.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)
        model.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)

    # set the input data
    model.setInput(blob)

    start = time.perf_counter()
    # inference
    outs = model.forward()
    end = time.perf_counter()
    print("inference time: ", end - start)
    # the shape of the output data is (1, 25200, 5 + nc), nc being the number of classes
    return outs


def xywh_to_xyxy(bbox_xywh, image_width, image_height):
    """Convert a box from (center_x, center_y, width, height) to (x_min, y_min, x_max, y_max) format.

    Parameters:
        bbox_xywh (list or tuple): box coordinates in (center_x, center_y, width, height) format.
        image_width (int): width of the image.
        image_height (int): height of the image.

    Returns:
        tuple: box coordinates in (x_min, y_min, x_max, y_max) format.
    """
    center_x, center_y, width, height = bbox_xywh
    x_min = max(0, int(center_x - width / 2))
    y_min = max(0, int(center_y - height / 2))
    x_max = min(image_width - 1, int(center_x + width / 2))
    y_max = min(image_height - 1, int(center_y + height / 2))
    return x_min, y_min, x_max, y_max


def wrap_detection(input_image: np.ndarray,
                   output_data: np.ndarray,
                   labels: List[str],
                   confidence_threshold: float = 0.6
                   ) -> (List[int], List[float], List[List[int]]):
    # the shape of output_data is (25200, 5 + nc):
    # the first 5 elements are [x, y, w, h, confidence], the rest are the prediction scores of each class
    image_width, image_height, _ = input_image.shape  # the padded input is square, so width == height here
    x_factor = image_width / INPUT_WIDTH
    y_factor = image_height / INPUT_HEIGHT

    # run NMS on the raw (x, y, w, h) boxes and their confidences
    indices = cv2.dnn.NMSBoxes(output_data[:, 0:4].tolist(), output_data[:, 4].tolist(), 0.6, 0.4)
    raw_boxes = output_data[:, 0:4][indices]
    raw_confidences = output_data[:, 4][indices]
    raw_class_prediction_probabilities = output_data[:, 5:][indices]

    # keep only boxes whose confidence exceeds the threshold
    criteria = raw_confidences > confidence_threshold
    raw_class_prediction_probabilities = raw_class_prediction_probabilities[criteria]
    raw_boxes = raw_boxes[criteria]
    raw_confidences = raw_confidences[criteria]

    bounding_boxes, confidences, class_ids = [], [], []
    for class_prediction_probability, box, confidence in zip(raw_class_prediction_probabilities,
                                                             raw_boxes, raw_confidences):
        # find the most probable class index
        # (cv2.minMaxLoc(class_prediction_probability) would work as well)
        most_probable_class_index = np.argmax(class_prediction_probability)
        confidence = float(confidence)
        # convert cxcywh to (left, top, width, height) and rescale to the padded image
        x, y, w, h = box
        left = int((x - 0.5 * w) * x_factor)
        top = int((y - 0.5 * h) * y_factor)
        width = int(w * x_factor)
        height = int(h * y_factor)
        bounding_boxes.append([left, top, width, height])
        confidences.append(confidence)
        class_ids.append(most_probable_class_index)
    return class_ids, confidences, bounding_boxes


coco_class_names = ["person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat",
                    "traffic light", "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat",
                    "dog", "horse", "sheep", "cow", "elephant", "bear", "zebra", "giraffe", "backpack",
                    "umbrella", "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard", "sports ball",
                    "kite", "baseball bat", "baseball glove", "skateboard", "surfboard", "tennis racket",
                    "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple",
                    "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair",
                    "couch", "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse",
                    "remote", "keyboard", "cell phone", "microwave", "oven", "toaster", "sink",
                    "refrigerator", "book", "clock", "vase", "scissors", "teddy bear", "hair drier",
                    "toothbrush"]

# generate a different color for each coco class
colors = np.random.uniform(0, 255, size=(len(coco_class_names), 3))

INPUT_WIDTH = 640
INPUT_HEIGHT = 640
CONFIDENCE_THRESHOLD = 0.7
NMS_THRESHOLD = 0.45


def video_detector(video_src):
    cap = cv2.VideoCapture(video_src)
    # 3. inference and show the result in a loop
    while cap.isOpened():
        success, frame = cap.read()
        start = time.perf_counter()
        if not success:
            break
        # image preprocessing: pad the frame into a square instead of distorting it
        row, col, _ = frame.shape  # height and width of the original frame
        _max = max(row, col)  # length of the longer side
        input_image = np.zeros((_max, _max, 3), dtype=np.uint8)  # black square canvas
        input_image[:row, :col, :] = frame  # paste the original frame into the top-left corner

        # inference
        output_data = inference(input_image, net)  # the shape of output_data is (1, 25200, 85)

        # 4. wrap the detection result
        class_ids, confidences, boxes = wrap_detection(input_image, output_data[0], coco_class_names)

        # 5. draw the detection result on the frame
        for (class_id, confidence, box) in zip(class_ids, confidences, boxes):
            color = colors[int(class_id) % len(colors)]
            label = coco_class_names[int(class_id)]
            xmin, ymin, width, height = box
            cv2.rectangle(frame, (xmin, ymin), (xmin + width, ymin + height), color, 2)
            cv2.putText(frame, str(label), (xmin, ymin - 5), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 0), 2)

        finish = time.perf_counter()
        FPS = round(1.0 / (finish - start), 2)
        cv2.putText(frame, str(FPS), (10, 50), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 0), 2)

        # 6. show the frame
        cv2.imshow("frame", frame)
        # 7. press q to exit
        if cv2.waitKey(1) == ord("q"):
            break
    # 8. release the capture and destroy all windows
    cap.release()
    cv2.destroyAllWindows()


if __name__ == "__main__":
    # four steps: load the model, open the video source, run inference, and show the result
    # 1. load the model
    model_path = Path("weights/yolov5s.onnx")
    net = build_model(str(model_path))
    # 2. open the video capture
    # video_source = 0  # use the default webcam instead of an RTSP stream
    video_source = "rtsp://admin:aoto12345@192.168.8.204:554/h264/ch1/main/av_stream"
    video_detector(video_source)
    exit(0)
```
3.2 Wrapped as a class

```python
from typing import List
import time

import cv2
import numpy as np
import onnx
import onnxruntime as ort
import torch
from torchvision import transforms
from torchvision.ops import nms, box_convert
from PIL import Image

INPUT_WIDTH = 640
INPUT_HEIGHT = 640


def wrap_detection(input_image: np.ndarray,
                   output_data: np.ndarray,
                   labels: List[str],
                   confidence_threshold: float = 0.6
                   ) -> (List[int], List[float], List[List[int]]):
    # the shape of output_data is (25200, 5 + nc):
    # the first 5 elements are [x, y, w, h, confidence], the rest are the prediction scores of each class
    image_width, image_height, _ = input_image.shape
    x_factor = image_width / INPUT_WIDTH
    y_factor = image_height / INPUT_HEIGHT

    nms_start = time.perf_counter()
    indices = cv2.dnn.NMSBoxes(output_data[:, 0:4].tolist(), output_data[:, 4].tolist(), 0.6, 0.4)
    nms_finish = time.perf_counter()
    print(f"nms time: {nms_finish - nms_start}")

    raw_boxes = output_data[:, 0:4][indices]
    raw_confidences = output_data[:, 4][indices]
    raw_class_prediction_probabilities = output_data[:, 5:][indices]

    criteria = raw_confidences > confidence_threshold
    raw_class_prediction_probabilities = raw_class_prediction_probabilities[criteria]
    raw_boxes = raw_boxes[criteria]
    raw_confidences = raw_confidences[criteria]

    bounding_boxes, confidences, class_ids = [], [], []
    for class_prediction_probability, box, confidence in zip(raw_class_prediction_probabilities,
                                                             raw_boxes, raw_confidences):
        most_probable_class_index = np.argmax(class_prediction_probability)
        confidence = float(confidence)
        x, y, w, h = box
        left = int((x - 0.5 * w) * x_factor)
        top = int((y - 0.5 * h) * y_factor)
        width = int(w * x_factor)
        height = int(h * y_factor)
        bounding_boxes.append([left, top, width, height])
        confidences.append(confidence)
        class_ids.append(most_probable_class_index)
    return class_ids, confidences, bounding_boxes


coco_class_names = ["person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat",
                    "traffic light", "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat",
                    "dog", "horse", "sheep", "cow", "elephant", "bear", "zebra", "giraffe", "backpack",
                    "umbrella", "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard", "sports ball",
                    "kite", "baseball bat", "baseball glove", "skateboard", "surfboard", "tennis racket",
                    "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple",
                    "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair",
                    "couch", "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse",
                    "remote", "keyboard", "cell phone", "microwave", "oven", "toaster", "sink",
                    "refrigerator", "book", "clock", "vase", "scissors", "teddy bear", "hair drier",
                    "toothbrush"]

colors = np.random.uniform(0, 255, size=(len(coco_class_names), 3))

if __name__ == "__main__":
    # load and check the model, then create an onnxruntime session
    model_path = "weights/yolov5s.onnx"
    onnx_model = onnx.load(model_path)
    onnx.checker.check_model(onnx_model)
    session = ort.InferenceSession(model_path, providers=["CUDAExecutionProvider", "CPUExecutionProvider"])

    capture = cv2.VideoCapture(0)
    trans = transforms.Compose([
        transforms.Resize((640, 640)),
        transforms.ToTensor()
    ])

    while capture.isOpened():
        success, frame = capture.read()
        start = time.perf_counter()
        if not success:
            break
        rows, cols, channels = frame.shape

        # preprocessing: pad to a square, convert BGR to RGB, resize and convert to a tensor
        max_size = max(rows, cols)
        input_image = np.zeros((max_size, max_size, 3), dtype=np.uint8)
        input_image[:rows, :cols, :] = frame
        input_image = cv2.cvtColor(input_image, cv2.COLOR_BGR2RGB)
        inputs = trans(Image.fromarray(input_image))
        inputs = inputs.unsqueeze(0)  # add the batch dimension -> (1, 3, 640, 640)

        ort_inputs = {session.get_inputs()[0].name: inputs.numpy()}
        ort_outs = session.run(None, ort_inputs)
        out_prob = ort_outs[0][0]
        print(out_prob.shape)  # (25200, 5 + nc)

        scores = out_prob[:, 4]  # confidence scores are in the 5th column (0-indexed)
        class_ids = out_prob[:, 5:].argmax(axis=1)  # class labels are from the 6th column onwards
        bounding_boxes_xywh = out_prob[:, :4]  # bounding boxes in cxcywh format

        # filter out boxes based on the confidence threshold
        confidence_threshold = 0.7
        mask = scores > confidence_threshold
        class_ids = class_ids[mask]
        bounding_boxes_xywh = bounding_boxes_xywh[mask]
        scores = scores[mask]

        # convert bounding boxes from cxcywh to xyxy format
        bounding_boxes_xywh = torch.tensor(bounding_boxes_xywh, dtype=torch.float32)
        bounding_boxes_xyxy = box_convert(bounding_boxes_xywh, in_fmt="cxcywh", out_fmt="xyxy")

        # perform non-maximum suppression to filter the candidate boxes
        scores = torch.tensor(scores, dtype=torch.float32)
        # bounding_boxes_xyxy = bounding_boxes_xyxy.to("cuda")  # optionally move to the GPU
        # scores = scores.to("cuda")
        nms_start = time.perf_counter()
        keep_indices = nms(bounding_boxes_xyxy, scores, 0.4)
        nms_end = time.perf_counter()
        print(f"NMS took {nms_end - nms_start} seconds")

        keep_indices = keep_indices.numpy()  # numpy index so it can also index the numpy class_ids array
        class_ids = class_ids[keep_indices]
        confidences = scores[keep_indices]
        bounding_boxes = bounding_boxes_xyxy[keep_indices]
        # class_ids, confidences, bounding_boxes = wrap_detection(input_image, out_prob, coco_class_names, 0.6)

        # rescale from the 640x640 model input back to the padded original image (see section 2.4.3)
        x_factor = max_size / INPUT_WIDTH
        y_factor = max_size / INPUT_HEIGHT

        for i in range(len(keep_indices)):
            class_id = class_ids[i]
            confidence = float(confidences[i])
            box = bounding_boxes[i]
            color = colors[int(class_id) % len(colors)]
            label = coco_class_names[int(class_id)]
            xmin, ymin = int(box[0] * x_factor), int(box[1] * y_factor)
            xmax, ymax = int(box[2] * x_factor), int(box[3] * y_factor)
            cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), color, 2)
            cv2.rectangle(frame, (xmin, ymin - 20), (xmin + 100, ymin), color, -1)
            cv2.putText(frame, str(label), (xmin, ymin - 5), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 0), 2)

        finish = time.perf_counter()
        FPS = round(1.0 / (finish - start), 2)
        cv2.putText(frame, f"FPS: {FPS}", (10, 50), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 0), 2)

        # show the frame
        cv2.imshow("frame", frame)
        # press q to exit
        if cv2.waitKey(1) == ord("q"):
            break

    # release the capture and destroy all windows
    capture.release()
    cv2.destroyAllWindows()
    exit(0)
```