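tensorRT_Pro-Vision is a high-performance computer vision inference framework built on the TensorRT C++ API, supporting one-step deployment of 20+ mainstream vision models. The repository was forked from shouxieai/tensorRT_Pro and has since been substantially refactored and extended.
It currently supports end-to-end GPU inference for detection, classification, instance segmentation, semantic segmentation, pose estimation, oriented bounding box (OBB) detection, depth estimation, text recognition (OCR), lane detection, and multi-object tracking. 🚀🚀🚀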
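Core features:
- Built on the TensorRT C++ API, with version-adaptive compilation for TensorRT 8.x / 10.x (API differences are handled automatically through the `NV_TENSORRT_MAJOR` macro)
- Uses the official TensorRT ONNX parser (`libnvonnxparser.so`), so no protobuf dependency is needed
- Preprocessing, postprocessing, and NMS all run on the GPU (implemented as CUDA kernels)
- Supports FP32 / FP16 / INT8 inference precision, with built-in Entropy and MinMax calibrators
- Supports dynamic batch; the maximum batch size is configurable at compile time
- Clear code structure, with each model task in its own module for easy extension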
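As a rough illustration of what a MinMax-style INT8 calibration computes, the sketch below derives a symmetric scale from the largest absolute activation and quantizes a few values with it. It is only a conceptual sketch: the function names and sample values are hypothetical, and TensorRT's built-in calibrators derive their scales internally from calibration batches rather than through code like this.

```python
def minmax_int8_scale(values):
    """Symmetric INT8 scale: map the largest |activation| to 127."""
    amax = max(abs(v) for v in values)
    return amax / 127.0 if amax > 0 else 1.0


def quantize(values, scale):
    """Round to the nearest INT8 step and clamp to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]


# Hypothetical activation samples, chosen only for illustration.
activations = [-2.54, 0.0, 0.635, 1.0]
scale = minmax_int8_scale(activations)
quantized = quantize(activations, scale)
print(scale, quantized)
```

An Entropy calibrator differs in that it picks the clipping range by minimizing information loss rather than taking the raw maximum, which usually makes it less sensitive to outliers.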
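- 2026/6/26 refactor
  - Full TensorRT 10.x support: dual-version adaptive compilation for TRT 8.x / 10.x through the `NV_TENSORRT_MAJOR` macro; API differences are handled automatically by `#if NV_TENSORRT_MAJOR >= 10` conditional compilation
  - Removed the vendored ONNX parser: the custom onnx-tensorrt parser (~15,000 lines of code) is no longer maintained; the official TensorRT `libnvonnxparser.so` is used throughout, which greatly simplifies the codebase
  - Removed the protobuf dependency: a separate protobuf 3.11.4 installation is no longer required, so the build is simpler
  - Fixed the `getMaxBatchSize()` warning: in explicit-batch mode, the maximum batch size is now obtained correctly via `getProfileDimensions` / `getProfileShape`
  - Repository renamed: `tensorRT_Pro-YOLOv8` → `tensorRT_Pro-Vision`, reflecting its positioning as a multi-task vision inference project
  - New banner: a 9-panel overview that shows all supported vision tasks at a glance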
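The maximum-batch-size fix can be pictured with a short sketch. The project itself does this in C++; the version below assumes the TensorRT Python bindings instead, where `get_tensor_profile_shape` (name-based) and the older `get_profile_shape` each return the (min, opt, max) shapes of an optimization profile. The helper name, the `FakeEngine` stand-in, and the input name `images` are hypothetical and exist only so the sketch runs without a GPU.

```python
def max_batch_size(engine, input_name, profile_index=0):
    """Read the largest batch size allowed by one optimization profile."""
    if hasattr(engine, "get_tensor_profile_shape"):
        # Name-based query, available in newer TensorRT releases.
        shapes = engine.get_tensor_profile_shape(input_name, profile_index)
    else:
        # Older binding-based query from TensorRT 8.x.
        shapes = engine.get_profile_shape(profile_index, input_name)
    _min_shape, _opt_shape, max_shape = shapes
    return int(max_shape[0])  # dimension 0 is the batch axis


class FakeEngine:
    """Hypothetical stand-in for an engine built with max batch 16."""

    def get_tensor_profile_shape(self, name, profile_index):
        return [(1, 3, 640, 640), (1, 3, 640, 640), (16, 3, 640, 640)]


batch = max_batch_size(FakeEngine(), "images")
print(batch)
```

Reading the profile's max shape is what replaces the deprecated `getMaxBatchSize()` call in explicit-batch mode, since the batch limit now lives in the optimization profile rather than in the engine itself.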
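📜 For the complete update history (2023-2026), see the v1.0.0 README, which also links to the accompanying CSDN articles.
This project depends on the CUDA, cuDNN, TensorRT, and OpenCV libraries. Set their paths manually in the Makefile or CMakeLists.txt.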
| Dependency | Recommended for TensorRT 8.x | Recommended for TensorRT 10.x |
|---|---|---|
| CUDA | >= 10.2 | >= 12.0 |
| cuDNN | >= 8.x | >= 9.x |
| TensorRT | >= 8.4 | >= 10.0 |
| OpenCV | >= 4.x | >= 4.x |
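Before editing the build files, the table can be checked against a local setup with a small script. This is a sketch only: the `MINIMUMS` dictionary restates the table, `>= 8.x` style entries are read as `x.0` (an assumption), and the helper names are hypothetical.

```python
def parse_version(text):
    """Turn '8.6.1.6' into (8, 6, 1, 6) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


# Minimum versions from the table, keyed by TensorRT major version.
MINIMUMS = {
    8: {"CUDA": "10.2", "cuDNN": "8.0", "TensorRT": "8.4", "OpenCV": "4.0"},
    10: {"CUDA": "12.0", "cuDNN": "9.0", "TensorRT": "10.0", "OpenCV": "4.0"},
}


def unmet_requirements(installed):
    """Return the dependency names that fall below the table's minimums."""
    trt_major = parse_version(installed["TensorRT"])[0]
    column = 10 if trt_major >= 10 else 8
    return [
        name
        for name, minimum in MINIMUMS[column].items()
        if parse_version(installed[name]) < parse_version(minimum)
    ]


trt8_setup = {"CUDA": "11.4", "cuDNN": "8.5.0.96", "TensorRT": "8.6.1.6", "OpenCV": "4.6.0"}
mixed_setup = {"CUDA": "11.4", "cuDNN": "9.18.0", "TensorRT": "10.16.1", "OpenCV": "4.6.0"}
print(unmet_requirements(trt8_setup))
print(unmet_requirements(mixed_setup))
```

The second setup deliberately pairs TensorRT 10.x with an older CUDA to show the kind of mismatch the table is meant to prevent.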
Clone the project:
git clone https://github.com/Melody-Zhou/tensorRT_Pro-Vision.git
Build with the Makefile
- Edit the library paths in the Makefile:
# ===== TensorRT 8.x =====
lean_tensor_rt := /opt/TensorRT-8.6.1.6
lean_cudnn := /home/zhouwenguang/lean/cudnn-8.5.0.96
lean_cuda := /usr/local/cuda-11.4
lean_opencv := /home/zhouwenguang/lean/opencv-4.6.0
# ===== or TensorRT 10.x =====
# lean_tensor_rt := /home/zhouwenguang/lean/TensorRT-10.16.1
# lean_cudnn := /home/zhouwenguang/lean/cudnn-9.18.0
# lean_cuda := /usr/local/cuda-12.8
# lean_opencv := /home/zhouwenguang/lean/opencv-4.6.0
- Build:
make -j$(nproc)
Build with CMakeLists.txt
- Edit the library paths in CMakeLists.txt
- Build:
mkdir build && cd build
cmake .. && make -j$(nproc)
YOLOv3 support
- Download YOLOv3
git clone https://github.com/ultralytics/yolov3.git
- Modify the code to ensure dynamic batch
# ========== export.py ==========
# yolov3/export.py, line 160
# output_names = ['output0', 'output1'] if isinstance(model, SegmentationModel) else ['output0']
# if dynamic:
# dynamic = {'images': {0: 'batch', 2: 'height', 3: 'width'}} # shape(1,3,640,640)
# if isinstance(model, SegmentationModel):
# dynamic['output0'] = {0: 'batch', 1: 'anchors'} # shape(1,25200,85)
# dynamic['output1'] = {0: 'batch', 2: 'mask_height', 3: 'mask_width'} # shape(1,32,160,160)
# elif isinstance(model, DetectionModel):
# dynamic['output0'] = {0: 'batch', 1: 'anchors'} # shape(1,25200,85)
# Change to:
output_names = ['output0', 'output1'] if isinstance(model, SegmentationModel) else ['output']
if dynamic:
    dynamic = {'images': {0: 'batch'}}  # shape(1,3,640,640)
    if isinstance(model, SegmentationModel):
        dynamic['output0'] = {0: 'batch', 1: 'anchors'}  # shape(1,25200,85)
        dynamic['output1'] = {0: 'batch', 2: 'mask_height', 3: 'mask_width'}  # shape(1,32,160,160)
    elif isinstance(model, DetectionModel):
        dynamic['output'] = {0: 'batch'}  # shape(1,25200,85)
- Export the ONNX model
cd yolov3
python export.py --weights=yolov3.pt --dynamic --simplify --include=onnx --opset=11
- Copy the model and run
cp yolov3/yolov3.onnx tensorRT_Pro-Vision/workspace
cd tensorRT_Pro-Vision
# In src/application/app_yolo.cpp, in the app_yolo function, enable the V3 call below to run
# test(Yolo::Type::V3, TRT::Mode::FP32, "yolov3");
make yolo -j64
YOLOX support
- Download YOLOX
git clone https://github.com/Megvii-BaseDetection/YOLOX.git
- Export the ONNX model
cd YOLOX
export PYTHONPATH=$PYTHONPATH:.
python tools/export_onnx.py -c yolox_s.pth -f exps/default/yolox_s.py --output-name=yolox_s.onnx --dynamic --decode_in_inference
- Copy the model and run
cp YOLOX/yolox_s.onnx tensorRT_Pro-Vision/workspace
cd tensorRT_Pro-Vision
# In src/application/app_yolo.cpp, in the app_yolo function, enable the X call below to run
# test(Yolo::Type::X, TRT::Mode::FP32, "yolox_s");
make yolo -j64
YOLOv5 support
- Download YOLOv5
git clone https://github.com/ultralytics/yolov5.git
- Modify the code to ensure dynamic batch
# ========== export.py ==========
# yolov5/export.py, line 160
# output_names = ['output0', 'output1'] if isinstance(model, SegmentationModel) else ['output0']
# if dynamic:
# dynamic = {'images': {0: 'batch', 2: 'height', 3: 'width'}} # shape(1,3,640,640)
# if isinstance(model, SegmentationModel):
# dynamic['output0'] = {0: 'batch', 1: 'anchors'} # shape(1,25200,85)
# dynamic['output1'] = {0: 'batch', 2: 'mask_height', 3: 'mask_width'} # shape(1,32,160,160)
# elif isinstance(model, DetectionModel):
# dynamic['output0'] = {0: 'batch', 1: 'anchors'} # shape(1,25200,85)
# Change to:
output_names = ['output0', 'output1'] if isinstance(model, SegmentationModel) else ['output']
if dynamic:
    dynamic = {'images': {0: 'batch'}}  # shape(1,3,640,640)
    if isinstance(model, SegmentationModel):
        dynamic['output0'] = {0: 'batch', 1: 'anchors'}  # shape(1,25200,85)
        dynamic['output1'] = {0: 'batch', 2: 'mask_height', 3: 'mask_width'}  # shape(1,32,160,160)
    elif isinstance(model, DetectionModel):
        dynamic['output'] = {0: 'batch'}  # shape(1,25200,85)
- Export the ONNX model
cd yolov5
python export.py --weights=yolov5s.pt --dynamic --simplify --include=onnx --opset=11
- Copy the model and run
cp yolov5/yolov5s.onnx tensorRT_Pro-Vision/workspace
cd tensorRT_Pro-Vision
# In src/application/app_yolo.cpp, in the app_yolo function, enable the V5 call below to run
# test(Yolo::Type::V5, TRT::Mode::FP32, "yolov5s");
make yolo -j64
YOLOv6 support
- Download YOLOv6
git clone https://github.com/meituan/YOLOv6.git
- Modify the code to ensure dynamic batch and remove the anchor dimension
# ========== export_onnx.py ==========
# YOLOv6/deploy/ONNX/export_onnx.py, line 84
# output_axes = {
# 'outputs': {0: 'batch'},
# }
# Change to:
output_axes = {
    'output': {0: 'batch'},
}
# YOLOv6/deploy/ONNX/export_onnx.py, line 106
# torch.onnx.export(model, img, f, verbose=False, opset_version=13,
# training=torch.onnx.TrainingMode.EVAL,
# do_constant_folding=True,
# input_names=['images'],
# output_names=['num_dets', 'det_boxes', 'det_scores', 'det_classes']
# if args.end2end else ['outputs'],
# dynamic_axes=dynamic_axes)
# Change to:
torch.onnx.export(model, img, f, verbose=False, opset_version=13,
                  training=torch.onnx.TrainingMode.EVAL,
                  do_constant_folding=True,
                  input_names=['images'],
                  output_names=['num_dets', 'det_boxes', 'det_scores', 'det_classes']
                  if args.end2end else ['output'],
                  dynamic_axes=dynamic_axes)
# Remove the anchor dimension according to the head in use
# ========== effidehead_distill_ns.py ==========
# YOLOv6/yolov6/models/heads/effidehead_distill_ns.py, line 141
# return torch.cat(
# [
# pred_bboxes,
# torch.ones((b, pred_bboxes.shape[1], 1), device=pred_bboxes.device, dtype=pred_bboxes.dtype),
# cls_score_list
# ],
# axis=-1)
# Change to:
return torch.cat(
    [
        pred_bboxes,
        cls_score_list
    ],
    axis=-1)
# ========== effidehead_fuseab.py ==========
# YOLOv6/yolov6/models/heads/effidehead_fuseab.py, line 191
# return torch.cat(
# [
# pred_bboxes,
# torch.ones((b, pred_bboxes.shape[1], 1), device=pred_bboxes.device, dtype=pred_bboxes.dtype),
# cls_score_list
# ],
# axis=-1)
# Change to:
return torch.cat(
    [
        pred_bboxes,
        cls_score_list
    ],
    axis=-1)
# ========== effidehead_lite.py ==========
# YOLOv6/yolov6/models/heads/effidehead_lite.py, line 123
# return torch.cat(
# [
# pred_bboxes,
# torch.ones((b, pred_bboxes.shape[1], 1), device=pred_bboxes.device, dtype=pred_bboxes.dtype),
# cls_score_list
# ],
# axis=-1)
# Change to:
return torch.cat(
    [
        pred_bboxes,
        cls_score_list
    ],
    axis=-1)
- Export the ONNX model
cd YOLOv6
python deploy/ONNX/export_onnx.py --weights yolov6s.pt --img 640 --dynamic-batch --simplify
- Copy the model and run
cp YOLOv6/yolov6s.onnx tensorRT_Pro-Vision/workspace
cd tensorRT_Pro-Vision
# In src/application/app_yolo.cpp, in the app_yolo function, enable the V6 call below to run
# test(Yolo::Type::V6, TRT::Mode::FP32, "yolov6s");
make yolo -j64
YOLOv7 support
- Download YOLOv7
git clone https://github.com/WongKinYiu/yolov7.git
- Export the ONNX model
python export.py --dynamic-batch --grid --simplify --weights=yolov7.pt
- Copy the model and run
cp yolov7/yolov7.onnx tensorRT_Pro-Vision/workspace
cd tensorRT_Pro-Vision
# In src/application/app_yolo.cpp, in the app_yolo function, enable the V7 call below to run
# test(Yolo::Type::V7, TRT::Mode::FP32, "yolov7");
make yolo -j64
YOLOv8 support
- Download YOLOv8
git clone https://github.com/ultralytics/ultralytics.git
- Modify the code to ensure dynamic batch
# ========== head.py ==========
# ultralytics/nn/modules/head.py, line 72, forward function
# return y if self.export else (y, x)
# Change to:
return y.permute(0, 2, 1) if self.export else (y, x)
# ========== exporter.py ==========
# ultralytics/engine/exporter.py, line 323
# output_names = ['output0', 'output1'] if isinstance(self.model, SegmentationModel) else ['output0']
# dynamic = self.args.dynamic
# if dynamic:
# dynamic = {'images': {0: 'batch', 2: 'height', 3: 'width'}} # shape(1,3,640,640)
# if isinstance(self.model, SegmentationModel):
# dynamic['output0'] = {0: 'batch', 2: 'anchors'} # shape(1, 116, 8400)
# dynamic['output1'] = {0: 'batch', 2: 'mask_height', 3: 'mask_width'} # shape(1,32,160,160)
# elif isinstance(self.model, DetectionModel):
# dynamic['output0'] = {0: 'batch', 2: 'anchors'} # shape(1, 84, 8400)
# Change to:
output_names = ['output0', 'output1'] if isinstance(self.model, SegmentationModel) else ['output']
dynamic = self.args.dynamic
if dynamic:
    dynamic = {'images': {0: 'batch'}}  # shape(1,3,640,640)
    if isinstance(self.model, SegmentationModel):
        dynamic['output0'] = {0: 'batch', 2: 'anchors'}  # shape(1, 116, 8400)
        dynamic['output1'] = {0: 'batch', 2: 'mask_height', 3: 'mask_width'}  # shape(1,32,160,160)
    elif isinstance(self.model, DetectionModel):
        dynamic['output'] = {0: 'batch'}  # shape(1, 84, 8400)
- Export the ONNX model: create an export file `export.py` in ultralytics-main with the following content:
# ========== export.py ==========
from ultralytics import YOLO
model = YOLO("yolov8s.pt")
success = model.export(format="onnx", dynamic=True, simplify=True)
cd ultralytics-main
python export.py
- Copy the model and run
cp ultralytics/yolov8s.onnx tensorRT_Pro-Vision/workspace
cd tensorRT_Pro-Vision
make yolo -j64
YOLOv8-Cls support
- Download YOLOv8
git clone https://github.com/ultralytics/ultralytics.git
- Modify the code to ensure dynamic batch
# ========== exporter.py ==========
# ultralytics/engine/exporter.py, line 323
# output_names = ['output0', 'output1'] if isinstance(self.model, SegmentationModel) else ['output0']
# dynamic = self.args.dynamic
# if dynamic:
# dynamic = {'images': {0: 'batch', 2: 'height', 3: 'width'}} # shape(1,3,640,640)
# if isinstance(self.model, SegmentationModel):
# dynamic['output0'] = {0: 'batch', 2: 'anchors'} # shape(1, 116, 8400)
# dynamic['output1'] = {0: 'batch', 2: 'mask_height', 3: 'mask_width'} # shape(1,32,160,160)
# elif isinstance(self.model, DetectionModel):
# dynamic['output0'] = {0: 'batch', 2: 'anchors'} # shape(1, 84, 8400)
# Change to:
output_names = ['output0', 'output1'] if isinstance(self.model, SegmentationModel) else ['output']
dynamic = self.args.dynamic
if dynamic:
    dynamic = {'images': {0: 'batch'}}  # shape(1,3,640,640)
    dynamic['output'] = {0: 'batch'}
    if isinstance(self.model, SegmentationModel):
        dynamic['output0'] = {0: 'batch', 2: 'anchors'}  # shape(1, 116, 8400)
        dynamic['output1'] = {0: 'batch', 2: 'mask_height', 3: 'mask_width'}  # shape(1,32,160,160)
    elif isinstance(self.model, DetectionModel):
        dynamic['output'] = {0: 'batch'}  # shape(1, 84, 8400)
- Export the ONNX model: create an export file `export.py` in ultralytics-main with the following content:
# ========== export.py ==========
from ultralytics import YOLO
model = YOLO("yolov8s-cls.pt")
success = model.export(format="onnx", dynamic=True, simplify=True)
cd ultralytics-main
python export.py
- Copy the model and run
cp ultralytics/yolov8s-cls.onnx tensorRT_Pro-Vision/workspace
cd tensorRT_Pro-Vision
make yolo_cls -j64
YOLOv8-Seg support
- Download YOLOv8
git clone https://github.com/ultralytics/ultralytics.git
- Modify the code to ensure dynamic batch
# ========== head.py ==========
# ultralytics/nn/modules/head.py, line 106, forward function
# return (torch.cat([x, mc], 1), p) if self.export else (torch.cat([x[0], mc], 1), (x[1], mc, p))
# Change to:
return (torch.cat([x, mc], 1).permute(0, 2, 1), p) if self.export else (torch.cat([x[0], mc], 1), (x[1], mc, p))
# ========== exporter.py ==========
# ultralytics/engine/exporter.py, line 323
# output_names = ['output0', 'output1'] if isinstance(self.model, SegmentationModel) else ['output0']
# dynamic = self.args.dynamic
# if dynamic:
# dynamic = {'images': {0: 'batch', 2: 'height', 3: 'width'}} # shape(1,3,640,640)
# if isinstance(self.model, SegmentationModel):
# dynamic['output0'] = {0: 'batch', 2: 'anchors'} # shape(1, 116, 8400)
# dynamic['output1'] = {0: 'batch', 2: 'mask_height', 3: 'mask_width'} # shape(1,32,160,160)
# elif isinstance(self.model, DetectionModel):
# dynamic['output0'] = {0: 'batch', 2: 'anchors'} # shape(1, 84, 8400)
# Change to:
output_names = ['output0', 'output1'] if isinstance(self.model, SegmentationModel) else ['output0']
dynamic = self.args.dynamic
if dynamic:
    dynamic = {'images': {0: 'batch'}}  # shape(1,3,640,640)
    if isinstance(self.model, SegmentationModel):
        dynamic['output0'] = {0: 'batch'}  # shape(1, 116, 8400)
        dynamic['output1'] = {0: 'batch'}  # shape(1,32,160,160)
    elif isinstance(self.model, DetectionModel):
        dynamic['output0'] = {0: 'batch', 2: 'anchors'}  # shape(1, 84, 8400)
- Export the ONNX model: create an export file `export.py` in ultralytics-main with the following content:
# ========== export.py ==========
from ultralytics import YOLO
model = YOLO("yolov8s-seg.pt")
success = model.export(format="onnx", dynamic=True, simplify=True)
cd ultralytics-main
python export.py
- Copy the model and run
cp ultralytics/yolov8s-seg.onnx tensorRT_Pro-Vision/workspace
cd tensorRT_Pro-Vision
make yolo_seg -j64
YOLOv8-OBB support
- Download YOLOv8
git clone https://github.com/ultralytics/ultralytics.git
cd ultralytics
git checkout tags/v8.1.0 -b v8.1.0
- Modify the code to ensure dynamic batch
# ========== head.py ==========
# ultralytics/nn/modules/head.py, line 141, forward function
# return torch.cat([x, angle], 1) if self.export else (torch.cat([x[0], angle], 1), (x[1], angle))
# Change to:
return torch.cat([x, angle], 1).permute(0, 2, 1) if self.export else (torch.cat([x[0], angle], 1), (x[1], angle))
# ========== exporter.py ==========
# ultralytics/engine/exporter.py, line 353
# output_names = ['output0', 'output1'] if isinstance(self.model, SegmentationModel) else ['output0']
# dynamic = self.args.dynamic
# if dynamic:
# dynamic = {'images': {0: 'batch', 2: 'height', 3: 'width'}} # shape(1,3,640,640)
# if isinstance(self.model, SegmentationModel):
# dynamic['output0'] = {0: 'batch', 2: 'anchors'} # shape(1, 116, 8400)
# dynamic['output1'] = {0: 'batch', 2: 'mask_height', 3: 'mask_width'} # shape(1,32,160,160)
# elif isinstance(self.model, DetectionModel):
# dynamic['output0'] = {0: 'batch', 2: 'anchors'} # shape(1, 84, 8400)
# Change to:
output_names = ['output0', 'output1'] if isinstance(self.model, SegmentationModel) else ['output']
dynamic = self.args.dynamic
if dynamic:
    dynamic = {'images': {0: 'batch'}}  # shape(1,3,640,640)
    if isinstance(self.model, SegmentationModel):
        dynamic['output0'] = {0: 'batch', 2: 'anchors'}  # shape(1, 116, 8400)
        dynamic['output1'] = {0: 'batch', 2: 'mask_height', 3: 'mask_width'}  # shape(1,32,160,160)
    elif isinstance(self.model, DetectionModel):
        dynamic['output'] = {0: 'batch'}  # shape(1, 84, 8400)
- Export the ONNX model: create an export file `export.py` in ultralytics-main with the following content:
# ========== export.py ==========
from ultralytics import YOLO
model = YOLO("yolov8s-obb.pt")
success = model.export(format="onnx", dynamic=True, simplify=True)
cd ultralytics-main
python export.py
- Copy the model and run
cp ultralytics/yolov8s-obb.onnx tensorRT_Pro-Vision/workspace
cd tensorRT_Pro-Vision
make yolo_obb -j64
YOLOv8-Pose support
- Download YOLOv8
git clone https://github.com/ultralytics/ultralytics.git
- Modify the code to ensure dynamic batch
# ========== head.py ==========
# ultralytics/nn/modules/head.py, line 130, forward function
# return torch.cat([x, pred_kpt], 1) if self.export else (torch.cat([x[0], pred_kpt], 1), (x[1], kpt))
# Change to:
return torch.cat([x, pred_kpt], 1).permute(0, 2, 1) if self.export else (torch.cat([x[0], pred_kpt], 1), (x[1], kpt))
# ========== exporter.py ==========
# ultralytics/engine/exporter.py, line 323
# output_names = ['output0', 'output1'] if isinstance(self.model, SegmentationModel) else ['output0']
# dynamic = self.args.dynamic
# if dynamic:
# dynamic = {'images': {0: 'batch', 2: 'height', 3: 'width'}} # shape(1,3,640,640)
# if isinstance(self.model, SegmentationModel):
# dynamic['output0'] = {0: 'batch', 2: 'anchors'} # shape(1, 116, 8400)
# dynamic['output1'] = {0: 'batch', 2: 'mask_height', 3: 'mask_width'} # shape(1,32,160,160)
# elif isinstance(self.model, DetectionModel):
# dynamic['output0'] = {0: 'batch', 2: 'anchors'} # shape(1, 84, 8400)
# Change to:
output_names = ['output0', 'output1'] if isinstance(self.model, SegmentationModel) else ['output']
dynamic = self.args.dynamic
if dynamic:
    dynamic = {'images': {0: 'batch'}}  # shape(1,3,640,640)
    dynamic['output'] = {0: 'batch'}
    if isinstance(self.model, SegmentationModel):
        dynamic['output0'] = {0: 'batch', 2: 'anchors'}  # shape(1, 116, 8400)
        dynamic['output1'] = {0: 'batch', 2: 'mask_height', 3: 'mask_width'}  # shape(1,32,160,160)
    elif isinstance(self.model, DetectionModel):
        dynamic['output0'] = {0: 'batch', 2: 'anchors'}  # shape(1, 84, 8400)
- Export the ONNX model: create an export file `export.py` in ultralytics-main with the following content:
# ========== export.py ==========
from ultralytics import YOLO
model = YOLO("yolov8s-pose.pt")
success = model.export(format="onnx", dynamic=True, simplify=True)
cd ultralytics-main
python export.py
- Copy the model and run
cp ultralytics/yolov8s-pose.onnx tensorRT_Pro-Vision/workspace
cd tensorRT_Pro-Vision
make yolo_pose -j64
RT-DETR support
- Prerequisites
  - tensorRT >= 8.6
- Download YOLOv8
git clone https://github.com/ultralytics/ultralytics.git
- Modify the code to ensure dynamic batch
# ========== exporter.py ==========
# ultralytics/engine/exporter.py, line 323
# output_names = ['output0', 'output1'] if isinstance(self.model, SegmentationModel) else ['output0']
# dynamic = self.args.dynamic
# if dynamic:
# dynamic = {'images': {0: 'batch', 2: 'height', 3: 'width'}} # shape(1,3,640,640)
# if isinstance(self.model, SegmentationModel):
# dynamic['output0'] = {0: 'batch', 2: 'anchors'} # shape(1, 116, 8400)
# dynamic['output1'] = {0: 'batch', 2: 'mask_height', 3: 'mask_width'} # shape(1,32,160,160)
# elif isinstance(self.model, DetectionModel):
# dynamic['output0'] = {0: 'batch', 2: 'anchors'} # shape(1, 84, 8400)
# Change to:
output_names = ['output0', 'output1'] if isinstance(self.model, SegmentationModel) else ['output']
dynamic = self.args.dynamic
if dynamic:
    dynamic = {'images': {0: 'batch'}}  # shape(1,3,640,640)
    if isinstance(self.model, SegmentationModel):
        dynamic['output0'] = {0: 'batch', 2: 'anchors'}  # shape(1, 116, 8400)
        dynamic['output1'] = {0: 'batch', 2: 'mask_height', 3: 'mask_width'}  # shape(1,32,160,160)
    elif isinstance(self.model, DetectionModel):
        dynamic['output'] = {0: 'batch'}  # shape(1, 84, 8400)
- Export the ONNX model: create an export file `export.py` in ultralytics-main with the following content (the export may fail depending on the torch version; see #6144 for details)
from ultralytics import RTDETR
model = RTDETR("rtdetr-l.pt")
success = model.export(format="onnx", dynamic=True, simplify=True)
cd ultralytics-main
python export.py
- Engine generation
Generate the engine with the trtexec tool
cp ultralytics/rtdetr-l.onnx tensorRT_Pro-Vision/workspace
cd tensorRT_Pro-Vision/workspace
bash build.sh
- Run
make rtdetr -j64
RF-DETR support
- Prerequisites
  - tensorRT >= 8.6
- Download RF-DETR
git clone https://github.com/roboflow/rf-detr.git
- Modify the code to move the postprocessing into the ONNX graph
  - step 1. In the `rf-detr/src/rfdetr/export/_onnx` folder, create post_process.py with the following content:
# ------------------------------------------------------------------------
# RF-DETR
# Copyright (c) 2025 Roboflow. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 [see LICENSE for details]
# ------------------------------------------------------------------------
"""Lightweight wrapper that embeds post-processing into the ONNX graph for deployment."""
import torch
from torch import nn


class ExportPostProcessor(nn.Module):
    """Wraps the raw LWDETR model so that sigmoid + argmax + merge are part of the ONNX graph.

    The wrapped model's ``forward()`` produces a single tensor that is ready for threshold filtering
    at the caller side — no additional CUDA kernels are needed for post-processing.

    Input:
        images: (B, 3, H, W) float32, already resized and normalized.
    Output:
        output: (B, 300, 6) where the last dim is ``[cx, cy, w, h, confidence, class_id]``.
    """

    def __init__(self, model: nn.Module) -> None:
        super().__init__()
        self.model = model

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        outputs = self.model(images)
        # forward_export() returns a tuple (boxes, logits, ...); standard forward() returns a dict.
        if isinstance(outputs, tuple):
            pred_boxes = outputs[0]
            pred_logits = outputs[1]
        else:
            pred_boxes = outputs["pred_boxes"]
            pred_logits = outputs["pred_logits"]
        # Sigmoid → per-query argmax to select the best class.
        probs = pred_logits.sigmoid()  # (B, 300, 91)
        scores, labels = probs.max(dim=-1)  # (B, 300), (B, 300)
        labels = labels.float().unsqueeze(-1)  # (B, 300, 1)
        scores = scores.unsqueeze(-1)  # (B, 300, 1)
        # Merge: [cx, cy, w, h, confidence, class_id]
        return torch.cat([pred_boxes, scores, labels], dim=-1)  # (B, 300, 6)


class ExportRawProcessor(nn.Module):
    """Wraps LWDETR for ONNX export with raw outputs — no post-processing in the graph.

    Use this to isolate whether FP16 issues come from post-processing fusion or the main network.

    Input:
        images: (B, 3, H, W) float32
    Output:
        pred_boxes: (B, 300, 4) cxcywh, normalized [0,1]
        pred_logits: (B, 300, 91) raw class logits
    """

    def __init__(self, model: nn.Module) -> None:
        super().__init__()
        self.model = model

    def forward(self, images: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        outputs = self.model(images)
        if isinstance(outputs, tuple):
            return outputs[0], outputs[1]
        else:
            return outputs["pred_boxes"], outputs["pred_logits"]
  - step 2. In `rf-detr/src/rfdetr/detr.py`, at line 1133 (right after the export() function), add the following function:
    def export_for_deployment(
        self,
        output_dir: str = "output",
        shape: tuple[int, int] | None = None,
        batch_size: int = 1,
        opset_version: int = 17,
        simplify: bool = True,
        dynamic_batch: bool = False,
        patch_size: int | None = None,
    ) -> Path:
        """Export ONNX with embedded post-processing for TensorRT deployment.

        Produces a single ONNX file where sigmoid, argmax, and box-label merging are baked into the
        graph. The caller only needs to resize, normalise, and run the model — the output is ready for
        threshold filtering without any additional CUDA kernels.

        Args:
            output_dir: Directory to write the exported ONNX model.
            shape: ``(height, width)`` tuple; defaults to ``(model.resolution, model.resolution)``.
            batch_size: Static batch size baked into the ONNX graph.
            opset_version: ONNX opset version to target.
            simplify: Whether to run ``onnxsim`` on the exported graph.
            dynamic_batch: Whether to mark axis 0 of ``images`` and ``output`` as a dynamic batch axis.
            patch_size: Backbone patch size. Defaults to ``model_config.patch_size``.

        Returns:
            Path to the exported ``.onnx`` file.
        """
        from copy import deepcopy

        from rfdetr.export._onnx.post_process import ExportPostProcessor
        from rfdetr.export.main import make_infer_image

        patch_size = _resolve_patch_size(patch_size, self.model_config, "export_for_deployment")
        num_windows = getattr(self.model_config, "num_windows", 1)
        if isinstance(num_windows, bool) or not isinstance(num_windows, int) or num_windows <= 0:
            raise ValueError(f"num_windows must be a positive integer, got {num_windows!r}")
        block_size = patch_size * num_windows
        if shape is None:
            shape = (self.model.resolution, self.model.resolution)
            if shape[0] % block_size != 0:
                raise ValueError(
                    f"Model's default resolution ({self.model.resolution}) is not divisible by "
                    f"block_size={block_size} (patch_size={patch_size} * num_windows={num_windows}). "
                    f"Provide an explicit shape divisible by {block_size}.",
                )
        else:
            shape = _validate_shape_dims(shape, block_size, patch_size, num_windows)

        # Copy the model on the CPU, then move the copy back to the original device.
        device = self.model.device
        self.model.model = self.model.model.to("cpu")
        model = deepcopy(self.model.model)
        model.to(device)
        try:
            os.makedirs(output_dir, exist_ok=True)
            input_tensors = make_infer_image(
                None, shape, batch_size, device, num_channels=self.model_config.num_channels
            ).to(device)
            # Wrap with post-processing so sigmoid + argmax are baked into the ONNX graph.
            # Must call model.export() first to disable anti-alias upsampling (incompatible with ONNX).
            if hasattr(model, "export"):
                model.export()
            wrapped = ExportPostProcessor(model)
            wrapped.eval()
            input_names = ["images"]
            output_names = ["output"]
            dynamic_axes = {"images": {0: "batch"}, "output": {0: "batch"}} if dynamic_batch else None
            # Only pass dynamo=False when this torch version accepts the argument.
            export_kwargs = {}
            if "dynamo" in __import__("inspect").signature(torch.onnx.export).parameters:
                export_kwargs["dynamo"] = False
            output_file = os.path.join(output_dir, f"{getattr(self, 'size', 'inference_model')}.onnx")
            torch.onnx.export(
                wrapped,
                input_tensors,
                output_file,
                input_names=input_names,
                output_names=output_names,
                export_params=True,
                keep_initializers_as_inputs=False,
                do_constant_folding=True,
                verbose=False,
                opset_version=opset_version,
                dynamic_axes=dynamic_axes,
                **export_kwargs,
            )
            logger.info(f"Successfully exported ONNX model to: {output_file}")
            if simplify:
                from rfdetr.export._onnx.exporter import onnx_simplify

                onnx_simplify(output_file, input_names=["images"], input_tensors=input_tensors)
            logger.info("Export for deployment completed successfully")
            return Path(output_file)
        finally:
            # Restore the original model to its device even if the export fails.
            self.model.model = self.model.model.to(device)
- Export the ONNX model: create an export file `export.py` in rf-detr with the following content:
from rfdetr import RFDETRMedium
model = RFDETRMedium(pretrain_weights="./checkpoint_best_regular.pth")
# Export with embedded post-processing (sigmoid+argmax+concat)
model.export_for_deployment(output_dir="output", shape=(576, 576))
cd rf-detr
python export.py
- Engine generation
Generate the engine with the trtexec tool
cp rf-detr/output/rfdetr-medium.sim.onnx tensorRT_Pro-Vision/workspace/rfdetr-medium.onnx
cd tensorRT_Pro-Vision/workspace
bash build.sh
- Run
make rfdetr -j64
ByteTrack support
- Notes
The code is copied from: https://github.com/CYYAI/AiInfer/tree/main/utils/tracker/ByteTracker
Basic tracking is implemented with YOLOv8 as the detector (other detectors work as well)
- Demo
cd tensorRT_Pro-Vision
make bytetrack -j64
YOLOv9 support
- Notes
The YOLOv9 deployment in this project is not the official original implementation; it uses the YOLOv9 integrated into ultralytics
- Download YOLOv8
git clone https://github.com/ultralytics/ultralytics.git
- Modify the code to ensure dynamic batch
# ========== head.py ==========
# ultralytics/nn/modules/head.py, line 75, forward function
# return y if self.export else (y, x)
# Change to:
return y.permute(0, 2, 1) if self.export else (y, x)
# ========== exporter.py ==========
# ultralytics/engine/exporter.py, line 365
# output_names = ['output0', 'output1'] if isinstance(self.model, SegmentationModel) else ['output0']
# dynamic = self.args.dynamic
# if dynamic:
# dynamic = {'images': {0: 'batch', 2: 'height', 3: 'width'}} # shape(1,3,640,640)
# if isinstance(self.model, SegmentationModel):
# dynamic['output0'] = {0: 'batch', 2: 'anchors'} # shape(1, 116, 8400)
# dynamic['output1'] = {0: 'batch', 2: 'mask_height', 3: 'mask_width'} # shape(1,32,160,160)
# elif isinstance(self.model, DetectionModel):
# dynamic['output0'] = {0: 'batch', 2: 'anchors'} # shape(1, 84, 8400)
# Change to:
output_names = ['output0', 'output1'] if isinstance(self.model, SegmentationModel) else ['output']
dynamic = self.args.dynamic
if dynamic:
    dynamic = {'images': {0: 'batch'}}  # shape(1,3,640,640)
    if isinstance(self.model, SegmentationModel):
        dynamic['output0'] = {0: 'batch', 2: 'anchors'}  # shape(1, 116, 8400)
        dynamic['output1'] = {0: 'batch', 2: 'mask_height', 3: 'mask_width'}  # shape(1,32,160,160)
    elif isinstance(self.model, DetectionModel):
        dynamic['output'] = {0: 'batch'}  # shape(1, 84, 8400)
- Export the ONNX model: create an export file `export.py` in ultralytics-main with the following content:
# ========== export.py ==========
from ultralytics import YOLO
model = YOLO("yolov9c.pt")
success = model.export(format="onnx", dynamic=True, simplify=True)
cd ultralytics-main
python export.py
- Copy the model and run
cp ultralytics/yolov9c.onnx tensorRT_Pro-Vision/workspace
cd tensorRT_Pro-Vision
make yolo -j64
YOLOv10 support
- Prerequisites
  - tensorRT >= 8.5
- Download YOLOv10
git clone https://github.com/THU-MIG/yolov10
- Modify the code to ensure dynamic batch
# ========== exporter.py ==========
# yolov10-main/ultralytics/engine/exporter.py, line 323
# output_names = ['output0', 'output1'] if isinstance(self.model, SegmentationModel) else ['output0']
# dynamic = self.args.dynamic
# if dynamic:
# dynamic = {'images': {0: 'batch', 2: 'height', 3: 'width'}} # shape(1,3,640,640)
# if isinstance(self.model, SegmentationModel):
# dynamic['output0'] = {0: 'batch', 2: 'anchors'} # shape(1, 116, 8400)
# dynamic['output1'] = {0: 'batch', 2: 'mask_height', 3: 'mask_width'} # shape(1,32,160,160)
# elif isinstance(self.model, DetectionModel):
# dynamic['output0'] = {0: 'batch', 2: 'anchors'} # shape(1, 84, 8400)
# Change to:
output_names = ['output0', 'output1'] if isinstance(self.model, SegmentationModel) else ['output']
dynamic = self.args.dynamic
if dynamic:
    dynamic = {'images': {0: 'batch'}}  # shape(1,3,640,640)
    if isinstance(self.model, SegmentationModel):
        dynamic['output0'] = {0: 'batch', 2: 'anchors'}  # shape(1, 116, 8400)
        dynamic['output1'] = {0: 'batch', 2: 'mask_height', 3: 'mask_width'}  # shape(1,32,160,160)
    elif isinstance(self.model, DetectionModel):
        dynamic['output'] = {0: 'batch'}  # shape(1, 84, 8400)
- Export the ONNX model: create an export file `export.py` in yolov10-main with the following content:
from ultralytics import YOLO
model = YOLO("yolov10s.pt")
success = model.export(format="onnx", dynamic=True, simplify=True, opset=13)
cd yolov10-main
python export.py
- Engine generation
Generate the engine with the trtexec tool
cp yolov10-main/yolov10s.onnx tensorRT_Pro-Vision/workspace
cd tensorRT_Pro-Vision/workspace
# Uncomment the yolov10 engine generation in build.sh
bash build.sh
- Run
make yolo -j64
RTMO support
- Prerequisites
  - tensorRT >= 8.6
- RTMO export environment setup
conda create -n mmpose python=3.9
conda activate mmpose
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
pip install -U openmim
mim install mmengine
mim install "mmcv>=2.0.0rc2"
mim install "mmpose>=1.1.0"
pip install mmdeploy==1.3.1
pip install mmdeploy-runtime==1.3.1
- Clone the project
git clone https://github.com/open-mmlab/mmpose.git
- Download the pretrained weights
- Export the ONNX model: create an export file `export.py` in mmpose-main with the following content:
import torch
from mmpose.apis import init_model
from mmpose.structures.bbox import bbox_xyxy2cs
class MyModel(torch.nn.Module):
def __init__(self) -> None:
super().__init__()
self.model = init_model(config_file, checkpoint_file, device=device)
test_cfg = {'input_size': (640, 640)}
self.model.neck.switch_to_deploy(test_cfg)
self.model.head.switch_to_deploy(test_cfg)
self.model.head.dcc.switch_to_deploy(test_cfg)
def forward(self, x):
x = self.model.backbone(x)
x = self.model.neck(x)
cls_scores, bbox_preds, _, kpt_vis, pose_vecs = self.model.head(x)[:5]
scores = self.model.head._flatten_predictions(cls_scores).sigmoid()
flatten_bbox_preds = self.model.head._flatten_predictions(bbox_preds)
flatten_pose_vecs = self.model.head._flatten_predictions(pose_vecs)
flatten_kpt_vis = self.model.head._flatten_predictions(kpt_vis).sigmoid()
bboxes = self.model.head.decode_bbox(flatten_bbox_preds, self.model.head.flatten_priors,
self.model.head.flatten_stride)
dets = torch.cat([bboxes, scores], dim=2)
grids = self.model.head.flatten_priors
bbox_cs = torch.cat(bbox_xyxy2cs(dets[..., :4], self.model.head.bbox_padding), dim=-1)
keypoints = self.model.head.dcc.forward_test(flatten_pose_vecs, bbox_cs, grids)
pred_kpts = torch.cat([keypoints, flatten_kpt_vis.unsqueeze(-1)], dim=-1)
bs, bboxes, ny, nx = map(int, pred_kpts.shape)
bs = -1
pred_kpts = pred_kpts.view(bs, bboxes, ny*nx)
return torch.cat([dets, pred_kpts], dim=2)
if __name__ == "__main__":
device = "cpu"
config_file = "configs/body_2d_keypoint/rtmo/body7/rtmo-s_8xb32-600e_body7-640x640.py"
checkpoint_file = "rtmo-s_8xb32-600e_body7-640x640-dac2bf74_20231211.pth"
model = MyModel()
model.eval()
x = torch.zeros(1, 3, 640, 640, device=device)
dynamic_batch = {'images': {0: 'batch'}, 'output': {0: 'batch'}}
torch.onnx.export(
model,
(x,),
"rtmo-s_8xb32-600e_body7-640x640.onnx",
input_names=["images"],
output_names=["output"],
opset_version=17,
dynamic_axes=dynamic_batch
)
# Checks
import onnx
model_onnx = onnx.load("rtmo-s_8xb32-600e_body7-640x640.onnx")
# onnx.checker.check_model(model_onnx) # check onnx model
# Simplify
try:
import onnxsim
print(f"simplifying with onnxsim {onnxsim.__version__}...")
model_onnx, check = onnxsim.simplify(model_onnx)
assert check, "Simplified ONNX model could not be validated"
except Exception as e:
print(f"simplifier failure: {e}")
onnx.save(model_onnx, "rtmo-s_8xb32-600e_body7-640x640.onnx")
print("simplify done.")
cd mmpose-main
conda activate mmpose
python export.py
- Engine generation
Build the engine with the trtexec tool:
cp mmpose/rtmo-s_8xb32-600e_body7-640x640.onnx tensorRT_Pro-Vision/workspace
cd tensorRT_Pro-Vision/workspace
# Uncomment the RTMO engine build step in build.sh
bash build.sh
- Run
make rtmo -j64
LayerNorm Plugin support
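- Notes
  - This plugin makes it possible to parse the LayerNorm operator on older TensorRT versions.
  - The plugin implementation is copied from CUDA-BEVFusion/src/plugins/custom_layernorm.cu with minor modifications.
  - The wrapped LayerNorm plugin still has some issues at inference time, so it is not used at the moment.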
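For orientation, the computation that a LayerNormalization node performs over the last axis can be sketched in plain Python. This is only a reference for the math (normalize to zero mean and unit variance, then scale and shift). The function name `layer_norm` and the default `eps` value are illustrative assumptions, and the sketch says nothing about how the CUDA plugin itself is implemented.

```python
import math


def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize one feature vector, then apply a per-element scale and shift."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n  # population variance
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * inv_std * g + b for v, g, b in zip(x, gamma, beta)]


# Hypothetical 4-element feature vector with identity scale and zero shift.
row = [1.0, 2.0, 3.0, 4.0]
out = layer_norm(row, gamma=[1.0] * 4, beta=[0.0] * 4)
print([round(v, 4) for v in out])
```

With identity scale and zero shift the result is symmetric around zero, which is a quick sanity check when comparing a plugin's output against a reference.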
- Build libcustom_layernorm.so
cd tensorRT_Pro-Vision
mkdir build && cd build
cmake .. && make -j64
cp libcustom_layernorm.so ../workspace
- Modify the ONNX model (RTMO is used as the example; other models are similar)
Use onnx_graphsurgeon to change the op_type of the original LayerNorm nodes:
import onnx
import onnx_graphsurgeon as gs

# Load the ONNX model
input_model_path = "rtmo-s_8xb32-600e_body7-640x640.onnx"
output_model_path = "rtmo-s_8xb32-600e_body7-640x640.plugin.onnx"
graph = gs.import_onnx(onnx.load(input_model_path))

# Walk over every node in the graph
for node in graph.nodes:
    if node.op == "LayerNormalization":
        node.op = "CustomLayerNormalization"
        # Add custom attributes
        node.attrs["name"] = "LayerNormPlugin"
        node.attrs["info"] = "This is custom LayerNormalization node"

# Remove unused nodes and tensors
graph.cleanup()

# Export the modified model
onnx.save(gs.export_onnx(graph), output_model_path)
- Engine generation
Use trtexec to load the plugin and parse the ONNX model. Create a build.sh script with the following content and run it:
#! /usr/bin/bash
TRTEXEC=/home/jarvis/lean/TensorRT-8.5.1.7/bin/trtexec
# export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/jarvis/lean/TensorRT-8.5.1.7/lib
${TRTEXEC} \
--onnx=rtmo-s_8xb32-600e_body7-640x640.plugin.onnx \
--plugins=libcustom_layernorm.so \
--minShapes=images:1x3x640x640 \
--optShapes=images:1x3x640x640 \
--maxShapes=images:4x3x640x640 \
--memPoolSize=workspace:2048 \
--saveEngine=rtmo-s_8xb32-600e_body7-640x640.plugin.FP32.trtmodel \
> trtexec_output.log 2>&1
PP-OCRv4 support
- Set up the export environment
conda create --name paddleocr python=3.9
conda activate paddleocr
pip install shapely scikit-image imgaug pyclipper lmdb tqdm numpy==1.26.4 rapidfuzz onnxruntime
pip install "opencv-python<=4.6.0.66" "opencv-contrib-python<=4.6.0.66" cython "Pillow>=10.0.0" pyyaml requests
pip install paddlepaddle paddleocr paddle2onnx
- Clone the project
git clone https://github.com/PaddlePaddle/PaddleOCR.git
- Download the pretrained weights
- Export the ONNX model. For the detailed procedure, see: PaddleOCR-PP-OCRv4推理详解及部署实现(上)
- Engine generation
Build the engine with the trtexec tool:
cd tensorRT_Pro-Vision/workspace
bash ocr_build.sh
- Run
make ppocr -j64
LaneATT support
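The export script in this section does not build the attention matrix with `torch.eye` inside the traced graph. Instead it precomputes, once in `__init__`, the flat indices of all off-diagonal entries (`col + 1000 * row`) and scatters the attention weights into them. A minimal pure-Python sketch of that indexing, with a 4x4 matrix standing in for the 1000x1000 one; the helper name `off_diagonal_flat_indices` and the sample weights are illustrative and not part of LaneATT:

```python
def off_diagonal_flat_indices(n):
    """Row-major flat indices of every entry of an n x n matrix off the diagonal."""
    return [col + n * row for row in range(n) for col in range(n) if row != col]


n = 4
inds = off_diagonal_flat_indices(n)

# Hypothetical attention weights: one row of n - 1 values per anchor.
attention = [[0.2, 0.3, 0.5] for _ in range(n)]

# Scatter the (n, n - 1) weights into a flat n * n buffer; the diagonal stays 0.
flat = [0.0] * (n * n)
values = [v for row in attention for v in row]
for idx, v in zip(inds, values):
    flat[idx] = v

matrix = [flat[r * n:(r + 1) * n] for r in range(n)]
for r in matrix:
    print(r)
```

For `n = 1000` the helper yields `1000 * 999 = 999000` indices, which matches the size noted in the script's comment.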
- Set up the export environment
conda create -n laneatt python=3.10
conda activate laneatt
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2
pip install pyyaml opencv-python scipy imgaug numpy==1.26.4 tqdm p_tqdm ujson scikit-learn tensorboard
pip install onnx onnxruntime onnx-simplifier
- Clone the project
git clone https://github.com/lucastabelini/LaneATT.git
- Download the pretrained weights
gdown "https://drive.google.com/uc?id=1R638ou1AMncTCRvrkQY6I-11CPwZy23T" # main experiments on TuSimple, CULane and LLAMAS (1.3 GB)
unzip laneatt_experiments.zip
- Export the ONNX model: create export.py in laneatt-main with the following content:
import torch
from lib.models.laneatt import LaneATT
class LaneATTONNX(torch.nn.Module):
def __init__(self, model):
super(LaneATTONNX, self).__init__()
# Params
self.fmap_h = model.fmap_h # 11
self.fmap_w = model.fmap_w # 20
self.anchor_feat_channels = model.anchor_feat_channels # 64
self.anchors = model.anchors
self.cut_xs = model.cut_xs
self.cut_ys = model.cut_ys
self.cut_zs = model.cut_zs
self.invalid_mask = model.invalid_mask
# Layers
self.feature_extractor = model.feature_extractor
self.conv1 = model.conv1
self.cls_layer = model.cls_layer
self.reg_layer = model.reg_layer
self.attention_layer = model.attention_layer
# Exporting the operator eye to ONNX opset version 11 is not supported
attention_matrix = torch.eye(1000)
self.non_diag_inds = torch.nonzero(attention_matrix == 0., as_tuple=False)
self.non_diag_inds = self.non_diag_inds[:, 1] + 1000 * self.non_diag_inds[:, 0] # 999000
self.anchor_parts_1 = self.anchors[:, 2:4]
self.anchor_parts_2 = self.anchors[:, 4:]
def forward(self, x):
batch_features = self.feature_extractor(x)
batch_features = self.conv1(batch_features)
# batch_anchor_features = self.cut_anchor_features(batch_features)
# batchx15360
batch_anchor_features = batch_features.reshape(-1, int(batch_features.numel()))
# h, w = batch_features.shape[2:4] # 12, 20
indices = self.cut_xs + 20 * self.cut_ys + 12 * 20 * self.cut_zs
batch_anchor_features = batch_anchor_features[:, indices].\
view(-1, 1000, self.anchor_feat_channels, self.fmap_h, 1)
# batch_anchor_features[self.invalid_mask] = 0
batch_anchor_features = batch_anchor_features * torch.logical_not(self.invalid_mask)
# Join proposals from all images into a single proposals features batch
# batchx1000x704
batch_anchor_features = batch_anchor_features.view(-1, 1000, self.anchor_feat_channels * self.fmap_h)
# Add attention features
softmax = torch.nn.Softmax(dim=2)
# batchx1000x999
scores = self.attention_layer(batch_anchor_features)
attention = softmax(scores)
# bs, _, _ = scores.shape
bs, _, _ = scores.shape
attention_matrix = torch.zeros(bs, 1000 * 1000, device=x.device)
attention_matrix[:, self.non_diag_inds] = attention.reshape(-1, int(attention.numel()))
attention_matrix = attention_matrix.view(-1, 1000, 1000)
attention_features = torch.matmul(torch.transpose(batch_anchor_features, 1, 2),
torch.transpose(attention_matrix, 1, 2)).transpose(1, 2)
batch_anchor_features = torch.cat((attention_features, batch_anchor_features), dim=2)
# Predict
cls_logits = self.cls_layer(batch_anchor_features)
reg = self.reg_layer(batch_anchor_features)
anchor_expanded_1 = self.anchor_parts_1.repeat(reg.shape[0], 1, 1)
anchor_expanded_2 = self.anchor_parts_2.repeat(reg.shape[0], 1, 1)
# Add offsets to anchors (1000, 2+2+73)
reg_proposals = torch.cat([softmax(cls_logits), anchor_expanded_1, anchor_expanded_2 + reg], dim=2)
return reg_proposals
def export_onnx(onnx_file_path):
# e.g. laneatt_r18_culane
backbone_name = 'resnet18'
checkpoint_file_path = 'experiments/laneatt_r18_culane/models/model_0015.pt'
anchors_freq_path = 'data/culane_anchors_freq.pt'
# Load specified checkpoint
model = LaneATT(backbone=backbone_name, anchors_freq_path=anchors_freq_path, topk_anchors=1000)
checkpoint = torch.load(checkpoint_file_path)
model.load_state_dict(checkpoint['model'])
model.eval()
# Export to ONNX
onnx_model = LaneATTONNX(model)
dummy_input = torch.randn(1, 3, 360, 640)
dynamic_batch = {'images': {0: 'batch'}, 'output': {0: 'batch'}}
torch.onnx.export(
onnx_model,
dummy_input,
onnx_file_path,
input_names=["images"],
output_names=["output"],
dynamic_axes=dynamic_batch
)
import onnx
model_onnx = onnx.load(onnx_file_path)
# Simplify
try:
import onnxsim
print(f"simplifying with onnxsim {onnxsim.__version__}...")
model_onnx, check = onnxsim.simplify(model_onnx)
assert check, "Simplified ONNX model could not be validated"
except Exception as e:
print(f"simplifier failure: {e}")
onnx.save(model_onnx, "laneatt.sim.onnx")
print(f"simplify done. onnx model save in laneatt.sim.onnx")
if __name__ == '__main__':
export_onnx('./laneatt.onnx')
cd laneatt-main
conda activate laneatt
python export.py
- Engine generation
Build the engine with the trtexec tool:
cd tensorRT_Pro-Vision/workspace
bash lane_build.sh
CLRNet support
1. Prerequisites
- TensorRT >= 8.6
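If TensorRT was unpacked into a versioned directory, as in the Makefile paths used elsewhere in this README (for example `/opt/TensorRT-8.6.1.6`), the requirement can be checked before building. A small sketch; the helper names and the assumption that the directory name carries the version string are illustrative only:

```python
import re


def tensorrt_version_from_path(path):
    """Extract (major, minor) from a directory name such as 'TensorRT-8.6.1.6'."""
    match = re.search(r"TensorRT-(\d+)\.(\d+)", path)
    if match is None:
        raise ValueError(f"no TensorRT version found in {path!r}")
    return int(match.group(1)), int(match.group(2))


def meets_requirement(path, minimum=(8, 6)):
    """Return True when the version encoded in the path is at least `minimum`."""
    return tensorrt_version_from_path(path) >= minimum


print(meets_requirement("/opt/TensorRT-8.6.1.6"))
print(meets_requirement("/home/jarvis/lean/TensorRT-8.5.1.7"))
```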
2. Set up the export environment
conda create -n clrnet python=3.9
conda activate clrnet
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2
pip install pandas addict scikit-learn opencv-python pytorch_warmup scikit-image tqdm p_tqdm
pip install imgaug yapf timm pathspec pthflops
pip install numpy==1.26.4 mmcv==1.2.5 albumentations==0.4.6 ujson==1.35 Shapely==2.0.5
pip install onnx onnx-simplifier onnxruntime
3. Clone the project
git clone https://github.com/Turoad/CLRNet.git
4. Download the pretrained weights
- Download link (Baidu Drive)
5. Export the ONNX model: create export.py in clrnet-main with the following content:
import math
import torch
import torch.nn.functional as F
from clrnet.utils.config import Config
from mmcv.parallel import MMDataParallel
from clrnet.models.registry import build_net
class CLRNetONNX(torch.nn.Module):
def __init__(self, model):
super(CLRNetONNX, self).__init__()
self.backbone = model.backbone
self.neck = model.neck
self.head = model.heads
def forward(self, x):
x = self.backbone(x)
x = self.neck(x)
batch_features = list(x[len(x) - self.head.refine_layers:])
# 1x64x10x25+1x64x20x50+1x64x40x100
batch_features.reverse()
batch_size = batch_features[-1].shape[0]
# 1x192x78
priors = self.head.priors.repeat(batch_size, 1, 1)
# 1x192x36
priors_on_featmap = self.head.priors_on_featmap.repeat(batch_size, 1, 1)
prediction_lists = []
prior_features_stages = []
for stage in range(self.head.refine_layers):
# 1. anchor ROI pooling
num_priors = int(priors_on_featmap.shape[1])
prior_xs = torch.flip(priors_on_featmap, dims=[2])
batch_prior_features = self.head.pool_prior_features(
batch_features[stage], num_priors, prior_xs)
prior_features_stages.append(batch_prior_features)
# 2. ROI gather
fc_features = self.head.roi_gather(prior_features_stages,
batch_features[stage], stage)
# 3. cls and reg head
# fc_features = fc_features.view(num_priors, batch_size, -1).reshape(batch_size * num_priors, self.head.fc_hidden_dim)
fc_features = fc_features.view(num_priors, -1, 64).reshape(-1, self.head.fc_hidden_dim)
cls_features = fc_features.clone()
reg_features = fc_features.clone()
for cls_layer in self.head.cls_modules:
cls_features = cls_layer(cls_features)
for reg_layer in self.head.reg_modules:
reg_features = reg_layer(reg_features)
cls_logits = self.head.cls_layers(cls_features)
reg = self.head.reg_layers(reg_features)
# cls_logits = cls_logits.reshape(batch_size, -1, cls_logits.shape[1]) # (B, num_priors, 2)
cls_logits = cls_logits.reshape(-1, 192, 2) # (B, num_priors, 2)
# add softmax
softmax = torch.nn.Softmax(dim=2)
cls_logits = softmax(cls_logits)
# reg = reg.reshape(batch_size, -1, reg.shape[1])
reg = reg.reshape(-1, 192, 76)
predictions = priors.clone()
predictions[:, :, :2] = cls_logits
predictions[:, :, 2:5] += reg[:, :, :3]
# add n_strips * length
# predictions[:, :, 5] = reg[:, :, 3] # length
predictions[:, :, 5] = reg[:, :, 3] * self.head.n_strips # length
def tran_tensor(t):
return t.unsqueeze(2).clone().repeat(1, 1, self.head.n_offsets)
batch_size = reg.shape[0]
predictions[..., 6:] = (
tran_tensor(predictions[..., 3]) * (self.head.img_w - 1) +
((1 - self.head.prior_ys.repeat(batch_size, num_priors, 1) -
tran_tensor(predictions[..., 2])) * self.head.img_h /
torch.tan(tran_tensor(predictions[..., 4]) * math.pi + 1e-5))) / (self.head.img_w - 1)
prediction_lines = predictions.clone()
predictions[..., 6:] += reg[..., 4:]
prediction_lists.append(predictions)
if stage != self.head.refine_layers - 1:
priors = prediction_lines.detach().clone()
priors_on_featmap = priors[..., 6 + self.head.sample_x_indexs]
return prediction_lists[-1]
def export_onnx(onnx_file_path):
# e.g. clrnet_culane_r18
cfg = Config.fromfile("configs/clrnet/clr_resnet18_culane.py")
checkpoint_file_path = "culane_r18.pth"
# load checkpoint
net = build_net(cfg)
net = MMDataParallel(net, device_ids=range(1)).cuda()
pretrained_model = torch.load(checkpoint_file_path)
net.load_state_dict(pretrained_model['net'], strict=False)
net.eval()
model = net.to("cpu")
onnx_model = CLRNetONNX(model.module)
# Export to ONNX
dummy_input = torch.randn(1, 3 ,320, 800)
dynamic_batch = {'images': {0: 'batch'}, 'output': {0: 'batch'}}
torch.onnx.export(
onnx_model,
dummy_input,
onnx_file_path,
input_names=["images"],
output_names=["output"],
opset_version=17,
dynamic_axes=dynamic_batch
)
print(f"finished export onnx model")
import onnx
model_onnx = onnx.load(onnx_file_path)
onnx.checker.check_model(model_onnx) # check onnx model
# Simplify
try:
import onnxsim
print(f"simplifying with onnxsim {onnxsim.__version__}...")
model_onnx, check = onnxsim.simplify(model_onnx)
assert check, "Simplified ONNX model could not be validated"
except Exception as e:
print(f"simplifier failure: {e}")
onnx.save(model_onnx, "clrnet.sim.onnx")
print(f"simplify done. onnx model save in clrnet.sim.onnx")
if __name__ == "__main__":
export_onnx("./clrnet.onnx")
cd clrnet-main
conda activate clrnet
python export.py
6. Engine generation
Build the engine with the trtexec tool:
cd tensorRT_Pro-Vision/workspace
bash lane_build.sh
CLRerNet support
1. Prerequisites
- TensorRT >= 8.6
2. Set up the export environment
conda create -n clrernet python=3.8
conda activate clrernet
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2
pip install -U openmim==0.3.3
mim install mmcv-full==1.7.0
pip install albumentations==0.4.6 p_tqdm==1.3.3 yapf==0.40.1 mmdet==2.28.0
pip install pytest pytest-cov tensorboard
pip install onnx onnx-simplifier onnxruntime
3. Clone the project
git clone https://github.com/hirotomusiker/CLRerNet.git
4. Download the pretrained weights
- Download link (Baidu Drive)
5. Export the ONNX model: create export.py in clrernet-main with the following content:
import torch
from mmcv import Config
from mmdet.models import build_detector
from mmcv.runner import load_checkpoint
class CLRerNetONNX(torch.nn.Module):
def __init__(self, model):
super(CLRerNetONNX, self).__init__()
self.model = model
self.backbone = model.backbone
self.neck = model.neck
self.head = model.bbox_head
def forward(self, x):
x = self.backbone(x)
x = self.neck(x)
batch = x[0].shape[0]
feature_pyramid = list(x[len(x) - self.head.refine_layers:])
# 1x64x10x25+1x64x20x50+1x64x40x100
feature_pyramid.reverse()
_, sampled_xs = self.head.anchor_generator.generate_anchors(
self.head.anchor_generator.prior_embeddings.weight,
self.head.prior_ys,
self.head.sample_x_indices,
self.head.img_w,
self.head.img_h
)
anchor_params = self.head.anchor_generator.prior_embeddings.weight.clone().repeat(batch, 1, 1)
priors_on_featmap = sampled_xs.repeat(batch, 1, 1)
predictions_list = []
pooled_features_stages = []
for stage in range(self.head.refine_layers):
# 1. anchor ROI pooling
prior_xs = priors_on_featmap
pooled_features = self.head.pool_prior_features(feature_pyramid[stage], prior_xs)
pooled_features_stages.append(pooled_features)
# 2. ROI gather
fc_features = self.head.attention(pooled_features_stages, feature_pyramid, stage)
# fc_features = fc_features.view(self.head.num_priors, batch, -1).reshape(batch * self.head.num_priors, self.head.fc_hidden_dim)
fc_features = fc_features.view(self.head.num_priors, -1, 64).reshape(-1, self.head.fc_hidden_dim)
# 3. cls and reg head
cls_features = fc_features.clone()
reg_features = fc_features.clone()
for cls_layer in self.head.cls_modules:
cls_features = cls_layer(cls_features)
for reg_layer in self.head.reg_modules:
reg_features = reg_layer(reg_features)
cls_logits = self.head.cls_layers(cls_features)
# cls_logits = cls_logits.reshape(batch, -1, cls_logits.shape[1])
cls_logits = cls_logits.reshape(-1, 192, 2)
reg = self.head.reg_layers(reg_features)
# reg = reg.reshape(batch, -1, reg.shape[1])
reg = reg.reshape(-1, 192, 76)
# 4. reg processing
anchor_params += reg[:, :, :3]
updated_anchor_xs, _ = self.head.anchor_generator.generate_anchors(
anchor_params.view(-1, 3),
self.head.prior_ys,
self.head.sample_x_indices,
self.head.img_w,
self.head.img_h
)
# updated_anchor_xs = updated_anchor_xs.view(batch, self.head.num_priors, -1)
updated_anchor_xs = updated_anchor_xs.view(-1, 192, 72)
reg_xs = updated_anchor_xs + reg[..., 4:]
# start_y, start_x, theta
# some problem.
# anchor_params[:, :, 0] = 1.0 - anchor_params[:, :, 0]
# anchor_params_ = anchor_params.clone()
# anchor_params_[:, :, 0] = 1.0 - anchor_params_[:, :, 0]
# print(f"anchor_params.shape = {anchor_params_.shape}")
softmax = torch.nn.Softmax(dim=2)
cls_logits = softmax(cls_logits)
reg[:, :, 3:4] = reg[:, :, 3:4] * self.head.n_strips
predictions = torch.concat([cls_logits, anchor_params, reg[:, :, 3:4], reg_xs], dim=2)
# predictions = torch.concat([cls_logits, anchor_params_, reg[:, :, 3:4], reg_xs], dim=2)
predictions_list.append(predictions)
if stage != self.head.refine_layers - 1:
anchor_params = anchor_params.detach().clone()
priors_on_featmap = updated_anchor_xs.detach().clone()[
..., self.head.sample_x_indices
]
return predictions_list[-1]
if __name__ == "__main__":
cfg = Config.fromfile("configs/clrernet/culane/clrernet_culane_dla34.py")
model = build_detector(cfg.model, test_cfg=cfg.get("test_cfg"))
load_checkpoint(model, "clrernet_culane_dla34.pth", map_location="cpu")
model.eval()
model = model.to("cpu")
# Export to ONNX
onnx_model = CLRerNetONNX(model)
dummy_input = torch.randn(1, 3, 320, 800)
dynamic_batch = {'images': {0: 'batch'}, 'output': {0: 'batch'}}
torch.onnx.export(
onnx_model,
dummy_input,
"model.onnx",
input_names=["images"],
output_names=["output"],
opset_version=17,
dynamic_axes=dynamic_batch
)
print(f"finished export onnx model")
import onnx
model_onnx = onnx.load("model.onnx")
onnx.checker.check_model(model_onnx) # check onnx model
# Simplify
try:
import onnxsim
print(f"simplifying with onnxsim {onnxsim.__version__}...")
model_onnx, check = onnxsim.simplify(model_onnx)
assert check, "Simplified ONNX model could not be validated"
except Exception as e:
print(f"simplifier failure: {e}")
onnx.save(model_onnx, "clrernet.sim.onnx")
print(f"simplify done. onnx model save in clrernet.sim.onnx")
cd clrernet-main
conda activate clrernet
python export.py
6. Engine generation
Build the engine with the trtexec tool:
cd tensorRT_Pro-Vision/workspace
bash lane_build.sh
YOLO11 support
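The `head.py` change in this section appends `permute(0, 2, 1)` to the exported output, which turns the `(1, 84, 8400)` tensor into `(1, 8400, 84)`. The likely benefit, stated here as an assumption rather than something this README spells out, is that the 84 values of each candidate box end up next to each other in memory for the decode step. A tiny pure-Python sketch with 6 channels and 3 anchors standing in for 84 and 8400; the helper name is hypothetical:

```python
def permute_channels_last(output):
    """Swap the last two axes of a nested list shaped (batch, channels, anchors)."""
    return [[list(column) for column in zip(*sample)] for sample in output]


channels, anchors = 6, 3  # stand-ins for 84 and 8400
# The value c * 10 + a marks channel c of anchor a.
output = [[[c * 10 + a for a in range(anchors)] for c in range(channels)]]

permuted = permute_channels_last(output)
print(len(permuted), len(permuted[0]), len(permuted[0][0]))
print(permuted[0][1])  # every channel value of anchor 1, now stored together
```

After the swap, one row holds everything that belongs to a single anchor, so a per-anchor decoder can read it with a single contiguous access.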
- Download YOLO11
git clone https://github.com/ultralytics/ultralytics.git
- Modify the code to ensure dynamic batch support
# ========== head.py ==========
# ultralytics/nn/modules/head.py, line 68, forward function
# return y if self.export else (y, x)
# Change to:
return y.permute(0, 2, 1) if self.export else (y, x)
# ========== exporter.py ==========
# ultralytics/engine/exporter.py, line 400
# output_names = ["output0", "output1"] if isinstance(self.model, SegmentationModel) else ["output0"]
# dynamic = self.args.dynamic
# if dynamic:
# dynamic = {"images": {0: "batch", 2: "height", 3: "width"}} # shape(1,3,640,640)
# if isinstance(self.model, SegmentationModel):
# dynamic["output0"] = {0: "batch", 2: "anchors"} # shape(1, 116, 8400)
# dynamic["output1"] = {0: "batch", 2: "mask_height", 3: "mask_width"} # shape(1,32,160,160)
# elif isinstance(self.model, DetectionModel):
# dynamic["output0"] = {0: "batch", 2: "anchors"} # shape(1, 84, 8400)
# Change to:
output_names = ["output0", "output1"] if isinstance(self.model, SegmentationModel) else ["output"]
dynamic = self.args.dynamic
if dynamic:
    dynamic = {"images": {0: "batch"}}  # shape(1,3,640,640)
    if isinstance(self.model, SegmentationModel):
        dynamic["output0"] = {0: "batch", 2: "anchors"}  # shape(1, 116, 8400)
        dynamic["output1"] = {0: "batch", 2: "mask_height", 3: "mask_width"}  # shape(1,32,160,160)
    elif isinstance(self.model, DetectionModel):
        dynamic["output"] = {0: "batch"}  # shape(1, 84, 8400); key must match the name in output_names
- Export the ONNX model: create export.py in ultralytics-main with the following content:
from ultralytics import YOLO
model = YOLO("yolo11s.pt")
success = model.export(format="onnx", dynamic=True, simplify=True)
cd ultralytics-main
python export.py
- Copy the model and run
cp ultralytics/yolo11s.onnx tensorRT_Pro-Vision/workspace
cd tensorRT_Pro-Vision
make yolo -j64
YOLO11-Cls support
- Download YOLO11
git clone https://github.com/ultralytics/ultralytics.git
- Modify the code to ensure dynamic batch support
# ========== exporter.py ==========
# ultralytics/engine/exporter.py, line 400
# output_names = ["output0", "output1"] if isinstance(self.model, SegmentationModel) else ["output0"]
# dynamic = self.args.dynamic
# if dynamic:
# dynamic = {"images": {0: "batch", 2: "height", 3: "width"}} # shape(1,3,640,640)
# if isinstance(self.model, SegmentationModel):
# dynamic["output0"] = {0: "batch", 2: "anchors"} # shape(1, 116, 8400)
# dynamic["output1"] = {0: "batch", 2: "mask_height", 3: "mask_width"} # shape(1,32,160,160)
# elif isinstance(self.model, DetectionModel):
# dynamic["output0"] = {0: "batch", 2: "anchors"} # shape(1, 84, 8400)
# Change to:
output_names = ["output0", "output1"] if isinstance(self.model, SegmentationModel) else ["output"]
dynamic = self.args.dynamic
if dynamic:
    dynamic = {"images": {0: "batch"}}  # shape(1,3,640,640)
    if isinstance(self.model, SegmentationModel):
        dynamic["output0"] = {0: "batch", 2: "anchors"}  # shape(1, 116, 8400)
        dynamic["output1"] = {0: "batch", 2: "mask_height", 3: "mask_width"}  # shape(1,32,160,160)
    elif isinstance(self.model, DetectionModel):
        dynamic["output"] = {0: "batch"}  # shape(1, 84, 8400); key must match the name in output_names
- Export the ONNX model: create export.py in ultralytics-main with the following content:
from ultralytics import YOLO
model = YOLO("yolo11s-cls.pt")
success = model.export(format="onnx", dynamic=True, simplify=True)
cd ultralytics-main
python export.py
- Copy the model and run
cp ultralytics/yolo11s-cls.onnx tensorRT_Pro-Vision/workspace
cd tensorRT_Pro-Vision
make yolo_cls -j64
YOLO11-Seg support
- Download YOLO11
git clone https://github.com/ultralytics/ultralytics.git
- Modify the code to ensure dynamic batch support
# ========== head.py ==========
# ultralytics/nn/modules/head.py, line 186, forward function
# return (torch.cat([x, mc], 1), p) if self.export else (torch.cat([x[0], mc], 1), (x[1], mc, p))
# Change to:
return (torch.cat([x, mc], 1).permute(0, 2, 1), p) if self.export else (torch.cat([x[0], mc], 1), (x[1], mc, p))
# ========== exporter.py ==========
# ultralytics/engine/exporter.py, line 400
# output_names = ['output0', 'output1'] if isinstance(self.model, SegmentationModel) else ['output0']
# dynamic = self.args.dynamic
# if dynamic:
# dynamic = {'images': {0: 'batch', 2: 'height', 3: 'width'}} # shape(1,3,640,640)
# if isinstance(self.model, SegmentationModel):
# dynamic['output0'] = {0: 'batch', 2: 'anchors'} # shape(1, 116, 8400)
# dynamic['output1'] = {0: 'batch', 2: 'mask_height', 3: 'mask_width'} # shape(1,32,160,160)
# elif isinstance(self.model, DetectionModel):
# dynamic['output0'] = {0: 'batch', 2: 'anchors'} # shape(1, 84, 8400)
# Change to:
output_names = ['output0', 'output1'] if isinstance(self.model, SegmentationModel) else ['output0']
dynamic = self.args.dynamic
if dynamic:
    dynamic = {'images': {0: 'batch'}}  # shape(1,3,640,640)
    if isinstance(self.model, SegmentationModel):
        dynamic['output0'] = {0: 'batch'}  # shape(1, 116, 8400)
        dynamic['output1'] = {0: 'batch'}  # shape(1,32,160,160)
    elif isinstance(self.model, DetectionModel):
        dynamic['output0'] = {0: 'batch', 2: 'anchors'}  # shape(1, 84, 8400)
- Export the ONNX model: create export.py in ultralytics-main with the following content:
from ultralytics import YOLO
model = YOLO("yolo11s-seg.pt")
success = model.export(format="onnx", dynamic=True, simplify=True)
cd ultralytics-main
python export.py
- Copy the model and run
cp ultralytics/yolo11s-seg.onnx tensorRT_Pro-Vision/workspace
cd tensorRT_Pro-Vision
make yolo_seg -j64
YOLO11-OBB support
- Download YOLO11
git clone https://github.com/ultralytics/ultralytics.git
- Modify the code to ensure dynamic batch support
# ========== head.py ==========
# ultralytics/nn/modules/head.py, line 212, forward function
# return torch.cat([x, angle], 1) if self.export else (torch.cat([x[0], angle], 1), (x[1], angle))
# Change to:
return torch.cat([x, angle], 1).permute(0, 2, 1) if self.export else (torch.cat([x[0], angle], 1), (x[1], angle))
# ========== exporter.py ==========
# ultralytics/engine/exporter.py, line 400
# output_names = ['output0', 'output1'] if isinstance(self.model, SegmentationModel) else ['output0']
# dynamic = self.args.dynamic
# if dynamic:
# dynamic = {'images': {0: 'batch', 2: 'height', 3: 'width'}} # shape(1,3,640,640)
# if isinstance(self.model, SegmentationModel):
# dynamic['output0'] = {0: 'batch', 2: 'anchors'} # shape(1, 116, 8400)
# dynamic['output1'] = {0: 'batch', 2: 'mask_height', 3: 'mask_width'} # shape(1,32,160,160)
# elif isinstance(self.model, DetectionModel):
# dynamic['output0'] = {0: 'batch', 2: 'anchors'} # shape(1, 84, 8400)
# Change to:
output_names = ['output0', 'output1'] if isinstance(self.model, SegmentationModel) else ['output']
dynamic = self.args.dynamic
if dynamic:
    dynamic = {'images': {0: 'batch'}}  # shape(1,3,640,640)
    if isinstance(self.model, SegmentationModel):
        dynamic['output0'] = {0: 'batch', 2: 'anchors'}  # shape(1, 116, 8400)
        dynamic['output1'] = {0: 'batch', 2: 'mask_height', 3: 'mask_width'}  # shape(1,32,160,160)
    elif isinstance(self.model, DetectionModel):
        dynamic['output'] = {0: 'batch'}  # shape(1, 84, 8400)
- Export the ONNX model: create export.py in ultralytics-main with the following content:
from ultralytics import YOLO
model = YOLO("yolo11s-obb.pt")
success = model.export(format="onnx", dynamic=True, simplify=True)
cd ultralytics-main
python export.py
- Copy the model and run
cp ultralytics/yolo11s-obb.onnx tensorRT_Pro-Vision/workspace
cd tensorRT_Pro-Vision
make yolo_obb -j64
YOLO11-Pose support
- Download YOLO11
git clone https://github.com/ultralytics/ultralytics.git
- Modify the code to ensure dynamic batch support
# ========== head.py ==========
# ultralytics/nn/modules/head.py, line 239, forward function
# return torch.cat([x, pred_kpt], 1) if self.export else (torch.cat([x[0], pred_kpt], 1), (x[1], kpt))
# Change to:
return torch.cat([x, pred_kpt], 1).permute(0, 2, 1) if self.export else (torch.cat([x[0], pred_kpt], 1), (x[1], kpt))
# ========== exporter.py ==========
# ultralytics/engine/exporter.py, line 400
# output_names = ['output0', 'output1'] if isinstance(self.model, SegmentationModel) else ['output0']
# dynamic = self.args.dynamic
# if dynamic:
# dynamic = {'images': {0: 'batch', 2: 'height', 3: 'width'}} # shape(1,3,640,640)
# if isinstance(self.model, SegmentationModel):
# dynamic['output0'] = {0: 'batch', 2: 'anchors'} # shape(1, 116, 8400)
# dynamic['output1'] = {0: 'batch', 2: 'mask_height', 3: 'mask_width'} # shape(1,32,160,160)
# elif isinstance(self.model, DetectionModel):
# dynamic['output0'] = {0: 'batch', 2: 'anchors'} # shape(1, 84, 8400)
# Change to:
output_names = ['output0', 'output1'] if isinstance(self.model, SegmentationModel) else ['output']
dynamic = self.args.dynamic
if dynamic:
    dynamic = {'images': {0: 'batch'}}  # shape(1,3,640,640)
    dynamic['output'] = {0: 'batch'}
    if isinstance(self.model, SegmentationModel):
        dynamic['output0'] = {0: 'batch', 2: 'anchors'}  # shape(1, 116, 8400)
        dynamic['output1'] = {0: 'batch', 2: 'mask_height', 3: 'mask_width'}  # shape(1,32,160,160)
    elif isinstance(self.model, DetectionModel):
        dynamic['output0'] = {0: 'batch', 2: 'anchors'}  # shape(1, 84, 8400)
- Export the ONNX model: create export.py in ultralytics-main with the following content:
from ultralytics import YOLO
model = YOLO("yolo11s-pose.pt")
success = model.export(format="onnx", dynamic=True, simplify=True)
cd ultralytics-main
python export.py
- Copy the model and run
cp ultralytics/yolo11s-pose.onnx tensorRT_Pro-Vision/workspace
cd tensorRT_Pro-Vision
make yolo_pose -j64
Depth-Anything-V1 support
1. Prerequisites
- TensorRT >= 8.6
2. Clone the project
git clone https://github.com/LiheYoung/Depth-Anything.git
3. Download the pretrained weights
- Download link (Baidu Drive)
4. Modify the code so that the export is correct
# ========== dpt.py ==========
# depth_anything/dpt.py, line 5: comment out this import
# from huggingface_hub import PyTorchModelHubMixin, hf_hub_download
# depth_anything/dpt.py, line 166, forward function
# return depth.squeeze(1)
# Change to:
return depth
5. Export the ONNX model: create export.py in the Depth-Anything project with the following content:
import torch
import argparse
import torch.onnx
from depth_anything.dpt import DPT_DINOv2
def export_model(encoder: str, load_from: str, image_shape: tuple):
# Initializing model
assert encoder in ['vits', 'vitb', 'vitl']
if encoder == 'vits':
depth_anything = DPT_DINOv2(encoder='vits', features=64, out_channels=[48, 96, 192, 384], localhub='localhub')
elif encoder == 'vitb':
depth_anything = DPT_DINOv2(encoder='vitb', features=128, out_channels=[96, 192, 384, 768], localhub='localhub')
else:
depth_anything = DPT_DINOv2(encoder='vitl', features=256, out_channels=[256, 512, 1024, 1024], localhub='localhub')
total_params = sum(param.numel() for param in depth_anything.parameters())
print('Total parameters: {:.2f}M'.format(total_params / 1e6))
# Loading model weight
depth_anything.load_state_dict(torch.load(load_from, map_location='cpu'), strict=True)
depth_anything.eval()
# Define dummy input data
dummy_input = torch.ones(image_shape).unsqueeze(0)
onnx_path = load_from.split('/')[-1].split('.pth')[0] + '.onnx'
dynamic_batch = {"images": {0: "batch"}, "output": {0: "batch"}}
# Export the PyTorch model to ONNX format
torch.onnx.export(
depth_anything,
dummy_input,
onnx_path,
opset_version=17,
input_names=["images"],
output_names=["output"],
dynamic_axes=None
)
import onnx
model_onnx = onnx.load(onnx_path)
# Simplify
try:
import onnxsim
print(f"simplifying with onnxsim {onnxsim.__version__}...")
model_onnx, check = onnxsim.simplify(model_onnx)
assert check, "Simplified ONNX model could not be validated"
except Exception as e:
print(f"simplifier failure: {e}")
onnx.save(model_onnx, f"depth_anything_{encoder}.sim.onnx")
print(f"simplify done. onnx model save in depth_anything_{encoder}.sim.onnx")
print(f"Model exported to {onnx_path}")
def main():
parser = argparse.ArgumentParser(description="Export Depth DPT model to ONNX format")
parser.add_argument("--encoder", type=str, choices=['vits', 'vitb', 'vitl'], help="Type of encoder to use ('vits', 'vitb', 'vitl')")
parser.add_argument("--load_from", type=str, help="Path to the pre-trained model checkpoint")
parser.add_argument("--image_shape", type=int, nargs=3, metavar=("channels", "height", "width"), help="Shape of the input image")
args = parser.parse_args()
export_model(args.encoder, args.load_from, tuple(args.image_shape))
if __name__ == "__main__":
main()
cd Depth-Anything
python export.py --encoder vits --load_from depth_anything_vits14.pth --image_shape 3 518 518
6. Engine generation
Build the engine with the trtexec tool:
cd tensorRT_Pro-Vision/workspace
bash depth_anything_build.sh
7. Run
cd tensorRT_Pro-Vision
make depth_anything -j64
Depth-Anything-V2 support
1. Prerequisites
- TensorRT >= 8.6
2. Clone the project
git clone https://github.com/DepthAnything/Depth-Anything-V2.git
3. Download the pretrained weights
- Download link (Baidu Drive)
4. Modify the code so that the export is correct
# ========== dpt.py ==========
# depth_anything_v2/dpt.py, line 184, forward function
# return depth.squeeze(1)
# Change to:
return depth
5. Export the ONNX model: create export.py in the Depth-Anything-V2 project with the following content:
import torch
import argparse
from depth_anything_v2.dpt import DepthAnythingV2
def main():
parser = argparse.ArgumentParser(description='Depth Anything V2')
parser.add_argument('--input-size', type=int, default=518)
parser.add_argument('--encoder', type=str, default='vits', choices=['vits', 'vitb', 'vitl', 'vitg'])
args = parser.parse_args()
# we are undergoing company review procedures to release Depth-Anything-Giant checkpoint
model_configs = {
'vits': {'encoder': 'vits', 'features': 64, 'out_channels': [48, 96, 192, 384]},
'vitb': {'encoder': 'vitb', 'features': 128, 'out_channels': [96, 192, 384, 768]},
'vitl': {'encoder': 'vitl', 'features': 256, 'out_channels': [256, 512, 1024, 1024]},
'vitg': {'encoder': 'vitg', 'features': 384, 'out_channels': [1536, 1536, 1536, 1536]}
}
depth_anything = DepthAnythingV2(**model_configs[args.encoder])
depth_anything.load_state_dict(torch.load(f'checkpoints/depth_anything_v2_{args.encoder}.pth', map_location='cpu'))
depth_anything = depth_anything.to('cpu').eval()
# Define dummy input data
dummy_input = torch.ones((3, args.input_size, args.input_size)).unsqueeze(0)
onnx_path = f'depth_anything_v2_{args.encoder}.onnx'
dynamic_batch = {"images": {0: "batch"}, "output": {0: "batch"}}
# Export the PyTorch model to ONNX format
torch.onnx.export(
depth_anything,
dummy_input,
onnx_path,
opset_version=17,
input_names=["images"],
output_names=["output"],
dynamic_axes=None
)
import onnx
model_onnx = onnx.load(onnx_path)
# Simplify
try:
import onnxsim
print(f"simplifying with onnxsim {onnxsim.__version__}...")
model_onnx, check = onnxsim.simplify(model_onnx)
assert check, "Simplified ONNX model could not be validated"
except Exception as e:
print(f"simplifier failure: {e}")
onnx.save(model_onnx, f"depth_anything_v2_{args.encoder}.sim.onnx")
print(f"simplify done. onnx model save in depth_anything_v2_{args.encoder}.sim.onnx")
if __name__ == "__main__":
main()
cd Depth-Anything-V2
python export.py --encoder vits --input-size 518
6. Engine generation
Build the engine with the trtexec tool:
cd tensorRT_Pro-Vision/workspace
bash depth_anything_build.sh
7. Run
cd tensorRT_Pro-Vision
make depth_anything -j64
YOLOv12 support
- Download YOLOv12
git clone https://github.com/sunsmarterjie/yolov12
- Modify the code to ensure dynamic batch support
# ========== head.py ==========
# ultralytics/nn/modules/head.py, line 74, forward function
# return y if self.export else (y, x)
# Change to:
return y.permute(0, 2, 1) if self.export else (y, x)
# ========== exporter.py ==========
# ultralytics/engine/exporter.py, line 499
# output_names = ["output0", "output1"] if isinstance(self.model, SegmentationModel) else ["output0"]
# dynamic = self.args.dynamic
# if dynamic:
# dynamic = {"images": {0: "batch", 2: "height", 3: "width"}} # shape(1,3,640,640)
# if isinstance(self.model, SegmentationModel):
# dynamic["output0"] = {0: "batch", 2: "anchors"} # shape(1, 116, 8400)
# dynamic["output1"] = {0: "batch", 2: "mask_height", 3: "mask_width"} # shape(1,32,160,160)
# elif isinstance(self.model, DetectionModel):
# dynamic["output0"] = {0: "batch", 2: "anchors"} # shape(1, 84, 8400)
# 修改为:
output_names = ["output0", "output1"] if isinstance(self.model, SegmentationModel) else ["output"]
dynamic = self.args.dynamic
if dynamic:
dynamic = {"images": {0: "batch"}} # shape(1,3,640,640)
if isinstance(self.model, SegmentationModel):
dynamic["output0"] = {0: "batch", 2: "anchors"} # shape(1, 116, 8400)
dynamic["output1"] = {0: "batch", 2: "mask_height", 3: "mask_width"} # shape(1,32,160,160)
elif isinstance(self.model, DetectionModel):
        dynamic["output"] = {0: "batch"}  # shape(1, 84, 8400); key must match the "output" name above
- Export the ONNX model: in yolov12, create an export file export.py with the following content:
from ultralytics import YOLO
model = YOLO('yolov12s.pt')
model.export(format="onnx", dynamic=True)
cd yolov12
python export.py
- Copy the model and run
cp yolov12/yolov12s.onnx tensorRT_Pro-Vision/workspace
cd tensorRT_Pro-Vision
make yolo -j64
YOLOv13 support
- Environment setup
git clone https://github.com/iMoonLab/yolov13.git
cd yolov13
wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.3/flash_attn-2.7.3+cu11torch2.2cxx11abiFALSE-cp311-cp311-linux_x86_64.whl
conda create -n yolov13 python=3.11
conda activate yolov13
pip install -r requirements.txt
pip install -e .
- Modify the code to enable dynamic batch
# ========== head.py ==========
# ultralytics/nn/modules/head.py, line 74, forward function
# return y if self.export else (y, x)
# Change to:
return y.permute(0, 2, 1) if self.export else (y, x)
# ========== exporter.py ==========
# ultralytics/engine/exporter.py, line 499
# output_names = ["output0", "output1"] if isinstance(self.model, SegmentationModel) else ["output0"]
# dynamic = self.args.dynamic
# if dynamic:
#     dynamic = {"images": {0: "batch", 2: "height", 3: "width"}}  # shape(1,3,640,640)
#     if isinstance(self.model, SegmentationModel):
#         dynamic["output0"] = {0: "batch", 2: "anchors"}  # shape(1, 116, 8400)
#         dynamic["output1"] = {0: "batch", 2: "mask_height", 3: "mask_width"}  # shape(1,32,160,160)
#     elif isinstance(self.model, DetectionModel):
#         dynamic["output0"] = {0: "batch", 2: "anchors"}  # shape(1, 84, 8400)
# Change to:
output_names = ["output0", "output1"] if isinstance(self.model, SegmentationModel) else ["output"]
dynamic = self.args.dynamic
if dynamic:
    dynamic = {"images": {0: "batch"}}  # shape(1,3,640,640)
    if isinstance(self.model, SegmentationModel):
        dynamic["output0"] = {0: "batch", 2: "anchors"}  # shape(1, 116, 8400)
        dynamic["output1"] = {0: "batch", 2: "mask_height", 3: "mask_width"}  # shape(1,32,160,160)
    elif isinstance(self.model, DetectionModel):
        dynamic["output"] = {0: "batch"}  # shape(1, 84, 8400); key must match the "output" name above
- Export the ONNX model: in yolov13, create an export file export.py with the following content:
from ultralytics import YOLO
model = YOLO('yolov13s.pt')
model.export(format="onnx", dynamic=True)
cd yolov13
conda activate yolov13
python export.py
- Copy the model and run
cp yolov13/yolov13s.onnx tensorRT_Pro-Vision/workspace
cd tensorRT_Pro-Vision
make yolo -j64
YOLO26 support
- Prerequisites
  - TensorRT >= 8.6
- Download YOLO26
git clone https://github.com/ultralytics/ultralytics.git
- Modify the code to enable dynamic batch
# ========== exporter.py ==========
# ultralytics/ultralytics/engine/exporter.py, line 687
# output_names = ["output0", "output1"] if self.model.task == "segment" else ["output0"]
# dynamic = self.args.dynamic
# if dynamic:
#     dynamic = {"images": {0: "batch", 2: "height", 3: "width"}}  # shape(1,3,640,640)
#     if isinstance(self.model, SegmentationModel):
#         dynamic["output0"] = {0: "batch", 2: "anchors"}  # shape(1, 116, 8400)
#         dynamic["output1"] = {0: "batch", 2: "mask_height", 3: "mask_width"}  # shape(1,32,160,160)
#     elif isinstance(self.model, DetectionModel):
#         dynamic["output0"] = {0: "batch", 2: "anchors"}  # shape(1, 84, 8400)
#     if self.args.nms:  # only batch size is dynamic with NMS
#         dynamic["output0"].pop(2)
# Change to:
output_names = ["output0", "output1"] if self.model.task == "segment" else ["output"]
dynamic = self.args.dynamic
if dynamic:
    dynamic = {"images": {0: "batch"}}  # shape(1,3,640,640)
    if isinstance(self.model, SegmentationModel):
        dynamic["output0"] = {0: "batch", 2: "anchors"}  # shape(1, 116, 8400)
        dynamic["output1"] = {0: "batch", 2: "mask_height", 3: "mask_width"}  # shape(1,32,160,160)
    elif isinstance(self.model, DetectionModel):
        dynamic["output"] = {0: "batch"}  # shape(1, 84, 8400)
    if self.args.nms:  # only batch size is dynamic with NMS
        dynamic["output0"].pop(2)
- Export the ONNX model: in ultralytics, create an export file export.py with the following content:
from ultralytics import YOLO
model = YOLO("yolo26s.pt")
success = model.export(format="onnx", dynamic=True, simplify=True)
cd ultralytics
python export.py
- Engine generation
Engine generation: use the trtexec tool to build the engine
cp ultralytics/yolo26s.onnx tensorRT_Pro-Vision/workspace
cd tensorRT_Pro-Vision/workspace
# Uncomment the yolo26 engine build command in build.sh
bash build.sh
- Run
make yolo -j64
Note: Like YOLOv10, YOLO26 is an anchor-free model that needs no NMS post-processing, so deployment can follow the YOLOv10 workflow and can even reuse the same inference code.
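Since an NMS-free head already emits its final candidates, post-processing reduces to a confidence filter. The sketch below illustrates that idea in plain Python. It assumes each output row is laid out as `[x1, y1, x2, y2, score, class_id]`, which is an assumption about the exported tensor and should be checked against your own ONNX output. The name `decode_end2end` and the sample rows are hypothetical and are not part of this repository.

```python
from typing import List, Tuple

# One detection: (x1, y1, x2, y2, score, class_id)
Detection = Tuple[float, float, float, float, float, int]


def decode_end2end(rows: List[List[float]], conf_threshold: float = 0.25) -> List[Detection]:
    """Keep rows whose score passes the threshold; no NMS step is applied.

    Assumes each row is [x1, y1, x2, y2, score, class_id]. This layout is an
    assumption for illustration, not something this README confirms.
    """
    detections: List[Detection] = []
    for x1, y1, x2, y2, score, class_id in rows:
        if score >= conf_threshold:
            detections.append((x1, y1, x2, y2, score, int(class_id)))
    # Sort by score so the most confident detections come first.
    detections.sort(key=lambda d: d[4], reverse=True)
    return detections


if __name__ == "__main__":
    sample = [
        [10.0, 20.0, 110.0, 220.0, 0.91, 2.0],
        [12.0, 22.0, 108.0, 215.0, 0.10, 2.0],  # dropped: below the threshold
        [300.0, 40.0, 380.0, 120.0, 0.55, 0.0],
    ]
    for det in decode_end2end(sample):
        print(det)
```

The only tunable value here is the confidence threshold; no IoU threshold is involved, which is the practical difference from the NMS-based YOLO variants.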
YOLO26-Cls support
- Download YOLO26
git clone https://github.com/ultralytics/ultralytics.git
- Modify the code to enable dynamic batch
# ========== exporter.py ==========
# ultralytics/ultralytics/engine/exporter.py, line 687
# output_names = ["output0", "output1"] if self.model.task == "segment" else ["output0"]
# dynamic = self.args.dynamic
# if dynamic:
#     dynamic = {"images": {0: "batch", 2: "height", 3: "width"}}  # shape(1,3,640,640)
#     if isinstance(self.model, SegmentationModel):
#         dynamic["output0"] = {0: "batch", 2: "anchors"}  # shape(1, 116, 8400)
#         dynamic["output1"] = {0: "batch", 2: "mask_height", 3: "mask_width"}  # shape(1,32,160,160)
#     elif isinstance(self.model, DetectionModel):
#         dynamic["output0"] = {0: "batch", 2: "anchors"}  # shape(1, 84, 8400)
#     if self.args.nms:  # only batch size is dynamic with NMS
#         dynamic["output0"].pop(2)
# Change to:
output_names = ["output0", "output1"] if self.model.task == "segment" else ["output"]
dynamic = self.args.dynamic
if dynamic:
    dynamic = {"images": {0: "batch"}}  # shape(1,3,640,640)
    dynamic['output'] = {0: 'batch'}
    if isinstance(self.model, SegmentationModel):
        dynamic["output0"] = {0: "batch", 2: "anchors"}  # shape(1, 116, 8400)
        dynamic["output1"] = {0: "batch", 2: "mask_height", 3: "mask_width"}  # shape(1,32,160,160)
    elif isinstance(self.model, DetectionModel):
        dynamic["output0"] = {0: "batch", 2: "anchors"}  # shape(1, 84, 8400)
    if self.args.nms:  # only batch size is dynamic with NMS
        dynamic["output0"].pop(2)
- Export the ONNX model: in ultralytics, create an export file export.py with the following content:
from ultralytics import YOLO
model = YOLO("yolo26s-cls.pt")
success = model.export(format="onnx", dynamic=True, simplify=True)
cd ultralytics
python export.py
- Copy the model and run
cp ultralytics/yolo26s-cls.onnx tensorRT_Pro-Vision/workspace
cd tensorRT_Pro-Vision
make yolo_cls -j64
YOLO26-Seg support
- Prerequisites
  - TensorRT >= 8.6
- Download YOLO26
git clone https://github.com/ultralytics/ultralytics.git
- Modify the code to enable dynamic batch
# ========== exporter.py ==========
# ultralytics/ultralytics/engine/exporter.py, line 687
# output_names = ["output0", "output1"] if self.model.task == "segment" else ["output0"]
# dynamic = self.args.dynamic
# if dynamic:
#     dynamic = {"images": {0: "batch", 2: "height", 3: "width"}}  # shape(1,3,640,640)
#     if isinstance(self.model, SegmentationModel):
#         dynamic["output0"] = {0: "batch", 2: "anchors"}  # shape(1, 116, 8400)
#         dynamic["output1"] = {0: "batch", 2: "mask_height", 3: "mask_width"}  # shape(1,32,160,160)
#     elif isinstance(self.model, DetectionModel):
#         dynamic["output0"] = {0: "batch", 2: "anchors"}  # shape(1, 84, 8400)
#     if self.args.nms:  # only batch size is dynamic with NMS
#         dynamic["output0"].pop(2)
# Change to:
output_names = ["output0", "output1"] if self.model.task == "segment" else ["output0"]
dynamic = self.args.dynamic
if dynamic:
    dynamic = {"images": {0: "batch"}}  # shape(1,3,640,640)
    if isinstance(self.model, SegmentationModel):
        dynamic["output0"] = {0: "batch"}  # shape(1, 116, 8400)
        dynamic["output1"] = {0: "batch"}  # shape(1,32,160,160)
    elif isinstance(self.model, DetectionModel):
        dynamic["output0"] = {0: "batch", 2: "anchors"}  # shape(1, 84, 8400)
    if self.args.nms:  # only batch size is dynamic with NMS
        dynamic["output0"].pop(2)
- Export the ONNX model: in ultralytics, create an export file export.py with the following content:
from ultralytics import YOLO
model = YOLO("yolo26s-seg.pt")
success = model.export(format="onnx", dynamic=True, simplify=True)
cd ultralytics
python export.py
- Engine generation
Engine generation: use the trtexec tool to build the engine
cp ultralytics/yolo26s-seg.onnx tensorRT_Pro-Vision/workspace
cd tensorRT_Pro-Vision/workspace
# Uncomment the yolo26-seg engine build command in build.sh
bash build.sh
- Run
make yolo_seg -j64
YOLO26-OBB support
- Prerequisites
  - TensorRT >= 8.6
- Download YOLO26
git clone https://github.com/ultralytics/ultralytics.git
- Modify the code to enable dynamic batch
# ========== exporter.py ==========
# ultralytics/ultralytics/engine/exporter.py, line 687
# output_names = ["output0", "output1"] if self.model.task == "segment" else ["output0"]
# dynamic = self.args.dynamic
# if dynamic:
#     dynamic = {"images": {0: "batch", 2: "height", 3: "width"}}  # shape(1,3,640,640)
#     if isinstance(self.model, SegmentationModel):
#         dynamic["output0"] = {0: "batch", 2: "anchors"}  # shape(1, 116, 8400)
#         dynamic["output1"] = {0: "batch", 2: "mask_height", 3: "mask_width"}  # shape(1,32,160,160)
#     elif isinstance(self.model, DetectionModel):
#         dynamic["output0"] = {0: "batch", 2: "anchors"}  # shape(1, 84, 8400)
#     if self.args.nms:  # only batch size is dynamic with NMS
#         dynamic["output0"].pop(2)
# Change to:
output_names = ["output0", "output1"] if self.model.task == "segment" else ["output"]
dynamic = self.args.dynamic
if dynamic:
    dynamic = {"images": {0: "batch"}}  # shape(1,3,640,640)
    dynamic["output"] = {0: "batch"}
    if isinstance(self.model, SegmentationModel):
        dynamic["output0"] = {0: "batch", 2: "anchors"}  # shape(1, 116, 8400)
        dynamic["output1"] = {0: "batch", 2: "mask_height", 3: "mask_width"}  # shape(1,32,160,160)
    elif isinstance(self.model, DetectionModel):
        dynamic["output0"] = {0: "batch", 2: "anchors"}  # shape(1, 84, 8400)
    if self.args.nms:  # only batch size is dynamic with NMS
        dynamic["output0"].pop(2)
- Export the ONNX model: in ultralytics, create an export file export.py with the following content:
from ultralytics import YOLO
model = YOLO("yolo26s-obb.pt")
success = model.export(format="onnx", dynamic=True, simplify=True)
cd ultralytics
python export.py
- Engine generation
Engine generation: use the trtexec tool to build the engine
cp ultralytics/yolo26s-obb.onnx tensorRT_Pro-Vision/workspace
cd tensorRT_Pro-Vision/workspace
# Uncomment the yolo26-obb engine build command in build.sh
bash build.sh
- Run
make yolo_obb -j64
YOLO26-Pose support
- Prerequisites
  - TensorRT >= 8.6
- Download YOLO26
git clone https://github.com/ultralytics/ultralytics.git
- Modify the code to enable dynamic batch
# ========== exporter.py ==========
# ultralytics/ultralytics/engine/exporter.py, line 687
# output_names = ["output0", "output1"] if self.model.task == "segment" else ["output0"]
# dynamic = self.args.dynamic
# if dynamic:
#     dynamic = {"images": {0: "batch", 2: "height", 3: "width"}}  # shape(1,3,640,640)
#     if isinstance(self.model, SegmentationModel):
#         dynamic["output0"] = {0: "batch", 2: "anchors"}  # shape(1, 116, 8400)
#         dynamic["output1"] = {0: "batch", 2: "mask_height", 3: "mask_width"}  # shape(1,32,160,160)
#     elif isinstance(self.model, DetectionModel):
#         dynamic["output0"] = {0: "batch", 2: "anchors"}  # shape(1, 84, 8400)
#     if self.args.nms:  # only batch size is dynamic with NMS
#         dynamic["output0"].pop(2)
# Change to:
output_names = ["output0", "output1"] if self.model.task == "segment" else ["output"]
dynamic = self.args.dynamic
if dynamic:
    dynamic = {"images": {0: "batch"}}  # shape(1,3,640,640)
    dynamic["output"] = {0: "batch"}
    if isinstance(self.model, SegmentationModel):
        dynamic["output0"] = {0: "batch", 2: "anchors"}  # shape(1, 116, 8400)
        dynamic["output1"] = {0: "batch", 2: "mask_height", 3: "mask_width"}  # shape(1,32,160,160)
    elif isinstance(self.model, DetectionModel):
        dynamic["output0"] = {0: "batch", 2: "anchors"}  # shape(1, 84, 8400)
    if self.args.nms:  # only batch size is dynamic with NMS
        dynamic["output0"].pop(2)
- Export the ONNX model: in ultralytics, create an export file export.py with the following content:
from ultralytics import YOLO
model = YOLO("yolo26s-pose.pt")
success = model.export(format="onnx", dynamic=True, simplify=True)
cd ultralytics
python export.py
- Engine generation
Engine generation: use the trtexec tool to build the engine
cp ultralytics/yolo26s-pose.onnx tensorRT_Pro-Vision/workspace
cd tensorRT_Pro-Vision/workspace
# Uncomment the yolo26-pose engine build command in build.sh
bash build.sh
- Run
make yolo_pose -j64
YOLO26-Sem support
- Download YOLO26
git clone https://github.com/ultralytics/ultralytics.git
- Modify the code to enable dynamic batch
# ========== exporter.py ==========
# ultralytics/engine/exporter.py, line 638
# output_names = ["output0", "output1"] if self.model.task == "segment" else ["output1"]
# dynamic = self.args.dynamic
# if dynamic:
#     dynamic = {"images": {0: "batch", 2: "height", 3: "width"}}  # shape(1,3,640,640)
# Change to:
output_names = ["output0", "output1"] if self.model.task == "segment" else ["output"]
dynamic = self.args.dynamic
if dynamic:
    dynamic = {"images": {0: "batch"}}  # shape(1,3,640,640)
- Export the ONNX model: in ultralytics, create an export file export.py with the following content:
from ultralytics import YOLO
# Load a model
model = YOLO("yolo26s-sem.pt") # load an official model
# Export the model
model.export(format="onnx", imgsz=(512, 1024), dynamic=True, simplify=True)
cd ultralytics
python export.py
- Copy the model and run
cp ultralytics/yolo26s-sem.onnx tensorRT_Pro-Vision/workspace
cd tensorRT_Pro-Vision
make yolo_sem -j64
Compile interface
TRT::compile(
    mode,                       // FP32, FP16, INT8
    test_batch_size,            // max batch size
    onnx_file,                  // source ONNX file
    model_file,                 // output engine file
    {},                         // redefine the input shape
    int8process,                // callback function for calibration
    "inference",                // directory of the images used for calibration
    ""                          // directory where calibration data is saved (and loaded from)
);
- The original TRT compile interface, supporting FP32, FP16, and INT8 builds
- Model compilation can also be done with the trtexec tool
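For the trtexec route, a dynamic-batch ONNX model needs an explicit shape range. The sketch below only assembles the command line and does not run it. It assumes the input tensor is named `images`, as in the export steps, with a 3x640x640 layout, and the file names are placeholders. `--onnx`, `--saveEngine`, `--minShapes`, `--optShapes`, `--maxShapes`, and `--fp16` are standard trtexec options; `build_trtexec_command` is a hypothetical helper, not something shipped with this repository.

```python
import shlex
from typing import List


def build_trtexec_command(
    onnx_file: str,
    engine_file: str,
    input_name: str = "images",   # assumed input tensor name
    chw: str = "3x640x640",       # assumed channel x height x width
    min_batch: int = 1,
    opt_batch: int = 1,
    max_batch: int = 16,
    fp16: bool = True,
) -> List[str]:
    """Assemble a trtexec invocation for a dynamic-batch ONNX model."""
    if not (1 <= min_batch <= opt_batch <= max_batch):
        raise ValueError("expected 1 <= min_batch <= opt_batch <= max_batch")
    cmd = [
        "trtexec",
        f"--onnx={onnx_file}",
        f"--saveEngine={engine_file}",
        f"--minShapes={input_name}:{min_batch}x{chw}",
        f"--optShapes={input_name}:{opt_batch}x{chw}",
        f"--maxShapes={input_name}:{max_batch}x{chw}",
    ]
    if fp16:
        cmd.append("--fp16")
    return cmd


if __name__ == "__main__":
    # Placeholder file names; replace them with your own model paths.
    command = build_trtexec_command("model.onnx", "model.engine", max_batch=8)
    print(shlex.join(command))
```

Keeping the three shape flags in one helper makes it harder to set a maximum batch that is smaller than the optimum, which trtexec would reject at build time.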
Inference interface
// Create the inference engine on GPU 0
auto engine = YoloPose::create_infer(
    engine_file,                    // engine file
    deviceid,                       // gpu id
    0.25f,                          // confidence threshold
    0.45f,                          // nms threshold
    YoloPose::NMSMethod::FastGPU,   // NMS method, fast GPU / CPU
    1024,                           // max objects
    false                           // whether preprocessing uses multiple streams
);
// Load the image
auto image = cv::imread("inference/car.jpg");
// Run inference and fetch the result
auto boxes = engine->commit(image).get();  // returns a vector<Box>
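To make the `create_infer` parameters concrete, the plain-Python sketch below shows what a confidence threshold of 0.25, an NMS (IoU) threshold of 0.45, and a max-object cap typically control. It is a reference-style greedy NMS written for illustration, assuming boxes in `(left, top, right, bottom, confidence)` form. It is not the repository's CUDA kernel, and `iou` and `filter_boxes` are hypothetical names.

```python
from typing import List, Tuple

# left, top, right, bottom, confidence
Box = Tuple[float, float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_boxes(
    boxes: List[Box],
    conf_threshold: float = 0.25,
    nms_threshold: float = 0.45,
    max_objects: int = 1024,
) -> List[Box]:
    """Confidence filter, candidate cap, then greedy NMS."""
    # 1) Drop low-confidence boxes and sort the rest by confidence.
    candidates = sorted(
        (b for b in boxes if b[4] >= conf_threshold),
        key=lambda b: b[4],
        reverse=True,
    )
    # 2) Cap the number of candidates (assumed meaning of "max objects").
    candidates = candidates[:max_objects]
    # 3) Keep a box only if it does not overlap a kept box too much.
    kept: List[Box] = []
    for box in candidates:
        if all(iou(box, k) <= nms_threshold for k in kept):
            kept.append(box)
    return kept


if __name__ == "__main__":
    sample = [
        (0.0, 0.0, 100.0, 100.0, 0.9),
        (5.0, 5.0, 105.0, 105.0, 0.8),      # heavy overlap with the first box
        (200.0, 200.0, 300.0, 300.0, 0.6),
        (0.0, 0.0, 10.0, 10.0, 0.1),        # below the confidence threshold
    ]
    for box in filter_boxes(sample):
        print(box)
```

Raising the NMS threshold keeps more overlapping boxes, while raising the confidence threshold removes weak candidates before any overlap test runs.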