

0 Introduction


1 Project Overview

1.1 System Block Diagram


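1.2 Required Equipment

  1. ESP32-CAM development board (with OV2640 camera)
  2. Host computer running Python

1.3 Usage Instructions

  1. Set WIFI_SSID and the Wi-Fi password in PlatformIO\Projects\esp32cam-test\src\esp32cam-test.cpp
  2. Build the firmware with PlatformIO or Arduino and flash it to the ESP32-CAM
  3. On the host computer, run the Python script esp32cam-pytorch-thread-mjpeg_yolo.py or esp32cam-pytorch-thread-jpg_yolo.py
  4. Press a key to select the inference task to run:

     Key  Task type             Task                      Model architecture
     g    Image classification  Garbage classification    ConvNeXt
     d    Image classification  Daily-use items           ViT
     r    Image classification  General recognition       ResNeSt
     o    Object detection      General object detection  DAMO-YOLO
     h    Object detection      Head detection            DAMO-YOLO
     f    Object detection      Face mask detection       DAMO-YOLO
     p    Object detection      Phone detection           DAMO-YOLO

  5. Press q to quit the program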

1.4 Demo of the Program Running


[Figures: garbage classification 1 and 2; daily-use items 1 and 2; general recognition 1 and 2; general object detection; phone detection]

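2 Building the Device Side

2.1 Introduction to the ESP32-CAM


2.1.1 ESP32-CAM as a Web Server

A common way to deploy the ESP32-CAM is as a web server on the local network. A host computer or any other IoT device on the same subnet can fetch the images or video captured by the ESP32-CAM by accessing its IP address. In this setup the processes communicate in a two-tier client-server model, with HTTP as the application-layer protocol and TCP as the transport-layer protocol.

One possible data exchange between the ESP32-CAM web server and its client (taking the Python program on the host computer as an example) is as follows: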
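
The request/response pattern can be tried without hardware. The sketch below is only an illustration under stated assumptions: a local Python HTTP server stands in for the ESP32-CAM, FAKE_JPEG is a dummy payload rather than a real image, and CamHandler, fetch_frame, and demo are hypothetical names. Only the route /cam-hi.jpg is taken from the packet capture in section 2.4.2.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in payload: JPEG start/end markers around dummy bytes (not a real image).
FAKE_JPEG = b'\xff\xd8' + b'\x00' * 16 + b'\xff\xd9'


class CamHandler(BaseHTTPRequestHandler):
    """Plays the role of the ESP32-CAM web server for one still-image route."""

    def do_GET(self):
        if self.path == '/cam-hi.jpg':
            self.send_response(200)
            self.send_header('Content-Type', 'image/jpeg')
            self.send_header('Content-Length', str(len(FAKE_JPEG)))
            self.end_headers()
            self.wfile.write(FAKE_JPEG)
        else:
            self.send_error(404)

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet


def fetch_frame(base_url):
    """Client side: one HTTP GET (carried over TCP) returns one JPEG frame."""
    with urllib.request.urlopen(base_url + 'cam-hi.jpg', timeout=5) as resp:
        return resp.status, resp.read()


def demo():
    server = HTTPServer(('127.0.0.1', 0), CamHandler)  # port 0: let the OS pick one
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_frame('http://127.0.0.1:%d/' % server.server_port)
    finally:
        server.shutdown()
        server.server_close()


status, body = demo()
print(status, len(body))
```

Against real hardware, only the base URL passed to fetch_frame would change to the address of the ESP32-CAM.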




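2.2 Setting Up the VSCode PlatformIO Development Environment

To build and debug the ESP32-CAM project inside VSCode, and to get faster builds (unlike the Arduino IDE, which recompiles on every upload), this project uses PlatformIO as its development environment. First install the PlatformIO extension in VSCode, then restart VSCode to finish installing PlatformIO IDE.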


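2.2.1 Installing the Offline ESP32 Framework

PlatformIO's online framework download is not very friendly to users in mainland China, so this project downloads a pre-packaged offline PlatformIO ESP32 framework from Aliyun Drive and adds it to PlatformIO's default package path C:\Users\your_user_name\.platformio\packages\.

Aliyun Drive: https://www.aliyundrive.com/s/pFDFnmdz8mi
CSDN download: PlatformIO offline installation resources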


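2.2.2 Creating a New ESP32-CAM Project

Create a new project from the PIO Home page (bottom-left corner of VSCode) and select AI Thinker ESP32-CAM as the board. Creating the first project may take a while; if nothing happens for a long time, the environment set up in the previous steps is probably misconfigured.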


2.3 Setting Up the ESP32-CAM Development Environment

2.3.1 Installing yoursunny's esp32cam Library

In the PlatformIO ESP32 library directory C:\Users\your_user_name\.platformio\packages\framework-arduinoespressif32\libraries\, clone yoursunny's esp32cam library with Git Bash:

git clone https://gitcode.com/mirrors/yoursunny/esp32cam.git

2.3.2 Setting the Baud Rate


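[env:esp32cam]          ; environment name is an assumption; keep the one in your platformio.ini
platform = espressif32
board = esp32cam
framework = arduino
upload_port = COM8      ; serial port used for uploading
upload_speed = 115200   ; upload baud rate
monitor_port = COM8     ; serial monitor port
monitor_speed = 115200  ; serial monitor baud rate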


2.4 Getting the ESP32-CAM Video Stream

2.4.1 Building and Flashing





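Here, the address is the IP address of the ESP32-CAM's HTTP web server. Once the host computer is connected to the same subnet, it can act as a web client and fetch the images or video stream captured by the ESP32-CAM.

2.4.2 Testing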




CAPTURE OK 800x600 15089b
"No." "Time" "Source" "Destination" "Protocol" "Length" "Info"
"31" "3.368120" "" "" "TCP" "66" "60673 > 80 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 WS=256 SACK_PERM"
"32" "3.368294" "" "" "TCP" "66" "60674 > 80 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 WS=256 SACK_PERM"
"33" "3.539742" "" "" "TCP" "58" "80 > 60673 [SYN, ACK] Seq=0 Ack=1 Win=5744 Len=0 MSS=1436"
"34" "3.539742" "" "" "TCP" "58" "80 > 60674 [SYN, ACK] Seq=0 Ack=1 Win=5744 Len=0 MSS=1436"
"35" "3.539870" "" "" "TCP" "54" "60673 > 80 [ACK] Seq=1 Ack=1 Win=64240 Len=0"
"36" "3.539934" "" "" "TCP" "54" "60674 > 80 [ACK] Seq=1 Ack=1 Win=64240 Len=0"
"37" "3.540280" "" "" "HTTP" "523" "GET /cam-hi.jpg HTTP/1.1 "
"38" "3.562168" "" "" "TCP" "141" "80 > 60673 [PSH, ACK] Seq=1 Ack=470 Win=5275 Len=87 [TCP segment of a reassembled PDU]"
"39" "3.562827" "" "" "TCP" "1490" "80 > 60673 [ACK] Seq=88 Ack=470 Win=5275 Len=1436 [TCP segment of a reassembled PDU]"
"40" "3.562853" "" "" "TCP" "54" "60673 > 80 [ACK] Seq=470 Ack=1524 Win=64620 Len=0"
"41" "3.567484" "" "" "TCP" "1490" "80 > 60673 [ACK] Seq=1524 Ack=470 Win=5275 Len=1436 [TCP segment of a reassembled PDU]"
"42" "3.576185" "" "" "TCP" "1490" "80 > 60673 [ACK] Seq=2960 Ack=470 Win=5275 Len=1436 [TCP segment of a reassembled PDU]"
"43" "3.576243" "" "" "TCP" "54" "60673 > 80 [ACK] Seq=470 Ack=4396 Win=64620 Len=0"
"44" "3.579808" "" "" "TCP" "1403" "80 > 60673 [PSH, ACK] Seq=4396 Ack=470 Win=5275 Len=1349 [TCP segment of a reassembled PDU]"
"45" "3.590868" "" "" "TCP" "1490" "80 > 60673 [ACK] Seq=5745 Ack=470 Win=5275 Len=1436 [TCP segment of a reassembled PDU]"
"46" "3.590929" "" "" "TCP" "54" "60673 > 80 [ACK] Seq=470 Ack=7181 Win=64620 Len=0"
"47" "3.595111" "" "" "TCP" "1490" "80 > 60673 [ACK] Seq=7181 Ack=470 Win=5275 Len=1436 [TCP segment of a reassembled PDU]"
"48" "3.595111" "" "" "TCP" "1490" "80 > 60673 [ACK] Seq=8617 Ack=470 Win=5275 Len=1436 [TCP segment of a reassembled PDU]"
"49" "3.595111" "" "" "TCP" "141" "80 > 60673 [PSH, ACK] Seq=10053 Ack=470 Win=5275 Len=87 [TCP segment of a reassembled PDU]"
"50" "3.595191" "" "" "TCP" "54" "60673 > 80 [ACK] Seq=470 Ack=10140 Win=64620 Len=0"
"51" "3.607203" "" "" "TCP" "1490" "80 > 60673 [ACK] Seq=10140 Ack=470 Win=5275 Len=1436 [TCP segment of a reassembled PDU]"
"52" "3.607254" "" "" "TCP" "54" "60673 > 80 [ACK] Seq=470 Ack=11576 Win=64620 Len=0"
"53" "3.612163" "" "" "TCP" "2926" "80 > 60673 [ACK] Seq=11576 Ack=470 Win=5275 Len=2872 [TCP segment of a reassembled PDU]"
"54" "3.612163" "" "" "HTTP" "783" "HTTP/1.1 200 OK (JPEG JFIF image)"
"55" "3.612163" "" "" "TCP" "54" "80 > 60673 [FIN, ACK] Seq=15177 Ack=470 Win=5275 Len=0"
"56" "3.612277" "" "" "TCP" "54" "60673 > 80 [ACK] Seq=470 Ack=15178 Win=64620 Len=0"
"57" "3.612606" "" "" "TCP" "54" "60673 > 80 [FIN, ACK] Seq=470 Ack=15178 Win=64620 Len=0"
"58" "3.623687" "" "" "TCP" "54" "80 > 60673 [ACK] Seq=15178 Ack=471 Win=5274 Len=0"

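The serial log line and the packet trace above can be cross-checked with a little arithmetic. The sketch below sums the payload lengths of the server-to-client segments and compares the result with the 15089 bytes reported by CAPTURE OK. It rests on two assumptions that the trace suggests but does not state: the last data frame carries the same 54 bytes of link, IP, and TCP headers as the pure ACK frames, and the first 87-byte segment is the HTTP response header.

```python
# TCP payload lengths (the Len= field) of the server-to-client data segments
# in the trace above: packets 38, 39, 41, 42, 44, 45, 47, 48, 49, 51, 53.
segment_lens = [87, 1436, 1436, 1436, 1349, 1436, 1436, 1436, 87, 1436, 2872]

# Packet 54 is listed only by frame length (783 bytes). Assumption: it carries
# the same 54 bytes of headers as the pure-ACK frames (Length 54, Len=0).
FRAME_OVERHEAD = 54
last_segment_len = 783 - FRAME_OVERHEAD

total_payload = sum(segment_lens) + last_segment_len

# The FIN in packet 55 has Seq=15177 and the byte stream starts at Seq=1.
bytes_before_fin = 15177 - 1

# Assumption: the first 87-byte segment is the HTTP response header,
# so everything after it is the JPEG body.
header_len = segment_lens[0]
jpeg_len = total_payload - header_len

print(total_payload, bytes_before_fin, jpeg_len)  # 15176 15176 15089
```

Under these assumptions the JPEG body comes out at 15089 bytes, which matches the size printed in the CAPTURE OK line.
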
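3 Building the Host Side

3.1 Python Multithreading


3.1.1 Sharing Global Variables Between Threads



3.1.2 Using a Keyboard Listener to Exit Threads Cleanly

To let every thread exit cleanly, the program listens for keyboard input. The main thread blocks in keyboard.wait() until the user presses a key. Once the user presses 'q', exit_flag is set to True. Each worker thread checks this flag at the top of its loop, so it leaves the loop after finishing its current iteration, releases its resources, and terminates. This ensures that every thread gets a chance to clean up before the program ends.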



import threading
import keyboard  # third-party package that provides keyboard.wait()

def loop1():
    '''esp32cam image'''
    global img
    while not exit_flag:
        img = ...
    print('loop1 exit normally')

def loop2():
    '''opencv processing'''
    while not exit_flag:
        # Work on copies so the other threads can keep updating the globals
        img_puttext = img.copy()
        result_c_puttext = result_c.copy()
        result_d_puttext = result_d.copy()
    print('loop2 exit normally')

def loop3():
    '''modelscope processing'''
    global result_c
    global result_d
    while not exit_flag:
        result_c = ...
        result_d = ...
    print('loop3 exit normally')

if __name__ == '__main__':
    '''loop1 initialization'''
    img = ...

    '''loop3 initialization'''
    result_c = ...
    result_d = ...

    # Initialize the flag that tells the threads when to exit
    exit_flag = False
    # Create the threads
    t1 = threading.Thread(target=loop1)
    t2 = threading.Thread(target=loop2)
    t3 = threading.Thread(target=loop3)
    # Start the threads
    t1.start()
    t2.start()
    t3.start()
    # Block until the user presses q
    keyboard.wait('q')
    # Set the flag so that every thread leaves its loop
    exit_flag = True
    # Wait for the threads to finish
    t1.join()
    t2.join()
    t3.join()
    print('main thread exit normally')


KeyboardEvent(q down)
loop2 exit normally
loop1 exit normally
{'scores': array([], dtype=float32), 'labels': [], 'boxes': array([], shape=(0, 5), dtype=float32)}
loop3 exit normally
main thread exit normally
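
A variant of the same shutdown pattern can be sketched with threading.Event in place of the global exit_flag, plus a lock around the shared data. This is a minimal sketch, not the project's code: producer, consumer, shared, and exit_log are hypothetical names, and a short sleep stands in for blocking on the 'q' key so that the example runs without a keyboard.

```python
import threading
import time

stop_event = threading.Event()   # plays the role of exit_flag
frame_lock = threading.Lock()    # guards the shared frame counter
shared = {'frame': 0}            # stand-in for the global img
exit_log = []                    # records which workers left their loop


def producer():
    """Stand-in for loop1: keeps publishing a new 'frame'."""
    while not stop_event.is_set():
        with frame_lock:
            shared['frame'] += 1
        time.sleep(0.001)
    exit_log.append('producer')


def consumer():
    """Stand-in for loop2/loop3: reads the latest 'frame'."""
    while not stop_event.is_set():
        with frame_lock:
            _ = shared['frame']
        time.sleep(0.001)
    exit_log.append('consumer')


def run(duration=0.05):
    workers = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in workers:
        t.start()
    time.sleep(duration)   # stands in for blocking on the 'q' key
    stop_event.set()       # ask every worker to leave its loop
    for t in workers:
        t.join()           # wait until each worker has finished
    return sorted(exit_log)


result = run()
print(result)
```

An Event makes the intent explicit and can also be waited on with a timeout, while the lock keeps a reader from seeing a half-updated value when the shared object is more complex than a single reference.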

3.2 Parsing the MJPEG Video Stream or JPG Images (loop1)


3.2.1 Parsing the MJPEG Video Stream


import cv2
import numpy as np
import requests

def loop1():
    '''esp32cam image mjpeg'''
    global img
    r = requests.get(url + 'cam.mjpeg', stream=True)
    if r.status_code == 200:
        buf = b''  # bytes received so far (renamed to avoid shadowing the built-in bytes)
        for chunk in r.iter_content(chunk_size=1024):
            buf += chunk
            a = buf.find(b'\xff\xd8')  # JPEG start-of-image marker
            b = buf.find(b'\xff\xd9', a + 2) if a != -1 else -1  # end-of-image marker after it
            if a != -1 and b != -1:
                jpg = buf[a:b+2]
                buf = buf[b+2:]
                # np.frombuffer replaces the deprecated np.fromstring
                img = cv2.imdecode(np.frombuffer(jpg, dtype=np.uint8), cv2.IMREAD_COLOR)
            if exit_flag:
                break
    else:
        print("Received unexpected status code {}".format(r.status_code))
    print('loop1 exit normally')
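
The marker search at the heart of this loop can be isolated and tested without a camera. The sketch below is one way to do that, under the assumption that each frame is delimited by the JPEG start-of-image and end-of-image markers; extract_frames and the sample stream are hypothetical and only illustrate the buffering logic.

```python
SOI = b'\xff\xd8'  # JPEG start-of-image marker
EOI = b'\xff\xd9'  # JPEG end-of-image marker


def extract_frames(buf):
    """Split complete JPEG frames off the front of buf.

    Returns (frames, remainder), where remainder holds the bytes of a frame
    that has not fully arrived yet.
    """
    frames = []
    while True:
        start = buf.find(SOI)
        if start == -1:
            # No frame start yet. Keep a trailing 0xff in case the marker
            # is split across two chunks.
            return frames, (buf[-1:] if buf.endswith(b'\xff') else b'')
        end = buf.find(EOI, start + 2)
        if end == -1:
            return frames, buf[start:]  # keep the partial frame for the next chunk
        frames.append(buf[start:end + 2])
        buf = buf[end + 2:]


# Two chunks: the second frame is split across them.
stream = b'--boundary\r\n' + SOI + b'frame-1' + EOI + b'\r\n--boundary\r\n' + SOI + b'fra'
frames, rest = extract_frames(stream)
more, rest = extract_frames(rest + b'me-2' + EOI)
print(len(frames), len(more), rest)
```

Searching for the end marker only after the start marker avoids cutting a frame at a stale end marker left in the buffer. A JPEG that embeds a thumbnail can contain these markers internally, so this simple scan is a heuristic rather than a full parser.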

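3.2.2 Parsing JPG Images


import urllib.error
import urllib.request

def loop1():
    '''esp32cam image jpg'''
    global img
    while not exit_flag:
        try:  # body reconstructed; cam-hi.jpg is the path seen in the capture in section 2.4.2
            resp = urllib.request.urlopen(url + 'cam-hi.jpg')
            img = cv2.imdecode(np.frombuffer(resp.read(), dtype=np.uint8), cv2.IMREAD_COLOR)
        except urllib.error.URLError as e:
            print(e)
    print('loop1 exit normally')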

3.3 OpenCV Operations (loop2)


3.3.1 Displaying Video with OpenCV


while True:
    cv2.imshow('img', img)  # window name is an assumption
    if ord('q') == cv2.waitKey(10):
        break

3.3.2 Displaying Chinese Text (cv2.putText() Workaround)



import cv2
import numpy as np
from PIL import Image, ImageDraw, ImageFont

def cv2AddChineseText(img, text, position, textColor=(0, 255, 0), textSize=30):
    if (isinstance(img, np.ndarray)):  # check whether this is an OpenCV image
        img = Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
    # Create an object that can draw on the given image
    draw = ImageDraw.Draw(img)
    # Font settings
    fontStyle = ImageFont.truetype("simsun.ttc", textSize, encoding="utf-8")
    # Draw the text
    draw.text(position, text, textColor, font=fontStyle)
    # Convert back to OpenCV format
    return cv2.cvtColor(np.asarray(img), cv2.COLOR_RGB2BGR)

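3.3.3 Visualizing Object Detection Boxes



  1. Drawing targets: loop over the targets in the detection result, extract the coordinates of each one, and draw a rectangle on the image with OpenCV's rectangle method to mark its position.

  2. Adding labels: for each box, use OpenCV's text drawing method to add the label information (class label and confidence) to the image, so that the visualization carries more information.

  3. Color handling: a color helper function assigns a different color to each box and label, which makes the visualization easier to read.

  4. Position adjustment: the label position is adjusted dynamically according to the box position, so that the label never extends beyond the image border and stays readable.

Calling the vis_det_img method is enough to show the detection results directly on the image, which helps in understanding and evaluating the algorithm's performance.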
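
Steps 3 and 4 are plain arithmetic and can be sketched without OpenCV. The helpers below are hypothetical and are not the project's vis_det_img: label_color and label_origin are illustrative names, the default of 80 classes and the [x1, y1, x2, y2] box layout are assumptions, and the actual drawing would still be done with cv2.rectangle and a text routine.

```python
import colorsys


def label_color(class_index, num_classes=80):
    """Pick a distinct color per class by spacing hues evenly (returned as BGR)."""
    hue = (class_index % num_classes) / num_classes
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    return int(b * 255), int(g * 255), int(r * 255)  # OpenCV uses BGR order


def label_origin(box, text_w, text_h, img_w, img_h, pad=2):
    """Return the bottom-left corner for a label attached to box.

    The label sits above the box when there is room, otherwise just inside
    its top edge, and is shifted left if it would run past the right border.
    """
    x1, y1, x2, y2 = box
    y = y1 - pad if y1 - pad - text_h >= 0 else y1 + text_h + pad
    x = min(max(x1, 0), max(img_w - text_w, 0))
    return int(x), int(min(y, img_h - 1))


# A box in the middle of an 800x600 frame, and one touching the top-right corner.
print(label_origin((100, 50, 200, 150), 60, 12, 800, 600))  # (100, 48)
print(label_origin((780, 5, 799, 40), 60, 12, 800, 600))    # (740, 19)
print(label_color(0))                                       # (0, 0, 255)
```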


def loop2():
    '''opencv processing'''
    while not exit_flag:
        # Work on copies so loop1 and loop3 can keep updating the globals
        img_puttext = img.copy()
        result_c_puttext = result_c.copy()
        result_d_puttext = result_d.copy()
        if model_sel in ('g', 'd', 'r'):
            '''image classification'''
            if model_sel == 'g':
                img_puttext = cv2AddChineseText(img_puttext, '垃圾分类', (4, 6), (0, 255, 0), 20)  # garbage classification
            elif model_sel == 'd':
                img_puttext = cv2AddChineseText(img_puttext, '日常用品', (4, 6), (0, 255, 0), 20)  # daily-use items
            elif model_sel == 'r':
                img_puttext = cv2AddChineseText(img_puttext, '万物识别', (4, 6), (0, 255, 0), 20)  # general recognition
            # List each label and its score below the title
            for i in range(len(result_c_puttext['labels'])):
                text = str(result_c_puttext['labels'][i]) + ' ' + str(result_c_puttext['scores'][i])
                img_puttext = cv2AddChineseText(img_puttext, text, (4, 26 + 20 * i), (0, 255, 0), 20)
        elif model_sel in ('o', 'h', 'f', 'p'):
            '''object detection'''
            if model_sel == 'o':
                img_puttext = cv2AddChineseText(img_puttext, '通用目标检测', (4, 6), (0, 255, 0), 20)  # general object detection
            elif model_sel == 'h':
                img_puttext = cv2AddChineseText(img_puttext, '人头检测', (4, 6), (0, 255, 0), 20)  # head detection
            elif model_sel == 'f':
                img_puttext = cv2AddChineseText(img_puttext, '口罩检测', (4, 6), (0, 255, 0), 20)  # face mask detection
            elif model_sel == 'p':
                img_puttext = cv2AddChineseText(img_puttext, '手机检测', (4, 6), (0, 255, 0), 20)  # phone detection
            # Draw the boxes for whichever detector is selected
            img_puttext = vis_det_img(img_puttext, result_d_puttext)
        try:
            cv2.imshow('img', img_puttext)  # reconstructed call; window name is an assumption
        except cv2.error:
            pass  # reconstructed handler: skip a frame that cannot be displayed
        if ord('q') == cv2.waitKey(10):
            break
    print('loop2 exit normally')

3.4 Building the Neural Networks with the ModelScope Framework (loop3)




# At the top of the script
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

    # Load the models and pipelines (used by loop3)
    '''image classification'''
    garbage_classification = pipeline(Tasks.image_classification, model='damo/cv_convnext-base_image-classification_garbage')
    dailylife_classification = pipeline(Tasks.image_classification, model='damo/cv_vit-base_image-classification_Dailylife-labels')
    general_recognition = pipeline(Tasks.general_recognition, model='damo/cv_resnest101_general_recognition')
    '''object detection'''
    object_detection = pipeline(Tasks.image_object_detection, model='damo/cv_tinynas_object-detection_damoyolo')
    head_detection = pipeline(Tasks.domain_specific_object_detection, model='damo/cv_tinynas_head-detection_damoyolo')
    facemask_detection = pipeline(Tasks.domain_specific_object_detection, model='damo/cv_tinynas_object-detection_damoyolo_facemask')
    phone_detection = pipeline(Tasks.domain_specific_object_detection, model='damo/cv_tinynas_object-detection_damoyolo_phone')

    # Initialize the globals result_c and result_d, which store the inference results (loop3)
    result_c = {'scores': [0,0,0,0,0], 'labels': ['','','','','']}
    result_d = {'scores': [], 'labels': [], 'boxes': []}


def loop3():
    '''modelscope processing'''
    global result_c
    global result_d
    while not exit_flag:
        if model_sel == 'g' or model_sel == 'd' or model_sel == 'r':
            '''image classification'''
            if model_sel == 'g':
                result_c = garbage_classification(img)
            elif model_sel == 'd':
                result_c = dailylife_classification(img)
            elif model_sel == 'r':
                result_c = general_recognition(img)
        elif model_sel == 'o' or model_sel == 'h' or model_sel == 'f' or model_sel == 'p':
            '''object detection'''
            if model_sel == 'o':
                result_d = object_detection(img)
            elif model_sel == 'h':
                result_d = head_detection(img)
            elif model_sel == 'f':
                result_d = facemask_detection(img)
            elif model_sel == 'p':
                result_d = phone_detection(img)
    print('loop3 exit normally')
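
The if/elif chain in loop3 can also be expressed as a table lookup. The sketch below shows one possible shape for that, not the project's implementation: make_dispatcher is a hypothetical helper, and stub callables stand in for the ModelScope pipelines so that the example runs without downloading any model.

```python
def make_dispatcher(classifiers, detectors):
    """Build a function mapping (key, image) to ('c' or 'd', result), or None."""
    def infer(key, image):
        if key in classifiers:
            return 'c', classifiers[key](image)
        if key in detectors:
            return 'd', detectors[key](image)
        return None  # unknown key: the caller leaves both results untouched
    return infer


# Stub callables stand in for the real pipelines (hypothetical outputs).
classifiers = {
    'g': lambda im: {'scores': [0.9], 'labels': ['stub-garbage']},
    'd': lambda im: {'scores': [0.8], 'labels': ['stub-dailylife']},
    'r': lambda im: {'scores': [0.7], 'labels': ['stub-general']},
}
detectors = {
    'o': lambda im: {'scores': [], 'labels': [], 'boxes': []},
    'h': lambda im: {'scores': [0.6], 'labels': ['stub-head'], 'boxes': [[0, 0, 1, 1]]},
}

infer = make_dispatcher(classifiers, detectors)
kind, result = infer('g', None)
print(kind, result['labels'])
print(infer('x', None))
```

With this layout, adding a task means adding one dictionary entry, and the worker loop only has to store the result in result_c or result_d according to the returned kind.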

3.5 Demo of the Host-Side Python Code

4 Summary

4.1 Possible Improvements




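4.2 Potential Applications


  1. Smart surveillance and security: image classification and object detection allow specific targets to be monitored and identified in real time, making surveillance systems smarter and faster to respond.

  2. Agriculture: image classification can identify plant condition, pests, and diseases, providing real-time data for agricultural production and enabling smart farm management.

  3. Intelligent transportation: image classification and object detection allow traffic flow and vehicle violations to be monitored in real time, improving the efficiency and safety of traffic management.

  4. Retail: image classification can identify products in real time to support a smarter shopping experience, while object detection helps keep the retail environment secure.

  5. Industrial production: monitoring equipment status and product quality on the production line can raise efficiency and quality and reduce human error in the production process.

  6. Healthcare: image classification can assist doctors in diagnosing diseases, while object detection can monitor the operating status of medical equipment.

  7. Environmental monitoring: image classification can monitor environmental factors such as air and water in real time, and object detection can track wildlife activity, providing data for environmental protection work.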


4.3 Conclusions and Reflections




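An IoT Image Classification / Object Detection Platform Based on the ESP32-CAM

With the rapid development of image classification and object detection, combining these techniques with the Internet of Things has shown broad potential value across many fields. This project, built on an ESP32-CAM device and a Python host computer, aims to create a low-cost, general-purpose IoT platform for image classification and object detection that can serve many domains.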
