【计算机视觉】24-Object Detection

news/2024/7/10 0:30:23 标签: 计算机视觉, 目标检测, 人工智能

文章目录

  • 24-Object Detection
    • 1. Introduction
    • 2. Methods
      • 2.1 Sliding Window
      • 2.2 R-CNN: Region-Based CNN
      • 2.3 Fast R-CNN
      • 2.4 Faster R-CNN: Learnable Region Proposals
      • 2.5 Results of objects detection
    • 3. Summary
    • Reference

24-Object Detection

1. Introduction

  1. Task Definition

    Input: Single RGB Image

    Output: A set of detected objects;

    For each object predict:

    • Category label (from fixed, known set of categories)

    • Bounding box(four numbers: x, y, width, height)

  2. Challenges

    • Multiple outputs: Need to output variable numbers of objects per image
    • Multiple types of output: Need to predict ”what” (category label) as well as “where” (bounding box)
    • Large images: Classification works at 224x224; need higher resolution for detection, often ~800x600
  3. Detecting a single object

    image-20231120145632741

    With two branches, outputting label, and box

    Problem: Images can have more than one object! And if we use multiple single object detection, it will decrease the efficiency.

2. Methods

2.1 Sliding Window

Apply a CNN to many different crops of the image, CNN classifies each crop as an object or background:

image-20231120150748738

Problem: Need too many calculations

  • Consider an image of size H*W and a box of size h*w
  • Total possible boxes: ∑ h = 1 H ∑ w = 1 W ( W − w + 1 ) ( H − h + 1 ) = H ( H + 1 ) 2 W ( W + 1 ) 2 \sum_{h=1}^{H}\sum_{w=1}^{W}(W-w+1)(H-h+1)=\frac{H(H+1)}{2}\frac{W(W+1)}{2} h=1Hw=1W(Ww+1)(Hh+1)=2H(H+1)2W(W+1)
  • 800 x 600 image has ~58M boxes! No way we can evaluate them all.

2.2 R-CNN: Region-Based CNN

  1. Region Proposals(Selective Search)

    Selective Search is a region proposal algorithm used in object detection. It is based on computing hierarchical grouping of similar regions based on color, texture, size and shape compatibility.

    Selective Search starts by over-segmenting the image based on intensity of the pixels using a graph-based segmentation method by Felzenszwalb and Huttenlocher.

    image-20231120213007261

    Selective Search algorithm takes these oversegments as initial input and performs the following steps

    1. Add all bounding boxes corresponding to segmented parts to the list of regional proposals
    2. Group adjacent segments based on similarity
    3. Go to step 1

    At each iteration, larger segments are formed and added to the list of region proposals. Hence we create region proposals from smaller segments to larger segments in a bottom-up approach.

    As for the calculation of similarity measures based on color, texture, size and shape compatibility, please refer to Selective Search for Object Detection (C++ / Python) | LearnOpenCV

  2. Architecture of the network

    image-20231120214110598

    On two thousand selected regions, we narrow them down to the size required for classification, and after passing through the convolutional network, we output the category along with the box offset

  3. Steps

    1. Run region proposal method to compute ~2000 region proposals
    2. Resize each region to 224x224 and run independently through CNN to predict class scores and bbox transform
    3. Use scores to select a subset of region proposals to output (Many choices here: threshold on background, or per-category? Or take top K proposals per image?)
    4. Compare with ground-truth boxes
  4. Details(Focus on step3 and 4)

    1. Intersection over Union (IoU)
      I o U = Area of Intersection Area of Union IoU=\frac{\color{yellow}{\text{Area of Intersection}}}{\color{purple}{\text{Area of Union}}} IoU=Area of UnionArea of Intersection
      在这里插入图片描述

    2. Non-Max Suppression (NMS)

      • Select next highest-scoring box

      • Eliminate lower-scoring boxes(Comparing the highest-scoring box to all the others ) with IoU > threshold (e.g. 0.7)

      • If any boxes remain, GOTO 1

      Problem: NMS may eliminate ”good” boxes when objects are highly overlapping:

在这里插入图片描述

  1. Mean Average Precision (mAP)

    Use the gif to understand it(but I only have the final image):

在这里插入图片描述 For example, the mAP in COCO dataset is 0.4.

  1. Problem: Very slow! Need to do ~2k forward passes for each image!

    Solution: Run CNN before warping!

2.3 Fast R-CNN

  1. Architecture:

    image-20231120151757798
    • Most of the computation happens in the backbone network; this saves work for overlapping region proposals

    • Per-Region network is relatively lightweight

  2. The concrete architecture in Alexnet and Resnet:

    image-20231120152141617 image-20231120152156583
  3. Details:

    How to crop features?

    image-20231120222841764

    In this process, there are two errors:

    img

    如下图,假设输入图像经过一系列卷积层下采样32倍后输出的特征图大小为8x8,现有一 RoI 的左上角和右下角坐标(x, y 形式)分别为(0, 100) 和 (198, 224),映射至特征图上后坐标变为(0, 100 / 32)和(198 / 32,224 / 32),由于像素点是离散的,因此向下取整后最终坐标为(0, 3)和(6, 7),这里产生了第一次量化误差。

    假设最终需要将 RoI 变为固定的2x2大小,那么将 RoI 平均划分为2x2个区域,每个区域长宽分别为 (6 - 0 + 1) / 2 和 (7 - 3 + 1) / 2 即 3.5 和 2.5,同样,由于像素点是离散的,因此有些区域的长取3,另一些取4,而有些区域的宽取2,另一些取3,这里产生了第二次量化误差。

  4. RoI Align in Mask R-CNN

在这里插入图片描述

Notice: RoI Align needs to set a hyperparameter to represent the number of sampling points in each region, which is usually 4.

  1. Speed

    It has an enormous increase from R-CNN. But we can find that region proposals costs lots of time.

2.4 Faster R-CNN: Learnable Region Proposals

  1. Architecture:

    Insert Region Proposal Network (RPN) to predict proposals from feature
    在这里插入图片描述

  2. Details:

在这里插入图片描述

At each point, predict whether the corresponding anchor contains an object. And we use logistic regression to express the error. predict scores with conv layer

  1. Evaluation

在这里插入图片描述

  1. Improvement

    Faster R-CNN is a Two-stage object detector:

    But we want to design the structure of end to end, eliminating the second stage. So we change the function of region proposal network to predict the class label.
    在这里插入图片描述

2.5 Results of objects detection

在这里插入图片描述

  • Two-stage method (Faster R-CNN) gets the best accuracy but are slower.
  • Single-stage methods (SSD) are much faster but don’t perform as well
  • Bigger backbones improve performance, but are slower
  • Diminishing returns for slower methods

在这里插入图片描述

These results are a few years old …since then GPUs have gotten faster, and we’ve improved performance with many tricks:

  • Train longer!
  • Multiscale backbone: Feature
    Pyramid Networks
  • Better backbone: ResNeXt
  • Single-Stage methods have improved
  • Very big models work better
  • Test-time augmentation pushes
    numbers up
  • Big ensembles, more data, etc

3. Summary

Reference

[1] RoI Pooling 系列方法介绍(文末附源码) - 知乎 (zhihu.com)

[2] Selective Search for Object Detection (C++ / Python) | LearnOpenCV


http://www.niftyadmin.cn/n/5197998.html

相关文章

NAS层协议栈学习笔记

NAS(Non-Access Stratum)是无线网络中非接入层及包括移动性管理(MM)和会话管理(SM)协议 ,在5G(NR)系统中连接管理(Connection Management)用于建立和释放UE与AMF之间的控制面(CP)信令连接。 5G中移动性管理是通过NAS信令在UE与核心网之间进行交互的,连接…

给大伙讲个笑话:阿里云服务器开了安全组防火墙还是无法访问到服务

铺垫: 某天我在阿里云上买了一个服务器,买完我就通过MobaXterm进行了ssh(这个软件是会保存登录信息的) 故事开始: 过了n天之后我想用这个服务器来部署流媒体服务,咔咔两下就部署好了流媒体服务器&#x…

手把手从零开始训练YOLOv8改进项目(官方ultralytics版本)教程

手把手从零开始训练 YOLOv8 改进项目 (Ultralytics版本) 教程,改进 YOLOv8 算法 本文以Windows服务器为例:从零开始使用Windows训练 YOLOv8 算法项目 《芒果 YOLOv8 目标检测算法 改进》 适用于芒果专栏改进 YOLOv8 算法 文章目录 官方 YOLOv8 算法介绍改进网络代码汇总第…

flink源码分析之功能组件(一)-metrics

简介 本系列是flink源码分析的第二个系列,上一个《flink源码分析之集群与资源》分析集群与资源,本系列分析功能组件,kubeclient,rpc,心跳,高可用,slotpool,rest,metric,future。其中kubeclient上一个系列介绍过,本系列不在介绍。 本文介绍flink metrics组件,metric…

睡前随笔1

这个世界上为什么会有自卑内向一词,难道是因为大家潜意识里的弱肉强食吗? 在一个恶劣的环境中,在一个资源有限的环境中,人们必定会弱肉强食,抢占资源。 这个世界的一切有哪些活动是你喜欢的呢?我们处在一…

为什么 Django 后台管理系统那么“丑”?

哈喽大家好,我是咸鱼 相信使用过 Django 的小伙伴都知道 Django 有一个默认的后台管理系统——Django Admin 它的 UI 很多年都没有发生过变化,现在看来显得有些“过时且简陋” 那为什么 Django 的维护者却不去优化一下呢?原文作者去询问了多…

四旋翼无人机的飞行原理--【其利天下分享】

近年来,无人机在多领域的便捷应用促使其迅猛的发展,如近年来的多场战争,无人机的战场运用发挥得淋漓尽致。 下面我们针对生活中常见的四旋翼无人机的飞行原理做个基础的介绍,以飨各位对无人机有兴趣的朋友。 一:四旋翼…

通过OEKO如何申请亚马逊气候友好绿标

如果您的纺织品已通过OEKO认证,您可以按照以下步骤申请亚马逊气候友好标志(Climate Pledge Friendly): 创建亚马逊卖家账户:首先,您需要成为亚马逊的注册卖家。如果您已经是亚马逊卖家,请跳过此…