当前位置: 首页 > news >正文

《Keras 2 :使用 RetinaNet 进行对象检测》:此文为AI自动翻译

《Keras 2 :使用 RetinaNet 进行对象检测》

作者:Srihari Humbarwadi
创建日期:2020/05/17
最后修改日期:2023/07/10
描述:实施 RetinaNet:用于密集对象检测的焦点损失。

(i) 此示例使用 Keras 2

 在 Colab 中查看 •


介绍

目标检测是计算机中非常重要的问题 视觉。在这里,模型的任务是定位 图像,同时将它们分为不同的类别。 对象检测模型大致可分为“单阶段”和 “两级”探测器。两级检测器通常更准确,但在 变慢的代价。在此示例中,我们将实现 RetinaNet, 一种流行的单级检测器,准确且运行速度快。 RetinaNet 使用特征金字塔网络来有效地检测 多个尺度,并引入了一种新的损失,即 Focal loss 函数,以减轻 极端的前景-背景阶级不平衡问题。

引用:

  • RetinaNet 纸
  • 特征金字塔网络论文
import os
import re
import zipfileimport numpy as np
import tensorflow as tf
from tensorflow import kerasimport matplotlib.pyplot as plt
import tensorflow_datasets as tfds

下载 COCO2017 数据集

对包含大约 118k 张图像的整个 COCO2017 数据集进行训练需要一个 ),因此我们将使用 ~500 张图像的较小子集 trained 在此示例中。

url = "https://github.com/srihari-humbarwadi/datasets/releases/download/v0.1.0/data.zip"
filename = os.path.join(os.getcwd(), "data.zip")
keras.utils.get_file(filename, url)with zipfile.ZipFile("data.zip", "r") as z_fp:z_fp.extractall("./")
Downloading data from https://github.com/srihari-humbarwadi/datasets/releases/download/v0.1.0/data.zip 560529408/560525318 [==============================] - 7s 0us/step 560537600/560525318 [==============================] - 7s 0us/step 

实现实用程序函数

边界框可以用多种方式表示,最常见的格式是:

  • 存储角的坐标[xmin, ymin, xmax, ymax]
  • 存储中心和框尺寸的坐标[x, y, width, height]

由于我们需要这两种格式,因此我们将实现用于转换 在格式之间。

def swap_xy(boxes):"""Swaps order the of x and y coordinates of the boxes.    Arguments:
      boxes: A tensor with shape `(num_boxes, 4)` representing bounding boxes.    Returns:
      swapped boxes with shape same as that of boxes.
    """return tf.stack([boxes[:, 1], boxes[:, 0], boxes[:, 3], boxes[:, 2]], axis=-1)def convert_to_xywh(boxes):"""Changes the box format to center, width and height.    Arguments:
      boxes: A tensor of rank 2 or higher with a shape of `(..., num_boxes, 4)`
        representing bounding boxes where each box is of the format
        `[xmin, ymin, xmax, ymax]`.    Returns:
      converted boxes with shape same as that of boxes.
    """return tf.concat([(boxes[..., :2] + boxes[..., 2:]) / 2.0, boxes[..., 2:] - boxes[..., :2]],axis=-1,)def convert_to_corners(boxes):"""Changes the box format to corner coordinates    Arguments:
      boxes: A tensor of rank 2 or higher with a shape of `(..., num_boxes, 4)`
        representing bounding boxes where each box is of the format
        `[x, y, width, height]`.    Returns:
      converted boxes with shape same as that of boxes.
    """return tf.concat([boxes[..., :2] - boxes[..., 2:] / 2.0, boxes[..., :2] + boxes[..., 2:] / 2.0],axis=-1,)

计算成对交并集 (IOU)

正如我们将在示例后面看到的那样,我们将分配真值框 以根据重叠范围锚定框。这将要求我们 计算所有锚点之间的交并比 (IOU) 框和真实框对。

def compute_iou(boxes1, boxes2):"""Computes pairwise IOU matrix for given two sets of boxes    Arguments:
      boxes1: A tensor with shape `(N, 4)` representing bounding boxes
        where each box is of the format `[x, y, width, height]`.
        boxes2: A tensor with shape `(M, 4)` representing bounding boxes
        where each box is of the format `[x, y, width, height]`.    Returns:
      pairwise IOU matrix with shape `(N, M)`, where the value at ith row
        jth column holds the IOU between ith box and jth box from
        boxes1 and boxes2 respectively.
    """boxes1_corners = convert_to_corners(boxes1)boxes2_corners = convert_to_corners(boxes2)lu = tf.maximum(boxes1_corners[:, None, :2], boxes2_corners[:, :2])rd = tf.minimum(boxes1_corners[:, None, 2:], boxes2_corners[:, 2:])intersection = tf.maximum(0.0, rd - lu)intersection_area = intersection[:, :, 0] * intersection[:, :, 1]boxes1_area = boxes1[:, 2] * boxes1[:, 3]boxes2_area = boxes2[:, 2] * boxes2[:, 3]union_area = tf.maximum(boxes1_area[:, None] + boxes2_area - intersection_area, 1e-8)return tf.clip_by_value(intersection_area / union_area, 0.0, 1.0)def visualize_detections(image, boxes, classes, scores, figsize=(7, 7), linewidth=1, color=[0, 0, 1]
):"""Visualize Detections"""image = np.array(image, dtype=np.uint8)plt.figure(figsize=figsize)plt.axis("off")plt.imshow(image)ax = plt.gca()for box, _cls, score in zip(boxes, classes, scores):text = "{}: {:.2f}".format(_cls, score)x1, y1, x2, y2 = boxw, h = x2 - x1, y2 - y1patch = plt.Rectangle([x1, y1], w, h, fill=False, edgecolor=color, linewidth=linewidth)ax.add_patch(patch)ax.text(x1,y1,text,bbox={"facecolor": color, "alpha": 0.4},clip_box=ax.clipbox,clip_on=True,)plt.show()return ax

实现 Anchor 生成器

锚框是模型用于预测边界的固定大小的框 对象的框。它通过回归位置 对象的中心和锚框的中心,然后使用宽度 和锚点框的高度来预测对象的相对比例。在 在 RetinaNet 的情况下,给定特征图上的每个位置都有 9 个锚框 (三个比例和三个比率)。

class AnchorBox:"""Generates anchor boxes.    This class has operations to generate anchor boxes for feature maps at
    strides `[8, 16, 32, 64, 128]`. Where each anchor each box is of the
    format `[x, y, width, height]`.    Attributes:
      aspect_ratios: A list of float values representing the aspect ratios of
        the anchor boxes at each location on the feature map
      scales: A list of float values representing the scale of the anchor boxes
        at each location on the feature map.
      num_anchors: The number of anchor boxes at each location on feature map
      areas: A list of float values representing the areas of the anchor
        boxes for each feature map in the feature pyramid.
      strides: A list of float value representing the strides for each feature
        map in the feature pyramid.
    """def __init__(self):self.aspect_ratios = [0.5, 1.0, 2.0]self.scales = [2 ** x for x in [0, 1 / 3, 2 / 3]]self._num_anchors = len(self.aspect_ratios) * len(self.scales)self._strides = [2 ** i for i in range(3, 8)]self._areas = [x ** 2 for x in [32.0, 64.0, 128.0, 256.0, 512.0]]self._anchor_dims = self._compute_dims()def _compute_dims(self):"""Computes anchor box dimensions for all ratios and scales at all levels
        of the feature pyramid.
        """anchor_dims_all = []for area in self._areas:anchor_dims = []for ratio in self.aspect_ratios:anchor_height = tf.math.sqrt(area / ratio)anchor_width = area / anchor_heightdims = tf.reshape(tf.stack([anchor_width, anchor_height], axis=-1), [1, 1, 2])for scale in self.scales:anchor_dims.append(scale * dims)anchor_dims_all.append(tf.stack(anchor_dims, axis=-2))return anchor_dims_alldef _get_anchors(

http://www.mrgr.cn/news/91860.html

相关文章:

  • Helix——Figure 02发布通用人形机器人控制的VLA:一组神经网络权重下的快与慢双系统,让两个机器人协作干活
  • qt5实现表盘的旋转效果,通过提升QLabel类
  • go 并发 gorouting chan channel select Mutex sync.One
  • 【OS安装与使用】part6-ubuntu 22.04+CUDA 12.4运行MARL算法(多智能体强化学习)
  • DDD架构实战:用Java实现一个电商订单系统,快速掌握领域驱动设计
  • 一文详解U盘启动Legacy/UEFI方式以及GPT/MBR关系
  • 【工具篇】【深度解析 DeepAI 工具:开启 AI 应用新体验】
  • RNN中远距离时间步梯度消失问题及解决办法
  • Linux----线程
  • 《Keras 3 :使用 Vision Transformers 进行物体检测》:此文为AI自动翻译
  • 《Keras 3 : 使用迁移学习进行关键点检测》:此文为AI自动翻译
  • IO模型与NIO基础--NIO网络传输选择器--字符编码
  • 代码随想录算法训练营第四十五天| 动态规划08
  • JavaScript变量的作用域介绍
  • nodejs运行的坎坷之路
  • 《论系统需求分析方法》写作心得 - 系统分析师
  • 业务流程中的流程管理
  • STL —— 洛谷字符串(string库)入门题(蓝桥杯题目训练)(二)
  • 企业组网IP规划与先关协议分析
  • 第一个CMAKE项目hello cmake