Keras 2: Object Detection with RetinaNet (note: the source article was machine-translated)

news 2025/7/6 11:50:48

Author: Srihari Humbarwadi
Date created: 2020/05/17
Last modified: 2023/07/10
Description: Implementing RetinaNet: Focal Loss for Dense Object Detection.

(i) This example uses Keras 2

Introduction

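Object detection is a very important problem in computer vision. Here the model is tasked with localizing the objects present in an image and, at the same time, classifying them into different categories. Object detection models can be broadly classified into "single-stage" and "two-stage" detectors. Two-stage detectors are often more accurate, but at the cost of being slower. In this example, we will implement RetinaNet, a popular single-stage detector that is accurate and runs fast. RetinaNet uses a feature pyramid network to efficiently detect objects at multiple scales and introduces a new loss, the Focal loss function, to alleviate the problem of extreme foreground-background class imbalance.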

References:

RetinaNet paper
Feature Pyramid Network paper

import os
import re
import zipfile

import numpy as np
import tensorflow as tf
from tensorflow import keras

import matplotlib.pyplot as plt
import tensorflow_datasets as tfds

Downloading the COCO2017 dataset

Training on the entire COCO2017 dataset, which has around 118k images, takes a long time, so in this example we will train on a smaller subset of ~500 images.

url = "https://github.com/srihari-humbarwadi/datasets/releases/download/v0.1.0/data.zip"
filename = os.path.join(os.getcwd(), "data.zip")
keras.utils.get_file(filename, url)


with zipfile.ZipFile("data.zip", "r") as z_fp:
    z_fp.extractall("./")

Downloading data from https://github.com/srihari-humbarwadi/datasets/releases/download/v0.1.0/data.zip
560529408/560525318 [==============================] - 7s 0us/step
560537600/560525318 [==============================] - 7s 0us/step

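Implementing utility functions

Bounding boxes can be represented in multiple ways. The most common formats are:

Storing the coordinates of the corners [xmin, ymin, xmax, ymax]
Storing the coordinates of the center and the box dimensions [x, y, width, height]

Since we require both formats, we will implement functions for converting between them.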
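
As a concrete illustration of the two formats, here is a minimal plain-Python sketch that converts a single box back and forth. The function names `corners_to_xywh` and `xywh_to_corners` are hypothetical and are not part of the original example, whose TensorFlow functions operate on whole batches of boxes.

```python
# Plain-Python sketch of the two box formats for a single box.
# Function names are illustrative only; they are not from the original example.


def corners_to_xywh(box):
    """[xmin, ymin, xmax, ymax] -> [x_center, y_center, width, height]."""
    xmin, ymin, xmax, ymax = box
    return [(xmin + xmax) / 2.0, (ymin + ymax) / 2.0, xmax - xmin, ymax - ymin]


def xywh_to_corners(box):
    """[x_center, y_center, width, height] -> [xmin, ymin, xmax, ymax]."""
    x, y, w, h = box
    return [x - w / 2.0, y - h / 2.0, x + w / 2.0, y + h / 2.0]


corner_box = [10.0, 20.0, 50.0, 100.0]
center_box = corners_to_xywh(corner_box)
print(center_box)                   # [30.0, 60.0, 40.0, 80.0]
print(xywh_to_corners(center_box))  # [10.0, 20.0, 50.0, 100.0]
```

The round trip returns the original box, which is a quick sanity check for any pair of conversion functions.
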

def swap_xy(boxes):
    """Swaps the order of x and y coordinates of the boxes.

    Arguments:
      boxes: A tensor with shape `(num_boxes, 4)` representing bounding boxes.

    Returns:
      swapped boxes with shape same as that of boxes.
    """
    return tf.stack([boxes[:, 1], boxes[:, 0], boxes[:, 3], boxes[:, 2]], axis=-1)


def convert_to_xywh(boxes):
    """Changes the box format to center, width and height.

    Arguments:
      boxes: A tensor of rank 2 or higher with a shape of `(..., num_boxes, 4)`
        representing bounding boxes where each box is of the format
        `[xmin, ymin, xmax, ymax]`.

    Returns:
      converted boxes with shape same as that of boxes.
    """
    return tf.concat(
        [(boxes[..., :2] + boxes[..., 2:]) / 2.0, boxes[..., 2:] - boxes[..., :2]],
        axis=-1,
    )


def convert_to_corners(boxes):
    """Changes the box format to corner coordinates.

    Arguments:
      boxes: A tensor of rank 2 or higher with a shape of `(..., num_boxes, 4)`
        representing bounding boxes where each box is of the format
        `[x, y, width, height]`.

    Returns:
      converted boxes with shape same as that of boxes.
    """
    return tf.concat(
        [boxes[..., :2] - boxes[..., 2:] / 2.0, boxes[..., :2] + boxes[..., 2:] / 2.0],
        axis=-1,
    )

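Computing pairwise Intersection Over Union (IOU)

As we will see later in the example, we will assign ground truth boxes to anchor boxes based on the extent of their overlap. This requires computing the Intersection Over Union (IOU) between every pair of anchor box and ground truth box.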
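
To make the quantity concrete, the following plain-Python sketch computes the IOU of a single pair of boxes in `[x, y, width, height]` format. The helper name `iou_xywh` and the sample boxes are assumptions made for illustration; the original example computes the same ratio for all pairs at once with TensorFlow broadcasting.

```python
# Plain-Python sketch: IOU of one pair of boxes in [x, y, width, height] format.
# The helper name and the sample boxes are illustrative, not from the original.


def iou_xywh(box_a, box_b):
    """Returns intersection area divided by union area for two boxes."""
    # Convert both boxes to corner coordinates.
    ax1, ay1 = box_a[0] - box_a[2] / 2.0, box_a[1] - box_a[3] / 2.0
    ax2, ay2 = box_a[0] + box_a[2] / 2.0, box_a[1] + box_a[3] / 2.0
    bx1, by1 = box_b[0] - box_b[2] / 2.0, box_b[1] - box_b[3] / 2.0
    bx2, by2 = box_b[0] + box_b[2] / 2.0, box_b[1] + box_b[3] / 2.0

    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - intersection
    # Guard against division by zero for degenerate boxes.
    return intersection / max(union, 1e-8)


# Two 4x4 boxes overlapping in a 2x2 square: 4 / (16 + 16 - 4) = 1/7.
print(iou_xywh([2.0, 2.0, 4.0, 4.0], [4.0, 4.0, 4.0, 4.0]))
# Disjoint boxes have zero IOU.
print(iou_xywh([2.0, 2.0, 4.0, 4.0], [10.0, 10.0, 2.0, 2.0]))  # 0.0
```
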

def compute_iou(boxes1, boxes2):
    """Computes pairwise IOU matrix for given two sets of boxes

    Arguments:
      boxes1: A tensor with shape `(N, 4)` representing bounding boxes
        where each box is of the format `[x, y, width, height]`.
      boxes2: A tensor with shape `(M, 4)` representing bounding boxes
        where each box is of the format `[x, y, width, height]`.

    Returns:
      pairwise IOU matrix with shape `(N, M)`, where the value at ith row
        jth column holds the IOU between ith box and jth box from
        boxes1 and boxes2 respectively.
    """
    boxes1_corners = convert_to_corners(boxes1)
    boxes2_corners = convert_to_corners(boxes2)
    lu = tf.maximum(boxes1_corners[:, None, :2], boxes2_corners[:, :2])
    rd = tf.minimum(boxes1_corners[:, None, 2:], boxes2_corners[:, 2:])
    intersection = tf.maximum(0.0, rd - lu)
    intersection_area = intersection[:, :, 0] * intersection[:, :, 1]
    boxes1_area = boxes1[:, 2] * boxes1[:, 3]
    boxes2_area = boxes2[:, 2] * boxes2[:, 3]
    union_area = tf.maximum(
        boxes1_area[:, None] + boxes2_area - intersection_area, 1e-8
    )
    return tf.clip_by_value(intersection_area / union_area, 0.0, 1.0)


def visualize_detections(
    image, boxes, classes, scores, figsize=(7, 7), linewidth=1, color=[0, 0, 1]
):
    """Visualize Detections"""
    image = np.array(image, dtype=np.uint8)
    plt.figure(figsize=figsize)
    plt.axis("off")
    plt.imshow(image)
    ax = plt.gca()
    for box, _cls, score in zip(boxes, classes, scores):
        text = "{}: {:.2f}".format(_cls, score)
        x1, y1, x2, y2 = box
        w, h = x2 - x1, y2 - y1
        patch = plt.Rectangle(
            [x1, y1], w, h, fill=False, edgecolor=color, linewidth=linewidth
        )
        ax.add_patch(patch)
        ax.text(
            x1,
            y1,
            text,
            bbox={"facecolor": color, "alpha": 0.4},
            clip_box=ax.clipbox,
            clip_on=True,
        )
    plt.show()
    return ax

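Implementing Anchor generator

Anchor boxes are fixed-size boxes that the model uses to predict the bounding box of an object. It does this by regressing the offset between the location of the object's center and the center of an anchor box, and then uses the width and height of the anchor box to predict the relative scale of the object. In the case of RetinaNet, each location on a given feature map has nine anchor boxes (at three scales and three ratios).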
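
The nine anchors per location can be enumerated by hand. The sketch below is a plain-Python illustration, not the original implementation: it reuses the aspect ratios, scales and the 32 x 32 base area that the `AnchorBox` class uses for its first pyramid level, and the variable names are chosen here for readability.

```python
import math

# Plain-Python sketch of the nine anchor shapes at one pyramid level.
# Ratios, scales and the base area mirror the values in AnchorBox;
# the variable names are illustrative.
aspect_ratios = [0.5, 1.0, 2.0]
scales = [2 ** x for x in [0, 1 / 3, 2 / 3]]
base_area = 32.0 ** 2  # area used for the stride-8 feature map

anchor_dims = []
for ratio in aspect_ratios:
    # Choose height and width so that width * height == base_area
    # and width / height == ratio.
    height = math.sqrt(base_area / ratio)
    width = base_area / height
    for scale in scales:
        anchor_dims.append((scale * width, scale * height))

print(len(anchor_dims))  # 9
for w, h in anchor_dims:
    print(f"width={w:7.2f}  height={h:7.2f}  width/height={w / h:.2f}")
```

Each aspect ratio contributes three anchors that share the same shape and differ only in scale, so every location ends up with 3 x 3 = 9 candidate boxes.
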

class AnchorBox:
    """Generates anchor boxes.

    This class has operations to generate anchor boxes for feature maps at
    strides `[8, 16, 32, 64, 128]`, where each anchor box is of the
    format `[x, y, width, height]`.

    Attributes:
      aspect_ratios: A list of float values representing the aspect ratios of
        the anchor boxes at each location on the feature map
      scales: A list of float values representing the scale of the anchor boxes
        at each location on the feature map.
      num_anchors: The number of anchor boxes at each location on feature map
      areas: A list of float values representing the areas of the anchor
        boxes for each feature map in the feature pyramid.
      strides: A list of float values representing the strides for each feature
        map in the feature pyramid.
    """

    def __init__(self):
        self.aspect_ratios = [0.5, 1.0, 2.0]
        self.scales = [2 ** x for x in [0, 1 / 3, 2 / 3]]

        self._num_anchors = len(self.aspect_ratios) * len(self.scales)
        self._strides = [2 ** i for i in range(3, 8)]
        self._areas = [x ** 2 for x in [32.0, 64.0, 128.0, 256.0, 512.0]]
        self._anchor_dims = self._compute_dims()

    def _compute_dims(self):
        """Computes anchor box dimensions for all ratios and scales at all levels
        of the feature pyramid.
        """
        anchor_dims_all = []
        for area in self._areas:
            anchor_dims = []
            for ratio in self.aspect_ratios:
                anchor_height = tf.math.sqrt(area / ratio)
                anchor_width = area / anchor_height
                dims = tf.reshape(
                    tf.stack([anchor_width, anchor_height], axis=-1), [1, 1, 2]
                )
                for scale in self.scales:
                    anchor_dims.append(scale * dims)
            anchor_dims_all.append(tf.stack(anchor_dims, axis=-2))
        return anchor_dims_all

    def _get_anchors(