torchsight.metrics.iou module

Module to provide methods to calculate Intersection over Union.

Source code
"""Module to provide methods to calculate Intersection over Union."""
import torch

def iou(boxes, others):
    """Calculates the Intersection over Union between the given boxes.

    Each box must have the 4 values as x1, y1 (top left corner), x2, y2 (bottom
    right corner).

    Arguments:
        boxes (torch.Tensor): First group of boxes.
            Shape:
                (number of boxes in group 1, 4)
        others (torch.Tensor): Second group of boxes.
            Shape:
                (number of boxes in group 2, 4)

    Returns:
        torch.Tensor: The IoU between all the boxes in group 1 versus group 2.
            Shape:
                (number of boxes in group 1, number of boxes in group 2)

    How does this work?

    Let's say there are 'd' boxes in group 1 and 'b' boxes in group 2.

    The beautiful idea behind this function is to unsqueeze group 1 (add one dimension), changing its
    shape from (d) to (d, 1); combined with the boxes of shape (b), the operation then broadcasts
    to the final shape (d, b).

    Broadcast:
    PyTorch: https://pytorch.org/docs/stable/notes/broadcasting.html
    Numpy: https://docs.scipy.org/doc/numpy-1.10.4/user/basics.broadcasting.html

    The broadcast always goes from the last dimension to the first, so pairing (b) with (d, 1) we can
    view the tensors as:

    (d, 1)
    (   b)
    ------
    (d, b)

    So the final result is a tensor with shape (d, b).

    How can we read this broadcasting?

    The tensor with shape (d, 1) has "d rows with 1 column", and the vector with shape (b) has "b columns";
    this is the correct way to read the shapes. That's why printing a tensor with shape (b) gives a
    vector laid out as a row, while printing a tensor with shape (d, 1) looks like a matrix with d rows
    and 1 column. We always read a tensor's shape starting from its last element, in the order
    "columns -> rows -> channels -> batches".

    So the first tensor, with shape (d, 1), must repeat the single element of each row to fill the
    b columns and match the (d, b) shape. Ex:
    [[0],       [[0, 0, 0],
     [1],  -->   [1, 1, 1],
     [2]]        [2, 2, 2]]

    And the second tensor, with shape (b), must repeat its row of values to fill the d rows and match
    the (d, b) shape. Ex:
    [3, 4, 5, 6]  --> [[3, 4, 5, 6],
                       [3, 4, 5, 6],
                       [3, 4, 5, 6]]

    Keep this trick in mind.
    Note how both "vectors" broadcast to the (d, b) matrix, but in different ways: one repeats its
    column and the other repeats its row.

    Now we can compute all the calculations between the d boxes and the b boxes, all versus all,
    without a for loop. That's beautiful and efficient.

    Also, notice that images have shape (channels, height, width), fully consistent with this way of
    reading the shapes.
    """
    # Shape (number of boxes)
    boxes_areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    # Shape (number of others)
    others_areas = (others[:, 2] - others[:, 0]) * (others[:, 3] - others[:, 1])

    # Shape (number of boxes, number of others)
    intersection_x1 = torch.max(boxes[:, 0].unsqueeze(dim=1), others[:, 0])
    intersection_y1 = torch.max(boxes[:, 1].unsqueeze(dim=1), others[:, 1])
    intersection_x2 = torch.min(boxes[:, 2].unsqueeze(dim=1), others[:, 2])
    intersection_y2 = torch.min(boxes[:, 3].unsqueeze(dim=1), others[:, 3])

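    # Clamp at zero so non-overlapping boxes get a width/height of 0 instead of a negative value.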
    intersection_width = torch.clamp(intersection_x2 - intersection_x1, min=0)
    intersection_height = torch.clamp(intersection_y2 - intersection_y1, min=0)
    intersection_area = intersection_width * intersection_height

    union_area = torch.unsqueeze(boxes_areas, dim=1) + others_areas - intersection_area
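    # Clamp the union with a tiny epsilon to avoid dividing by zero when the boxes are degenerate.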
    union_area = torch.clamp(union_area, min=1e-8)

    return intersection_area / union_area

Functions

def iou(boxes, others)

Calculates the Intersection over Union between the given boxes.

Each box must have the 4 values as x1, y1 (top left corner), x2, y2 (bottom right corner).

Arguments

boxes : torch.Tensor
First group of boxes. Shape: (number of boxes in group 1, 4)
others : torch.Tensor
Second group of boxes. Shape: (number of boxes in group 2, 4)

Returns

torch.Tensor: The IoU between all the boxes in group 1 versus group 2. Shape: (number of boxes in group 1, number of boxes in group 2)

How does this work?

Let's say there are 'd' boxes in group 1 and 'b' boxes in group 2.

The beautiful idea behind this function is to unsqueeze group 1 (add one dimension), changing its shape from (d) to (d, 1); combined with the boxes of shape (b), the operation then broadcasts to the final shape (d, b).

Broadcast:
PyTorch: https://pytorch.org/docs/stable/notes/broadcasting.html
Numpy: https://docs.scipy.org/doc/numpy-1.10.4/user/basics.broadcasting.html

The broadcast always goes from the last dimension to the first, so pairing (b) with (d, 1) we can view the tensors as:

(d, 1)
(   b)
------
(d, b)

So the final result is a tensor with shape (d, b).
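
A quick way to see this shape arithmetic in action (a standalone sketch; the variable names here are illustrative, not part of the module):

import torch

d, b = 3, 4
xs = torch.rand(d)                      # e.g. the x1 coordinates of group 1, shape (d)
other_xs = torch.rand(b)                # the x1 coordinates of group 2, shape (b)

column = xs.unsqueeze(dim=1)            # shape (d, 1)
pairwise = torch.max(column, other_xs)  # broadcasts to shape (d, b)
print(pairwise.shape)                   # torch.Size([3, 4])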

How can we read this broadcasting?

The tensor with shape (d, 1) has "d rows with 1 column", and the vector with shape (b) has "b columns"; this is the correct way to read the shapes. That's why printing a tensor with shape (b) gives a vector laid out as a row, while printing a tensor with shape (d, 1) looks like a matrix with d rows and 1 column. We always read a tensor's shape starting from its last element, in the order "columns -> rows -> channels -> batches".

So the first tensor, with shape (d, 1), must repeat the single element of each row to fill the b columns and match the (d, b) shape. Ex:

[[0],       [[0, 0, 0],
 [1],  -->   [1, 1, 1],
 [2]]        [2, 2, 2]]

And the second tensor, with shape (b), must repeat its row of values to fill the d rows and match the (d, b) shape. Ex:

[3, 4, 5, 6]  --> [[3, 4, 5, 6],
                   [3, 4, 5, 6],
                   [3, 4, 5, 6]]

Keep this trick in mind. Note how both "vectors" broadcast to the (d, b) matrix, but in different ways: one repeats its column and the other repeats its row.
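
The trick can be checked numerically; this standalone sketch broadcasts the two example tensors above against each other:

import torch

a = torch.tensor([[0], [1], [2]])  # shape (3, 1): each row repeats its single column
c = torch.tensor([3, 4, 5, 6])     # shape (4): its row repeats for every row of `a`
print(a + c)
# tensor([[3, 4, 5, 6],
#         [4, 5, 6, 7],
#         [5, 6, 7, 8]])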

Now we can compute all the calculations between the d boxes and the b boxes, all versus all, without a for loop. That's beautiful and efficient.
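
To convince yourself of the equivalence, here is a sketch that compares the vectorized result with an explicit double loop (it assumes the module is importable under its documented path):

import torch
from torchsight.metrics.iou import iou

boxes = torch.rand(5, 4) * 10
boxes[:, 2:] += boxes[:, :2]    # make sure x2 > x1 and y2 > y1
others = torch.rand(7, 4) * 10
others[:, 2:] += others[:, :2]

vectorized = iou(boxes, others)  # shape (5, 7), no Python loop

looped = torch.empty(5, 7)
for i in range(5):
    for j in range(7):
        x1 = torch.max(boxes[i, 0], others[j, 0])
        y1 = torch.max(boxes[i, 1], others[j, 1])
        x2 = torch.min(boxes[i, 2], others[j, 2])
        y2 = torch.min(boxes[i, 3], others[j, 3])
        inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_j = (others[j, 2] - others[j, 0]) * (others[j, 3] - others[j, 1])
        looped[i, j] = inter / (area_i + area_j - inter)

assert torch.allclose(vectorized, looped)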

Also, notice that images have shape (channels, height, width), fully consistent with this way of reading the shapes.
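
A minimal usage example (the box coordinates are made up; the first pair overlaps in a 5 x 5 region, so its IoU is 25 / 175 ≈ 0.1429):

import torch
from torchsight.metrics.iou import iou

boxes = torch.tensor([[0., 0., 10., 10.]])     # one box in group 1
others = torch.tensor([[5., 5., 15., 15.],     # overlaps the box above
                       [20., 20., 30., 30.]])  # does not overlap it
print(iou(boxes, others))
# tensor([[0.1429, 0.0000]])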
