torchsight.models.resnet module

Module that contains the ResNet implementation.

ResNet comes in several depths. The paper describes 5 different models and the model_zoo provides pretrained weights for each one:

  • ResNet 18
  • ResNet 34
  • ResNet 50
  • ResNet 101
  • ResNet 152

Each of these architectures is built from "blocks" that help to reduce the complexity of the network. There are two types of blocks:

  • Basic: applies only two 3x3 convolutions. Used in the 18 and 34 architectures.
  • Bottleneck: applies a 1x1 convolution to reduce the number of channels of the feature map to 1/4 (e.g. a feature map with 512 channels is reduced to 128 channels), then applies a 3x3 convolution over these reduced channels, and finally increases the channel dimension back to the original size with another 1x1 convolution. This reduces the number of weights to learn and the complexity of the network (a numeric sketch follows this list).
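
As a rough numeric sketch of why the bottleneck saves weights (illustrative arithmetic only, not a figure from the paper; exact counts depend on the layer), compare the convolution weights of both block types over a 256-channel feature map:

in_channels = 256
basic = 2 * (in_channels * in_channels * 3 * 3)      # two 3x3 convs: 1,179,648 weights
reduced = in_channels // 4                           # the bottleneck works at 64 channels
bottleneck = (in_channels * reduced                  # 1x1 reduction
              + reduced * reduced * 3 * 3            # 3x3 at the reduced width
              + reduced * in_channels)               # 1x1 expansion
print(basic, bottleneck)                             # 1179648 69632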

Each convolution is followed by batch normalization, and each block ends with a "residual connection" that sums the input of the block to its output. This makes identity mappings easy to learn: if F(x) is the output of the block's convolutions, the final output is F(x) + x, so if the weights go to zero the block simply outputs x. The authors' hypothesis was that it is easier to push the residual F(x) towards zero than to learn an identity mapping from scratch.
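
A quick sanity check of the residual idea, assuming this module is importable as torchsight.models.resnet: if every weight of a block is forced to zero, the block reduces to the identity (for a non-negative input, since the block ends with a ReLU):

import torch
from torch import nn

from torchsight.models.resnet import BasicBlock

block = BasicBlock(64, 64)                  # no downsample: input and output shapes match
for parameter in block.parameters():
    nn.init.zeros_(parameter)               # force F(x) = 0
x = torch.relu(torch.randn(1, 64, 8, 8))    # non-negative input, like a post-ReLU feature map
assert torch.allclose(block(x), x)          # output is relu(0 + x) = x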

Finally, an architecture is composed of several "layers", each containing several "blocks". All architectures have 5 layers; the difference is the kind of block and how many of them are used in each layer. The first layer is just a single 7x7 convolution with stride 2 (followed by batch normalization, ReLU and max pooling), so only 4 layers contain blocks.

For the architectures in detail, see Table 1 in the original paper.

Original paper: https://arxiv.org/pdf/1512.03385.pdf

Heavily inspired by the original code at: https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py
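
A minimal usage sketch, assuming this module is importable as torchsight.models.resnet (the num_classes value below is arbitrary):

import torch

from torchsight.models.resnet import resnet50

# As a classifier: pass num_classes to append the fully connected head.
classifier = resnet50(num_classes=10)
scores = classifier(torch.randn(1, 3, 224, 224))   # shape (1, 10)

# As a feature extractor: without num_classes, the forward pass returns the
# outputs of layers 4, 3 and 2, deepest first.
extractor = resnet50(pretrained=True)
out4, out3, out2 = extractor(torch.randn(1, 3, 224, 224))
print(out4.shape)   # torch.Size([1, 2048, 7, 7])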

Source code
"""Module that contains ResNet implementation.

ResNet comes in several depths. The paper describes 5 different models
and the model_zoo provides pretrained weights for each one:

- ResNet 18
- ResNet 34
- ResNet 50
- ResNet 101
- ResNet 152

Each of these architectures is built from "blocks" that help to reduce the
complexity of the network. There are two types of blocks:

- Basic: applies only two 3x3 convolutions. Used in the 18 and 34 architectures.
- Bottleneck: applies a 1x1 convolution to reduce the number of channels of the
    feature map to 1/4 (e.g. a feature map with 512 channels is reduced to 128
    channels), then applies a 3x3 convolution over these reduced channels, and
    finally increases the channel dimension back to the original size with
    another 1x1 convolution. This reduces the number of weights to learn and
    the complexity of the network.

Each convolution is followed by batch normalization, and each block ends with a
"residual connection" that sums the input of the block to its output. This makes
identity mappings easy to learn: if F(x) is the output of the block's convolutions,
the final output is F(x) + x, so if the weights go to zero the block simply
outputs x. The authors' hypothesis was that it is easier to push the residual
F(x) towards zero than to learn an identity mapping from scratch.

Finally, an architecture is composed of several "layers", each containing several
"blocks". All architectures have 5 layers; the difference is the kind of block and
how many of them are used in each layer. The first layer is just a single 7x7
convolution with stride 2 (followed by batch normalization, ReLU and max pooling),
so only 4 layers contain blocks.

For the architectures in detail, see Table 1 in the original paper.

Original paper:
https://arxiv.org/pdf/1512.03385.pdf

Heavily inspired by the original code at:
https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py
"""
from torch import nn
from torch.utils import model_zoo


__all__ = ['ResNet', 'resnet18', 'resnet34', 'resnet50', 'resnet101', 'resnet152']


MODEL_URLS = {
    'resnet18': 'https://download.pytorch.org/models/resnet18-5c106cde.pth',
    'resnet34': 'https://download.pytorch.org/models/resnet34-333f7ec4.pth',
    'resnet50': 'https://download.pytorch.org/models/resnet50-19c8e357.pth',
    'resnet101': 'https://download.pytorch.org/models/resnet101-5d3b4d8f.pth',
    'resnet152': 'https://download.pytorch.org/models/resnet152-b121ed2d.pth',
}


class BasicBlock(nn.Module):
    """Basic block for ResNet.

    It applies two 3x3 convolutions to the input, each followed by batch
    normalization.

    You can provide a downsample module to adapt the input before summing it to
    the output (residual connection); if not provided, the block assumes that
    the input has the same shape as the output.

    This block has an expansion of 1, i.e. the block outputs exactly 'channels'
    channels.
    """

    expansion = 1

    def __init__(self, in_channels, channels, stride=1, downsample=None):
        """Initialize the block and set all the modules needed.

        Args:
            in_channels (int): Number of channels of the input feature map.
            channels (int): Number of channels that the block must have.
                Also, this is the number of channels to output.
            stride (int): The stride of the first convolution.
            downsample (torch.nn.Module, optional): Module that adapts the input
                to match the output's shape for the residual sum.
        """
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, channels, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        """Forward pass of the block.

        Args:
            x (torch.Tensor): Any tensor with shape (batch size, in_channels, height, width).

        Returns:
            torch.Tensor: The output of the block with shape
                (batch size, channels, height / stride, width / stride)
        """
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)

        return out


class Bottleneck(nn.Module):
    """Bottleneck block for ResNet.

    Applies a 1x1 convolution to reduce the number of channels from in_channels
    to channels, then a 3x3 convolution, and finally expands the channels to
    4 * channels (i.e. expansion = 4) with another 1x1 convolution.

    You can provide a downsample module to adapt the input before summing it to
    the output (residual connection); if not provided, the block assumes that
    the input has the same shape as the output.
    """
    expansion = 4

    def __init__(self, in_channels, channels, stride=1, downsample=None):
        """Initialize the block and set all the modules needed.

        Args:
            in_channels (int): Number of channels of the input feature map.
            channels (int): Number of channels used by the 1x1 and 3x3 convolutions.
                The block outputs channels * expansion channels.
            stride (int): The stride of the 3x3 convolution.
            downsample (torch.nn.Module, optional): Module that adapts the input
                to match the output's shape for the residual sum.
        """
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, channels, kernel_size=1, stride=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)

        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

        self.conv3 = nn.Conv2d(channels, channels * self.expansion, kernel_size=1, stride=1, bias=False)
        self.bn3 = nn.BatchNorm2d(channels * self.expansion)

        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        """Forward pass of the block.

        Args:
            x (torch.Tensor): Any tensor with shape (batch size, in_channels, height, width).

        Returns:
            torch.Tensor: The output of the block with shape
                (batch size, channels * expansion, height / stride, width / stride)
        """
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)

        return out


class ResNet(nn.Module):
    """ResNet architecture.

    Implements a ResNet given the type of block and the number of blocks per layer.

    If you provide a number of classes it can be used as a classifier; if not,
    it returns the output of each layer, starting from the deepest.

    Keep in mind that the total stride at the output of the first block layer is
    2 ** 2 = 4 (the 7x7 convolution and the max pooling each have stride 2), and
    the following layers have total strides 2 ** 3 = 8, 2 ** 4 = 16 and 2 ** 5 = 32.

    So, if you provide an image with shape (3, 800, 800), the output of the last
    layer will have shape (512 * block.expansion, 25, 25).
    """

    def __init__(self, block, layers, num_classes=None):
        """Initialize the network.

        Args:
            block (torch.nn.Module): The block class to use in the network. Must be
                BasicBlock or Bottleneck.
            layers (seq): Sequence with the number of blocks per layer.
                It must have length 4.
            num_classes (int, optional): If given, initialize the architecture as a
                classifier and append a fully connected layer that maps from the
                final feature map to the class scores. If not given, the module
                returns the output of each layer.
        """
        super(ResNet, self).__init__()
        # Set the expansion of the net
        self.expansion = block.expansion
        # The depths of each layer
        depths = [64, 128, 256, 512]
        # in_channels help us to keep track of the number of channels before each block
        self.in_channels = depths[0]

        # Layer 1
        self.conv1 = nn.Conv2d(3, self.in_channels, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(self.in_channels)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        # The first layer does not apply stride because we use maxPool with stride 2
        self.layer1 = self._make_layer(block, depths[0], layers[0])
        self.layer2 = self._make_layer(block, depths[1], layers[1], stride=2)
        self.layer3 = self._make_layer(block, depths[2], layers[2], stride=2)
        self.layer4 = self._make_layer(block, depths[3], layers[3], stride=2)

        self.classifier = False
        if num_classes is not None and num_classes > 0:
            # Set the classifier
            self.classifier = True
            self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
            self.fully = nn.Linear(512 * block.expansion, num_classes)
        else:
            # Set the output's number of channels for each layer, useful to get the output depth outside this module
            # when using it as feature extractor
            self.output_channels = [depth * self.expansion for depth in depths[-3:]]
            self.output_channels.reverse()

        # Initialize network
        for module in self.modules():
            if isinstance(module, nn.Conv2d):
                nn.init.kaiming_normal_(module.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(module, nn.BatchNorm2d):
                nn.init.constant_(module.weight, 1)
                nn.init.constant_(module.bias, 0)

    def _make_layer(self, block, channels, blocks, stride=1):
        """Creates a layer for the ResNet architecture.

        It uses the given 'block' as the base of the layer and repeats it 'blocks'
        times. Each block expands the number of channels by a factor of
        block.expansion, so the 'in_channels' of every block after the first is
        block.expansion times the 'channels' amount.

        This method modifies the in_channels attribute of the object to keep track of the
        number of channels before each block.

        Args:
            block (Module): Block class to use as the base block of the layer.
            channels (int): The number of channels that the block must have.
            blocks (int): How many blocks the layer must have.
            stride (int): Stride that the first block must apply. No other block
                applies a stride.
        """
        downsample = None

        if stride != 1 or self.in_channels != channels * block.expansion:
            # Apply a module to reduce the width and height of the input feature map with the given stride
            # or to adjust the number of channels that the first block will receive
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channels, channels * block.expansion, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(channels * block.expansion),
            )

        layers = []
        # Only the first block applies a stride to the input
        layers.append(block(self.in_channels, channels, stride, downsample))
        # Now the in_channels are the output of the block that is the channels times block.expansion
        self.in_channels = channels * block.expansion

        for _ in range(1, blocks):
            layers.append(block(self.in_channels, channels))

        return nn.Sequential(*layers)

    def forward(self, x):
        """Forward pass of the module.

        Passes the input tensor through the layers. The output differs depending on
        whether the module is used as a classifier or not.

        Args:
            x (torch.Tensor): A tensor with shape (batch size, 3, height, width).

        Returns:
            torch.Tensor: If the module is a classifier, returns a tensor with shape
                (batch size, num_classes). If the module is a feature extractor (no
                num_classes given), returns a tuple with the outputs of the last 3 layers.
                The shapes are:
                    - layer 4: (batch size, 512 * block.expansion, height / 32, width / 32)
                    - layer 3: (batch size, 256 * block.expansion, height / 16, width / 16)
                    - layer 2: (batch size, 128 * block.expansion, height / 8,  width / 8)
        """
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        x = self.layer1(x)

        if self.classifier:
            x = self.layer2(x)
            x = self.layer3(x)
            x = self.layer4(x)
            x = self.avgpool(x)
            x = x.view(x.size(0), -1)
            x = self.fully(x)
            return x

        output2 = self.layer2(x)
        output3 = self.layer3(output2)
        output4 = self.layer4(output3)

        return output4, output3, output2


def resnet18(pretrained=False, **kwargs):
    """Constructs a ResNet-18 model.
    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = ResNet(BasicBlock, [2, 2, 2, 2], **kwargs)
    if pretrained:
        model.load_state_dict(model_zoo.load_url(MODEL_URLS['resnet18']), strict=False)
    return model


def resnet34(pretrained=False, **kwargs):
    """Constructs a ResNet-34 model.
    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = ResNet(BasicBlock, [3, 4, 6, 3], **kwargs)
    if pretrained:
        model.load_state_dict(model_zoo.load_url(MODEL_URLS['resnet34']), strict=False)
    return model


def resnet50(pretrained=False, **kwargs):
    """Constructs a ResNet-50 model.
    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = ResNet(Bottleneck, [3, 4, 6, 3], **kwargs)
    if pretrained:
        model.load_state_dict(model_zoo.load_url(MODEL_URLS['resnet50']), strict=False)
    return model


def resnet101(pretrained=False, **kwargs):
    """Constructs a ResNet-101 model.
    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = ResNet(Bottleneck, [3, 4, 23, 3], **kwargs)
    if pretrained:
        model.load_state_dict(model_zoo.load_url(MODEL_URLS['resnet101']), strict=False)
    return model


def resnet152(pretrained=False, **kwargs):
    """Constructs a ResNet-152 model.
    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = ResNet(Bottleneck, [3, 8, 36, 3], **kwargs)
    if pretrained:
        model.load_state_dict(model_zoo.load_url(MODEL_URLS['resnet152']), strict=False)
    return model
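
A caveat on the pretrained flag: the constructors load the torchvision checkpoints with strict=False, so checkpoint keys without a matching parameter are silently skipped. In particular, the torchvision checkpoints store the classification head as 'fc' while this class names it 'fully' (and omits it entirely in feature-extractor mode), so the head always keeps its random initialization:

from torchsight.models.resnet import resnet18

# The backbone convolutions and batch norms receive the ImageNet weights; the
# checkpoint's 'fc.weight' / 'fc.bias' entries have no counterpart here and are
# dropped by strict=False, so train or fine-tune the head before relying on it.
model = resnet18(pretrained=True, num_classes=10)  # the num_classes value is arbitrary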

Functions

def resnet101(pretrained=False, **kwargs)

Constructs a ResNet-101 model.

Args

pretrained : bool
If True, returns a model pre-trained on ImageNet
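
A short usage sketch, assuming the module path shown in the title:

from torchsight.models.resnet import resnet101

model = resnet101(pretrained=True).eval()  # weights are downloaded and cached by model_zoo on first use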
def resnet152(pretrained=False, **kwargs)

Constructs a ResNet-152 model.

Args

pretrained : bool
If True, returns a model pre-trained on ImageNet
def resnet18(pretrained=False, **kwargs)

Constructs a ResNet-18 model.

Args

pretrained : bool
If True, returns a model pre-trained on ImageNet
def resnet34(pretrained=False, **kwargs)

Constructs a ResNet-34 model.

Args

pretrained : bool
If True, returns a model pre-trained on ImageNet
def resnet50(pretrained=False, **kwargs)

Constructs a ResNet-50 model.

Args

pretrained : bool
If True, returns a model pre-trained on ImageNet

Classes

class ResNet (ancestors: torch.nn.modules.module.Module)

ResNet architecture.

Implements a ResNet given the type of block and the depth of the layers.

If you provide a number of classes it can be used as a classifier; if not, it returns the output of each layer, starting from the deepest.

Keep in mind that the total stride at the output of the first block layer is 2 ** 2 = 4 (the 7x7 convolution and the max pooling each have stride 2), and the following layers have total strides 2 ** 3 = 8, 2 ** 4 = 16 and 2 ** 5 = 32.

So, if you provide an image with shape (3, 800, 800), the output of the last layer will have shape (512 * block.expansion, 25, 25).
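
A quick shape check of that claim, assuming the module is importable as torchsight.models.resnet:

import torch

from torchsight.models.resnet import Bottleneck, ResNet

net = ResNet(Bottleneck, [3, 4, 6, 3])               # a ResNet-50 feature extractor
out4, out3, out2 = net(torch.randn(1, 3, 800, 800))
print(out4.shape)   # torch.Size([1, 2048, 25, 25]) -> (512 * 4, 800 / 32, 800 / 32)
print(out3.shape)   # torch.Size([1, 1024, 50, 50])
print(out2.shape)   # torch.Size([1, 512, 100, 100])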


Methods

def __init__(self, block, layers, num_classes=None)

Initialize the network.

Args

block : torch.nn.Module
Indicates the block to use in the network. Must be a BasicBlock or a Bottleneck.
layers : seq
Sequence to indicate the number of blocks per each layer. It must have length 4.
num_classes : int, optional
If given, initializes the architecture as a classifier and appends a fully connected layer that maps from the final feature map to the class scores. If not given, the module returns the output of each layer.
def forward(self, x)

Forward pass of the module.

Passes the input tensor through the layers. The output differs depending on whether the module is used as a classifier or not.

Args

x : torch.Tensor
A tensor with shape (batch size, 3, height, width).

Returns

torch.Tensor: If the module is a classifier, returns a tensor with shape (batch size, num_classes). If the module is a feature extractor (no num_classes given), returns a tuple with the outputs of the last 3 layers. The shapes are:

  • layer 4: (batch size, 512 * block.expansion, height / 32, width / 32)
  • layer 3: (batch size, 256 * block.expansion, height / 16, width / 16)
  • layer 2: (batch size, 128 * block.expansion, height / 8, width / 8)
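
In feature-extractor mode the output_channels attribute lists the channel counts in the same order as the returned tuple, which is handy when wiring these outputs into another module. A small sketch, assuming the module is importable as torchsight.models.resnet:

import torch

from torchsight.models.resnet import resnet18

net = resnet18()                 # feature-extractor mode: no num_classes given
print(net.output_channels)       # [512, 256, 128] -> channels of layers 4, 3 and 2
features = net(torch.randn(2, 3, 64, 64))
for channels, feature in zip(net.output_channels, features):
    assert feature.size(1) == channels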
