Attention Mechanisms in Computer Vision (CV)
date: 2025-01-17
tags:
- python
- cv
- Attention
This post collects code for attention mechanisms on images, covering three categories:
1. Channel domain
Channel domain: each channel of the feature map is assigned a weight representing how relevant that channel is to the key information; the larger the weight, the higher the relevance. A mask is generated over the channels and used for scoring. Representatives: SENet and the Channel Attention Module.
2. Spatial domain
Spatial domain: a spatial transformation is applied to the spatial information of the image so that the key information can be extracted. A mask is generated over spatial positions and used for scoring. Representative: the Spatial Attention Module.
3. Mixed domain
Mixed domain: spatial attention ignores the information in the channel domain and treats the features in every channel equally, which confines spatial-transform methods to the raw feature-extraction stage and weakens their interpretability in deeper layers. Channel attention, conversely, applies global average pooling within each channel and discards the local information inside it, which is also rather crude. Combining the two ideas yields mixed-domain attention, which scores channel attention and spatial attention jointly. Representatives: BAM and CBAM.
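All three families share one underlying pattern: compute a score from the feature map, squash it to (0, 1) with a sigmoid, and multiply it back onto the input. Below is a minimal pure-Python sketch of channel gating on a toy nested-list "feature map"; the names and shapes are illustrative only, not from any of the papers:

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def channel_gate(feature_map):
    """feature_map: list of channels, each a 2D list (H x W).
    Returns the input with each channel scaled by a (0,1) weight
    derived from its global average -- the generic attention pattern."""
    gated = []
    for channel in feature_map:
        flat = [v for row in channel for v in row]
        score = sum(flat) / len(flat)          # "squeeze": global average pool
        weight = sigmoid(score)                # "excitation": squash to (0, 1)
        gated.append([[v * weight for v in row] for row in channel])
    return gated

x = [[[1.0, 2.0], [3.0, 4.0]],   # channel 0: mean 2.5 -> weight sigmoid(2.5)
     [[0.0, 0.0], [0.0, 0.0]]]   # channel 1: mean 0.0 -> weight 0.5
y = channel_gate(x)
```

The real modules below differ only in what they score (channels, positions, or both) and in how they compute the score.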
Reference blogs (all on CSDN):
- 图像中的注意力机制详解(SEBlock | ECABlock | CBAM)
- 视觉注意力机制——SENet、CBAM、SKNet
- SKnet:Selective Kernel Networks学习笔记+Pytorch代码实现
- 【深度学习注意力机制系列】—— SKNet注意力机制(附pytorch实现)
First, import the required packages:
import math
from functools import reduce

import torch
import torch.nn as nn
1 Channel Domain
1.1 SENet
1.1.1 Introduction
1. SENet (Squeeze-and-Excitation Networks) is a channel attention mechanism, published at CVPR 2018
2. Paper: https://arxiv.org/abs/1709.01507
3. Code: https://github.com/hujie-frank/SENet
1.1.2 Code
1. Model implementation:
class SEBlock(nn.Module):
    def __init__(self, mode, channels, ratio):
        super(SEBlock, self).__init__()
        self.avg_pooling = nn.AdaptiveAvgPool2d(1)
        self.max_pooling = nn.AdaptiveMaxPool2d(1)
        if mode == "max":
            self.global_pooling = self.max_pooling
        elif mode == "avg":
            self.global_pooling = self.avg_pooling
        self.fc_layers = nn.Sequential(
            nn.Linear(in_features=channels, out_features=channels // ratio, bias=False),
            nn.ReLU(),
            nn.Linear(in_features=channels // ratio, out_features=channels, bias=False),
        )
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        b, c, _, _ = x.shape
        v = self.global_pooling(x).view(b, c)   # squeeze: [b, c, 1, 1] -> [b, c]
        v = self.fc_layers(v).view(b, c, 1, 1)  # excitation: bottleneck MLP
        v = self.sigmoid(v)
        return x * v
2. Testing the input/output:
if __name__ == '__main__':
    # create an input tensor of shape [2, 64, 16, 16]
    x = torch.randn([2, 64, 16, 16])  # batch size 2, 64 channels, spatial size 16x16
    print('x', x.shape)  # [2, 64, 16, 16]
    seNet = SEBlock("max", 64, 9)
    print("seNet parameters:", sum(p.numel() for p in seNet.parameters()))  # 896
    # print(seNet)
    outputs = seNet(x)
    # print the shape of the output tensor
    print("outputs", outputs.shape)  # torch.Size([2, 64, 16, 16])
"""
Output:
x torch.Size([2, 64, 16, 16])
seNet parameters: 896
outputs torch.Size([2, 64, 16, 16])
"""
Note: the input and output shapes are identical, so the block can be inserted into an existing model as a drop-in module.
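The 896 parameters printed above can be checked by hand: with channels=64 and ratio=9, the bottleneck width is 64 // 9 = 7, and the two bias-free linear layers contribute 64*7 and 7*64 weights:

```python
channels, ratio = 64, 9
hidden = channels // ratio                       # 64 // 9 = 7
params = channels * hidden + hidden * channels   # two bias-free Linear layers
print(params)  # 896
```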
1.2 ECANet
1.2.1 Introduction
1. ECANet builds on SENet: to avoid unnecessarily increasing model complexity, it replaces SENet's MLP with a 1D convolution
2. Paper: ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks
3. Link: https://arxiv.org/pdf/1910.03151.pdf
4. Code: https://github.com/BangguWu/ECANet
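ECA's only hyperparameter is the kernel size of the 1D convolution, chosen adaptively from the channel count and forced to be odd. A quick standalone check of what the formula used in the code below produces for common channel counts:

```python
import math

def eca_kernel_size(channels, gamma=2, b=1):
    # same formula as in ECABlock: adapt the kernel size to the channel count
    k = int(abs((math.log(channels, 2) + b) / gamma))
    return k if k % 2 else k + 1  # force an odd kernel size

for c in (64, 128, 256, 512):
    print(c, eca_kernel_size(c))  # 64 -> 3, 128 -> 5, 256 -> 5, 512 -> 5
```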
1.2.2 Code
1. Model implementation:
class ECABlock(nn.Module):
    def __init__(self, channels, gamma=2, b=1):
        super(ECABlock, self).__init__()
        # adaptive kernel size: k = |log2(channels) + b| / gamma, forced odd
        kernel_size = int(abs((math.log(channels, 2) + b) / gamma))
        kernel_size = kernel_size if kernel_size % 2 else kernel_size + 1
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=kernel_size, padding=(kernel_size - 1) // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        print("==============================ECABlock==============================")
        print("x", x.shape)
        v = self.avg_pool(x)
        print("v", v.shape)
        # [b, c, 1, 1] -> [b, 1, c]: 1D conv across the channel dimension, then back
        v = self.conv(v.squeeze(-1).transpose(-1, -2)).transpose(-1, -2).unsqueeze(-1)
        print("v", v.shape)
        v = self.sigmoid(v)
        print("v", v.shape)
        return x * v
2. Testing the input/output:
if __name__ == '__main__':
    # create an input tensor of shape [2, 64, 16, 16]
    x = torch.randn([2, 64, 16, 16])  # batch size 2, 64 channels, spatial size 16x16
    print('x', x.shape)  # [2, 64, 16, 16]
    ecaNet = ECABlock(64, gamma=2, b=1)
    print("ecaNet parameters:", sum(p.numel() for p in ecaNet.parameters()))  # 3
    # print(ecaNet)
    output = ecaNet(x)
    print("output", output.shape)  # torch.Size([2, 64, 16, 16])
"""
Output:
x torch.Size([2, 64, 16, 16])
ecaNet parameters: 3
==============================ECABlock==============================
x torch.Size([2, 64, 16, 16])
v torch.Size([2, 64, 1, 1])
v torch.Size([2, 64, 1, 1])
v torch.Size([2, 64, 1, 1])
output torch.Size([2, 64, 16, 16])
"""
Note: input and output shapes match, so the module can be dropped into an existing model. Note also that ECANet has far fewer parameters than SENet (3 vs. 896 here).
1.3 SKNet
1.3.1 Introduction
SKNet (Selective Kernel Network) is a deep neural network architecture for image classification and object detection. Its core innovation is a selective multi-scale convolution kernel (Selective Kernel) together with a novel attention mechanism, which improves feature extraction without increasing network complexity. SKNet is designed to fuse multi-scale information, letting the network adapt to features at different scales.
Paper: https://arxiv.org/abs/1903.06586
Code: https://github.com/implus/SKNet
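SKNet's "select" step can be illustrated without any framework: for each channel, a softmax across the M branches produces weights that sum to 1, and the output is the weighted sum of the branch features. A toy sketch with M = 2 branches and scalar per-channel features (the numbers are purely illustrative):

```python
import math

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# per-channel attention scores for M = 2 branches (e.g. 3x3 vs dilated 3x3)
a_b = [(2.0, 0.0), (0.0, 0.0)]               # one pair of scores per channel
branch_feats = [(10.0, -10.0), (4.0, 8.0)]   # per-channel branch features U1, U2

selected = []
for scores, feats in zip(a_b, branch_feats):
    weights = softmax(scores)                # weights sum to 1 per channel
    selected.append(sum(w * f for w, f in zip(weights, feats)))
```

Channel 0's scores favor branch 1, so its output leans toward 10.0; channel 1's equal scores give an even 0.5/0.5 blend (6.0). The real SKConv below does exactly this, but with full H x W feature maps per branch.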
1.3.2 Code
1. Model code:
# SKConv
class SKConv(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1, M=2, r=16, L=32):
        '''
        :param in_channels: input channel dimension
        :param out_channels: output channel dimension (equal to in_channels in the paper)
        :param stride: stride, default 1
        :param M: number of branches
        :param r: reduction ratio used to compute the length d of feature Z
                  (the paper reduces S -> Z, so a lower bound on d must be set)
        :param L: lower bound on the length of Z specified in the paper, default 32
        Grouped convolution is used with groups = 32, so in_channels must be a multiple of 32.
        '''
        super(SKConv, self).__init__()
        d = max(in_channels // r, L)  # length d of vector Z after reducing vector S
        self.M = M
        self.out_channels = out_channels
        self.conv = nn.ModuleList()  # one convolution per branch, each with a different kernel
        for i in range(M):
            # For efficiency the paper replaces the 5x5 kernel with a dilated 3x3
            # (dilation=2), and suggests grouped convolution with G = 32.
            self.conv.append(nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 3, stride, padding=1 + i, dilation=1 + i, groups=32, bias=False),
                nn.BatchNorm2d(out_channels),
                nn.ReLU(inplace=True)))
        self.global_pool = nn.AdaptiveAvgPool2d(output_size=1)  # global average pooling (GAP)
        self.fc1 = nn.Sequential(nn.Conv2d(out_channels, d, 1, bias=False),
                                 nn.BatchNorm2d(d),
                                 nn.ReLU(inplace=True))  # reduce: S -> Z
        self.fc2 = nn.Conv2d(d, out_channels * M, 1, 1, bias=False)  # expand: Z -> M branch scores
        self.softmax = nn.Softmax(dim=1)  # softmax over dim=1 so branch weights sum to 1 at each position

    def forward(self, input):
        batch_size = input.size(0)
        output = []
        # the split part
        for i, conv in enumerate(self.conv):
            output.append(conv(input))  # [batch_size, out_channels, H, W]
        # the fusion part
        U = reduce(lambda x, y: x + y, output)  # element-wise sum of branches: [batch_size, channel, H, W]
        s = self.global_pool(U)  # [batch_size, channel, 1, 1]
        z = self.fc1(s)  # S -> Z reduction: [batch_size, d, 1, 1]
        # Z -> a, b expansion; the paper uses a 1x1 conv as a fully connected layer.
        # The first half of the channels holds a, the second half b: [batch_size, out_channels * M, 1, 1]
        a_b = self.fc2(z)
        a_b = a_b.reshape(batch_size, self.M, self.out_channels, -1)  # [batch_size, M, out_channels, 1]
        a_b = self.softmax(a_b)  # softmax across the M branches: [batch_size, M, out_channels, 1]
        # the selection part
        # chunk splits the tensor along dim=1 into M blocks of [batch_size, 1, out_channels, 1]
        a_b = list(a_b.chunk(self.M, dim=1))
        # reshape each block to [batch_size, out_channels, 1, 1]
        a_b = list(map(lambda x: x.reshape(batch_size, self.out_channels, 1, 1), a_b))
        # weight each branch output: [b, c, H, W] * [b, c, 1, 1] = [b, c, H, W]
        V = list(map(lambda x, y: x * y, output, a_b))
        V = reduce(lambda x, y: x + y, V)  # element-wise sum of the weighted branches
        return V  # [batch_size, out_channels, H, W]
2. Testing the input/output:
if __name__ == '__main__':
    # create an input tensor of shape [2, 64, 16, 16]
    x = torch.randn([2, 64, 16, 16])  # batch size 2, 64 channels, spatial size 16x16
    print('x', x.shape)  # [2, 64, 16, 16]
    sknet = SKConv(64, 64)
    print("sknet parameters:", sum(p.numel() for p in sknet.parameters()))
    print("sknet(feature_maps)", sknet(x).shape)
"""
Output:
x torch.Size([2, 64, 16, 16])
sknet parameters: 8768
sknet(feature_maps) torch.Size([2, 64, 16, 16])
"""
Note: input and output shapes match, so the module can be dropped into an existing model.
Other related modules not covered here: Triplet Attention and GSoP-Net (Global Second-order Pooling Convolutional Networks).
2 Mixed Domain
2.1 CBAM
2.1.1 Introduction
1. Paper: CBAM: Convolutional Block Attention Module
2. Link: https://arxiv.org/pdf/1807.06521v2.pdf
3. Code: https://github.com/luuuyi/CBAM.PyTorch (re-implementation)
The Convolutional Block Attention Module (CBAM) is a simple yet effective attention module for feed-forward convolutional neural networks. It combines channel attention with spatial attention, making it a mixed attention module.
2.1.2 Code
1. Model implementation
(1) Channel attention
1. Fully connected version
# Channel attention -- fully connected version
class Channel_Attention_Module_FC(nn.Module):
    def __init__(self, channels, ratio=4):
        super(Channel_Attention_Module_FC, self).__init__()
        self.avg_pooling = nn.AdaptiveAvgPool2d(1)
        self.max_pooling = nn.AdaptiveMaxPool2d(1)
        self.fc_layers = nn.Sequential(
            nn.Linear(in_features=channels, out_features=channels // ratio, bias=False),
            nn.ReLU(),
            nn.Linear(in_features=channels // ratio, out_features=channels, bias=False)
        )
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        b, c, h, w = x.shape
        avg_x = self.avg_pooling(x).view(b, c)
        max_x = self.max_pooling(x).view(b, c)
        v = self.fc_layers(avg_x) + self.fc_layers(max_x)  # shared MLP over both pooled vectors
        v = self.sigmoid(v).view(b, c, 1, 1)
        return x * v
# Channel attention -- FC version implemented with 1x1 convolutions
class Channel_attention_Module_FC_Use_Conv(nn.Module):
    def __init__(self, in_channels, reduction=4):  # in_channels: input channels; reduction: degree of channel reduction
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)  # global average pooling: one value per channel
        self.max_pool = nn.AdaptiveMaxPool2d(1)  # global max pooling: likewise one value per channel
        self.fc = nn.Sequential(
            nn.Conv2d(in_channels, in_channels // reduction, 1, bias=False),  # first "FC" layer: reduce
            nn.ReLU(inplace=True),  # activation
            nn.Conv2d(in_channels // reduction, in_channels, 1, bias=False)  # second "FC" layer: expand
        )
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avg_out = self.fc(self.avg_pool(x))  # shared MLP on the average-pooled features
        max_out = self.fc(self.max_pool(x))  # shared MLP on the max-pooled features
        out = avg_out + max_out
        return self.sigmoid(out)  # note: returns the attention mask, not x * mask
Channel_Attention_Module_FC and Channel_attention_Module_FC_Use_Conv have the same number of parameters.
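That the two versions match is no coincidence: a 1x1 convolution applied to a 1x1 feature map is exactly a fully connected layer, so with channels = 64 and reduction = ratio = 4 both reduce to the same count of bias-free weights:

```python
channels, reduction = 64, 4
hidden = channels // reduction  # 16
# Linear(c, h) and Conv2d(c, h, 1) on a 1x1 map both hold c * h weights
fc_params = channels * hidden + hidden * channels
conv_params = (channels * hidden * 1 * 1) + (hidden * channels * 1 * 1)
print(fc_params, conv_params)  # 2048 2048
```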
2. 1D convolution version
# Channel attention -- 1D convolution version
class Channel_Attention_Module_Conv(nn.Module):
    def __init__(self, channels, gamma=2, b=1):
        super(Channel_Attention_Module_Conv, self).__init__()
        kernel_size = int(abs((math.log(channels, 2) + b) / gamma))
        kernel_size = kernel_size if kernel_size % 2 else kernel_size + 1
        self.avg_pooling = nn.AdaptiveAvgPool2d(1)
        self.max_pooling = nn.AdaptiveMaxPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=kernel_size, padding=(kernel_size - 1) // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avg_x = self.avg_pooling(x)
        max_x = self.max_pooling(x)
        avg_out = self.conv(avg_x.squeeze(-1).transpose(-1, -2)).transpose(-1, -2).unsqueeze(-1)
        max_out = self.conv(max_x.squeeze(-1).transpose(-1, -2)).transpose(-1, -2).unsqueeze(-1)
        v = self.sigmoid(avg_out + max_out)
        return x * v
(2) Spatial attention
# Spatial attention
class Spatial_Attention_Module(nn.Module):
    def __init__(self, k: int):
        super(Spatial_Attention_Module, self).__init__()
        self.avg_pooling = torch.mean
        self.max_pooling = torch.max
        # To keep the spatial size unchanged, k = 1 + 2p (k = kernel size, p = padding):
        # p = 1 -> k = 3; p = 2 -> k = 5; p = 3 -> k = 7. k = 9 is too large to be practical.
        assert k in [3, 5, 7], "kernel size = 1 + 2 * padding, so kernel size must be 3, 5, 7"
        self.conv = nn.Conv2d(2, 1, kernel_size=(k, k), stride=(1, 1), padding=((k - 1) // 2, (k - 1) // 2),
                              bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # compress the channel dimension to 1, keeping the dimensions
        avg_x = self.avg_pooling(x, dim=1, keepdim=True)
        max_x, _ = self.max_pooling(x, dim=1, keepdim=True)
        v = self.conv(torch.cat((max_x, avg_x), dim=1))
        v = self.sigmoid(v)
        return x * v
(3) The full CBAM module
# CBAM block (channel attention followed by spatial attention)
class CBAMBlock(nn.Module):
    def __init__(self, channel_attention_mode: str, spatial_attention_kernel_size: int, channels: int = None,
                 ratio: int = None, gamma: int = None, b: int = None):
        super(CBAMBlock, self).__init__()
        if channel_attention_mode == "FC":
            assert channels is not None and ratio is not None, \
                "FC channel attention block needs the feature maps' channels and ratio"
            self.channel_attention_block = Channel_Attention_Module_FC(channels=channels, ratio=ratio)
        elif channel_attention_mode == "Conv":
            assert channels is not None and gamma is not None and b is not None, \
                "Conv channel attention block needs the feature maps' channels, gamma, b"
            self.channel_attention_block = Channel_Attention_Module_Conv(channels=channels, gamma=gamma, b=b)
        else:
            assert channel_attention_mode in ["FC", "Conv"], \
                "channel attention block must be 'FC' or 'Conv'"
        self.spatial_attention_block = Spatial_Attention_Module(k=spatial_attention_kernel_size)

    def forward(self, x):
        x = self.channel_attention_block(x)
        x = self.spatial_attention_block(x)
        return x
2. Testing the input/output
(1) Testing the channel and spatial attention modules

if __name__ == '__main__':
    # channel attention with 64 input channels
    ch_Attn_fc_use_conv = Channel_attention_Module_FC_Use_Conv(64)
    print("ch_Attn_fc_use_conv parameters:", sum(p.numel() for p in ch_Attn_fc_use_conv.parameters()))  # 2048
    # # print the model structure
    # print(ch_Attn_fc_use_conv)
    ch_Attn_fc = Channel_Attention_Module_FC(64)
    print("ch_Attn_fc parameters:", sum(p.numel() for p in ch_Attn_fc.parameters()))  # 2048
    ch_Attn_conv = Channel_Attention_Module_Conv(64)
    print("ch_Attn_conv parameters:", sum(p.numel() for p in ch_Attn_conv.parameters()))  # 3
    sp_Attn = Spatial_Attention_Module(k=5)
    print("sp_Attn parameters:", sum(p.numel() for p in sp_Attn.parameters()))  # 50
"""
Output:
ch_Attn_fc_use_conv parameters: 2048
ch_Attn_fc parameters: 2048
ch_Attn_conv parameters: 3
sp_Attn parameters: 50
"""
(2) Testing the full CBAM model
if __name__ == '__main__':
    # create an input tensor of shape [2, 64, 16, 16]
    x = torch.randn([2, 64, 16, 16])  # batch size 2, 64 channels, spatial size 16x16
    print('x', x.shape)  # [2, 64, 16, 16]
    cbam1 = CBAMBlock("FC", 5, channels=64, ratio=9)
    print("cbam1 parameters:", sum(p.numel() for p in cbam1.parameters()))
    print("cbam1(feature_maps)", cbam1(x).shape)
    cbam2 = CBAMBlock("Conv", 5, channels=64, gamma=2, b=1)
    print("cbam2 parameters:", sum(p.numel() for p in cbam2.parameters()))
    print("cbam2(feature_maps)", cbam2(x).shape)
"""
Output:
x torch.Size([2, 64, 16, 16])
cbam1 parameters: 946
cbam1(feature_maps) torch.Size([2, 64, 16, 16])
cbam2 parameters: 53
cbam2(feature_maps) torch.Size([2, 64, 16, 16])
"""
CBAM's input and output shapes match, so it can be inserted as a module into an existing model. Note also that the convolutional channel attention uses far fewer parameters than the fully connected version.
