# Awesome-Vision-Attentions

**Repository Path**: milixiang/Awesome-Vision-Attentions

## Basic Information

- **Project Name**: Awesome-Vision-Attentions
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 1
- **Forks**: 1
- **Created**: 2021-11-27
- **Last Updated**: 2022-01-04

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# This repo accompanies the paper: Attention Mechanisms in Computer Vision: A Survey [paper](https://arxiv.org/abs/2111.07624)

## Chinese-language blog post introducing the paper: [link](https://mp.weixin.qq.com/s/0iOZ45NTK9qSWJQlcI3_kQ)


![image](https://github.com/MenghaoGuo/Awesome-Vision-Attentions/blob/main/imgs/fuse.png)


<!-- ![image](https://github.com/MenghaoGuo/Awesome-Vision-Attentions/blob/main/imgs/attention_category.png) -->


- [Vision-Attention-Papers](#vision-attention-papers)
  * [Channel attention](#channel-attention)
  * [Spatial attention](#spatial-attention)
  * [Temporal attention](#temporal-attention)
  * [Branch attention](#branch-attention)
  * [Channel \& Spatial attention](#channelspatial-attention)
  * [Spatial \& Temporal attention](#spatialtemporal-attention)


🔥 marks papers with more than 200 citations.

* TODO: Code for the different attention mechanisms is coming soon.
* TODO: Code links are coming soon.
* TODO: Collect more related papers. Contributions are welcome.

## Channel attention

* Squeeze-and-Excitation Networks (CVPR2018) [pdf](https://arxiv.org/pdf/1709.01507), (PAMI2019 version) [pdf](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8701503) 🔥
* Image super-resolution using very deep residual channel attention networks (ECCV2018) [pdf](https://arxiv.org/pdf/1807.02758) 🔥
* Context encoding for semantic segmentation (CVPR2018) [pdf](https://arxiv.org/pdf/1803.08904) 🔥
* Spatio-temporal channel correlation networks for action classification (ECCV2018) [pdf](https://arxiv.org/pdf/1806.07754)
* Global second-order pooling convolutional networks (CVPR2019) [pdf](https://arxiv.org/pdf/1811.12006)
* SRM: A style-based recalibration module for convolutional neural networks (ICCV2019) [pdf](https://arxiv.org/pdf/1903.10829)
* You look twice: GaterNet for dynamic filter selection in CNNs (CVPR2019) [pdf](https://arxiv.org/pdf/1811.11205)
* Second-order attention network for single image super-resolution (CVPR2019) [pdf](https://openaccess.thecvf.com/content_CVPR_2019/papers/Dai_Second-Order_Attention_Network_for_Single_Image_Super-Resolution_CVPR_2019_paper.pdf) 🔥
* SpSequenceNet: Semantic segmentation network on 4D point clouds (CVPR2020) [pdf](https://openaccess.thecvf.com/content_CVPR_2020/html/Shi_SpSequenceNet_Semantic_Segmentation_Network_on_4D_Point_Clouds_CVPR_2020_paper.html)
* ECA-Net: Efficient channel attention for deep convolutional neural networks (CVPR2020) [pdf](https://arxiv.org/pdf/1910.03151) 🔥
* Gated channel transformation for visual recognition (CVPR2020) [pdf](https://arxiv.org/pdf/1909.11519)
* FcaNet: Frequency channel attention networks (ICCV2021) [pdf](https://arxiv.org/pdf/2012.11879)
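
The channel-attention idea behind the first entry, squeeze-and-excitation, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: `se_block` is a hypothetical name, the weights are random placeholders rather than trained parameters, and the reduction ratio `r = 4` is chosen only for the example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def se_block(x, w1, b1, w2, b2):
    """Squeeze-and-excitation style channel attention for one feature map.

    x:  (C, H, W) input features
    w1: (C // r, C), b1: (C // r,)  reduction layer (r = reduction ratio)
    w2: (C, C // r), b2: (C,)       expansion layer
    """
    z = x.mean(axis=(1, 2))                  # squeeze: (C,) global average per channel
    hidden = np.maximum(w1 @ z + b1, 0.0)    # bottleneck layer + ReLU
    s = sigmoid(w2 @ hidden + b2)            # excitation: (C,) gates in (0, 1)
    return x * s[:, None, None], s           # scale: reweight each channel by its gate


# Hypothetical usage with random placeholder weights.
rng = np.random.default_rng(0)
C, r = 8, 4
x = rng.standard_normal((C, 5, 5))
w1, b1 = rng.standard_normal((C // r, C)), np.zeros(C // r)
w2, b2 = rng.standard_normal((C, C // r)), np.zeros(C)
y, s = se_block(x, w1, b1, w2, b2)
print(y.shape, s.shape)  # (8, 5, 5) (8,)
```

With all weights set to zero every gate equals 0.5, which is a quick way to check the wiring before loading real parameters.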

## Spatial attention

- Recurrent models of visual attention (NeurIPS2014) [pdf](https://arxiv.org/pdf/1406.6247) 🔥
- Show, attend and tell: Neural image caption generation with visual attention (ICML2015) [pdf](https://arxiv.org/pdf/1502.03044) 🔥
- DRAW: A recurrent neural network for image generation (ICML2015) [pdf](https://arxiv.org/pdf/1502.04623) 🔥
- Spatial transformer networks (NeurIPS2015) [pdf](https://arxiv.org/pdf/1506.02025) 🔥
- Multiple object recognition with visual attention (ICLR2015) [pdf](https://arxiv.org/pdf/1412.7755) 🔥
- Action recognition using visual attention (arXiv2015) [pdf](https://arxiv.org/pdf/1511.04119) 🔥
- VideoLSTM convolves, attends and flows for action recognition (arXiv2016) [pdf](https://arxiv.org/pdf/1607.01794) 🔥
- Look closer to see better: Recurrent attention convolutional neural network for fine-grained image recognition (CVPR2017) [pdf](https://openaccess.thecvf.com/content_cvpr_2017/papers/Fu_Look_Closer_to_CVPR_2017_paper.pdf) 🔥
- Learning multi-attention convolutional neural network for fine-grained image recognition (ICCV2017) [pdf](http://openaccess.thecvf.com/content_ICCV_2017/papers/Zheng_Learning_Multi-Attention_Convolutional_ICCV_2017_paper.pdf) 🔥
- Diversified visual attention networks for fine-grained object classification (TMM2017) [pdf](https://arxiv.org/pdf/1606.08572) 🔥
- High-Order Attention Models for Visual Question Answering (NeurIPS2017) [pdf](https://arxiv.org/pdf/1711.04323)
- Attentional pooling for action recognition (NeurIPS2017) [pdf](https://arxiv.org/pdf/1711.01467) 🔥
- Non-local neural networks (CVPR2018) [pdf](https://arxiv.org/pdf/1711.07971) 🔥
- Attentional ShapeContextNet for point cloud recognition (CVPR2018) [pdf](https://openaccess.thecvf.com/content_cvpr_2018/papers/Xie_Attentional_ShapeContextNet_for_CVPR_2018_paper.pdf)
- Relation networks for object detection (CVPR2018) [pdf](https://openaccess.thecvf.com/content_cvpr_2018/papers/Hu_Relation_Networks_for_CVPR_2018_paper.pdf) 🔥
- A2-Nets: Double attention networks (NeurIPS2018) [pdf](https://arxiv.org/pdf/1810.11579) 🔥
- Attention-aware compositional network for person re-identification (CVPR2018) [pdf](https://arxiv.org/pdf/1805.03344) 🔥
- Tell me where to look: Guided attention inference network (CVPR2018) [pdf](https://arxiv.org/pdf/1802.10171) 🔥
- Pedestrian alignment network for large-scale person re-identification (TCSVT2018) [pdf](https://arxiv.org/pdf/1707.00408) 🔥
- Learn to pay attention (ICLR2018) [pdf](https://arxiv.org/pdf/1804.02391.pdf) 🔥
- Attention U-Net: Learning Where to Look for the Pancreas (MIDL2018) [pdf](https://arxiv.org/pdf/1804.03999.pdf) 🔥
- PSANet: Point-wise spatial attention network for scene parsing (ECCV2018) [pdf](https://openaccess.thecvf.com/content_ECCV_2018/html/Hengshuang_Zhao_PSANet_Point-wise_Spatial_ECCV_2018_paper.html) 🔥
- Self-attention generative adversarial networks (ICML2019) [pdf](https://arxiv.org/pdf/1805.08318) 🔥
- Attentional PointNet for 3D-object detection in point clouds (CVPRW2019) [pdf](https://openaccess.thecvf.com/content_CVPRW_2019/papers/WAD/Paigwar_Attentional_PointNet_for_3D-Object_Detection_in_Point_Clouds_CVPRW_2019_paper.pdf)
- Co-occurrent features in semantic segmentation (CVPR2019) [pdf](http://openaccess.thecvf.com/content_CVPR_2019/papers/Zhang_Co-Occurrent_Features_in_Semantic_Segmentation_CVPR_2019_paper.pdf)
- Factor Graph Attention (CVPR2019) [pdf](https://arxiv.org/pdf/1904.05880)
- Attention augmented convolutional networks (ICCV2019) [pdf](https://arxiv.org/pdf/1904.09925) 🔥
- Local relation networks for image recognition (ICCV2019) [pdf](https://arxiv.org/pdf/1904.11491)
- LatentGNN: Learning efficient non-local relations for visual recognition (ICML2019) [pdf](https://arxiv.org/pdf/1905.11634)
- Graph-based global reasoning networks (CVPR2019) [pdf](https://arxiv.org/pdf/1811.12814) 🔥
- GCNet: Non-local networks meet squeeze-excitation networks and beyond (ICCVW2019) [pdf](https://arxiv.org/pdf/1904.11492) 🔥
- Asymmetric non-local neural networks for semantic segmentation (ICCV2019) [pdf](https://arxiv.org/pdf/1908.07678) 🔥
- Looking for the devil in the details: Learning trilinear attention sampling network for fine-grained image recognition (CVPR2019) [pdf](https://arxiv.org/pdf/1903.06150)
- Second-order non-local attention networks for person re-identification (ICCV2019) [pdf](https://arxiv.org/pdf/1909.00295) 🔥
- End-to-end comparative attention networks for person re-identification (TIP2017) [pdf](https://arxiv.org/pdf/1606.04404) 🔥
- Modeling point clouds with self-attention and Gumbel subset sampling (CVPR2019) [pdf](https://arxiv.org/pdf/1904.03375)
- Diagnose like a radiologist: Attention guided convolutional neural network for thorax disease classification (arXiv 2019) [pdf](https://arxiv.org/pdf/1801.09927)
- L2G auto-encoder: Understanding point clouds by local-to-global reconstruction with hierarchical self-attention (arXiv 2019) [pdf](https://arxiv.org/pdf/1908.00720)
- Generative pretraining from pixels (ICML2020) [pdf](https://cdn.openai.com/papers/Generative_Pretraining_from_Pixels_V2.pdf)
- Exploring self-attention for image recognition (CVPR2020) [pdf](https://arxiv.org/pdf/2004.13621)
- CF-SIS: Semantic-instance segmentation of 3D point clouds by context fusion with self-attention (ACM MM 2020) [pdf](https://dl.acm.org/doi/pdf/10.1145/3394171.3413829)
- Disentangled non-local neural networks (ECCV2020) [pdf](https://arxiv.org/pdf/2006.06668)
- Relation-aware global attention for person re-identification (CVPR2020) [pdf](https://arxiv.org/pdf/1904.02998)
- Segmentation transformer: Object-contextual representations for semantic segmentation (ECCV2020) [pdf](https://arxiv.org/pdf/1909.11065) 🔥
- Spatial pyramid based graph reasoning for semantic segmentation (CVPR2020) [pdf](https://arxiv.org/pdf/2003.10211)
- Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation (CVPR2020) [pdf](https://arxiv.org/pdf/2004.04581.pdf)
- End-to-end object detection with transformers (ECCV2020) [pdf](https://arxiv.org/pdf/2005.12872) 🔥
- PointASNL: Robust point clouds processing using nonlocal neural networks with adaptive sampling (CVPR2020) [pdf](https://arxiv.org/pdf/2003.00492)
- Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers (CVPR2021) [pdf](https://arxiv.org/pdf/2012.15840)
- An image is worth 16x16 words: Transformers for image recognition at scale (ICLR2021) [pdf](https://arxiv.org/pdf/2010.11929) 🔥
- Is Attention Better Than Matrix Decomposition? (ICLR2021) [pdf](https://arxiv.org/abs/2109.04553) 
- OCNet: Object context network for scene parsing (IJCV 2021) [pdf](https://arxiv.org/pdf/1809.00916) 🔥
- Point transformer (ICCV 2021) [pdf](https://arxiv.org/pdf/2012.09164)
- PCT: Point Cloud Transformer (CVMJ 2021) [pdf](https://arxiv.org/pdf/2012.09688.pdf)
- Pre-trained image processing transformer (CVPR 2021) [pdf](https://arxiv.org/pdf/2012.00364)
- An empirical study of training self-supervised vision transformers (ICCV 2021) [pdf](https://arxiv.org/pdf/2104.02057)
- SegFormer: Simple and efficient design for semantic segmentation with transformers (arXiv 2021) [pdf](https://arxiv.org/pdf/2105.15203)
- BEiT: BERT pre-training of image transformers (arXiv 2021) [pdf](https://arxiv.org/pdf/2106.08254)
- Beyond self-attention: External attention using two linear layers for visual tasks (arXiv 2021) [pdf](https://arxiv.org/pdf/2105.02358)
- Query2Label: A simple transformer way to multi-label classification (arXiv 2021) [pdf](https://arxiv.org/pdf/2107.10834)
- Transformer in transformer (arXiv 2021) [pdf](https://arxiv.org/pdf/2103.00112)
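
Many entries above (non-local networks, SAGAN, ViT and its descendants) build on dot-product attention over spatial positions. The sketch below is a simplified, hypothetical version in NumPy: it assumes plain projection matrices `wq`, `wk`, `wv`, uses the `1/sqrt(d)` scaling of transformers, and keeps the residual add of non-local blocks while omitting their output projection and normalization.

```python
import numpy as np


def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def spatial_self_attention(x, wq, wk, wv):
    """Dot-product self-attention across the H*W positions of one feature map.

    x: (C, H, W); wq, wk: (d, C) query/key projections; wv: (C, C) value projection.
    """
    c, h, w = x.shape
    tokens = x.reshape(c, h * w).T                    # (N, C), one row per position
    q, k, v = tokens @ wq.T, tokens @ wk.T, tokens @ wv.T
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]))     # (N, N), each row sums to 1
    out = attn @ v                                    # every position mixes all positions
    return x + out.T.reshape(c, h, w), attn           # residual add, back to (C, H, W)


# Hypothetical usage with random placeholder projections.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3, 3))
wq, wk = rng.standard_normal((2, 4)), rng.standard_normal((2, 4))
wv = rng.standard_normal((4, 4))
y, attn = spatial_self_attention(x, wq, wk, wv)
print(y.shape, attn.shape)  # (4, 3, 3) (9, 9)
```

The attention matrix is N×N in the number of positions, which is the quadratic cost that several of the listed papers (for example the asymmetric, latent-graph and external-attention variants) set out to reduce.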

## Temporal attention 

- Jointly attentive spatial-temporal pooling networks for video-based person re-identification (ICCV 2017) [pdf](https://arxiv.org/pdf/1708.02286.pdf) 🔥
- Video person re-identification with competitive snippet-similarity aggregation and co-attentive snippet embedding (CVPR 2018) [pdf](https://openaccess.thecvf.com/content_cvpr_2018/CameraReady/1036.pdf)
- SCAN: Self-and-collaborative attention network for video person re-identification (TIP 2019) [pdf](https://arxiv.org/pdf/1807.05688.pdf)
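
Temporal attention decides how much each frame contributes to a clip-level representation, the common thread of the video re-identification papers above. A minimal sketch, assuming a single hypothetical scoring vector `w` in place of the learned scoring networks the papers use:

```python
import numpy as np


def temporal_attention_pool(frames, w):
    """Attention-weighted pooling of per-frame features into one clip descriptor.

    frames: (T, D) features for T frames; w: (D,) hypothetical scoring vector.
    """
    scores = frames @ w                    # one relevance score per frame
    e = np.exp(scores - scores.max())      # numerically stable softmax over time
    alpha = e / e.sum()                    # (T,) frame weights, sum to 1
    return alpha @ frames, alpha           # (D,) weighted average of the frames


# Hypothetical usage with random frame features.
rng = np.random.default_rng(0)
frames = rng.standard_normal((6, 16))
clip, alpha = temporal_attention_pool(frames, rng.standard_normal(16))
print(clip.shape, alpha.round(3))
```

Setting `w` to zeros gives every frame the weight `1/T`, so the function reduces to plain temporal average pooling.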

## Branch attention 

- Training very deep networks (NeurIPS 2015) [pdf](https://arxiv.org/pdf/1507.06228.pdf) 🔥
- Selective kernel networks (CVPR 2019) [pdf](https://openaccess.thecvf.com/content_CVPR_2019/papers/Li_Selective_Kernel_Networks_CVPR_2019_paper.pdf) 🔥
- CondConv: Conditionally Parameterized Convolutions for Efficient Inference (NeurIPS 2019) [pdf](https://arxiv.org/pdf/1904.04971.pdf)
- Dynamic convolution: Attention over convolution kernels (CVPR 2020) [pdf](https://openaccess.thecvf.com/content_CVPR_2020/papers/Chen_Dynamic_Convolution_Attention_Over_Convolution_Kernels_CVPR_2020_paper.pdf)
- ResNeSt: Split-attention networks (arXiv 2020) [pdf](https://arxiv.org/pdf/2004.08955.pdf) 🔥
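
Branch attention learns soft weights for combining parallel branches. The sketch below follows the fuse-then-select pattern of selective kernel networks in simplified form: `w_reduce` and `w_branch` are hypothetical placeholder weights, the branch convolutions themselves are not computed, and batch normalization is omitted.

```python
import numpy as np


def branch_attention(branches, w_reduce, w_branch):
    """Selective-kernel style soft selection among parallel branches.

    branches: (B, C, H, W) outputs of B parallel branches (e.g. different kernel sizes)
    w_reduce: (d, C) shared reduction layer; w_branch: (B, C, d) per-branch layers.
    """
    u = branches.sum(axis=0)                                # fuse branches: (C, H, W)
    s = u.mean(axis=(1, 2))                                 # global average pooling: (C,)
    z = np.maximum(w_reduce @ s, 0.0)                       # compact descriptor: (d,)
    logits = w_branch @ z                                   # (B, C), one logit per branch/channel
    e = np.exp(logits - logits.max(axis=0, keepdims=True))
    a = e / e.sum(axis=0, keepdims=True)                    # softmax across branches, per channel
    return (a[:, :, None, None] * branches).sum(axis=0), a  # weighted sum: (C, H, W)


# Hypothetical usage with two random branches.
rng = np.random.default_rng(0)
B, C, d = 2, 8, 4
branches = rng.standard_normal((B, C, 5, 5))
v, a = branch_attention(branches, rng.standard_normal((d, C)), rng.standard_normal((B, C, d)))
print(v.shape, a.shape)  # (8, 5, 5) (2, 8)
```

Because the weights for each channel sum to 1 across branches, feeding in identical branches returns that branch unchanged.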

## Channel&Spatial attention

- Residual attention network for image classification (CVPR 2017) [pdf](https://openaccess.thecvf.com/content_cvpr_2017/papers/Wang_Residual_Attention_Network_CVPR_2017_paper.pdf) 🔥
- SCA-CNN: Spatial and channel-wise attention in convolutional networks for image captioning (CVPR 2017) [pdf](https://openaccess.thecvf.com/content_cvpr_2017/papers/Chen_SCA-CNN_Spatial_and_CVPR_2017_paper.pdf) 🔥
- CBAM: Convolutional block attention module (ECCV 2018) [pdf](https://openaccess.thecvf.com/content_ECCV_2018/papers/Sanghyun_Woo_Convolutional_Block_Attention_ECCV_2018_paper.pdf) 🔥
- Harmonious attention network for person re-identification (CVPR 2018) [pdf](https://arxiv.org/pdf/1802.08122.pdf) 🔥
- Recalibrating fully convolutional networks with spatial and channel “squeeze and excitation” blocks (TMI 2018) [pdf](https://arxiv.org/pdf/1808.08127.pdf)
- Mancs: A multi-task attentional network with curriculum sampling for person re-identification (ECCV 2018) [pdf](https://www.ecva.net/papers/eccv_2018/papers_ECCV/papers/Cheng_Wang_Mancs_A_Multi-task_ECCV_2018_paper.pdf) 🔥
- BAM: Bottleneck attention module (BMVC 2018) [pdf](http://bmvc2018.org/contents/papers/0092.pdf) 🔥
- PVNet: A joint convolutional network of point cloud and multi-view for 3D shape recognition (ACM MM 2018) [pdf](https://arxiv.org/pdf/1808.07659.pdf)
- Learning what and where to attend (ICLR 2019) [pdf](https://openreview.net/pdf?id=BJgLg3R9KQ)
- Dual attention network for scene segmentation (CVPR 2019) [pdf](https://openaccess.thecvf.com/content_CVPR_2019/papers/Fu_Dual_Attention_Network_for_Scene_Segmentation_CVPR_2019_paper.pdf) 🔥
- ABD-Net: Attentive but diverse person re-identification (ICCV 2019) [pdf](https://openaccess.thecvf.com/content_ICCV_2019/papers/Chen_ABD-Net_Attentive_but_Diverse_Person_Re-Identification_ICCV_2019_paper.pdf)
- Mixed high-order attention network for person re-identification (ICCV 2019) [pdf](https://arxiv.org/pdf/1908.05819.pdf)
- MLCVNet: Multi-level context VoteNet for 3D object detection (CVPR 2020) [pdf](https://openaccess.thecvf.com/content_CVPR_2020/papers/Xie_MLCVNet_Multi-Level_Context_VoteNet_for_3D_Object_Detection_CVPR_2020_paper.pdf)
- Improving convolutional networks with self-calibrated convolutions (CVPR 2020) [pdf](https://openaccess.thecvf.com/content_CVPR_2020/papers/Liu_Improving_Convolutional_Networks_With_Self-Calibrated_Convolutions_CVPR_2020_paper.pdf) 
- Relation-aware global attention for person re-identification (CVPR 2020) [pdf](https://openaccess.thecvf.com/content_CVPR_2020/papers/Zhang_Relation-Aware_Global_Attention_for_Person_Re-Identification_CVPR_2020_paper.pdf) 
- Strip Pooling: Rethinking spatial pooling for scene parsing (CVPR 2020) [pdf](https://openaccess.thecvf.com/content_CVPR_2020/papers/Hou_Strip_Pooling_Rethinking_Spatial_Pooling_for_Scene_Parsing_CVPR_2020_paper.pdf) 
- Rotate to attend: Convolutional triplet attention module (WACV 2021) [pdf](https://arxiv.org/pdf/2010.03045.pdf)
- Coordinate attention for efficient mobile network design (CVPR 2021) [pdf](https://openaccess.thecvf.com/content/CVPR2021/papers/Hou_Coordinate_Attention_for_Efficient_Mobile_Network_Design_CVPR_2021_paper.pdf)
- SimAM: A simple, parameter-free attention module for convolutional neural networks (ICML 2021) [pdf](http://proceedings.mlr.press/v139/yang21o/yang21o.pdf)
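
Channel and spatial methods such as CBAM apply both kinds of attention in sequence. A simplified NumPy sketch of that CBAM-style pipeline, assuming random placeholder weights, no biases or normalization, and a naive loop-based "same" convolution written for clarity rather than speed (`conv2d_same` and `channel_then_spatial` are hypothetical names); the 7×7 kernel in the usage follows the CBAM paper's choice for spatial attention.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def conv2d_same(maps, kernel):
    """Zero-padded 'same' cross-correlation: (M, H, W) maps, (M, k, k) kernel, k odd -> (H, W)."""
    _, h, w = maps.shape
    k = kernel.shape[-1]
    p = k // 2
    padded = np.pad(maps, ((0, 0), (p, p), (p, p)))
    out = np.zeros((h, w))
    for i in range(k):
        for j in range(k):
            out += (kernel[:, i, j][:, None, None] * padded[:, i:i + h, j:j + w]).sum(axis=0)
    return out


def channel_then_spatial(x, w1, w2, kernel):
    """CBAM-style sequential channel and spatial attention (biases and BN omitted).

    x: (C, H, W); w1: (C // r, C), w2: (C, C // r) shared MLP; kernel: (2, k, k).
    """
    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)

    # Channel attention: the same MLP scores average- and max-pooled descriptors.
    mc = sigmoid(mlp(x.mean(axis=(1, 2))) + mlp(x.max(axis=(1, 2))))  # (C,)
    x = x * mc[:, None, None]
    # Spatial attention: pool across channels, convolve the 2-channel map, squash.
    pooled = np.stack([x.mean(axis=0), x.max(axis=0)])                 # (2, H, W)
    ms = sigmoid(conv2d_same(pooled, kernel))                          # (H, W)
    return x * ms[None, :, :], mc, ms


# Hypothetical usage with random placeholder weights.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 6, 6))
y, mc, ms = channel_then_spatial(x, rng.standard_normal((2, 8)),
                                 rng.standard_normal((8, 2)), rng.standard_normal((2, 7, 7)))
print(y.shape, mc.shape, ms.shape)  # (8, 6, 6) (8,) (6, 6)
```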

## Spatial&Temporal attention

- An end-to-end spatio-temporal attention model for human action recognition from skeleton data (AAAI 2017) [pdf](https://arxiv.org/pdf/1611.06067.pdf) 🔥
- Diversity regularized spatiotemporal attention for video-based person re-identification (arXiv 2018) 🔥
- Interpretable spatio-temporal attention for video action recognition (ICCVW 2019) [pdf](https://openaccess.thecvf.com/content_ICCVW_2019/papers/HVU/Meng_Interpretable_Spatio-Temporal_Attention_for_Video_Action_Recognition_ICCVW_2019_paper.pdf) 
- A Simple Baseline for Audio-Visual Scene-Aware Dialog (CVPR 2019) [pdf](https://arxiv.org/pdf/1904.05876v1.pdf)
- Hierarchical LSTMs with adaptive attention for visual captioning (TPAMI 2020) [pdf](https://arxiv.org/pdf/1812.11004.pdf)
- STAT: Spatial-temporal attention mechanism for video captioning (TMM 2020) [pdf](https://ieeexplore.ieee.org/abstract/document/8744407)
- GTA: Global temporal attention for video action understanding (arXiv 2020) [pdf](https://arxiv.org/pdf/2012.08510.pdf)
- Multi-granularity reference-aided attentive feature aggregation for video-based person re-identification (CVPR 2020) [pdf](https://arxiv.org/pdf/2003.12224.pdf)
- READ: Reciprocal attention discriminator for image-to-video re-identification (ECCV 2020) [pdf](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123590324.pdf)
- Decoupled spatial-temporal transformer for video inpainting (arXiv 2021) [pdf](https://arxiv.org/pdf/2104.06637.pdf)
- Towards Coherent Visual Storytelling with Ordered Image Attention (arXiv 2021) [pdf](https://arxiv.org/pdf/2108.02180)
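
One simple way to combine both axes is to factorize them: attend over positions within each frame, then over frames. The sketch below is a hypothetical illustration of that factorization with linear scoring vectors `w_s` and `w_t`; several of the papers listed use richer scoring modules (LSTMs, transformers) instead.

```python
import numpy as np


def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def spatiotemporal_attention_pool(video, w_s, w_t):
    """Factorized attention: where to look in each frame, then which frames to weight.

    video: (T, N, D) features for T frames with N spatial positions each.
    w_s, w_t: (D,) hypothetical scoring vectors for the spatial and temporal steps.
    """
    a_s = softmax(video @ w_s, axis=1)                 # (T, N) weights within each frame
    frame_feats = np.einsum("tn,tnd->td", a_s, video)  # (T, D) attended frame descriptors
    a_t = softmax(frame_feats @ w_t, axis=0)           # (T,) weights across frames
    return a_t @ frame_feats, a_s, a_t                 # (D,) clip descriptor


# Hypothetical usage with random video features.
rng = np.random.default_rng(0)
video = rng.standard_normal((4, 9, 16))
clip, a_s, a_t = spatiotemporal_attention_pool(video, rng.standard_normal(16), rng.standard_normal(16))
print(clip.shape, a_s.shape, a_t.shape)  # (16,) (4, 9) (4,)
```

With both scoring vectors set to zero, the output is the mean over all frames and positions.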