# Awesome-Vision-Attentions

**Repository Path**: milixiang/Awesome-Vision-Attentions

## Basic Information

- **Project Name**: Awesome-Vision-Attentions
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 1
- **Forks**: 1
- **Created**: 2021-11-27
- **Last Updated**: 2022-01-04

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# This repo is built for paper: Attention Mechanisms in Computer Vision: A Survey [paper](https://arxiv.org/abs/2111.07624)

## Chinese-language blog post introducing the paper [link](https://mp.weixin.qq.com/s/0iOZ45NTK9qSWJQlcI3_kQ)

![image](https://github.com/MenghaoGuo/Awesome-Vision-Attentions/blob/main/imgs/fuse.png)

- [Vision-Attention-Papers](#vision-attention-papers)
  * [Channel attention](#channel-attention)
  * [Spatial attention](#spatial-attention)
  * [Temporal attention](#temporal-attention)
  * [Branch attention](#branch-attention)
  * [Channel \& Spatial attention](#channelspatial-attention)
  * [Spatial \& Temporal attention](#spatialtemporal-attention)

🔥 (citations > 200)

* TODO : Code for the different attention mechanisms will come soon.
* TODO : [Code]() link will come soon.
* TODO : Collect more related papers. Contributions are welcome.

## Channel attention

* Squeeze-and-Excitation Networks (CVPR2018) [pdf](https://arxiv.org/pdf/1709.01507), (PAMI2019 version) [pdf](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8701503) 🔥
* Image super-resolution using very deep residual channel attention networks (ECCV2018) [pdf](https://arxiv.org/pdf/1807.02758) 🔥
* Context encoding for semantic segmentation (CVPR2018) [pdf](https://arxiv.org/pdf/1803.08904) 🔥
* Spatio-temporal channel correlation networks for action classification (ECCV2018) [pdf](https://arxiv.org/pdf/1806.07754)
* Global second-order pooling convolutional networks (CVPR2019) [pdf](https://arxiv.org/pdf/1811.12006)
* Srm: A style-based recalibration module for convolutional neural networks (ICCV2019) [pdf](https://arxiv.org/pdf/1903.10829)
* You look twice: Gaternet for dynamic filter selection in cnns (CVPR2019) [pdf](https://arxiv.org/pdf/1811.11205)
* Second-order attention network for single image super-resolution (CVPR2019) [pdf](https://openaccess.thecvf.com/content_CVPR_2019/papers/Dai_Second-Order_Attention_Network_for_Single_Image_Super-Resolution_CVPR_2019_paper.pdf) 🔥
* Spsequencenet: Semantic segmentation network on 4d point clouds (CVPR2020) [pdf](https://openaccess.thecvf.com/content_CVPR_2020/html/Shi_SpSequenceNet_Semantic_Segmentation_Network_on_4D_Point_Clouds_CVPR_2020_paper.html)
* Ecanet: Efficient channel attention for deep convolutional neural networks (CVPR2020) [pdf](https://arxiv.org/pdf/1910.03151) 🔥
* Gated channel transformation for visual recognition (CVPR2020) [pdf](https://arxiv.org/pdf/1909.11519)
* Fcanet: Frequency channel attention networks (ICCV2021) [pdf](https://arxiv.org/pdf/2012.11879)

## Spatial attention

- Recurrent models of visual attention (NeurIPS2014) [pdf](https://arxiv.org/pdf/1406.6247) 🔥
- Show, attend and tell: Neural image caption generation with visual attention (PMLR2015) [pdf](https://arxiv.org/pdf/1502.03044) 🔥
- Draw: A recurrent neural network for image generation (ICML2015) [pdf](https://arxiv.org/pdf/1502.04623) 🔥
- Spatial transformer networks (NeurIPS2015) [pdf](https://arxiv.org/pdf/1506.02025) 🔥
- Multiple object recognition with visual attention (ICLR2015) [pdf](https://arxiv.org/pdf/1412.7755) 🔥
- Action recognition using visual attention (arXiv2015) [pdf](https://arxiv.org/pdf/1511.04119) 🔥
- Videolstm convolves, attends and flows for action recognition (arXiv2016) [pdf](https://arxiv.org/pdf/1607.01794) 🔥
- Look closer to see better: Recurrent attention convolutional neural network for fine-grained image recognition (CVPR2017) [pdf](https://openaccess.thecvf.com/content_cvpr_2017/papers/Fu_Look_Closer_to_CVPR_2017_paper.pdf) 🔥
- Learning multi-attention convolutional neural network for fine-grained image recognition (ICCV2017) [pdf](http://openaccess.thecvf.com/content_ICCV_2017/papers/Zheng_Learning_Multi-Attention_Convolutional_ICCV_2017_paper.pdf) 🔥
- Diversified visual attention networks for fine-grained object classification (TMM2017) [pdf](https://arxiv.org/pdf/1606.08572) 🔥
- High-Order Attention Models for Visual Question Answering (NIPS2017) [pdf](https://arxiv.org/pdf/1711.04323)
- Attentional pooling for action recognition (NeurIPS2017) [pdf](https://arxiv.org/pdf/1711.01467) 🔥
- Non-local neural networks (CVPR2018) [pdf](https://arxiv.org/pdf/1711.07971) 🔥
- Attentional shapecontextnet for point cloud recognition (CVPR2018) [pdf](https://openaccess.thecvf.com/content_cvpr_2018/papers/Xie_Attentional_ShapeContextNet_for_CVPR_2018_paper.pdf)
- Relation networks for object detection (CVPR2018) [pdf](https://openaccess.thecvf.com/content_cvpr_2018/papers/Hu_Relation_Networks_for_CVPR_2018_paper.pdf) 🔥
- a2-nets: Double attention networks (NeurIPS2018) [pdf](https://arxiv.org/pdf/1810.11579) 🔥
- Attention-aware compositional network for person re-identification (CVPR2018) [pdf](https://arxiv.org/pdf/1805.03344) 🔥
- Tell me where to look: Guided attention inference network (CVPR2018) [pdf](https://arxiv.org/pdf/1802.10171) 🔥
- Pedestrian alignment network for large-scale person re-identification (TCSVT2018) [pdf](https://arxiv.org/pdf/1707.00408) 🔥
- Learn to pay attention (ICLR2018) [pdf](https://arxiv.org/pdf/1804.02391.pdf) 🔥
- Attention U-Net: Learning Where to Look for the Pancreas (MIDL2018) [pdf](https://arxiv.org/pdf/1804.03999.pdf) 🔥
- Psanet: Point-wise spatial attention network for scene parsing (ECCV2018) [pdf](https://openaccess.thecvf.com/content_ECCV_2018/html/Hengshuang_Zhao_PSANet_Point-wise_Spatial_ECCV_2018_paper.html) 🔥
- Self attention generative adversarial networks (ICML2019) [pdf](https://arxiv.org/pdf/1805.08318) 🔥
- Attentional pointnet for 3d-object detection in point clouds (CVPRW2019) [pdf](https://openaccess.thecvf.com/content_CVPRW_2019/papers/WAD/Paigwar_Attentional_PointNet_for_3D-Object_Detection_in_Point_Clouds_CVPRW_2019_paper.pdf)
- Co-occurrent features in semantic segmentation (CVPR2019) [pdf](http://openaccess.thecvf.com/content_CVPR_2019/papers/Zhang_Co-Occurrent_Features_in_Semantic_Segmentation_CVPR_2019_paper.pdf)
- Factor Graph Attention (CVPR2019) [pdf](https://arxiv.org/pdf/1904.05880)
- Attention augmented convolutional networks (ICCV2019) [pdf](https://arxiv.org/pdf/1904.09925) 🔥
- Local relation networks for image recognition (ICCV2019) [pdf](https://arxiv.org/pdf/1904.11491)
- Latentgnn: Learning efficient nonlocal relations for visual recognition (ICML2019) [pdf](https://arxiv.org/pdf/1905.11634)
- Graph-based global reasoning networks (CVPR2019) [pdf](https://arxiv.org/pdf/1811.12814) 🔥
- Gcnet: Non-local networks meet squeeze-excitation networks and beyond (ICCVW2019) [pdf](https://arxiv.org/pdf/1904.11492) 🔥
- Asymmetric non-local neural networks for semantic segmentation (ICCV2019) [pdf](https://arxiv.org/pdf/1908.07678) 🔥
- Looking for the devil in the details: Learning trilinear attention sampling network for fine-grained image recognition (CVPR2019) [pdf](https://arxiv.org/pdf/1903.06150)
- Second-order non-local attention networks for person re-identification (ICCV2019) [pdf](https://arxiv.org/pdf/1909.00295) 🔥
- End-to-end comparative attention networks for person re-identification (ICCV2019) [pdf](https://arxiv.org/pdf/1606.04404) 🔥
- Modeling point clouds with self-attention and gumbel subset sampling (CVPR2019) [pdf](https://arxiv.org/pdf/1904.03375)
- Diagnose like a radiologist: Attention guided convolutional neural network for thorax disease classification (arXiv 2019) [pdf](https://arxiv.org/pdf/1801.09927)
- L2g autoencoder: Understanding point clouds by local-to-global reconstruction with hierarchical self-attention (arXiv 2019) [pdf](https://arxiv.org/pdf/1908.00720)
- Generative pretraining from pixels (PMLR2020) [pdf](https://cdn.openai.com/papers/Generative_Pretraining_from_Pixels_V2.pdf)
- Exploring self-attention for image recognition (CVPR2020) [pdf](https://arxiv.org/pdf/2004.13621)
- Cf-sis: Semantic-instance segmentation of 3d point clouds by context fusion with self attention (MM20) [pdf](https://dl.acm.org/doi/pdf/10.1145/3394171.3413829)
- Disentangled non-local neural networks (ECCV2020) [pdf](https://arxiv.org/pdf/2006.06668)
- Relation-aware global attention for person re-identification (CVPR2020) [pdf](https://arxiv.org/pdf/1904.02998)
- Segmentation transformer: Object-contextual representations for semantic segmentation (ECCV2020) [pdf](https://arxiv.org/pdf/1909.11065) 🔥
- Spatial pyramid based graph reasoning for semantic segmentation (CVPR2020) [pdf](https://arxiv.org/pdf/2003.10211)
- Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation (CVPR2020) [pdf](https://arxiv.org/pdf/2004.04581.pdf)
- End-to-end object detection with transformers (ECCV2020) [pdf](https://arxiv.org/pdf/2005.12872) 🔥
- Pointasnl: Robust point clouds processing using nonlocal neural networks with adaptive sampling (CVPR2020) [pdf](https://arxiv.org/pdf/2003.00492)
- Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers (CVPR2021) [pdf](https://arxiv.org/pdf/2012.15840)
- An image is worth 16x16 words: Transformers for image recognition at scale (ICLR2021) [pdf](https://arxiv.org/pdf/2010.11929) 🔥
- Is Attention Better Than Matrix Decomposition? (ICLR2021) [pdf](https://arxiv.org/abs/2109.04553)
- An empirical study of training self-supervised vision transformers (CVPR2021) [pdf](https://arxiv.org/pdf/2104.02057)
- Ocnet: Object context network for scene parsing (IJCV 2021) [pdf](https://arxiv.org/pdf/1809.00916) 🔥
- Point transformer (ICCV 2021) [pdf](https://arxiv.org/pdf/2012.09164)
- PCT: Point Cloud Transformer (CVMJ 2021) [pdf](https://arxiv.org/pdf/2012.09688.pdf)
- Pre-trained image processing transformer (CVPR 2021) [pdf](https://arxiv.org/pdf/2012.00364)
- An empirical study of training self-supervised vision transformers (ICCV 2021) [pdf](https://arxiv.org/pdf/2104.02057)
- Segformer: Simple and efficient design for semantic segmentation with transformers (arxiv 2021) [pdf](https://arxiv.org/pdf/2105.15203)
- Beit: Bert pre-training of image transformers (arxiv 2021) [pdf](https://arxiv.org/pdf/2106.08254)
- Beyond Self-attention: External attention using two linear layers for visual tasks (arxiv 2021) [pdf](https://arxiv.org/pdf/2105.02358)
- Query2label: A simple transformer way to multi-label classification (arxiv 2021) [pdf](https://arxiv.org/pdf/2107.10834)
- Transformer in transformer (arxiv 2021) [pdf](https://arxiv.org/pdf/2103.00112)

## Temporal attention

- Jointly attentive spatial-temporal pooling networks for video-based person re-identification (ICCV 2017) [pdf](https://arxiv.org/pdf/1708.02286.pdf) 🔥
- Video person reidentification with competitive snippet-similarity aggregation and co-attentive snippet embedding (CVPR 2018) [pdf](https://openaccess.thecvf.com/content_cvpr_2018/CameraReady/1036.pdf)
- Scan: Self-and-collaborative attention network for video person re-identification (TIP 2019) [pdf](https://arxiv.org/pdf/1807.05688.pdf)

## Branch attention

- Training very deep networks (NeurIPS 2015) [pdf](https://arxiv.org/pdf/1507.06228.pdf) 🔥
- Selective kernel networks (CVPR 2019) [pdf](https://openaccess.thecvf.com/content_CVPR_2019/papers/Li_Selective_Kernel_Networks_CVPR_2019_paper.pdf) 🔥
- CondConv: Conditionally Parameterized Convolutions for Efficient Inference (NeurIPS 2019) [pdf](https://arxiv.org/pdf/1904.04971.pdf)
- Dynamic convolution: Attention over convolution kernels (CVPR 2020) [pdf](https://openaccess.thecvf.com/content_CVPR_2020/papers/Chen_Dynamic_Convolution_Attention_Over_Convolution_Kernels_CVPR_2020_paper.pdf)
- ResNest: Split-attention networks (arXiv 2020) [pdf](https://arxiv.org/pdf/2004.08955.pdf) 🔥

## ChannelSpatial attention

- Residual attention network for image classification (CVPR 2017) [pdf](https://openaccess.thecvf.com/content_cvpr_2017/papers/Wang_Residual_Attention_Network_CVPR_2017_paper.pdf) 🔥
- SCA-CNN: spatial and channel-wise attention in convolutional networks for image captioning (CVPR 2017) [pdf](https://openaccess.thecvf.com/content_cvpr_2017/papers/Chen_SCA-CNN_Spatial_and_CVPR_2017_paper.pdf) 🔥
- CBAM: convolutional block attention module (ECCV 2018) [pdf](https://openaccess.thecvf.com/content_ECCV_2018/papers/Sanghyun_Woo_Convolutional_Block_Attention_ECCV_2018_paper.pdf) 🔥
- Harmonious attention network for person re-identification (CVPR 2018) [pdf](https://arxiv.org/pdf/1802.08122.pdf) 🔥
- Recalibrating fully convolutional networks with spatial and channel “squeeze and excitation” blocks (TMI 2018) [pdf](https://arxiv.org/pdf/1808.08127.pdf)
- Mancs: A multi-task attentional network with curriculum sampling for person re-identification (ECCV 2018) [pdf](https://www.ecva.net/papers/eccv_2018/papers_ECCV/papers/Cheng_Wang_Mancs_A_Multi-task_ECCV_2018_paper.pdf) 🔥
- Bam: Bottleneck attention module (BMVC 2018) [pdf](http://bmvc2018.org/contents/papers/0092.pdf) 🔥
- Pvnet: A joint convolutional network of point cloud and multi-view for 3d shape recognition (ACM MM 2018) [pdf](https://arxiv.org/pdf/1808.07659.pdf)
- Learning what and where to attend (ICLR 2019) [pdf](https://openreview.net/pdf?id=BJgLg3R9KQ)
- Dual attention network for scene segmentation (CVPR 2019) [pdf](https://openaccess.thecvf.com/content_CVPR_2019/papers/Fu_Dual_Attention_Network_for_Scene_Segmentation_CVPR_2019_paper.pdf) 🔥
- Abd-net: Attentive but diverse person re-identification (ICCV 2019) [pdf](https://openaccess.thecvf.com/content_ICCV_2019/papers/Chen_ABD-Net_Attentive_but_Diverse_Person_Re-Identification_ICCV_2019_paper.pdf)
- Mixed high-order attention network for person re-identification (ICCV 2019) [pdf](https://arxiv.org/pdf/1908.05819.pdf)
- Mlcvnet: Multi-level context votenet for 3d object detection (CVPR 2020) [pdf](https://openaccess.thecvf.com/content_CVPR_2020/papers/Xie_MLCVNet_Multi-Level_Context_VoteNet_for_3D_Object_Detection_CVPR_2020_paper.pdf)
- Improving convolutional networks with self-calibrated convolutions (CVPR 2020) [pdf](https://openaccess.thecvf.com/content_CVPR_2020/papers/Liu_Improving_Convolutional_Networks_With_Self-Calibrated_Convolutions_CVPR_2020_paper.pdf)
- Relation-aware global attention for person re-identification (CVPR 2020) [pdf](https://openaccess.thecvf.com/content_CVPR_2020/papers/Zhang_Relation-Aware_Global_Attention_for_Person_Re-Identification_CVPR_2020_paper.pdf)
- Strip Pooling: Rethinking spatial pooling for scene parsing (CVPR 2020) [pdf](https://openaccess.thecvf.com/content_CVPR_2020/papers/Hou_Strip_Pooling_Rethinking_Spatial_Pooling_for_Scene_Parsing_CVPR_2020_paper.pdf)
- Rotate to attend: Convolutional triplet attention module (WACV 2021) [pdf](https://arxiv.org/pdf/2010.03045.pdf)
- Coordinate attention for efficient mobile network design (CVPR 2021) [pdf](https://openaccess.thecvf.com/content/CVPR2021/papers/Hou_Coordinate_Attention_for_Efficient_Mobile_Network_Design_CVPR_2021_paper.pdf)
- Simam: A simple, parameter-free attention module for convolutional neural networks (ICML 2021) [pdf](http://proceedings.mlr.press/v139/yang21o/yang21o.pdf)

## SpatialTemporal attention

- An end-to-end spatio-temporal attention model for human action recognition from skeleton data (AAAI 2017) [pdf](https://arxiv.org/pdf/1611.06067.pdf) 🔥
- Diversity regularized spatiotemporal attention for video-based person re-identification (ArXiv 2018) 🔥
- Interpretable spatio-temporal attention for video action recognition (ICCVW 2019) [pdf](https://openaccess.thecvf.com/content_ICCVW_2019/papers/HVU/Meng_Interpretable_Spatio-Temporal_Attention_for_Video_Action_Recognition_ICCVW_2019_paper.pdf)
- A Simple Baseline for Audio-Visual Scene-Aware Dialog (CVPR 2019) [pdf](https://arxiv.org/pdf/1904.05876v1.pdf)
- Hierarchical lstms with adaptive attention for visual captioning (TPAMI 2020) [pdf](https://arxiv.org/pdf/1812.11004.pdf)
- Stat: Spatial-temporal attention mechanism for video captioning (TMM 2020) [pdf_link](https://ieeexplore.ieee.org/abstract/document/8744407)
- Gta: Global temporal attention for video action understanding (ArXiv 2020) [pdf](https://arxiv.org/pdf/2012.08510.pdf)
- Multi-granularity reference-aided attentive feature aggregation for video-based person re-identification (CVPR 2020) [pdf](https://arxiv.org/pdf/2003.12224.pdf)
- Read: Reciprocal attention discriminator for image-to-video re-identification (ECCV 2020) [pdf](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123590324.pdf)
- Decoupled spatial-temporal transformer for video inpainting (ArXiv 2021) [pdf](https://arxiv.org/pdf/2104.06637.pdf)
- Towards Coherent Visual Storytelling with Ordered Image Attention (ArXiv 2021) [pdf](https://arxiv.org/pdf/2108.02180)