# Multimodal-Sentiment-Analysis

**Repository Path**: httaowjqwfo/Multimodal-Sentiment-Analysis

## Basic Information

- **Project Name**: Multimodal-Sentiment-Analysis
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 1
- **Forks**: 1
- **Created**: 2025-03-14
- **Last Updated**: 2025-04-18

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# Multimodal-Sentiment-Analysis

Multimodal sentiment analysis with several fusion methods built on BERT + ResNet50; code for the fifth lab of the Artificial Intelligence course, School of Data Science.

The project is implemented with Hugging Face Transformers and torchvision and provides five fusion methods (two naive, three attention-based); see the Models folder.

## Project Structure

```
|-- Multimodal-Sentiment-Analysis
    |-- Config.py
    |-- main.py
    |-- README.md
    |-- requirements.txt
    |-- Trainer.py
    |-- data
    |   |-- .DS_Store
    |   |-- test.json
    |   |-- test_without_label.txt
    |   |-- train.json
    |   |-- train.txt
    |   |-- data
    |-- Models
    |   |-- CMACModel.py
    |   |-- HSTECModel.py
    |   |-- NaiveCatModel.py
    |   |-- NaiveCombineModel.py
    |   |-- OTEModel.py
    |   |-- __init__.py
    |-- src
    |   |-- CrossModalityAttentionCombineModel.png
    |   |-- HiddenStateTransformerEncoderCombineModel.png
    |   |-- OutputTransformerEncoderModel.png
    |-- utils
        |-- common.py
        |-- DataProcess.py
        |-- __init__.py
        |-- APIs
        |   |-- APIDataset.py
        |   |-- APIDecode.py
        |   |-- APIEncode.py
        |   |-- APIMetric.py
        |   |-- __init__.py
```

## Requirements

```
chardet==4.0.0
numpy==1.22.2
Pillow==9.2.0
scikit_learn==1.1.1
torch==1.8.2
torchvision==0.9.2
tqdm==4.63.0
transformers==4.18.0
```

```shell
pip install -r requirements.txt
```

## Model

The two naive methods have no diagram; hedged sketches of the naive and attention fusion ideas follow the figures below.

**CrossModalityAttentionCombine**

![CrossModalityAttentionCombineModel](./src/CrossModalityAttentionCombineModel.png)

**HiddenStateTransformerEncoder**

![HiddenStateTransformerEncoderCombineModel](./src/HiddenStateTransformerEncoderCombineModel.png)

**OutputTransformerEncoder**

![OutputTransformerEncoderModel](./src/OutputTransformerEncoderModel.png)
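Since the two naive fusions are not diagrammed, here is a minimal sketch of the concatenation idea behind `NaiveCatModel`: encode the text with a pretrained BERT/RoBERTa, encode the image with ResNet50, concatenate the two pooled feature vectors, and classify. The class name `NaiveCatFusion` and the exact layer layout are illustrative assumptions, not the repository's actual code.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50
from transformers import AutoModel


class NaiveCatFusion(nn.Module):
    """Hypothetical sketch of naive concatenation fusion (not the repo's NaiveCatModel)."""

    def __init__(self, bert_name='roberta-base', num_labels=3, out_hidden_size=128):
        super().__init__()
        self.text_encoder = AutoModel.from_pretrained(bert_name)
        backbone = resnet50(pretrained=True)
        # Drop the final fc layer; keep the pooled (B, 2048, 1, 1) feature map.
        self.img_encoder = nn.Sequential(*list(backbone.children())[:-1])
        fuse_dim = self.text_encoder.config.hidden_size + 2048
        self.classifier = nn.Sequential(
            nn.Linear(fuse_dim, out_hidden_size),
            nn.ReLU(),
            nn.Linear(out_hidden_size, num_labels),
        )

    def forward(self, input_ids, attention_mask, images):
        # Take the [CLS] position of the last hidden state as the text feature.
        text_feat = self.text_encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state[:, 0]
        img_feat = self.img_encoder(images).flatten(1)  # (B, 2048)
        return self.classifier(torch.cat([text_feat, img_feat], dim=-1))
```

Judging by its name, `NaiveCombineModel` presumably differs by classifying each modality separately and combining the two outputs rather than the features.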
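The three attention variants replace plain concatenation with transformer-style interaction between the modalities. The core cross-modality attention step can be sketched as follows, reusing `middle_hidden_size = 64`, `attention_nhead = 8`, and `attention_dropout = 0.4` from `Config`. The sketch assumes both modalities have already been projected to the same hidden size; all names are illustrative rather than the repository's `CMACModel`.

```python
import torch
import torch.nn as nn


class CrossModalAttentionFusion(nn.Module):
    """Hypothetical sketch of cross-modality attention fusion (not the repo's CMACModel)."""

    def __init__(self, hidden_size=64, nhead=8, num_labels=3, dropout=0.4):
        super().__init__()
        # Text tokens attend over image regions, and image regions over text tokens.
        self.text_to_img = nn.MultiheadAttention(hidden_size, nhead, dropout=dropout)
        self.img_to_text = nn.MultiheadAttention(hidden_size, nhead, dropout=dropout)
        self.classifier = nn.Linear(2 * hidden_size, num_labels)

    def forward(self, text_seq, img_seq):
        # text_seq: (L_t, B, H) projected BERT hidden states
        # img_seq:  (L_i, B, H) projected ResNet feature-map positions
        t_attn, _ = self.text_to_img(query=text_seq, key=img_seq, value=img_seq)
        i_attn, _ = self.img_to_text(query=img_seq, key=text_seq, value=text_seq)
        # Mean-pool each attended sequence, concatenate, and classify.
        fused = torch.cat([t_attn.mean(dim=0), i_attn.mean(dim=0)], dim=-1)
        return self.classifier(fused)
```

Judging by their names and the diagrams above, HiddenStateTransformerEncoder and OutputTransformerEncoder instead feed the stacked hidden states or the two encoder outputs through a transformer encoder before classification.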
## Train

Download the dataset and extract it into the data folder. Dataset link: https://pan.baidu.com/s/10fOExXqSCS4NmIjfsfuo9w?pwd=gqzm (extraction code: gqzm)

```shell
python main.py --do_train --epoch 10 --text_pretrained_model roberta-base --fuse_model_type OTE
```

For a single-modality run, add `--text_only` or `--img_only`.

`--fuse_model_type` accepts CMAC, HSTEC, OTE, NaiveCat, or NaiveCombine; `--text_pretrained_model` can be any suitable pretrained model from the Hugging Face Hub.

## Test

```shell
python main.py --do_test --text_pretrained_model roberta-base --fuse_model_type OTE --load_model_path $your_model_path$
```

As with training, add `--text_only` or `--img_only` for a single-modality run.

## Config

```python
import os


class config:
    # Root directory and data paths
    root_path = os.getcwd()
    data_dir = os.path.join(root_path, './data/data/')
    train_data_path = os.path.join(root_path, 'data/train.json')
    test_data_path = os.path.join(root_path, 'data/test.json')
    output_path = os.path.join(root_path, 'output')
    output_test_path = os.path.join(output_path, 'test.txt')
    load_model_path = None

    # General hyperparameters
    epoch = 20
    learning_rate = 3e-5
    weight_decay = 0
    num_labels = 3
    loss_weight = [1.68, 9.3, 3.36]

    # Fusion-related
    fuse_model_type = 'NaiveCombine'
    only = None
    middle_hidden_size = 64
    attention_nhead = 8
    attention_dropout = 0.4
    fuse_dropout = 0.5
    out_hidden_size = 128

    # BERT-related
    fixed_text_model_params = False
    bert_name = 'roberta-base'
    bert_learning_rate = 5e-6
    bert_dropout = 0.2

    # ResNet-related
    fixed_img_model_params = False
    image_size = 224
    fixed_image_model_params = True
    resnet_learning_rate = 5e-6
    resnet_dropout = 0.2
    img_hidden_seq = 64

    # DataLoader params
    checkout_params = {'batch_size': 4, 'shuffle': False}
    train_params = {'batch_size': 16, 'shuffle': True, 'num_workers': 2}
    val_params = {'batch_size': 16, 'shuffle': False, 'num_workers': 2}
    test_params = {'batch_size': 8, 'shuffle': False, 'num_workers': 2}
```

## Result

| Model                         | Acc (%)    |
| ----------------------------- | ---------- |
| NaiveCat                      | 71.25      |
| NaiveCombine                  | 73.625     |
| CrossModalityAttentionCombine | 67.1875    |
| HiddenStateTransformerEncoder | 73.125     |
| **OutputTransformerEncoder**  | **74.625** |

#### Ablation Study

OutputTransformerEncoderModel results, with the other modality's input replaced by an empty string or a blank image:

| Feature    | Acc (%) |
| ---------- | ------- |
| Text Only  | 71.875  |
| Image Only | 63      |

## Reference

Transfer Learning with Joint Fine-Tuning for Multimodal Sentiment Analysis (Guilherme L. Toledo, Ricardo M. Marcacini; LXAI Research Workshop at ICML 2022): https://github.com/guitld/Transfer-Learning-with-Joint-Fine-Tuning-for-Multimodal-Sentiment-Analysis

Is cross-attention preferable to self-attention for multi-modal emotion recognition? (IEEE ICASSP 2022): https://github.com/smartcameras/SelfCrossAttn

Multimodal Sentiment Analysis With Image-Text Interaction Network (IEEE Xplore): https://ieeexplore.ieee.org/abstract/document/9736584/

CLMLF: https://github.com/Link-Li/CLMLF