# Awesome-LLM

**Repository Path**: knifecms/awesome-llm

## Basic Information

- **Project Name**: Awesome-LLM
- **Description**: Awesome large-model research
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 1
- **Forks**: 0
- **Created**: 2025-06-05
- **Last Updated**: 2025-06-08

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Major Advances in Large Language Models (LLMs) 🚀

[![Awesome](https://awesome.re/badge.svg)](https://awesome.re)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](http://makeapullrequest.com)

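> A curated list of major advances in large language models (LLMs), focusing on the algorithmic improvements, architectural innovations, and training techniques that have shaped the field.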

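## 📖 Table of Contents

- [Foundation Models](#-foundation-models)
- [Architectural Innovations](#-architectural-innovations)
- [Training Techniques](#-training-techniques)
- [Efficiency Improvements](#-efficiency-improvements)
- [Reasoning and In-Context Learning](#-reasoning-and-in-context-learning)
- [Multimodal Models](#-multimodal-models)
- [Evaluation and Alignment](#-evaluation-and-alignment)
- [Open-Source Models](#-open-source-models)
- [Tools and Frameworks](#-tools-and-frameworks)
- [Tutorials and Resources](#-tutorials-and-resources)
- [Community and Events](#-community-and-events)
- [Contributing](#-contributing)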

## 🏗️ Foundation Models

### The Transformer Era
- **Attention Is All You Need** (2017) - [Paper](https://arxiv.org/abs/1706.03762) | [Code](https://github.com/tensorflow/tensor2tensor)
  - Vaswani et al. introduce the Transformer architecture, the foundation of modern LLMs.

- **GPT** (2018) - [Paper](https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf) | [Code](https://github.com/openai/finetune-transformer-lm)
  - OpenAI's first generative pre-trained Transformer.

- **GPT-2** (2019) - [Paper](https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) | [Code](https://github.com/openai/gpt-2)
  - With 1.5 billion parameters, it demonstrates what large-scale language models can do.

- **GPT-3** (2020) - [Paper](https://arxiv.org/abs/2005.14165) | [API](https://openai.com/api/)
  - With 175 billion parameters, it demonstrates few-shot learning.

- **BERT** (2018) - [Paper](https://arxiv.org/abs/1810.04805) | [Code](https://github.com/google-research/bert)
  - Google AI's bidirectional encoder representation model, which popularized the pre-train-then-fine-tune paradigm.

- **T5** (2019) - [Paper](https://arxiv.org/abs/1910.10683) | [Code](https://github.com/google-research/text-to-text-transfer-transformer)
  - A unified text-to-text framework that casts diverse NLP tasks as text generation.

- **PaLM** (2022) - [Paper](https://arxiv.org/abs/2204.02311) | [Blog](https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html)
  - Google's Pathways Language Model, with 540 billion parameters.

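- **DeepSeek** (2023) - [Website](https://deepseek.com/) | [GitHub](https://github.com/deepseek-ai)
  - Open-source large models from the DeepSeek team, including the DeepSeek-Coder and DeepSeek-LLM series.
  - Strong at code generation and mathematical reasoning. The 128K context length applies to later releases such as DeepSeek-V2; the 2023 DeepSeek-LLM models were trained with a 4K context.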
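
The core operation behind the Transformer entry above is scaled dot-product attention, `softmax(Q K^T / sqrt(d_k)) V`. The following is a minimal single-head sketch in plain Python, without masking, batching, or learned projections. The function names and the toy inputs are assumptions made for illustration and are not taken from any of the linked repositories.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Single-head scaled dot-product attention over lists of vectors."""
    d_k = len(keys[0])
    dim_v = len(values[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)]
        )
    return outputs


# Toy example (hypothetical data): the query matches the first key best.
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
out = attention(Q, K, V)
```

Because the query is closer to the first key, the first value vector receives the larger weight in `out[0]`.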

## 🏛️ Architectural Innovations

### Attention Mechanisms
- **Sparse Transformers** (2019) - [Paper](https://arxiv.org/abs/1904.10509) | [Code](https://github.com/openai/sparse_attention)
  - Introduces sparse attention patterns that make longer sequences tractable.

- **Reformer** (2020) - [Paper](https://arxiv.org/abs/2001.04451) | [Code](https://github.com/google/trax/tree/master/trax/models/reformer)
  - Uses locality-sensitive hashing to make attention efficient.

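- **RETRO** (2021) - [Paper](https://arxiv.org/abs/2112.04426)
  - DeepMind's Retrieval-Enhanced Transformer, which conditions generation on text chunks retrieved from a large external database, combining parametric knowledge with non-parametric retrieval.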

- **RetNet** (2023) - [Paper](https://arxiv.org/abs/2307.08621) | [Code](https://github.com/microsoft/unilm/tree/master/retnet)
  - A Microsoft architecture that supports both parallel training and recurrent inference.
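
To make "sparse attention pattern" concrete, the sketch below builds a boolean mask in the spirit of the strided pattern from the Sparse Transformers entry: each position attends to a short local window plus every `stride`-th earlier position. It is a simplification that merges the two patterns into one mask, and the function name and parameters are hypothetical.

```python
def strided_sparse_mask(seq_len, stride):
    """Return mask[i][j] == True when position i may attend to position j."""
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            causal = j <= i                    # never look at future tokens
            local = (i - j) < stride           # the most recent `stride` tokens
            strided = (i - j) % stride == 0    # every stride-th earlier token
            row.append(causal and (local or strided))
        mask.append(row)
    return mask


mask = strided_sparse_mask(seq_len=8, stride=3)
# Positions the last token (index 7) is allowed to attend to.
allowed_for_last = [j for j, ok in enumerate(mask[7]) if ok]  # [1, 4, 5, 6, 7]
```

With this pattern each row has on the order of `stride + seq_len / stride` allowed positions instead of `seq_len`, which is where the savings over dense attention come from.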

### Mixture-of-Experts Models
- **Switch Transformers** (2021) - [Paper](https://arxiv.org/abs/2101.03961) | [Code](https://github.com/google/flax/tree/main/flax/linen/experimental)
  - A sparse mixture-of-experts model that scales model size more efficiently.

- **Mixtral 8x7B** (2023) - [Blog](https://mistral.ai/news/mixtral-of-experts/) | [Model](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1)
  - Mistral AI's sparse mixture-of-experts model, with performance comparable to 70B-parameter dense models.
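
The idea shared by the mixture-of-experts entries above is that a router sends each token to only a few experts, so most parameters stay idle for any given token. Below is a minimal sketch of top-k routing for a single scalar "token". Renormalizing the selected weights is one common choice and not necessarily what either model does, and the expert functions and logits are made up for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    # Renormalize so the selected experts' weights sum to 1 (an assumption).
    return [(i, probs[i] / norm) for i in top]


def moe_layer(x, router_logits, experts, k):
    """Combine the outputs of only the selected experts."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))


# Hypothetical experts and router scores for one token.
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: -x, lambda x: x * x]
logits = [2.0, 0.5, -1.0, 1.0]

top1 = route_top_k(logits, k=1)   # single-expert routing
top2 = route_top_k(logits, k=2)   # two-expert routing
y = moe_layer(3.0, logits, experts, k=1)
```

With `k=1` only expert 0 runs; with `k=2` experts 0 and 3 run and their outputs are blended by the renormalized router weights.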

## 🎓 Training Techniques

### Scaling Laws
- **Scaling Laws for Neural Language Models** (2020) - [Paper](https://arxiv.org/abs/2001.08361)
  - An empirical study of how model performance scales with model size and compute.
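
Scaling-law studies of this kind typically fit power laws, for example a loss of the form `L(N) = (N_c / N) ** alpha` in the parameter count `N`. The sketch below evaluates such a curve. The constants are illustrative placeholders chosen for this example, so consult the paper for the fitted values.

```python
def power_law_loss(n_params, n_c, alpha):
    """Loss predicted by a power law in the number of parameters."""
    return (n_c / n_params) ** alpha


# Illustrative constants only (assumptions, not values quoted from the paper).
ALPHA = 0.076
N_C = 8.8e13

loss_1b = power_law_loss(1e9, N_C, ALPHA)
loss_2b = power_law_loss(2e9, N_C, ALPHA)

# Under a pure power law, doubling N multiplies the loss by 2 ** -alpha,
# regardless of the starting size.
ratio = loss_2b / loss_1b
```

A small exponent means each doubling of model size buys only a few percent of loss reduction, which is why such curves are usually plotted on log-log axes.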

### Instruction Tuning
- **FLAN** (2021) - [Paper](https://arxiv.org/abs/2109.01652)
  - Improves zero-shot and few-shot learning through instruction tuning.

## ⚡ Efficiency Improvements

### Model Compression
- **Knowledge Distillation** (2015) - [Paper](https://arxiv.org/abs/1503.02531)
  - Trains a small model to mimic the behavior of a large one.
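
One common way to express "mimic the behavior" is to match the teacher's temperature-softened output distribution. The sketch below computes a KL-divergence distillation loss for one example, scaled by `temperature ** 2` to keep gradient magnitudes comparable across temperatures. The function names, default temperature, and logits are assumptions for illustration.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Softmax of logits / temperature; higher temperature gives softer outputs."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax_with_temperature(teacher_logits, temperature)
    q = softmax_with_temperature(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return kl * temperature ** 2


# Hypothetical logits for a 3-class problem.
teacher = [4.0, 1.0, 0.5]
loss_matching = distillation_loss([4.0, 1.0, 0.5], teacher)
loss_mismatched = distillation_loss([0.5, 1.0, 4.0], teacher)
```

In practice this term is usually mixed with the ordinary cross-entropy loss on the ground-truth labels.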

### Efficient Attention
- **Linformer** (2020) - [Paper](https://arxiv.org/abs/2006.04768) | [Code](https://github.com/tatp22/linformer-pytorch)
  - Self-attention with complexity linear in the sequence length.

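## 🧠 Reasoning and In-Context Learning

### Chain of Thought
- **Chain-of-Thought Prompting** (2022) - [Paper](https://arxiv.org/abs/2201.11903)
  - Prompts the model with worked intermediate steps so that it can carry out multi-step reasoning.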
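
Chain-of-thought prompting is a prompt-construction technique: each few-shot example includes its reasoning before the answer, and the final question is left open. The sketch below assembles such a prompt. The field names, the prompt layout, and the example problem are hypothetical and are not copied from the paper.

```python
def build_cot_prompt(examples, question):
    """Build a few-shot prompt whose examples spell out their reasoning."""
    parts = []
    for ex in examples:
        parts.append(
            f"Q: {ex['question']}\n"
            f"A: {ex['reasoning']} The answer is {ex['answer']}."
        )
    # The final question is left unanswered for the model to complete.
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


# Hypothetical worked example.
examples = [
    {
        "question": "A shelf holds 3 boxes with 4 books each. How many books are there?",
        "reasoning": "There are 3 boxes and each has 4 books, so 3 * 4 = 12.",
        "answer": "12",
    }
]
prompt = build_cot_prompt(
    examples, "A bus has 5 rows of 6 seats. How many seats are there?"
)
```

The resulting string can be sent to any text-completion model; the worked example nudges it to write out its own steps before the final answer.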

## 🖼️ Multimodal Models

### Vision-Language Models
- **CLIP** (2021) - [Paper](https://arxiv.org/abs/2103.00020) | [Code](https://github.com/openai/CLIP)
  - Contrastive Language-Image Pre-training.
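
Contrastive pre-training of this kind pulls matching image and text embeddings together and pushes mismatched pairs apart. The sketch below computes a symmetric cross-entropy loss over a cosine-similarity matrix for a tiny batch. It assumes a fixed temperature and precomputed embeddings, and all names and data are illustrative rather than taken from the CLIP codebase.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy(logits, target):
    """Cross-entropy of a logit row against one target index (stable)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss; pair i of images and texts is the positive."""
    images = [normalize(v) for v in image_embs]
    texts = [normalize(v) for v in text_embs]
    n = len(images)
    sim = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]
    # Image-to-text: each row should pick out its own caption.
    loss_i2t = sum(cross_entropy(sim[i], i) for i in range(n)) / n
    # Text-to-image: each column should pick out its own image.
    loss_t2i = sum(
        cross_entropy([sim[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return (loss_i2t + loss_t2i) / 2


# Hypothetical 2-D embeddings for a batch of two pairs.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

The loss is near zero when each image lines up with its own caption and large when the captions are swapped.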

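## 📊 Evaluation and Alignment

### Alignment
- **InstructGPT** (2022) - [Paper](https://arxiv.org/abs/2203.02155)
  - Aligns language models to follow instructions using reinforcement learning from human feedback (RLHF).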
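
A central ingredient of learning from human feedback is a reward model trained on pairwise comparisons, with a loss of the form `-log(sigmoid(r_chosen - r_rejected))`. The sketch below implements only that pairwise loss for scalar rewards. It does not cover the later policy-optimization stage, and the function name and sample rewards are assumptions.

```python
import math


def pairwise_preference_loss(reward_chosen, reward_rejected):
    """-log(sigmoid(diff)) for diff = reward_chosen - reward_rejected."""
    diff = reward_chosen - reward_rejected
    # -log(sigmoid(d)) == log(1 + exp(-d)); branch to avoid overflow.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


# Hypothetical reward scores for the preferred and rejected responses.
loss_agrees = pairwise_preference_loss(2.0, 0.0)     # ranks the pair correctly
loss_undecided = pairwise_preference_loss(0.0, 0.0)  # no preference, log(2)
loss_disagrees = pairwise_preference_loss(0.0, 2.0)  # ranks the pair wrongly
```

The loss shrinks as the reward model scores the human-preferred response further above the rejected one.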

## 🐱‍💻 Open-Source Models

### International Open-Source Models
- **LLaMA** (2023) - [Paper](https://arxiv.org/abs/2302.13971) | [Code](https://github.com/facebookresearch/llama)
  - Foundation language models released by Meta, ranging from 7B to 65B parameters.

- **LLaMA 2** (2023) - [Paper](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/) | [Download](https://ai.meta.com/llama/)
  - The successor to LLaMA, available in 7B, 13B, and 70B sizes, with chat-tuned variants.

- **Mistral 7B** (2023) - [Blog](https://mistral.ai/news/announcing-mistral-7b/) | [Model](https://huggingface.co/mistralai/Mistral-7B-v0.1)
  - A 7B model that outperforms LLaMA 2 13B and uses grouped-query attention.
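
Grouped-query attention, mentioned in the Mistral 7B entry, lets several query heads share one key/value head, which shrinks the KV cache. The sketch below shows only the head-to-group mapping. The head counts of 32 and 8 are used as an illustrative assumption, and the function name is hypothetical.

```python
def kv_head_for_query_head(query_head, n_query_heads, n_kv_heads):
    """Index of the key/value head shared by the given query head."""
    if n_query_heads % n_kv_heads != 0:
        raise ValueError("n_query_heads must be a multiple of n_kv_heads")
    group_size = n_query_heads // n_kv_heads
    return query_head // group_size


# Illustrative configuration: 32 query heads sharing 8 key/value heads.
N_QUERY_HEADS = 32
N_KV_HEADS = 8
mapping = [
    kv_head_for_query_head(h, N_QUERY_HEADS, N_KV_HEADS)
    for h in range(N_QUERY_HEADS)
]

# The KV cache stores n_kv_heads instead of n_query_heads entries per token.
cache_reduction = N_QUERY_HEADS // N_KV_HEADS
```

Setting `n_kv_heads` equal to `n_query_heads` recovers standard multi-head attention, and setting it to 1 gives multi-query attention.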

### Chinese Open-Source Models
- **ChatGLM** (2023) - [GitHub](https://github.com/THUDM/ChatGLM-6B) | [Model](https://huggingface.co/THUDM/chatglm-6b)
  - A Chinese-English bilingual dialogue model developed by Zhipu AI.

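- **DeepSeek LLM** (2023) - [GitHub](https://github.com/deepseek-ai/DeepSeek-LLM) | [Model](https://huggingface.co/deepseek-ai)
  - DeepSeek's open-source 67B-parameter model, strong on Chinese and English tasks.
  - Performs well on multiple benchmarks. It was trained with a 4K context; 128K long-context support arrived in later DeepSeek releases.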

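- **Qwen (Tongyi Qianwen)** (2023) - [GitHub](https://github.com/QwenLM/Qwen) | [Model](https://huggingface.co/Qwen)
  - Alibaba's open-source model family, with sizes up to 72B parameters and support for Chinese and English.

- **Baichuan 2** (2023) - [GitHub](https://github.com/baichuan-inc/Baichuan2) | [Model](https://huggingface.co/baichuan-inc)
  - Baichuan Intelligence's open-source models in 7B and 13B sizes, each with base and chat versions.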

## 🛠️ Tools and Frameworks

### Training
- **DeepSpeed** - [GitHub](https://github.com/microsoft/DeepSpeed)
  - A deep learning optimization library for training very large models.

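## 📚 Tutorials and Resources

### Chinese LLMs
- **ERNIE Bot (Wenxin Yiyan)** - [Website](https://yiyan.baidu.com/)
  - Baidu's large language model, supporting multi-turn dialogue and content creation.

- **Tongyi Qianwen** - [Website](https://tongyi.aliyun.com/)
  - Alibaba's large language model, with strong multi-turn dialogue capabilities.

- **iFLYTEK Spark (Xinghuo)** - [Website](https://xinghuo.xfyun.cn/)
  - A cognitive large model from iFLYTEK.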

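### Chinese Tutorials
- **Mu Li's *Dive into Deep Learning*** - [Website](https://zh.d2l.ai/)
  - A classic deep learning textbook with PyTorch and MXNet implementations.

- ***Neural Networks and Deep Learning*** - [Website](https://nndl.github.io/)
  - A deep learning textbook by Prof. Xipeng Qiu.

- **Datawhale Study Materials** - [GitHub](https://github.com/datawhalechina/leedl-tutorials)
  - High-quality tutorials covering deep learning, reinforcement learning, and related areas.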

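### Chinese Blogs
- **Mu Li's Blog** - [Link](https://zh.d2l.ai/chapter_appendix-tools-for-deep-learning/blogs.html)
  - Analysis of frontier deep learning techniques.

- **Dr. Junlin Zhang's Blog** - [Zhihu](https://www.zhihu.com/people/zhang-jun-lin-76)
  - In-depth analysis of large-model techniques.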

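### Online Courses
- **Stanford CS324** - [Website](https://stanford-cs324.github.io/winter2022/)
  - Large language models: foundations and applications (in English).

- **Hung-yi Lee's Deep Learning** - [Bilibili](https://space.bilibili.com/511221970)
  - Deep learning course by Prof. Hung-yi Lee of National Taiwan University (in Chinese).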

## 🌐 Community and Events

### Conferences
- **ACL, EMNLP, ICLR, NeurIPS, ICML**

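## 🤝 Contributing

Contributions are welcome! Please read the [contributing guide](CONTRIBUTING.md) first.

## 📜 License

This project is released under the MIT License. See the [LICENSE](LICENSE) file for details.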


## 📰 Latest Papers (updated: 2025-06-08)

*Note: this section is generated automatically by a script and lists LLM-related papers posted to arXiv within the past week.*

### Architecture

- **Inference-Time Hyper-Scaling with KV Cache Compression**  
  *Adrian Łańcucki, Konrad Staniszewski, Piotr Nawrot et al.*  
  📅 2025-06-05 | [Paper](https://arxiv.org/pdf/2506.05345v1) | [Abstract](http://arxiv.org/abs/2506.05345v1)

- **Exploring Diffusion Transformer Designs via Grafting**  
  *Keshigeyan Chandrasegaran, Michael Poli, Daniel Y. Fu et al.*  
  📅 2025-06-05 | [Paper](https://arxiv.org/pdf/2506.05340v1) | [Abstract](http://arxiv.org/abs/2506.05340v1)  
  *22 pages; Project website: https://grafting.stanford.edu*

- **Kinetics: Rethinking Test-Time Scaling Laws**  
  *Ranajoy Sadhukhan, Zhuoming Chen, Haizhong Zheng et al.*  
  📅 2025-06-05 | [Paper](https://arxiv.org/pdf/2506.05333v1) | [Abstract](http://arxiv.org/abs/2506.05333v1)

- **Unleashing Hour-Scale Video Training for Long Video-Language Understanding**  
  *Jingyang Lin, Jialian Wu, Ximeng Sun et al.*  
  📅 2025-06-05 | [Paper](https://arxiv.org/pdf/2506.05332v1) | [Abstract](http://arxiv.org/abs/2506.05332v1)  
  *Project page: https://videomarathon.github.io/*

- **Generalizable, real-time neural decoding with hybrid state-space models**  
  *Avery Hee-Woon Ryoo, Nanda H. Krishna, Ximeng Mao et al.*  
  📅 2025-06-05 | [Paper](https://arxiv.org/pdf/2506.05320v1) | [Abstract](http://arxiv.org/abs/2506.05320v1)  
  *Preprint. Under review*

- **Improving Data Efficiency for LLM Reinforcement Fine-tuning Through Difficulty-targeted Online Data Selection and Rollout Replay**  
  *Yifan Sun, Jingyan Shen, Yibin Wang et al.*  
  📅 2025-06-05 | [Paper](https://arxiv.org/pdf/2506.05316v1) | [Abstract](http://arxiv.org/abs/2506.05316v1)

- **Power Law Guided Dynamic Sifting for Efficient Attention**  
  *Nirav Koley, Prajwal Singhania, Abhinav Bhatele*  
  📅 2025-06-05 | [Paper](https://arxiv.org/pdf/2506.05300v1) | [Abstract](http://arxiv.org/abs/2506.05300v1)

- **Sample Complexity and Representation Ability of Test-time Scaling Paradigms**  
  *Baihe Huang, Shanda Li, Tianhao Wu et al.*  
  📅 2025-06-05 | [Paper](https://arxiv.org/pdf/2506.05295v1) | [Abstract](http://arxiv.org/abs/2506.05295v1)

### Training

- **Why LLM Safety Guardrails Collapse After Fine-tuning: A Similarity Analysis Between Alignment and Fine-tuning Datasets**  
  *Lei Hsiung, Tianyu Pang, Yung-Chen Tang et al.*  
  📅 2025-06-05 | [Paper](https://arxiv.org/pdf/2506.05346v1) | [Abstract](http://arxiv.org/abs/2506.05346v1)  
  *Project Page: https://hsiung.cc/llm-similarity-risk/*

- **Refer to Anything with Vision-Language Prompts**  
  *Shengcao Cao, Zijun Wei, Jason Kuen et al.*  
  📅 2025-06-05 | [Paper](https://arxiv.org/pdf/2506.05342v1) | [Abstract](http://arxiv.org/abs/2506.05342v1)

- **Flattery, Fluff, and Fog: Diagnosing and Mitigating Idiosyncratic Biases in Preference Models**  
  *Anirudh Bharadwaj, Chaitanya Malaviya, Nitish Joshi et al.*  
  📅 2025-06-05 | [Paper](https://arxiv.org/pdf/2506.05339v1) | [Abstract](http://arxiv.org/abs/2506.05339v1)  
  *Code and data available at https://github.com/anirudhb123/preference-model-biases*

- **LSM-2: Learning from Incomplete Wearable Sensor Data**  
  *Maxwell A. Xu, Girish Narayanswamy, Kumar Ayush et al.*  
  📅 2025-06-05 | [Paper](https://arxiv.org/pdf/2506.05321v1) | [Abstract](http://arxiv.org/abs/2506.05321v1)  
  *Xu and Narayanswamy are co-first authors. McDuff and Liu are co-last authors*

- **Constrained Entropic Unlearning: A Primal-Dual Framework for Large Language Models**  
  *Taha Entesari, Arman Hatami, Rinat Khaziev et al.*  
  📅 2025-06-05 | [Paper](https://arxiv.org/pdf/2506.05314v1) | [Abstract](http://arxiv.org/abs/2506.05314v1)

- **Learning normalized image densities via dual score matching**  
  *Florentin Guth, Zahra Kadkhodaie, Eero P Simoncelli*  
  📅 2025-06-05 | [Paper](https://arxiv.org/pdf/2506.05310v1) | [Abstract](http://arxiv.org/abs/2506.05310v1)

- **ProRefine: Inference-time Prompt Refinement with Textual Feedback**  
  *Deepak Pandita, Tharindu Cyril Weerasooriya, Ankit Parag Shah et al.*  
  📅 2025-06-05 | [Paper](https://arxiv.org/pdf/2506.05305v1) | [Abstract](http://arxiv.org/abs/2506.05305v1)

### Reasoning

- **Direct Numerical Layout Generation for 3D Indoor Scene Synthesis via Spatial Reasoning**  
  *Xingjian Ran, Yixuan Li, Linning Xu et al.*  
  📅 2025-06-05 | [Paper](https://arxiv.org/pdf/2506.05341v1) | [Abstract](http://arxiv.org/abs/2506.05341v1)  
  *Project Page: https://directlayout.github.io/*

### Alignment and Safety

- **Seeing the Invisible: Machine learning-Based QPI Kernel Extraction via Latent Alignment**  
  *Yingshuai Ji, Haomin Zhuang, Matthew Toole et al.*  
  📅 2025-06-05 | [Paper](https://arxiv.org/pdf/2506.05325v1) | [Abstract](http://arxiv.org/abs/2506.05325v1)

- **Control Tax: The Price of Keeping AI in Check**  
  *Mikhail Terekhov, Zhen Ning David Liu, Caglar Gulcehre et al.*  
  📅 2025-06-05 | [Paper](https://arxiv.org/pdf/2506.05296v1) | [Abstract](http://arxiv.org/abs/2506.05296v1)

### Applications

- **Time to Talk: LLM Agents for Asynchronous Group Communication in Mafia Games**  
  *Niv Eckhaus, Uri Berger, Gabriel Stanovsky*  
  📅 2025-06-05 | [Paper](https://arxiv.org/pdf/2506.05309v1) | [Abstract](http://arxiv.org/abs/2506.05309v1)

### Other

- **Search Arena: Analyzing Search-Augmented LLMs**  
  *Mihran Miroyan, Tsung-Han Wu, Logan King et al.*  
  📅 2025-06-05 | [Paper](https://arxiv.org/pdf/2506.05334v1) | [Abstract](http://arxiv.org/abs/2506.05334v1)  
  *Preprint. Code: https://github.com/lmarena/search-arena. Dataset: https://huggingface.co/datasets/lmarena-ai/search-arena-24k*