# zerosearch
**Repository Path**: mirrors/zerosearch
## Basic Information
- **Project Name**: zerosearch
- **Description**: ZeroSearch is a novel reinforcement learning framework that incentivizes the search capability of LLMs without requiring interaction with a real search engine
- **Primary Language**: Python
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: https://www.oschina.net/p/zerosearch
- **GVP Project**: No
## Statistics
- **Stars**: 3
- **Forks**: 2
- **Created**: 2025-05-09
- **Last Updated**: 2025-12-06
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
ZeroSearch: Incentivize the Search Capability of LLMs without Searching
Hao Sun, Zile Qiao, Jiayan Guo, Xuanbo Fan, Yingyan Hou
Yong Jiang, Pengjun Xie, Yan Zhang, Fei Huang, Jingren Zhou
Tongyi Lab, Alibaba Group
# 🔥 News
- **[2025.06.08]** Released the [simulation LLMs](https://huggingface.co/collections/sunhaonlp/simulation-llm-wiki-v2-6857b06122425526d82a42d4) and [policy models](https://huggingface.co/collections/sunhaonlp/zerosearch-policy-wiki-v2-68442dce61d2e68f6623e500) compatible with Wikipedia Search.
- **[2025.05.17]** Released the [simulation LLMs](https://huggingface.co/collections/sunhaonlp/simulation-llm-google-v2-6827f4e45bca955ed2b2d0ba) and [policy models](https://huggingface.co/collections/sunhaonlp/zerosearch-policy-google-v2-6827f4ee6b6265069d443d4e) compatible with Google Search.
- **[2025.05.17]** Released the [simulation tuning dataset](https://huggingface.co/datasets/sunhaonlp/SimulationTuning_dataset).
- **[2025.05.17]** Added support for three RL algorithms: REINFORCE, GRPO, and PPO.
- **[2025.05.08]** Released the initial codebase and paper.
# 🤗 Resources
| Retriever | Simulation Tuning Dataset | Simulation LLMs | Policy Models |
| --------- | ------------------------- | --------------- | ------------- |
| Wikipedia | •[SimulationTuning\_wiki\_dataset](https://huggingface.co/datasets/sunhaonlp/SimulationTuning_wiki_dataset) | •[Simulation\_LLM\_wiki\_3B\_V2](https://huggingface.co/sunhaonlp/Simulation_LLM_wiki_3B_V2)<br>•[Simulation\_LLM\_wiki\_7B\_V2](https://huggingface.co/sunhaonlp/Simulation_LLM_wiki_7B_V2)<br>•[Simulation\_LLM\_wiki\_14B\_V2](https://huggingface.co/sunhaonlp/Simulation_LLM_wiki_14B_V2) | •[ZeroSearch\_wiki\_V2\_Qwen2.5\_3B](https://huggingface.co/Alibaba-NLP/ZeroSearch_wiki_V2_Qwen2.5_3B)<br>•[ZeroSearch\_wiki\_V2\_Qwen2.5\_3B\_Instruct](https://huggingface.co/Alibaba-NLP/ZeroSearch_wiki_V2_Qwen2.5_3B_Instruct)<br>•[ZeroSearch\_wiki\_V2\_Llama\_3.2\_3B](https://huggingface.co/Alibaba-NLP/ZeroSearch_wiki_V2_Llama_3.2_3B)<br>•[ZeroSearch\_wiki\_V2\_Llama\_3.2\_3B\_Instruct](https://huggingface.co/Alibaba-NLP/ZeroSearch_wiki_V2_Llama_3.2_3B_Instruct)<br>•[ZeroSearch\_wiki\_V2\_Qwen2.5\_7B](https://huggingface.co/Alibaba-NLP/ZeroSearch_wiki_V2_Qwen2.5_7B)<br>•[ZeroSearch\_wiki\_V2\_Qwen2.5\_7B\_Instruct](https://huggingface.co/Alibaba-NLP/ZeroSearch_wiki_V2_Qwen2.5_7B_Instruct) |
| Google | •[SimulationTuning\_google\_dataset](https://huggingface.co/datasets/sunhaonlp/SimulationTuning_google_dataset) | •[Simulation\_LLM\_google\_3B\_V2](https://huggingface.co/sunhaonlp/Simulation_LLM_google_3B_V2)<br>•[Simulation\_LLM\_google\_7B\_V2](https://huggingface.co/sunhaonlp/Simulation_LLM_google_7B_V2)<br>•[Simulation\_LLM\_google\_14B\_V2](https://huggingface.co/sunhaonlp/Simulation_LLM_google_14B_V2) | •[ZeroSearch\_google\_V2\_Qwen2.5\_3B](https://huggingface.co/Alibaba-NLP/ZeroSearch_google_V2_Qwen2.5_3B)<br>•[ZeroSearch\_google\_V2\_Qwen2.5\_3B\_Instruct](https://huggingface.co/Alibaba-NLP/ZeroSearch_google_V2_Qwen2.5_3B_Instruct)<br>•[ZeroSearch\_google\_V2\_Llama\_3.2\_3B](https://huggingface.co/Alibaba-NLP/ZeroSearch_google_V2_Llama_3.2_3B)<br>•[ZeroSearch\_google\_V2\_Llama\_3.2\_3B\_Instruct](https://huggingface.co/Alibaba-NLP/ZeroSearch_google_V2_Llama_3.2_3B_Instruct)<br>•[ZeroSearch\_google\_V2\_Qwen2.5\_7B](https://huggingface.co/Alibaba-NLP/ZeroSearch_google_V2_Qwen2.5_7B)<br>•[ZeroSearch\_google\_V2\_Qwen2.5\_7B\_Instruct](https://huggingface.co/Alibaba-NLP/ZeroSearch_google_V2_Qwen2.5_7B_Instruct) |
# 📌 Introduction
- We propose ZeroSearch, a novel reinforcement learning framework that incentivizes the capability of LLMs to use a real search engine while training only against simulated searches.
- Through supervised fine-tuning, we transform the LLM into a retrieval module capable of generating both relevant and noisy documents in response to a query. We further introduce a curriculum rollout mechanism that progressively elicits the model's reasoning ability by exposing it to increasingly challenging retrieval scenarios (sketched after this list).
- We conduct extensive experiments on both in-domain and out-of-domain datasets. Results show that ZeroSearch outperforms real search engine-based models while incurring zero API cost. Moreover, it generalizes well across both base and instruction-tuned LLMs of various sizes and supports different reinforcement learning algorithms.
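To make the curriculum idea concrete, here is a minimal sketch of how the probability of serving a noisy simulated document could ramp from an initial to a final difficulty level over training. The function name and the linear schedule are illustrative assumptions, not the paper's exact formula; the `START_THRESHOLD` and `END_THRESHOLD` parameters in the training scripts below play the role of `start` and `end`:

```python
# Minimal sketch of a curriculum noise schedule (illustrative only; the
# exact schedule used by ZeroSearch may differ). The probability that the
# simulation LLM is asked to produce a noisy document ramps from a start
# threshold to an end threshold as training progresses.
def noise_probability(step: int, total_steps: int,
                      start: float = 0.0, end: float = 0.5) -> float:
    """Linearly interpolate the noisy-document probability over training."""
    frac = min(max(step / max(total_steps, 1), 0.0), 1.0)
    return start + frac * (end - start)

if __name__ == "__main__":
    # With TOTAL_STEPS 203 as in the Quick Start commands below:
    for step in (0, 100, 203):
        print(step, round(noise_probability(step, total_steps=203), 3))
```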
# 🛠 Dependencies
```bash
conda create -n zerosearch python=3.9
conda activate zerosearch
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
pip install vllm==0.6.3
pip install wandb
pip install serpapi
# verl (installed from this repository in editable mode)
pip install -e .
# flash attention 2
pip3 install flash-attn --no-build-isolation
# sglang
# If you encounter package conflicts when trying to install sglang in the current environment, we recommend creating a new environment and installing sglang there.
pip install sglang[all]
```
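Optionally, you can sanity-check the core installation before moving on. This is a generic check of the pinned versions above, not a project-specific script:

```python
# Quick environment sanity check (optional).
import torch

print(torch.__version__)          # expect 2.4.0, per the pinned install
print(torch.version.cuda)         # expect 12.1 (the cu121 wheel)
print(torch.cuda.is_available())  # should be True on a GPU machine
```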
# 📖 Quick Start
(1) Download the training dataset.
```bash
huggingface-cli download --repo-type dataset --resume-download sunhaonlp/ZeroSearch_dataset --local-dir ZeroSearch_dataset
# (Optional) Download the Simulation Tuning dataset, required only if you want to train your own simulation LLMs
huggingface-cli download --repo-type dataset --resume-download sunhaonlp/SimulationTuning_dataset --local-dir SimulationTuning_dataset
```
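Once downloaded, you can take a quick look at the data. This is a minimal sketch that assumes the files in `ZeroSearch_dataset` are in a format the `datasets` library can auto-detect (e.g. parquet or json); adjust if the layout differs:

```python
# Peek at the downloaded training data (illustrative only).
from datasets import load_dataset

ds = load_dataset("ZeroSearch_dataset")  # load from the local directory
print(ds)               # show available splits and their sizes
print(ds["train"][0])   # inspect a single training example
```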
(2) Download the simulation LLMs.
```bash
# Simulation LLMs are available in different parameter sizes. Choose the one that best suits your needs.
# The 14B version is recommended for its stable and reliable simulation performance.
huggingface-cli download --resume-download sunhaonlp/Simulation_LLM_google_3B_V2 --local-dir Simulation_LLM_google_3B
huggingface-cli download --resume-download sunhaonlp/Simulation_LLM_google_7B_V2 --local-dir Simulation_LLM_google_7B
huggingface-cli download --resume-download sunhaonlp/Simulation_LLM_google_14B_V2 --local-dir Simulation_LLM_google_14B
```
(3) Launch a local simulation server.
```bash
# Prompt-based simulation
python -m sglang.launch_server --model-path Qwen2.5-14B-Instruct --host 0.0.0.0 --tp 2 --dp 2 --port 6001
# Fine-tuning-based simulation
python -m sglang.launch_server --model-path Simulation_LLM_google_14B --host 0.0.0.0 --tp 2 --dp 2 --port 6001
```
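Before launching RL training, it can help to verify that the simulation server is reachable. sglang's server exposes an OpenAI-compatible endpoint, so a minimal check might look like the following; the host, port, and model name must match your launch command:

```python
# Minimal connectivity check against the local sglang server (illustrative).
import requests

resp = requests.post(
    "http://localhost:6001/v1/chat/completions",
    json={
        "model": "Simulation_LLM_google_14B",  # must match --model-path
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 32,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```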
(4) Conduct RL training with Qwen2.5-3B-Instruct.
```bash
# Activate the conda environment
conda activate zerosearch
# Set your Google Search API key
export SER_API_KEY=your_api_key
# You can run REINFORCE, GRPO or PPO training using the scripts below.
# The START_THRESHOLD and END_THRESHOLD parameters define the initial and final difficulty levels of the training tasks. Adjusting these values can help optimize model performance.
## Prompt-based simulation
bash train_reinforce.sh NUM_GPUS_PER_NODE 4 MODEL_PATH Qwen2.5-3B-Instruct DATA_PATH ZeroSearch_dataset TOTAL_STEPS 203 IP localhost SEARCH_MODE simulate_prompt SIMULATION_LLM Qwen2.5-14B-Instruct START_THRESHOLD 0 END_THRESHOLD 0.5 SEARCH_ENGINE google MAX_TURNS 5 TOPK 5
bash train_grpo.sh NUM_GPUS_PER_NODE 4 MODEL_PATH Qwen2.5-3B-Instruct DATA_PATH ZeroSearch_dataset TOTAL_STEPS 203 IP localhost SEARCH_MODE simulate_prompt SIMULATION_LLM Qwen2.5-14B-Instruct START_THRESHOLD 0 END_THRESHOLD 0.5 SEARCH_ENGINE google MAX_TURNS 5 TOPK 5
bash train_ppo.sh NUM_GPUS_PER_NODE 4 MODEL_PATH Qwen2.5-3B-Instruct DATA_PATH ZeroSearch_dataset TOTAL_STEPS 203 IP localhost SEARCH_MODE simulate_prompt SIMULATION_LLM Qwen2.5-14B-Instruct START_THRESHOLD 0 END_THRESHOLD 0.5 SEARCH_ENGINE google MAX_TURNS 5 TOPK 5
## Fine-tuning-based simulation
bash train_reinforce.sh NUM_GPUS_PER_NODE 4 MODEL_PATH Qwen2.5-3B-Instruct DATA_PATH ZeroSearch_dataset TOTAL_STEPS 203 IP localhost SEARCH_MODE simulate_sft SIMULATION_LLM Simulation_LLM_google_14B START_THRESHOLD 0 END_THRESHOLD 0.5 SEARCH_ENGINE google MAX_TURNS 5 TOPK 5
bash train_grpo.sh NUM_GPUS_PER_NODE 4 MODEL_PATH Qwen2.5-3B-Instruct DATA_PATH ZeroSearch_dataset TOTAL_STEPS 203 IP localhost SEARCH_MODE simulate_sft SIMULATION_LLM Simulation_LLM_google_14B START_THRESHOLD 0 END_THRESHOLD 0.5 SEARCH_ENGINE google MAX_TURNS 5 TOPK 5
bash train_ppo.sh NUM_GPUS_PER_NODE 4 MODEL_PATH Qwen2.5-3B-Instruct DATA_PATH ZeroSearch_dataset TOTAL_STEPS 203 IP localhost SEARCH_MODE simulate_sft SIMULATION_LLM Simulation_LLM_google_14B START_THRESHOLD 0 END_THRESHOLD 0.5 SEARCH_ENGINE google MAX_TURNS 5 TOPK 5
```
# 💡 Performance
*(Figures: Main Results, ZeroSearch vs. a Real Search Engine, Choice of Simulation LLMs, and Case Study; see the paper for the plots.)*
# 🙏 Acknowledgements
This work builds on [Search-R1](https://github.com/PeterGriffinJin/Search-R1), [veRL](https://github.com/volcengine/verl), and [RAGEN](https://github.com/ZihanWang314/RAGEN/tree/main). We sincerely thank the authors of these projects for their valuable contributions to the open-source community.
## 👍 Awesome work inspired by ZeroSearch
- [SSRL](https://github.com/TsinghuaC3I/SSRL): Self-Search Reinforcement Learning. [![code](https://img.shields.io/github/stars/TsinghuaC3I/SSRL)](https://github.com/TsinghuaC3I/SSRL)
# 📧 Contact
If you have any questions, feel free to reach out to me via email: [sunhao@stu.pku.edu.cn](mailto:sunhao@stu.pku.edu.cn)
## 🚩 Citation
If you find this work helpful, please cite it as:
```bibtex
@article{sun2025zerosearch,
  title={ZeroSearch: Incentivize the Search Capability of LLMs without Searching},
  author={Sun, Hao and Qiao, Zile and Guo, Jiayan and Fan, Xuanbo and Hou, Yingyan and Jiang, Yong and Xie, Pengjun and Huang, Fei and Zhang, Yan},
  journal={arXiv preprint arXiv:2505.04588},
  year={2025}
}
```