# ERTACache: Error Rectification and Timesteps Adjustment for Efficient Diffusion

## 🫖 Introduction

In this work, we present **ERTACache**, a principled and efficient caching framework for accelerating diffusion model inference. By decomposing cache-induced degradation into feature-shift and step-amplification errors, we develop a dual-path correction strategy that combines offline-calibrated reuse scheduling, trajectory-aware timestep adjustment, and closed-form residual rectification.

The following figure gives an overview of our **ERTACache** framework, which adopts a dual-dimensional correction strategy: (1) we first perform offline policy calibration by searching for a globally effective cache schedule using residual error profiling; (2) we then introduce a trajectory-aware timestep adjustment mechanism to mitigate integration drift caused by reused features; (3) finally, we propose an explicit error rectification that analytically approximates and rectifies the additive error introduced by cached outputs, enabling accurate reconstruction with negligible overhead. An illustrative sketch of how these three components fit into a sampling loop is given after the supported-model list below.

![visualization](./visualize/framework.png)

As shown in the figure below, **ERTACache** preserves fine-grained visual details and frame-to-frame consistency, outperforming TeaCache and matching the performance of the non-cache reference. In video generation tasks using CogVideoX, Wan2.1-1.3B, and OpenSora 1.2, ERTACache achieves noticeably better temporal consistency, particularly between the first and last frames. When applied to the FLUX-dev 1.0 image model, it enhances visual richness and detail. These results highlight **ERTACache** as a uniquely effective solution that balances visual quality and computational efficiency for consistent video generation.

![visualization](./visualize/exp.png)

Unlike prior heuristics-based methods, **ERTACache** provides a theoretically grounded yet lightweight solution that significantly reduces redundant computation while maintaining high-fidelity outputs. Empirical results across multiple benchmarks validate its effectiveness and generality, highlighting its potential as a practical solution for efficient generative sampling.

## 🎉 Supported Models

**Text to Video**

- **ERTACache4Wan2.1**
- **ERTACache4CogVideoX-2B**
- **ERTACache4OpenSora1.2**

**Text to Image**

- **ERTACache4FLUX**
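To make the three components above concrete, here is a minimal, illustrative Python sketch of a cache-accelerated sampling loop. It is **not** the repository's implementation: `adjust_timestep`, `rectify_residual`, the boolean `cache_schedule`, and the toy Euler-style update are all hypothetical placeholders standing in for ERTACache's offline-calibrated schedule, trajectory-aware timestep adjustment, closed-form residual rectification, and the actual solver.

```python
import torch

def adjust_timestep(t: int, offset: int = 0) -> int:
    """Hypothetical stand-in for trajectory-aware timestep adjustment (identity here)."""
    return t - offset

def rectify_residual(cached_eps: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in for closed-form residual rectification (identity here)."""
    return cached_eps

def cached_sampling(model, x, timesteps, cache_schedule, step_size=0.1):
    """Generic diffusion-style sampling loop with model-output caching."""
    cached_eps = None
    for i, t in enumerate(timesteps):
        if cache_schedule[i] and cached_eps is not None:
            t = adjust_timestep(t)                 # (2) adjust the reused step's timestep
            eps = rectify_residual(cached_eps, x)  # (3) correct the cache-induced error
        else:
            eps = model(x, t)                      # full forward pass
            cached_eps = eps                       # (1) refresh the cache per the schedule
        x = x - step_size * eps                    # toy Euler-style update; real solvers differ
    return x

# Toy usage with a stand-in "model"
if __name__ == "__main__":
    model = lambda x, t: 0.05 * x                      # placeholder for the diffusion backbone
    x = torch.randn(1, 4, 8, 8)
    timesteps = list(range(10, 0, -1))
    cache_schedule = [i % 2 == 1 for i in range(10)]   # e.g., reuse on every other step
    out = cached_sampling(model, x, timesteps, cache_schedule)
    print(out.shape)  # torch.Size([1, 4, 8, 8])
```

The key point illustrated is that cached steps skip the expensive model call entirely; ERTACache's contribution is making those skipped steps accurate via the schedule, timestep adjustment, and residual correction.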
## 📈 Inference Comparisons on a Single A800

| Model | Method | LPIPS | SSIM | PSNR | Latency (s) |
| --- | --- | --- | --- | --- | --- |
| OpenSora 1.2 | TeaCache | 0.2511 | 0.7477 | 19.10 | 19.84 |
| OpenSora 1.2 | ERTACache | 0.1659 | 0.8170 | 22.34 | 18.04 |
| CogVideoX-2B | TeaCache | 0.2057 | 0.7614 | 20.97 | 26.88 |
| CogVideoX-2B | ERTACache | 0.1012 | 0.8702 | 26.44 | 26.78 |
| Wan2.1-1.3B | TeaCache | 0.2913 | 0.5685 | 16.17 | 99.5 |
| Wan2.1-1.3B | ERTACache | 0.1095 | 0.8200 | 23.77 | 91.7 |
| FLUX-dev 1.0 | TeaCache | 0.4427 | 0.7445 | 16.47 | 14.21 |
| FLUX-dev 1.0 | ERTACache | 0.3029 | 0.8962 | 20.51 | 14.01 |

## Installation

The environment set-up depends on the specific model. For example, for FLUX you need to install the FLUX-related packages:

```shell
pip install --upgrade diffusers[torch] transformers protobuf tokenizers sentencepiece
```

## Usage

For each supported model, enter the corresponding folder (for example, `ERTACache4FLUX`), then use the following command; the outputs are saved in the `./sample` folder:

```bash
sh run.sh
```

## 💐 Acknowledgement

This repository is built on [VideoSys](https://github.com/NUS-HPC-AI-Lab/VideoSys), [Diffusers](https://github.com/huggingface/diffusers), [Open-Sora](https://github.com/hpcaitech/Open-Sora), [CogVideoX](https://github.com/THUDM/CogVideo), [FLUX](https://github.com/black-forest-labs/flux), and [Wan2.1](https://github.com/Wan-Video/Wan2.1). Thanks for their contributions!

## 🔒 License

* The majority of this project is released under the MIT license, as found in the [LICENSE](./LICENSE) file.
* For [VideoSys](https://github.com/NUS-HPC-AI-Lab/VideoSys), [Diffusers](https://github.com/huggingface/diffusers), [Open-Sora](https://github.com/hpcaitech/Open-Sora), [CogVideoX](https://github.com/THUDM/CogVideo), [FLUX](https://github.com/black-forest-labs/flux), and [Wan2.1](https://github.com/Wan-Video/Wan2.1), please follow their respective licenses.