# ERTACache: Error Rectification and Timesteps Adjustment for Efficient Diffusion

## 🫖 Introduction

In this work, we present **ERTACache**, a principled and efficient caching framework for accelerating diffusion model inference. By decomposing cache-induced degradation into feature-shift and step-amplification errors, we develop a dual-path correction strategy that combines offline-calibrated reuse scheduling, trajectory-aware timestep adjustment, and closed-form residual rectification.

The following figure gives an overview of our **ERTACache** framework, which adopts a dual-dimensional correction strategy: (1) we first perform offline policy calibration by searching for a globally effective cache schedule using residual error profiling; (2) we then introduce a trajectory-aware timestep adjustment mechanism to mitigate integration drift caused by reused features; (3) finally, we propose an explicit error rectification that analytically approximates and rectifies the additive error introduced by cached outputs, enabling accurate reconstruction with negligible overhead. An illustrative sketch of how these three components fit into a sampling loop is given after the supported-model list below.

![visualization](./visualize/framework.png)

As shown in the figure below, **ERTACache** preserves fine-grained visual details and frame-to-frame consistency, outperforming TeaCache and matching the performance of the non-cache reference. In video generation tasks using CogVideoX, Wan2.1-1.3B, and OpenSora 1.2, ERTACache achieves noticeably better temporal consistency, particularly between the first and last frames. When applied to the FLUX-dev 1.0 image model, it enhances visual richness and detail. These results highlight **ERTACache** as a uniquely effective solution that balances visual quality and computational efficiency for consistent video generation.

![visualization](./visualize/exp.png)

Unlike prior heuristics-based methods, **ERTACache** provides a theoretically grounded yet lightweight solution that significantly reduces redundant computation while maintaining high-fidelity outputs. Empirical results across multiple benchmarks validate its effectiveness and generality, highlighting its potential as a practical solution for efficient generative sampling.

## 🎉 Supported Models

**Text to Video**

- **ERTACache4Wan2.1**
- **ERTACache4CogVideoX-2B**
- **ERTACache4OpenSora1.2**

**Text to Image**

- **ERTACache4FLUX**
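To make the three components above concrete, here is a minimal, illustrative Python sketch of a cache-accelerated sampling loop. It is **not** the repository's implementation: `adjust_timestep`, `rectify_residual`, the boolean `cache_schedule`, and the toy Euler-style update are all hypothetical placeholders standing in for ERTACache's offline-calibrated schedule, trajectory-aware timestep adjustment, closed-form residual rectification, and the actual solver.

```python
import torch

def adjust_timestep(t: int, offset: int = 0) -> int:
    """Hypothetical stand-in for trajectory-aware timestep adjustment (identity here)."""
    return t - offset

def rectify_residual(cached_eps: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in for closed-form residual rectification (identity here)."""
    return cached_eps

def cached_sampling(model, x, timesteps, cache_schedule, step_size=0.1):
    """Generic diffusion-style sampling loop with model-output caching."""
    cached_eps = None
    for i, t in enumerate(timesteps):
        if cache_schedule[i] and cached_eps is not None:
            t = adjust_timestep(t)                 # (2) adjust the reused step's timestep
            eps = rectify_residual(cached_eps, x)  # (3) correct the cache-induced error
        else:
            eps = model(x, t)                      # full forward pass
            cached_eps = eps                       # (1) refresh the cache per the schedule
        x = x - step_size * eps                    # toy Euler-style update; real solvers differ
    return x

# Toy usage with a stand-in "model"
if __name__ == "__main__":
    model = lambda x, t: 0.05 * x                      # placeholder for the diffusion backbone
    x = torch.randn(1, 4, 8, 8)
    timesteps = list(range(10, 0, -1))
    cache_schedule = [i % 2 == 1 for i in range(10)]   # e.g., reuse on every other step
    out = cached_sampling(model, x, timesteps, cache_schedule)
    print(out.shape)  # torch.Size([1, 4, 8, 8])
```

The key point illustrated is that cached steps skip the expensive model call entirely; ERTACache's contribution is making those skipped steps accurate via the schedule, timestep adjustment, and residual correction.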
## 📈 Inference Comparisons on a Single A800

| Model | Method | LPIPS | SSIM | PSNR | Latency (s) |
| --- | --- | --- | --- | --- | --- |
| OpenSora 1.2 | TeaCache | 0.2511 | 0.7477 | 19.10 | 19.84 |
| OpenSora 1.2 | ERTACache | 0.1659 | 0.8170 | 22.34 | 18.04 |
| CogVideoX-2B | TeaCache | 0.2057 | 0.7614 | 20.97 | 26.88 |
| CogVideoX-2B | ERTACache | 0.1012 | 0.8702 | 26.44 | 26.78 |
| Wan2.1-1.3B | TeaCache | 0.2913 | 0.5685 | 16.17 | 99.5 |
| Wan2.1-1.3B | ERTACache | 0.1095 | 0.8200 | 23.77 | 91.7 |
| FLUX-dev 1.0 | TeaCache | 0.4427 | 0.7445 | 16.47 | 14.21 |
| FLUX-dev 1.0 | ERTACache | 0.3029 | 0.8962 | 20.51 | 14.01 |

## Installation

The environment set-up depends on the specific model. For example, for FLUX you need to install the FLUX-related packages:

```shell
pip install --upgrade diffusers[torch] transformers protobuf tokenizers sentencepiece
```

## Usage

For each supported model, enter the corresponding folder (for example, `ERTACache4FLUX`), then use the following command; the outputs are saved in the `./sample` folder:

```bash
sh run.sh
```

## 💐 Acknowledgement

This repository is built on [VideoSys](https://github.com/NUS-HPC-AI-Lab/VideoSys), [Diffusers](https://github.com/huggingface/diffusers), [Open-Sora](https://github.com/hpcaitech/Open-Sora), [CogVideoX](https://github.com/THUDM/CogVideo), [FLUX](https://github.com/black-forest-labs/flux), and [Wan2.1](https://github.com/Wan-Video/Wan2.1). Thanks for their contributions!

## 🔒 License

* The majority of this project is released under the MIT license, as found in the [LICENSE](./LICENSE) file.
* For [VideoSys](https://github.com/NUS-HPC-AI-Lab/VideoSys), [Diffusers](https://github.com/huggingface/diffusers), [Open-Sora](https://github.com/hpcaitech/Open-Sora), [CogVideoX](https://github.com/THUDM/CogVideo), [FLUX](https://github.com/black-forest-labs/flux), and [Wan2.1](https://github.com/Wan-Video/Wan2.1), please follow their respective licenses.