# ERTACache

**Repository Path**: ByteDance/ERTACache

## Basic Information

- **Project Name**: ERTACache
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-09-05
- **Last Updated**: 2025-09-08

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# ERTACache: Error Rectification and Timesteps Adjustment for Efficient Diffusion

## 🫖 Introduction

In this work, we present **ERTACache**, a principled and efficient caching framework for accelerating diffusion model inference. By decomposing cache-induced degradation into feature-shift and step-amplification errors, we develop a dual-path correction strategy that combines offline-calibrated reuse scheduling, trajectory-aware timestep adjustment, and closed-form residual rectification.

The following figure gives an overview of our **ERTACache** framework, which adopts a dual-dimensional correction strategy: (1) we first perform offline policy calibration, searching for a globally effective cache schedule via residual error profiling; (2) we then introduce a trajectory-aware timestep adjustment mechanism to mitigate integration drift caused by reused features; (3) finally, we propose an explicit error rectification mechanism that analytically approximates and corrects the additive error introduced by cached outputs, enabling accurate reconstruction with negligible overhead.

As shown in the figure below, **ERTACache** preserves fine-grained visual details and frame-to-frame consistency, outperforming TeaCache and matching the performance of the non-cached reference. In video generation tasks using CogVideoX, Wan2.1-1.3B, and OpenSora 1.2, ERTACache achieves noticeably better temporal consistency, particularly between the first and last frames.
When applied to the FLUX-dev 1.0 image model, it also enhances visual richness and detail. These results highlight **ERTACache** as a uniquely effective solution that balances visual quality and computational efficiency for consistent video generation.

Unlike prior heuristic-based methods, **ERTACache** provides a theoretically grounded yet lightweight solution that significantly reduces redundant computation while maintaining high-fidelity outputs. Empirical results across multiple benchmarks validate its effectiveness and generality, highlighting its potential as a practical solution for efficient generative sampling.

## 🎉 Supported Models

**Text to Video**

- **ERTACache4Wan2.1**
- **ERTACache4CogVideoX-2B**
- **ERTACache4OpenSora1.2**

**Text to Image**

- **ERTACache4FLUX**

## 📈 Inference Comparisons on a Single A800
| Model | Method | LPIPS ↓ | SSIM ↑ | PSNR ↑ | Latency (s) ↓ |
|---|---|---|---|---|---|
| OpenSora 1.2 | TeaCache | 0.2511 | 0.7477 | 19.10 | 19.84 |
| | ERTACache | 0.1659 | 0.8170 | 22.34 | 18.04 |
| CogVideoX-2B | TeaCache | 0.2057 | 0.7614 | 20.97 | 26.88 |
| | ERTACache | 0.1012 | 0.8702 | 26.44 | 26.78 |
| Wan2.1-1.3B | TeaCache | 0.2913 | 0.5685 | 16.17 | 99.5 |
| | ERTACache | 0.1095 | 0.8200 | 23.77 | 91.7 |
| FLUX-dev 1.0 | TeaCache | 0.4427 | 0.7445 | 16.47 | 14.21 |
| | ERTACache | 0.3029 | 0.8962 | 20.51 | 14.01 |
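The dual-path correction described in the Introduction (offline-calibrated cache schedule, trajectory-aware timestep adjustment, closed-form residual rectification) can be sketched as a toy sampling loop. This is a minimal illustration only: `model_eval`, `CACHE_SCHEDULE`, and `GAMMA` are hypothetical names and values, not the repository's actual API or calibrated schedule.

```python
def model_eval(x, t):
    # Stand-in for an expensive diffusion-model forward pass.
    return [xi * (1.0 - 0.01 * t) for xi in x]

CACHE_SCHEDULE = {3, 4, 7, 8}  # steps that reuse a cached output (offline-calibrated in the paper)
GAMMA = 0.5                    # illustrative rectification strength

def ertacache_sample(x, num_steps=10, dt=0.1):
    cached = None      # last fresh model output
    cached_x = None    # input at which that output was computed
    evals = 0          # count of actual model evaluations
    for t in range(num_steps):
        if t in CACHE_SCHEDULE and cached is not None:
            # Reuse the cached output, rectifying the additive error with a
            # closed-form linear correction toward the current input.
            out = [c + GAMMA * (xi - ci)
                   for c, xi, ci in zip(cached, x, cached_x)]
            # Trajectory-aware timestep adjustment: shrink the step to
            # offset integration drift caused by the reused feature.
            step = dt * 0.9
        else:
            out = model_eval(x, t)       # fresh (expensive) evaluation
            cached, cached_x = out, list(x)
            evals += 1
            step = dt
        # One explicit integration step of the sampling trajectory.
        x = [xi - step * oi for xi, oi in zip(x, out)]
    return x, evals
```

With this toy schedule, 4 of the 10 steps skip the model call, so only 6 evaluations run; the real framework chooses the schedule so that the rectified error stays below a calibrated budget.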