Diffusion models have revolutionized high-fidelity image and video synthesis, yet their computational demands remain prohibitive for real-time applications. These models face two fundamental challenges: strict temporal dependencies between denoising steps that prevent parallelization, and the computationally intensive forward pass required at each step. Drawing inspiration from speculative decoding in large language models, we present \textit{SpeCa}, a novel ``\textit{\textbf{Forecast-then-verify}}'' acceleration framework that addresses both limitations. \textit{SpeCa}'s core innovation lies in introducing speculative sampling to diffusion models: intermediate features for subsequent timesteps are predicted from fully computed reference timesteps. Our approach implements a parameter-free verification mechanism that efficiently evaluates prediction reliability, enabling a real-time decision to accept or reject each prediction while incurring negligible computational overhead. Furthermore, \textit{SpeCa} introduces sample-adaptive computation allocation that dynamically modulates resources according to generation complexity, allocating reduced computation to simpler samples while preserving intensive processing for complex ones. Experiments demonstrate 6.34$\times$ acceleration on FLUX with minimal quality degradation (a 5.5\% drop), 7.3$\times$ speedup on DiT while preserving generation fidelity, and a 79.84\% VBench score at 6.1$\times$ acceleration for HunyuanVideo. The verification mechanism incurs minimal overhead (1.67\%--3.5\% of full inference cost), establishing a new paradigm for efficient diffusion model inference that maintains generation quality even at aggressive acceleration ratios. Code is available in the supplementary material and will be released on GitHub.
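To make the control flow concrete, the following is a minimal Python sketch of the forecast-then-verify loop under stated assumptions: the callables \texttt{full\_pass}, \texttt{shallow\_probe}, and \texttt{extrapolate}, as well as the relative-error acceptance test, are illustrative stand-ins and not the released \textit{SpeCa} implementation.
\begin{verbatim}
import torch

def rel_err(pred: torch.Tensor, ref: torch.Tensor) -> float:
    # Parameter-free reliability score: relative L2 distance.
    return ((pred - ref).norm() / (ref.norm() + 1e-8)).item()

@torch.no_grad()
def forecast_then_verify(full_pass, shallow_probe, extrapolate,
                         x, timesteps, tol=0.1):
    # Hypothetical interfaces (assumptions, not the SpeCa API):
    #   full_pass(x, t)      -> (x_next, feats)   expensive, exact step
    #   shallow_probe(x, t)  -> probe_feats       cheap partial forward pass
    #   extrapolate(hist, t) -> (x_spec, probe_spec)
    #                           cheap forecast from cached feature history
    history = []  # (t, feats) pairs from fully computed steps
    for t in timesteps:
        if len(history) >= 2:  # enough context to extrapolate
            x_spec, probe_spec = extrapolate(history, t)
            # Verify: accept the speculation only if it agrees with a
            # cheap probe of the true features at this timestep.
            if rel_err(probe_spec, shallow_probe(x, t)) <= tol:
                x = x_spec  # accepted: skip the full model pass
                continue
        # Rejected, or not enough history: fall back to full computation
        # and refresh the reference features used for later forecasts.
        x, feats = full_pass(x, t)
        history.append((t, feats))
    return x
\end{verbatim}
In such a loop, sample-adaptive computation allocation arises naturally: samples whose feature trajectories extrapolate well pass verification more often and therefore receive fewer full model passes, while harder samples trigger the fallback and retain intensive processing.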