Deploying DFlash block diffusion on NVIDIA hardware accelerates autoregressive LLMs during latency-sensitive inference.