Parallel scan is a fundamental primitive used across a broad range of workloads, including parallel sorting, graph algorithms, and sampling in large language model inference. Although GPU-optimized ...