Abstract: Efficient image tokenization with high compression ratios remains a critical challenge for training generative models. We present SoftVQ-VAE, a continuous image tokenizer that leverages soft ...
This project presents UniFlow (Unified Pixel Flow Tokenizer), a unified continuous visual tokenizer that eliminates the conflict between semantic understanding and high-fidelity reconstruction. By ...
Existing semantic speech tokenizers often suffer from acoustic blindness, while acoustic tokenizers typically lack linguistic alignment. ... Load the tokenizer and feature extractor; extract discrete ...