This project will soon no longer be maintained. We recommend using pymss-desktop, which supports a wider range of models, provides 50-100x faster inference than MSST-WebUI, and has a better-looking ...
Abstract: Pre-trained vision-language models (VLMs) and language models (LMs) have recently garnered significant attention due to their remarkable ability to represent textual concepts, opening up new ...