Abstract: Pre-trained vision-language models (VLMs) and language models (LMs) have recently garnered significant attention due to their remarkable ability to represent textual concepts, opening up new ...
Tesseract is an open source OCR engine for recognizing text in images. The project provides both the libtesseract library and the tesseract command-line program. Tesseract supports Unicode through UTF ...
Abstract: Most existing multimodal named entity recognition (MNER) methods cannot align image and text well, and fail to effectively fuse image-text semantic information, leading to suboptimal MNER ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results