Multimodal Vision Research Laboratory

MVRL

research area: vision-language modeling

This work explores the use and development of vision-language models for various vision tasks.

Our publications in this area are listed below. An unfiltered list of our publications is also available, as are lists filtered by research area: agriculture; astronomical imagery and data; camera calibration; biodiversity and conservation; generative modeling; LiDAR processing; image localization; medical and biological imaging; image motion; remote sensing and mapping; social media; video surveillance and object tracking; timelapse imaging; transportation; vision-language modeling; and outdoor webcam imagery.

publications

  1. Xing E, Kolouju P, Pless R, Stylianou A, Jacobs N. 2025. ConText-CIR: Learning from Concepts in Text for Composed Image Retrieval. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
    bibtex | paper | linkedin | code
  2. Xing E, Stylianou A, Pless R, Jacobs N. 2025. QuARI: Query Adaptive Retrieval Improvement. arXiv 2505.21647.
    bibtex | paper