Multimodal Vision Research Laboratory

MVRL

Research Area: Generative Multimodal AI

Our generative AI research studies how models synthesize and represent information across diverse sensors and scales. Recent work includes open-world generation of stereo images with unsupervised matching (GenStereo), consistent text-to-360 scene generation (PanoDreamer), and generative-free 3D scene recovery for occlusion removal (DeclutterNeRF). We also develop methods for generating detailed synthetic captions for composed image retrieval (good4cir), fine-grained satellite image synthesis with structured semantics (VectorSynth), and zero-shot soundscape mapping from satellite imagery (Sat2Sound). Our research ranges from geospatially guided diffusion for mixed-view panorama synthesis to diffusion-guided visual active search in partially observable environments (DiffVAS).

Publications

  1. Sarkar A, Sastry S, Pirinen A, Jacobs N, Vorobeychik Y. 2026. DiffVAS: Diffusion-Guided Visual Active Search in Partially Observable Environments. In: International Conference on Autonomous Agents and Multiagent Systems (AAMAS).
  2. Cher D, Wei B, Sastry S, Jacobs N. 2026. VectorSynth: Fine-Grained Satellite Image Synthesis with Structured Semantics. In: IEEE Winter Conference on Applications of Computer Vision (WACV).
  3. Xiong Z, Ye X, Yaman B, Cheng S, Lu Y, Luo J, Jacobs N, Ren L. 2026. UniDrive-WM: Unified Understanding, Planning and Generation World Model For Autonomous Driving. International Journal of Applied Earth Observation and Geoinformation.
  4. Xing E, Stylianou A, Pless R, Jacobs N. 2025. QuARI: Query Adaptive Retrieval Improvement. In: Neural Information Processing Systems (NeurIPS).
  5. Sastry S, Xing X, Dhakal A, Khanal S, Ahmad A, Jacobs N. 2025. LD-SDM: Language-Driven Hierarchical Species Distribution Modeling. In: Computer Vision for Ecology (IEEE International Conference on Computer Vision (ICCV) Workshops).
  6. Qiao F, Xiong Z, Xing E, Jacobs N. 2025. Towards Open-World Generation of Stereo Images and Unsupervised Matching. In: IEEE International Conference on Computer Vision (ICCV).
  7. Xing E, Kolouju P, Pless R, Stylianou A, Jacobs N. 2025. ConText-CIR: Learning from Concepts in Text for Composed Image Retrieval. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
  8. Xiong Z, Chen Z, Li Z, Xu Y, Jacobs N. 2025. PanoDreamer: Consistent Text to 360 Scene Generation. In: 4th Computer Vision for Metaverse Workshop (IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops).
  9. Liu W, Xiong Z, Li X, Jacobs N. 2025. DeclutterNeRF: Generative-Free 3D Scene Recovery for Occlusion Removal. In: 4th Computer Vision for Metaverse Workshop (IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops).
  10. Kolouju P, Xing E, Pless R, Jacobs N, Stylianou A. 2025. good4cir: Generating Detailed Synthetic Captions for Composed Image Retrieval. In: SyntaGen: 2nd Workshop on Harnessing Generative Models for Synthetic Visual Datasets (IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops).
  11. Khanal S, Sastry S, Dhakal A, Ahmad A, Jacobs N. 2025. Sat2Sound: A Unified Framework for Zero-Shot Soundscape Mapping. arXiv 2505.13777.
  12. Sarkar A, Sastry S, Pirinen A, Zhang C, Jacobs N, Vorobeychik Y. 2024. GOMAA-Geo: GOal Modality Agnostic Active Geo-localization. In: Neural Information Processing Systems (NeurIPS).
  13. Xiong Z, Xiong W, Shi J, Zhang H, Song Y, Jacobs N. 2024. GroundingBooth: Grounding Text-to-Image Customization. arXiv 2409.08520.
  14. Dhakal A, Ahmad A, Khanal S, Sastry S, Kerner HR, Jacobs N. 2024. Sat2Cap: Mapping Fine-Grained Textual Descriptions from Satellite Images. In: IEEE/ISPRS Workshop: Large Scale Computer Vision for Remote Sensing (EARTHVISION).
  15. Xing X, Xiong Z, Stylianou A, Sastry S, Gong L, Jacobs N. 2024. Vision-Language Pseudo-Labels for Single-Positive Multi-Label Learning. In: Workshop on Representation Learning with Very Limited Images.
  16. Sastry S, Khanal S, Dhakal A, Jacobs N. 2024. GeoSynth: Contextually-Aware High-Resolution Satellite Image Synthesis. In: IEEE/ISPRS Workshop: Large Scale Computer Vision for Remote Sensing (EARTHVISION).
  17. Levering A, Marcos D, Jacobs N, Tuia D. 2024. Prompt-guided and multimodal landscape scenicness assessments with vision-language models. PLOS ONE.
  18. Rafique MU, Zhang Y, Brodie B, Jacobs N. 2021. Unifying Guided and Unguided Outdoor Image Synthesis. In: New Trends in Image Restoration and Enhancement (IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops). 776–785. DOI: 10.1109/CVPRW53098.2021.00087.
  19. Rafique MU, Blanton H, Snavely N, Jacobs N. 2020. Generative Appearance Flow: A Hybrid Approach for Outdoor View Synthesis. In: British Machine Vision Conference (BMVC).
  20. Schulter S, Zhai M, Jacobs N, Chandraker M. 2018. Learning to Look around Objects for Top-View Representations of Outdoor Scenes. In: European Conference on Computer Vision (ECCV). DOI: 10.1007/978-3-030-01267-0_48.
  21. Zhai M, Bessinger Z, Workman S, Jacobs N. 2017. Predicting Ground-Level Scene Layout from Aerial Imagery. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). DOI: 10.1109/CVPR.2017.440.