Multimodal Vision Research Laboratory

MVRL

Research Area: Generative Multimodal AI

How can generative models synthesize and represent information across sensors, scales, and modalities? We develop generative methods for images, 3D scenes, panoramas, and multimodal earth data. Recent work includes open-world generation of stereo images with unsupervised matching (GenStereo), fine-grained satellite image synthesis with structured semantics (VectorSynth), geospatially guided diffusion for mixed-view panorama synthesis, zero-shot soundscape mapping from satellite imagery (Sat2Sound), and diffusion-guided visual active search in partially observable environments (DiffVAS). Our research connects generative modeling to geospatial science, from consistent text-to-360 scene generation to synthetic data for downstream vision and ecology tasks.

All Publications

  1. Khanal S, Sastry S, Dhakal A, Ahmad A, Jacobs N. 2026. Sat2Sound: A Unified Framework for Zero-Shot Soundscape Mapping. In: IEEE/ISPRS Workshop: Large Scale Computer Vision for Remote Sensing (EARTHVISION).
  2. Dhakal A, Khanal S, Sastry S, Arndt J, Dias PA, Lunga D, Jacobs N. 2026. SimLBR: Learning to Detect Fake Images by Learning to Detect Real Images. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
  3. Sarkar A, Sastry S, Pirinen A, Jacobs N, Vorobeychik Y. 2026. DiffVAS: Diffusion-Guided Visual Active Search in Partially Observable Environments. In: International Conference on Autonomous Agents and Multiagent Systems (AAMAS).
  4. Xiong Z, Song Y, He L, Xiong W, Yuan Y, Qiao F, Jacobs N. 2026. PhysAlign: Physics-Coherent Image-to-Video Generation through Feature and 3D Representation Alignment. arXiv:2603.13770.
  5. Sastry S, Cher D, Wei B, Dhakal A, Khanal S, Gupta D, Jacobs N. 2026. GeoDiT: Point-Conditioned Diffusion Transformer for Satellite Image Synthesis.
  6. Cher D, Wei B, Sastry S, Jacobs N. 2026. VectorSynth: Fine-Grained Satellite Image Synthesis with Structured Semantics. In: IEEE Winter Conference on Applications of Computer Vision (WACV).
  7. Luo Y, Qiao F, Xiong Z, Li Y, Jacobs N. 2026. GenOpticalFlow: A Generative Approach to Unsupervised Optical Flow Learning.
  8. Xiong Z, Xiong W, Shi J, Zhang H, Song Y, Jacobs N. 2026. GroundingBooth: Grounding Text-to-Image Customization. Transactions on Machine Learning Research (TMLR). arXiv:2409.08520.
  9. Xiong Z, Ye X, Yaman B, Cheng S, Lu Y, Luo J, Jacobs N, Ren L. 2026. UniDrive-WM: Unified Understanding, Planning and Generation World Model For Autonomous Driving.
  10. Xing E, Stylianou A, Pless R, Jacobs N. 2025. QuARI: Query Adaptive Retrieval Improvement. In: Neural Information Processing Systems (NeurIPS).
  11. Sastry S, Xing X, Dhakal A, Khanal S, Ahmad A, Jacobs N. 2025. LD-SDM: Language-Driven Hierarchical Species Distribution Modeling. In: Computer Vision for Ecology (IEEE/CVF International Conference on Computer Vision (ICCV) Workshops).
  12. Qiao F, Xiong Z, Xing E, Jacobs N. 2025. Towards Open-World Generation of Stereo Images and Unsupervised Matching. In: IEEE/CVF International Conference on Computer Vision (ICCV).
  13. Liu W, Xiong Z, Li X, Jacobs N. 2025. DeclutterNeRF: Generative-Free 3D Scene Recovery for Occlusion Removal. In: 4th Computer Vision for Metaverse Workshop (IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops).
  14. Xiong Z, Chen Z, Li Z, Xu Y, Jacobs N. 2025. PanoDreamer: Consistent Text to 360 Scene Generation. In: 4th Computer Vision for Metaverse Workshop (IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops).
  15. Kolouju P, Xing E, Pless R, Jacobs N, Stylianou A. 2025. good4cir: Generating Detailed Synthetic Captions for Composed Image Retrieval. In: SyntaGen: 2nd Workshop on Harnessing Generative Models for Synthetic Visual Datasets (IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops).
  16. Xing E, Kolouju P, Pless R, Stylianou A, Jacobs N. 2025. ConText-CIR: Learning from Concepts in Text for Composed Image Retrieval. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
  17. Sarkar A, Sastry S, Pirinen A, Zhang C, Jacobs N, Vorobeychik Y. 2024. GOMAA-Geo: GOal Modality Agnostic Active Geo-localization. In: Neural Information Processing Systems (NeurIPS).
  18. Levering A, Marcos D, Jacobs N, Tuia D. 2024. Prompt-guided and multimodal landscape scenicness assessments with vision-language models. PLOS ONE.
  19. Dhakal A, Ahmad A, Khanal S, Sastry S, Kerner HR, Jacobs N. 2024. Sat2Cap: Mapping Fine-Grained Textual Descriptions from Satellite Images. In: IEEE/ISPRS Workshop: Large Scale Computer Vision for Remote Sensing (EARTHVISION).
  20. Sastry S, Khanal S, Dhakal A, Jacobs N. 2024. GeoSynth: Contextually-Aware High-Resolution Satellite Image Synthesis. In: IEEE/ISPRS Workshop: Large Scale Computer Vision for Remote Sensing (EARTHVISION).
  21. Xing X, Xiong Z, Stylianou A, Sastry S, Gong L, Jacobs N. 2024. Vision-Language Pseudo-Labels for Single-Positive Multi-Label Learning. In: Workshop on Representation Learning with Very Limited Images.
  22. Rafique MU, Zhang Y, Brodie B, Jacobs N. 2021. Unifying Guided and Unguided Outdoor Image Synthesis. In: New Trends in Image Restoration and Enhancement (IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops). 776–785. DOI: 10.1109/CVPRW53098.2021.00087.
  23. Rafique MU, Blanton H, Snavely N, Jacobs N. 2020. Generative Appearance Flow: A Hybrid Approach for Outdoor View Synthesis. In: British Machine Vision Conference (BMVC).
  24. Schulter S, Zhai M, Jacobs N, Chandraker M. 2018. Learning to Look around Objects for Top-View Representations of Outdoor Scenes. In: European Conference on Computer Vision (ECCV). DOI: 10.1007/978-3-030-01267-0_48.
  25. Zhai M, Bessinger Z, Workman S, Jacobs N. 2017. Predicting Ground-Level Scene Layout from Aerial Imagery. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). DOI: 10.1109/CVPR.2017.440.