Multimodal Vision Research Laboratory (MVRL)

Research Area: Multimodal Representation Learning

How do we learn representations that transfer across sensors, modalities, and domains? We develop self-supervised and multimodal embedding methods that align vision, language, and audio for robust learning at scale. Recent work includes Frobenius norm minimization for self-supervised learning (FroSSL), global and local entailment learning for natural-world imagery (RCME), probabilistic masked multimodal embedding models for ecology (ProM3E), and unified embedding spaces linking ground and satellite views (TaxaBind). These representations underpin geospatial search, biodiversity monitoring, and generative earth-data synthesis across the lab.

All Publications

  1. Sastry S, Khanal S, Dhakal A, Lin J, Cher D, Jarosz P, Jacobs N. 2026. ProM3E: Probabilistic Masked MultiModal Embedding Model for Ecology. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
  2. Tao Z, Sastry S, Thompson M, Campolongo E, Zhang N, Zhang Z, Lapp H, Su Y, Berger-Wolf T, Jacobs N, Chao W-L, Gu J. 2026. Beyond Flat Labels: Level-Restricted Contrastive Learning for Hierarchical Fine-Grained Vision Classification. In: Fine-Grained Visual Categorization (FGVC) (IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshop).
  3. Dhakal A, Khanal S, Sastry S, Arndt J, Dias PA, Lunga D, Jacobs N. 2026. SimLBR: Learning to Detect Fake Images by Learning to Detect Real Images. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
  4. Cher D, Wei B, Sastry S, Jacobs N. 2026. VectorSynth: Fine-Grained Satellite Image Synthesis with Structured Semantics. In: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV).
  5. Klemmer K, Rolf E, Russwurm M, Camps-Valls G, Czerkawski M, Ermon S, Francis A, Jacobs N, Kerner HR, Mackey L, Mai G, Mac Aodha O, Reichstein M, Robinson C, Rolnick D, Shelhamer E, Sitzmann V, Tuia D, Zhu X. 2025. Earth Embeddings: Towards AI-centric Representations of our Planet. DOI: 10.31223/X5HX9S.
  6. Dhakal A, Sastry S, Khanal S, Ahmad A, Xing E, Jacobs N. 2025. RANGE: Retrieval Augmented Neural Fields for Multi-Resolution Geo-Embeddings. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
  7. Sastry S, Khanal S, Dhakal A, Ahmad A, Jacobs N. 2025. TaxaBind: A Unified Embedding Space for Ecological Applications. In: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV).
  8. Lin S-C, Su Y, Gastaldello F, Jacobs N. 2024. Semisupervised Learning for Detecting Inverse Compton Emission in Galaxy Clusters. Astrophysical Journal 977:176. DOI: 10.3847/1538-4357/ad8888.
  9. Skean O, Dhakal A, Jacobs N, Giraldo LGS. 2024. FroSSL: Frobenius Norm Minimization for Self-Supervised Learning. In: European Conference on Computer Vision (ECCV).
  10. Khanal S, Xing E, Sastry S, Dhakal A, Xiong Z, Ahmad A, Jacobs N. 2024. PSM: Learning Probabilistic Embeddings for Multi-scale Zero-shot Soundscape Mapping. In: ACM Multimedia. DOI: 10.1145/3664647.3681620.
  11. Levering A, Marcos D, Jacobs N, Tuia D. 2024. Prompt-guided and multimodal landscape scenicness assessments with vision-language models. PLOS ONE.
  12. Dhakal A, Khanal S, Sastry S, Ahmad A, Jacobs N. 2024. GeoBind: Binding text, image, and audio through satellite images. In: IEEE International Geoscience and Remote Sensing Symposium (IGARSS).
  13. Jain P, Marcos D, Ienco D, Interdonato R, Dhakal A, Jacobs N, Berchoux T. 2024. Aligning Geo-Tagged CLIP Representations and Satellite Imagery for Few-Shot Land Use Classification. In: IEEE International Geoscience and Remote Sensing Symposium (IGARSS).
  14. Xing X, Xiong Z, Stylianou A, Sastry S, Gong L, Jacobs N. 2024. Vision-Language Pseudo-Labels for Single-Positive Multi-Label Learning. In: Workshop on Representation Learning with Very Limited Images.
  15. Sastry S, Khanal S, Dhakal A, Huang D, Jacobs N. 2024. BirdSat: Cross-View Contrastive Masked Autoencoders for Bird Species Classification and Mapping. In: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV).
  16. Liang G, Zulu J, Xing X, Jacobs N. 2023. Unveiling Roadway Hazards: Enhancing Fatal Crash Risk Estimation through Multi-Scale Satellite Imagery and Self-Supervised Cross-Matching. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (JSTARS). DOI: 10.1109/JSTARS.2023.3331438.
  17. Khanal S, Sastry S, Dhakal A, Jacobs N. 2023. Learning Tri-modal Embeddings for Zero-Shot Soundscape Mapping. In: British Machine Vision Conference (BMVC).
  18. Xing X, Liang G, Wang C, Jacobs N, Lin A-L. 2023. Self-Supervised Learning Application on COVID-19 Chest X-ray Image Classification Using Masked AutoEncoder. Bioengineering 10. DOI: 10.3390/bioengineering10080901.
  19. Sastry S, Dhakal A, Brodie B, Khanal S, Jacobs N. 2023. Explorations in Self-supervised Learning for Change Detection. In: IEEE International Geoscience and Remote Sensing Symposium (IGARSS).
  20. Xing X, Peng C, Zhang Y, Lin A-L, Jacobs N. 2022. AssocFormer: Association Transformer for Multi-label Classification. In: British Machine Vision Conference (BMVC).
  21. Khanal S, Brodie B, Xing X, Lin A-L, Jacobs N. 2022. Causality for Inherently Explainable Transformers: CAT-XPLAIN. In: XAI4CV: Explainable Artificial Intelligence for Computer Vision (IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops).
  22. Zhang Y, Liang G, Jacobs N. 2021. Dynamic Feature Alignment for Semi-supervised Domain Adaptation. In: British Machine Vision Conference (BMVC).
  23. Liang G, Greenwell C, Zhang Y, Xing X, Wang X, Kavuluru R, Jacobs N. 2021. Contrastive Cross-Modal Pre-Training: A General Strategy for Small Sample Medical Imaging. IEEE Journal of Biomedical and Health Informatics 26. DOI: 10.1109/JBHI.2021.3110805.
  24. Brodie B, Khanal S, Rafique MU, Greenwell C, Jacobs N. 2021. Hierarchical Probabilistic Embeddings for Multi-View Image Classification. In: IEEE International Geoscience and Remote Sensing Symposium (IGARSS). DOI: 10.1109/IGARSS47720.2021.9554405.
  25. Liang G, Su Y, Lin S-C, Zhang Y, Zhang Y, Jacobs N. 2020. Optical Wavelength Guided Self-Supervised Feature Learning For Galaxy Cluster Richness Estimate. In: Workshop on Machine Learning and the Physical Sciences at the 34th Conference on Neural Information Processing Systems.
  26. Zhai M, Salem T, Greenwell C, Workman S, Pless R, Jacobs N. 2018. Learning Geo-Temporal Image Features. In: British Machine Vision Conference (BMVC).
  27. Workman S, Jacobs N. 2015. On the Location Dependence of Convolutional Neural Network Features. In: IEEE/ISPRS Workshop: Large Scale Computer Vision for Remote Sensing (EARTHVISION). 1–9. DOI: 10.1109/CVPRW.2015.7301385.
  28. Shi F, Zhai M, Duncan D, Jacobs N. 2014. MPCA: EM-Based PCA For Mixed-Size Image Datasets. In: IEEE International Conference on Image Processing (ICIP). 1807–1811. DOI: 10.1109/ICIP.2014.7025362.
  29. Zhai M, Shi F, Duncan D, Jacobs N. 2014. Covariance-Based PCA for Multi-Size Data. In: International Conference on Pattern Recognition (ICPR). 1603–1608. DOI: 10.1109/ICPR.2014.284.
  30. Jacobs N, Roman N, Pless R. 2007. Consistent Temporal Variations in Many Outdoor Scenes. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 1–6. DOI: 10.1109/CVPR.2007.383258.