Multimodal Vision Research Laboratory

MVRL

Research Area: Biodiversity and Conservation

What is changing in species, habitats, and ecosystems, and how can AI support conservation decisions? We build multimodal models that monitor biodiversity and environmental change using natural-world imagery, citizen-science observations, and Earth observation data. Recent work includes global and local entailment learning for natural-world imagery (RCME), probabilistic masked multimodal embedding models for ecology (ProM3E), language-driven hierarchical species distribution modeling (LD-SDM), unified embedding spaces for ecological applications (TaxaBind), and cross-view learning for bird species mapping (BirdSat). Our goal is conservation-ready monitoring that connects field observations, overhead imagery, and ecological science.

All Publications

  1. Tao Z, Sastry S, Thompson M, Campolongo E, Zhang N, Zhang Z, Lapp H, Su Y, Berger-Wolf T, Jacobs N, Chao W-L, Gu J. 2026. Beyond Flat Labels: Level-Restricted Contrastive Learning for Hierarchical Fine-Grained Vision Classification. In: Fine-Grained Visual Categorization (FGVC) (IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshop).
  2. Sastry S, Khanal S, Dhakal A, Lin J, Cher D, Jarosz P, Jacobs N. 2026. ProM3E: Probabilistic Masked MultiModal Embedding Model for Ecology. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
  3. Sastry S, Xing X, Dhakal A, Khanal S, Ahmad A, Jacobs N. 2025. LD-SDM: Language-Driven Hierarchical Species Distribution Modeling. In: Computer Vision for Ecology (IEEE/CVF International Conference on Computer Vision (ICCV) Workshops).
  4. Sastry S, Dhakal A, Xing E, Khanal S, Jacobs N. 2025. Global and Local Entailment Learning for Natural World Imagery. In: IEEE/CVF International Conference on Computer Vision (ICCV).
  5. Sastry S, Khanal S, Dhakal A, Ahmad A, Jacobs N. 2025. TaxaBind: A Unified Embedding Space for Ecological Applications. In: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV).
  6. Sastry S, Xing X, Dhakal A, Khanal S, Ahmad A, Jacobs N. 2024. LD-SDM: Language-Driven Hierarchical Species Distribution Modeling. In: American Geophysical Union (AGU) Fall Meeting Abstracts.
  7. Ahmad A, Dhakal A, Sastry S, Khanal S, Xing E, Jacobs N. 2024. Improved Canopy Vertical Structural Diversity Mapping Across Varied Topographies Using Deep Learning Techniques. In: American Geophysical Union (AGU) Fall Meeting Abstracts.
  8. Sastry S, Khanal S, Dhakal A, Huang D, Jacobs N. 2024. BirdSat: Cross-View Contrastive Masked Autoencoders for Bird Species Classification and Mapping. In: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV).