Multimodal Vision Research Laboratory

MVRL

Research Area: Audio

Audio research at MVRL studies multimodal embeddings that link soundscapes with vision and geospatial data. Recent work includes probabilistic embeddings for zero-shot soundscape mapping (PSM), tri-modal embeddings, and Sat2Sound for mapping soundscapes from satellite imagery, often integrated with ecology-focused models such as ProM3E.

All Publications

  1. Sastry S, Khanal S, Dhakal A, Lin J, Cher D, Jarosz P, Jacobs N. 2026. ProM3E: Probabilistic Masked MultiModal Embedding Model for Ecology. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
  2. Khanal S, Xing E, Sastry S, Dhakal A, Xiong Z, Ahmad A, Jacobs N. 2024. PSM: Learning Probabilistic Embeddings for Multi-scale Zero-shot Soundscape Mapping. In: ACM Multimedia. DOI: 10.1145/3664647.3681620.
  3. Khanal S, Sastry S, Dhakal A, Jacobs N. 2023. Learning Tri-modal Embeddings for Zero-Shot Soundscape Mapping. In: British Machine Vision Conference (BMVC).
  4. Salem T, Zhai M, Workman S, Jacobs N. 2018. A Multimodal Approach to Mapping Soundscapes. In: IEEE International Geoscience and Remote Sensing Symposium (IGARSS). DOI: 10.1109/IGARSS.2018.8517977.
  5. Song W, Salem T, Jacobs N, Johnson M. 2017. Detecting the Presence of Bird Vocalizations in Audio Segments Using a Convolutional Neural Network Architecture. In: International Symposium on Acoustic Communication by Animals.