We develop novel representation learning methods for computer vision and multimodal understanding. Recent work includes Frobenius norm minimization for self-supervised learning (FroSSL), dynamic feature alignment for semi-supervised domain adaptation, and learning geo-temporal image features from webcam networks. We also explore hierarchical probabilistic embeddings for multi-view image classification, covariance-based PCA for multi-size data, and representation learning approaches that leverage temporal variations in outdoor scenes. Our research addresses fundamental challenges in learning robust, transferable representations from diverse and often limited data sources.
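To make the Frobenius-norm idea concrete, here is a minimal NumPy sketch of a Frobenius-norm-style self-supervised objective in the spirit of FroSSL. This is an illustrative approximation, not the published FroSSL implementation: the function name, the exact normalization, and the combination of terms are assumptions for exposition. It pairs an invariance term (matched embeddings of two augmented views should agree) with a log squared Frobenius norm penalty on each view's covariance, which discourages dimensional collapse.

```python
import numpy as np

def frobenius_ssl_loss(z1, z2, eps=1e-8):
    """Hypothetical Frobenius-norm SSL objective (FroSSL-inspired sketch).

    z1, z2: (n, d) embeddings of two augmented views of the same batch.
    Returns invariance term + log squared Frobenius norms of the
    per-view covariance matrices.
    """
    # Standardize each embedding dimension so covariances are comparable.
    z1 = (z1 - z1.mean(axis=0)) / (z1.std(axis=0) + eps)
    z2 = (z2 - z2.mean(axis=0)) / (z2.std(axis=0) + eps)
    n = z1.shape[0]

    # Invariance: mean squared distance between matched view embeddings.
    invariance = np.mean((z1 - z2) ** 2)

    # Anti-collapse regularizer: log ||Cov||_F^2 for each view.
    cov1 = z1.T @ z1 / n
    cov2 = z2.T @ z2 / n
    variance_term = np.log((cov1 ** 2).sum() + eps) + np.log((cov2 ** 2).sum() + eps)

    return invariance + variance_term
```

In a training loop, z1 and z2 would come from an encoder applied to two random augmentations of the same images; the loss is differentiable, so a framework with autodiff could optimize it directly.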