By Christoph Schütte in News — May 11, 2023

ImageBind: Joint Embedding across Six Different Modalities

A new model release from Facebook/Meta AI research: "An approach to learn a joint embedding across six different modalities - images, text, audio, depth, thermal, and IMU (inertial measurement units) data". The non-interactive demo shows searching audio starting with an image, searching images starting with audio, using text to retrieve images and audio, using image and audio to retrieve images (e.g. a barking sound and a photo of a beach to get dogs on a beach) and using audio as input to an image generator.

Subscribe to ssv.ai