ImageBind: Joint Embedding across Six Different Modalities


A new model release from Facebook/Meta AI research: "An approach to learn a joint embedding across six different modalities - images, text, audio, depth, thermal, and IMU data", where IMU stands for inertial measurement unit. The non-interactive demo shows several kinds of cross-modal retrieval: finding audio from an image, finding images from audio, and using text to retrieve both images and audio. It also shows combining an image with audio to retrieve images (e.g. a barking sound plus a photo of a beach returns dogs on a beach) and using audio as the input to an image generator.
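To make the retrieval idea concrete, here is a minimal sketch of how search in a shared embedding space can work. It does not use the ImageBind library. The 3-dimensional vectors, the file names, and the helper functions `normalize`, `cosine`, `retrieve`, and `combine` are all made up for illustration, and summing two normalised embeddings is only one plausible way to combine an image with a sound.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))

def retrieve(query, gallery):
    """Return gallery keys ordered from most to least similar to the query."""
    return sorted(gallery, key=lambda k: cosine(query, gallery[k]), reverse=True)

def combine(a, b):
    """Combine two embeddings by adding their unit-length versions (an assumption)."""
    return [x + y for x, y in zip(normalize(a), normalize(b))]

# Hypothetical embeddings. In a real system every vector would come from the
# model's encoder for that modality; here the axes loosely mean
# "dog", "beach", "city" so the result is easy to follow by hand.
image_gallery = {
    "dog_on_beach.jpg": [0.7, 0.7, 0.0],
    "dog_in_city.jpg": [0.7, 0.0, 0.7],
    "empty_beach.jpg": [0.0, 1.0, 0.1],
    "city_street.jpg": [0.0, 0.1, 1.0],
}
bark_audio = [1.0, 0.1, 0.1]   # stand-in for an audio embedding of a bark
beach_photo = [0.1, 1.0, 0.0]  # stand-in for an image embedding of a beach

# Audio alone ranks the dog images first, regardless of setting.
audio_only = retrieve(bark_audio, image_gallery)

# Audio plus image pulls the result towards dogs that are also on a beach.
combined = retrieve(combine(bark_audio, beach_photo), image_gallery)
print(combined[0])
```

Because all modalities live in the same space, the same `retrieve` call would serve text-to-image or image-to-audio search; only the source of the query vector changes.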

GitHub - facebookresearch/ImageBind: ImageBind One Embedding Space to Bind Them All
