Meta Introduces CM3leon, a More Efficient, State-of-the-Art Generative Model for Text and Images

Tuesday, July 18, 2023

Interest and research in generative AI models have accelerated in recent months, with advancements in natural language processing that let machines understand and express language, as well as systems that can generate images based on text input. Today, we’re showcasing CM3leon (pronounced like “chameleon”), a single foundation model that does both text-to-image and image-to-text generation.

CM3leon is the first multimodal model trained with a recipe adapted from text-only language models, including a large-scale retrieval-augmented pre-training stage and a second multitask supervised fine-tuning (SFT) stage. This recipe is simple, produces a strong model, and also shows that tokenizer-based transformers can be trained as efficiently as existing generative diffusion-based models. CM3leon achieves state-of-the-art performance for text-to-image generation, despite being trained with five times less compute than previous transformer-based methods. CM3leon has the versatility and effectiveness of autoregressive models, while maintaining low training costs and inference efficiency. It is a causal masked mixed-modal (CM3) model because it can generate sequences of text and images conditioned on arbitrary sequences of other image and text content. This greatly expands the functionality of previous models that were either only text-to-image or only image-to-text.

Although text-only generative models are commonly multitask instruction-tuned on a wide range of different tasks to improve their ability to follow instruction prompts, image generation models are instead typically specialized for particular tasks. We apply large-scale multitask instruction tuning to CM3leon for both image and text generation, and show that it significantly improves performance on tasks such as image caption generation, visual question answering, text-based editing, and conditional image generation. This provides another strong example of how the scaling recipes developed for text-only models generalize directly to our tokenization-based image generation models.

On the most widely used image generation benchmark (zero-shot MS-COCO), CM3leon achieves an FID (Fréchet Inception Distance) score of 4.88, establishing a new state of the art in text-to-image generation and outperforming Google’s text-to-image model, Parti. This achievement underscores the potential of retrieval augmentation and highlights the impact of scaling strategies on the performance of autoregressive models. CM3leon also shows an impressive ability to generate complex compositional objects, such as a potted cactus wearing sunglasses and a hat. It performs well across a variety of vision-language tasks, including visual question answering and long-form captioning. Even though it was trained on a dataset of only three billion text tokens, CM3leon’s zero-shot performance compares favorably against larger models trained on more extensive datasets.
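For readers unfamiliar with the FID metric quoted above, the following is a minimal sketch of the underlying calculation, not the evaluation code used for CM3leon. FID fits a Gaussian to the feature vectors of real images and another to those of generated images, then measures the Fréchet distance between the two; lower is better. In practice the features come from an Inception network. Here, small random arrays stand in for those features, and the function name `frechet_distance` is a hypothetical choice for illustration.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets (rows = samples)."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a cov_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative in theory;
    # clip tiny negative values caused by floating-point error.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)

# Assumption: random 4-dimensional vectors stand in for Inception features.
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 4))

same = frechet_distance(real, real)           # identical sets: distance near 0
shifted = frechet_distance(real, real + 1.0)  # mean shifted by 1 in each of 4 dims
print(same, shifted)
```

Shifting every feature by 1.0 leaves the covariance unchanged, so the distance reduces to the squared difference of the means, which is about 4 for four dimensions. A score such as 4.88 on MS-COCO is computed in the same way, but over high-dimensional Inception features of real and generated images.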

 

Stephanie Cime
