We are starting the year 2023 with a new model for generating images from Google Research. Google has released Muse, a 3-billion-parameter model for text-to-image generation via a masked generative transformer. It is a new competitor to Stable Diffusion and the like, with a few very interesting properties.
According to Google, Muse's transformer model is significantly faster than existing diffusion and autoregressive models while providing state-of-the-art image generation, and it enables inpainting, outpainting, and mask-free editing without model fine-tuning or inversion.
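The speed advantage comes from how masked generative transformers decode: rather than producing image tokens one at a time like an autoregressive model, they fill in many masked tokens in parallel over a handful of refinement steps, keeping the most confident predictions at each step. Below is a minimal toy sketch of that decoding loop; the random "predictor" and all names here are hypothetical stand-ins, not the actual Muse API:

```python
import numpy as np

def toy_predict(tokens, vocab_size, rng):
    """Stand-in for the Transformer: random per-position logits.
    The real model would condition on the text prompt and visible tokens."""
    return rng.random((tokens.size, vocab_size))

def masked_decode(seq_len=16, vocab_size=8, steps=4, seed=0):
    rng = np.random.default_rng(seed)
    tokens = np.zeros(seq_len, dtype=int)
    mask = np.ones(seq_len, dtype=bool)      # start with every token masked
    for step in range(steps):
        logits = toy_predict(tokens, vocab_size, rng)
        probs = logits / logits.sum(axis=1, keepdims=True)
        pred = probs.argmax(axis=1)          # predicted token per position
        conf = probs.max(axis=1)             # confidence per position
        conf[~mask] = -np.inf                # never re-select fixed tokens
        # Unmask a growing fraction of the most confident positions each step.
        n_unmask = max(1, int(mask.sum() * (step + 1) / steps))
        keep = np.argsort(-conf)[:n_unmask]
        tokens[keep] = pred[keep]
        mask[keep] = False
        if not mask.any():
            break
    return tokens, mask

tokens, mask = masked_decode()   # all 16 positions decoded in 4 parallel steps
```

The point of the sketch is the schedule: a full image's worth of tokens is resolved in a few parallel passes, versus hundreds of sequential steps for autoregressive decoding or diffusion sampling.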
One of the most interesting properties of this model, which it shares with Parti (also from Google), is its ability to GENERATE readable TEXT within images, something virtually all currently available commercial models struggle with. Muse can do it.
Another important quality concerns prompts that specify quantities or the spatial arrangement of elements, such as "Put 2 baseballs to the left of 3 tennis balls." Where other models tend to fail at these, Google Muse appears to follow them more faithfully.
Another very cool thing this model has learned automatically is the ability to MANIPULATE IMAGES simply by modifying the prompt. This lets you edit an image in seconds while respecting its composition and style, which is super helpful.
We finished last year with Point·E, and we start this new year with Dream3D, yet another text-to-3D model. Unlike other state-of-the-art models, this one first generates a high-quality 3D shape from text, uses that shape to initialize a NeRF, and then textures it with a text-to-image diffusion model.
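The three-stage pipeline described above can be sketched as a simple chain of functions. Every function and field below is a hypothetical placeholder to illustrate the data flow (text prompt → explicit shape prior → NeRF initialization → diffusion-guided texturing), not the real Dream3D code:

```python
def text_to_shape(prompt: str) -> dict:
    """Stand-in for stage 1: generate a coarse 3D shape from the prompt."""
    return {"prompt": prompt, "kind": "coarse_shape"}

def init_nerf_from_shape(shape: dict) -> dict:
    """Stand-in for stage 2: use the explicit shape to initialize a NeRF."""
    return {"shape_prior": shape, "textured": False}

def texture_with_diffusion(nerf: dict, prompt: str) -> dict:
    """Stand-in for stage 3: refine NeRF appearance under guidance
    from a text-to-image diffusion model."""
    return {**nerf, "textured": True, "guidance": prompt}

def dream3d_pipeline(prompt: str) -> dict:
    shape = text_to_shape(prompt)
    nerf = init_nerf_from_shape(shape)
    return texture_with_diffusion(nerf, prompt)

asset = dream3d_pipeline("a ceramic teapot")
```

The key design choice, as reported, is starting from an explicit shape rather than optimizing the NeRF from scratch, which is what lets the final geometry stay high quality.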