Google Unveils Imagen 4 & Ultra Text-to-Image AI Models
Google is advancing its generative AI capabilities with the launch of its latest text-to-image models, Imagen 4 and the premium Imagen 4 Ultra. Building on previous iterations, these new models aim to give users greater control and deliver improved visual results, particularly in following detailed prompts.
The standard Imagen 4 model is positioned as the go-to tool for most creative tasks, while the more powerful Imagen 4 Ultra is specifically designed for scenarios demanding exceptional precision in image generation. Both models are now accessible through a paid preview in the Gemini API and available for limited free testing within Google AI Studio.
Imagen 4 vs. Imagen 4 Ultra: What’s the Difference?
Google has outlined distinct use cases and pricing for its two new AI image generators:
Imagen 4: Described as the model suitable for general tasks, offering a step up in quality, particularly with “significantly improved text rendering” compared to its predecessor, Imagen 3. It is priced at $0.04 per image.
Imagen 4 Ultra: This deluxe version is tailored for cases where images must “precisely follow instructions.” Google highlights its “strong” output quality, comparing it favorably against leading competitors like Dall-E and Midjourney. The enhanced precision comes at a 50 percent price premium, at $0.06 per image.
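To make the price gap concrete, the short Python sketch below works out batch costs and the Ultra premium from the two per-image prices quoted above. The dictionary keys and function names are illustrative labels chosen for this example, not official Gemini API model identifiers, and the figures assume the preview prices stay as reported.

```python
from decimal import Decimal

# Per-image preview prices in USD, as quoted in this article.
# The keys are hypothetical labels, not official API model IDs.
PRICE_PER_IMAGE = {
    "imagen-4": Decimal("0.04"),
    "imagen-4-ultra": Decimal("0.06"),
}


def batch_cost(model: str, num_images: int) -> Decimal:
    """Return the USD cost of generating num_images with the given model."""
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    return PRICE_PER_IMAGE[model] * num_images


def ultra_premium_percent() -> Decimal:
    """Return how much more Ultra costs per image, as a percentage."""
    base = PRICE_PER_IMAGE["imagen-4"]
    ultra = PRICE_PER_IMAGE["imagen-4-ultra"]
    return (ultra - base) / base * 100


if __name__ == "__main__":
    for name in PRICE_PER_IMAGE:
        print(f"{name}: 1,000 images cost ${batch_cost(name, 1000)}")
    print(f"Ultra premium: {ultra_premium_percent()}%")
```

At these prices the difference is two cents per image, which only becomes noticeable at volume: a batch of 1,000 images costs $20 more with Ultra than with the standard model.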
Putting the Precision to the Test
During initial demonstrations, Google showcased various images generated by the new models. Imagen 4 Ultra produced a three-panel comic depicting a spaceship battle with a space lizard, complete with specific sound effects. The result appeared to follow the detailed prompt accurately, though its style resembled a toon render from a 3D application.
Imagen 4 also proved capable of executing complex prompts. Given a detailed request for a vintage Kyoto travel postcard featuring specific elements like a pagoda under cherry blossoms, distant snow-capped mountains, and vibrant colors, it rendered the image “to a ‘T’.” Other examples included a hiking couple and a stylized fashion shoot. While these images were of good quality and matched the text instructions precisely, they still had a “highly machine generated” look and lacked a natural or charming aesthetic.
Early Impressions and the Shifting AI Art Landscape
Despite the promises of improved precision and quality, early impressions suggest Imagen 4 represents only a “mild improvement” over previous versions. Compared to what are often considered market leaders, such as Dall-E 3 and Midjourney 7, the new Google models reportedly didn’t deliver a significant “wow” factor. The resulting images, while technically accurate to prompts, still carry a distinct machine-like appearance.
This launch comes amid a perceived shift in public sentiment toward AI-generated art. After an initial surge of enthusiasm, some observers note that interest may be waning, with the most visible use cases now being spammy advertisements on social media and websites.
Imagen 4 and Imagen 4 Ultra demonstrate Google’s continued effort to refine its text-to-image AI, offering enhanced precision and accessibility through the Gemini API and AI Studio. Even so, the models face a competitive landscape and an evolving public perception of AI-generated visuals.