Google Launches Gemini 2.5 Flash Image with Text-Based Editing Capabilities

Google introduced Gemini 2.5 Flash Image (internal codename nano-banana), a model for image generation and editing. The model combines multiple images into one, maintains character consistency across generations, supports text-based editing, and draws on Gemini’s world knowledge for content generation.

The model is available through Google AI Studio and the Gemini API, and through Vertex AI for enterprise customers. Pricing is $30.00 per 1 million output tokens, with each image counted as 1290 tokens (about $0.039 per image).
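The per-image figure follows directly from the two numbers quoted above. The short sketch below reproduces the arithmetic; the function name `image_cost_usd` is illustrative, and the estimate covers image output tokens only (it ignores input tokens and any text the model returns alongside the image).

```python
# Pricing figures quoted in the article.
PRICE_PER_MILLION_OUTPUT_TOKENS_USD = 30.00
TOKENS_PER_IMAGE = 1290


def image_cost_usd(num_images: int) -> float:
    """Estimate the output cost in USD for generating `num_images` images."""
    total_tokens = num_images * TOKENS_PER_IMAGE
    return total_tokens * PRICE_PER_MILLION_OUTPUT_TOKENS_USD / 1_000_000


print(f"1 image: ${image_cost_usd(1):.4f}")         # 1 image: $0.0387
print(f"1,000 images: ${image_cost_usd(1000):.2f}")  # 1,000 images: $38.70
```

Rounded to three decimal places, $0.0387 gives the quoted $0.039 per image.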

Gemini 2.5 Flash Image results: performance comparison of Gemini 2.5 Flash Image with competitors

Key Features

Character Consistency

A key problem in image generation is keeping a character or object looking the same across different generations and scenes. Gemini 2.5 Flash Image can place the same character in different environments, showcase a product from different angles in new settings, or create uniform brand materials while preserving object identity.

Google created a demo application in Google AI Studio demonstrating the model’s character consistency capabilities.

Text-based Editing

The model can edit images based on text prompts. Gemini 2.5 Flash Image can blur image backgrounds, remove stains from t-shirts, remove people from photos, change subject poses, add color to black-and-white photos, or make other changes described in text.

To demonstrate these capabilities, Google developed a photo editing application in AI Studio with interface controls and prompt-based editing.

Text-based image editing: targeted image editing using text commands

World Knowledge Integration

Traditional image generation models produce aesthetically appealing results but lack a deep semantic understanding of the real world. Gemini 2.5 Flash Image draws on Gemini’s world knowledge, which opens up use cases that depend on that understanding.

Google created a demo application in Google AI Studio that turns a simple canvas into an interactive tutor. The application shows the model’s ability to read and understand handwritten diagrams, help solve problems, and follow complex editing instructions in a single step.

Multi-image Fusion

Gemini 2.5 Flash Image understands and combines multiple input images. The model can place objects into scenes, change room styles using color schemes or textures, and merge images with a single prompt.

To demonstrate image fusion, Google developed an application in Google AI Studio that lets users drag objects into new scenes to quickly create photorealistic merged images.
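As a rough sketch of what a fusion request might look like in code, the example below passes several input images together with one prompt, following the same request pattern as the single-image example under Technical Implementation. The helper names `build_fusion_contents` and `fuse_images` are hypothetical, as is the two-image minimum check; the model name is the preview name used in this article. It assumes the google-genai and Pillow packages are installed and an API key is available in the environment.

```python
from io import BytesIO
from typing import Any, List, Sequence

# Preview model name used in this article.
MODEL_NAME = "gemini-2.5-flash-image-preview"


def build_fusion_contents(prompt: str, images: Sequence[Any]) -> List[Any]:
    """Return the request contents: the prompt followed by every input image."""
    if len(images) < 2:
        raise ValueError("image fusion needs at least two input images")
    return [prompt, *images]


def fuse_images(prompt: str, image_paths: Sequence[str], output_path: str) -> None:
    """Send several images plus one prompt, then save the first returned image."""
    # Imported lazily so build_fusion_contents works without the SDK installed.
    from google import genai
    from PIL import Image

    client = genai.Client()
    images = [Image.open(path) for path in image_paths]
    response = client.models.generate_content(
        model=MODEL_NAME,
        contents=build_fusion_contents(prompt, images),
    )
    # Save the first image part found in the response.
    for part in response.candidates[0].content.parts:
        if part.inline_data is not None:
            Image.open(BytesIO(part.inline_data.data)).save(output_path)
            return
    raise RuntimeError("the response contained no image part")
```

A call might look like `fuse_images("Place the lamp from the second image on the table in the first image", ["room.png", "lamp.png"], "fused.png")`, where the file names are placeholders.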

Technical Implementation

Developers can get started with the model using the API documentation. The model is available in preview through the Gemini API and Google AI Studio, with a stable version expected in the coming weeks.

All images created or edited with Gemini 2.5 Flash Image contain an invisible SynthID digital watermark to identify AI-generated or edited content.

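# Generate an image from a text prompt plus an input image.
from io import BytesIO

from google import genai
from PIL import Image

# The client picks up the API key from the environment.
client = genai.Client()

prompt = "Create a picture of my cat eating a nano-banana in a fancy restaurant under the gemini constellation"

# Input image sent alongside the prompt.
source_image = Image.open("/path/to/image.png")

response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents=[prompt, source_image],
)

# The response can contain both text parts and image parts.
for part in response.candidates[0].content.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        # Decode the returned image bytes and save them to disk.
        generated_image = Image.open(BytesIO(part.inline_data.data))
        generated_image.save("generated_image.png")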

Google is working to improve long-form text rendering, make character consistency more reliable, and represent fine details in images more accurately. The team encourages users to share feedback on the developer forum or on X.