Apple MGIE: Multimodal Models for Image Editing

Apple, in collaboration with the University of California, Santa Barbara, has released MGIE, an open-source model that edits images according to natural-language instructions. The model handles a range of editing tasks, including Photoshop-style modifications, global photo optimization, and local editing.

MGIE (MLLM-Guided Image Editing) uses multimodal large language models (MLLMs), which can process both text and images, to improve how faithfully an edit follows the input query. Before MGIE, MLLMs had seen little use for this task.

MGIE integrates an MLLM into the editing process in two ways. First, the MLLM derives explicit, expressive instructions from a terse user query. For instance, for the query “make the sky bluer,” MGIE generates the instruction “increase the saturation of the sky area by 20%.” Second, the MLLM generates a latent representation of the intended edit, which guides the pixel-level manipulation of the image. MGIE is trained end to end, jointly optimizing instruction derivation, latent-representation generation, and image editing. Users can also refine their queries to edit an image iteratively.
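MGIE learns how to carry out such instructions with a neural model; the rule-based sketch below is not MGIE's method. It only makes concrete what the derived instruction “increase the saturation of the sky area by 20%” asks for at the pixel level, assuming a toy image stored as a flat list of 8-bit RGB pixels and a boolean mask marking the sky region. The function name `boost_saturation` and the data layout are hypothetical.

```python
import colorsys

Pixel = tuple[int, int, int]  # hypothetical toy layout: (r, g, b), each 0..255


def boost_saturation(pixels: list[Pixel], mask: list[bool], factor: float = 1.2) -> list[Pixel]:
    """Scale HSV saturation by `factor` (1.2 = +20%) wherever mask is True."""
    if len(pixels) != len(mask):
        raise ValueError("pixels and mask must have the same length")
    out: list[Pixel] = []
    for (r, g, b), selected in zip(pixels, mask):
        if not selected:
            out.append((r, g, b))  # pixels outside the region are left untouched
            continue
        # colorsys works on floats in [0, 1], so normalize the 8-bit channels first.
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        s = min(1.0, s * factor)  # clamp so saturation stays a valid value
        nr, ng, nb = colorsys.hsv_to_rgb(h, s, v)
        out.append((round(nr * 255), round(ng * 255), round(nb * 255)))
    return out


# Usage: one "sky" pixel inside the mask, one "ground" pixel outside it.
sky, ground = (100, 150, 200), (120, 100, 80)
print(boost_saturation([sky, ground], [True, False]))
```

Here the sky pixel's saturation rises from 0.5 to 0.6 while its hue and brightness stay fixed, turning (100, 150, 200) into roughly (80, 140, 200); the ground pixel is unchanged. The point of the expressive instruction is that it names both the region and the magnitude of the change, which the original query leaves implicit.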

MGIE handles a wide range of editing scenarios, from simple color adjustments to complex object manipulations, and can apply edits either to the whole image (global edits) or to selected regions (local edits). Key features of MGIE include:

Instruction-based Editing: MGIE derives explicit instructions that guide the editing process, improving edit quality and letting users write short, simple queries.
Photoshop-style Modification: MGIE can execute typical Photoshop-style edits such as cropping, resizing, rotating, flipping, and applying filters, as well as more complex edits like changing the background and adding or removing objects.
Global Photo Optimization: MGIE can optimize overall photo quality, including brightness, contrast, sharpness, and color balance, and apply artistic effects such as sketching, painting, and caricaturing.
Local Editing: MGIE can edit specific areas or objects in the image, such as faces, eyes, hair, clothing, and accessories, altering their shape, size, color, texture, and style.
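MGIE performs these edits with a learned model. As a point of reference only, the sketch below shows conventional rule-based versions of a few of the operations listed above, assuming a toy grayscale image stored as rows of 8-bit values; all function names are hypothetical. It also illustrates the global/local distinction: the same tone adjustment touches every pixel when no mask is given and only the masked region otherwise.

```python
from typing import Optional

Image = list[list[int]]  # hypothetical toy layout: rows of 8-bit gray values
Mask = list[list[bool]]  # True marks pixels belonging to the edited region


def _clamp(value: float) -> int:
    """Round and clamp an intensity to the valid 8-bit range [0, 255]."""
    return max(0, min(255, round(value)))


def adjust_tone(img: Image, brightness: float = 0.0, contrast: float = 1.0,
                mask: Optional[Mask] = None) -> Image:
    """Global edit when mask is None; local edit restricted to the mask otherwise.

    Each selected pixel x becomes contrast * (x - 128) + 128 + brightness:
    contrast is stretched around mid-gray, then brightness is added.
    """
    out: Image = []
    for r, row in enumerate(img):
        new_row = []
        for c, x in enumerate(row):
            selected = mask is None or mask[r][c]
            new_row.append(_clamp(contrast * (x - 128) + 128 + brightness) if selected else x)
        out.append(new_row)
    return out


def flip_horizontal(img: Image) -> Image:
    """Mirror each row left to right."""
    return [list(reversed(row)) for row in img]


def crop(img: Image, top: int, left: int, height: int, width: int) -> Image:
    """Keep the height x width window whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in img[top:top + height]]


# Usage on a 2 x 2 image: a global edit, then the same edit limited to one pixel.
img = [[100, 150], [200, 50]]
print(adjust_tone(img, brightness=10, contrast=1.5))
print(adjust_tone(img, brightness=10, contrast=1.5, mask=[[True, False], [False, False]]))
print(flip_horizontal(img), crop(img, 0, 1, 2, 1))
```

Every function returns a new image and leaves its input unchanged, so edits can be chained the way a user would refine an image step by step. The mid-gray contrast formula is one common convention, chosen here for simplicity.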

The MGIE code is available on GitHub, and the model can be tried out using the web demo on Hugging Face Spaces.
