Image Processing / Neural Networks and Deep Learning

SenseNova-U1: NEO-unify multimodal architecture works directly with pixels without VAE

14 May 2026

SenseNova introduced a new multimodal architecture, SenseNova-U1, which combines image understanding, generation, and editing inside a single transformer without a separate visual encoder or variational autoencoder. This approach removes the…

RealRestorer: Open-Source Image Enhancement Model Outperforms Nano Banana Pro on Real-World Benchmark

30 March 2026

A team of researchers from StepFun, Southern University of Science and Technology, and the Chinese Academy of Sciences has published RealRestorer — an open-source image quality enhancement model that removes…

MinerU2.5: Open-Source 1.2B Model for PDF Parsing Outperforms Gemini 2.5 Pro on Benchmarks

2 October 2025

MinerU2.5 is a compact vision-language model with 1.2 billion parameters for PDF parsing, introduced by the Shanghai Artificial Intelligence Laboratory team. The model achieves state-of-the-art results in PDF parsing with…

ByteDance Unveils TA-TiTok Tokenizer Achieving SOTA in Text-to-Image Generation Using Only Public Data

19 January 2025

Researchers from ByteDance and POSTECH introduced TA-TiTok (Text-Aware Transformer-based 1-Dimensional Tokenizer), a novel approach to making text-to-image AI models more accessible and efficient. Their work demonstrates through MaskGen models how…

Nvidia Drops Monster AI Update at CES 2025: Local Foundation Models Meet New RTX 50 Series

7 January 2025

Nvidia just announced a major shift in consumer AI computing at CES 2025, combining new GPUs with a platform for running foundation models locally. The announcement includes next-gen RTX 50…

ArtAug: Open Source Framework for Image Generation Models Enhancement

18 December 2024

East China Normal University and Alibaba Group researchers have introduced ArtAug, a framework that enhances text-to-image generation through synthesis-understanding interaction. This approach significantly improves image quality without requiring extensive manual…

Ideogram 2.0: Generating Text on Images with Unmatched Accuracy

22 August 2024

Ideogram launched its groundbreaking Ideogram 2.0 model, setting new standards in the text-to-image generation space. Trained from scratch, Ideogram 2.0 significantly outperforms existing models in key quality metrics such as…

Midjourney Introduces Character Transfer Feature to New Images

17 March 2024

The image generation service Midjourney now lets users transfer a character to new images by including a link to an existing image of that character in the request. This functionality…

Apple MGIE: Multimodal Models for Image Editing

12 February 2024

Apple, in collaboration with the University of California, has developed the open-source MGIE model for image editing based on text input. This model tackles various editing tasks, including Photoshop-style image…

Microsoft DragNUWA: Video Generation via Object Trajectories

15 January 2024

Microsoft has released the DragNUWA weights – a cross-domain video generation model that offers more precise control over the resulting output compared to similar models. Control is achieved by simultaneously…

OpenAI Announced the Release of Dall-E 3 in Early October

20 September 2023

OpenAI announced the release of Dall-E 3 in the ChatGPT interface in early October. Researchers revealed that the new version of the text-to-image model surpasses Dall-E 2 in several key aspects.…

Würstchen: An Open-Source Text-to-Image Model Consuming 16 Times Less GPU than Stable Diffusion 1.4

14 September 2023

Würstchen is an open text-to-image model that generates images faster than diffusion models like Stable Diffusion while consuming significantly less memory, achieving comparable results. The approach is based on a…

Best AI Photo Generator Apps: Top 10 Selection

12 September 2023

Which AI can draw pictures from words with maximum quality and minimal time investment? We have conducted research to find out the best AI photo generator apps that create images…

AI Photo Enhancer Online Apps Review: Improve Image Quality for Free

2 August 2023

In this article, we will explore AI photo enhancer online apps that improve image quality for free. The limit for free upscaling typically ranges from just 5 attempts to several…

Stability AI Introduces Stable Diffusion SDXL 1.0 Model

26 July 2023

Stability AI has announced the release of Stable Diffusion SDXL 1.0, a new version of the popular image generation model. SDXL 1.0 is a foundational model with 3.5 billion parameters…

Google Bard Update: Image Processing and New Language Support

16 July 2023

Google Bard has undergone an update, expanding its functionality to 46 languages across more than 200 countries, including countries in Europe and Brazil. The latest features include image processing, dialog…

PACGen: Personalized and Controllable Text-to-Image Generation

7 July 2023

Researchers from the University of Wisconsin-Madison have introduced a text-to-image diffusion model called PACGen (Personalized and Controllable Text-to-Image Generation) for transferring objects from one image to a new scene generated…

NVIDIA neural network generates realistic 3D worlds based on Minecraft

22 April 2021

Nvidia has unveiled GANcraft, a neural network for creating photorealistic images based on 3D block worlds, similar to the worlds in Minecraft. GANcraft creates a visualization of a world, taking…

The StyleCLIP neural network sets picture style based on a text description

9 April 2021

StyleCLIP is a combination of CLIP and StyleGAN models designed to manipulate image style with text prompts. The open-source code is available, including Google Colab notebooks. Why is it needed? StyleGAN…

SAM: the neural network changes the age on the image of a person’s face

17 February 2021

SAM is a neural network model that changes the age of a person in an image. The model takes as input an image of a person’s face and a target age.…

Semantic Data Augmentation Improves Neural Network’s Generalization

24 July 2020

A group of researchers from the University of Beijing has proposed a novel implicit semantic data augmentation method that improves the generalization capabilities of deep neural networks. Data augmentation has…