MIRA: A World Model Fully Simulates Rocket League Without Requiring You to Install the Game Itself

8 July 2026

Teams from General Intuition, Kyutai, and Epic Games introduced MIRA — a world model that fully simulates the Rocket League game environment for four players at once and draws each…

LFM2.5-230M: An Ultra-Compact Model Runs on a Raspberry Pi and Almost Any Modern Phone

29 June 2026

Liquid AI released LFM2.5-230M — one of the smallest language models out there today, at just 230 million parameters. It’s compact enough to run on a small device without trouble:…

DreamX-World-5B: An Open-Source World Model with Camera Control, Text-Based Control, and Location Memory

17 June 2026

The AMAP-ML team has published DreamX-World 1.0, an interactive generative world model that turns text or an image into a controllable video with precise camera control, memory of previously visited…

VibeThinker: 3B model reasons and codes at the level of flagship models

16 June 2026

Sina Weibo AI published VibeThinker-3B — a compact language model with just 3 billion parameters that matches flagship models DeepSeek V3.2 (671B), GLM-5 (744B), and Gemini 3 Pro on verifiable…

ESM Cambrian: protein language model outperformed Google’s AlphaFold3 and built the largest atlas of the protein world

4 June 2026

A team of researchers from Biohub published ESM Cambrian (ESMC), a language model for protein structure prediction and design that outperformed Google’s AlphaFold3 on structure prediction accuracy, designed…

LLaVA-OneVision-2: Multimodal Model Analyzes Compressed Video Stream Through a Codec Instead of Frame Sampling

28 May 2026

Researchers from Glint Lab, AIM for Health Lab, and MVP Lab published LLaVA-OneVision-2 (LLaVA-OV-2) — a next-generation multimodal model that rethinks how a neural network “watches” video. Instead of slicing…

LongLive-2.0 — 5B Model Generates Long Video at 720p in Real Time

20 May 2026

Researchers from NVIDIA have published LongLive-2.0, infrastructure for training and running long video generation models with 4-bit NVFP4 quantization. Quantization is the compression of model weights by…

SenseNova-U1: NEO-unify multimodal architecture works directly with pixels without a VAE

14 May 2026

SenseNova introduced a new multimodal architecture, SenseNova-U1, which combines image understanding, generation, and editing inside a single transformer without a separate visual encoder or variational autoencoder. This approach removes the…

OpenSeeker-v2: Best-in-Class Deep Research Agent Built by an Academic Team on Just 10,600 Samples

7 May 2026

Researchers from Shanghai Jiao Tong University have shown that building a best-in-class deep research agent doesn’t require hundreds of billions of pre-training tokens or expensive reinforcement learning. Just 10,600 carefully…

OpenGame: AI Agent Generates Full Browser Games from Text Description

22 April 2026

A team of researchers from CUHK MMLab published OpenGame — the first agentic framework for creating browser-based 2D games from natural language descriptions. The project is fully open: the framework…

InCoder-32B-Thinking: Open-Source Code Generation Model for Microcontrollers, GPU Kernel Optimization, and RTL Design

7 April 2026

A research team from Beihang University, Shanghai Jiao Tong University, the University of Manchester, and IQuest Research has published InCoder-32B-Thinking, a language model with extended chain-of-thought reasoning for…

Trinity-Large-Thinking 400B: an open model matching Claude Opus-4.6 on agentic benchmarks at 28x lower price

3 April 2026

Arcee AI has released Trinity-Large-Thinking — an open-weight reasoning model for complex multi-turn agentic tasks. On PinchBench — a comprehensive benchmark for AI agents — it ranks second among all…

PixelSmile: Open Model for Facial Expression Editing with Smooth Intensity Control

31 March 2026

Researchers from Fudan University and StepFun have published PixelSmile — a diffusion model for precise facial expression editing in portraits and anime images. Instead of training on discrete labels like…

RealRestorer: Open-Source Image Enhancement Model Outperforms Nano Banana Pro on Real-World Benchmark

30 March 2026

A team of researchers from StepFun, Southern University of Science and Technology, and the Chinese Academy of Sciences has published RealRestorer — an open-source image quality enhancement model that removes…

MinerU-Diffusion: A New Approach to OCR via Diffusion Decoding Speeds Up PDF Parsing 3× Without Accuracy Loss

27 March 2026

A team from Shanghai Artificial Intelligence Laboratory and Peking University published MinerU-Diffusion — a document OCR framework that abandons classical autoregressive generation in favor of diffusion-based decoding. The project is…

daVinci-MagiHuman: Open 15B Model Generates a 5-Second Lip Sync Video in 2 Seconds on a Single H100

24 March 2026

SII-GAIR and Sand.ai have published daVinci-MagiHuman — an open-source multimodal 15B model based on a single-stream transformer that simultaneously generates video with precise lip sync and synchronized audio, producing a…

Helios: 14B Model Generates Videos Longer Than 60 Seconds at 19.5 FPS on a Single H100

11 March 2026

A team of researchers from Peking University and ByteDance published Helios — an autoregressive diffusion transformer with 14 billion parameters that generates video at 19.5 frames per second on a…

Baichuan-M3: An Open Medical Model That Conducts Consultations Like a Real Doctor and Outperforms GPT-5.2 on Benchmarks

10 February 2026

A research team from the Chinese company Baichuan has introduced Baichuan-M3 — an open medical language model that, instead of the traditional question-and-answer mode, conducts a full clinical dialogue, actively…

Claude Sonnet 4.5 Leads on Comprehensive Backend Benchmark, Outperforming in Both Code and Environment Configuration

22 January 2026

A team of researchers from Fudan University and Shanghai Qiji Zhifeng Co. introduced ABC-Bench, the first benchmark that tests the ability of AI agents to solve full-fledged backend development…

Multiplex Thinking: Sampling 3 Tokens Instead of 1 Increases Olympiad Problem-Solving Accuracy from 40% to 55%

22 January 2026

Researchers from the University of Pennsylvania and Microsoft Research introduced Multiplex Thinking — a new reasoning method for large language models. The idea is to generate not one token at…

Yume1.5: An Open Model for Creating Interactive Virtual Worlds with Keyboard Control

5 January 2026

Researchers from Shanghai AI Laboratory and Fudan University published Yume1.5 — a model for generating interactive virtual worlds that can be controlled directly from the keyboard. Unlike regular video generation,…