Reasoning / Neural Networks and Deep Learning

VibeThinker: 3B model reasons and codes at the level of flagship models

16 June 2026

Sina Weibo AI published VibeThinker-3B — a compact language model with just 3 billion parameters that matches flagship models DeepSeek V3.2 (671B), GLM-5 (744B), and Gemini 3 Pro on verifiable…

SenseNova-U1: NEO-unify multimodal architecture works directly with pixels without VAE

14 May 2026

SenseNova introduced a new multimodal architecture, SenseNova-U1, which combines image understanding, generation, and editing inside a single transformer without a separate visual encoder or variational autoencoder. This approach removes the…

OpenGame: AI Agent Generates Full Browser Games from Text Description

22 April 2026

A team of researchers from CUHK MMLab published OpenGame — the first agentic framework for creating browser-based 2D games from natural language descriptions. The project is fully open: the framework…

ChatGPT Images 2.0: OpenAI Launches Image Generation Model With Reasoning, 2K Resolution, and Multilingual Text

22 April 2026

On April 21, 2026, OpenAI released ChatGPT Images 2.0, powered by the gpt-image-2 model. According to LM Arena, the new model immediately took first place across all image generation categories with…

InCoder-32B-Thinking: Open-Source Code Generation Model for Microcontrollers, GPU Kernel Optimization, and RTL Design

7 April 2026

A research team from Beihang University, Shanghai Jiao Tong University, the University of Manchester, and IQuest Research has published InCoder-32B-Thinking — a language model with extended chain-of-thought reasoning for…

VBVR: 2 Million Videos for Reasoning Training — an Open Dataset That Changes the Rules

26 February 2026

A team of more than 50 researchers from around the world — from CMU, Oxford and other universities — has published Very Big Video Reasoning (VBVR), a massive dataset for…

GLM-5: Top-1 Open-Weight Model for Code and Text Generation, Competing with Claude and GPT on Agentic Tasks

19 February 2026

Zhipu AI and Tsinghua University have published a GLM-5 technical report — currently the top-performing open-weight language model by benchmarks: first place among open-weight models on Artificial Analysis and top-1…

Baichuan-M3: An Open Medical Model That Conducts Consultations Like a Real Doctor and Outperforms GPT-5.2 on Benchmarks

10 February 2026

A research team from the Chinese company Baichuan has introduced Baichuan-M3 — an open medical language model that, instead of the traditional question-and-answer mode, conducts a full clinical dialogue, actively…

Claude Sonnet 4.5 Leads on Comprehensive Backend Benchmark, Outperforming in Both Code and Environment Configuration

22 January 2026

A team of researchers from Fudan University and Shanghai Qiji Zhifeng Co. introduced ABC-Bench — the first benchmark that tests the ability of AI agents to solve full-fledged backend development…

Multiplex Thinking: Sampling 3 Tokens Instead of 1 Increases Olympiad Problem-Solving Accuracy from 40% to 55%

22 January 2026

Researchers from the University of Pennsylvania and Microsoft Research introduced Multiplex Thinking — a new reasoning method for large language models. The idea is to generate not one token at…

P1: First Open-Source Model to Win Gold at the International Physics Olympiad

30 November 2025

P1-235B-A22B from Shanghai AI Laboratory became the first open-source model to win a gold medal at the latest International Physics Olympiad IPhO 2025, scoring 21.2 out of 30 points and…

Which AI Can Play a Villain: Comparing Alignment Algorithms Across 17 Models

13 November 2025

Researchers from Tencent Multimodal Department and Sun Yat-Sen University published a study on how large language models handle role-playing. It turns out that AI models perform mediocrely at role-playing: even…

DeepEyesV2: Multimodal Model Learns to Use Tools to Solve Complex Tasks

12 November 2025

Researchers from Xiaohongshu introduced DeepEyesV2 — an agentic multimodal model based on Qwen2.5-VL-7B that can not only understand text and images but also actively use external tools: execute Python code…

From Millions Spent on “Thank You” to Efficient Inference: Boilerplate Detection in a Single Token

31 October 2025

Researchers from JFrog published a study demonstrating a method for early detection of boilerplate responses in large language models after generating just a single token. The method enables computational cost…

QeRL: Training 32B Models on Single H100 vs Three GPUs, Beating LoRA in Accuracy

16 October 2025

QeRL is a framework for training language models using reinforcement learning that simultaneously reduces GPU requirements and surpasses traditional LoRA and QLoRA methods in accuracy. On the Qwen2.5-7B-Instruct model, QeRL…

Mini-o3: A Multimodal 7B Model Outperformed GPT-4o in Visual Search With 30-Step Reasoning Chains

10 September 2025

Researchers from ByteDance and the University of Hong Kong introduced Mini-o3 — a multimodal model that performs deep multi-step reasoning to solve complex visual search tasks. Mini-o3 achieves SOTA results…

NVIDIA Nemotron Nano 2: Reasoning and Code Generation Model Outperforms Qwen3-8B on Benchmarks and Supports 128k Context

20 August 2025

A team of NVIDIA researchers presented Nemotron-Nano-9B-v2 — a hybrid Mamba-Transformer language model that generates responses 6 times faster than Qwen3-8B on reasoning tasks while exceeding it in accuracy. The…

3D-R1: Open Source Reasoning Model for 3D Scenes Outperforms State-of-the-Art Methods by 10% on 3D Benchmarks

6 August 2025

Researchers from Shanghai University of Engineering Science and Peking University presented 3D-R1 — a new foundation model that significantly improves reasoning capabilities in three-dimensional vision-language models (VLM). The model demonstrates an average performance…

Gemini 2.5 Pro Achieved Gold Medal Performance at IMO 2025, Solving 5 of 6 Problems

25 July 2025

Large language models perform well on mathematical benchmarks like AIME; however, International Mathematical Olympiad (IMO) problems require deep understanding, creativity, and formal reasoning. Chinese researchers used Google Gemini 2.5 Pro…

Phi-4-reasoning: Microsoft’s Breakthrough in AI Thinking

4 May 2025

Microsoft recently unveiled Phi-4-reasoning, a 14-billion-parameter model that achieves exceptional performance on complex reasoning tasks, outperforming models 5 to 47 times larger while requiring significantly fewer computational resources, with developers able…

DeepMath-103K: Advancing AI Reasoning Through Challenge

21 April 2025

Mathematical reasoning stands as a crucial benchmark for artificial intelligence systems, requiring logical deduction, symbolic manipulation, and multi-step problem-solving. Recent breakthroughs in AI reasoning have been significantly driven by reinforcement…