LLM / Neural Networks and Deep Learning

GLM-5: Top-1 Open-Weight Model for Code and Text Generation, Competing with Claude and GPT on Agentic Tasks

19 February 2026

Zhipu AI and Tsinghua University have published the technical report for GLM-5, currently the top-performing open-weight language model by benchmarks: first place among open-weight models on Artificial Analysis and top-1…

Baichuan-M3: An Open Medical Model That Conducts Consultations Like a Real Doctor and Outperforms GPT-5.2 on Benchmarks

10 February 2026

A research team from the Chinese company Baichuan has introduced Baichuan-M3 — an open medical language model that, instead of the traditional question-and-answer mode, conducts a full clinical dialogue, actively…

Claude Sonnet 4.5 Leads on Comprehensive Backend Benchmark, Outperforming in Both Code and Environment Configuration

22 January 2026

A team of researchers from Fudan University and Shanghai Qiji Zhifeng Co. introduced ABC-Bench, the first benchmark that tests the ability of AI agents to solve full-fledged backend development…

P1: First Open-Source Model to Win Gold at the International Physics Olympiad

30 November 2025

P1-235B-A22B from Shanghai AI Laboratory became the first open-source model to win a gold medal at the latest International Physics Olympiad IPhO 2025, scoring 21.2 out of 30 points and…

Which AI Can Play a Villain: Comparing Alignment Algorithms Across 17 Models

13 November 2025

Researchers from Tencent's Multimodal Department and Sun Yat-Sen University published a study on how large language models handle role-playing. It turns out that AI models are mediocre at role-playing: even…

From Millions Spent on “Thank You” to Efficient Inference: Boilerplate Detection in a Single Token

31 October 2025

Researchers from JFrog published a study demonstrating a method for early detection of boilerplate responses in large language models after generating just a single token. The method enables computational cost…

Kimi-K2 and Qwen3-235B-Ins – Best AI Models for Stock Trading, Chinese Researchers Found

10 October 2025

Researchers from China conducted a large-scale comparison of AI capabilities for stock trading using real market data. AI agents managed a portfolio of 20 Dow Jones Index stocks over 4…

Hybrid Image Tokenizer: Apple’s New Approach to Multimodal Models

22 September 2025

Apple's research team introduced Manzano, a unified multimodal large language model that combines visual content understanding and generation capabilities through a hybrid image tokenizer and a carefully designed training strategy.…

WebWeaver — Open Source Framework for Deep Research Outperforms OpenAI DeepResearch and Gemini Deep Research on Benchmarks

17 September 2025

Researchers from Tongyi Lab (Alibaba Group) introduced WebWeaver — an open dual-agent framework for deep research that simulates the human research process. The framework consists of a planner, which iteratively…

Google Launches Gemini 2.5 Flash Image with Text-Based Editing Capabilities

26 August 2025

Google introduced Gemini 2.5 Flash Image (with internal codename nano-banana) — a model for image generation and editing. The model supports combining multiple images into one, maintains character consistency between…

How To Choose A Generative AI Platform

26 August 2025

Most teams outgrow single-model tools once they need governance, repeatability, and multi-model routing. This guide shows what belongs in a Generative AI platform and how to evaluate options with architecture-level…

NVIDIA Nemotron Nano 2: Reasoning and Code Generation Model Outperforms Qwen3-8B on Benchmarks and Supports 128k Context

20 August 2025

A team of NVIDIA researchers presented Nemotron-Nano-9B-v2 — a hybrid Mamba-Transformer language model that generates responses 6 times faster than Qwen3-8B on reasoning tasks while exceeding it in accuracy. The…

Seed Diffusion: New State-of-the-Art in Speed-Quality Balance for Code Generation Models

6 August 2025

The research team from ByteDance Seed in collaboration with the AIR Institute of Tsinghua University introduced Seed Diffusion Preview — a language model based on discrete diffusion that demonstrates record-breaking…

Gemini 2.5 Pro Achieved Gold Medal Performance at IMO 2025, Solving 5 of 6 Problems

25 July 2025

Large language models perform well on mathematical benchmarks like AIME; however, International Mathematical Olympiad (IMO) problems require deep understanding, creativity, and formal reasoning. Chinese researchers used Google Gemini 2.5 Pro…

TreeQuest Framework: Adaptive LLM Teams Outperform Individual Models by 30%

8 July 2025

Researchers from Sakana AI have introduced Adaptive Branching Monte Carlo Tree Search (AB-MCTS) — a revolutionary approach to creating “dream teams” from large language models that allows them to dynamically…

MiniCPM4: Open Local Model Achieves Qwen3-8B Performance with 7x Inference Acceleration

15 June 2025

The OpenBMB research team presented MiniCPM4 — a highly efficient language model designed specifically for local devices. MiniCPM4-8B achieves comparable performance to Qwen3-8B (81.13 vs 80.55), while requiring 4.5 times…

Strict On-Policy Training with Optimal Baseline: Microsoft Introduces Simplified Algorithm for RLHF

4 June 2025

The Microsoft Research team introduced On-Policy RL with Optimal reward baseline (OPO) — a simplified reinforcement learning algorithm for aligning large language models. The new method addresses key problems of…

Mistral Agents API: AI Agent Framework with Web Search, Code Generation, and Image Generation Capabilities

28 May 2025

French startup Mistral AI introduced Agents API — a framework for creating autonomous AI agents with built-in connectors, persistent memory, and orchestration capabilities. Developers can create unlimited agents and build…

Visual-ARFT: Multimodal AI Agents Outperform GPT-4o by 18.6% in Complex Visual Tasks

22 May 2025

A research team from Shanghai Jiao Tong University and Shanghai Artificial Intelligence Laboratory has introduced Visual Agentic Reinforcement Fine-Tuning (Visual-ARFT) — a new approach to training large multimodal models with…

ZEROSEARCH: A Framework That Cuts LLM Search Training Costs by 88%

9 May 2025

Alibaba’s NLP research team has officially open-sourced ZEROSEARCH, a complete framework for training LLMs to search without using real search engines. ZEROSEARCH builds on a key insight: LLMs have already…

Phi-4-reasoning: Microsoft’s Breakthrough in AI Thinking

4 May 2025

Microsoft recently unveiled Phi-4-reasoning, a 14-billion-parameter model that achieves exceptional performance on complex reasoning tasks, outperforming models 5-47 times larger while requiring significantly fewer computational resources, with developers able…