Open source / Neural Networks and Deep Learning

VBVR: 2 Million Videos for Reasoning Training — an Open Dataset That Changes the Rules

26 February 2026

A team of more than 50 researchers from around the world — from Berkeley, Stanford, CMU, Oxford and other universities — has published Very Big Video Reasoning (VBVR), a massive…

GLM-5: Top-1 Open-Weight Model for Code and Text Generation, Competing with Claude and GPT on Agentic Tasks

19 February 2026

Zhipu AI and Tsinghua University have published a GLM-5 technical report — currently the top-performing open-weight language model by benchmarks: first place among open-weight models on Artificial Analysis and top-1…

Baichuan-M3: An Open Medical Model That Conducts Consultations Like a Real Doctor and Outperforms GPT-5.2 on Benchmarks

10 February 2026

A research team from the Chinese company Baichuan has introduced Baichuan-M3 — an open medical language model that, instead of the traditional question-and-answer mode, conducts a full clinical dialogue, actively…

Claude Sonnet 4.5 Leads on Comprehensive Backend Benchmark, Outperforming in Both Code and Environment Configuration

22 January 2026

A team of researchers from Fudan University and Shanghai Qiji Zhifeng Co. introduced ABC-Bench, the first benchmark that tests the ability of AI agents to solve full-fledged backend development…

Multiplex Thinking: Sampling 3 Tokens Instead of 1 Increases Olympiad Problem-Solving Accuracy from 40% to 55%

22 January 2026

Researchers from the University of Pennsylvania and Microsoft Research introduced Multiplex Thinking — a new reasoning method for large language models. The idea is to generate not one token at…

Yume1.5: An Open Model for Creating Interactive Virtual Worlds with Keyboard Control

5 January 2026

Researchers from Shanghai AI Laboratory and Fudan University published Yume1.5 — a model for generating interactive virtual worlds that can be controlled directly from the keyboard. Unlike regular video generation,…

Wan-Move: Open-Source Alternative to Kling 1.5 Pro for Motion-Controllable Video Generation

13 December 2025

A team of researchers from Tongyi Lab (Alibaba Group), Tsinghua University, and the University of Hong Kong presented Wan-Move — a new approach to precise motion control in generative video…

P1: First Open-Source Model to Win Gold at the International Physics Olympiad

30 November 2025

P1-235B-A22B from Shanghai AI Laboratory became the first open-source model to win a gold medal at the International Physics Olympiad (IPhO 2025), scoring 21.2 out of 30 points and…

MiroThinker v1.0: Open-Source AI Research Agent Learns to Make Up to 600 Tool Calls Per Task

20 November 2025

The MiroMind team introduced MiroThinker v1.0 — an AI research agent capable of performing up to 600 tool calls per task with a 256K token context window. On four key…

DeepEyesV2: Multimodal Model Learns to Use Tools to Solve Complex Tasks

12 November 2025

Researchers from Xiaohongshu introduced DeepEyesV2 — an agentic multimodal model based on Qwen2.5-VL-7B that can not only understand text and images but also actively use external tools: execute Python code…

Remote Labor Index: Top AI Agents Successfully Complete 2.5% of Freelance Projects

4 November 2025

A team of researchers from the Center for AI Safety and Scale AI published the Remote Labor Index (RLI) — the first benchmark that tests whether AI agents can perform…

Ditto: Open Framework for Text-Instruction-Based Video Style and Object Editing with 99% Frame Consistency

24 October 2025

Researchers from HKUST, Ant Group, Zhejiang University, and Northeastern University introduced Ditto — a comprehensive open framework addressing the training data scarcity problem in text-instruction-based video editing. The developers created…

QeRL: Training 32B Models on Single H100 vs Three GPUs, Beating LoRA in Accuracy

16 October 2025

QeRL is a framework for training language models using reinforcement learning that simultaneously reduces GPU requirements and surpasses traditional LoRA and QLoRA methods in accuracy. On the Qwen2.5-7B-Instruct model, QeRL…

Kimi-K2 and Qwen3-235B-Ins – Best AI Models for Stock Trading, Chinese Researchers Found

10 October 2025

Researchers from China conducted a large-scale comparison of AI capabilities for stock trading using real market data. AI agents managed a portfolio of 20 Dow Jones Index stocks over 4…

MinerU2.5: Open-Source 1.2B Model for PDF Parsing Outperforms Gemini 2.5 Pro on Benchmarks

2 October 2025

MinerU2.5 is a compact vision-language model with 1.2 billion parameters for PDF parsing, introduced by the Shanghai Artificial Intelligence Laboratory team. The model achieves state-of-the-art results in PDF parsing with…

LongLive — 1.3B Video Generation Model at 20.7 FPS with Real-Time Narrative Control

30 September 2025

A team of researchers from NVIDIA, MIT, and other institutions introduced LongLive — a framework for real-time long video generation that allows users to control the narrative during video creation.…

WebWeaver — Open Source Framework for Deep Research Outperforms OpenAI DeepResearch and Gemini Deep Research on Benchmarks

17 September 2025

Researchers from Tongyi Lab (Alibaba Group) introduced WebWeaver — an open dual-agent framework for deep research that simulates the human research process. The framework consists of a planner, which iteratively…

Mini-o3: A Multimodal 7B Model Outperformed GPT-4o in Visual Search With 30-Step Reasoning Chains

10 September 2025

Researchers from ByteDance and the University of Hong Kong introduced Mini-o3 — a multimodal model that performs deep multi-step reasoning to solve complex visual search tasks. Mini-o3 achieves SOTA results…

Matrix-3D: Open Framework for Generating Fully Explorable 3D Worlds from a Single Image

14 August 2025

Researchers from Skywork AI and the Hong Kong University of Science and Technology have introduced Matrix-3D — a framework for creating fully explorable 3D worlds from a single image or…

3D-R1: Open Source Reasoning Model for 3D Scenes Outperforms State-of-the-Art Methods by 10% on 3D Benchmarks

6 August 2025

Researchers from Shanghai University of Engineering Science and Peking University presented 3D-R1, a new foundation model that significantly improves reasoning capabilities in three-dimensional vision-language models (VLMs). The model demonstrates an average performance…

Show-o2: Open-source 7B multimodal model outperforms 14B models on benchmarks using significantly less training data

11 July 2025

Researchers from Show Lab at the National University of Singapore and ByteDance introduced Show-o2 — a second-generation multimodal model that demonstrates superior results in image and video understanding and generation…