MinerU-Diffusion: A New Approach to OCR via Diffusion Decoding Speeds Up PDF Parsing 3× Without Accuracy Loss

27 March 2026

A team from Shanghai Artificial Intelligence Laboratory and Peking University published MinerU-Diffusion — a document OCR framework that abandons classical autoregressive generation in favor of diffusion-based decoding. The project is…

daVinci-MagiHuman: Open 15B Model Generates a 5-Second Lip Sync Video in 2 Seconds on a Single H100

24 March 2026

SII-GAIR and Sand.ai have published daVinci-MagiHuman — an open-source multimodal 15B model based on a single-stream transformer that simultaneously generates video with precise lip sync and synchronized audio, producing a…

OpenClaw: The Lobster That Took Over the World — How One Developer Built the Most Popular Open-Source AI Agent in History

18 March 2026

OpenClaw is a free and open-source AI agent created by Austrian developer Peter Steinberger in November 2025. An AI agent is a software wrapper around a language model that does…

OpenClaw-RL: A Framework That Updates an Agent’s Weights on the Fly, Learning from User and Environment Feedback

17 March 2026

Researchers from Princeton University have introduced OpenClaw-RL, a framework that allows an AI agent to improve in real time — without a separate data collection stage and without manual annotation.…

Helios: 14B Model Generates Videos Longer Than 60 Seconds at 19.5 FPS on a Single H100

11 March 2026

A team of researchers from Peking University and ByteDance published Helios — an autoregressive diffusion transformer with 14 billion parameters that generates video at 19.5 frames per second on a…

VBVR: 2 Million Videos for Reasoning Training — an Open Dataset That Changes the Rules

26 February 2026

A team of more than 50 researchers from around the world — from Berkeley, Stanford, CMU, Oxford and other universities — has published Very Big Video Reasoning (VBVR), a massive…

GLM-5: Top-1 Open-Weight Model for Code and Text Generation, Competing with Claude and GPT on Agentic Tasks

19 February 2026

Zhipu AI and Tsinghua University have published a GLM-5 technical report — currently the top-performing open-weight language model by benchmarks: first place among open-weight models on Artificial Analysis and top-1…

Baichuan-M3: An Open Medical Model That Conducts Consultations Like a Real Doctor and Outperforms GPT-5.2 on Benchmarks

10 February 2026

A research team from the Chinese company Baichuan has introduced Baichuan-M3 — an open medical language model that, instead of the traditional question-and-answer mode, conducts a full clinical dialogue, actively…

Claude Sonnet 4.5 Leads on Comprehensive Backend Benchmark, Outperforming in Both Code and Environment Configuration

22 January 2026

A team of researchers from Fudan University and Shanghai Qiji Zhifeng Co. introduced ABC-Bench — the first benchmark that tests the ability of AI agents to solve full-fledged backend development…

Multiplex Thinking: Sampling 3 Tokens Instead of 1 Increases Olympiad Problem-Solving Accuracy from 40% to 55%

22 January 2026

Researchers from the University of Pennsylvania and Microsoft Research introduced Multiplex Thinking — a new reasoning method for large language models. The idea is to generate not one token at…
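The blurb cuts off, but its stated core (three sampled tokens per step instead of one) amounts to keeping several candidate continuations alive in parallel. A minimal illustration of that idea with toy logits follows; the vocabulary, numbers, and helper names are invented for the example and are not taken from the paper.

```python
import math

def softmax(logits):
    # numerically stable softmax over a plain list of floats
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def top_k(probs, k):
    # indices of the k most probable tokens
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]

# Toy vocabulary and one decoding step of "logits" from a hypothetical model.
vocab = ["7", "9", "12", "x", "+"]
logits = [2.1, 1.9, 1.8, 0.2, -1.0]
probs = softmax(logits)

greedy = top_k(probs, 1)      # standard decoding: commit to one token
multiplex = top_k(probs, 3)   # keep three candidates in parallel

print([vocab[i] for i in greedy])     # ['7']
print([vocab[i] for i in multiplex])  # ['7', '9', '12']
```

Keeping three branches per step lets a reasoning chain recover when the single greedy token would have been wrong, which is the plausible source of the accuracy gain the headline reports.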

Yume1.5: An Open Model for Creating Interactive Virtual Worlds with Keyboard Control

5 January 2026

Researchers from Shanghai AI Laboratory and Fudan University published Yume1.5 — a model for generating interactive virtual worlds that can be controlled directly from the keyboard. Unlike regular video generation,…

Wan-Move: Open-Source Alternative to Kling 1.5 Pro for Motion-Controllable Video Generation

13 December 2025

A team of researchers from Tongyi Lab (Alibaba Group), Tsinghua University, and the University of Hong Kong presented Wan-Move — a new approach to precise motion control in generative video…

P1: First Open-Source Model to Win Gold at the International Physics Olympiad

30 November 2025

P1-235B-A22B from Shanghai AI Laboratory became the first open-source model to win a gold medal at the latest International Physics Olympiad (IPhO 2025), scoring 21.2 out of 30 points and…

MiroThinker v1.0: Open-Source AI Research Agent Learns to Make Up to 600 Tool Calls Per Task

20 November 2025

The MiroMind team introduced MiroThinker v1.0 — an AI research agent capable of performing up to 600 tool calls per task with a 256K token context window. On four key…

DeepEyesV2: Multimodal Model Learns to Use Tools to Solve Complex Tasks

12 November 2025

Researchers from Xiaohongshu introduced DeepEyesV2 — an agentic multimodal model based on Qwen2.5-VL-7B that can not only understand text and images but also actively use external tools: execute Python code…
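The teaser names the agentic pattern (the model interleaves its reasoning with external tool calls such as code execution). A generic sketch of such a controller loop, with a scripted stand-in for the model and a restricted arithmetic evaluator in place of real sandboxed Python, is shown below; none of the function names reflect DeepEyesV2's actual API.

```python
# Minimal tool-use loop: the "model" emits either a final answer or a tool
# request; the controller runs the tool and feeds the result back into the
# conversation history. A scripted stub replaces the real multimodal model.
def fake_model(history):
    if not any(m.startswith("tool:") for m in history):
        return {"tool": "python", "arg": "37 * 21"}   # request a tool call
    result = history[-1].split("tool:")[1]
    return {"answer": f"The product is {result}."}

def run_tool(name, arg):
    if name == "python":
        # stand-in for sandboxed code execution: bare arithmetic only
        return str(eval(arg, {"__builtins__": {}}, {}))
    raise ValueError(f"unknown tool {name}")

def agent_loop(question, max_steps=5):
    history = [f"user:{question}"]
    for _ in range(max_steps):
        step = fake_model(history)
        if "answer" in step:
            return step["answer"]
        history.append(f"tool:{run_tool(step['tool'], step['arg'])}")
    return "no answer"

print(agent_loop("What is 37 * 21?"))   # The product is 777.
```

The real system swaps the stub for a vision-language model and the calculator for an actual execution sandbox, but the observe-call-observe control flow is the same shape.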

Remote Labor Index: Top AI Agents Successfully Complete 2.5% of Freelance Projects

4 November 2025

A team of researchers from the Center for AI Safety and Scale AI published the Remote Labor Index (RLI) — the first benchmark that tests whether AI agents can perform…

Ditto: Open Framework for Text-Instruction-Based Video Style and Object Editing with 99% Frame Consistency

24 October 2025

Researchers from HKUST, Ant Group, Zhejiang University, and Northeastern University introduced Ditto — a comprehensive open framework addressing the training data scarcity problem in text-instruction-based video editing. The developers created…

QeRL: Training 32B Models on a Single H100 Instead of Three GPUs, Beating LoRA in Accuracy

16 October 2025

QeRL is a framework for training language models using reinforcement learning that simultaneously reduces GPU requirements and surpasses traditional LoRA and QLoRA methods in accuracy. On the Qwen2.5-7B-Instruct model, QeRL…

Kimi-K2 and Qwen3-235B-Ins – Best AI Models for Stock Trading, Chinese Researchers Found

10 October 2025

Researchers from China conducted a large-scale comparison of AI capabilities for stock trading using real market data. AI agents managed a portfolio of 20 Dow Jones Index stocks over 4…

MinerU2.5: Open-Source 1.2B Model for PDF Parsing Outperforms Gemini 2.5 Pro on Benchmarks

2 October 2025

MinerU2.5 is a compact vision-language model with 1.2 billion parameters for PDF parsing, introduced by the Shanghai Artificial Intelligence Laboratory team. The model achieves state-of-the-art results in PDF parsing with…

LongLive — 1.3B Video Generation Model at 20.7 FPS with Real-Time Narrative Control

30 September 2025

A team of researchers from NVIDIA, MIT, and other institutions introduced LongLive — a framework for real-time long video generation that allows users to control the narrative during video creation.…