Open source / Neural Networks and Deep Learning

MiroThinker v1.0: Open-Source AI Research Agent Learns to Make Up to 600 Tool Calls Per Task

20 November 2025

MiroThinker v1.0: Open-Source AI Research Agent Learns to Make Up to 600 Tool Calls Per Task

The MiroMind team introduced MiroThinker v1.0 — an AI research agent capable of performing up to 600 tool calls per task with a 256K token context window. On four key…

P1: First Open-Source Model to Win Gold at the International Physics Olympiad

19 November 2025

P1: First Open-Source Model to Win Gold at the International Physics Olympiad

P1-235B-A22B from Shanghai AI Laboratory became the first open-source model to win a gold medal at the latest International Physics Olympiad IPhO 2025, scoring 21.2 out of 30 points and…

DeepEyesV2: Multimodal Model Learns to Use Tools to Solve Complex TasksRetry

12 November 2025

DeepEyesV2: Multimodal Model Learns to Use Tools to Solve Complex TasksRetry

Researchers from Xiaohongshu introduced DeepEyesV2 — an agentic multimodal model based on Qwen2.5-VL-7B that can not only understand text and images but also actively use external tools: execute Python code…

Remote Labor Index: Top AI Agents Successfully Complete 2.5% of Freelance Projects

4 November 2025

4 November 2025

Remote Labor Index: Top AI Agents Successfully Complete 2.5% of Freelance Projects

4 November 2025

A team of researchers from the Center for AI Safety and Scale AI published the Remote Labor Index (RLI) — the first benchmark that tests whether AI agents can perform…

Ditto: Open Framework for Text-Instruction-Based Video Style and Object Editing with 99% Frame Consistency

24 October 2025

Ditto: Open Framework for Text-Instruction-Based Video Style and Object Editing with 99% Frame Consistency

Researchers from HKUST, Ant Group, Zhejiang University, and Northeastern University introduced Ditto — a comprehensive open framework addressing the training data scarcity problem in text-instruction-based video editing. The developers created…

QeRL: Training 32B Models on Single H100 vs Three GPUs, Beating LoRA in Accuracy

16 October 2025

QeRL rainforcement learning quantization training speedup

QeRL: Training 32B Models on Single H100 vs Three GPUs, Beating LoRA in Accuracy

QeRL is a framework for training language models using reinforcement learning that simultaneously reduces GPU requirements and surpasses traditional LoRA and QLoRA methods in accuracy. On the Qwen2.5-7B-Instruct model, QeRL…

Kimi-K2 and Qwen3-235B-Ins – Best AI Models for Stock Trading, Chinese Researchers Found

10 October 2025

10 October 2025

Kimi-K2 and Qwen3-235B-Ins – Best AI Models for Stock Trading, Chinese Researchers Found

10 October 2025

Researchers from China conducted a large-scale comparison of AI capabilities for stock trading using real market data. AI agents managed a portfolio of 20 Dow Jones Index stocks over 4…

MinerU2.5: Open-Source 1.2B Model for PDF Parsing Outperforms Gemini 2.5 Pro on Benchmarks

2 October 2025

MinerU2.5: Open-Source 1.2B Model for PDF Parsing Outperforms Gemini 2.5 Pro on Benchmarks

MinerU2.5 is a compact vision-language model with 1.2 billion parameters for PDF parsing, introduced by the Shanghai Artificial Intelligence Laboratory team. The model achieves state-of-the-art results in PDF parsing with…

LongLive — 1.3B Video Generation Model at 20.7 FPS with Real-Time Narrative Control

30 September 2025

LongLive — 1.3B Video Generation Model at 20.7 FPS with Real-Time Narrative Control

A team of researchers from NVIDIA, MIT, and other institutions introduced LongLive — a framework for real-time long video generation that allows users to control the narrative during video creation.…

WebWeaver — Open Source Framework for Deep Research Outperforms OpenAI DeepResearch and Gemini Deep Research on Benchmarks

17 September 2025

Tongyi-DeepResearch-30B-A3B results webweaver deepresearch

WebWeaver — Open Source Framework for Deep Research Outperforms OpenAI DeepResearch and Gemini Deep Research on Benchmarks

Researchers from Tongyi Lab (Alibaba Group) introduced WebWeaver — an open dual-agent framework for deep research that simulates the human research process. The framework consists of a planner, which iteratively…

Mini-o3: A Multimodal 7B Model Outperformed GPT-4o in Visual Search With 30-Step Reasoning Chains

10 September 2025

Mini-o3: A Multimodal 7B Model Outperformed GPT-4o in Visual Search With 30-Step Reasoning Chains

Researchers from ByteDance and the University of Hong Kong introduced Mini-o3 — a multimodal model that performs deep multi-step reasoning to solve complex visual search tasks. Mini-o3 achieves SOTA results…

Matrix-3D: Open Framework for Generating Fully Explorable 3D Worlds from a Single Image

14 August 2025

Matrix-3D: Open Framework for Generating Fully Explorable 3D Worlds from a Single Image

Researchers from Skywork AI and the Hong Kong University of Science and Technology have introduced Matrix-3D — a framework for creating fully explorable 3D worlds from a single image or…

3D-R1: Open Source Reasoning Model for 3D Scenes Outperforms State-of-the-Art Methods by 10% on 3D Benchmarks

6 August 2025

3D-R1: Open Source Reasoning Model for 3D Scenes Outperforms State-of-the-Art Methods by 10% on 3D Benchmarks

Researchers from Shanghai University of Engineering Science and Peking University presented 3D-R1 — a new foundation model that significantly improves reasoning capabilities in three-dimensional vision-language models (VLM). The model demonstrates an average performance…

Show-o2: Open-source 7B multimodal model outperforms 14B models on benchmarks using significantly less training data

11 July 2025

Show-o2: Open-source 7B multimodal model outperforms 14B models on benchmarks using significantly less training data

Researchers from Show Lab at the National University of Singapore and ByteDance introduced Show-o2 — a second-generation multimodal model that demonstrates superior results in image and video understanding and generation…

TreeQuest Framework: Adaptive LLM Teams Outperform Individual Models by 30%

8 July 2025

TreeQuest Framework: Adaptive LLM Teams Outperform Individual Models by 30%

Researchers from Sakana AI have introduced Adaptive Branching Monte Carlo Tree Search (AB-MCTS) — a revolutionary approach to creating “dream teams” from large language models that allows them to dynamically…

MiniCPM4: Open Local Model Achieves Qwen3-8B Performance with 7x Inference Acceleration

15 June 2025

MiniCPM4: Open Local Model Achieves Qwen3-8B Performance with 7x Inference Acceleration

The OpenBMB research team presented MiniCPM4 — a highly efficient language model designed specifically for local devices. MiniCPM4-8B achieves comparable performance to Qwen3-8B (81.13 vs 80.55), while requiring 4.5 times…

Strict On-Policy Training with Optimal Baseline: Microsoft Introduces Simplified Algorithm for RLHF

4 June 2025

Strict On-Policy Training with Optimal Baseline: Microsoft Introduces Simplified Algorithm for RLHF

The Microsoft Research team introduced On-Policy RL with Optimal reward baseline (OPO) — a simplified reinforcement learning algorithm for aligning large language models. The new method addresses key problems of…

Visual-ARFT: Multimodal AI Agents Outperform GPT-4o by 18.6% in Complex Visual Tasks

22 May 2025

Visual-ARFT: Multimodal AI Agents Outperform GPT-4o by 18.6% in Complex Visual Tasks

A research team from Shanghai Jiao Tong University and Shanghai Artificial Intelligence Laboratory has introduced Visual Agentic Reinforcement Fine-Tuning (Visual-ARFT) — a new approach to training large multimodal models with…

NVIDIA Isaac 5.0: Enhanced Sensor Physics and Expanded Synthetic Data Generation

19 May 2025

19 May 2025

NVIDIA Isaac robotics platform showing a humanoid robot interacting with objects

NVIDIA Isaac 5.0: Enhanced Sensor Physics and Expanded Synthetic Data Generation

19 May 2025

NVIDIA continues to push the boundaries of AI-driven robotics with significant updates to its Isaac ecosystem, announced at COMPUTEX 2025. These innovations address key challenges in robotics development by enhancing…

ZEROSEARCH: A Framework That Cuts LLM Search Training Costs by 88%

9 May 2025

ZEROSEARCH: A Framework That Cuts LLM Search Training Costs by 88%

Alibaba’s NLP research team has officially open-sourced ZEROSEARCH, a complete framework for training LLMs to search without using real search engines. ZEROSEARCH builds on a key insight: LLMs have already…

Phi-4-reasoning: Microsoft’s Breakthrough in AI Thinking

4 May 2025

Phi-4-reasoning: Microsoft’s Breakthrough in AI Thinking

Microsoft recently unveiled Phi-4-reasoning, a 14-billion parameter model that achieves exceptional performance on complex reasoning tasks, outperforming models 5-47 times larger while requiring significantly less computational resources, with developers able…