Ditto: Open Framework for Text-Instruction-Based Video Style and Object Editing with 99% Frame Consistency

24 October 2025

Researchers from HKUST, Ant Group, Zhejiang University, and Northeastern University introduced Ditto — a comprehensive open framework addressing the training data scarcity problem in text-instruction-based video editing. The developers created…

QeRL: Training 32B Models on Single H100 vs Three GPUs, Beating LoRA in Accuracy

16 October 2025

QeRL is a framework for training language models using reinforcement learning that simultaneously reduces GPU requirements and surpasses traditional LoRA and QLoRA methods in accuracy. On the Qwen2.5-7B-Instruct model, QeRL…

Kimi-K2 and Qwen3-235B-Ins – Best AI Models for Stock Trading, Chinese Researchers Found

10 October 2025

Researchers from China conducted a large-scale comparison of AI capabilities for stock trading using real market data. AI agents managed a portfolio of 20 Dow Jones Index stocks over 4…

MinerU2.5: Open-Source 1.2B Model for PDF Parsing Outperforms Gemini 2.5 Pro on Benchmarks

2 October 2025

MinerU2.5 is a compact vision-language model with 1.2 billion parameters for PDF parsing, introduced by the Shanghai Artificial Intelligence Laboratory team. The model achieves state-of-the-art results in PDF parsing with…

LongLive — 1.3B Video Generation Model at 20.7 FPS with Real-Time Narrative Control

30 September 2025

A team of researchers from NVIDIA, MIT, and other institutions introduced LongLive — a framework for real-time long video generation that allows users to control the narrative during video creation.…

WebWeaver — Open Source Framework for Deep Research Outperforms OpenAI DeepResearch and Gemini Deep Research on Benchmarks

17 September 2025

Researchers from Tongyi Lab (Alibaba Group) introduced WebWeaver — an open dual-agent framework for deep research that simulates the human research process. The framework consists of a planner, which iteratively…

Mini-o3: A Multimodal 7B Model Outperformed GPT-4o in Visual Search With 30-Step Reasoning Chains

10 September 2025

Researchers from ByteDance and the University of Hong Kong introduced Mini-o3 — a multimodal model that performs deep multi-step reasoning to solve complex visual search tasks. Mini-o3 achieves SOTA results…

Matrix-3D: Open Framework for Generating Fully Explorable 3D Worlds from a Single Image

14 August 2025

Researchers from Skywork AI and the Hong Kong University of Science and Technology have introduced Matrix-3D — a framework for creating fully explorable 3D worlds from a single image or…

3D-R1: Open Source Reasoning Model for 3D Scenes Outperforms State-of-the-Art Methods by 10% on 3D Benchmarks

6 August 2025

Researchers from Shanghai University of Engineering Science and Peking University presented 3D-R1 — a new foundation model that significantly improves reasoning capabilities in three-dimensional vision-language models (VLM). The model demonstrates an average performance…

Show-o2: Open-source 7B multimodal model outperforms 14B models on benchmarks using significantly less training data

11 July 2025

Researchers from Show Lab at the National University of Singapore and ByteDance introduced Show-o2 — a second-generation multimodal model that demonstrates superior results in image and video understanding and generation…

TreeQuest Framework: Adaptive LLM Teams Outperform Individual Models by 30%

8 July 2025

Researchers from Sakana AI have introduced Adaptive Branching Monte Carlo Tree Search (AB-MCTS) — a revolutionary approach to creating “dream teams” from large language models that allows them to dynamically…

MiniCPM4: Open Local Model Achieves Qwen3-8B Performance with 7x Inference Acceleration

15 June 2025

The OpenBMB research team presented MiniCPM4 — a highly efficient language model designed specifically for local devices. MiniCPM4-8B achieves comparable performance to Qwen3-8B (81.13 vs 80.55), while requiring 4.5 times…

Strict On-Policy Training with Optimal Baseline: Microsoft Introduces Simplified Algorithm for RLHF

4 June 2025

The Microsoft Research team introduced On-Policy RL with Optimal reward baseline (OPO) — a simplified reinforcement learning algorithm for aligning large language models. The new method addresses key problems of…

Visual-ARFT: Multimodal AI Agents Outperform GPT-4o by 18.6% in Complex Visual Tasks

22 May 2025

A research team from Shanghai Jiao Tong University and Shanghai Artificial Intelligence Laboratory has introduced Visual Agentic Reinforcement Fine-Tuning (Visual-ARFT) — a new approach to training large multimodal models with…

NVIDIA Isaac 5.0: Enhanced Sensor Physics and Expanded Synthetic Data Generation

19 May 2025

NVIDIA continues to push the boundaries of AI-driven robotics with significant updates to its Isaac ecosystem, announced at COMPUTEX 2025. These innovations address key challenges in robotics development by enhancing…

ZEROSEARCH: A Framework That Cuts LLM Search Training Costs by 88%

9 May 2025

Alibaba’s NLP research team has officially open-sourced ZEROSEARCH, a complete framework for training LLMs to search without using real search engines. ZEROSEARCH builds on a key insight: LLMs have already…

Phi-4-reasoning: Microsoft’s Breakthrough in AI Thinking

4 May 2025

Microsoft recently unveiled Phi-4-reasoning, a 14-billion-parameter model that achieves exceptional performance on complex reasoning tasks, outperforming models 5-47 times larger while requiring significantly fewer computational resources, with developers able…

DeepMath-103K: Advancing AI Reasoning Through Challenge

21 April 2025

Mathematical reasoning stands as a crucial benchmark for artificial intelligence systems, requiring logical deduction, symbolic manipulation, and multi-step problem-solving. Recent breakthroughs in AI reasoning have been significantly driven by reinforcement…

MedSAM2: Open Source SOTA 3D Medical Image and Video Segmentation Model

13 April 2025

Medical image segmentation plays a critical role in precision medicine, enabling more accurate diagnosis, treatment planning, and quantitative analysis. While significant progress has been made in developing both specialized and…

Llama Nemotron: NVIDIA Launches Family of Open Reasoning AI Models Overtaking DeepSeek R1

19 March 2025

NVIDIA has announced the open Llama Nemotron family of models with reasoning capabilities, designed to provide a business-ready foundation for creating advanced AI agents. These models can work independently or…

Chain-of-Experts: Novel Approach Improving MoE Efficiency with up to 42% Memory Reduction

11 March 2025

Chain-of-Experts (CoE) is a novel approach that fundamentally changes how sparse language models process information, delivering better performance with significantly less memory. This breakthrough addresses key limitations in current Mixture-of-Experts (MoE)…