Visual-ARFT: Multimodal AI Agents Outperform GPT-4o by 18.6% in Complex Visual Tasks

22 May 2025
Диаграмма процесса обучения Visual-ARFT

Visual-ARFT: Multimodal AI Agents Outperform GPT-4o by 18.6% in Complex Visual Tasks

A research team from Shanghai Jiao Tong University and Shanghai Artificial Intelligence Laboratory has introduced Visual Agentic Reinforcement Fine-Tuning (Visual-ARFT) — a new approach to training large multimodal models with…

Molmo: Open Source Multimodal Vision-Language Models Outperform Gemini 1.5 and Claude 3.5

26 September 2024

Molmo: Open Source Multimodal Vision-Language Models Outperform Gemini 1.5 and Claude 3.5

Molmo is a new series of multimodal vision-language models (VLMs) created by researchers at the Allen Institute for AI and the University of Washington. The Molmo family outperforms many state-of-the-art…