NLP / Neural Networks and Deep Learning

TreeQuest Framework: Adaptive LLM Teams Outperform Individual Models by 30%

8 July 2025

Researchers from Sakana AI have introduced Adaptive Branching Monte Carlo Tree Search (AB-MCTS) — a revolutionary approach to creating “dream teams” from large language models that allows them to dynamically…

Visual-ARFT: Multimodal AI Agents Outperform GPT-4o by 18.6% in Complex Visual Tasks

22 May 2025

A research team from Shanghai Jiao Tong University and Shanghai Artificial Intelligence Laboratory has introduced Visual Agentic Reinforcement Fine-Tuning (Visual-ARFT) — a new approach to training large multimodal models with…

ZEROSEARCH: A Framework That Cuts LLM Search Training Costs by 88%

9 May 2025

Alibaba’s NLP research team has officially open-sourced ZEROSEARCH, a complete framework for training LLMs to search without using real search engines. ZEROSEARCH builds on a key insight: LLMs have already…

Phi-4-reasoning: Microsoft’s Breakthrough in AI Thinking

4 May 2025

Microsoft recently unveiled Phi-4-reasoning, a 14-billion parameter model that achieves exceptional performance on complex reasoning tasks, outperforming models 5-47 times larger while requiring significantly fewer computational resources, with developers able…

Claude for Education: Revolutionizing Higher Education with AI-Powered Learning

3 April 2025

Anthropic has released Claude for Education, specifically designed for implementation in universities and other higher education institutions. While the classic chatbot provides direct answers to questions, Claude for Education uses…

Chain-of-Experts: Novel Approach Improving MoE Efficiency with up to 42% Memory Reduction

11 March 2025

Chain-of-Experts (CoE) is a novel approach that fundamentally changes how sparse language models process information, delivering better performance with significantly less memory. This breakthrough addresses key limitations in current Mixture-of-Experts (MoE)…

EPFL Study: Language Models Don’t Translate Into English – They Operate Through Concepts

30 January 2025

New research from EPFL sheds light on the internal mechanisms of multilingual data processing in LLMs, which is critical for understanding how modern language models work and how to optimize…

SmolLM2: Open Source Compact LLM by Hugging Face Outscoring Llama-1B and Qwen2.5-1.5B

6 November 2024

Hugging Face has released SmolLM2, a new family of compact language models that demonstrates impressive performance against larger competitors, with its 1.7B parameter version outscoring Llama-1B and Qwen2.5-1.5B…

xLAM and xGen-Sales: Salesforce’s Open Source AI Models for Sales Automation

9 September 2024

Salesforce has taken a significant leap in AI development with the release of its xLAM family, introducing Large Action Models (LAMs) to enable more efficient and autonomous workflows. Unlike Large…

Mini-Omni: Open-Source Model for Real-Time Speech Interaction

2 September 2024

Current academic language models still rely on external Text-to-Speech (TTS) systems, causing undesirable latency in speech synthesis. To address this, the Mini-Omni model introduces an audio-based, end-to-end conversational capability that…

Scaling Test-Time Compute: A New Paradigm in LLM Performance

27 August 2024

Researchers from UC Berkeley and Google DeepMind published a groundbreaking paper titled “Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters.” This paper introduces a transformative…

Mistral Large 2: Leading the Way in Open Source AI Code Generation

25 July 2024

Mistral AI has announced Mistral Large 2, the latest iteration of its flagship model, setting a new state of the art (SOTA) in open-source code generation models. This new model…

Claude 3.5 Sonnet: State-of-the-Art LLM by Anthropic Overtakes GPT-4o in Major Benchmarks

21 June 2024

Anthropic has introduced the new large language model Claude 3.5 Sonnet. It is now available on Claude.ai, the Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI. Claude 3.5…

Zyda: 1.3T Dataset for Open Language Modeling

12 June 2024

Zyda is a 1.3 trillion-token open-source dataset designed for open language modeling. Zyda integrates a range of high-quality open datasets, including RefinedWeb, Starcoder, C4, and Pile, enhancing them through comprehensive filtering…

NVIDIA DrEureka Model Accelerates Robot Training Faster Than Humans

12 May 2024

NVIDIA has demonstrated that large language models can expedite robot training. Robots with four limbs trained using the DrEureka model outperform standard learning systems by 34% in real-world movement speed…

Google RecurrentGemma: Next-Gen Local Language Model

14 April 2024

Google has introduced the RecurrentGemma language model, designed to operate locally on devices with limited resources such as smartphones, personal computers, and smart speakers. The new architecture from Google significantly…

Gretel: The Largest Open Text-to-SQL Dataset

7 April 2024

Gretel, a startup specializing in generating high-quality synthetic data, has announced the creation of the largest open text-to-SQL dataset aimed at accelerating the development of no-code analytics tools. The dataset…

Microsoft ViSNet: Predicting Molecule Activity

3 March 2024

Microsoft has unveiled ViSNet – a graph neural network modeling the geometry of complex molecules to predict their activity. ViSNet has the potential to significantly expedite the search for and…