EPFL Study: Language Models Don’t Translate Into English – They Operate Through Concepts

30 January 2025

New research from EPFL sheds light on the internal mechanisms of multilingual data processing in LLMs, which is critical for understanding how modern language models work and how to optimize…

MiniMax-01: 4M Context Length Benchmark Leader Powered by Lightning Attention

15 January 2025

MiniMax has open-sourced its latest MiniMax-01 series, introducing two models that push the boundaries of context length and attention mechanisms: MiniMax-Text-01 for language processing and MiniMax-VL-01 for visual-language tasks. The…

Nvidia Drops Monster AI Update at CES 2025: Local Foundation Models Meet New RTX 50 Series

7 January 2025

Nvidia just announced a major shift in consumer AI computing at CES 2025, combining new GPUs with a platform for running foundation models locally. The announcement includes next-gen RTX 50…

FinRobot: Open Source Multi-Agent Framework for Automated Equity Research

16 November 2024

AI4Finance Foundation researchers have released FinRobot, the first AI agent framework specifically designed for equity research. FinRobot addresses key limitations of existing automated research tools by combining both quantitative and…

SmolLM2: Open Source Compact LLM by Hugging Face Outscoring Llama-1B and Qwen2.5-1.5B

6 November 2024

Hugging Face has released SmolLM2, a new family of compact language models that demonstrates impressive performance against larger competitors, with its 1.7B parameter version outscoring Llama-1B and Qwen2.5-1.5B…

SynthID: DeepMind’s Open Source Approach for Generated Text Watermarking

31 October 2024

DeepMind has released SynthID Text, expanding their established AI content authentication ecosystem to include text watermarking. This release, now available in Hugging Face Transformers v4.46.0+, follows DeepMind’s deployment of SynthID…

MinerU: Open-Source AI Solution Significantly Boosts Document Extraction Accuracy

30 September 2024

Researchers from the Shanghai Artificial Intelligence Laboratory have developed MinerU, a cutting-edge open-source solution for precise document content extraction. MinerU is designed to extract and structure content from diverse document…

Molmo: Open Source Multimodal Vision-Language Models Outperform Gemini 1.5 and Claude 3.5

26 September 2024

Molmo is a new series of multimodal vision-language models (VLMs) created by researchers at the Allen Institute for AI and the University of Washington. The Molmo family outperforms many state-of-the-art…

OpenAI Launches o1 Model Family, Introducing Advanced Reasoning

13 September 2024

OpenAI has introduced its new “o1” model family, marking a pivotal shift from the widely recognized GPT series. The o1 models — specifically o1-preview and o1-mini — are designed to…

Scaling Test-Time Compute: A New Paradigm in LLM Performance

27 August 2024

Researchers from UC Berkeley and Google DeepMind published a groundbreaking paper titled “Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters.” This paper introduces a transformative…

LongWriter: Open-Source Framework for Generating Texts Beyond 10,000 Words

19 August 2024

LongWriter is a framework and a set of large language models (LLMs) designed specifically to enable ultra-long text generation, often exceeding 10,000 words while maintaining coherence, quality, and relevance. It…

Mistral Large 2: Leading the Way in Open Source AI Code Generation

25 July 2024

Mistral AI has announced Mistral Large 2, the latest iteration of its flagship model, setting a new state of the art (SOTA) in open-source code generation models. This new model…

LLaMA 3.1 Models Released: Open Source and Comparable to GPT-4

24 July 2024

The LLaMA 3.1 models have been officially released, including the massive 405-billion-parameter LLaMA 3.1 405B model. Expanded context length to 128K, support for eight languages, and the introduction of LLaMA…

Google PH-LLM: A Language Model for Health Monitoring

16 June 2024

Google developed the PH-LLM language model to analyze medical data collected from wearable devices such as smartwatches and heart rate monitors. During experiments, the model answered health-related questions and predicted…

Zyda: 1.3T Dataset for Open Language Modeling

12 June 2024

Zyda is a 1.3 trillion-token open-source dataset designed for open language modeling. Zyda integrates a range of high-quality open datasets, including RefinedWeb, Starcoder, C4, and Pile, enhancing them through comprehensive filtering…

Apple Unveils “Apple Intelligence” and OpenAI Partnership at WWDC

11 June 2024

Apple’s Worldwide Developers Conference (WWDC) saw a major focus on artificial intelligence, introducing “Apple Intelligence” and a strategic partnership with OpenAI. These announcements highlight Apple’s commitment to integrating AI across…

Qwen2: A Leap Forward in Open Source Language Models

7 June 2024

The long-anticipated transition from Qwen1.5 to Qwen2 has arrived, a significant milestone for open-source language models. Qwen2 outperforms Llama 3…

GPT-4 Trained to Predict Financial Metrics Better Than Analysts

26 May 2024

Scientists from the University of Chicago have demonstrated that large language models can conduct financial statement analysis of companies with greater accuracy than professional analysts. These research findings could impact…

Google RecurrentGemma: Next-Gen Local Language Model

14 April 2024

Google has introduced the RecurrentGemma language model, designed to operate locally on devices with limited resources such as smartphones, personal computers, and smart speakers. The new architecture from Google significantly…

Apple MGIE: Multimodal Models for Image Editing

12 February 2024

Apple, in collaboration with the University of California, has developed the open-source MGIE model for image editing based on text input. This model tackles various editing tasks, including Photoshop-style image…

Google Introduces Gemini, a Cutting-Edge Language Model Set

7 December 2023

Google has announced the creation of Gemini, a set of three language models surpassing competitors in 30 out of 32 benchmarks. The top-tier model, Gemini Ultra, is available through an…