SmolLM2: Open Source Compact LLM by Hugging Face Outscoring Llama-1B and Qwen2.5-1.5B

6 November 2024

Hugging Face has released SmolLM2, a new family of compact language models. The family demonstrates impressive performance against larger competitors, with its 1.7B parameter version outscoring Llama-1B and Qwen2.5-1.5B…

xLAM and xGen-Sales: Salesforce’s Open Source AI Models for Sales Automation

9 September 2024

Salesforce has taken a significant leap in AI development with the release of its xLAM family, introducing Large Action Models (LAMs) to enable more efficient and autonomous workflows. Unlike Large…

Mini-Omni: Open-Source Model for Real-Time Speech Interaction

2 September 2024

Current academic language models still rely on external Text-to-Speech (TTS) systems, causing undesirable latency in speech synthesis. To address this, the Mini-Omni model introduces an audio-based, end-to-end conversational capability that…

Scaling Test-Time Compute: A New Paradigm in LLM Performance

27 August 2024

Researchers from UC Berkeley and Google DeepMind published a groundbreaking paper titled “Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters.” This paper introduces a transformative…

Mistral Large 2: Leading the Way in Open Source AI Code Generation

25 July 2024
Performance accuracy on code generation benchmarks (all models were benchmarked through the same evaluation pipeline)

Mistral AI has announced Mistral Large 2, the latest iteration of its flagship model, setting a new state of the art (SOTA) in open-source code generation models. This new model…

Claude 3.5 Sonnet: State-of-the-Art LLM by Anthropic Overtakes GPT-4o in Major Benchmarks

21 June 2024

Anthropic has introduced the new large language model Claude 3.5 Sonnet. It is now available through the Claude.ai chatbot, the Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI. Claude 3.5…

Zyda: 1.3T Dataset for Open Language Modeling

12 June 2024

Zyda is a 1.3 trillion-token open-source dataset designed for open language modeling. Zyda integrates a range of high-quality open datasets, including RefinedWeb, Starcoder, C4, and Pile, and enhances them through comprehensive filtering…

NVIDIA DrEureka Model Accelerates Robot Training Faster Than Humans

12 May 2024

NVIDIA has demonstrated that large language models can expedite robot training. Robots with four limbs trained using the DrEureka model outperform standard learning systems by 34% in real-world movement speed…

Google RecurrentGemma: Next-Gen Local Language Model

14 April 2024

Google has introduced the RecurrentGemma language model, designed to operate locally on devices with limited resources such as smartphones, personal computers, and smart speakers. The new architecture from Google significantly…

Gretel: The Largest Open Text-to-SQL Dataset

7 April 2024

Gretel, a startup specializing in generating high-quality synthetic data, has announced the creation of the largest open text-to-SQL dataset aimed at accelerating the development of no-code analytics tools. The dataset…

Microsoft ViSNet: Predicting Molecule Activity

3 March 2024

Microsoft has unveiled ViSNet – a graph neural network modeling the geometry of complex molecules to predict their activity. ViSNet has the potential to significantly expedite the search for and…

DeepMind Trains AlphaGeometry Model to Solve Olympiad Geometry Problems

21 January 2024

DeepMind has unveiled AlphaGeometry – a model capable of solving geometric problems at the level of International Mathematical Olympiad winners. AlphaGeometry solved 25 out of 30 Olympiad problems, while on…

FractalGPT Launches Question-Answer Agent for Handling Loaded Documents

14 December 2023

Developers at FractalGPT have rolled out a QA agent designed for interacting with documents, allowing users to hold dialogues over uploaded PDF, TXT, and DOCX files. Key Features…

OpenAI DevDay 2023: GPTs, GPT-4 Turbo, and Other Updates from OpenAI

12 November 2023

OpenAI introduced over ten products and features for developers at DevDay 2023. Here’s a rundown of the new models and API updates: The GPT-4 Turbo model, trained on data up…

Microsoft LeMa: Boosting Language Model Accuracy in Math

4 November 2023

Microsoft researchers have introduced LeMa (Learning from Mistakes), an open-source algorithm designed to enhance the ability of large language models to solve mathematical problems. LeMa encourages models to learn from…

“Compact Giant” Mistral 7B Outperforms Llama 2 13B and Llama 34B

1 October 2023

The Mistral AI team has unveiled Mistral 7B, an open-source language model with 7.3 billion parameters that surpasses the significantly larger Llama 2 13B model in…

MIT Releases Free Lecture Course on TinyML & Efficient DL Computing on YouTube

29 September 2023

In recent years, large language and diffusion models have showcased impressive results. However, their demands on computational resources and memory consumption pose significant challenges for researchers and developers. The TinyML…

Google Apps Integration Elevates Bard Chatbot’s Capabilities

19 September 2023

Google has rolled out an update for the Bard chatbot, introducing seamless integration with various Google apps, such as Gmail, Docs, Sheets, Maps, and YouTube. This integration propels Bard ahead…

Persimmon-8B: An Open Model with a 16k Token Context, Running on a Single GPU

11 September 2023

Researchers from Adept have introduced the open-source language model Persimmon-8B with a 16k token context, four times the context length of the most compact Llama 2 and of the text-davinci-002 used in…

Arthur Bench: Framework for Evaluating Language Models

20 August 2023

American startup Arthur has released an open-source framework called Bench for evaluating and comparing the performance of large language models. This tool enables users to select the most suitable language…

ReLoRA: Method for Enhancing Performance in Training Large Language Models

16 August 2023

ReLoRA is a technique for training large transformer-based language models using low-rank matrices, aimed at boosting training efficiency. The effectiveness of this method increases with the scale of the models.…