ArtAug: Open Source Framework for Image Generation Models Enhancement

18 December 2024

East China Normal University and Alibaba Group researchers have introduced ArtAug, a framework that enhances text-to-image generation through synthesis-understanding interaction. This approach significantly improves image quality without requiring extensive manual…

X-MeshGraphNet: NVIDIA’s Scalable Solution for Physics Simulation Using Graph Neural Networks

27 November 2024
Illustration of the partitioning scheme with Halo on a Koenigsegg car.

NVIDIA researchers have introduced X-MeshGraphNet, a novel extension of MeshGraphNet that addresses key scalability and practical limitations in physics simulation. The open-source framework, now available through NVIDIA Modulus, enables efficient…

FinRobot: Open Source Multi-Agent Framework for Automated Equity Research

16 November 2024

AI4Finance Foundation researchers have released FinRobot, the first AI agent framework specifically designed for equity research. FinRobot addresses key limitations of existing automated research tools by combining both quantitative and…

SmolLM2: Open Source Compact LLM by Hugging Face Outscoring Llama-1B and Qwen2.5-1.5B

6 November 2024

Hugging Face has released SmolLM2, a new family of compact language models that demonstrates impressive performance against larger competitors, with its 1.7B parameter version outscoring Llama-1B and Qwen2.5-1.5B…

SynthID: DeepMind’s Open Source Approach for Generated Text Watermarking

31 October 2024

DeepMind has released SynthID Text, expanding their established AI content authentication ecosystem to include text watermarking. This release, now available in Hugging Face Transformers v4.46.0+, follows DeepMind’s deployment of SynthID…

Mochi 1: Open-Source Video Generation Model by Genmo

23 October 2024

Genmo AI has introduced Mochi 1, an open-source video generation model featuring Asymmetric Diffusion Transformer (AsymmDiT) architecture. With 10 billion parameters, it closes the gap between closed and open systems,…

MinerU: Open-Source AI Solution Significantly Boosts Document Extraction Accuracy

30 September 2024

Researchers from the Shanghai Artificial Intelligence Laboratory have developed MinerU, a cutting-edge open-source solution for precise document content extraction. MinerU is designed to extract and structure content from diverse document…

Molmo: Open Source Multimodal Vision-Language Models Outperform Gemini 1.5 and Claude 3.5

26 September 2024

Molmo is a new series of multimodal vision-language models (VLMs) created by researchers at the Allen Institute for AI and the University of Washington. The Molmo family outperforms many state-of-the-art…

EzAudio: Open Source Hyperrealistic Text-to-Audio Model

19 September 2024

EzAudio is a new transformer-based text-to-audio (T2A) diffusion model developed by researchers from Tencent AI Lab and Johns Hopkins University. EzAudio addresses key challenges in T2A generation, including generation quality, computational…

xLAM and xGen-Sales: Salesforce’s Open Source AI Models for Sales Automation

9 September 2024

Salesforce has taken a significant leap in AI development with the release of its xLAM family, introducing Large Action Models (LAMs) to enable more efficient and autonomous workflows. Unlike Large…

Mini-Omni: Open-Source Model for Real-Time Speech Interaction

2 September 2024

Current academic language models still rely on external Text-to-Speech (TTS) systems, causing undesirable latency in speech synthesis. To address this, the Mini-Omni model introduces an audio-based, end-to-end conversational capability that…

LongWriter: Open-Source Framework for Generating Texts Beyond 10,000 Words

19 August 2024

LongWriter is a framework and a set of large language models (LLMs) designed specifically to enable ultra-long text generation, often exceeding 10,000 words while maintaining coherence, quality, and relevance. It…

Mistral Large 2: Leading the Way in Open Source AI Code Generation

25 July 2024
Performance accuracy on code generation benchmarks (all models were benchmarked through the same evaluation pipeline)

Mistral AI has announced Mistral Large 2, the latest iteration of its flagship model, setting a new state of the art (SOTA) in open-source code generation models. This new model…

LLaMA 3.1 Models Released: Open Source and Comparable to GPT-4

24 July 2024

LLaMA 3.1 models have been officially released, including the massive 405-billion-parameter LLaMA 3.1 405B model. Expanded context length to 128K, support for eight languages, and the introduction of LLaMA…

MindsDB: Transforming Databases with Open Source AI

12 July 2024

MindsDB is transforming how AI integrates with databases by embedding machine learning models directly within them. This approach allows users to leverage AI capabilities without altering their existing database infrastructure.…

Unique3D Model Generates 3D Mesh from a Single Image in 30 Seconds

27 June 2024

Unique3D is a state-of-the-art model for 3D mesh generation from single images, noted for its efficiency and high fidelity. Unique3D's code and weights are available as open source. This approach…

Zyda: 1.3T Dataset for Open Language Modeling

12 June 2024

Zyda is a 1.3 trillion-token open-source dataset designed for open language modeling. Zyda integrates a range of high-quality open datasets, including RefinedWeb, Starcoder, C4, and Pile, enhancing them through comprehensive filtering…

Hugging Face and Pollen Robotics Introduce Reachy2 – an Open-Source Robot for Household Tasks

10 June 2024

Hugging Face and Pollen Robotics unveiled the anthropomorphic robot Reachy2, whose training dataset and model are open-source. Reachy2 performs household tasks and interacts safely with people and pets. Pollen Robotics…

Qwen2: A Leap Forward in Open Source Language Models

7 June 2024

The long-anticipated transition from Qwen1.5 to Qwen2 has arrived, marking a significant milestone for open language models. Qwen2 outperforms Llama 3…

Microsoft ViSNet: Predicting Molecule Activity

3 March 2024

Microsoft has unveiled ViSNet – a graph neural network modeling the geometry of complex molecules to predict their activity. ViSNet has the potential to significantly expedite the search for and…

Apple MGIE: Multimodal Models for Image Editing

12 February 2024

Apple, in collaboration with the University of California, has developed the open-source MGIE model for image editing based on text input. This model tackles various editing tasks, including Photoshop-style image…