Text-to-Audio

daVinci-MagiHuman: Open 15B Model Generates a 5-Second Lip Sync Video in 2 Seconds on a Single H100

24 March 2026

- State-of-the-Art

SII-GAIR and Sand.ai have published daVinci-MagiHuman, an open-source 15B multimodal model built on a single-stream transformer. It simultaneously generates video with precise lip sync and synchronized audio, producing a…

EzAudio: Open-Source Hyperrealistic Text-to-Audio Model

19 September 2024

- State-of-the-Art

EzAudio is a new transformer-based text-to-audio (T2A) diffusion model developed by researchers from Tencent AI Lab and Johns Hopkins University. EzAudio addresses key challenges in T2A generation, including generation quality, computational…