Mini-Omni: Open-Source Model for Real-Time Speech Interaction

2 September 2024

Current academic language models still rely on external Text-to-Speech (TTS) systems, causing undesirable latency in speech synthesis. To address this, the Mini-Omni model introduces an audio-based, end-to-end conversational capability that…

DenseAV Algorithm Learns Language from Videos

23 June 2024

DenseAV, an algorithm developed at MIT, learns to understand the meaning of words and sentences by watching videos of people conversing. DenseAV outperformed other algorithms in tasks involving identifying objects…

Apple Unveils “Apple Intelligence” and OpenAI Partnership at WWDC

11 June 2024

Apple’s Worldwide Developers Conference (WWDC) focused heavily on artificial intelligence, with the company introducing “Apple Intelligence” and announcing a strategic partnership with OpenAI. These announcements highlight Apple’s commitment to integrating AI across…

Voice Engine: OpenAI’s Voice Synthesis Model

1 April 2024

OpenAI has unveiled Voice Engine, a model capable of cloning a voice from a 15-second audio recording. The users the company mentions include podcasters, announcers, audiobook authors, advertisers,…

ChatGPT Enhancements: Voice Conversations and Image Recognition

25 September 2023

ChatGPT will be able to hold voice conversations and recognize objects in images. For instance, ChatGPT can read bedtime stories and help create recipes from photos of…

“Deepdub Go” Empowers Content Creators with AI for Video Dubbing

9 July 2023

Israeli startup Deepdub has unveiled Deepdub Go, a service that uses AI to automatically dub videos in 65 languages. This innovative platform targets game development studios, advertising…

AI.XYZ: Personal AI Assistant for Personal and Work Tasks

2 July 2023

The AI Foundation research lab has launched AI.XYZ, a platform for creating personal AI assistants. The company claims that AI.XYZ is the world’s first platform for managing life using AI,…

AudioPaLM: Google’s Multimodal Model for Voice Translation

29 June 2023

Google has introduced AudioPaLM, a large language model for speech processing and generation that combines two Google language models, PaLM-2 and AudioLM, into a multimodal architecture. The model can recognize…

Uni-TTSv4: Microsoft’s Text-to-Speech Model

19 December 2021

Microsoft has introduced an update to Uni-TTS – a model that converts text to speech. Uni-TTSv4 provides the best speech quality among similar state-of-the-art models and will soon be available…

IBM has increased the quality of speech recognition by 57% in the Watson Speech to Text service

29 April 2021

An improved neural network training strategy has allowed IBM to significantly increase the efficiency of its speech-to-text tool. The service works with eight languages and provides a record high speed…

MLS: FAIR’s Multilingual Speech Recognition Dataset

4 March 2021

Facebook AI has published a multilingual dataset for training speech recognition models. Multilingual LibriSpeech (MLS) contains 50,000 hours of audio of people speaking in 8 languages: English, German, Spanish,…

Neural Network Has Learned to Separate Individuals’ Speech on Video

13 April 2018

It is no secret that in a noisy environment our brain can effectively focus on a particular speaker, “turning off” background sounds. This phenomenon even received the popular…