Speech Recognition / Neural Networks and Deep Learning

Nvidia Drops Monster AI Update at CES 2025: Local Foundation Models Meet New RTX 50 Series

7 January 2025

Nvidia just announced a major shift in consumer AI computing at CES 2025, combining new GPUs with a platform for running foundation models locally. The announcement includes next-gen RTX 50…

Mini-Omni: Open-Source Model for Real-Time Speech Interaction

2 September 2024

Current academic language models still rely on external Text-to-Speech (TTS) systems, causing undesirable latency in speech synthesis. To address this, the Mini-Omni model introduces an audio-based, end-to-end conversational capability that…

DenseAV Algorithm Learns Language from Videos

23 June 2024

The algorithm DenseAV, developed at MIT, learns to understand the meaning of words and sentences by watching videos of people conversing. DenseAV outperformed other algorithms in tasks involving identifying objects…

Apple Unveils “Apple Intelligence” and OpenAI Partnership at WWDC

11 June 2024

Apple’s Worldwide Developers Conference (WWDC) saw a major focus on artificial intelligence, introducing “Apple Intelligence” and a strategic partnership with OpenAI. These announcements highlight Apple’s commitment to integrating AI across…

Voice Engine: OpenAI’s Voice Synthesis Model

1 April 2024

OpenAI has unveiled Voice Engine, a model capable of voice cloning from a 15-second audio recording. Among the users of the model, the company mentions podcasters, announcers, audiobook authors, advertisers,…

ChatGPT Enhancements: Voice Conversations and Image Recognition

25 September 2023

ChatGPT will be able to hold voice conversations and recognize objects in images. For instance, ChatGPT can read bedtime stories, help create recipes from photos of…

“Deepdub Go” Empowers Content Creators with AI for Video Dubbing

9 July 2023

Israeli startup Deepdub has unveiled Deepdub Go, a service that uses AI to automatically dub videos into 65 languages. The platform targets game development studios, advertising…

AI.XYZ: Personal AI Assistant for Personal and Work Tasks

2 July 2023

The AI Foundation research lab has launched AI.XYZ, a platform for creating personal AI assistants. The company claims that AI.XYZ is the world’s first platform for managing life using AI,…

AudioPaLM: Google’s Multimodal Model for Voice Translation

29 June 2023

Google has introduced AudioPaLM, a large language model for speech processing and generation that combines two Google language models, PaLM-2 and AudioLM, into a multimodal architecture. The model can recognize…

Uni-TTSv4: Microsoft’s Text-to-Speech Model

19 December 2021

Microsoft has introduced an update to Uni-TTS – a model that converts text to speech. Uni-TTSv4 provides the best speech quality among similar state-of-the-art models and will soon be available…

IBM has increased the quality of speech recognition by 57% in the Watson Speech to Text service

29 April 2021

The improved neural network training strategy has allowed IBM to significantly increase the efficiency of the speech-to-text tool. The service works with eight languages and provides a record high speed…

Speech Recognition

MLS: FAIR’s Multilingual Speech Recognition Dataset

Neural Network Has Learned to Separate Individuals’ Speech on Video
