Converting Mono Sound Into Binaural Sound Using Deep Learning

Two researchers, Ruohan Gao from the University of Texas and Kristen Grauman from Facebook Research, have proposed a method to convert mono sound recordings into binaural sound.

As humans, we perceive binaural sound effortlessly. Our ears and auditory system allow us to locate the source of a sound in the 3D space around us. Despite considerable technological progress, however, synthesizing such 3D sound convincingly remains difficult.

Gao and Grauman have now proposed a method that can generate “3D sound” to some extent. They call the result 2.5D visual sound.

Whatever we call it, humans perceive this kind of sound as 3D. This is possible because of what we know about the human hearing system: the researchers exploited that knowledge, specifically how we localize sounds, to build their binaural visual sound synthesizer.

To reproduce binaural sound artificially, we need to reproduce the effect that the geometry of the head and ears has on sound. Our auditory system uses several cues to locate a sound: the interaural time difference (the difference in arrival time between the left and right ear), the interaural level difference (the difference in loudness between the two ears), and the shape of the outer ears, among others.
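The timing and level cues can be illustrated with a deliberately crude panner. This sketch is not the researchers' method. It assumes Woodworth's spherical-head approximation for the interaural time difference, ITD ≈ (a/c)(θ + sin θ), with an assumed head radius a of 8.75 cm and a speed of sound c of 343 m/s, and it models the level difference as a fixed, frequency-independent attenuation of the far ear, which is an oversimplification. The function names woodworth_itd and spatialize are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, air at roughly 20 °C
HEAD_RADIUS = 0.0875    # m, an assumed average head radius


def woodworth_itd(azimuth_deg: float) -> float:
    """Interaural time difference in seconds (Woodworth spherical-head model).

    azimuth_deg: 0 is straight ahead, +90 is fully to the right, -90 fully
    to the left; the formula is intended for |azimuth_deg| <= 90.
    """
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))


def spatialize(mono, sample_rate, azimuth_deg, max_ild_db=6.0):
    """Crudely turn a mono signal into (left, right) using only ITD and ILD.

    The far ear gets an integer-sample delay (ITD) and a broadband
    attenuation of up to max_ild_db decibels (a stand-in for ILD).
    """
    mono = list(mono)
    delay = round(abs(woodworth_itd(azimuth_deg)) * sample_rate)
    delay = min(delay, len(mono))  # never delay past the end of the signal
    # Assumed ILD model: attenuation grows with |sin(azimuth)|, same at all frequencies.
    far_gain = 10 ** (-max_ild_db * abs(math.sin(math.radians(azimuth_deg))) / 20)
    near = mono
    far = [0.0] * delay + [s * far_gain for s in mono[: len(mono) - delay]]
    if azimuth_deg >= 0:  # source on the right, so the right ear is nearer
        return far, near
    return near, far
```

For a source at 90° to the right and 44.1 kHz audio, the far (left) ear is delayed by about 29 samples (roughly 0.66 ms). Real level differences vary strongly with frequency, and the outer ear adds spectral cues that this sketch ignores, so hand-built panning like this sounds far less convincing than a real binaural recording.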

To tackle the problem, Gao and Grauman used a pair of synthetic ears (to record sound) and a GoPro camera. Their setup approximated a human head, and they used it to make monaural and binaural recordings of over 2,000 musical clips.

They then applied deep learning to this dataset of recordings to learn how sound sources are localized in 3D space. In the end, their method was able to transform monaural sound into a 3D (or, in their terms, 2.5D) visual binaural sound.
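One way to frame such a learning problem is sketched below, assuming a formulation chosen for illustration rather than taken from the article: sum the two binaural channels to obtain the mono input, train a network to predict the left-minus-right difference signal, and recover both channels from the mix and the prediction. The network itself is omitted, the function names are hypothetical, and practical systems typically operate on time-frequency representations rather than raw samples.

```python
def make_training_pair(left, right):
    """Build a (mono input, difference target) pair from one binaural recording."""
    mono = [l + r for l, r in zip(left, right)]  # what the model hears
    diff = [l - r for l, r in zip(left, right)]  # what the model must predict
    return mono, diff


def reconstruct_binaural(mono, predicted_diff):
    """Recover (left, right) from the mono mix and a predicted difference signal.

    Since mono = L + R and diff = L - R, we get L = (mono + diff) / 2
    and R = (mono - diff) / 2.
    """
    left = [(m + d) / 2 for m, d in zip(mono, predicted_diff)]
    right = [(m - d) / 2 for m, d in zip(mono, predicted_diff)]
    return left, right
```

With a perfect prediction of the difference signal, the round trip returns the original channels exactly; in practice, the quality of the spatial effect depends entirely on how well the model estimates that difference.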

The researchers released a video in which the recordings and their results can be heard. The paper was published on arXiv.
