Echo-Debar
[Figure: Echo Cancellation Workflow]
[Figure: Spectrograms of the far end speech, near end mic signal, near end (target) speech and the produced echo signal]
Generation of Echo: Audio Examples:
Acoustic Echo Cancellation Performance of Echo-Debar:
When using an audio headset or talking over the telephone, hearing our own voice played back (in a slightly distorted form) is a very common phenomenon, called echo. Everyone will agree that it is an annoying experience; during an important meeting or an online lecture it is nothing short of a disaster. Echo-Debar is an algorithmic approach to tackling this issue.
Applications: Telecommunication/Voice Communication Devices, Computers, Mobile phones, Bluetooth headsets, automotive hands-free kits, video/audio conferencing equipment.
Timeline: August, 2020 - November, 2020
Collaborator(s): Achal Nilhani, Dr. S. Dhabal
Theory behind Working: Consider the situation depicted in the first figure: Speakers 1 and 2, at separate locations, are talking over a communication device. The speech from Speaker 2, called the 'Far End Speech', is transmitted to Speaker 1's side, where Speaker 1 hears it over a loudspeaker. When Speaker 1 starts speaking, the mic on Speaker 1's side captures two different signals: 1. the 'Near End Speech' from Speaker 1, and 2. a distorted version of the Far End Speech from Speaker 2, termed the 'Near End Mic Signal'. The resulting overall signal, the 'Echo Signal', is sent back to Speaker 2's side. Without Echo-Debar, Speaker 2 would hear the Echo Signal directly. The function of Echo-Debar is to extract the Near End Speech from the Echo Signal so that Speaker 2 no longer hears their own voice fed back.
Echo = Near_end_speech + f (Far_end_speech),
where f: distortion due to hardware (speaker & mic)
and the effect of room reverberations.
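The relation above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the project's data pipeline: the function name `simulate_echo`, the tanh soft clipping standing in for hardware distortion, and the synthetic decaying impulse response standing in for room reverberation are all hypothetical choices made for illustration.

```python
import numpy as np

def simulate_echo(near_end, far_end, room_ir, clip=0.8):
    """Return Near_end_speech + f(Far_end_speech) for a toy choice of f."""
    # Assumed hardware distortion: soft clipping of the loudspeaker signal.
    distorted = clip * np.tanh(far_end / clip)
    # Assumed room effect: convolution with an impulse response,
    # truncated to the input length.
    echo_path = np.convolve(distorted, room_ir)[: len(far_end)]
    return near_end + echo_path

rng = np.random.default_rng(0)
n = 16000  # one second at an assumed 16 kHz sampling rate
far_end = 0.1 * rng.standard_normal(n)   # stand-in for far end speech
near_end = 0.1 * rng.standard_normal(n)  # stand-in for near end speech
# Synthetic exponentially decaying impulse response (256 taps).
room_ir = np.exp(-np.arange(256) / 40.0) * rng.standard_normal(256)
room_ir /= np.linalg.norm(room_ir)

echo_signal = simulate_echo(near_end, far_end, room_ir)
```

With a silent far end, `simulate_echo` returns the near end speech unchanged, which matches the formula when f(0) = 0.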
In Echo-Debar, we combine classical signal processing with deep learning to achieve echo cancellation in real time. The classical signal processing (Least Mean Squares adaptive filtering) handles the linear distortion components of the echo signal, whereas a multi-layer Long Short-Term Memory (LSTM) based deep network models the non-linearities in the echo distortion. The system has been trained and evaluated using the Acoustic Echo Cancellation Challenge (Microsoft) dataset. Results show that Echo-Debar achieves a P.831 Echo DMOS of 3.46 for the doubletalk scenario (maximum possible = 5.0). [CODE]
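To make the linear stage concrete, here is a minimal textbook LMS adaptive filter in NumPy. It is a sketch, not the Echo-Debar implementation: the function name `lms_echo_canceller`, the tap count, the step size `mu`, and the 4-tap toy echo path are assumptions chosen so that the example converges on white-noise input. The LSTM stage is not shown.

```python
import numpy as np

def lms_echo_canceller(far_end, mic, num_taps=8, mu=0.01):
    """Subtract a linearly filtered copy of far_end from mic using LMS."""
    w = np.zeros(num_taps)    # adaptive estimate of the echo path
    buf = np.zeros(num_taps)  # recent far end samples, newest first
    residual = np.zeros(len(mic))
    for n in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = far_end[n]
        echo_estimate = w @ buf
        residual[n] = mic[n] - echo_estimate
        # LMS update: step along the instantaneous error gradient.
        w += mu * residual[n] * buf
    return residual, w

# Toy far-end-only scenario with a purely linear echo path (assumed).
rng = np.random.default_rng(0)
far_end = rng.standard_normal(5000)
echo_path = np.array([0.5, -0.3, 0.2, 0.1])
mic = np.convolve(far_end, echo_path)[: len(far_end)]

residual, w = lms_echo_canceller(far_end, mic)
```

In this toy setting the learned weights `w` approach `echo_path` and the residual decays toward zero. During doubletalk the near end speech also appears in the error term and perturbs the update, and a purely linear filter cannot remove non-linear distortion, which is where a learned second stage becomes useful.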