Speech Enhancement Tutorial - Enhancement Methods
3.1 Speech Enhancement
It is convenient to categorise the signal degradations into the three groups given below according to the way in which they alter the wanted speech signal; this categorisation also corresponds approximately to their perceived effect and to the appropriate cleaning method.
- Additive noise that is uncorrelated with the wanted speech signal may arise in either the acoustic or the electronic domain. It degrades listenability and intelligibility and may, in extreme cases, completely mask the wanted signal. For some types of additive noise, the spectral characteristics are stationary or change only slowly with time. This is typically true of hum and amplifier noise, as well as of some environmental acoustic noise sources. Spectral subtraction and single-channel adaptive filtering have been successful in reducing the perceived level of such stationary noise sources. Other forms of additive noise are intermittent or highly non-stationary, and their identification and deletion are the subject of model-based and missing-data methods. Such non-stationary noise sources include media interference, unwanted co-talkers and some forms of electrical interference.
- Convolutive effects are perceived as reverberation and poor spectral balance; they differ from the previous group in that the added noise is strongly correlated with the wanted signal. Reverberation and echo normally arise from acoustic reflections and can seriously degrade intelligibility. The increasing use of distant microphones in hands-free telephony has prompted extensive research into reducing the effects of reverberation. Bandwidth restrictions and uneven spectral response may arise from microphone placement, microphone characteristics and CODEC limitations. There has been some work on expanding the bandwidth of narrow-band telephone signals in order to improve listenability, but there is little evidence of any intelligibility benefit.
- Non-linear distortion frequently arises from amplitude limiting or clipping in the microphone, amplifier or CODEC. It is perceived as a harsh distortion that varies with the signal amplitude. A similar perceptual effect can result from high bit-error rates in the coded signal used by some CODECs. Clipped portions of a waveform are easy to identify, provided that no subsequent phase distortion is present, and techniques exist for reconstructing the corrupted portions of the waveform.
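The additive-noise item above mentions spectral subtraction for stationary noise. The following is a minimal magnitude-subtraction sketch in Python with NumPy, not a reference implementation. It assumes that the noise is stationary and that the first few frames contain noise only, so that their average magnitude spectrum can serve as the noise estimate. The function name spectral_subtraction and the parameters frame_len, noise_frames and floor are illustrative choices.

```python
import numpy as np


def spectral_subtraction(noisy, frame_len=256, noise_frames=10, floor=0.02):
    """Magnitude spectral subtraction (illustrative sketch).

    Assumes the noise is stationary and that the first `noise_frames`
    frames of `noisy` contain noise only.
    """
    noisy = np.asarray(noisy, dtype=float)
    hop = frame_len // 2
    # Periodic Hann window: copies shifted by hop = frame_len / 2 sum to 1,
    # so overlap-adding unmodified frames would return the input unchanged
    # (apart from the first and last half-frames).
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(frame_len) / frame_len)
    n_frames = 1 + (len(noisy) - frame_len) // hop
    spectra = np.array([
        np.fft.rfft(window * noisy[i * hop:i * hop + frame_len])
        for i in range(n_frames)
    ])
    mag, phase = np.abs(spectra), np.angle(spectra)

    # Noise magnitude spectrum averaged over the assumed noise-only frames.
    noise_mag = mag[:noise_frames].mean(axis=0)

    # Subtract the noise estimate; a spectral floor (a small fraction of the
    # noisy magnitude) keeps every magnitude non-negative.
    clean_mag = np.maximum(mag - noise_mag, floor * mag)

    # Resynthesise each frame with the noisy phase and overlap-add.
    out = np.zeros(len(noisy))
    for i in range(n_frames):
        frame = np.fft.irfft(clean_mag[i] * np.exp(1j * phase[i]), n=frame_len)
        out[i * hop:i * hop + frame_len] += frame
    return out


# Usage: half a second of noise only, then a 440 Hz tone in the same noise.
rng = np.random.default_rng(0)
fs = 8000
t = np.arange(2 * fs) / fs
clean = np.where(t >= 0.5, np.sin(2 * np.pi * 440 * t), 0.0)
noisy = clean + 0.3 * rng.standard_normal(len(t))
enhanced = spectral_subtraction(noisy)
```

Reusing the noisy phase is the usual simplification in this family of methods. Plain subtraction leaves isolated residual spectral peaks in noise-only regions (often heard as "musical noise"), and a spectral floor such as the floor parameter here is one common way to limit them.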
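The convolutive item above can be made concrete with the standard model: the received signal is the wanted signal convolved with the impulse response of the acoustic path. The sketch below uses plain Python and a hypothetical single-echo impulse response h_echo (direct path plus one reflection delayed by 8 samples, i.e. 1 ms at 8 kHz, with gain 0.7). It is a toy illustration, not a measured room response. It shows both points made above: the added component is a delayed, scaled copy of the wanted signal, so it is fully correlated with it, and the filter's magnitude response is far from flat, which is heard as poor spectral balance.

```python
import cmath
import math


def convolve(x, h):
    """Direct-form linear convolution: y[n] = sum_k h[k] * x[n - k]."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y


def magnitude_response(h, freq_hz, fs):
    """|H(e^{jw})| of the FIR filter h at a single frequency in Hz."""
    w = 2 * math.pi * freq_hz / fs
    return abs(sum(hk * cmath.exp(-1j * w * k) for k, hk in enumerate(h)))


# Hypothetical single-reflection "room": direct path plus one echo
# 8 samples (1 ms at 8 kHz) later, attenuated to 0.7.
FS = 8000
h_echo = [1.0] + [0.0] * 7 + [0.7]

# The echo part of the output is 0.7 times the input, delayed by 8 samples.
x = [0.3, -1.0, 0.5, 0.25]
y = convolve(x, h_echo)

# The echo turns the response into a comb: 1 + 0.7 = 1.7 at 0, 1000, 2000 Hz, ...
# and 1 - 0.7 = 0.3 at 500, 1500 Hz, ...
gains = {f: magnitude_response(h_echo, f, FS) for f in (0, 500, 1000)}
```

Because the echo gain and delay here are fixed by hand, the notch frequencies follow directly from the delay (odd multiples of FS / (2 * 8) = 500 Hz). Real room responses contain many reflections and give a far more irregular response.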
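The non-linear distortion item notes that clipped portions are easy to identify when no subsequent phase distortion is present: they appear as runs of consecutive samples stuck at the clipping level. The sketch below, in plain Python, finds such runs and then fills each one in. The source does not name a reconstruction technique, so this sketch swaps in a deliberately simple stand-in: a cubic (Lagrange) polynomial through two unclipped samples on each side of each run. This only works for smooth, well-oversampled signals. The helper names find_clipped_runs, lagrange_eval and declip, and the assumption that the clip level equals the peak absolute sample value, are all illustrative.

```python
import math


def find_clipped_runs(y, tol=1e-6, min_run=2):
    """Return inclusive (start, end) index pairs of same-signed runs of at
    least `min_run` samples at the clip level.

    Assumes symmetric clipping whose level equals the peak |sample|.
    """
    level = max(abs(v) for v in y)
    if level == 0:
        return []
    runs, start, run_sign = [], None, 0
    for n, v in enumerate(y):
        sign = 1 if v > 0 else -1
        at_clip = abs(v) >= level - tol
        # Close the current run when the signal leaves the clip level
        # or jumps to the opposite clip level.
        if start is not None and (not at_clip or sign != run_sign):
            if n - start >= min_run:
                runs.append((start, n - 1))
            start = None
        if at_clip and start is None:
            start, run_sign = n, sign
    if start is not None and len(y) - start >= min_run:
        runs.append((start, len(y) - 1))
    return runs


def lagrange_eval(xs, ys, x):
    """Evaluate the polynomial through the points (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def declip(y, tol=1e-6):
    """Replace each clipped run with a cubic through its unclipped neighbours."""
    out = list(y)
    for start, end in find_clipped_runs(y, tol):
        if start - 2 < 0 or end + 2 >= len(y):
            continue  # not enough unclipped samples on both sides; leave as is
        xs = [start - 2, start - 1, end + 1, end + 2]
        ys = [y[k] for k in xs]
        for n in range(start, end + 1):
            out[n] = lagrange_eval(xs, ys, n)
    return out


# Usage: two periods of a sinusoid (80 samples per period) clipped at 0.8.
x = [math.sin(2 * math.pi * n / 80) for n in range(160)]
clipped = [max(-0.8, min(0.8, v)) for v in x]
runs = find_clipped_runs(clipped)
restored = declip(clipped)
```

For this test signal the restored peaks reach about 0.99 instead of the true 1.0. Waveforms with sharper peaks or longer clipped runs would need a more capable reconstruction method.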