Speech Enhancement Tutorial - Evaluation Methods

2.13 Non-Intrusive Methods

In some situations where objective quality measures are useful, the original clean signal is not available. Assessment must then be based on the processed signal alone. Rix [2004] reviews a number of techniques for non-intrusive objective quality assessment.

For telephone circuits, speech level, noise level, talker echo and delay can easily be measured non-intrusively and used to predict the likely channel quality, for example with the Call Clarity Index (CCI) described in ITU-T P.562 [ITU-T, 2004a].
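As a rough illustration of measuring speech and noise level from the received signal alone, the sketch below estimates both from short-frame energies. It is not the P.562 procedure or the CCI computation. The function names, the 10th-percentile noise floor and the 15 dB activity margin are assumptions chosen for illustration only.

```python
import numpy as np

def frame_levels_db(signal, frame_len=160):
    """Per-frame RMS level in dB relative to a full-scale amplitude of 1.0."""
    n_frames = len(signal) // frame_len
    frames = np.reshape(signal[: n_frames * frame_len], (n_frames, frame_len))
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    # Floor the RMS so that all-zero frames do not produce -inf.
    return 20.0 * np.log10(np.maximum(rms, 1e-10))

def estimate_levels(signal, frame_len=160, margin_db=15.0):
    """Estimate (speech_level_db, noise_level_db) from the signal alone.

    Assumption: the quietest frames contain noise only, and any frame more
    than margin_db above that floor is treated as active speech.
    """
    levels = frame_levels_db(np.asarray(signal, dtype=float), frame_len)
    noise_db = float(np.percentile(levels, 10))
    active = levels[levels > noise_db + margin_db]
    if active.size == 0:
        return None, noise_db  # no frame stood out from the noise floor
    # Average in the power domain, then convert back to dB.
    speech_db = float(10.0 * np.log10(np.mean(10.0 ** (active / 10.0))))
    return speech_db, noise_db
```

With 160-sample frames at an 8 kHz sampling rate, each frame spans 20 ms. A practical system would use a standardised active speech level measurement and a more robust voice activity decision than this fixed margin.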

To obtain single-ended measures of speech quality, it is necessary to analyse the degree to which the signal appears to follow the typical statistics of speech. For example, Gray et al. [2000] analysed the signal as a sequence of predicted vocal tract shapes, then rated the plausibility of the shapes and transitions. Another approach, which builds on the auditory processing stages of PESQ, was described in Beerends et al. [2000]. A competition for non-intrusive quality models organised by ITU-T led to Recommendation P.563 for a perceptual single-ended speech quality measure [ITU-T, 2004b]. More recently, a number of algorithms have been developed that claim to give significantly better performance than P.563. Grancharov et al. [2006] developed a low-complexity measure that gave superior prediction of MOS scores at much lower computational cost than P.563, while Kim and Tarraf [2006] describe the ANIQUE+ model, which is trained on the MOS results from 24 different speech databases covering a wide variety of distortion conditions. They claim that their model predicts MOS better than even the intrusive P.862 (PESQ) method.
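Claims of better MOS prediction are usually backed by comparing a model's predicted scores with subjective MOS over a set of test conditions. The following is a minimal sketch of two common summary statistics, Pearson correlation and root-mean-square error. The function name is hypothetical, and the sketch does not reproduce the evaluation protocol of any of the cited papers.

```python
import numpy as np

def prediction_performance(subjective_mos, predicted_mos):
    """Return (Pearson correlation, RMSE) between subjective and predicted MOS."""
    s = np.asarray(subjective_mos, dtype=float)
    p = np.asarray(predicted_mos, dtype=float)
    if s.shape != p.shape or s.ndim != 1 or s.size < 2:
        raise ValueError("need two equal-length 1-D score lists with at least 2 entries")
    # np.corrcoef returns the 2x2 correlation matrix; the off-diagonal
    # entry is the correlation between the two score vectors.
    corr = float(np.corrcoef(s, p)[0, 1])
    rmse = float(np.sqrt(np.mean((s - p) ** 2)))
    return corr, rmse
```

A higher correlation and a lower RMSE both indicate closer agreement with listeners. The two can disagree: a model with a constant offset from the subjective scores can still have a correlation of 1 while its RMSE equals the offset.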

