The CRC has developed a commercial implementation of the ITU-R PEAQ model in its CRC-SEAQ Objective Test Module software. Technical information on CRC-SEAQ, including how to purchase it, is available from the CRC.
The limitations imposed by available bandwidth can affect the quality and responsiveness of digital audio communication systems. The need to conserve bandwidth has led to developments in the compression of the audio data to be transmitted. Various encoding methods remove both redundancy and perceptual irrelevancy in the audio signal so that the bit rate required to encode the signal is significantly reduced. These lossy compression algorithms take into account knowledge of human auditory perception, and typically achieve a reduced bit rate by ignoring audio information that is not likely to be heard by most listeners. A psychoacoustic model is used to predict how this information is masked by louder audio content adjacent in time and frequency. The degree of compression permitted by a codec (coder/decoder) depends, to some extent, on the sophistication of the model employed.
The perceived quality of decoded audio may suffer when a compression algorithm pushes the limit with respect to bit rate reduction. Performance typically varies with the type of audio content, and some implementations may be more successful than others in their use of psychoacoustic knowledge. Subjective listening tests are the most reliable way to assess the quality of decoded audio. However, the expense and time required to conduct such tests often preclude their use. A fast and reliable method for objective measurement of perceived audio quality has therefore been developed.
The International Telecommunication Union (ITU) describes in detail a standard method for measuring the quality of wide-bandwidth audio (ITU-R Recommendation BS.1387). The method is the result of a joint effort among laboratories in Canada, the Netherlands, France, and Germany. The acronym for the measurement model is PEAQ (Perceptual Evaluation of Audio Quality).
The psychoacoustic model employed in the method produces a number of variables based on comparisons between a reference signal and the same signal processed by a particular device such as a codec. These variables are used to predict the subjective quality rating that would be assigned to the processed signal if a formal listening test were conducted. The objective quality measurement was calibrated using results from a number of listening tests conducted using a standard methodology also recommended by the ITU.
The ITU recommendation describes two variations of the method. The Basic Version is intended to be fast enough for real-time monitoring, while the Advanced Version is computationally more demanding but is expected to give slightly more reliable results. The high-level structure of both the Basic Version and the Advanced Version is shown in the figure. As in the listening tests, the quality of the test signal is measured relative to the reference signal. Each signal is transformed into a time-frequency representation by the psychoacoustic model. A task-specific model of auditory cognition then reduces these data to a number of scalar variables, some of which are mapped to the desired quality measurement.
The psychoacoustic model in the Basic Version uses a Discrete Fourier Transform (DFT) to transform the signal to a time-frequency representation, while the Advanced Version uses both a DFT and a filter bank. The data from the DFT is mapped from the frequency scale to a pitch scale, the psychoacoustic equivalent of frequency. For the filter bank, the frequency to pitch mapping is implicitly taken into account by the bandwidths and spacing of the bandpass filters. The input energy is spread over adjacent pitch regions as a function of the level of the input.
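The frequency-to-pitch mapping of the DFT data can be illustrated with a minimal sketch. It assumes a commonly used approximation of the Bark (pitch) scale, z = 7 asinh(f / 650), attributed to Schroeder; the one-Bark band width, the naive DFT, and all function names are illustrative assumptions, not the band layout or code of the Recommendation.

```python
import cmath
import math


def hz_to_bark(freq_hz):
    """Approximate pitch in Bark for a frequency in Hz (assumed formula)."""
    return 7.0 * math.asinh(freq_hz / 650.0)


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (adequate for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def group_into_pitch_bands(power, sample_rate, band_width_bark=1.0):
    """Sum DFT bin energies into equal-width bands on the Bark scale."""
    n_fft = 2 * (len(power) - 1)
    n_bands = int(hz_to_bark(sample_rate / 2.0) / band_width_bark) + 1
    bands = [0.0] * n_bands
    for k, energy in enumerate(power):
        freq = k * sample_rate / n_fft
        bands[int(hz_to_bark(freq) / band_width_bark)] += energy
    return bands


# Example: a 1 kHz tone sampled at 8 kHz, analysed with a 64-point frame.
frame = [math.sin(2 * math.pi * 1000 * i / 8000) for i in range(64)]
bands = group_into_pitch_bands(power_spectrum(frame), 8000)
peak_band = bands.index(max(bands))  # band holding most of the tone's energy
```

Under this approximation 1 kHz corresponds to roughly 8.5 Bark, so the tone's energy falls in the band covering 8 to 9 Bark. Because the bands are uniform in pitch rather than in frequency, low-frequency bands contain few DFT bins and high-frequency bands contain many.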
Simultaneous masking is modeled via the masked threshold concept as well as by comparison of internal representations. The approach based on the masked threshold concept calculates a level-dependent masked threshold for the reference signal at any pitch value using a predefined psychophysical masking function. Additional energy in the test signal is deemed to be audible if the representation of that energy exceeds the masked threshold. In the approach based on the comparison of internal representations, the energies of both the test and the reference signal are spread to adjacent pitch regions in order to obtain excitation patterns, and are non-linearly compressed to approximate loudness. Non-simultaneous forward masking is implemented by smearing the excitation patterns over time prior to compression. The difference between the resulting internal representations models the energy in the test signal that is not masked by the reference audio content.
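The masked threshold approach and the time smearing can be sketched as follows. This is a simplified illustration, not the procedure of the Recommendation: the spreading slopes (27 dB and 10 dB per band), the 6 dB threshold offset, the smoothing coefficient, and all function names are assumed values chosen for readability, and the slopes are held fixed rather than made level dependent.

```python
def spread_excitation(band_energies, lower_slope_db=27.0, upper_slope_db=10.0):
    """Spread each band's energy to neighbouring pitch bands.

    Uses a two-sided triangular spreading function expressed in dB per band.
    The slopes are illustrative and do not vary with level.
    """
    n = len(band_energies)
    excitation = [0.0] * n
    for src, energy in enumerate(band_energies):
        for dst in range(n):
            distance = dst - src
            slope = upper_slope_db if distance > 0 else lower_slope_db
            attenuation_db = slope * abs(distance)
            excitation[dst] += energy * 10.0 ** (-attenuation_db / 10.0)
    return excitation


def masked_threshold(excitation, offset_db=6.0):
    """Masked threshold: the excitation lowered by a fixed offset (assumed)."""
    return [e * 10.0 ** (-offset_db / 10.0) for e in excitation]


def audible_bands(reference_bands, error_bands, offset_db=6.0):
    """Indices of bands where error energy exceeds the reference's threshold."""
    threshold = masked_threshold(spread_excitation(reference_bands), offset_db)
    return [i for i, (err, thr) in enumerate(zip(error_bands, threshold))
            if err > thr]


def smear_over_time(frames, decay=0.5):
    """First-order recursive smoothing of excitation patterns across frames,
    a simple stand-in for forward masking."""
    state = [0.0] * len(frames[0])
    smeared = []
    for frame in frames:
        state = [decay * s + (1.0 - decay) * x for s, x in zip(state, frame)]
        smeared.append(state)
    return smeared


# Example: a strong reference component in band 3 of 7.
reference = [0.0, 0.0, 0.0, 1000.0, 0.0, 0.0, 0.0]
quiet_error = [0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0]   # close to the masker
loud_error = [0.0, 0.0, 0.0, 0.0, 50.0, 0.0, 0.0]   # same band, 10 dB higher
masked = audible_bands(reference, quiet_error)
unmasked = audible_bands(reference, loud_error)
```

In this example the weaker error component sits below the threshold raised by the neighbouring reference component and is treated as inaudible, while the stronger one exceeds it and is flagged in band 4.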
The cognitive model compares the internal representations and calculates scalar variables that summarize psychoacoustic activity over time. Important information for making the quality measurement is derived from the differences between the frequency and pitch domain representations of the reference and test signals. In the frequency domain, the spectral bandwidths of both signals are measured and the harmonic structure in the error is determined. In the pitch domain, error measures are derived from the excitation envelope modulations, the excitation magnitudes, and the excitation derived from the error signal calculated in the frequency domain. The quality measurement is based on eleven variables for the Basic Version, and on five variables for the Advanced Version.
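As a rough illustration of how a frequency-domain measure can be reduced to a scalar that summarizes activity over time, the sketch below estimates a spectral bandwidth per frame and averages it across frames. The noise-floor estimate taken from the top few bins, the 10 dB margin, and the function names are assumptions made for this example; the Recommendation defines its own variables and thresholds.

```python
def bandwidth_bin(power, floor_bins=4, margin_db=10.0):
    """Highest DFT bin whose energy exceeds an estimated noise floor.

    The floor is taken as the largest energy among the top `floor_bins`
    bins, and a bin counts as signal if it is `margin_db` above that floor.
    """
    floor = max(max(power[-floor_bins:]), 1e-12)
    threshold = floor * 10.0 ** (margin_db / 10.0)
    for k in range(len(power) - floor_bins - 1, -1, -1):
        if power[k] > threshold:
            return k
    return 0


def mean_over_frames(values):
    """Collapse a per-frame measure into one scalar by linear averaging."""
    return sum(values) / len(values)


def average_bandwidths(ref_frames, test_frames):
    """Mean bandwidth (in bins) of the reference and of the test signal."""
    ref_bw = mean_over_frames([bandwidth_bin(p) for p in ref_frames])
    test_bw = mean_over_frames([bandwidth_bin(p) for p in test_frames])
    return ref_bw, test_bw


# Example: the processed signal has lost its upper frequency content.
ref_frame = [100.0] * 6 + [0.1] * 4
test_frame = [100.0] * 3 + [0.1] * 7
ref_bw, test_bw = average_bandwidths([ref_frame, ref_frame],
                                     [test_frame, test_frame])
```

A test-signal bandwidth well below that of the reference, as in this example, indicates that the device under test has removed high-frequency content, which is one kind of difference a quality measure needs to capture.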
An example of the performance of this method may be seen in the accompanying figures where objective codec quality measurements are compared with corresponding subjective ratings.