Communications Research Centre Canada

Audio Signal Processing

Novel Audio Coding Paradigms

Low bit-rate audio coding technologies are evolving and making more efficient use of the available bandwidth. More and more wireless services, such as digital radio, mobile television and wireless Internet, are being deployed, which creates a demand for more spectrum. This trend requires more bandwidth-efficient coding systems so that broadcasters and telcos can air more program channels while still delivering high-quality audio. Traditional coding schemes cannot reach the desired bit-rates. For instance, the Eureka 147 DAB standard, which has been operating in a number of countries for some time, employs MPEG-1 Layer 2 audio coding. This coding system lags behind recent advances in audio coding and requires 192 kbps to code a stereo pair at broadcast quality.

Since the MPEG-1 Layer 1, 2 and 3 audio encoders were standardized in the early 1990s, more efficient audio encoders have been developed. The most significant advance has been the MPEG AAC (Advanced Audio Coding) encoder, finalized in 1997. In formal listening tests performed at CRC in 1997 to evaluate state-of-the-art audio coders, the AAC encoder outperformed the other encoders tested. For stereo signals, AAC delivers broadcast quality at 96 kbps, compared with 192 kbps for MPEG-1 Layer 2, a 50% saving in bit-rate.
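The bit-rate figures above can be checked with a few lines of Python. This is a small sketch that assumes an uncompressed reference of CD-style PCM (44.1 kHz, 16-bit, stereo, i.e. 1411.2 kbps), which the text itself does not state; the helper names are illustrative only.

```python
# Assumption: the uncompressed reference is CD-style PCM (44.1 kHz, 16-bit, stereo).
# The text only quotes the coded bit-rates (192 kbps and 96 kbps).
PCM_KBPS = 44_100 * 16 * 2 / 1000  # 1411.2 kbps


def saving_percent(new_kbps: float, old_kbps: float) -> float:
    """Relative bit-rate saving of new_kbps over old_kbps, in percent."""
    return 100.0 * (old_kbps - new_kbps) / old_kbps


def compression_ratio(kbps: float, reference_kbps: float = PCM_KBPS) -> float:
    """How many times smaller the coded stream is than the PCM reference."""
    return reference_kbps / kbps


for name, kbps in [("MPEG-1 Layer 2", 192), ("AAC", 96)]:
    print(f"{name}: {kbps} kbps, about {compression_ratio(kbps):.1f}:1 vs. 16-bit PCM")
print(f"AAC saving over Layer 2: {saving_percent(96, 192):.0f}%")
```

Halving the bit-rate doubles the compression ratio relative to PCM, so under this assumption AAC at 96 kbps is roughly 14.7 times smaller than the uncompressed stereo signal.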

The current state-of-the-art audio encoder is MPEG HE AAC v2 (High-Efficiency AAC version 2), an enhanced version of conventional AAC. To further increase compression efficiency, it uses Spectral Band Replication (SBR) to parametrically encode the high frequencies and Parametric Stereo (PS) to parametrically encode the stereo image. Although SBR and PS push the bit-rate below that of plain AAC, this coding gain comes at the cost of quality: the decoder reproduces only an approximation of the high frequencies and of the stereo image. For this reason, SBR and PS are mainly used in applications where high quality is not a requirement.
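As a rough illustration of what "parametrically encoding the high frequencies" means, the toy Python sketch below keeps the low-band spectrum of one frame exactly but transmits only a handful of band energies for the high band; the decoder fills the high band by copying low-band content upward and rescaling it to the transmitted energies. This is a simplified stand-in, not the MPEG SBR algorithm (which operates on a QMF filterbank with considerably richer side information); the function names, the single-FFT framing and the `crossover`/`n_bands` parameters are hypothetical.

```python
import numpy as np

# Toy illustration of the SBR idea only (hypothetical framing and parameters).
# Real SBR works on a QMF filterbank with time/frequency grids, noise floors, etc.


def sbr_encode(frame: np.ndarray, crossover: int, n_bands: int):
    """Keep the low-band spectrum; describe the high band only by band energies."""
    spec = np.fft.rfft(frame)
    low = spec[:crossover].copy()
    high_bands = np.array_split(np.arange(crossover, len(spec)), n_bands)
    envelope = np.array([np.sum(np.abs(spec[b]) ** 2) for b in high_bands])
    return low, envelope, len(spec)


def sbr_decode(low: np.ndarray, envelope: np.ndarray, n_bins: int, frame_len: int):
    """Rebuild the high band by patching low-band bins upward, then rescale
    each band so that its energy matches the transmitted envelope."""
    crossover = len(low)
    spec = np.zeros(n_bins, dtype=complex)
    spec[:crossover] = low
    high_bands = np.array_split(np.arange(crossover, n_bins), len(envelope))
    for b, target in zip(high_bands, envelope):
        src = low[(b - crossover) % crossover]  # reuse low-band bins cyclically
        energy = np.sum(np.abs(src) ** 2)
        gain = np.sqrt(target / energy) if energy > 0 else 0.0
        spec[b] = gain * src
    return np.fft.irfft(spec, n=frame_len)


rng = np.random.default_rng(0)
frame = rng.standard_normal(1024)
low, env, n_bins = sbr_encode(frame, crossover=256, n_bands=8)
approx = sbr_decode(low, env, n_bins, len(frame))
```

Here the high band (257 complex bins) is described by only 8 numbers. The decoded high band has the right energy in each band but not the original waveform, which is the kind of approximation the paragraph above refers to.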

There is near consensus in the audio community that further refinement of existing perceptual, transform-based audio coding systems cannot yield a significant reduction in bit-rate. CRC has therefore initiated research into new audio coding paradigms. One promising direction is object-based audio coding, which imitates how the human auditory system analyzes and perceives sounds. In this approach, the physical attributes of audio signals are converted into perceptually meaningful quantities, which can be grouped to represent so-called audio objects. This is expected to reduce the bit-rate drastically. Following this trend, we are developing a new audio coding paradigm based on neural spikes.

Our proposed approach generates auditory-inspired, sparse 2-D representations of audio signals, dubbed spikegrams. We also apply auditory masking models to further reduce the number of spikes, and have introduced the Perceptual Matching Pursuit (PMP) algorithm, which decomposes audio signals into audible spikes. Finally, we group spikes into audio objects, based on their dependencies, for efficient coding. More recently, our group has started research on compressed sampling and its possible application to the efficient coding of sparse time-frequency representations (i.e., spikegrams) of audio signals. The target is a bit-rate below 64 kbps for stereo audio sampled at 44.1 kHz, with quality above 4 ("broadcast" quality) on the ITU-R 5-grade impairment scale for diverse critical audio material.
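To make the spikegram idea concrete, here is a minimal Python sketch of plain matching pursuit over a small bank of gammatone kernels: each iteration finds the kernel and time shift best correlated with the residual, records them as a spike (channel, time, amplitude) and subtracts that contribution. It is not PMP: the auditory masking step that discards inaudible spikes is omitted, and the kernel bank, its gammatone parameters (4th order, Glasberg-Moore ERB bandwidth) and the fixed spike count are assumptions for illustration.

```python
import numpy as np


def gammatone(fc: float, fs: float, length: int, order: int = 4) -> np.ndarray:
    """Unit-norm gammatone kernel t^(n-1) exp(-2 pi b t) cos(2 pi fc t).
    Bandwidth b = 1.019 * ERB(fc), with ERB from Glasberg & Moore (assumed choice)."""
    t = np.arange(length) / fs
    erb = 24.7 + 0.108 * fc
    b = 1.019 * erb
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.linalg.norm(g)


def matching_pursuit(x: np.ndarray, kernels: list, n_spikes: int):
    """Greedy decomposition of x into (kernel index, time shift, amplitude) spikes."""
    residual = x.astype(float).copy()
    spikes = []
    for _ in range(n_spikes):
        best = None
        for k, g in enumerate(kernels):
            # corr[tau] = <residual[tau:tau+len(g)], g> for every valid shift tau
            corr = np.correlate(residual, g, mode="valid")
            tau = int(np.argmax(np.abs(corr)))
            if best is None or abs(corr[tau]) > abs(best[2]):
                best = (k, tau, corr[tau])
        k, tau, amp = best
        # Kernels are unit-norm, so this removes exactly the projection onto the atom.
        residual[tau:tau + len(kernels[k])] -= amp * kernels[k]
        spikes.append((k, tau, amp))
    return spikes, residual


fs = 8000
kernels = [gammatone(fc, fs, 256) for fc in (300.0, 800.0, 2000.0)]
x = np.zeros(1024)
x[100:356] += 1.5 * kernels[1]
x[600:856] += 0.7 * kernels[2]
spikes, residual = matching_pursuit(x, kernels, n_spikes=2)
```

The list of (channel, time, amplitude) triples is the sparse 2-D spikegram. A perceptual variant such as PMP would, in addition, keep only spikes lying above a masking threshold.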

Audio Source Separation

Blind source separation (BSS) from convolutive mixtures is a problem encountered in many real-world multi-sensor applications. A popular example is the "cocktail party" problem: humans can focus on a sound source of interest amid the noise and interference of a cocktail party. Numerous algorithms have been developed to deal with such situations. With the same aim, we have started research activities to develop and implement algorithms that separate multiple sources from multi-microphone audio recordings.

The simplest class of blind source separation is instantaneous BSS, which assumes no multipath in the mixing system. Real-world recordings are more complex: a matrix of filters represents the impulse responses linking the sources to the sensors. This is convolutive BSS, in which multipath propagation in the mixing system must be taken into account. The solution consists of finding a demixing system that is as close as possible to the inverse of the mixing system.
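Written out, the two mixing models and the demixing system look as follows. The notation is assumed for illustration: $M$ sensor signals $x_m(n)$, $N$ source signals $s_j(n)$, mixing filters $h_{mj}(k)$ of length $L$ and demixing filters $w_{jm}(k)$ of length $K$. The instantaneous case is the special case $L = 1$ with $a_{mj} = h_{mj}(0)$.

```latex
\begin{aligned}
\text{instantaneous:}\quad & x_m(n) = \sum_{j=1}^{N} a_{mj}\, s_j(n),
  \qquad \mathbf{x}(n) = \mathbf{A}\,\mathbf{s}(n),\\
\text{convolutive:}\quad & x_m(n) = \sum_{j=1}^{N} \sum_{k=0}^{L-1} h_{mj}(k)\, s_j(n-k),
  \qquad \mathbf{X}(z) = \mathbf{H}(z)\,\mathbf{S}(z),\\
\text{demixing:}\quad & y_j(n) = \sum_{m=1}^{M} \sum_{k=0}^{K-1} w_{jm}(k)\, x_m(n-k),
  \qquad \mathbf{Y}(z) = \mathbf{W}(z)\,\mathbf{H}(z)\,\mathbf{S}(z).
\end{aligned}
```

"As close as possible to the inverse" means driving the global system $\mathbf{W}(z)\,\mathbf{H}(z)$ towards the identity. Because the problem is blind, this is generally achievable only up to a permutation of the outputs and a filtering (or scaling) of each separated source, i.e. $\mathbf{W}(z)\,\mathbf{H}(z) \approx \mathbf{P}\,\mathbf{D}(z)$ with $\mathbf{P}$ a permutation matrix and $\mathbf{D}(z)$ diagonal.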

Although numerous BSS techniques have been introduced, the quality of the separated sources still needs improvement, both to enhance the intelligibility of speech separated from a sound mixture and, more generally, to reduce the interference from other sources that remains in each separated source. To this end, we have developed algorithms that enhance separated sources by reducing this residual interference.

Digital Audio Watermarking

Broadband digital transmission networks such as the Internet facilitate the distribution and copying of digital audio recordings. The ease of copying audio files has raised the issues of protecting intellectual property and preventing the unauthorized distribution of multimedia data. Digital watermarking can be used to enforce intellectual property rights and to protect multimedia material from illegal distribution. In broadcasting, audio watermarking can also be used to embed identification information in audio and multimedia data.

We are developing digital watermarks to be embedded in sparse 2-D time-frequency representations of audio signals. The watermark should be imperceptible (i.e., perceptually transparent) and robust, at a small overhead in operational bit-rate and computational complexity. Our watermarking algorithms exploit the characteristics of the human auditory system to insert inaudible watermarks into audio signals.
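As a hedged illustration of embedding a watermark in sparse coefficients, the sketch below uses classic keyed spread-spectrum embedding, which may well differ from CRC's method: each payload bit perturbs one chunk of coefficients by a small, magnitude-proportional ±1 pseudo-random pattern, and blind detection correlates each chunk with the same keyed pattern. The coefficient vector is synthetic and merely stands in for spikegram amplitudes; the function names, the key handling and the strength `alpha` are hypothetical, and scaling by coefficient magnitude is only a crude proxy for the auditory-model-based control described above.

```python
import numpy as np


def keyed_sequence(key: int, size: int) -> np.ndarray:
    """Hypothetical keyed +/-1 pseudo-random sequence shared by embedder and detector."""
    return np.random.default_rng(key).choice([-1.0, 1.0], size=size)


def embed(coeffs: np.ndarray, bits: list, key: int, alpha: float = 0.1) -> np.ndarray:
    """Spread-spectrum embedding: each bit owns one contiguous chunk of coefficients.
    Every coefficient changes by at most alpha * |coefficient| (crude masking proxy)."""
    pn = keyed_sequence(key, coeffs.size)
    marked = coeffs.astype(float).copy()
    chunks = np.array_split(np.arange(coeffs.size), len(bits))
    for idx, bit in zip(chunks, bits):
        sign = 1.0 if bit else -1.0
        marked[idx] += alpha * sign * np.abs(coeffs[idx]) * pn[idx]
    return marked


def detect(marked: np.ndarray, n_bits: int, key: int) -> list:
    """Blind detection: the sign of the correlation with the keyed sequence gives each bit."""
    pn = keyed_sequence(key, marked.size)
    chunks = np.array_split(np.arange(marked.size), n_bits)
    return [int(np.dot(marked[idx], pn[idx]) > 0) for idx in chunks]


rng = np.random.default_rng(1)
coeffs = rng.standard_normal(32768)  # synthetic stand-in for spikegram amplitudes
bits = [1, 0, 1, 1]
marked = embed(coeffs, bits, key=42)
noisy = marked + 0.05 * rng.standard_normal(marked.size)  # mild channel noise
```

Detection does not need the original coefficients, only the key; the host coefficients act as noise in the correlation, which is why each bit is spread over many coefficients. In practice the strength would be set per coefficient from a masking model rather than by a single `alpha`, trading robustness against audibility.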

For more information, contact:

Louis Thibault, Manager
Advanced Audio Systems
Communications Research Centre Canada
3701 Carling Ave., P.O. Box 11490, Station H
Ottawa, ON K2H 8S2 CANADA
Tel: +1 613 990-4349, Fax: +1 613 993-9950
Email: louis.thibault@crc.gc.ca