Chair: Marina Bosi, Dolby Labs
A bi-dimensional coding scheme applied to audio bitrate reduction
Authors:
Laurent Mainard, CCETT (France)
Michel Lever, CCETT (France)
Volume 2, Page 1017
Abstract:
In this paper we present a bidimensional audio encoding scheme. Taking advantage of a new complex filterbank and of a regular lattice associated with a new hexagonal projection kernel, this scheme provides each step of the encoder and of the decoder with fast algorithms, which keeps the overall complexity low. Moreover, variable- or fixed-length encodings are available without a look-up table. Results show very good quality at 80 kbit/s for monophonic signals, and a significant improvement over standardized algorithms of similar complexity.
Acrobat PDF file of scanned paper: ic961017.pdf
Acrobat PDF file of original paper: ic961017.pdf
Audio Coding with a Dynamic Wavelet Packet Decomposition Based on Frequency-Varying Modulated Lapped Transforms
Authors:
Marcus Purat, Technical University of Berlin (Germany)
Peter Noll, Technical University of Berlin (Germany)
Volume 2, Page 1021
Abstract:
Optimum time-frequency decompositions are very useful in audio coding applications, because the signal energy can be maximally concentrated even for the wide variety of audio signal characteristics. Moreover, this signal representation is particularly well suited for a perceptual weighting of the quantization noise. The well-known tree structure of cascaded 2-channel filterbanks allows a very flexible optimization, leading to a signal-adaptive, dynamic wavelet packet decomposition. A major drawback of this technique is its strong spectral side lobes, which produce clearly audible aliasing in perceptual coders. In this paper we present a new dynamic wavelet packet decomposition, based on modulated lapped transforms, which allows the same flexibility while avoiding the disadvantage mentioned above. We propose a scheme for low bit rate audio coding that efficiently exploits the high energy concentration. This new codec yields excellent audio quality at about 55 kb/s for monophonic signals.
Acrobat PDF file of scanned paper: ic961021.pdf
Acrobat PDF file of original paper: ic961021.pdf
Sound files associated with this paper.
- 0479_a.wav Piano signal prior to encoding-decoding
- 0479_c.wav Male speech signal prior to encoding-decoding
- 0479_e.wav Triangle signal prior to encoding-decoding
- 0479_b.wav Piano signal following encoding-decoding (54 kb/s)
- 0479_d.wav Male speech signal following encoding-decoding (64 kb/s)
- 0479_f.wav Triangle signal following encoding-decoding (64 kb/s)
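The tree of cascaded 2-channel filterbanks mentioned in the abstract above can be illustrated with a minimal sketch. It assumes Haar filters and a full, fixed-depth tree; both are illustrative choices and not the authors' modulated-lapped-transform scheme. The names `haar_split` and `wavelet_packet` are hypothetical.

```python
import math

def haar_split(x):
    """One 2-channel stage: orthonormal Haar low-pass and high-pass, downsampled by 2."""
    s = math.sqrt(0.5)
    low = [s * (x[i] + x[i + 1]) for i in range(0, len(x) - 1, 2)]
    high = [s * (x[i] - x[i + 1]) for i in range(0, len(x) - 1, 2)]
    return low, high

def wavelet_packet(x, depth):
    """Split every band at every level (full tree).

    A dynamic decomposition would instead decide per node whether
    to split; that decision rule is not modeled here.
    Bands are returned in natural tree order, not sorted by frequency.
    """
    bands = [list(x)]
    for _ in range(depth):
        next_bands = []
        for band in bands:
            low, high = haar_split(band)
            next_bands.extend([low, high])
        bands = next_bands
    return bands

def energy(x):
    return sum(v * v for v in x)
```

Because the Haar stage is orthonormal, the total energy across all bands equals the energy of the input whenever the input length is divisible by 2 to the power of `depth`, which makes it easy to see how much of the energy each band captures.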
A Test of MPEG Using Time-inverted Spoken Audio
Authors:
Thomas McLaughlin, Library of Congress (U.S.A.)
John Cookson, Library of Congress (U.S.A.)
Lloyd Rasmussen, Library of Congress (U.S.A.)
Volume 2, Page 1025
Abstract:
We excerpted a 20-second sample from a DAT-mastered talking book segment and coded it at 32 and 48 kbit/sec using MPEG I, layer 3. We also coded the same segment at 80 kbit/sec using MPEG I, layer 2. We then coded a time-inverted version of the material in the same way. After decoding, we put the inverted segments back into normal sequence and compared them with the corresponding segments coded in normal temporal order. We did the comparison by means of an ABX test with volunteer listeners. Naive listeners were unable to reliably distinguish between material coded in normal temporal order and the same material coded in inverted order. Trained listeners could reliably make the distinction in layer 3 at 32 and 48 kbit/sec but not in layer 2 at 80 kbit/sec.
Acrobat PDF file of scanned paper: ic961025.pdf
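The procedure in the abstract above (reverse, code, decode, reverse back, then compare by ABX) can be sketched as follows. This is a minimal sketch under stated assumptions: `codec` is a hypothetical stand-in for an encode-decode round trip, and the one-sided binomial test is a common way to score ABX results, not necessarily the analysis the authors used.

```python
from math import comb

def code_time_inverted(samples, codec):
    """Reverse the signal, run it through the codec, and restore normal order.

    `codec` is a hypothetical callable: list of samples -> decoded list.
    """
    reversed_input = list(reversed(samples))
    decoded = codec(reversed_input)
    return list(reversed(decoded))

def abx_p_value(correct, trials):
    """Probability of at least `correct` right answers in `trials`
    ABX trials if the listener is purely guessing (p = 0.5)."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials
```

For example, `abx_p_value(12, 16)` gives the chance that a guessing listener scores 12 or more out of 16; a small value suggests the listener can really hear a difference.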
Extension and Complexity Reduction of TwinVQ Audio Coder
Authors:
Takehiro Moriya, NTT Human Interface Laboratories (Japan)
Naoki Iwakami, NTT Human Interface Laboratories (Japan)
Kazunaga Ikeda, NTT Human Interface Laboratories (Japan)
Satoshi Miki, NTT Human Interface Laboratories (Japan)
Volume 2, Page 1029
Abstract:
This paper proposes two novel techniques for the TwinVQ (Transform domain Weighted Interleave VQ) high-quality audio coding scheme at rates below 64 kbit/s. One is an extension of the weighted interleave technique to the time and input channel domains as well as the frequency domain. The other is an efficient representation scheme of the spectral envelope by means of an interpolated square root LPC (Linear Predictive Coding) spectrum.
Acrobat PDF file of scanned paper: ic961029.pdf
Acrobat PDF file of original paper: ic961029.pdf
Minimising the Effects of Subband Quantisation of the Time Domain Aliasing Cancellation Filter Bank
Authors:
Conrad Jakob, Royal Melbourne Institute of Technology (Australia)
Alan Bradley, Royal Melbourne Institute of Technology (Australia)
Volume 2, Page 1033
Abstract:
The effect of the quantisation of filter bank subbands has been analysed by incorporating quantisation noise models into the Time Domain Aliasing Cancellation (TDAC) filter bank. We have found expressions for the reconstruction error of the quantised TDAC system in terms of several signal correlated components, and an uncorrelated component. These expressions allow easy identification of subjectively annoying errors, and provide the framework for a subjective optimisation of the quantisation process. Research has been carried out on alternative quantiser models and methods of quantiser-compensation.
Acrobat PDF file of scanned paper: ic961033.pdf
Speech Analysis and Coding Using a Multi-Resolution Sinusoidal Transform
Authors:
David V. Anderson, Georgia Institute of Technology (U.S.A.)
Volume 2, Page 1037
Abstract:
The sinusoidal transform, as developed by Quatieri and McAulay, provides a sparse representation for speech signals by taking advantage of psychoacoustic masking. The currently reported work takes the sinusoidal transform one step further by considering the frequency resolution abilities of the human auditory system in more detail. The new transform is based on the wavelet principle of variable resolution in time/frequency analysis. Specifically, a sinusoidal transform is developed which uses quadrature mirror filter (QMF) banks to obtain better time resolution at high frequencies and better frequency resolution at low frequencies. This naturally provides a perceptually improved allocation of the sinusoids. The new transform matches the human auditory system better than its predecessor and it also matches speech signals well, both fricative sounds and voiced speech. The QMF based ST is then shown to be equivalent to a more efficient FFT based implementation.
Acrobat PDF file of scanned paper: ic961037.pdf
Acrobat PDF file of original paper: ic961037.pdf
Sound files associated with this paper.
- 0809_a.wav Unprocessed speech
- 0809_b.wav Processed speech with 60 msec window, 4 bands, limit of 8 peaks per band
- 0809_c.wav Processed speech with 40 msec window, 4 bands, limit of 12 peaks per band
Audio coding using the wavelet packet transform and a combined scalar-vector quantization
Authors:
Simon Boland, Queensland University of Technology (Australia)
Mohamed Deriche, Queensland University of Technology (Australia)
Volume 2, Page 1041
Abstract:
This paper investigates a hybrid scalar-vector quantization scheme for coding high quality audio signals. A Wavelet Packet Transform (WPT) is used to decompose the audio signal into frequency bands slightly finer than the critical band divisions. A masking model computation is then used as input to the hybrid quantization scheme, where scalar quantization is used for coding the subbands from 0-5.5 kHz, and vector quantization is used for coding the subbands from 5.5-22 kHz. The performance of the proposed coder is assessed from Segmental Signal-to-Noise Ratios (SNR) and the perceived quality for a number of signals. The perceived quality is determined from informal comparisons between the uncoded signals at the original bitrate of 705 kb/s, and the same signals coded with (1) the proposed coder at 80 kb/s, (2) a coder using only scalar quantization at both 128 kb/s and 96 kb/s, and (3) the MPEG layer III coder at 64 kb/s. The comparisons indicate that very good coder quality is possible with the proposed coder at bitrates of approximately 80 kb/s. This represents a saving of about 16 kb/s over full scalar quantization with a similar quality. Further bitrate reduction with the proposed coder is possible by entropy coding of the scalar quantized transform coefficients and the VQ indices.
Acrobat PDF file of scanned paper: ic961041.pdf
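The bitrates quoted in the abstract above can be cross-checked with a little arithmetic. This sketch assumes the uncoded signal is 16-bit mono PCM sampled at 44.1 kHz; the abstract states neither figure, but the assumption is consistent with the 705 kb/s rate and the 22 kHz upper band edge.

```python
SAMPLE_RATE_HZ = 44_100   # assumption: CD sampling rate
BITS_PER_SAMPLE = 16      # assumption: 16-bit PCM

def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels=1):
    """Raw PCM bitrate in kb/s (1 kb = 1000 bits)."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0

uncoded_kbps = pcm_bitrate_kbps(SAMPLE_RATE_HZ, BITS_PER_SAMPLE)  # 705.6
compression_ratio = uncoded_kbps / 80.0        # proposed coder at 80 kb/s
saving_kbps = 96 - 80                          # versus scalar-only coding at 96 kb/s
coded_bits_per_sample = 80_000 / SAMPLE_RATE_HZ
```

Under these assumptions the proposed coder runs at a little under 2 bits per sample, a compression ratio of roughly 9:1 relative to the uncoded signal.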
Low Bit Rate High Quality Audio Coding with Combined Harmonic and Wavelet Representations
Authors:
Khaled N. Hamdy, University of Minnesota (U.S.A.)
Murtaza Ali, University of Minnesota (U.S.A.)
Ahmed H. Tewfik, University of Minnesota (U.S.A.)
Volume 2, Page 1045
Abstract:
In this paper, we describe a novel high quality audio coding method using adaptive signal representation, based on sinusoidal and wavelet analysis of signals. First, we perform a harmonic analysis of the signal to remove strong periodic structures or tones from the signal. Then we carry out a wavelet analysis that is useful in tracking the transients of the signal. These transients are then removed from the wavelet coefficients. The remaining coefficients have a broadband noise-like structure. Since this method separates out tones (sinusoids), transients, and broadband noise, we may use tonal, noise, and temporal masking information to individually encode the tones and the wavelet coefficients. Our experiments suggest that this method yields a nominal bit rate of 1 bit/sample for high quality audio compression.
Acrobat PDF file of scanned paper: ic961045.pdf
Acrobat PDF file of original paper: ic961045.pdf
A High Performance Software Implementation Of MPEG Audio Encoder
Authors:
Manoj Kumar, IBM T.J. Watson Research Center (U.S.A.)
Mohammad Zubair, IBM T.J. Watson Research Center (U.S.A.)
Volume 2, Page 1049
Abstract:
MPEG/Audio is a standard for both transmitting and recording compressed audio. The MPEG algorithm achieves compression by exploiting the perceptual limitations of the human ear. The standard defines the decoding process and the syntax of the coded bitstream. However, it leaves room for different implementations to generate the compressed bitstream. In this paper we propose a high performance software implementation of the MPEG/Audio encoder. We obtained more than a factor of five improvement over a straightforward implementation on the IBM PowerPC, Model 250.
Acrobat PDF file of scanned paper: ic961049.pdf
Acrobat PDF file of original paper: ic961049.pdf
Audio Compression At Low Bit Rates Using A Signal Adaptive Switched Filterbank
Authors:
Deepen Sinha, AT&T Bell Laboratories (U.S.A.)
James D. Johnston, AT&T Bell Laboratories (U.S.A.)
Volume 2, Page 1053
Abstract:
A perceptual audio coder typically consists of a filterbank which breaks the signal into its frequency components. These components are then quantized using a perceptual masking model. Previous efforts have indicated that a high resolution filterbank, e.g., the modified discrete cosine transform (MDCT) with 1024 subbands, is able to minimize the bit rate requirements for most of the music samples. The high resolution MDCT, however, is not suitable for the encoding of non-stationary segments of music. A long/short resolution or "window" switching scheme has been employed to overcome this problem, but it has certain inherent disadvantages which become prominent at lower bit rates (< 64 kbps for stereo). We propose a novel switched filterbank scheme which switches between an MDCT and a wavelet filterbank based on signal characteristics. A tree-structured wavelet filterbank with properly designed filters offers natural advantages for the representation of non-stationary segments such as attacks. Furthermore, it allows for the optimum exploitation of perceptual irrelevancies.
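For readers unfamiliar with the MDCT referred to in the abstract above, here is a minimal sketch of the transform with 50% overlap-add. It assumes a sine window and a direct O(N^2) evaluation for clarity; a real coder would use a fast algorithm and a much larger block size such as the 1024 subbands mentioned. The function names are hypothetical, and nothing here models the authors' switching logic or the wavelet filterbank.

```python
import math

def sine_window(N):
    """Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

def mdct(frame):
    """2N windowed samples -> N coefficients."""
    N = len(frame) // 2
    n0 = 0.5 + N / 2
    return [sum(frame[n] * math.cos(math.pi / N * (n + n0) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]

def imdct(coeffs):
    """N coefficients -> 2N time-aliased samples (2/N scaling for windowed use)."""
    N = len(coeffs)
    n0 = 0.5 + N / 2
    return [(2.0 / N) * sum(coeffs[k] * math.cos(math.pi / N * (n + n0) * (k + 0.5))
                            for k in range(N))
            for n in range(2 * N)]

def analysis_synthesis(signal, N):
    """Window, transform, inverse-transform, window again, and overlap-add with hop N."""
    w = sine_window(N)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * N + 1, N):
        frame = [signal[start + n] * w[n] for n in range(2 * N)]
        rec = imdct(mdct(frame))
        for n in range(2 * N):
            out[start + n] += rec[n] * w[n]
    return out
```

Without quantization, every sample covered by two overlapping frames is reconstructed exactly, because the time-domain aliasing of adjacent frames cancels. The first and last N samples are covered by only one frame and are therefore not reconstructed in this sketch.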