Audio Coding
  http://www.ece.umassd.edu/Faculty/acosta/ICASSP/ICASSP_1996/html/ic96s212.htm

Chair: Marina Bosi, Dolby Labs


A bi-dimensional coding scheme applied to audio bitrate reduction

Authors:

Laurent Mainard, CCETT (France)
Michel Lever, CCETT (France)

Volume 2, Page 1017

Abstract:

In this paper we present a bidimensional audio encoding scheme. Taking advantage of a new complex filterbank and of a regular lattice associated with a new hexagonal projection kernel, this scheme provides each step of the encoder and of the decoder with fast algorithms, which keeps the overall complexity low. Moreover, variable- or fixed-length encodings are available without a look-up table. Results show very good quality at 80 kbit/s for monophonic signals, and a significant improvement over standardized algorithms of similar complexity.

Acrobat PDF file of scanned paper: ic961017.pdf

Acrobat PDF file of original paper: ic961017.pdf


Audio Coding with a Dynamic Wavelet Packet Decomposition Based on Frequency-Varying Modulated Lapped Transforms

Authors:

Marcus Purat, Technical University of Berlin (Germany)
Peter Noll, Technical University of Berlin (Germany)

Volume 2, Page 1021

Abstract:

Optimum time-frequency decompositions are very useful in audio coding applications, because the signal energy can be maximally concentrated even for the wide variety of audio signal characteristics. Moreover, this signal representation is particularly well suited for a perceptual weighting of the quantization noise. The well-known tree structure of cascaded 2-channel filterbanks allows a very flexible optimization, leading to a signal-adaptive, dynamic wavelet packet decomposition. A major drawback of this technique is the strong spectral side lobes, which produce clearly audible aliasing in perceptual coders. In this paper we present a new dynamic wavelet packet decomposition, based on modulated lapped transforms, which allows the same flexibility while avoiding the disadvantage mentioned above. We propose a scheme for low bit rate audio coding that efficiently exploits the high energy concentration. This new codec yields excellent audio quality at about 55 kb/s for monophonic signals.
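The signal-adaptive tree optimization described in the abstract can be illustrated with a minimal best-basis sketch. It is not the authors' method: it swaps in a Haar 2-channel filterbank in place of modulated lapped transforms, uses the l1 norm as a hypothetical energy-concentration cost, and assumes the input length is a power of two. All function names are illustrative.

```python
import math

def haar_split(x):
    """One 2-channel Haar stage: return (low band, high band), each half as long."""
    s = math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    high = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return low, high

def l1_cost(band):
    """Assumed cost: for fixed energy, a smaller l1 norm means fewer large coefficients."""
    return sum(abs(c) for c in band)

def best_basis(x, depth):
    """Return (leaf bands, total cost) of the cheapest wavelet packet tree.

    A node is split only if the best decomposition of its two children
    costs strictly less than keeping the node as it is.
    """
    own = l1_cost(x)
    if depth == 0 or len(x) < 2:
        return [list(x)], own
    low, high = haar_split(x)
    low_bands, low_cost = best_basis(low, depth - 1)
    high_bands, high_cost = best_basis(high, depth - 1)
    if low_cost + high_cost < own:
        return low_bands + high_bands, low_cost + high_cost
    return [list(x)], own

# A constant (tone-like) block is split down to a single non-zero coefficient,
# while an isolated click is cheapest left in the time domain.
tone_bands, tone_cost = best_basis([1.0] * 8, 3)
click_bands, click_cost = best_basis([1.0] + [0.0] * 7, 3)
print(len(tone_bands), round(tone_cost, 3))
print(len(click_bands), round(click_cost, 3))
```

The two calls show the adaptivity: the tree for the stationary block is fully split along its low band, and the tree for the click is not split at all.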

Acrobat PDF file of scanned paper: ic961021.pdf

Acrobat PDF file of original paper: ic961021.pdf

Sound files associated with this paper.

  • 0479_a.wav: Piano signal prior to encoding-decoding
  • 0479_c.wav: Male speech signal prior to encoding-decoding
  • 0479_e.wav: Triangle signal prior to encoding-decoding
  • 0479_b.wav: Piano signal following encoding-decoding (54 kb/s)
  • 0479_d.wav: Male speech signal following encoding-decoding (64 kb/s)
  • 0479_f.wav: Triangle signal following encoding-decoding (64 kb/s)


A Test of MPEG Using Time-inverted Spoken Audio

Authors:

Thomas McLaughlin, Library of Congress (U.S.A.)
John Cookson, Library of Congress (U.S.A.)
Lloyd Rasmussen, Library of Congress (U.S.A.)

Volume 2, Page 1025

Abstract:

We excerpted a 20-second sample from a DAT-mastered talking book segment and coded it at 32 and 48 kbit/sec using MPEG I, layer 3. We also coded the same segment at 80 kbit/sec using MPEG I, layer 2. We then coded a time-inverted version of the material in the same way. After decoding, we put the inverted segments back into normal sequence and compared them with the corresponding segments coded in normal temporal order. We did the comparison by means of an ABX test with volunteer listeners. Naive listeners were unable to reliably distinguish between material coded in normal temporal order and the same material coded in inverted order. Trained listeners could reliably make the distinction in layer 3 at 32 and 48 kbit/sec but not in layer 2 at 80 kbit/sec.
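What "reliably distinguish" means in an ABX test is usually judged against guessing. The sketch below is an illustration only: the abstract gives no trial counts or significance criterion, so the one-sided binomial test, the function name, and the example numbers (16 trials) are all assumptions.

```python
from math import comb

def abx_p_value(correct, trials):
    """One-sided binomial p-value for an ABX run.

    Probability of getting at least `correct` answers right out of `trials`
    when the listener is purely guessing (success probability 0.5 per trial).
    """
    tail = sum(comb(trials, k) for k in range(correct, trials + 1))
    return tail / 2 ** trials

# Hypothetical scores, not data from the paper.
print(f"12 of 16 correct: p = {abx_p_value(12, 16):.3f}")
print(f" 8 of 16 correct: p = {abx_p_value(8, 16):.3f}")
```

With these made-up scores, 12 of 16 correct gives p of about 0.038 (unlikely to be guessing at the common 0.05 level), while 8 of 16 gives p of about 0.598 (indistinguishable from guessing).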

Acrobat PDF file of scanned paper: ic961025.pdf


Extension and Complexity Reduction of TwinVQ Audio Coder

Authors:

Takehiro Moriya, NTT Human Interface Laboratories (Japan)
Naoki Iwakami, NTT Human Interface Laboratories (Japan)
Kazunaga Ikeda, NTT Human Interface Laboratories (Japan)
Satoshi Miki, NTT Human Interface Laboratories (Japan)

Volume 2, Page 1029

Abstract:

This paper proposes two novel techniques for the TwinVQ (Transform domain Weighted Interleave VQ) high-quality audio coding scheme at rates lower than 64 kbit/s. One is an extension of the weighted interleave technique to the time and input channel domains as well as the frequency domain. The other is an efficient representation scheme for the spectral envelope by means of an interpolated square root LPC (Linear Predictive Coding) spectrum.

Acrobat PDF file of scanned paper: ic961029.pdf

Acrobat PDF file of original paper: ic961029.pdf


Minimising the Effects of Subband Quantisation of the Time Domain Aliasing Cancellation Filter Bank

Authors:

Conrad Jakob, Royal Melbourne Institute of Technology (Australia)
Alan Bradley, Royal Melbourne Institute of Technology (Australia)

Volume 2, Page 1033

Abstract:

The effect of the quantisation of filter bank subbands has been analysed by incorporating quantisation noise models into the Time Domain Aliasing Cancellation (TDAC) filter bank. We have found expressions for the reconstruction error of the quantised TDAC system in terms of several signal correlated components, and an uncorrelated component. These expressions allow easy identification of subjectively annoying errors, and provide the framework for a subjective optimisation of the quantisation process. Research has been carried out on alternative quantiser models and methods of quantiser-compensation.
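As background for the abstract above, the following minimal sketch shows the TDAC property itself and what happens when the subband samples are quantised. It is not the authors' noise model: it assumes a sine window, an even half-frame length, a signal length that is a multiple of the half-frame length, and a plain uniform quantiser with a hypothetical step size. All names are illustrative.

```python
import math

def mdct_frame(frame, window):
    """MDCT of one 2N-sample frame -> N coefficients (analysis window applied)."""
    n_half = len(frame) // 2
    return [
        sum(window[n] * frame[n]
            * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for n in range(2 * n_half))
        for k in range(n_half)
    ]

def imdct_frame(coeffs, window):
    """Inverse MDCT -> 2N time-aliased samples (synthesis window applied)."""
    n_half = len(coeffs)
    return [
        (2.0 / n_half) * window[n]
        * sum(coeffs[k]
              * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
              for k in range(n_half))
        for n in range(2 * n_half)
    ]

def tdac_roundtrip(signal, n_half, step=None):
    """Analyse, optionally quantise with a uniform step, and overlap-add."""
    # Sine window: w[n]^2 + w[n + N]^2 = 1, so the aliasing of adjacent frames cancels.
    window = [math.sin(math.pi * (n + 0.5) / (2 * n_half)) for n in range(2 * n_half)]
    padded = [0.0] * n_half + list(signal) + [0.0] * n_half
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n_half + 1, n_half):
        coeffs = mdct_frame(padded[start:start + 2 * n_half], window)
        if step is not None:
            coeffs = [step * round(c / step) for c in coeffs]  # uniform quantiser
        for n, y in enumerate(imdct_frame(coeffs, window)):
            out[start + n] += y
    return out[n_half:n_half + len(signal)]

def max_error(a, b):
    return max(abs(p - q) for p, q in zip(a, b))

signal = [math.sin(0.3 * n) + 0.5 * math.sin(1.1 * n) for n in range(32)]
print("no quantisation:", max_error(signal, tdac_roundtrip(signal, 8)))
print("step 0.1       :", max_error(signal, tdac_roundtrip(signal, 8, step=0.1)))
```

Without quantisation the aliasing terms of neighbouring frames cancel and the error is at floating-point level. Once the coefficients are quantised the cancellation is no longer exact, which is the situation the paper analyses.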

Acrobat PDF file of scanned paper: ic961033.pdf


Speech Analysis and Coding Using a Multi-Resolution Sinusoidal Transform

Authors:

David V. Anderson, Georgia Institute of Technology (U.S.A.)

Volume 2, Page 1037

Abstract:

The sinusoidal transform, as developed by Quatieri and McAulay, provides a sparse representation for speech signals by taking advantage of psychoacoustic masking. The currently reported work takes the sinusoidal transform one step further by considering the frequency resolution abilities of the human auditory system in more detail. The new transform is based on the wavelet principle of variable resolution in time/frequency analysis. Specifically, a sinusoidal transform is developed which uses quadrature mirror filter (QMF) banks to obtain better time resolution at high frequencies and better frequency resolution at low frequencies. This naturally provides a perceptually improved allocation of the sinusoids. The new transform matches the human auditory system better than its predecessor and it also matches speech signals well, both fricative sounds and voiced speech. The QMF based ST is then shown to be equivalent to a more efficient FFT based implementation.

Acrobat PDF file of scanned paper: ic961037.pdf

Acrobat PDF file of original paper: ic961037.pdf

Sound files associated with this paper.

  • 0809_a.wav: Unprocessed speech
  • 0809_b.wav: Processed speech with 60 msec window, 4 bands, limit of 8 peaks per band
  • 0809_c.wav: Processed speech with 40 msec window, 4 bands, limit of 12 peaks per band


Audio coding using the wavelet packet transform and a combined scalar-vector quantization

Authors:

Simon Boland, Queensland University of Technology (Australia)
Mohamed Deriche, Queensland University of Technology (Australia)

Volume 2, Page 1041

Abstract:

This paper investigates a hybrid scalar-vector quantization scheme for coding high quality audio signals. A Wavelet Packet Transform (WPT) is used to decompose the audio signal into frequency bands slightly finer than the critical band divisions. A masking model computation is then used as input to the hybrid quantization scheme, where scalar quantization is used for coding the subbands from 0-5.5 kHz, and vector quantization is used for coding the subbands from 5.5-22 kHz. The performance of the proposed coder is assessed from Segmental Signal-to-Noise Ratios (SNR) and the perceived quality for a number of signals. The perceived quality is determined from informal comparisons between the uncoded signals at the original bitrate of 705 kb/s, and the same signals coded with (1) the proposed coder at 80 kb/s, (2) a coder using only scalar quantization at both 128 kb/s and 96 kb/s, and (3) the MPEG layer III coder at 64 kb/s. The comparisons indicate that very good coder quality is possible with the proposed coder at bitrates of approximately 80 kb/s. This represents a saving of about 16 kb/s over full scalar quantization with a similar quality. Further bitrate reduction with the proposed coder is possible by entropy coding of the scalar quantized transform coefficients and the VQ indices.
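The bitrates quoted in the abstract can be put side by side with a little arithmetic. The sketch assumes the uncoded signal is 16-bit mono PCM sampled at 44.1 kHz; the abstract does not state this, but it matches the quoted 705 kb/s and the 22 kHz upper band edge. Treating 96 kb/s as the similar-quality scalar-only operating point is likewise an inference from the stated 16 kb/s saving.

```python
SAMPLE_RATE_HZ = 44_100   # assumption: CD sampling rate
BITS_PER_SAMPLE = 16      # assumption: 16-bit PCM, one channel

def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels=1):
    """Bitrate of uncompressed PCM in kb/s (1 kb = 1000 bits)."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0

original_kbps = pcm_bitrate_kbps(SAMPLE_RATE_HZ, BITS_PER_SAMPLE)
print(f"uncoded PCM: {original_kbps:.1f} kb/s")

# Coded rates as listed in the abstract.
coders = [
    ("proposed hybrid coder", 80),
    ("scalar-only coder", 96),
    ("scalar-only coder", 128),
    ("MPEG layer III", 64),
]
for label, rate_kbps in coders:
    print(f"{label:>22}: {rate_kbps:>3} kb/s, about {original_kbps / rate_kbps:.1f}:1")

print("saving vs. 96 kb/s scalar-only:", 96 - 80, "kb/s")
```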

Acrobat PDF file of scanned paper: ic961041.pdf


Low Bit Rate High Quality Audio Coding with Combined Harmonic and Wavelet Representations

Authors:

Khaled N. Hamdy, University of Minnesota (U.S.A.)
Murtaza Ali, University of Minnesota (U.S.A.)
Ahmed H. Tewfik, University of Minnesota (U.S.A.)

Volume 2, Page 1045

Abstract:

In this paper, we describe a novel high quality audio coding method using adaptive signal representation, based on sinusoidal and wavelet analysis of signals. First, we perform a harmonic analysis of the signal to remove strong periodic structures or tones from the signal. Then we carry out a wavelet analysis that is useful in tracking the transients of the signal. These transients are then removed from the wavelet coefficients. The remaining coefficients have a broadband, noise-like structure. Since this method separates out tones (sinusoids), transients, and broadband noise, we may use tonal, noise, and temporal masking information to individually encode the tones and the wavelet coefficients. Our experiments suggest that this method yields a nominal bit rate of 1 bit/sample for high quality audio compression.

Acrobat PDF file of scanned paper: ic961045.pdf

Acrobat PDF file of original paper: ic961045.pdf


A High Performance Software Implementation Of MPEG Audio Encoder

Authors:

Manoj Kumar, IBM T.J. Watson Research Center (U.S.A.)
Mohammad Zubair, IBM T.J. Watson Research Center (U.S.A.)

Volume 2, Page 1049

Abstract:

MPEG/Audio is a standard for both transmitting and recording compressed audio. The MPEG algorithm achieves compression by exploiting the perceptual limitations of the human ear. The standard defines the decoding process and the syntax of the coded bitstream. However, it leaves room for different implementations of the encoder that generates the compressed bitstream. In this paper we propose a high performance software implementation of the MPEG/Audio encoder. We obtained more than a factor of five improvement over a straightforward implementation on the IBM PowerPC, Model 250.

Acrobat PDF file of scanned paper: ic961049.pdf

Acrobat PDF file of original paper: ic961049.pdf


Audio Compression At Low Bit Rates Using A Signal Adaptive Switched Filterbank

Authors:

Deepen Sinha, AT&T Bell Laboratories (U.S.A.)
James D. Johnston, AT&T Bell Laboratories (U.S.A.)

Volume 2, Page 1053

Abstract:

A perceptual audio coder typically consists of a filterbank which breaks the signal into its frequency components. These components are then quantized using a perceptual masking model. Previous efforts have indicated that a high resolution filterbank, e.g., the modified discrete cosine transform (MDCT) with 1024 subbands, is able to minimize the bit rate requirements for most music samples. The high resolution MDCT, however, is not suitable for the encoding of non-stationary segments of music. A long/short resolution or "window" switching scheme has been employed to overcome this problem, but it has certain inherent disadvantages which become prominent at lower bit rates (< 64 kbps for stereo). We propose a novel switched filterbank scheme which switches between an MDCT and a wavelet filterbank based on signal characteristics. A tree-structured wavelet filterbank with properly designed filters offers natural advantages for the representation of non-stationary segments such as attacks. Furthermore, it allows for the optimum exploitation of perceptual irrelevancies.

Acrobat PDF file of scanned paper: ic961053.pdf

Acrobat PDF file of original paper: ic961053.pdf
