Journal of the AES

2023 March - Volume 71 Number 3


Giant FFTs for Sample-Rate Conversion

Open Access



The audio industry uses several sample rates interchangeably, and high-quality sample-rate conversion is crucial. This paper describes a frequency-domain sample-rate conversion method that employs a single large ("giant") fast Fourier transform (FFT). Large FFTs, corresponding to the duration of a track or full-length album, are now extremely fast, with execution times on the order of a few seconds on standard commercially available hardware. The method first transforms the signal into the frequency domain, possibly using zero-padding. The key part of the technique modifies the length of the spectral buffer to change the ratio of the audio content to the Nyquist limit. For up-sampling, an appropriate number of zeros is inserted between the positive and negative frequencies. For down-sampling, the spectrum is truncated. Finally, the inverse FFT synthesizes a time-domain signal at the new sample rate. The proposed method does not result in surviving folded spectral images, which occur in some instances with time-domain methods. However, it causes ringing at the Nyquist limit, which can be suppressed by tapering the spectrum and by low-pass filtering. The proposed sample-rate conversion method is aimed at offline audio applications in which sound files need to be converted between sample rates at high quality.
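The spectrum-resizing step described in this abstract can be sketched as follows, assuming NumPy and a real-valued mono signal. The function name `fft_resample` and its parameters are illustrative and not taken from the paper. Zero-padding before the transform, spectral tapering, low-pass filtering, and special handling of the Nyquist bin are all omitted, so this is not the authors' implementation.

```python
import numpy as np

def fft_resample(x, sr_in, sr_out):
    """Resample a real 1-D signal by resizing its whole-signal spectrum.

    Hypothetical helper: a bare-bones illustration only, without the
    tapering or filtering the abstract mentions for suppressing ringing.
    """
    x = np.asarray(x, dtype=float)
    n_in = len(x)
    n_out = int(round(n_in * sr_out / sr_in))

    # One FFT over the entire signal. rfft keeps only the non-negative
    # frequencies, which is sufficient for real-valued input.
    spectrum = np.fft.rfft(x)

    # Resize the half-spectrum. For a real signal, appending zeros here is
    # equivalent to inserting zeros between the positive and negative
    # frequencies (up-sampling); dropping bins truncates it (down-sampling).
    n_bins_out = n_out // 2 + 1
    resized = np.zeros(n_bins_out, dtype=complex)
    n_keep = min(len(spectrum), n_bins_out)
    resized[:n_keep] = spectrum[:n_keep]

    # irfft divides by n_out, so rescale to keep the original amplitude.
    return np.fft.irfft(resized, n=n_out) * (n_out / n_in)


# Usage: convert one second of a 1 kHz sine from 44.1 kHz to 48 kHz.
sr_in, sr_out = 44100, 48000
t_in = np.arange(sr_in) / sr_in
y = fft_resample(np.sin(2 * np.pi * 1000 * t_in), sr_in, sr_out)
```

Because the transform treats the signal as one period of a periodic sequence, a real file that does not start and end near zero would normally be zero-padded first, which is the "possibly using zero-padding" step in the abstract.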


Robust Audio Watermarking Based on Empirical Mode Decomposition and Group Differential Relations


An audio watermarking technique using Complementary Ensemble Empirical Mode Decomposition and group differential relations of the average absolute amplitudes of the last Intrinsic Mode Function (IMF) is proposed. By using group differential relations, the relationship with neighboring samples in the last IMF is well preserved, and near-imperceptibility can be achieved. Placing the watermark in the last IMF, a perceptually significant low-frequency component, makes the watermark difficult to remove. The embedded watermark, which is a logo image in our experiment, is processed by Arnold transformation, secret key encryption, and Bose-Chaudhuri-Hocquenghem coding to enhance robustness and security. The measured signal-to-noise ratios meet the imperceptibility recommendations of the International Federation of the Phonographic Industry. The average Objective Difference Grade (an objective measure that correlates very well with subjective assessment) and a subjective quality assessment were used to evaluate the imperceptibility. Furthermore, our method is robust under 13 different categories of attacks, including noise corruption, amplitude scaling, echo addition, resampling, re-quantization, low-pass filtering, MPEG-1 Audio Layer III compression, Digital-to-Analog/Analog-to-Digital conversion, cropping, time shift, zero thresholding, jittering, and combined attacks.
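The Arnold transformation mentioned in this abstract scrambles the pixel positions of a square image before embedding. The sketch below uses the common cat-map form (x, y) -> (x + y, x + 2y) mod N; the abstract does not say which variant or how many iterations the authors use, so the function names, the map, and the iteration count are assumptions for illustration only.

```python
def arnold_scramble(image, iterations=1):
    """Scramble a square image (list of rows) with the Arnold cat map.

    Assumed form of the map: (x, y) -> ((x + y) mod N, (x + 2y) mod N).
    """
    n = len(image)
    if any(len(row) != n for row in image):
        raise ValueError("the Arnold transform needs a square image")
    current = [list(row) for row in image]
    for _ in range(iterations):
        scrambled = [[None] * n for _ in range(n)]
        for y in range(n):
            for x in range(n):
                # Move the pixel at (x, y) to its mapped position.
                scrambled[(x + 2 * y) % n][(x + y) % n] = current[y][x]
        current = scrambled
    return current


def arnold_unscramble(image, iterations=1):
    """Invert arnold_scramble by reading each pixel back from its mapped position."""
    n = len(image)
    current = [list(row) for row in image]
    for _ in range(iterations):
        restored = [[None] * n for _ in range(n)]
        for y in range(n):
            for x in range(n):
                restored[y][x] = current[(x + 2 * y) % n][(x + y) % n]
        current = restored
    return current


# Usage: a toy 5x5 "logo" with distinct pixel values.
logo = [[5 * row + col for col in range(5)] for row in range(5)]
scrambled = arnold_scramble(logo, iterations=3)
recovered = arnold_unscramble(scrambled, iterations=3)
```

The map is a permutation of pixel positions, so no pixel values are lost, and the iteration count can act as part of the key needed to recover the logo.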


Influence of the Relative Height of a Dome-Shaped Diaphragm on the Directivity of a Spherical-Enclosure Loudspeaker


The influence of diaphragm shape on loudspeaker directivity has been evaluated only in the frontal half-space in studies using infinite baffle models, and thus, information on the rear field remains unknown. To extend the result to the entire space, this paper uses a spherical-enclosure loudspeaker model (SELM) based on modifications of existing rigid-sphere loudspeaker models. Using the boundary element method, the radiation of the SELM is simulated over the full audible range, and then directivities for dome-shaped diaphragms with different relative heights (RHs) are compared and analyzed using various metrics. The results show that, in general, the planar diaphragm exhibits narrower directivity than convex or concave domes, whereas the directivities of the latter two change differently with RH. In the case of a concave hemisphere, a resonance occurs around ka0 = 7.14 (a0 is the radius of the sphere), causing a low radiation power and an unusual directivity pattern, which agrees with the findings of Suzuki and Tichy. For the rear radiation, the rear-to-front difference in sound pressure level of the convex hemisphere does not exceed 10 dB over the whole audible range, indicating that its rear radiation should not be neglected even at high frequencies.
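The resonance in this abstract is given as a dimensionless product ka0, where k = 2*pi*f/c is the acoustic wavenumber. A small worked conversion to frequency is sketched below. The sphere radius of 0.1 m and the speed of sound of 343 m/s are assumed values chosen for illustration; the abstract states neither.

```python
import math

def frequency_from_ka(ka, radius_m, speed_of_sound=343.0):
    """Return the frequency in Hz at which k * radius equals `ka`.

    Uses k = 2*pi*f / c, so f = ka * c / (2*pi*radius).
    """
    return ka * speed_of_sound / (2 * math.pi * radius_m)


# Hypothetical sphere radius of 0.1 m (not stated in the abstract).
resonance_hz = frequency_from_ka(7.14, radius_m=0.1)
print(f"ka0 = 7.14 corresponds to about {resonance_hz:.0f} Hz for a0 = 0.1 m")
```

Since the frequency scales inversely with the radius, halving the assumed a0 would double the resonance frequency.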


Engineering Reports

Active Voice Amplifier: On-Device Noisy Environment-Aware Solution for Dialogue Enhancement in Real Time


Since the cathode-ray tube television was first introduced to consumers in the late 1920s, a variety of multimedia device form factors have appeared on the consumer market. Although the value of multimedia devices was once placed mostly on picture and sound quality, user experience is now undoubtedly the most important and attractive value of multimedia products. Today, almost all of the outstanding features of new products are technologies for user convenience, such as voice user interaction, device unlock, and content recommendation. Likewise, an unprecedented smart-TV feature, the Active Voice Amplifier, was introduced at the Consumer Electronics Show 2020; it detects disturbing noise and automatically enhances voice clarity accordingly. To design this feature and make it work in real time on real devices, state-of-the-art signal processing methods and deep learning technologies are integrated into a single function that takes a novel approach to noisy-environment detection and speech extraction from multimedia audio content. This paper gives an overview of what this function aims for in user experience, describes how it was designed in terms of signal processing methods, and demonstrates how effectively it works on real-time TV systems.


Standards and Information Documents

AES Standards Committee News

Page: 138




Page: 140



Table of Contents


Cover & Sustaining Members List


AES Officers, Committees, Offices & Journal Staff


AES - Audio Engineering Society