Authors:Välimäki, Vesa; Bilbao, Stefan
Affiliation:Acoustics Laboratory, Department of Information and Communications Engineering, Aalto University, Espoo, Finland; Acoustics and Audio Group, University of Edinburgh, Edinburgh, United Kingdom
The audio industry uses several sample rates interchangeably, and high-quality sample-rate conversion is crucial. This paper describes a frequency-domain sample-rate conversion method that employs a single large ("giant") fast Fourier transform (FFT). Large FFTs, corresponding to the duration of a track or full-length album, are now extremely fast, with execution times on the order of a few seconds on standard commercially available hardware. The method first transforms the signal into the frequency domain, possibly using zero-padding. The key part of the technique modifies the length of the spectral buffer to change the ratio of the audio content to the Nyquist limit. For up-sampling, an appropriate number of zeros is inserted between the positive and negative frequencies. In down-sampling, the spectrum is truncated. Finally, the inverse FFT synthesizes a time-domain signal at the new sample rate. The proposed method does not result in surviving folded spectral images, which occur in some instances with time-domain methods. However, it causes ringing at the Nyquist limit, which can be suppressed by tapering the spectrum and by low-pass filtering. The proposed sample-rate conversion method is targeted to offline audio applications in which sound files need to be converted between sample rates at high quality.
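The spectral-resizing step described in this abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `fft_resample` is hypothetical, the one-sided real FFT is used so that "inserting zeros between the positive and negative frequencies" becomes appending zeros to the half-spectrum, and the spectral tapering, low-pass filtering, and special handling of the Nyquist bin mentioned in the abstract are omitted.

```python
import numpy as np

def fft_resample(x, fs_in, fs_out):
    """Resample a real 1-D signal by resizing its one-sided spectrum.

    Hypothetical helper for illustration only: no tapering and no
    Nyquist-bin correction, so ringing near the band edge is not suppressed.
    """
    x = np.asarray(x, dtype=float)
    n_in = len(x)
    n_out = int(round(n_in * fs_out / fs_in))   # length at the new rate

    spectrum = np.fft.rfft(x)                   # bins 0 .. n_in // 2
    n_bins_out = n_out // 2 + 1

    # Up-sampling: the extra high-frequency bins stay zero.
    # Down-sampling: bins above the new Nyquist limit are dropped.
    resized = np.zeros(n_bins_out, dtype=complex)
    keep = min(len(spectrum), n_bins_out)
    resized[:keep] = spectrum[:keep]

    # irfft divides by n_out instead of n_in, so rescale to keep amplitude.
    return np.fft.irfft(resized, n=n_out) * (n_out / n_in)

# Usage: convert one second of a 1 kHz tone from 44.1 kHz to 48 kHz.
fs_in, fs_out = 44100, 48000
t_in = np.arange(fs_in) / fs_in
tone_48k = fft_resample(np.sin(2 * np.pi * 1000 * t_in), fs_in, fs_out)
```

Because the whole signal is transformed at once, the signal is implicitly treated as periodic; the zero-padding mentioned in the abstract is one way to limit the resulting wrap-around effects, and it is left out here for brevity.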
Authors:Lai, Wen-Hsing; Chou, Tsung-Yuan; Chou, Meng-Chen; Schuller, Björn W.
Affiliation:Department of Computer and Communication Engineering, National Kaohsiung University of Science and Technology, No. 1, University Rd., Yanchao Dist., Kaohsiung City 82445, Taiwan; ZD.B Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Germany; Ph.D. Program in Engineering Science and Technology, College of Engineering, National Kaohsiung University of Science and Technology, Taiwan; Department of Computer and Communication Engineering, National Kaohsiung University of Science and Technology, No. 1, University Rd., Yanchao Dist., Kaohsiung City 82445, Taiwan; ZD.B Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Germany; GLAM – Group on Language, Audio & Music, Imperial College London, U.K.
An audio watermarking technique using Complementary Ensemble Empirical Mode Decomposition and group differential relations of average absolute amplitudes of the last Intrinsic Mode Function (IMF) is proposed. By using group differential relations, the relationship with neighboring samples in the last IMF is well preserved, and near-imperceptibility can be achieved. Because the last IMF carries low-frequency components that are perceptually significant, placing the watermark there makes it difficult to remove. The embedded watermark, which is a logo image in our experiment, is processed by Arnold transformation, secret key encryption, and Bose-Chaudhuri-Hocquenghem coding to enhance robustness and security. The measured signal-to-noise ratios meet the imperceptibility recommendations of the International Federation of the Phonographic Industry. Imperceptibility was further evaluated using the average Objective Difference Grade (an objective measure that correlates very well with subjective assessment) and a subjective quality assessment. Furthermore, our method is robust under 13 different categories of attacks, including noise corruption, amplitude scaling, echo addition, resampling, re-quantization, low-pass filtering, MPEG-1 Audio Layer III compression, Digital-to-Analog/Analog-to-Digital conversion, cropping, time shift, zero thresholding, jittering, and combined attacks.
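As an illustration of the Arnold transformation step applied to the logo image, the sketch below scrambles a square image with the classic Arnold cat map, moving the pixel at (x, y) to ((x + y) mod N, (x + 2y) mod N). This is an assumption about the variant: the abstract does not specify the exact map, the number of iterations, or how the secret key enters, and the function names `arnold_scramble` and `arnold_unscramble` are hypothetical.

```python
import numpy as np

def arnold_scramble(img, iterations=1):
    """Scramble a square image with the classic Arnold cat map (assumed variant)."""
    n = img.shape[0]
    if img.shape[1] != n:
        raise ValueError("Arnold map needs a square image")
    x, y = np.indices((n, n))          # x[i, j] = i, y[i, j] = j
    out = img
    for _ in range(iterations):
        scrambled = np.empty_like(out)
        # Pixel (x, y) moves to ((x + y) mod n, (x + 2y) mod n).
        scrambled[(x + y) % n, (x + 2 * y) % n] = out
        out = scrambled
    return out

def arnold_unscramble(img, iterations=1):
    """Invert arnold_scramble by reading each pixel back from its mapped position."""
    n = img.shape[0]
    x, y = np.indices((n, n))
    out = img
    for _ in range(iterations):
        out = out[(x + y) % n, (x + 2 * y) % n]
    return out

# Usage: scramble a toy 8x8 "logo" five times, then restore it.
logo = np.arange(64).reshape(8, 8)
hidden = arnold_scramble(logo, iterations=5)
restored = arnold_unscramble(hidden, iterations=5)
```

The map only permutes pixel positions, so the scrambled image holds exactly the same values as the original; in a scheme like the one described, the iteration count could serve as part of the key, though that is not stated in the abstract.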
Authors:Zhang, Zhichao; Yu, Guangzheng; Liang, Linda
Affiliation:Acoustic Laboratory, School of Physics and Optoelectronics, South China University of Technology, Guangzhou, China; Acoustic Laboratory, School of Physics and Optoelectronics, South China University of Technology, Guangzhou, China; College of Civil Engineering and Architecture, Guangxi University, Nanning, China
The influence of diaphragm shape on loudspeaker directivity has been evaluated only in the frontal half-space in studies using infinite baffle models, and thus, information in the rear field remains unknown. To extend the result to the entire space, this paper uses a spherical-enclosure loudspeaker model (SELM) based on modifications of existing rigid-sphere loudspeaker models. Using the boundary element method, the radiation of the SELM is simulated in the full audible range, and then directivities for dome-shaped diaphragms with different relative heights (RHs) are compared and analyzed using various metrics. The results show that in general, the planar diaphragm exhibits narrower directivity than convex or concave domes, whereas directivities of the latter two change differently with RH. In the case of a concave hemisphere, a resonance occurs around ka0 = 7.14 (a0 is the radius of the sphere), causing a low radiation power and an unusual directivity pattern, which agrees with findings of Suzuki and Tichy. For the rear radiation, the rear-to-front difference in sound pressure level of the convex hemisphere does not exceed 10 dB in the whole audible range, indicating that its rear radiation should not be neglected even at high frequencies.
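The resonance is quoted as a dimensionless product ka0, where k = 2πf/c is the wavenumber. The short sketch below converts such a value to a frequency in hertz. The sphere radius and the speed of sound used in the usage line are illustrative assumptions, not values taken from the paper, and the function name `ka_to_frequency` is hypothetical.

```python
import math

def ka_to_frequency(ka, radius_m, speed_of_sound=343.0):
    """Frequency in Hz at which wavenumber times radius equals ka.

    Uses k = 2*pi*f / c, so f = ka * c / (2*pi*a).
    """
    return ka * speed_of_sound / (2.0 * math.pi * radius_m)

# Usage: assumed radius of 0.1 m and c = 343 m/s (both hypothetical values).
resonance_hz = ka_to_frequency(7.14, radius_m=0.1)
```

For a fixed ka0, the resonance frequency scales inversely with the sphere radius, so a smaller enclosure pushes the resonance higher in frequency.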
Authors:Cho, Jaeyoun; Kim, Sunmin; Hwang, Inwoo
Affiliation:Samsung Electronics Co., Ltd., Suwon, South Korea
Since cathode-ray tube televisions were first introduced to consumers in the late 1920s, a variety of multimedia device form factors has appeared in the consumer market. Although the value of multimedia devices was placed mostly on picture quality and sound quality in the past, user experience is now undoubtedly the most important and attractive value for multimedia products. At present, almost all outstanding features of brand-new products are technologies for user convenience, such as voice user interaction, device unlock, and content recommendation. Likewise, an unprecedented feature of smart TVs, the Active Voice Amplifier, was introduced at the Consumer Electronics Show 2020; it detects disturbing noise and automatically enhances voice clarity accordingly. To design this feature and make it work in real time on real devices, state-of-the-art signal processing methods and deep learning technologies are integrated into a single function that takes a novel approach to noisy-environment detection and speech extraction from multimedia audio content. This paper gives an overview of what this function pursues in user experience, describes how it was designed in terms of signal processing methods, and demonstrates how effectively it works on real-time TV systems.