Journal of the AES
2013 June - Volume 61 Number 6
A model that accurately computes the sound pressure field of a loudspeaker would be an efficient tool for designing a real transducer system. Although there are many tools for calculating radiation and diffraction from loudspeaker cabinets, the results are only valid for high frequencies; traditional approaches for modeling diffraction produce significant errors at frequencies below 500 Hz. This research describes an approach to solve the 3-dimensional Helmholtz equations of a piston radiator in a rectangular solid enclosure using the Method of Fundamental Solutions. This method enables accurate calculation of sound pressure, including an exact representation of diffraction. The radiation impedance of a piston in a finite enclosure can also be computed. In practice, there is a maximum frequency that depends on the cabinet size. The low- and high-frequency models can then be smoothly joined.
Perceptual Objective Listening Quality Assessment (POLQA), The Third Generation ITU-T Standard for End-to-End Speech Quality Measurement Part I—Temporal Alignment
In this and the companion paper Part II, the authors present the Perceptual Objective Listening Quality Assessment (POLQA), the third-generation speech quality measurement algorithm, standardized by the International Telecommunication Union in 2011 as Recommendation P.863. In contrast to the previous standard (P.862 Perceptual Evaluation of Speech Quality), a more complex temporal alignment was developed allowing for the alignment of a wide variety of complex distortions for which P.862 was known to fail, such as multiple delay variations within utterances as well as temporal stretching and compression of the degraded signal. When this new algorithm is used in combination with the advanced perceptual model described in Part II, it provides a new measurement standard for predicting Mean Opinion Scores that outperforms the older PESQ standard, especially for wideband and super wideband speech signals (7 and 14 kHz audio bandwidth). Part I provides the basics of the POLQA approach and outlines the core elements of the temporal alignment.
Perceptual Objective Listening Quality Assessment (POLQA), The Third Generation ITU-T Standard for End-to-End Speech Quality Measurement Part II—Perceptual Model
In this and the companion paper Part I, the authors present the Perceptual Objective Listening Quality Assessment (POLQA), the third-generation speech quality measurement algorithm, standardized by the International Telecommunication Union in 2011 as Recommendation P.863. This paper describes the newly developed perceptual model of this standard, which allows speech quality to be assessed over a wide range of distortions, from “High Definition” super-wideband speech (HD Voice, audio bandwidth up to 14 kHz) to extremely distorted narrowband telephony speech (audio bandwidth down to 2 kHz), using sample rates between 48 and 8 kHz. POLQA is suited for distortions that are outside the scope of PESQ, such as linear frequency response distortions, super-wideband degradations, time stretching/compression as found in Voice-over-IP, certain types of codec distortions, reverberations, and the impact of playback volume. Part II outlines the core elements of the underlying perceptual model and presents the final results.
This research proposes a generalized and optimized framework for time–frequency processing of spatial audio using a signal covariance matrix. This framework is relevant for a wide variety of spatial applications, such as perceptual spatial coding, stereo upmixing, decorrelation, and so on. The matrix, which represents interchannel dependencies, is perceptually relevant for the transmission of the listener’s spatial experience. In a typical application, the original time–frequency covariance matrix is transformed into the target matrix, optimizing the sound quality using a least mean square metric. In an example of upmixing stereo music, informal listening tests confirmed the validity of the framework.
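As a rough illustration of the quantity this abstract centers on, the sketch below estimates an interchannel covariance matrix for one time-frequency tile. It is a minimal example, not the authors' framework: the function name `covariance_matrix` and the sample values are hypothetical, and the estimate used here (the average of each channel pair's product with one term conjugated) is only one common convention.

```python
def covariance_matrix(channels):
    """Estimate the interchannel covariance matrix of one time-frequency tile.

    `channels` is a list of equal-length lists, one per audio channel, holding
    the (possibly complex) time-frequency samples of that tile. Entry [i][j]
    is the average of x_i * conj(x_j) over the tile.
    """
    n = len(channels[0])
    return [
        [
            sum(complex(a) * complex(b).conjugate() for a, b in zip(ci, cj)) / n
            for cj in channels
        ]
        for ci in channels
    ]


# Hypothetical stereo tile: equal energy in both channels, no correlation.
left = [1.0, 1.0]
right = [1.0, -1.0]
C = covariance_matrix([left, right])
```

The diagonal entries hold the channel energies and the off-diagonal entries hold the interchannel dependencies that the abstract describes as perceptually relevant.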
With the individual requirements of different occupants and the proliferation of audio sources in the automobile, there is an interest in implementing independent front and rear listening zones to match the preferences of the occupants. Because simulations showed the physical limits for creating personal listening zones, low- and high-frequency arrays were considered separately. Four standard audio loudspeakers were used for frequencies below 200 Hz, and phase-shift loudspeaker arrays mounted at the headrests were used for frequencies above 200 Hz. The split-band technique avoids the need for full-bandwidth loudspeakers in the headrests. To validate the results, a personal audio system was implemented in an automobile cabin using the dual arrays; performance was consistent with the simulations. A contrast of 15 dB between bright and dark seats was possible.
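The contrast figure quoted above can be read as a level difference between zones. The sketch below uses one common definition, the ratio of mean squared pressure in the bright zone to that in the dark zone, expressed in decibels. Whether the paper uses exactly this definition is an assumption, and the function names and pressure values are hypothetical.

```python
import math


def mean_square(pressures):
    """Mean squared magnitude of a list of (possibly complex) pressure samples."""
    return sum(abs(p) ** 2 for p in pressures) / len(pressures)


def acoustic_contrast_db(bright, dark):
    """Level difference in dB between the bright-zone and dark-zone pressures."""
    return 10.0 * math.log10(mean_square(bright) / mean_square(dark))


# Hypothetical microphone pressures (arbitrary units) at the two seat zones.
bright_zone = [1.0, 1.0]
dark_zone = [0.1, 0.1]
contrast = acoustic_contrast_db(bright_zone, dark_zone)
```

With these made-up values the dark zone has one tenth of the bright-zone pressure amplitude, which corresponds to a contrast of about 20 dB.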
Understanding the way in which listeners move their heads must be part of any objective model for evaluating and reproducing the sonic experience of space. Head movement is part of the listening experience because it allows for sensing the spatial distribution of parameters. In the first experiment, the head positions of subjects were recorded when they were asked to evaluate perceived source location, apparent source width, envelopment, and timbre of synthesized stimuli. Head motion was larger when judging source width than when judging direction or timbre. In the second experiment, head movement was observed in natural listening activities such as concerts, movies, and video games. Because the statistics of movement were similar to those observed in the first experiment, laboratory results can be used as the basis of an objective model of spatial behavior. The results were based on 10 subjects.
Scientific authentication of recordings in forensic evaluations depends on having a reliable collection of parameters that can be used to determine originality, alterations, editing, and rerecording. DC offset, which is intrinsic to all electronics, is a possible candidate for such evaluations. Nine small digital audio recorders and five audio formats were measured to determine if they displayed sufficient salient characteristics to make meaningful comparisons between submitted audio files and exemplar recordings. The wide variation and inconsistency of DC offset in relation to the observed standard deviation precluded the use of this parameter.
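For readers unfamiliar with the measurement, DC offset is commonly estimated as the arithmetic mean of a recording's samples. The sketch below computes that mean per recording and then the spread of the offsets across recordings. It is a minimal illustration under that assumption, not the measurement procedure of the paper; the function names and sample values are hypothetical.

```python
from statistics import mean, stdev


def dc_offset(samples):
    """Estimate DC offset as the arithmetic mean of the sample values."""
    return mean(samples)


def offset_spread(recordings):
    """Return (mean, sample standard deviation) of the per-recording DC offsets."""
    offsets = [dc_offset(r) for r in recordings]
    return mean(offsets), stdev(offsets)


# Hypothetical recordings from one device (normalized sample values).
recordings = [
    [0.5, -0.5, 0.5, -0.5],      # offset 0.0
    [1.0, 0.0, 1.0, 0.0],        # offset 0.5
    [0.25, 0.25, 0.25, 0.25],    # offset 0.25
]
average_offset, spread = offset_spread(recordings)
```

When the spread is as large as the average offset itself, as in this made-up data, the offset cannot reliably distinguish one recorder from another, which is the kind of inconsistency the abstract reports.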
Standards and Information Documents
AES Standards Committee News
Loudspeaker modeling and measurement; measurement and equalization of sound systems in rooms; grounding and EMC practices
49th Conference Report, London
51st Conference Preview, Helsinki
52nd Conference Preview, Guildford
52nd Conference Preliminary Program
Spatial audio can be reprocessed for reproduction over different loudspeaker formats using upmixing and downmixing. It can even be rendered binaurally for headphones. We review the latest research in this field and consider the potential pros and cons of the technology.
53rd Conference, London, Call for Papers