AES 122nd Convention
May 5-8, 2007
AES Paper Ordering
Single Convention Papers are available through the AES Paper Search and Shop facility.
Using Transient Suppression in Blind Multi-Channel Upmix Algorithms
Disch, Sascha; Uhle, Christian; Walther, Andreas
In the field of blind upmixing, many algorithms exist for generating multi-channel sound from mono or stereo sources. One of the important blind upmix scenarios is the ambience-based upmix, which aims to extract the ambient parts of a given signal and to reproduce them so as to take the best possible advantage of a multi-channel loudspeaker setup. Depending on the number of channels and the characteristics of the input signal, the quality of the extracted ambience can vary. In this paper, transient suppression is suggested as a method for improving the perceived quality of an extracted ambience signal. Two methods for suppressing transient components are proposed and contrasted with existing techniques. Their ability to improve the subjective quality of the overall upmix is evaluated in a listening test.
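The abstract does not describe the two proposed suppression methods. As a generic illustration only, the sketch below attenuates onsets in an ambience signal by comparing a fast and a slow envelope follower. The function names, smoothing coefficients, and the envelope-ratio gain rule are all assumptions for illustration, not the paper's algorithms.

```python
import numpy as np


def one_pole_envelope(x, coeff):
    """Track the magnitude envelope of x with a one-pole smoother.

    coeff close to 1 gives slow smoothing, close to 0 gives fast tracking.
    """
    env = np.empty(len(x))
    state = 0.0
    for n, v in enumerate(np.abs(x)):
        state = coeff * state + (1.0 - coeff) * v
        env[n] = state
    return env


def suppress_transients(x, fast_coeff=0.9, slow_coeff=0.999, eps=1e-12):
    """Attenuate onsets where the fast envelope overshoots the slow one.

    At an onset the fast envelope jumps above the slow envelope, so the
    gain slow/fast drops below 1; for stationary input both envelopes
    agree on average and the gain stays close to 1.
    """
    fast = one_pole_envelope(x, fast_coeff)
    slow = one_pole_envelope(x, slow_coeff)
    gain = np.minimum(1.0, (slow + eps) / (fast + eps))
    return gain * x, gain


# Usage: stationary noise (standing in for an extracted ambience) with one click.
rng = np.random.default_rng(0)
ambience = 0.1 * rng.standard_normal(16000)
ambience[12000] += 50.0
processed, gain = suppress_transients(ambience)
```

The ratio rule needs no explicit onset detector, but the slow follower needs time to settle, so the first second or so of output is attenuated more than necessary.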
A Novel Approach to Up-Mix Stereo to Surround Based on MPEG Surround Technology
Ehret, Andreas; Gröschel, Alexander; Purnhagen, Heiko; Rödén, Jonas
With the increasing number of surround sound systems installed in consumer homes and cars, the demand for surround sound content is rising. However, the vast majority of content is still only available in stereo. Furthermore, in many cases it is difficult to create a surround mix for content previously released in mono or stereo, either because of the high effort required or simply because the original multi-track recordings are unavailable. Consequently, the need for tools that allow an automated or semi-automated stereo-to-surround up-mix is growing. In this paper, a novel approach is described that is based on technology that is part of the MPEG Surround standard. The basic algorithm and some proposed extensions are outlined, and potential use cases are described. Finally, the subjective quality of the presented approach is compared with that of existing solutions.
Coding of "2+2+2" Surround Sound Content Using the MPEG Surround Standard
Ehret, Andreas; Gröschel, Alexander; Purnhagen, Heiko; Rödén, Jonas
An increasing number of recordings are available in the so-called "2+2+2" surround sound format, in which, in addition to two front and two rear speakers, a third pair of speakers is placed at an elevated position above the front speakers. The MPEG Surround standard has recently been finalized as an efficient, stereo-backward-compatible surround sound coding format. This paper studies the applicability of MPEG Surround to the efficient coding of "2+2+2" content. Several alternative approaches are outlined and evaluated in subjective listening tests.
Quality Taxonomies for Auditory Virtual Environments
The aim of the new taxonomies developed here is to describe the components involved in the quality process of Auditory Virtual Environments (AVE) and to quantify the relations between them for different applications. The taxonomy should provide an overview and identify the relations that are most important for the software development process and for the design of listening experiments. For the first time, the multivariate relations between the Quality Elements in the physical domain and the Quality Features in the perceptual domain of such a Quality Taxonomy are evaluated for three different AVE applications. This evaluation and quantification is carried out by means of an expert survey (DELPHI method) to objectify the results. Principal Component Analysis reveals that five dimensions are needed to describe about 95% of the variance in the data. This indicates that the seven selected Quality Features are clearly distinguishable for the experts, but not orthogonal to each other. Most of the Quality Features are established, meaningful terms in the audio engineering field and can therefore be used by the participating experts without training. The results of the expert survey are compared with listening test results that use the same Quality Features. The bottom line is that the expert survey is not only a much faster way than the listening test to obtain a good overview of a specific application, but it also reveals more information about it.
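The statement that five dimensions describe about 95% of the variance comes from counting principal components until their cumulative explained variance reaches 95%. The sketch below shows that computation on synthetic data (seven variables driven by three latent factors), not on the survey data; the function name is hypothetical.

```python
import numpy as np


def components_for_variance(data, target=0.95):
    """Number of principal components needed to explain `target` of the variance.

    data has shape (n_observations, n_variables), for example survey ratings
    (rows) of quality features (columns). Returns the count and the
    cumulative explained-variance ratios.
    """
    centered = data - data.mean(axis=0)
    # Squared singular values of the centered data are proportional to the
    # variances along the principal axes.
    s = np.linalg.svd(centered, compute_uv=False)
    variance = s ** 2
    cumulative = np.cumsum(variance) / variance.sum()
    count = int(np.searchsorted(cumulative, target)) + 1
    return count, cumulative


# Synthetic example: 7 observed variables driven by 3 latent factors.
rng = np.random.default_rng(1)
mixing, _ = np.linalg.qr(rng.standard_normal((7, 3)))  # orthonormal 7x3
latent = rng.standard_normal((500, 3)) * np.array([3.0, 2.0, 1.0])
ratings = latent @ mixing.T + 1e-3 * rng.standard_normal((500, 7))
count, cumulative = components_for_variance(ratings)
```

Needing fewer components than variables (here 3 of 7, in the paper 5 of 7) is what indicates correlated, non-orthogonal features.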
Individual Localisation Behaviour for Perception of Virtual Sound Sources
Bradter, Cornelius; Hobohm, Klaus
Test results typically show very large variability in the perception of lateral virtual sound sources with a 5-channel speaker setup. Three recent studies indicate that up to a third of the stimuli from a side speaker pair were perceived surprisingly accurately as virtual sound sources. In other cases, sound sources were perceived as coming from only one speaker or from its vicinity. We therefore specified prototypical localisation behaviours. We examined the effects on localisation of the reproduction room, the exact position of the test persons relative to the speakers, the test persons’ head movements, and trading between time-delay and level differences.
Comparison of Different Sound Capture and Reproduction Techniques in a Virtual Acoustic Window
De Bruijn, Werner; Haapsaari, Timo; Härmä, Aki
In this paper we describe a two-way audio communication system using arrays of microphones and loudspeakers to create a virtual acoustic window. We compare three different methods for capturing the sound: wave field sampling using a line array of microphones, an adaptive beamformer, and close-talk microphones. For sound reproduction, we employ wave field synthesis. In the paper we review the acoustic and perceptual requirements for a real-time virtual acoustic window system, and report results of a set of listening experiments performed with the system.
Recording of Acoustical Concerts Using a Soundfield Microphone
Schellstede, Markus; Faller, Christof
Based on the extensive practical experience of one of the authors, the recording of acoustical concerts with a Soundfield Microphone and without spot or support microphones is discussed. The focus is on stereo recording of classical music. Strategies for microphone positioning, B-Format decoding, and mastering are presented. The final mix obtained in this way is largely based on the natural mix of sound reaching the microphone. This is in contrast to more conventional recording techniques, which usually use a large number of spot and support microphones. Last but not least, limitations and cost considerations are discussed.
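B-Format decoding for stereo typically means deriving a pair of coincident virtual microphones from the W, X and Y signals. The sketch below assumes the traditional B-Format convention (W scaled by 1/sqrt(2), X pointing front, Y pointing left) and uses hypothetical function names; the authors' actual decoding strategy is not given in the abstract.

```python
import numpy as np


def encode_plane_wave(s, azimuth_deg):
    """Encode a mono signal from azimuth_deg (0 = front, 90 = left) into W, X, Y."""
    az = np.deg2rad(azimuth_deg)
    return s / np.sqrt(2.0), s * np.cos(az), s * np.sin(az)


def virtual_mic(w, x, y, azimuth_deg, pattern=0.5):
    """First-order virtual microphone from horizontal B-format.

    Assumes the traditional convention in which W is scaled by 1/sqrt(2).
    pattern = 1.0 gives an omni, 0.5 a cardioid, 0.0 a figure-of-eight.
    """
    az = np.deg2rad(azimuth_deg)
    return pattern * np.sqrt(2.0) * w + (1.0 - pattern) * (
        x * np.cos(az) + y * np.sin(az)
    )


# A coincident stereo pair of virtual cardioids aimed at +/-45 degrees.
source = np.ones(4)
w, x, y = encode_plane_wave(source, 30.0)
left = virtual_mic(w, x, y, 45.0)
right = virtual_mic(w, x, y, -45.0)
```

Because the pattern and aiming angle are just parameters, they can be changed after the session, which is one practical reason for recording in B-Format.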
Ambience Sound Recording Utilizing Dual MS (Mid-Side) Microphone Systems Based Upon Frequency Dependent Spatial Cross Correlation (FSCC)
Ifukube, Tohru; Miura, Takahiro; Muraoka, Teruo
In musical sound recording for CD production or broadcasting, a forest of microphones is commonly seen. The microphones serve to achieve good sound localization and a favorable ambience; however, it is desirable to thin out this forest to make setting up and mixing less laborious. Previously, we examined the ambience representation of stereophonic microphone arrangements using Frequency Dependent Spatial Cross Correlation (FSCC), defined as the cross correlation between the outputs of two microphones as a function of frequency, and found that the FSCC of the MS microphone system is the most favorable. Based on this result, we devised a combination of two MS microphone systems, one for acquiring the stage sound and the other for representing the ambience. Together with a few supplementary stage microphones, this arrangement achieved satisfactory musical recordings.
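A frequency-dependent cross correlation can be approximated by band-pass filtering both microphone signals and computing a normalized correlation per band. The sketch below assumes zero lag, octave bands, and crude FFT brick-wall filtering. These are illustrative choices, not necessarily the authors' FSCC definition, and the function name is hypothetical.

```python
import numpy as np


def band_correlation(a, b, fs, bands):
    """Zero-lag normalized cross-correlation of two signals in each band.

    bands is a list of (low_hz, high_hz) pairs. Band-pass filtering is done
    crudely by zeroing FFT bins outside each band.
    """
    n = len(a)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spec_a, spec_b = np.fft.rfft(a), np.fft.rfft(b)
    result = []
    for low, high in bands:
        mask = (freqs >= low) & (freqs < high)
        band_a = np.fft.irfft(spec_a * mask, n)
        band_b = np.fft.irfft(spec_b * mask, n)
        denom = np.sqrt(np.sum(band_a ** 2) * np.sum(band_b ** 2))
        result.append(np.sum(band_a * band_b) / denom if denom > 0 else 0.0)
    return np.array(result)


fs = 16000
octave_bands = [(125, 250), (250, 500), (500, 1000), (1000, 2000)]
rng = np.random.default_rng(3)
mic_1 = rng.standard_normal(fs)
mic_2 = rng.standard_normal(fs)
```

Identical signals give +1 in every band, polarity-inverted signals give -1, and independent (diffuse-like) signals give values near 0.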
Enhanced MPEG-4 Low Delay AAC - Low Bitrate High Quality Communication
Geiger, Ralf; Herre, Jürgen; Jander, Manuel; Multrus, Markus; Schmidt, Markus; Schnell, Markus; Schuller, Gerald
The MPEG-4 Low Delay Advanced Audio Coding (AAC-LD) scheme has recently become a popular algorithm for audio communication. It produces excellent audio quality at bitrates between 48 kbit/s and 64 kbit/s per channel. This paper introduces an enhancement to AAC-LD that reduces the bitrate demand by 25 to 33%. This is achieved by combining a delay-optimized version of the Spectral Band Replication (SBR) tool with a dedicated low-delay filterbank. The introduced techniques maintain the high audio quality and offer an algorithmic delay low enough for use in two-way communication systems. This paper describes the coder enhancements, including a detailed discussion of algorithmic delay issues, a performance assessment, and possible applications.
On the Design of Low Power MPEG-4 HE-AAC Encoder
Hsu, Han-Wen; Hu, Cheng-Lun; Lee, Wen-Chieh; Liu, Chi-Min
Spectral Band Replication (SBR) has been combined with MPEG AAC as a bandwidth extension tool. The resulting scheme is referred to as MPEG-4 High Efficiency AAC (HE-AAC) or aacPlus. With the SBR module taking care of the high-frequency content, the conventional AAC encoder can compress the low-frequency part using most of the available bits. In the conventional QMF-based architecture, the SBR encoder calculates all SBR parameters in the complex domain. If the components of the SBR encoder can be implemented in the real domain, the computational complexity of HE-AAC is reduced by half. This paper proposes a Low Power MPEG-4 HE-AAC encoder to reduce the computational complexity. Objective experiments on critical music tracks demonstrate the quality of the Low Power HE-AAC encoder. Finally, the paper extends the Low Power technique to the Parametric Stereo (PS) encoder of HE-AAC.
High Quality, Low Power QMF Bank Design for SBR, Parametric Coding, and MPEG Surround Decoders
Hsu, Han-Wen; Lee, Wen-Chieh; Liu, Chi-Min; Tseng, Hsin-Yao; Yang, Chung-Han
Because of its alias-free properties, the complex quadrature mirror filter (QMF) bank is used in the MPEG-4 audio standard for SBR, parametric, and surround coding. The high complexity overhead of the complex QMF bank and of the complex data processing in the decoder has led to the development of low-power decoders that adopt the real QMF bank as the basic building block to reduce complexity. However, the artifacts caused by aliasing in the real QMF bank are a major concern. This paper studies these artifacts and proposes a novel QMF bank design that achieves both low complexity and high quality. The paper also applies the novel QMF bank to develop high-quality, low-power SBR, parametric, and MPEG Surround decoders and demonstrates their merits in complexity and quality.
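The complex-versus-real trade-off can be seen in a generic exponentially modulated analysis filterbank. With a real input, the cosine-modulated (real) outputs are exactly the real parts of the complex outputs, so roughly half the multiply-accumulates are saved. The discarded imaginary part is what keeps the complex bank robust to aliasing when subband signals are modified. This sketch is hypothetical: the prototype window and phase offset `n0` are placeholders, not the MPEG-4 QMF prototype or modulation.

```python
import numpy as np


def modulated_analysis(block, num_bands, prototype, complex_valued=True):
    """One output sample per subband of an exponentially modulated filterbank.

    block holds the most recent len(prototype) input samples, oldest first.
    Subband k uses prototype[n] * exp(j*pi/M*(k + 0.5)*(n - n0)); the real
    variant keeps only the cosine part of that modulation.
    """
    length = len(prototype)
    n = np.arange(length)
    n0 = (length - 1) / 2.0                     # assumed phase reference
    k = np.arange(num_bands)[:, None]
    phase = np.pi / num_bands * (k + 0.5) * (n - n0)
    modulation = np.exp(1j * phase) if complex_valued else np.cos(phase)
    # Reverse so that index n multiplies the sample n steps in the past.
    return (prototype * modulation) @ block[::-1]


num_bands = 32
prototype = np.hanning(10 * num_bands)          # placeholder, not the MPEG prototype
rng = np.random.default_rng(4)
block = rng.standard_normal(len(prototype))
complex_out = modulated_analysis(block, num_bands, prototype, True)
real_out = modulated_analysis(block, num_bands, prototype, False)
```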
Low Power Stereo Perceptual Audio Coding Based on Adaptive Masking Threshold Reuse
George, Sapna; Kurniawati, Evelyn
The term perceptual audio coder refers to audio compression schemes that exploit the properties of human auditory perception. The idea is to shape the quantization noise so that it lies below the masking threshold and is therefore imperceptible. This process requires considerable computational effort, mainly in the psychoacoustic analysis and in the bit allocation and quantization process. This paper proposes a new method that simplifies psychoacoustic modelling by adaptively reusing the computed masking threshold depending on the signal characteristics. The method also devises a scheme to patch potential spectral holes that might occur when the quantization parameters are reused. The proposal can be applied to any generic stereo perceptual audio encoder for which low computational complexity is required.
A Hybrid Warped Linear Prediction (WLP) AAC Audio Coding Algorithm
Kang, Hong-Goo; Lee, Jae-Seong; Park, Young-Cheol; Youn, Dae-Hee
We propose a hybrid warped linear prediction (WLP) AAC audio coding algorithm. The proposed algorithm employs a WLP processor to construct a perceptual pre- and post-filter for MPEG-4 AAC. The WLP residual is fed to the MDCT filterbank, and the signal-to-mask ratio (SMR) of the corresponding block is modified to set a masking threshold for the WLP residual. In the decoder, the reconstructed residual signal is passed through a modified WLP synthesis filter to restore the audio signal. Subjective tests show that the proposed codec operating at 50 kbit/s has perceptual quality comparable to that of conventional MPEG-4 AAC operating at 58 kbit/s.
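Warped linear prediction replaces each unit delay in the autocorrelation computation with a first-order allpass D(z) = (z^-1 - λ)/(1 - λ z^-1) and then solves for the predictor with the usual Levinson-Durbin recursion. This is a minimal sketch of that standard technique. The warping factor λ = 0.7 used below is an arbitrary assumption (the abstract does not state the paper's value), and the function names are hypothetical. With λ = 0 it reduces to ordinary LPC.

```python
import numpy as np


def allpass(x, lam):
    """First-order allpass D(z) = (z^-1 - lam) / (1 - lam z^-1)."""
    y = np.zeros(len(x))
    x_prev = y_prev = 0.0
    for n, v in enumerate(x):
        y[n] = -lam * v + x_prev + lam * y_prev
        x_prev, y_prev = v, y[n]
    return y


def warped_autocorrelation(x, order, lam):
    """Autocorrelation with unit delays replaced by the allpass D(z)."""
    x = np.asarray(x, dtype=float)
    r = np.zeros(order + 1)
    delayed = x
    r[0] = np.dot(x, x)
    for k in range(1, order + 1):
        delayed = allpass(delayed, lam)
        r[k] = np.dot(x, delayed)
    return r


def levinson(r, order):
    """Levinson-Durbin: returns A = [1, a1, ..., ap] and the residual energy."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


# AR(1) test signal x[n] = 0.9 x[n-1] + w[n].
rng = np.random.default_rng(5)
noise = rng.standard_normal(20000)
signal = np.zeros_like(noise)
for n in range(1, len(noise)):
    signal[n] = 0.9 * signal[n - 1] + noise[n]
plain_coeffs, _ = levinson(warped_autocorrelation(signal, 1, 0.0), 1)
warped_coeffs, _ = levinson(warped_autocorrelation(signal, 10, 0.7), 10)
```

A positive λ stretches the low-frequency region, so a given prediction order spends more resolution where hearing is more frequency-selective.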
Comparison of Stereo Redundancy Reduction Schemes for an Ultra Low Delay Audio Coder
Albert, Tobias; Hirschfeld, Jens; Krämer, Ulrich; Schuller, Gerald; Wabnik, Stefan
In our Ultra Low Delay Audio Coder (ULD), a pre-filter controlled by a psychoacoustic model is followed by a quantizer and a predictive coder to code signals in the time domain. The output of the predictor is entropy coded and transmitted. Predictor and entropy coder form the lossless redundancy reduction part of the coder. Our goal is to improve this lossless redundancy reduction part for stereo signals. We present and evaluate six different alternatives for stereo redundancy reduction, and we combine these alternatives to obtain a higher compression ratio.
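The abstract does not list the six alternatives. One classic lossless option is a reversible integer mid/side transform, shown below as an assumption-labelled example with hypothetical function names. The transform reconstructs the input exactly. For strongly correlated channels, the side signal has a much lower first-order entropy than either channel, so fewer bits are needed after entropy coding.

```python
import numpy as np


def ms_encode(left, right):
    """Reversible integer mid/side transform."""
    side = left - right
    mid = (left + right) >> 1          # floor((L + R) / 2); drops one bit
    return mid, side


def ms_decode(mid, side):
    """Exact inverse of ms_encode."""
    total = (mid << 1) | (side & 1)    # L + R and L - R share the same parity
    return (total + side) >> 1, (total - side) >> 1


def entropy_bits(samples):
    """First-order empirical entropy in bits per sample."""
    _, counts = np.unique(samples, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


# Strongly correlated integer channels, standing in for predictor residuals.
rng = np.random.default_rng(6)
common = rng.integers(-2000, 2000, size=10000)
left = common + rng.integers(-8, 9, size=10000)
right = common + rng.integers(-8, 9, size=10000)
mid, side = ms_encode(left, right)
```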
Speech Codec Enhancements Utilizing Time Compression and Perceptual Coding
Czyzewski, Andrzej; Kulesza, Maciej
A method for encoding wideband speech signals using standardized narrowband speech codecs is presented, together with experimental results on the detection of tonal spectral components. The speech signal, sampled at a higher rate than is suitable for a narrowband coding algorithm, is time-compressed to reduce the number of samples. Next, the time-compressed representation of the signal is encoded using a narrowband speech codec. After transmission and decoding, a time-expansion procedure is applied to the speech signal to restore the original time relations, and the wideband speech signal is finally presented to the user. A method for spectral envelope estimation involving perceptual criteria is also described. The algorithms for tonal component detection were evaluated and compared experimentally.
Design and Implementation of a Web-Based Software Framework for Real Time Intelligent Audio Coding Based on Speech/Music Discrimination
Garcia Galan, Sebastian; Muñoz Exposito, Jose Enrique; Ruiz Reyes, Nicolas; Vera Candeas, Pedro
In this work, a software framework based on a client-server architecture is implemented for real-time intelligent audio coding. A speech/music discrimination scheme analyzes the input audio signal and decides, on a frame-by-frame basis, whether it is speech or music. According to this decision, a suitable coder is selected for each frame. The software framework uses the speech and audio coders included in the MPEG-4 audio standard (HVXC or CELP for speech frames and TwinVQ or AAC for music frames) to evaluate the performance of an intelligent multi-mode audio coder. The framework supports several types of audio features (timbral texture features and rhythmic content features) and classifiers (classical Statistical Pattern Recognition (SPR) classifiers, Multilayer Perceptron Neural Networks (MLPNN), Support Vector Machines (SVM), Fuzzy Expert Systems (FES), and Hidden Markov Models (HMM)). The intelligent audio coder based on speech/music discrimination was compared with MPEG-4 AAC using audio signals representative of the two classes (speech and music). Subjective and objective tests were carried out to assess the behaviour of the intelligent audio coding scheme.
Quantization of Laguerre-Based Stereo Linear Predictors
Biswas, Arijit; den Brinker, Albertus C.
Recently, a quantization strategy for stereo linear prediction systems was proposed and tested using random data as input. This research is extended in the current paper by incorporating Laguerre filters into the stereo linear prediction scheme. First, it is shown that the associated normalized reflection matrices can be obtained efficiently. Second, the system is tested with stereo audio data to gain insight into the bit rates required for practical applications.
150 Years of Time-Base in Acoustic Measurement and 100 Years of Audio's Best Publicity Stunt - 2007 as a Commemorative Year
Léon Scott's invention of the phonautograph in 1857 made a long time-base available for recording vibrations, and it was also the first time an airborne sound was recorded. Although his invention formed the basis both for sound recording and reproduction and for acoustical science as we know it, it has been largely forgotten. Neither Scott nor the instrument maker Koenig is mentioned in the series 'Benchmark Papers of Acoustics'. Today we take sound archives for granted, but the whole sound archive movement would not have received any attention from the general public if one particular event had not occurred: the sealed deposit in 1907 of important shellac records and a gramophone in the vaults below the Paris Opera house. They were intended to remain untouched for 100 years, and they have survived to this day. The paper provides documentation for these historical events, which form the basis of so many of our professional activities.
Knowledge: The Missing Element in Archiving and Restoration?
Davies, Sean W.
This admittedly provocative title nevertheless calls attention to a situation which already exists and which may become critical in the future. No sound recording can be considered in isolation from the technical system which produced it. A proper working knowledge of such a system is an essential requirement for any person working on the transfer of such a recording. This paper examines the range of such required knowledge and the means by which it may be taught to personnel likely to be involved with archival material.
Non Contact Phonographic Disks Digitisation Using Structured Colour Illumination
Chenot, Jean-Hugues; Laborelli, Louis; Perrier, Alain
We propose an innovative contactless optical playing device for 78 rpm and 33 rpm lateral-modulation phonographic disks, using structured colour illumination. An area of the disk is illuminated by a beam of rays whose colour depends on the direction of incidence; the rays are reflected by the groove wall towards a camera. In contrast with standard methods, which measure the velocity of the groove at a single location, the audio signal value is obtained directly from the images through colour decoding, and the whole height of the groove wall is exploited. This colour coding allows occluding dust to be detected and missing audio signal to be interpolated automatically. Results on distortion, S/N ratio, and bandwidth are presented.
Improvement of Cylindrical Record Reduction Utilizing Inharmonic Frequency Analysis GHA
Ifukube, Tohru; Muraoka, Teruo; Nakagomi, Shota
Cylindrical records were important sound media around the beginning of the 20th century, and many historical recordings were made with them. We have been engaged in research on the reproduction of cylindrical records and on the noise reduction of damaged SP records using GHA (Generalized Harmonic Analysis), an inharmonic frequency analysis. Because the surface noise of cylindrical records is more serious than that of SP records, we tackled its reduction by modifying the GHA noise-reduction method.
Method Comparison of Pick-Up and Pre-Processing of Bias Signal for Wow and Flutter Correction
Pavuza, Franz; Pichler, Heinrich; Wallaszkovits, Nadja
Based on a practical implementation within an archival workflow and on prior studies of bias signal retrieval from analogue magnetic tape, the authors compare reproduction and pre-processing methods for the bias signal. Previous approaches are compared with the authors' proposed method. Signal pre-processing in both the analogue and the digital domain is outlined, and, based on analyses of bias signals from professional and semi-professional recordings, the various practical problems are discussed: level instability and unknown frequency of the recorded bias signal; frequency variations due to bias oscillator instability, mainly in older semi-professional recording equipment; and the effects of signal distortion, interference, and ultrasonic artefacts.
Improved Magneto-Optical ¼” Audio Tape Player for Preservation
This improved ¼” audio tape player features a multi-track magneto-optical reader that reduces preservation cost through speed, automated adjustment, and compatibility. The benefits of its 32-channel head, connected to a digital signal processor, are high-speed capability, compatibility with and automatic detection of any number of audio tracks, real-time automatic adjustment of the best digital playback azimuth, and filtering of crosstalk and partial track erasures.
Analysis and Restoration of Faulty Audio CDs
Daudet, Laurent; Fontaine, Jean-Marc; Galiègue, Hélène
Many audio CDs (mostly CD-Rs, but also CD-ROMs) have defects caused by poor manufacturing, careless use, or simple aging of their physical constituents. Here, we study audio CDs that are still readable with a standard player (a computer CD/DVD drive or a standalone audio player with digital output), but whose defects are not fully handled by the error correction codes, resulting in a highly distorted signal. The study is twofold: first, we characterize these errors on a few example discs; second, we study different means of restoring the audio content, by fusing multiple reads and by using interpolation schemes.
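The two restoration ideas can be sketched in their simplest forms: a sample-wise median across several reads, and linear interpolation over samples flagged as bad. This sketch assumes that the reads are already sample-aligned and that defects fall in different places on each read. Both assumptions may not hold for real drives, and the paper's fusion and interpolation schemes may be more elaborate. The function names are hypothetical.

```python
import numpy as np


def fuse_reads(reads):
    """Sample-wise median of several aligned reads of the same audio data."""
    return np.median(np.stack(reads), axis=0)


def interpolate_gaps(signal, bad):
    """Replace samples flagged in the boolean mask `bad` by linear interpolation."""
    out = np.asarray(signal, dtype=float).copy()
    idx = np.arange(len(out))
    out[bad] = np.interp(idx[bad], idx[~bad], out[~bad])
    return out


clean = np.sin(2 * np.pi * 440 * np.arange(2000) / 44100)
reads = []
for start in (100, 500, 900):          # each read has a burst of errors elsewhere
    read = clean.copy()
    read[start:start + 10] = 1.0e4
    reads.append(read)
fused = fuse_reads(reads)
```

With three reads, the median removes any error that affects only one read at a given sample. Interpolation is the fallback where all reads fail.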
Techniques for the Authentication of Digital Audio Recordings
Brixen, Eddy B.
In forensic audio, one important task is the authentication of audio recordings. Standards and procedures already exist for analogue recordings. In the field of digital recording and digital media, the conditions are different. A rock-solid methodology is needed here but does not yet exist. This paper reviews existing techniques and presents some results regarding an additional tool, the ENF (Electric Network Frequency) criterion, which should be considered for adoption as a standard within the AES as well as in the forensic community as a whole.
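The ENF criterion rests on tracking the small fluctuations of the mains hum (nominally 50 Hz or 60 Hz) captured in a recording and comparing them with a reference log. This is a minimal frame-wise estimator. The frame length, search range, zero-padding factor, and parabolic peak interpolation are assumptions of this sketch, not the paper's procedure, and the function name is hypothetical. A low sampling rate is sufficient for a 50 Hz component.

```python
import numpy as np


def enf_track(x, fs, nominal=50.0, frame_seconds=8.0, search_hz=1.0):
    """Estimate the mains-hum frequency in consecutive non-overlapping frames.

    Picks the spectral peak within nominal +/- search_hz on a zero-padded,
    Hann-windowed FFT and refines it by parabolic interpolation of the
    log magnitude.
    """
    frame = int(frame_seconds * fs)
    nfft = 8 * frame
    window = np.hanning(frame)
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    band = np.where(np.abs(freqs - nominal) <= search_hz)[0]
    estimates = []
    for start in range(0, len(x) - frame + 1, frame):
        spectrum = np.abs(np.fft.rfft(x[start:start + frame] * window, nfft))
        k = band[np.argmax(spectrum[band])]
        lower, centre, upper = np.log(spectrum[k - 1:k + 2] + 1e-20)
        offset = 0.5 * (lower - upper) / (lower - 2.0 * centre + upper)
        estimates.append(freqs[k] + offset * fs / nfft)
    return np.array(estimates)


fs = 1000
t = np.arange(32 * fs) / fs
rng = np.random.default_rng(7)
recording = np.sin(2 * np.pi * 50.03 * t) + 0.01 * rng.standard_normal(len(t))
track = enf_track(recording, fs)
```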
Using Multiple Feature Extraction with Statistical Models to Categorize Music by Genre
In recent years, large-capacity portable personal music players have become widespread and popular. Coupled with the exponentially increasing processing power of personal computers and embedded devices, this is constantly changing the way people consume and listen to music. To facilitate the categorization of these personal music libraries, a system is employed that classifies MPEG-7 feature vectors and Mel-Frequency Cepstral Coefficients (MFCCs) using multiple trained Hidden Markov Models (HMMs) and other statistical methods. The outputs of these models are then compared, and a genre is chosen according to which model gives the best fit. The test results are analyzed, and ways to improve the performance of a genre sorting system are discussed.
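The "best fit" decision trains one model per genre, scores a track's feature frames under each model, and picks the genre with the highest likelihood. In the sketch below, a diagonal Gaussian is swapped in for the paper's HMMs, and random 13-dimensional vectors stand in for MFCC and MPEG-7 features. The class and function names are hypothetical.

```python
import numpy as np


class DiagonalGaussian:
    """Per-genre model: independent Gaussian per feature dimension."""

    def __init__(self, features):
        self.mean = features.mean(axis=0)
        self.var = features.var(axis=0) + 1e-6   # floor to avoid zero variance

    def log_likelihood(self, features):
        """Total log-likelihood of all frames (rows) of a track."""
        diff = features - self.mean
        per_frame = -0.5 * np.sum(
            np.log(2 * np.pi * self.var) + diff ** 2 / self.var, axis=1
        )
        return float(per_frame.sum())


def classify(track_features, models):
    """Pick the genre whose model gives the best fit (highest likelihood)."""
    scores = {genre: m.log_likelihood(track_features) for genre, m in models.items()}
    return max(scores, key=scores.get), scores


# Random 13-dimensional vectors stand in for per-frame MFCCs.
rng = np.random.default_rng(8)
models = {
    "rock": DiagonalGaussian(rng.normal(0.0, 1.0, size=(500, 13))),
    "jazz": DiagonalGaussian(rng.normal(3.0, 1.0, size=(500, 13))),
}
genre, scores = classify(rng.normal(3.0, 1.0, size=(50, 13)), models)
```

An HMM would also model how features evolve over time, but the final comparison of per-model likelihoods works the same way.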
On the Application of Sound Source Separation to Wave-Field Synthesis
Cobos, Maximo; López, Jose J.
Wave-Field Synthesis (WFS) is a spatial sound system that can synthesize an acoustic field in an extended area by means of loudspeaker arrays. Virtual sources can be positioned in space, but this requires a separate signal for each source. Although most music is recorded on separate tracks for each instrument, this information is lost in the stereo mixdown process, and most of the existing recorded material is in stereo format. In this paper, we propose using Sound Source Separation techniques to overcome this problem. Existing algorithms are still far from perfect and produce audible artifacts that clearly reduce the quality of the resynthesized sources in practice. However, when the separated sources are mixed again by a WFS system, these artifacts are masked by other sounds. The utility of different separation algorithms and the subjective results are discussed.
Reproduction of Arbitrarily Shaped Sound Sources with Wave Field Synthesis - Discretisation and Diffraction Effects
Baalman, Marije A. J.
Current Wave Field Synthesis (WFS) implementations only allow for point sources and plane waves. In order to reproduce arbitrarily shaped sound sources with WFS, several aspects need to be considered: the WFS operator for source points outside the horizontal plane, the discretisation of the object surface, and the diffraction of sound around the sounding object itself, which can be modelled by introducing secondary sources at the edges of the object. This paper discusses these issues, describes the software implementation, and presents the results of both objective and subjective evaluations.
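For the point sources that current WFS systems support, each loudspeaker of a linear array plays the source signal with a delay equal to the source-to-loudspeaker travel time and a distance- and angle-dependent gain. This sketch uses a simplified cos(θ)/√r weighting. It deliberately omits the √(jk) pre-filter and the reference-line normalisation of the full 2.5D driving function. The speed of sound, array geometry, and function name are assumptions.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def point_source_delays_gains(speaker_x, source_xy, c=SPEED_OF_SOUND):
    """Per-loudspeaker delay (s) and gain for a virtual point source.

    Loudspeakers sit on the line y = 0; the virtual source lies behind the
    array (y < 0) and listeners are in front (y > 0). Gain follows a
    simplified cos(theta) / sqrt(r) weighting; the sqrt(jk) pre-filter and
    reference-line normalisation of the full 2.5D operator are omitted.
    """
    sx, sy = source_xy
    r = np.hypot(speaker_x - sx, sy)          # source-to-loudspeaker distance
    cos_theta = -sy / r                        # angle to the array normal
    return r / c, cos_theta / np.sqrt(r)


speakers = np.linspace(-2.0, 2.0, 17)            # 0.25 m spacing
delays, gains = point_source_delays_gains(speakers, (0.5, -1.0))
```

An arbitrarily shaped source, as discussed in the paper, can be thought of as many such point sources distributed over the object surface, each driven this way.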
The Effect of Head Diffraction on Stereo Localization in the Mid-Frequency Range
Benjamin, Eric; Brown, Phil
In a previous paper, one of the present authors described anomalous localization in intensity stereo at frequencies above the frequency at which the head is approximately one wavelength in diameter. Conventional analysis of stereo localization has usually relied on an asymptotic, shadowless model of the head’s diffraction. Measurements of the ear signals heard by the subjects in localization experiments showed large differences between what the simple model predicted and what was found in actual circumstances. We present a simple model of the head’s diffraction in the range from 1200 Hz to 5000 Hz and show that it produces results that correspond more closely to real-world localization.
Multiple Exponential Sweep Method for Fast Measurement of Head Related Transfer Functions
Balazs, Peter; Laback, Bernhard; Majdak, Piotr
Presenting sounds in virtual environments requires filtering free-field signals with head-related transfer functions (HRTFs). The HRTFs describe the filtering effects of the pinna, head, and torso, measured in the ear canal of a subject. Measuring HRTFs for many positions in space is a time-consuming procedure. To speed up HRTF measurement, the multiple exponential sweep method (MESM) was developed. MESM speeds up the measurement by interleaving and overlapping sweeps in an optimized way and retrieves the impulse responses of the measured systems. In this study, MESM and its parameter optimization are described. As an example application of MESM, the measurement duration of an HRTF set with 1550 positions is compared with that of the unoptimized method. Using MESM, the measurement duration could be reduced by a factor of four without reducing the signal-to-noise ratio.
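MESM builds on the single exponential sine sweep. The sweep's instantaneous frequency rises as f1·exp(t·ln(f2/f1)/T). The impulse response is recovered by convolving the recorded sweep with a time-reversed copy whose amplitude falls by 6 dB per octave towards low frequencies. This sketch shows only that building block, not MESM's interleaving and overlapping. The parameter values and function names are assumptions.

```python
import numpy as np


def sweep_phase(t, f1, f2, duration):
    """Phase of an exponential sweep whose frequency rises from f1 to f2."""
    rate = np.log(f2 / f1)
    return 2 * np.pi * f1 * duration / rate * (np.exp(t * rate / duration) - 1.0)


def exponential_sweep(f1, f2, duration, fs):
    """Exponential sine sweep and its amplitude-compensated inverse filter."""
    t = np.arange(int(round(duration * fs))) / fs
    sweep = np.sin(sweep_phase(t, f1, f2, duration))
    # Time-reversed sweep whose amplitude falls by 6 dB per octave towards
    # low frequencies, flattening the sweep's pink energy distribution.
    inverse = sweep[::-1] * np.exp(-t * np.log(f2 / f1) / duration)
    return sweep, inverse


fs = 8000
sweep, inverse = exponential_sweep(50.0, 3000.0, 1.0, fs)
impulse_estimate = np.convolve(sweep, inverse)   # peak at index len(sweep) - 1
```

For a system with nonlinear distortion, the harmonic responses appear before the main peak in the deconvolved output. This time separation is what makes it possible to overlap sweeps for several positions.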
A Fast Multipole Boundary Element Method for Calculating HRTFs
Chen, Zhensheng; Kreuzer, Wolfgang
The shape of the head, the torso, and especially the pinna plays an important role in localizing sound sources. Reflections and diffraction, especially at the pinna, act as filters, the so-called head-related transfer functions (HRTFs), which depend on the frequency and location of the sound source. To simulate these functions numerically, a 3D boundary element model of the head was created. Because HRTFs are to be calculated up to 12000 to 13000 Hz, a fine mesh is necessary, which would require a huge amount of computation. To avoid this high computational cost, the BEM model was coupled with an appropriate fast multipole algorithm.
A Hybrid Artificial Reverberation Algorithm
Murphy, Damian; Stewart, Rebecca
Convolution reverberation allows an accurate reproduction of a space but offers no flexibility in defining that space, while filterbank-based reverberation offers computational efficiency and flexibility but lacks absolute accuracy. A hybrid artificial reverberation algorithm that uses elements of both convolution and filterbank reverberation is investigated. An impulse response is truncated to contain only the early reflections and is convolved with the input audio; the output is then combined with audio processed through a filterbank that simulates the late reverberation. The parameters defining the filterbank are derived from the impulse response being analyzed. It is shown that this hybrid reverberator can produce high-quality reverberation comparable to that of convolution reverberators.
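The hybrid structure splits the impulse response into an early part, which is convolved exactly, and a late part, which is synthesized from parameters measured on the response. In this sketch, exponentially decaying noise is swapped in for the paper's filterbank. Its decay rate is fitted to the Schroeder energy decay curve of the measured tail, and its energy is matched to that tail. The 80 ms split point, the -30 dB fit range, and the function name are arbitrary assumptions.

```python
import numpy as np


def hybrid_reverb(x, ir, fs, split_seconds=0.08, seed=0):
    """Early part by convolution, late part by a synthetic decaying-noise tail.

    Returns the output and the decay slope (dB/s) estimated from the measured tail.
    """
    split = int(split_seconds * fs)
    early, tail = ir[:split], ir[split:]
    # Schroeder backward integration gives the energy decay curve of the tail.
    edc = np.cumsum(tail[::-1] ** 2)[::-1]
    edc_db = 10 * np.log10(edc / edc[0] + 1e-30)
    t = np.arange(len(tail)) / fs
    usable = edc_db > -30.0                   # fit only the upper 30 dB
    slope, _ = np.polyfit(t[usable], edc_db[usable], 1)
    # Noise whose energy decays at the same rate, scaled to the tail energy.
    rng = np.random.default_rng(seed)
    synth = rng.standard_normal(len(tail)) * 10 ** (slope * t / 20)
    synth *= np.sqrt(np.sum(tail ** 2) / np.sum(synth ** 2))
    late_ir = np.concatenate([np.zeros(split), synth])
    out = np.convolve(x, late_ir)
    out[: len(x) + split - 1] += np.convolve(x, early)
    return out, slope


fs = 8000
rng = np.random.default_rng(9)
t_ir = np.arange(int(0.8 * fs)) / fs
measured_ir = rng.standard_normal(len(t_ir)) * 10 ** (-3 * t_ir / 0.5)  # RT60 = 0.5 s
dry = rng.standard_normal(200)
wet, slope = hybrid_reverb(dry, measured_ir, fs)
```

Only the decay slope and energy of the late part are kept, so these parameters could be edited (for example, to lengthen the reverberation) without re-measuring the space. This is the flexibility the abstract attributes to the filterbank half.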
On Development of New Audio Codecs
This contribution presents work recently completed or ongoing in 3GPP and ITU-T on the development of new audio codecs. The main applications are wideband speech telephony, audio conferencing, and mobile multimedia applications, including Packet-Switched Streaming (PSS), Multimedia Messaging (MMS), and Multimedia Broadcast/Multicast Service (MBMS). In the standardization process, terms of reference describing design constraints and performance requirements, test plans, and selection rules are finalized first. Next, extensive subjective listening tests are conducted. The codec is selected based on the selection test results and the selection rules. A characterization phase of testing then provides complete information on the codec's behavior.
Fixed-Point Processing Optimization of MPEG Audio Encoder Using Statistical Model
Lee, Keun-Sup; Park, Young-Cheol; Youn, Dae-Hee
Audio applications for portable devices face two critical restrictions: small size and low power consumption. Fixed-point implementations are therefore essential for such applications. Even with a fixed-point processor, however, the data width is still an issue because it affects both hardware cost and power consumption. In this paper, we propose a statistical model for the MPEG AAC audio encoder that determines the optimal precision for the implementation. Compared with the floating-point system, hardware with this optimal precision is intended to produce only perceptually insignificant errors at its output. To determine the optimal precision for the AAC encoder, we use the statistical model to estimate the maximum allowable amount of fixed-point arithmetic error in the bit-allocation process. Finally, we present an architecture suited to encoding audio signals with minimal fixed-point processing errors. Tests showed that the fixed-point system optimized using the proposed model had sound quality comparable to that of the floating-point encoding system.
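The trade-off that the statistical model optimizes is between data width and arithmetic error. For rounding to a fixed-point grid, each extra fractional bit buys about 6 dB of signal-to-error ratio. This sketch only measures that relation on a test tone. It does not reproduce the paper's model of allowable error in the bit-allocation process, and the function names are hypothetical.

```python
import numpy as np


def to_fixed_point(x, frac_bits):
    """Round to a signed fixed-point grid in [-1, 1) with `frac_bits` fractional bits."""
    scale = 2.0 ** frac_bits
    return np.clip(np.round(x * scale), -scale, scale - 1) / scale


def snr_db(reference, approx):
    """Signal-to-error ratio of approx relative to reference, in dB."""
    error = reference - approx
    return 10 * np.log10(np.sum(reference ** 2) / np.sum(error ** 2))


signal = 0.5 * np.sin(2 * np.pi * 0.01234 * np.arange(10000))
snr_by_width = {bits: snr_db(signal, to_fixed_point(signal, bits)) for bits in (8, 12, 16)}
```

Four extra bits therefore raise the SNR by roughly 24 dB. An optimal-precision design chooses the smallest width whose error stays perceptually insignificant.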
Enhanced Bass Reinforcement Algorithm for Small-Sized Transducer
Arora, Manish; Chung, Chiho; Jang, Seongcheol; Moon, Han-gil
Mobile devices such as cell phones and MP3 players commonly use small-sized speaker systems to deliver sound to users. Small transducers are used mainly because of restrictions imposed by the design and size of the devices. Unfortunately, this design and size prevent the transducers from delivering high-quality low-frequency performance. To break through this physical barrier of poor low-frequency reproduction, a well-known psychoacoustic phenomenon, the “missing fundamental” illusion, is exploited. In this paper, a method of enhancing bass perception using virtual pitch is presented. With the presented method, listeners perceive deep bass with fewer artifacts.
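The missing-fundamental illusion works because a set of harmonics of f0 repeats with period 1/f0 even when no energy is present at f0 itself. This sketch shows that property for a 50 Hz fundamental reproduced only through its 2nd to 4th harmonics. A practical bass enhancer derives such harmonics from the program signal, for example by nonlinear processing, and that step is not shown. The function name and parameter values are assumptions.

```python
import numpy as np


def harmonic_complex(f0, harmonics, fs, duration):
    """Equal-amplitude sum of sinusoids at the given integer multiples of f0."""
    t = np.arange(int(round(duration * fs))) / fs
    return sum(np.sin(2 * np.pi * h * f0 * t) for h in harmonics) / len(harmonics)


fs = 8000
f0 = 50.0                                        # bass note a small speaker cannot reproduce
tone = harmonic_complex(f0, (2, 3, 4), fs, 0.2)  # 100, 150 and 200 Hz only
period = int(fs / f0)                            # 160 samples
spectrum = np.abs(np.fft.rfft(tone))             # 5 Hz bins over 0.2 s
```

The waveform repeats every 20 ms, which the auditory system interprets as a 50 Hz pitch, while the spectrum has no component at 50 Hz for the small transducer to reproduce.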
Subordinate Audio Channels
Jackson, Tim D.; Li, Francis; Yates, Keith
In this paper, we propose a model for a backwards-compatible subordinate audio channel within a host digital audio signal. Embedding and extraction methods are presented, and objective-perceptual assessment results are reported. The method is designed to minimize perceptual degradation of the host signal and to maintain compatibility with existing systems. The implementations use the Discrete Cosine Transform and the masking properties of the human auditory system. Performance is assessed using the objective-perceptual measure Objective Difference Grade (ODG). The test results indicate that both the host and the subordinate audio channels can maintain good audio fidelity without significant perceptual degradation.
Room Equalization Based on Acoustic and Human Perceptual Features
Hasegawa-Johnson, Mark; Kim, Lae-Hoon; Lim, Jun-Seok; Sung, Koeng-Mo
Room equalization has the potential to improve audio display in homes, cars, and professional applications. In this paper, the signal is inverse filtered using a filter computed from newly introduced regularized, optimal multi-point, frequency-warped linear prediction coefficients. We present experimental results showing that the proposed room equalization algorithm improves equalization in the equalizable parts, thus enlarging the region of perceptually effective equalization.
Parametric Loudspeaker Equalization. Results and Comparison with Other Methods
López, Jose J.; Ramos, German
The results obtained with a loudspeaker equalization method are presented and compared with those of other equalization methods. The main characteristic of the proposed method is that the equalizer structure is planned from the outset as a chain of SOS (Second Order Sections), where each SOS is a low-pass, high-pass or parametric filter defined by its parameters (frequency, gain and Q) and designed by a direct search method. This filter structure, combined with the subjectively motivated error function employed, yields better results from a subjective point of view at a lower computational cost. The results have been compared with different FIR (Finite Impulse Response) and IIR (Infinite Impulse Response) filter design methods, with and without warped structures. In all cases, for the same computational cost, the presented method obtains a lower error function value.
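Each section in such a chain is a biquad defined by frequency, gain and Q. As a minimal sketch, assuming the widely used peaking-filter formulas from R. Bristow-Johnson's "Audio EQ Cookbook" (not necessarily the parameterization used in the paper, whose direct-search design is not reproduced here), the following computes one peaking SOS and the magnitude response of a cascade of sections. The function names and the example frequencies, gains and Q values are hypothetical.

```python
import numpy as np

def peaking_sos(f0, gain_db, q, fs):
    """Peaking-EQ biquad (Audio EQ Cookbook form), normalized so that a[0] == 1.
    Returns (b, a) for H(z) = (b0 + b1 z^-1 + b2 z^-2) / (1 + a1 z^-1 + a2 z^-2)."""
    A = 10 ** (gain_db / 40)
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2 * q)
    b = np.array([1 + alpha * A, -2 * np.cos(w0), 1 - alpha * A])
    a = np.array([1 + alpha / A, -2 * np.cos(w0), 1 - alpha / A])
    return b / a[0], a / a[0]

def cascade_response_db(sections, freqs, fs):
    """Magnitude response (dB) of a chain of second-order sections at the given frequencies."""
    zinv = np.exp(-2j * np.pi * np.asarray(freqs, dtype=float) / fs)  # z^-1 on the unit circle
    h = np.ones_like(zinv)
    for b, a in sections:
        h *= (b[0] + b[1] * zinv + b[2] * zinv**2) / (a[0] + a[1] * zinv + a[2] * zinv**2)
    return 20 * np.log10(np.abs(h))

fs = 48000
chain = [peaking_sos(150, -4.0, 2.0, fs), peaking_sos(3000, 3.0, 0.7, fs)]
print(cascade_response_db(chain, [0, 150, 3000, 20000], fs))
```

With this form, each peaking section has exactly its nominal gain at its centre frequency and 0 dB at DC and Nyquist, so a direct search over (frequency, gain, Q) per section stays easy to interpret.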
A Zero-Pole Vocal Tract Model Estimation Method Accurately Reproducing Spectral Zeros
Balazs, Peter; Marelli, Damián
Model-based speech coding models the vocal tract as a linear time-variant system. The all-pole model produced by the computationally efficient Linear Predictive Coding method provides a good representation for the majority of speech sounds. However, nasal and fricative sounds, as well as stop consonants, contain spectral zeros, which require a zero-pole model. Roughly speaking, a typical zero-pole model estimation method makes a non-parametric estimate of the vocal tract impulse response and tunes the zero-pole model to fit this estimate in a least-squares sense. In this paper we propose an alternative strategy. We tune the zero-pole model to directly fit the power spectrum of the speech signal on a logarithmic scale, consistent with the way the human ear perceives sounds. In this way, we avoid the error introduced by estimating the vocal tract impulse response and obtain a model that reproduces spectral zeros more accurately on a logarithmic scale. A drawback of the proposed method, however, is its computational complexity.
Artificial Speech Synthesis Using LPC
Kadaba, Manjunath D
This work uses the Linear Predictive Coding (LPC) technique, one of the most useful methods for encoding good-quality speech at a low bit rate. LPC provides extremely accurate estimates of speech parameters and is relatively efficient to compute. The key element of LPC is the linear predictive filter, which predicts the value of the next sample as a linear combination of previous samples.
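The prediction step can be made concrete. The following is a minimal sketch, assuming the common autocorrelation method with the Levinson-Durbin recursion (the abstract does not say how the predictor coefficients are computed); `lpc` and `predict_next` are hypothetical helper names. Coefficients follow the convention x[n] ≈ a[0]·x[n-1] + ... + a[p-1]·x[n-p].

```python
import numpy as np

def lpc(x, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.
    Returns (a, err): predictor coefficients and the final prediction-error energy."""
    x = np.asarray(x, dtype=float)
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order)
    err = r[0]
    for i in range(order):
        # Reflection coefficient for stage i+1, using the current a[0..i-1].
        k = (r[i + 1] - np.dot(a[:i], r[i:0:-1])) / err
        a_prev = a[:i].copy()
        a[:i] = a_prev - k * a_prev[::-1]   # update lower-order coefficients
        a[i] = k
        err *= 1 - k * k                    # error energy shrinks at each stage
    return a, err

def predict_next(history, a):
    """Predict the next sample from the newest len(a) samples (history[-1] is newest)."""
    past = np.asarray(history, dtype=float)[::-1][: len(a)]  # x[n-1], x[n-2], ...
    return float(np.dot(a, past))

# Usage: recover the coefficients of a synthetic AR(2) signal.
rng = np.random.default_rng(0)
e = rng.standard_normal(20000)
x = np.zeros_like(e)
for n in range(2, len(x)):
    x[n] = 1.5 * x[n - 1] - 0.7 * x[n - 2] + e[n]
a, err = lpc(x, order=2)
print(a, predict_next(x[-2:], a))
```

For speech, LPC is applied frame by frame (typically with windowing and a higher order); the AR(2) signal above only verifies that the recursion recovers known coefficients.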
Some Effects of the Torso on Head-Related Transfer Functions
Huttunen, Tomi; Kärkkäinen, Asta; Kärkkäinen, Leo; Kirkeby, Ole; Seppälä, Eira T.
A numerical method based on the ultra-weak variational formulation (UWVF) is used to calculate three sets of Head-Related Transfer Functions (HRTFs). The three sets are made by combining a hard head with a hard torso, a moderately absorbing torso, and no torso. Each set is sampled every 50 Hz from DC to 24 kHz at 21,872 points almost evenly distributed in the far field, providing a spatial resolution of approximately one degree everywhere. Because the results of the numerical simulations are not contaminated by the response of an electro-acoustic chain, the HRTFs of a head and torso can be compared with those of the head alone without the risk of interpreting a measurement artifact as a physical phenomenon.
An Investigation Into Head Movements Made When Evaluating Various Attributes of Sound
Brookes, Tim; Kim, Chungeun; Mason, Russell
This research extends the study of head movements during listening by including various listening tasks in which listeners evaluate spatial impression and timbre, in addition to the more common task of judging source location. Subjective tests were conducted in which the listeners were allowed to move their heads freely whilst listening to various types of sound and were asked to evaluate source location, apparent source width, envelopment, and timbre. The head movements were recorded with a head tracker attached to the listener’s head. From the recorded data, the maximum range of movement, mean position and speed, and maximum speed were calculated along each axis of translational and rotational movement. The effects of various independent variables, such as the attribute being evaluated, the stimulus type, the repetition number, and the simulated source location, were examined through statistical analysis. The results showed that, whilst there were differences between the head movements of individual subjects, across all listeners the range of movement was greatest when evaluating source width and envelopment, smaller when localising sources, and smallest when judging timbre. In addition, the range and speed of head movement were reduced for transient signals compared to longer musical or speech phrases. Finally, in most cases when spatial attributes were being judged, the head moved toward the source direction.
Binaural Resynthesis for Comparative Studies of Acoustical Environments
Lindau, Alexander; Hohn, Torben; Weinzierl, Stefan
A framework for comparative studies of binaurally resynthesized acoustical environments is presented. It consists of a software-controlled, automated head and torso simulator with multiple degrees of freedom, an integrated measurement device for acquiring binaural impulse responses at high spatial resolution, head-tracked real-time convolution software capable of rendering multiple acoustic scenes at a time, and a user interface for conducting listening tests according to different test designs. Methods to optimize the measurement process are discussed, as well as different approaches to data reduction. Results of a perceptual evaluation of the system are shown, in which acoustical reality and the binaural resynthesis of an acoustic scene were confronted in direct A/B comparison. The framework makes it possible, for the first time, to study the perception of a listener instantaneously relocated to different binaurally rendered acoustical scenes.
Acoustic Factors of Auditory Distance Perception by the Blind While Walking
Ifukube, Tohru; Ino, Shuichi; Miura, Takahiro; Muraoka, Teruo
The ability of blind people to recognize surrounding objects solely by hearing is called "obstacle sense". By analyzing and modeling its mechanism, the authors aim to use the model to realize an acoustic VR environment as well as training systems for the vision-impaired. In this study, the authors focused particularly on the acoustic factors that may contribute to perceiving the distance from the subject to an obstacle, especially while walking. These factors were investigated using psychophysical experiments and acoustical analysis methods. In addition, the authors discuss the contribution of these factors to auditory distance perception by the blind.
Listener Loudspeaker Preference Ratings Obtained In Situ Match those Obtained Via a Binaural Room Scanning Measurement and Playback System
Martens, William L.; Olive, Sean E.; Welti, Todd
Binaural room scanning (BRS) is a method of capturing and storing the binaural room impulse response of one or more loudspeakers in a listening room, and of reproducing it via a head-tracking headphone display system. This paper reports the results of the first in a series of validation tests of a custom BRS system developed for research on and evaluation of different loudspeakers and listening spaces. The test examined whether listeners’ loudspeaker preference ratings made in a listening room with reflective walls (in situ) were comparable to ratings made in response to BRS reproductions of those loudspeakers located in the same room. Virtually the same results were obtained in both cases.
Perceptually-Motivated Audio Morphing: Brightness
Brookes, Tim; Williams, Duncan
A system for morphing the brightness of two sounds independently of their other perceptual or acoustic attributes was implemented, based on the Spectral Modelling Synthesis additive/residual model. A Multidimensional Scaling analysis of listener responses showed that the brightness control was perceptually independent of the other controls used to adjust the morphed sound. A Timbre Morpher, which would adjust additional timbral attributes with perceptually meaningful controls, can now be considered for further work.
5.1 Radio - Too Much Too Soon?
5.1 multichannel audio is arguably the next natural progression for radio. Although digital radio offered listeners an incremental improvement over stereo, there has not been a fundamental change in radio since the migration from AM to FM. For radio to survive in a highly competitive environment, with an audience that is increasingly discerning about delivery media and content, the radio industry needs to embrace 5.1. This paper explores the principles vital to the success of multichannel audio for radio and the enabling technology, and outlines various projects.
Wide Listening Area with Exceptional Spatial Sound Quality of a 22.2 Multichannel Sound System
Hamasaki, Kimio; Nakayama, Yasushige; Nishiguchi, Toshiyuki; Okumura, Reiko
While issues regarding the sweet spot in 5.1 surround sound have been discussed, a 22.2 multichannel sound system has been developed for ultrahigh-definition TV. One of its features is an expanded listening area with exceptional sound quality. Although the width of this listening area was reported in previous papers, it was evaluated using only sound clips of a symphony orchestra without pictures. Therefore, subjective evaluations comparing the impression of various spatial attributes at different listening positions were performed using content with pictures in both large and small rooms. These evaluations demonstrate that, with the 22.2 multichannel sound system, viewers have better impressions of various spatial attributes over a wider listening area than with other sound systems.
Up-Mixing and Localisation - Localisation Performance of Up-Mixed Consumer Multichannel Formats
Chaffey, Richard; Shirley, Ben G.
A number of listening tests were carried out to assess the localisation of sound in derived surround sound fields. Two up-mixed consumer multichannel formats that use matrix decoding of 3/2 multichannel surround channels to extend the surround channel array (Dolby Pro Logic IIx and DTS Neo:6) were compared with original 3/2 multichannel material to determine the degree of improvement in spatial performance. Noise bursts were panned to 11 different locations, 13 subjects participated in the tests, and the results were analysed to assess any improvement in localisation in each of the surround systems assessed.
Intelligent Editing of Studio Recordings with the Help of Automatic Music Structure Extraction
Fazekas, György; Sandler, Mark
In a complex sound editing project, automatic exploration and labelling of the semantic music structure can be highly beneficial as a creative aid. This paper describes the development of new tools that allow the engineer to navigate around the recorded project using a hierarchical music segmentation algorithm. Segmentation of musical audio into intelligible sections such as chorus and verses is discussed, followed by a short overview of a novel segmentation approach based on a timbre-based music representation. Popular sound-editing platforms were investigated to find an optimal way of implementing the necessary features. The integration of music segmentation and the development of a new navigation toolbar in Audacity, an open-source multi-track editor, are described in more detail.
Constant Complexity Reverberation for Any Reverberation Time
May, Tobias; Schobben, Daniël
A new artificial reverberation system is proposed which is based on perceptually relevant components in reverberated audio and as such allows for a very efficient implementation. The system first separates the signal into transient and steady-state components. The transient signal is reverberated by using an efficient time-varying recursive filter while the steady-state signal is processed separately with an all-pass filter. In contrast to common reverberation systems, the complexity of the recursive filter is determined solely by the duration of the transients and is therefore independent of the reverberation time.
Outdoor and Indoor Recording for Motion Picture. A Comparative Approach on Microphone Techniques.
Goussios, Christos A.; Kalliris, George M.; Sevastiadis, Christos V.
Several recording techniques and types of equipment are used in outdoor and indoor recording for motion pictures. The choices are usually driven by subjectivity and by technical limitations unrelated to the desired final sound quality. Our goal is to present the results of comparative recordings in order to answer problems that arise in everyday practice. Overhead and underneath booming and the use of wireless microphones are compared through time and frequency analysis.
Semi-Automatic Mono to Stereo Up-Mixing Using Sound Source Formation
Lagrange, Mathieu; Martins, Luis Gustavo; Tzanetakis, George
In this paper, we propose an original method to include spatial panning information when converting monophonic recordings to stereophonic ones. Sound sources are first identified using perceptually motivated clustering of spectral components. Correlations between these individual sources are then identified to build a mid-level representation of the analysed sound. This allows the user to define panning information for the major sound sources, thus enhancing the stereophonic immersion quality of the resulting sound.
Verbal Elicitation and Scale Construction for Evaluating Perceptual Differences between Four Multichannel Microphone Techniques
Kim, Sungyoung; Martens, William L.
A verbal elicitation task using triadic comparisons was completed by eight listeners to explore the adjectives that describe the audible differences between solo piano performances captured using four different multichannel microphone techniques. Although the elicited terms differed somewhat between listeners, a set of five bipolar adjective pairs was found to represent the most salient differences between the auditory imagery associated with multichannel-loudspeaker reproduction of the piano performances. These adjectives were used as the anchors for five attribute rating scales on which the same eight listeners rated each of the 32 stimuli that had been presented for triadic comparison. Stepwise multiple regression showed that ratings on three of the five attributes successfully predicted those listeners' preference ratings for the same stimuli.
Broadcast Loudness: Mixing, Monitoring and Control
In the satellite broadcast era, managing transmission from a digital platform that broadcasts dozens of channels while guaranteeing consistent loudness to viewers has become a primary, yet difficult, task. In this paper the author describes his work in balancing the broadcast loudness of the Italian satellite TV platform "SKY Italia". For a few years, its transmissions suffered from very audible loudness inconsistency, due to several factors such as the number of channels, the variety of content offered, different mastering levels between programs and interstitials, and productions outsourced to many external facilities. The project lasted for over one year and resulted in much improved audio quality and more consistent loudness across all the channels involved.
Sound Levels of TV Advertisements Relative to the Adjacent Programs and Cross-National Comparison of the Way of their Insertion into Programs
Kimura, Akiko; Miyasaka, Eiichi
Sound levels of TV advertisements (CMs) were measured relative to those of the adjacent programs, because CMs do not exist independently of the programs around them. The results show that the average sound levels of about 20% of the CMs were above -5 dB, with small dynamic ranges of less than 10 dB. In terrestrial and BS digital broadcasting, 57% and 84% of CMs, respectively, had higher sound levels than the main programs. Next, perceptual experiments were performed for 3 types of TV CMs with different sound levels, showing that all CMs sounded louder than a reference speech sample irrespective of large sound level differences. Finally, the way CMs are inserted into programs was compared among Japan, the UK and the US.
Influence of Interaction on Perceived Quality in Audio Visual Applications: Subjective Assessment with n-Back Working Memory Task, II
Reiter, Ulrich; Weitzel, Mandy
The mechanisms of human audio visual perception are not yet fully understood. For interactive audio visual applications running on devices with limited computational power, it is desirable to know which of the stimuli rendered in an audio visual room simulation have the greatest impact on the perceived quality of the system. We have conducted experiments to determine the effect of interaction on the precision with which test subjects can discriminate between different parameter values of auditory attributes. This paper details one of these experiments and compares different approaches to analyzing the obtained data. The results show a noticeable bias toward faulty ratings while subjects are involved in a task, although analyses using significance tests do not completely confirm this effect.
On the Audibility of Comb Filter Distortions
Brunner, Stefan; Maempel, Hans-Joachim; Weinzierl, Stefan
Superpositions of delayed and undelayed versions of the same signal can occur at different stages of the audio transmission chain. They often result from multiple microphone signals, sound reflections from walls, or latencies in digital signal processing, and they lead to comb-filter-shaped linear distortions. Measuring the hearing threshold for this type of distortion, and its dependence on reflection delay, relative level and type of audio content, can provide a basis for limits in everyday recording practice below which undesired timbral distortions can be neglected. Therefore, listening tests were conducted to determine the just noticeable difference for three stimulus categories and different time delays between direct and delayed signal, from 0.1 ms to 15 ms (equivalent to 0.03 to 5.15 m of sound path difference), and to reveal the underlying sound qualities. The results show that comb-filter distortions can still be audible when the level of the first reflection is more than 20 dB below the level of the direct sound.
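The numbers in this abstract follow from a simple model. As a hedged sketch, assume the superposition y(t) = x(t) + g·x(t - τ) with relative level L = 20·log10(g) dB, giving |H(f)| = |1 + g·exp(-j2πfτ)|: notches fall at odd multiples of 1/(2τ), and the peak-to-notch ripple is 20·log10((1 + g)/(1 - g)). The speed of sound of 343 m/s is an assumption (it reproduces the quoted 0.03 to 5.15 m range), the helper names are illustrative, and the paper's measured thresholds are not modeled here.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 degrees C

def path_difference_m(delay_s, c=SPEED_OF_SOUND):
    """Extra path length that produces a given delay between direct and delayed sound."""
    return c * delay_s

def comb_notches_hz(delay_s, fmax):
    """Notch frequencies of y(t) = x(t) + g*x(t - delay) up to fmax: (2m + 1) / (2*delay)."""
    notches = []
    m = 0
    while True:
        f = (2 * m + 1) / (2 * delay_s)
        if f > fmax:
            return notches
        notches.append(f)
        m += 1

def ripple_db(rel_level_db):
    """Peak-to-notch ripple of |1 + g*exp(-j*2*pi*f*delay)| for g = 10**(rel_level_db / 20)."""
    g = 10 ** (rel_level_db / 20)
    if g >= 1:
        raise ValueError("the delayed signal must be weaker than the direct signal")
    return 20 * math.log10((1 + g) / (1 - g))

for delay in (0.1e-3, 15e-3):
    print(f"{delay * 1e3:.1f} ms -> {path_difference_m(delay):.4f} m")
print(comb_notches_hz(1e-3, 3000))   # first notches for a 1 ms delay
print(f"{ripple_db(-20.0):.2f} dB")  # ripple for a reflection 20 dB below the direct sound
```

Under these assumptions, a reflection 20 dB below the direct sound still produces about 1.7 dB of peak-to-notch ripple, which gives a sense of how small the spectral change is at the level where the paper reports comb filtering can remain audible.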
VirtualPhone - A Rapid Virtual Audio Prototyping Environment
As the complexity of mobile phones increases with the evolution of digital convergence, there is increasing demand to ensure high audio quality for all applications. VirtualPhone is a software environment with a graphical user interface that allows rapid prototyping of mobile phone audio and its subsequent calibrated auralisation. This paper describes the framework of the VirtualPhone application and illustrates its usage and its performance compared with other conventional prototyping schemes.
Evaluation of HE-AAC, AC-3, and E-AC-3 Codecs
Gaston, Leslie; Sanders, Richard
The Recording Arts Program at the University of Colorado at Denver and Health Sciences Center (UCDHSC) performed an independent evaluation of three audio codecs: Dolby Digital (AC-3 at 384 kbps), Advanced Audio Coding Plus (HE-AAC at 160 kbps), and Dolby Digital Plus (E-AC-3 at 224 and 200 kbps). During the summer of 2006, UCDHSC performed double-blind listening tests that adhered to ITU-R BS.1116 (which provides guidelines for multi-channel critical listening tests). The results show a clear distinction between the AC-3 codec and the others tested. This paper presents the test procedures and findings.
Perceptual Evaluation of Mobile Multimedia Loudspeakers
An experiment was conducted to compare the perceptual characteristics of stereo loudspeaker systems found in mobile multimedia devices. A descriptive sensory analysis was performed using an individual vocabulary development approach. Ten systems and five musical programs were selected for this study. Sixteen listeners developed their own set of attributes in three hours and performed a comparative evaluation of the ten systems for several program items using the attribute scales they developed. A total of 111 attributes was generated in this experiment, which could be divided into several perceptual groups relating to loudness, spatial and timbral aspects, sound disturbance and sound articulation. The principle of this individual vocabulary profiling method is described and some results of the perceptual evaluation are presented.
A Rapid Listening Test Environment - Helping Managers Make Better Decisions
As the complexity of mobile phones increases with the evolution of digital convergence, there is increasing demand to ensure high audio quality for all applications. This paper presents a set of software applications that allow the rapid definition, administration, analysis and reporting of listening tests without extensive technical knowledge of the field. A thorough description of the concepts behind the client/server architecture is presented, followed by some example applications. Lastly, the performance of listening tests carried out using more traditional methods is compared with that of the presented method.
EBU Tests of Multi-Channel Audio Codecs
Kozamernik, Franc; Marston, David; Mason, Andrew; Stoll, Gerhard
The latest project of one of the European Broadcasting Union technical groups has been the assessment of the sound quality of multichannel audio bit rate reduction codecs for broadcast applications. Codecs from Dolby and DTS, implementations of MPEG AAC, and the new MPEG Surround codec were tested. The bit rates ranged from 64 kbit/s to 1.5 Mbit/s. The choice of test method, the selection of test material and the analysis of results are described. Consistently high quality at a relatively low bit rate, such as 256 kbit/s, has not yet been achieved, and some material still demands at least 448 kbit/s. It was also observed that later versions of some codecs perform worse than earlier ones.
The Design and Analysis of First Order Ambisonic Decoders for the ITU Layout
Moore, David; Wakefield, Jonathan
Ambisonic decoders for irregular layouts can be designed using heuristic search algorithms. These methods provide an alternative to solving complex mathematical equations. New fitness function objectives for search algorithms are presented which ensure that derived decoders meet the requirements of the Ambisonic system more closely than in previous work. The resulting new decoder coefficients are compared to other published coefficients, and a detailed performance analysis of first order decoders for the ITU layout is given. This analysis highlights poor performance characteristics that these decoders have in common. Proposed future work will attempt to address these issues by looking at techniques for producing decoders with a more even error distribution around the listener and by investigating methods for removing the bias towards meeting certain objectives.
A New Digital Module for Variable Acoustics and Wave Field Synthesis: Design and Applications
de Vries, Diemer; van den Heuvel, At; van Dorp Schuitman, Jasper
A new digital module has been developed that creates variable acoustics for multi-purpose halls according to the Acoustic Control Systems (ACS) concept. In addition, it can generate wave fields according to the Wave Field Synthesis (WFS) concept. The design concepts and criteria, the technical specifications and some first applications of the module will be explained and discussed during the presentation.
Artificial Reverberator with Location Control in Multi-Channel Recording
Seo, Jeong-Hun; Shim, Hwan; Sung, Koeng-Mo
In this paper, a novel artificial reverberator is proposed. Unlike conventional algorithms, which focus on adding appropriate timbre and reverberance, the proposed algorithm is designed to produce realistic reverberation by controlling the location of each sound source. It controls perceived direction by panning the direct sound, and controls perceived distance by adjusting the gain of the direct sound and the energy decay curve of the reverberation, which is obtained by a location-clustering method. In addition, the algorithm enhances Listener Envelopment (LEV) by making the late reverberation incoherent among channels.
Spatial Audio Rendering using Sparse and Distributed Arrays
de Bruijn, Werner; Härmä, Aki; van de Par, Steven
A widely distributed multi-channel audio reproduction system can be used to create dynamic spatial effects for various entertainment and communication applications. In this paper we focus on the follow-me audio effect, in which the sound source appears to move with an observer who is walking through a hallway or going from one room to another in the home. We give an overview of array theory for sparse distributed loudspeaker systems and study the binaural properties of the sound field rendered with a sparse line array. Finally, we compare two different dynamic rendering techniques in a new type of listening test.
Magic Arrays – Multichannel Microphone Array Design Applied to Microphone Arrays Generating Interformat Compatibility
This paper describes the principles and design procedure of Multi-format-compatible Microphone Arrays for a range of different segment coverage angles and for Omnidirectional, Hypocardioid, Cardioid and Supercardioid microphones. At present, the only practical solution for the main microphone array in a multiple-format recording is to use a different microphone array for each required format. This paper shows how this jungle of main microphone arrays can be replaced by a single 5-channel microphone array that supplies signals directly compatible with five standard formats: mono, two- and three-channel “stereo”, four-channel “quadraphony”, and “multichannel” with the full five channels. The specific reproduction format can be chosen either during the production process, according to the desired release media, or by the consumer from a multichannel media product, according to their own listening configuration.
The Relation between Active Radiating Factor and Frequency Responses of Loudspeaker Line Arrays
An, Kang; Ou, Dayi; Shen, Yong
Active Radiating Factor (ARF) is an important parameter for analyzing a loudspeaker line array when the gaps between adjacent radiating transducers are taken into account. The relation between the ARF of the loudspeaker line array and the differential chart of its frequency responses at two distances (FRD) is analyzed. Several useful conclusions about ARF and FRD are drawn, and a method for estimating ARF from measured frequency responses is derived.
Alternative Encoding Techniques for Digital Loudspeaker Arrays
Kontomichos, Fotios; Mourjopoulos, John; Tatlas, Nicolas-Alexander
Recent developments in digital loudspeakers have led to the introduction of Digital Transducer Arrays (DTAs). In most implementations, DTA loudspeakers are driven by PCM-encoded audio signals, usually resampled and requantised to an appropriate number of bits according to the number of transducers in the DTA topology. However, given that DTAs generally increase harmonic distortion, especially at off-axis listening positions, optimization of the signal encoding and of the bit-to-transducer assignment is necessary. Here, a number of novel alternative strategies are examined, concerning input signal encoding via PCM-to-PWM conversion as well as techniques for assigning bits to the transducers of a DTA. These tests are supported by simulation results and comparisons between the alternative methods for different operating parameters.
Online Identification of Linear Loudspeaker Parameters
Pedersen, Bo Rohde; Rubak, Per
Feed-forward nonlinear error correction of loudspeakers can improve sound quality. A realistic feed-forward strategy requires identification of the loudspeaker parameters. The compensator strategy assumes that the nonlinear behaviour of the loudspeaker drifts relatively little, so that only the linear loudspeaker parameters must be identified. In music systems, this can be done with online, transducer-less system identification using the voice coil current as feedback from the loudspeaker (plant). This is investigated in a simulation study aimed at finding useful system identification algorithms. Two different identification techniques (ARMA and FIR) are compared. The stability of the nonlinearities and of the linear loudspeaker parameters is tested in a series of measurements.
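FIR-based identification of this kind is often done with an adaptive filter. The sketch below is a generic normalized-LMS (NLMS) identifier, offered only as an illustration: the abstract does not specify the adaptation algorithm, and the signal roles (drive signal `u`, measured voice-coil current `y`), the step size and the tap count are assumptions. In the usage lines, a hypothetical 3-tap plant stands in for the loudspeaker.

```python
import numpy as np

def nlms_identify(u, y, n_taps, mu=0.5, eps=1e-8):
    """Adapt FIR weights w so that w @ [u[n], u[n-1], ...] tracks y[n] (normalized LMS)."""
    w = np.zeros(n_taps)
    buf = np.zeros(n_taps)              # most recent input samples, newest first
    for n in range(len(u)):
        buf = np.roll(buf, 1)
        buf[0] = u[n]
        err = y[n] - w @ buf            # a-priori error against the measured response
        w += mu * err * buf / (eps + buf @ buf)
    return w

rng = np.random.default_rng(0)
u = rng.standard_normal(5000)                 # excitation (stand-in for program material)
h_true = np.array([0.5, -0.3, 0.1])           # hypothetical plant impulse response
y = np.convolve(u, h_true)[: len(u)]          # noiseless "measured" response
print(nlms_identify(u, y, n_taps=3))
```

Normalizing the step by the input energy keeps adaptation speed roughly independent of the programme level, which matters when the excitation is music rather than a test signal; an ARMA (pole-zero) model would need a different update rule.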
Finite Element Analysis of Near Field Beam Forming in Safety Relevant Work Spaces
Beigelbeck, Roman; Pichler, Heinrich
Because of their unique features, loudspeaker arrays are an interesting alternative to standard loudspeaker setups or headphone-based solutions in security-relevant workspaces such as air traffic control rooms. Consequently, near-field beam forming in small spaces plays an important role in this field of application. In this contribution, the sound design based on a set of loudspeaker arrays, including their interaction with a typical air traffic control room infrastructure, is investigated by means of finite element modelling. Guided by these results, optimized array parameters can be determined. Representative three-dimensional near-field directional diagrams in front of the arrays are shown to visualize the sound field in different cases. Finally, these theoretical values are compared with practical results.
Creating Directed Microphones from Undirected Microphones
Milanov, Emil; Milanova, Elena
In this article we examine the possibility of creating directed microphones from undirected microphones. The result is achieved using acoustical elements only and is valid for all types of microphones, regardless of their operating principle (electrodynamic, condenser, optical, electromechanical, etc.). By alternating how the membrane is used, a force is obtained that is equivalent to the effect of an undirected and a bidirectional (figure-of-eight) microphone operating simultaneously. The result is a microphone whose directional characteristic is equivalent to the Pascal curve (limaçon of Pascal), i.e., a directed microphone with one of the traditional directional characteristics: cardioid, supercardioid or hypercardioid. The shape of the directional characteristic is close to the theoretical one and does not depend on the acoustic elements of the microphone. The microphone is directed but has no proximity effect.
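Combining an undirected (omnidirectional) and a bidirectional (figure-of-eight) response with weights a and 1 - a gives the first-order pattern r(θ) = a + (1 - a)·cos θ, which is a limaçon of Pascal. The sketch below only evaluates this idealized pattern and its null angle, cos θ_null = -a/(1 - a); it does not model the acoustic construction described in the paper. The weights used for the named patterns are common textbook values and are assumptions (supercardioid definitions in particular vary slightly).

```python
import math

# Common textbook weights a for r(theta) = a + (1 - a) * cos(theta); values are assumptions.
PATTERNS = {
    "omnidirectional": 1.0,
    "cardioid": 0.5,
    "supercardioid": 0.366,
    "hypercardioid": 0.25,
    "figure-of-eight": 0.0,
}

def response(a, theta_deg):
    """Normalized sensitivity of an ideal first-order microphone at angle theta."""
    return a + (1 - a) * math.cos(math.radians(theta_deg))

def null_angle_deg(a):
    """Angle of zero sensitivity, or None if the pattern has no null (a > 0.5)."""
    if a > 0.5:
        return None
    return math.degrees(math.acos(-a / (1 - a)))

for name, a in PATTERNS.items():
    null = null_angle_deg(a)
    null_txt = "none" if null is None else f"{null:.1f} deg"
    print(f"{name:16s} rear (180 deg) = {response(a, 180):+.3f}, null at {null_txt}")
```

The single weight a is what moves the pattern continuously from omnidirectional through cardioid to figure-of-eight, which is why an acoustic omni-plus-eight combination can reach all the traditional first-order shapes.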
An Improved Electrical Equivalent Circuit Model for Dynamic Moving Coil Transducers
Struck, Christopher J.; Thorborg, Knud; Unruh, Andrew D.
A series combination of inductor and resistor is traditionally used to model the blocked electrical impedance of a dynamic moving coil transducer, such as a loudspeaker driver. In practice, semi-inductive behaviour due to eddy currents and ‘skin effect’ in the pole structure as well as transformer coupling between the voice coil and pole piece can be observed, but are not well represented by this simple model. An improved model using only a few additional elements is introduced to overcome these limitations. This improved model is easily incorporated into existing equivalent circuit models. The development of the model is explained and its use is demonstrated. Examples yielding more accurate box response simulations are also shown.
Transducer with the Direct D/A Conversion Optoacoustic Principle
Transducers with direct D/A conversion, sometimes called digital transducers, whether loudspeakers or earphones, are still finding their way into existence. There have been several attempts to design such devices, but none has yet left the research laboratory and made its way into commercial use. Most of them use "classical" electroacoustic transduction principles, i.e., electrodynamic or electrostatic. In this article, the possibility of using an opto-acoustic transduction principle is explored. First, the physical phenomena on which this transducer is based are reviewed. Then, some construction details are described with a view to their use in a digital earphone.
Demystifying the Measurement of Impulse Response in Condenser Microphones – Part I
Good impulse response is an important reason for preferring condenser microphones in audio applications that require high quality. However, it is difficult to characterize the impulse response of a microphone precisely. We cannot create an acoustic impulse that approximates the Dirac delta function closely enough for a microphone to emit only its own impulse response. Electrical spark discharges, pistol shots and pressure-step methods all approximate the Dirac distribution, but because of their limitations one must still deconvolve the impulse response of the excitation signal from that of the microphone itself. Since every known method for performing such deconvolution has pitfalls of its own, a novel time-domain deconvolution method is introduced.
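To show what deconvolving the excitation from the microphone response means in the time domain, the sketch below uses plain recursive back-substitution. This is a textbook method, not the novel method introduced in the paper, and it illustrates the kind of pitfall the abstract alludes to: it is exact for noise-free data but can become unstable for real excitations.

```python
import numpy as np


def deconvolve_time_domain(y, h, n_out):
    """Recover m from y = h * m by recursive (back-substitution) deconvolution.

    Uses m[n] = (y[n] - sum_{k>=1} h[k] * m[n-k]) / h[0]. This inverts the
    convolution exactly for noise-free data, but errors can grow quickly when
    h is not minimum phase or when y contains noise.
    """
    y = np.asarray(y, dtype=float)
    h = np.asarray(h, dtype=float)
    if h[0] == 0.0:
        raise ValueError("h[0] must be non-zero (align the excitation onset first)")
    m = np.zeros(n_out)
    for n in range(n_out):
        acc = y[n] if n < len(y) else 0.0
        for k in range(1, min(n, len(h) - 1) + 1):
            acc -= h[k] * m[n - k]
        m[n] = acc / h[0]
    return m


excitation = [1.0, 0.5, 0.25]        # hypothetical measured excitation
mic = [0.2, 1.0, -0.3, 0.1]          # hypothetical "true" microphone response
recorded = np.convolve(excitation, mic)
print(deconvolve_time_domain(recorded, excitation, len(mic)))
```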
Towards Multimodal Interfaces for Intrusion Detection
García-Ruiz, Miguel Á.; Kapralos, Bill; Vargas Martin, Miguel
Network intrusion detection has generally been handled with sophisticated software and statistical analysis, although sometimes it has to be done by administrators, either by detecting intruders in real time or by reviewing network logs, which is a tedious and time-consuming task. To support this work, intrusion detection analysis has been carried out using visual, auditory or tactile information in computer interfaces. However, little is known about how best to integrate the sensory channels for intrusion detection analysis. We propose a multimodal human-computer interface to analyze malicious attacks during forensic examination of network logs. We describe a sonification prototype, which generates different sounds according to a number of "suspicious" network activities.
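A minimal sketch of the sonification idea: each logged event type is mapped to a short tone, so a run of suspicious events becomes an audible pattern. The event names and pitches are hypothetical; the abstract does not describe the prototype's actual sound design.

```python
import math

# Hypothetical mapping from event type to tone pitch in Hz (not from the paper).
EVENT_PITCH_HZ = {
    "port_scan": 440.0,
    "failed_login": 660.0,
    "malformed_packet": 880.0,
}


def sonify(events, sample_rate=8000, tone_s=0.1, amplitude=0.5):
    """Render one short sine tone per log event; unknown or benign events are silent."""
    n = int(sample_rate * tone_s)
    samples = []
    for event in events:
        pitch = EVENT_PITCH_HZ.get(event)
        if pitch is None:
            samples.extend([0.0] * n)
            continue
        samples.extend(
            amplitude * math.sin(2 * math.pi * pitch * i / sample_rate) for i in range(n)
        )
    return samples


audio = sonify(["ok", "failed_login", "failed_login", "port_scan"])
print(len(audio))
```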
Steganographic Approach to Copyright Protection of Audio
Steganography is the technique of hiding data in carrier media such as images and music. It is a powerful mechanism by which useful copyright information can be hidden in audio. In this paper, we propose the use of steganography and public key cryptography to store copyright information and authenticate the original audio. A tool called Steger, which automatically determines the original copyright holders of audio content, is being developed. This tool is useful for enabling Digital Rights Management (DRM) on end-user systems such as PDAs, mobile phones, PCs, handheld devices and other consumer electronics.
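The simplest audio steganography scheme hides payload bits in the least significant bit of each PCM sample. The sketch below shows that embedding and extraction; it is an illustrative assumption, not the Steger tool, and plain LSB embedding does not survive lossy coding or resampling.

```python
def embed_lsb(samples, payload: bytes):
    """Hide payload bits (MSB first) in the least significant bit of integer PCM samples."""
    bits = [(byte >> (7 - i)) & 1 for byte in payload for i in range(8)]
    if len(bits) > len(samples):
        raise ValueError("carrier too short for payload")
    stego = list(samples)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit  # clear the LSB, then set it to the payload bit
    return stego


def extract_lsb(samples, n_bytes: int) -> bytes:
    """Read n_bytes back from the sample LSBs."""
    out = bytearray()
    for j in range(n_bytes):
        byte = 0
        for i in range(8):
            byte = (byte << 1) | (samples[8 * j + i] & 1)
        out.append(byte)
    return bytes(out)


carrier = list(range(-100, 100))  # stand-in for 16-bit PCM samples
marked = embed_lsb(carrier, b"(c) AES")
print(extract_lsb(marked, 7))
```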
Headphones Technology for Surround Sound Monitoring – A Virtual 5.1 Listening Room
Gebhardt, Mario; Kuhn, Clemens; Pellegrini, Renato
This poster presents a headphone technology for professional surround monitoring with virtual 5.1 reproduction. Using perceptually motivated binaural signal processing and ultrasonic head tracking, the system simulates a loudspeaker set-up with correct localisation and room impression. As a professional recording and mixing tool, it offers the advantages of a portable headphone solution while avoiding the known drawbacks such as inside-the-head localisation, limited room perception and rotation of the sonic image with the listener’s head. The combination of three technologies (binaural reproduction, room simulation and head tracking) enables the reproduction of a virtual reference listening room with clear localisation and distance perception for applications in studios, recording trucks and mobile recording set-ups.
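Head tracking keeps the virtual loudspeakers fixed in the room by rendering each one at its azimuth relative to the current head orientation. The sketch below computes those relative angles for a 5.0 layout (the LFE channel is omitted); the angle convention and the 110° surround position are assumptions, and HRTF selection itself is not shown.

```python
# Nominal ITU-R BS.775 azimuths in degrees, positive to the listener's left.
# The surround angle of 110 degrees is one choice within the 100-120 degree range.
SPEAKERS_DEG = {"C": 0.0, "L": 30.0, "R": -30.0, "Ls": 110.0, "Rs": -110.0}


def wrap180(angle_deg: float) -> float:
    """Wrap an angle to the interval [-180, 180)."""
    return (angle_deg + 180.0) % 360.0 - 180.0


def head_relative_azimuths(head_yaw_deg: float) -> dict:
    """Azimuth of each virtual loudspeaker relative to the current head orientation.

    A head-tracked renderer would pick (or interpolate) the HRTF/BRIR pair for
    these angles, so the virtual loudspeakers stay fixed in the room when the
    listener turns (positive yaw = head turned to the left).
    """
    return {name: wrap180(az - head_yaw_deg) for name, az in SPEAKERS_DEG.items()}


print(head_relative_azimuths(30.0))
```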
Hybrid Sound Field Processing for Wave Field Synthesis System
Chung, Hyunjoo; Lim, JunSeok; Shim, Hwan; Sung, Koeng-Mo; Yoo, Jae Hyoun
Using the wave field synthesis (WFS) method, the sound of a primary source was reproduced as plane waves. Despite some shortcomings such as spatial aliasing, these plane waves enlarged the sweet spot of the listening area and decreased the localization error of the sound source. We also proposed a grouped reflections algorithm (GRA) for reproducing early reflections; this sequence of early reflections increased the spaciousness of the listening room environment. The method was implemented with linear arrays of 32 loudspeakers in an anechoic room. For backward compatibility with standard five-channel surround titles, a new hybrid sound field processing algorithm combining the WFS and GRA methods was implemented.
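For a straight, uniformly spaced array, a plane wave is synthesized to first order by delaying each loudspeaker in proportion to its position projected onto the propagation direction. The sketch below computes those delays and the usual worst-case spatial aliasing estimate; the 0.15 m spacing is hypothetical, and the amplitude weights and pre-filter of a full WFS driving function are omitted.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def plane_wave_delays(n_speakers, spacing_m, angle_deg, c=SPEED_OF_SOUND):
    """Relative driving delays (s) for a straight, uniformly spaced array.

    angle_deg is the propagation direction of the plane wave relative to the
    array normal. The delays are shifted so that the earliest one is zero.
    """
    s = math.sin(math.radians(angle_deg))
    raw = [i * spacing_m * s / c for i in range(n_speakers)]
    t0 = min(raw)
    return [t - t0 for t in raw]


def aliasing_frequency(spacing_m, c=SPEED_OF_SOUND):
    """Common worst-case estimate of the spatial aliasing frequency, c / (2 * dx)."""
    return c / (2.0 * spacing_m)


delays = plane_wave_delays(32, 0.15, 30.0)
print(f"last delay: {delays[-1] * 1000:.2f} ms, aliasing above ~{aliasing_frequency(0.15):.0f} Hz")
```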
Reproduction of Virtual Reality with Multichannel Microphone Techniques
Hiekkanen, Timo; Lempiäinen, Tero; Mattila, Martti; Pulkki, Ville; Veijanen, Ville
The perceptual differences between a virtual reality and its reproduction with different simulated multichannel microphone techniques were measured in listening tests. The virtual reality was generated using the image-source method and 16 loudspeakers in a 3-D arrangement in an anechoic chamber. Two spaced and two coincident microphone techniques were tested: the Fukada Tree, the Decca Tree, 1st-order Ambisonics and 2nd-order Ambisonics. The spaced techniques used a 5.0 setup, and the Ambisonics techniques used a quadraphonic setup. The perceptual difference was measured on the ITU impairment scale.
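The image-source method used to generate the virtual reality mirrors the source in the room boundaries. The sketch below computes the direct sound and the six first-order images for a rectangular room; the room size, positions and single reflection coefficient are hypothetical, and the 3-D loudspeaker rendering of the paper is not modelled.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def first_order_images(src, room):
    """Mirror a source in the six walls of a shoebox room with one corner at the origin."""
    x, y, z = src
    lx, ly, lz = room
    return [(-x, y, z), (2 * lx - x, y, z),
            (x, -y, z), (x, 2 * ly - y, z),
            (x, y, -z), (x, y, 2 * lz - z)]


def arrivals(src, rcv, room, beta=0.8, c=SPEED_OF_SOUND):
    """(delay in s, amplitude) of the direct sound and the six first-order reflections.

    beta is a single, frequency-independent wall reflection coefficient and
    amplitudes follow 1/r spreading; both are simplifying assumptions.
    """
    paths = [(src, 1.0)] + [(img, beta) for img in first_order_images(src, room)]
    return sorted((math.dist(p, rcv) / c, gain / math.dist(p, rcv)) for p, gain in paths)


for delay, amp in arrivals((1.0, 1.0, 1.0), (2.0, 1.0, 1.0), (5.0, 4.0, 3.0)):
    print(f"{delay * 1000:6.2f} ms  {amp:.3f}")
```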
Refinements of Transmission Line Loudspeaker Models
Simple waveguide models of loudspeaker enclosures describe enclosures with simple interior geometry well, but their accuracy is limited when they are used with more complex internal structures. A refinement of the transmission line loudspeaker model is discussed, presenting one-dimensional waveguide approximations for bends and corners. Bends and corners are represented as area changes in the line, approximated as one-dimensional line segments whose parameters are adjusted to match the exact solutions for sharp (rectangular) corners in a waveguide. Besides modeling, the paper discusses the sound transmission characteristics of commonly used bend types and the applicability of the results to folded horns.
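Because the refined model represents bends and corners as area changes, the basic building block is the reflection and transmission of a plane wave at a cross-section change in a 1-D waveguide. The sketch below gives these standard coefficients and the quarter-wave fundamental of a simple line; it does not reproduce the fitted bend parameters from the paper.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed


def pressure_reflection(s1: float, s2: float) -> float:
    """Pressure reflection coefficient for a plane wave in area s1 meeting area s2.

    Follows from the characteristic acoustic impedance rho*c/S of each segment.
    """
    return (s1 - s2) / (s1 + s2)


def pressure_transmission(s1: float, s2: float) -> float:
    """Pressure transmission coefficient, 1 + R = 2*s1 / (s1 + s2)."""
    return 2.0 * s1 / (s1 + s2)


def quarter_wave_fundamental(length_m: float, c: float = SPEED_OF_SOUND) -> float:
    """Fundamental of a line closed at the driver end and open at the other: c / (4L)."""
    return c / (4.0 * length_m)


print(pressure_reflection(100.0, 80.0), quarter_wave_fundamental(2.0))
```

For example, a hypothetical local contraction from 100 cm² to 80 cm² reflects (100 − 80)/(100 + 80) ≈ 0.11 of the incident pressure.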
The Use of Negative Source Impedance with Moving Coil Loudspeaker Drive Units: A Review and Analysis
Turner, Michael J.; Wilson, David A.
The effect of negative source impedance on the frequency response and pole-zero pattern of a moving coil loudspeaker drive unit is explored from first principles, and closed form expressions for the transfer function and system poles are developed. Direct control of motor velocity via the substantial cancellation of voice coil impedance is discussed. Implementation using positive current feedback is analyzed, considering loop gain, damping and stability from a control theory perspective. Pole placement techniques are shown to be effective in controlling theoretical system behavior at high frequencies. Modeled and measured results are presented. A selection of previous papers and applications concerned with operation of loudspeakers from negative source impedances is briefly reviewed. Practical issues and some possible applications are discussed.
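In Thiele-Small terms, the electrical Q of a driver scales with the total series resistance Re + Rg, so a negative source resistance Rg that partially cancels Re increases electrical damping. The sketch below shows only that scaling; it ignores voice-coil inductance and the stability issues analyzed in the paper, and the driver values are hypothetical.

```python
def qes_with_source(qes0: float, re: float, rg: float) -> float:
    """Electrical Q when the driver is fed from a source with output resistance rg.

    Qes scales with the total series resistance Re + Rg; qes0 is the value for
    an ideal voltage source (Rg = 0). A negative Rg reduces Qes, i.e. increases
    electrical damping. Voice-coil inductance is ignored in this sketch.
    """
    total = re + rg
    if total <= 0:
        raise ValueError("Re + Rg must stay positive in this simplified model")
    return qes0 * total / re


def qts(qes: float, qms: float) -> float:
    """Total Q of the driver from its electrical and mechanical Q values."""
    return qes * qms / (qes + qms)


# Hypothetical driver: Re = 6 ohm, Qes = 0.4, Qms = 4.
for rg in (0.0, -3.0, -5.0):
    q = qes_with_source(0.4, 6.0, rg)
    print(f"Rg={rg:+.1f} ohm  Qes={q:.3f}  Qts={qts(q, 4.0):.3f}")
```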
Effects of Acoustic Damping on Current-Driven Loudspeakers
Bortoni, Rosalfonso; Filho, Sidnei Noceti; Silva, Homero Sette
Previous works show the benefits of exciting loudspeakers with current sources, but they do not study the behavior of this technique when acoustic damping is applied to various types of loudspeaker systems. This work presents a theoretical and practical analysis of the frequency response of acoustically damped current-driven loudspeakers installed in closed-box, vented-box and 4th- and 6th-order band-pass systems. It also presents a subjective analysis comparing a closed-box system excited by voltage and current sources.
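One way to see why acoustic damping matters for current drive: with an ideal current source the electrical damping term vanishes, so a closed-box system's total Q falls back to its non-electrical Q (mechanical plus acoustic losses, lumped here as Qmc). The sketch below compares the second-order closed-box response for voltage drive, current drive and current drive with added damping; all Q values are hypothetical.

```python
import math


def closed_box_db(f: float, fc: float, q: float) -> float:
    """Magnitude (dB) of the 2nd-order high-pass H(s) = s^2 / (s^2 + s*wc/q + wc^2)."""
    s = 1j * 2 * math.pi * f
    wc = 2 * math.pi * fc
    h = s * s / (s * s + s * wc / q + wc * wc)
    return 20 * math.log10(abs(h))


def total_q(qec: float, qmc: float) -> float:
    """Voltage drive: electrical and non-electrical damping act in parallel."""
    return qec * qmc / (qec + qmc)


# Hypothetical closed-box system with fc = 60 Hz.
q_voltage = total_q(qec=0.8, qmc=4.0)  # voltage drive
q_current = 4.0                         # ideal current drive: Qtc -> Qmc
q_damped = 1.0                          # current drive plus added acoustic damping (assumed)
for label, q in (("voltage", q_voltage), ("current", q_current), ("current+damping", q_damped)):
    print(f"{label:16s} Q={q:.2f}  level at fc={closed_box_db(60, 60, q):+.1f} dB")
```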
Development of a Highly Directive Endfire Loudspeaker Array
Boone, Marinus M.; Cho, Wan-Ho; Ih, Jeong-Guon
Control of the directivity of loudspeaker systems is important in sound reproduction with public address systems. Loudspeaker arrays offer great advantages for concentrating sound in specific directions. Usually, the loudspeakers are placed on a vertical line and the directivity is mainly in a plane perpendicular to that line, although the radiation direction can be adapted with filter techniques called beamforming. In this paper we present results on the applicability of a loudspeaker line array whose main directivity is along the line itself, using so-called endfire beamforming, which results in a “spotlight” of sound in a preferred direction. We used optimized beamforming techniques originally developed for the reciprocal problem of directional microphone arrays. The effects of the design parameters of the loudspeaker array system were investigated, and we found that the stability factor can be a useful parameter for controlling the directional characteristics. A prototype constant-beamwidth array system was tested by simulation and measurement, and the results supported our findings.
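The sketch below shows the basic endfire principle with a plain delay-and-sum line array: each element is delayed by its distance along the axis divided by c, so the outputs add coherently only in the axial direction. This is an illustration under simplifying assumptions (point sources, free field), not the optimized beamformers or the stability-factor design used in the paper; the array size and spacing are hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def endfire_response(theta_deg, f, n, spacing_m, c=SPEED_OF_SOUND):
    """Normalised far-field magnitude of a delay-and-sum endfire line array.

    Element i sits at x = i * spacing_m on the array axis and is delayed by x / c.
    theta = 0 points along the axis (the "spotlight" direction).
    """
    k = 2 * math.pi * f / c
    cos_t = math.cos(math.radians(theta_deg))
    total = 0j
    for i in range(n):
        x = i * spacing_m
        steering = cmath.exp(-1j * k * x)          # phase of the delay x / c
        geometry = cmath.exp(1j * k * x * cos_t)   # path-length advance towards theta
        total += steering * geometry
    return abs(total) / n


for angle in (0, 45, 90, 180):
    print(angle, round(endfire_response(angle, 1000.0, 8, 0.05), 3))
```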
Mass Nonlinearity and Intrinsic Friction of the Loudspeaker Membrane
Djurek, Danijel; Djurek, Ivan; Petošic, Antonio
Vibration of a loudspeaker membrane was analyzed in the regime of comparatively low driving currents (I < 120 mA) in terms of the mass nonlinearity Meff and the intrinsic friction RM. The latter contributes to the damping term of the differential equation of motion and depends on the vibration amplitude (elongation). Independent measurements of the flexural strength of the membrane were performed and correlated with experimental observations of the vibrating system. Experiments were also performed with the membrane additionally reinforced with materials of higher Young's modulus.
Modelling of an Electrodynamic Loudspeaker with Runge-Kutta ODE Solver
Djurek, Danijel; Djurek, Ivan; Petošic, Antonio
In this paper, a low-frequency electrodynamic loudspeaker is modelled as a one-degree-of-freedom nonlinear damped oscillator described by an ordinary differential equation of motion. The model was compared with the equivalent LRC circuit model, and it was shown that the differential-equation approach is more suitable for calculations that include the nonlinearities occurring in an electrodynamic loudspeaker, as well as couplings between different vibration modes, particularly those of the vibrating air and of the loudspeaker itself. The nonlinear differential equation of the periodically driven anharmonic oscillator was solved numerically, and the calculated electrical impedance was compared with experimental data. The calculations covered different working regimes of the loudspeaker operated in evacuated space and in air.
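The sketch below integrates a one-degree-of-freedom loudspeaker model, m x'' + r x' + k0(1 + k2 x²)x = Bl·i0·cos(2πft), with the classical fourth-order Runge-Kutta scheme. The cubic stiffness term and all parameter values are assumptions for illustration; the abstract does not give the authors' exact nonlinear terms, and the electrical impedance calculation is not shown.

```python
import math


def rk4_step(deriv, t, y, h):
    """One classical 4th-order Runge-Kutta step for y' = deriv(t, y)."""
    k1 = deriv(t, y)
    k2 = deriv(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = deriv(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = deriv(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h / 6 * (a + 2 * b + 2 * c + d) for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def loudspeaker_ode(m, r, k0, k2, bl, i0, f_drive):
    """m x'' + r x' + k0 (1 + k2 x^2) x = Bl * i0 * cos(2 pi f t), as a first-order system.

    The cubic stiffness term k2 is an assumed example nonlinearity.
    """
    w = 2 * math.pi * f_drive

    def deriv(t, y):
        x, v = y
        force = bl * i0 * math.cos(w * t)
        return [v, (force - r * v - k0 * (1 + k2 * x * x) * x) / m]

    return deriv


def simulate(deriv, t_end, h):
    """Integrate from rest and return the displacement at every step."""
    y, xs = [0.0, 0.0], []
    for n in range(int(round(t_end / h))):
        y = rk4_step(deriv, n * h, y, h)
        xs.append(y[0])
    return xs


# Hypothetical parameters: 20 g moving mass, resonance near 50 Hz, 0.1 A drive at 30 Hz.
xs = simulate(loudspeaker_ode(0.02, 1.5, 2000.0, 1e5, 7.0, 0.1, 30.0), t_end=0.5, h=5e-5)
print(f"peak displacement over the last 0.1 s: {max(abs(x) for x in xs[-2000:]) * 1000:.3f} mm")
```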
Chaotic State in an Electrodynamic Loudspeaker
Djurek, Danijel; Djurek, Ivan; Petošic, Antonio
An electrodynamic loudspeaker was driven under comparatively strong nonlinear conditions. For driving AC currents up to 1 A, the vibration spectrum contains high-frequency harmonics of the classic von Kármán type; for currents from 3.4 to 3.6 A, doubling of the driving period appears; for currents from 3.6 to 4 A, multiple sequences of subharmonic vibrations appear, beginning with f/4 and 3f/4; and currents larger than 4 A result in a white-noise spectrum characteristic of a chaotic state.
A Biologically-Inspired Low-Bit-Rate Universal Audio Coder
Najaf-Zadeh, Hossein; Pichevar, Ramin; Thibault, Louis
We propose a new biologically inspired paradigm for universal audio coding based on neural spikes. Our approach is based on generating sparse 2-D representations of audio signals, dubbed spikegrams. The spikegrams are generated by projecting the signal onto an overcomplete set of adaptive gammachirp kernels (gammatones with additional tuning parameters). A masking model is applied to the spikegrams to remove inaudible spikes and increase coding efficiency. The paradigm is a first step towards a high-quality audio encoder based on further processing of the acoustical events in the spikegrams. After the necessary optimization, our coding system, operating at 1 bit/sample for sound sampled at 44.1 kHz, is expected to deliver high-quality audio for broadcasting, archiving, etc.
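To illustrate how a spikegram can be built, the sketch below generates unit-norm gammachirp kernels and decomposes a signal greedily with matching pursuit, where each selected (kernel, time, amplitude) triple is a "spike". Matching pursuit, the kernel parameters and the absence of a masking stage are assumptions for this sketch, not the paper's exact procedure. For scale, 1 bit/sample at 44.1 kHz corresponds to 44.1 kbit/s for a mono signal.

```python
import numpy as np


def gammachirp(fc, fs, order=4, chirp=0.0, dur=0.025):
    """Unit-norm gammachirp kernel; chirp = 0 gives a gammatone.

    Bandwidth uses the Glasberg-Moore ERB scale, b = 1.019 * ERB(fc).
    """
    t = np.arange(1, int(dur * fs) + 1) / fs  # start at 1/fs so log(t) is finite
    b = 1.019 * 24.7 * (4.37 * fc / 1000.0 + 1.0)
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t + chirp * np.log(t))
    return g / np.linalg.norm(g)


def matching_pursuit(x, kernels, n_spikes):
    """Greedy sparse decomposition: each spike is (kernel index, sample offset, amplitude)."""
    residual = np.array(x, dtype=float)
    spikes = []
    for _ in range(n_spikes):
        best = None
        for idx, g in enumerate(kernels):
            corr = np.correlate(residual, g, mode="valid")
            pos = int(np.argmax(np.abs(corr)))
            if best is None or abs(corr[pos]) > abs(best[2]):
                best = (idx, pos, float(corr[pos]))
        idx, pos, amp = best
        residual[pos:pos + len(kernels[idx])] -= amp * kernels[idx]
        spikes.append(best)
    return spikes, residual


fs = 16000
kernels = [gammachirp(fc, fs) for fc in (250.0, 1000.0, 4000.0)]
signal = np.zeros(2000)
signal[100:100 + len(kernels[1])] += 0.8 * kernels[1]
spikes, residual = matching_pursuit(signal, kernels, n_spikes=1)
print(spikes, np.linalg.norm(residual))
```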
The Relationship between Basic Audio Quality and Selected Artefacts in Perceptual Audio Codecs - Part II: Validation Experiment
Marins, Paulo; Rumsey, Francis; Zielinski, Slawomir K.
A pilot study was conducted to investigate the perceptual importance of selected audio coding artefacts and their relationship with basic audio quality. An additional experiment was undertaken to validate the results obtained in that pilot calibration experiment. A listening test was designed in which a panel of expert subjects evaluated the selected artefacts used in the initial study. In this second experiment, however, certain experimental parameters were modified, including the levels of degradation and the programme material. The outcomes of the validation experiment are presented in this paper, along with a detailed evaluation of the impact of the chosen artefacts on basic audio quality assessments for perceptual audio codecs.
New Enhancements to Immersive Sound Field Rendition (ISR) System
Annadana, Raghuram; Dubey, Chandresh; Ferreira, Anibal J. S.; Sinha, Deepen
Consumer audio applications such as satellite radio broadcasts and multichannel audio streaming and playback systems, coupled with the need to meet stringent bandwidth requirements, are posing new challenges for parametric multichannel audio coding schemes. This paper describes the continuation of our research on the Immersive Soundfield Rendition (ISR) system and enhancements to its various algorithmic components. Many applications require a constant bit rate, which calls for a rate control mechanism; the strategies used in this mechanism are presented. In addition, an innovative phase-compensated down-mixing scheme has been incorporated into the ISR system to generate a high-quality carrier signal. The blind up-mixing scheme has also been enhanced, yielding considerable gains in acoustic diversity. The enhancements to the ISR system and its performance are detailed. Audio demonstrations are available at http://www.atc-labs.com/isr.
Aspects of Scalable Audio Coding
Banded weight data is transmitted as side information within coded audio bitstreams in order to achieve psychoacoustically-appropriate shaping of quantisation noise. Methods of reducing the information overhead corresponding to weight data are discussed in the context of scalable bitplane coding. Two approaches to coding band weights are compared in terms of coding efficiency and error resilience. In the first, weights are coded as a block of data at the beginning of each frame, using a predictor and Golomb coding of weight prediction residuals to achieve high coding efficiency. This approach is compared to coding weights for bands as they become significant, with weight data distributed across each coded bitstream frame.
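The first approach, predicting weights and Golomb-coding the residuals, can be sketched as follows. The previous-band predictor, the zigzag mapping of signed residuals, and the use of Rice codes (Golomb codes with a power-of-two parameter, here k = 1) are assumptions; the abstract does not specify them.

```python
def zigzag(v: int) -> int:
    """Map signed residuals to non-negative integers: 0,-1,1,-2,2 -> 0,1,2,3,4."""
    return 2 * v if v >= 0 else -2 * v - 1


def unzigzag(n: int) -> int:
    """Inverse of zigzag."""
    return n // 2 if n % 2 == 0 else -(n + 1) // 2


def rice_encode(n: int, k: int) -> str:
    """Golomb code with parameter 2**k (Rice code): unary quotient, then k-bit remainder."""
    bits = "1" * (n >> k) + "0"
    if k:
        bits += format(n & ((1 << k) - 1), f"0{k}b")
    return bits


def encode_weights(weights, k=1) -> str:
    """Predict each band weight from the previous band and Rice-code the residual."""
    prev, out = 0, []
    for w in weights:
        out.append(rice_encode(zigzag(w - prev), k))
        prev = w
    return "".join(out)


def decode_weights(bits: str, count: int, k=1):
    """Decode count weights from the bit string produced by encode_weights."""
    pos, prev, out = 0, 0, []
    for _ in range(count):
        q = 0
        while bits[pos] == "1":
            q += 1
            pos += 1
        pos += 1  # skip the terminating 0 of the unary part
        r = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        prev += unzigzag((q << k) | r)
        out.append(prev)
    return out


weights = [3, 4, 4, 6, 5, 2]  # hypothetical band weights for one frame
stream = encode_weights(weights)
print(stream, len(stream), decode_weights(stream, len(weights)))
```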
Source-Controlled Variable Bit Rate Extension for the AMR-WB+ Audio Codec
Lefebvre, Roch; Marty, Amélie
This paper presents a source-controlled, variable bit rate extension to the AMR-WB+ standard audio codec. AMR-WB+ allows multi-rate operation and, in particular, rate switching at every frame. However, the standard does not support source-controlled rate determination, since it does not include a signal classifier. The proposed extension includes a signal classifier and a rate mapping function for each signal class. Classification is performed at a lower frame rate than in AMR-WB+, typically with one classification decision per second. Significant rate savings can be achieved by encoding speech at lower rates than other signals such as music. Applications include audio broadcasting over packet networks and storage of multimedia content with mixed signal types in the audio track.
Multiple Description Coding for Audio Transmission Using Conjugate Vector Quantization
Cherkaoui, Soumaya; Kwong, Mylene; Lefebvre, Roch
This paper explores robustness issues for real-time audio transmission over perturbed networks where multiple paths can be considered. Conjugate Vector Quantization (CVQ), a form of Multiple Description Coding, can improve the resilience to packet losses. This work presents a generalized CVQ structure, where K>2 different conjugate codebooks are trained to create the best resulting codebook. Experiments show that 4-description CVQ performs very closely to unconstrained VQ in clear channel conditions, while providing significant improvements in lossy channels. We also present a fast search algorithm which allows tradeoffs between computational complexity and memory storage at the encoder. This robust quantization scheme can encode sensitive information such as spectral coefficients in a speech coder or a perceptual audio coder.
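In conjugate VQ with two descriptions, the decoder averages the two received codewords and falls back to a single codebook when one description is lost. The sketch below uses K = 2 and an exhaustive joint search for brevity; the paper generalizes to K > 2 trained codebooks and proposes a fast search, neither of which is reproduced here. The codebooks are tiny hypothetical examples.

```python
import itertools

import numpy as np


def cvq_encode(x, cb_a, cb_b):
    """Exhaustive joint search for the index pair whose averaged codewords best match x."""
    x = np.asarray(x, dtype=float)
    best, best_err = None, np.inf
    for i, j in itertools.product(range(len(cb_a)), range(len(cb_b))):
        err = float(np.sum((x - 0.5 * (cb_a[i] + cb_b[j])) ** 2))
        if err < best_err:
            best, best_err = (i, j), err
    return best


def cvq_decode(cb_a, cb_b, i=None, j=None):
    """Average both descriptions when both arrive; fall back to a single codebook on loss."""
    if i is not None and j is not None:
        return 0.5 * (cb_a[i] + cb_b[j])
    if i is not None:
        return cb_a[i]
    if j is not None:
        return cb_b[j]
    raise ValueError("both descriptions lost")


# Tiny hypothetical codebooks (2-D vectors); real ones would be trained jointly.
cb_a = np.array([[0.0, 0.0], [2.0, 2.0]])
cb_b = np.array([[0.0, 2.0], [2.0, 0.0]])
i, j = cvq_encode([1.9, 1.1], cb_a, cb_b)
print((i, j), cvq_decode(cb_a, cb_b, i, j), cvq_decode(cb_a, cb_b, i=i))
```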
MPEG Surround – the ISO/MPEG Standard for Efficient and Compatible Multi-Channel Audio Coding
Breebaart, Jeroen; Chong, Kok Seng; Disch, Sascha; Faller, Christof; Herre, Jürgen; Hilpert, Johannes; Kjörling, Kristofer; Koppens, Jeroen; Linzmeier, Karsten; Oomen, Werner; Purnhagen, Heiko; Rödén, Jonas
In 2004, the ISO/MPEG Audio standardization group started a new work item on efficient and backward compatible coding of high-quality multi-channel sound using parametric coding techniques. Finalized in the fall of 2006, the resulting MPEG Surround specification allows the transmission of surround sound at bitrates that have been commonly used for coding of mono or stereo sound. This paper summarizes the results of the standardization process by describing the underlying ideas and providing an overview of the MPEG Surround technology. The performance of the scheme is characterized by the results of the recent verification tests. These tests include several operation modes as they would be used in typical application scenarios to introduce multi-channel audio into existing audio services.
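The core of parametric multichannel coding is to transmit a downmix plus a few spatial parameters. The sketch below computes a mono downmix with a channel level difference (CLD) and inter-channel coherence (ICC) for one channel pair, in the spirit of MPEG Surround's one-to-two (OTT) parameters; it is broadband, unquantized and simplified, not the normative algorithm.

```python
import numpy as np


def ott_parameters(left, right, eps=1e-12):
    """Mono downmix plus channel level difference (dB) and inter-channel coherence.

    Real MPEG Surround computes such parameters per time/frequency tile in a
    hybrid QMF domain and quantizes them; this broadband version is only a sketch.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    e_l, e_r = np.dot(left, left), np.dot(right, right)
    cld_db = 10 * np.log10((e_l + eps) / (e_r + eps))
    icc = np.dot(left, right) / np.sqrt((e_l + eps) * (e_r + eps))
    downmix = 0.5 * (left + right)
    return downmix, cld_db, icc


rng = np.random.default_rng(1)
r = rng.standard_normal(4096)
dmx, cld, icc = ott_parameters(2.0 * r, r)
print(f"CLD = {cld:.2f} dB, ICC = {icc:.3f}")
```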
Adaptive Design of the Preprocessing Stage for Stereo Lossless Audio Compression
Ghido, Florin; Tabus, Ioan
We propose a novel lossless audio compression scheme that combines stereo preprocessing with stereo prediction. We show that such a scheme provides improved asymmetrical compression at almost no increase in decoder complexity (compared with stereo prediction alone), or the same compression at lower decoder complexity. The stereo prediction stage is preceded by a rotation-like channel transformation, which improves compression by requiring smaller inter-channel optimal prediction orders and by yielding smaller prediction coefficient magnitudes. For the OptimFROG-AS (asymmetric) lossless audio compressor using stereo prediction with orders 8/4, we obtained, on a 51.6 GB audio corpus in CD-Audio format, compression improvements of up to 5.10%, with an average of 0.23%.
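A rotation can only be used in lossless coding if it maps integers to integers and is exactly invertible. The sketch below assumes a rotation by the principal-axis angle of the stereo pair, implemented with three lifting steps; the abstract calls the transform "rotation-like" without giving details, so this is one plausible realization, not the OptimFROG-AS method.

```python
import math

import numpy as np


def _round_int(v):
    return np.round(v).astype(np.int64)


def principal_angle(x, y):
    """Angle of the principal axis of the (x, y) scatter, in (-pi/2, pi/2]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return 0.5 * math.atan2(2 * np.dot(x, y), np.dot(x, x) - np.dot(y, y))


def lift_rotate(x, y, theta):
    """Integer-to-integer rotation by theta using three lifting steps (exactly invertible)."""
    c, s = math.cos(theta), math.sin(theta)
    if abs(s) < 1e-12:  # theta ~ 0: identity
        return x.copy(), y.copy()
    p = (c - 1) / s
    x1 = x + _round_int(p * y)
    y1 = y + _round_int(s * x1)
    x2 = x1 + _round_int(p * y1)
    return x2, y1


def lift_unrotate(x2, y1, theta):
    """Undo lift_rotate by running the lifting steps backwards."""
    c, s = math.cos(theta), math.sin(theta)
    if abs(s) < 1e-12:
        return x2.copy(), y1.copy()
    p = (c - 1) / s
    x1 = x2 - _round_int(p * y1)
    y = y1 - _round_int(s * x1)
    x = x1 - _round_int(p * y)
    return x, y


rng = np.random.default_rng(0)
left = rng.integers(-2000, 2000, 1000)
right = left + rng.integers(-3, 4, 1000)   # strongly correlated stereo pair
theta = -principal_angle(left, right)       # rotate the principal axis onto x
main, side = lift_rotate(left, right, theta)
back_l, back_r = lift_unrotate(main, side, theta)
print(np.array_equal(back_l, left), np.array_equal(back_r, right))
```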
Acoustic Treatment of the Regional Flight Control Center Hall in Zagreb, Croatia
Domitrovic, Hrvoje; Grubesa, Sanja; Horvat, Marko
Acoustic treatment was carried out in the hall of the Regional Flight Control Center in Zagreb, Croatia, following complaints from the flight control operators working there. The primary complaint was that operators could hear each other too well across the hall because of unwanted reflections, so the main task was to reduce those reflections. Emphasis was also placed on reducing the reverberation time, which had proven too long for the size and intended purpose of the hall, thereby also lowering the background noise level.
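The link between reverberation time, absorption and reverberant noise level can be illustrated with Sabine's formula, T = 0.161 V/A. The hall dimensions and absorption coefficients below are hypothetical, not data from the Zagreb project.

```python
import math


def sabine_rt60(volume_m3: float, surfaces) -> float:
    """Sabine reverberation time T = 0.161 V / A, with A = sum(S_i * alpha_i) in m^2."""
    absorption = sum(area * alpha for area, alpha in surfaces)
    return 0.161 * volume_m3 / absorption


def reverberant_level_change_db(a_before: float, a_after: float) -> float:
    """Change of the steady-state reverberant level when total absorption changes."""
    return -10 * math.log10(a_after / a_before)


# Hypothetical hall, 30 m x 20 m x 5 m (values for illustration only).
V = 30 * 20 * 5
before = [(600, 0.05), (600, 0.10), (500, 0.05)]   # ceiling, floor, walls
after = [(600, 0.80), (600, 0.10), (500, 0.30)]    # absorbing ceiling and wall panels
print(f"RT60: {sabine_rt60(V, before):.2f} s -> {sabine_rt60(V, after):.2f} s")
a1 = sum(s * a for s, a in before)
a2 = sum(s * a for s, a in after)
print(f"reverberant level change: {reverberant_level_change_db(a1, a2):+.1f} dB")
```

Because the steady-state reverberant level is inversely proportional to total absorption, the same treatment that shortens the reverberation time also lowers the level of noise generated inside the hall.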
Investigating Classroom Acoustics by Means of Advanced Reproduction Techniques
Farnetani, Andrea; Fels, Janina; Prodi, Nicola; Smyrnova, Yuliya
A study was undertaken to investigate the loss of Italian-language word intelligibility in classrooms caused by a low signal-to-noise ratio and excessive reverberation. In the first part of the work, impulse responses and background noise were measured in two primary schools using various mono, binaural and B-format probes. A dummy head with child morphology was also used for the first time in this context, making it possible to compare the performance of a child head with the conventional adult one. The recorded sound fields were then reproduced in a dedicated listening room using stereo dipole and Ambisonics technologies.
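Word intelligibility is often related to early-to-late energy measures computed from impulse responses. The sketch below computes D50 and C50 as defined in ISO 3382-1, on a synthetic exponential decay; whether the authors used these particular parameters is not stated in the abstract.

```python
import numpy as np


def definition_and_clarity(ir, fs, t_ms=50.0):
    """D50 (early-to-total energy ratio) and C50 (early-to-late ratio in dB).

    Assumes the impulse response starts at the arrival of the direct sound.
    """
    energy = np.asarray(ir, dtype=float) ** 2
    n = int(round(t_ms * 1e-3 * fs))
    early, total = energy[:n].sum(), energy.sum()
    return early / total, 10 * np.log10(early / (total - early))


# Synthetic exponential decay with a reverberation time of 1 s.
fs, rt60 = 16000, 1.0
t = np.arange(int(3 * fs)) / fs
ir = np.exp(-3 * np.log(10) * t / rt60)   # energy falls by 60 dB in rt60 seconds
d50, c50 = definition_and_clarity(ir, fs)
print(f"D50 = {d50:.3f}, C50 = {c50:+.2f} dB")
```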
Perception of Concert Hall Acoustics in Seats Where the Reflected Energy Is Stronger than the Direct Energy
This paper describes a series of experiments on sound perception when the direct-to-reverberant ratio (d/r) is low. Sound source localization and the perception of being adequately close to the musicians are improved when the direct sound dominates the total reflected energy for about 40 ms, during which time the direct sound can be perceived separately. In such a hall, the impressions of loudness, clarity and localization are satisfactory and nearly unchanged over a 6 dB range of d/r. As the period of direct-sound dominance decreases, the d/r must be higher for equal subjective clarity. Additionally, the directions of reflections arriving between 40 and 100 ms are nearly inaudible.
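One way to read "the direct sound dominates the total reflected energy for about 40 ms" is as the time at which the cumulative reflected energy first equals the direct-sound energy. The sketch below computes that time and the d/r from an impulse response; the 2.5 ms direct-sound window and the toy impulse response are assumptions.

```python
import numpy as np


def direct_dominance(ir, fs, direct_ms=2.5):
    """Return (d/r in dB, time in ms at which reflected energy first equals the direct energy).

    The first direct_ms of the impulse response is treated as direct sound (an
    assumed window length); everything after it counts as reflected energy.
    """
    energy = np.asarray(ir, dtype=float) ** 2
    nd = int(round(direct_ms * 1e-3 * fs))
    direct = energy[:nd].sum()
    reflected = np.cumsum(energy[nd:])
    dr_db = 10 * np.log10(direct / reflected[-1])
    idx = int(np.searchsorted(reflected, direct))  # first sample with cumulative >= direct
    t_ms = None if idx >= len(reflected) else (nd + idx) / fs * 1000
    return dr_db, t_ms


# Toy impulse response: unit direct sound plus five reflections of amplitude 0.5.
fs = 10000
ir = np.zeros(fs)
ir[0] = 1.0
ir[[100, 200, 300, 400, 500]] = 0.5
print(direct_dominance(ir, fs))
```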
Relation between Correlation Characteristics of Sound Field and Width of Listening Location
Measurements of the sound field in a number of listening rooms were conducted with the loudspeakers placed in ITU-recommended and arbitrary positions. Correlation analysis was performed to determine the degree of coherence of the sound field measured at various places in the room. The results show that the radius of correlation corresponds to the size of the sweet spot. These conclusions are confirmed by listening tests, and recommendations on sound field correlation are proposed.
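As a theoretical reference for the correlation radius, the pressure correlation between two points in an ideal diffuse field is sin(kr)/(kr). The sketch below computes a measured correlation coefficient and the distance at which the diffuse-field correlation falls to a threshold; the 0.5 threshold is a hypothetical choice, and real listening rooms are not ideally diffuse.

```python
import math

import numpy as np


def correlation_coefficient(a, b) -> float:
    """Normalised (zero-lag) correlation between two measured pressure signals."""
    a = np.asarray(a, dtype=float) - np.mean(a)
    b = np.asarray(b, dtype=float) - np.mean(b)
    return float(np.dot(a, b) / math.sqrt(np.dot(a, a) * np.dot(b, b)))


def diffuse_correlation(r_m: float, f: float, c: float = 343.0) -> float:
    """Ideal diffuse-field pressure correlation sin(kr)/(kr) at distance r."""
    kr = 2 * math.pi * f * r_m / c
    return 1.0 if kr == 0 else math.sin(kr) / kr


def correlation_radius(f: float, threshold: float = 0.5, c: float = 343.0, step: float = 1e-4) -> float:
    """Smallest distance at which the diffuse-field correlation drops to the threshold."""
    r = 0.0
    while diffuse_correlation(r, f, c) > threshold:
        r += step
    return r


for f in (250, 1000, 4000):
    print(f, round(correlation_radius(f), 3))
```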
On the Implementation of a Room Acoustics Modeling Software Using Finite-Differences Time-Domain Method
Escolano, Jose; López, Jose J.; Pueo, Basilio
The Finite-Difference Time-Domain (FDTD) approximation method has been introduced into acoustics in recent years to solve field problems numerically. However, the enormous computing power required to model large rooms has delayed commercial applications, most of which are instead based on ray tracing. This paper analyzes the viability of an FDTD implementation for this task on today’s personal computers and presents the resulting application. All simulation stages are discussed, from the architectural model, mesh generation, implementation of the recursion and parallelization through to the final result in the form of an impulse response.
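The recursion at the heart of an FDTD room model can be sketched in two dimensions with the standard leapfrog scheme. The grid size, pressure-release boundaries and impulse excitation below are simplifying assumptions; a 3-D room model needs the same update on a volumetric grid, which is where the memory and computing demands mentioned in the abstract come from.

```python
import numpy as np


def fdtd_2d(nx, ny, steps, src, rcv, courant=0.7):
    """Standard leapfrog FDTD scheme for the 2-D scalar wave equation.

    Uses p_next = 2p - p_prev + courant^2 * (discrete Laplacian of p), which is
    stable for courant = c*dt/dx <= 1/sqrt(2). Boundary nodes are held at p = 0
    (pressure-release walls), a simplification of real wall impedances.
    """
    if courant > 1 / np.sqrt(2):
        raise ValueError("Courant number too large for the 2-D scheme")
    lam2 = courant ** 2
    p_prev = np.zeros((nx, ny))
    p = np.zeros((nx, ny))
    p[src] = 1.0  # initial pressure impulse
    out = np.zeros(steps)
    for n in range(steps):
        p_next = np.zeros_like(p)
        lap = (p[2:, 1:-1] + p[:-2, 1:-1] + p[1:-1, 2:] + p[1:-1, :-2]
               - 4 * p[1:-1, 1:-1])
        p_next[1:-1, 1:-1] = 2 * p[1:-1, 1:-1] - p_prev[1:-1, 1:-1] + lam2 * lap
        p_prev, p = p, p_next
        out[n] = p[rcv]
    return out


# 60 x 60 grid, receiver 20 cells from the source. With an assumed spacing dx,
# the time step would be dt = courant * dx / c.
response = fdtd_2d(60, 60, 200, src=(30, 30), rcv=(50, 30))
print(int(np.argmax(np.abs(response))))
```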
Sound Source Localization and B-Format Enhancement Using Soundfield Microphone Sets
Avdelidis, Kostantinos A.; Dimoulas, Charalampos A.; Kalliris, George M.; Papanikolaou, George V.
The current work focuses on the implementation of soundfield microphone arrays for sound source localization and B-format enhancement. Spatial audio information is very important in many applications, while reverberant sound fields and ambient noise degrade the recording conditions. Examples include sound recording during movie production, virtual reality environments, and teleconferencing and distance learning applications with 3D audio capabilities. The B-format components provided by a single soundfield microphone suffice to estimate the direction of arrival of a sound source, while combining two soundfield microphones allows the exact source location to be estimated. In addition, the eight (or more) available signal components can be used in delay-and-sum techniques, enabling SNR improvements and the virtual positioning of a single B-format microphone at any desired place. Simplicity, reduced computational load and effectiveness are some of the advantages of the proposed methodology, which is evaluated via software simulations.
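For a single plane wave, the direction of arrival can be estimated from time-averaged products of the omnidirectional W component with X, Y and Z, and two such bearings from separate microphones can be intersected to locate the source. The sketch below assumes ideal, noise-free B-format encoding and a horizontal-only triangulation; it does not implement the delay-and-sum enhancement.

```python
import math

import numpy as np


def encode_bformat(s, az_deg, el_deg):
    """Ideal first-order B-format encoding of a plane wave (W scaled by 1/sqrt(2))."""
    az, el = math.radians(az_deg), math.radians(el_deg)
    return (s / math.sqrt(2),
            s * math.cos(az) * math.cos(el),
            s * math.sin(az) * math.cos(el),
            s * math.sin(el))


def doa_from_bformat(w, x, y, z):
    """Azimuth/elevation (degrees) from the time-averaged products W*X, W*Y, W*Z."""
    ix, iy, iz = np.dot(w, x), np.dot(w, y), np.dot(w, z)
    return (math.degrees(math.atan2(iy, ix)),
            math.degrees(math.atan2(iz, math.hypot(ix, iy))))


def triangulate_2d(p1, az1_deg, p2, az2_deg):
    """Intersect two horizontal bearing lines from microphones at p1 and p2."""
    d1 = (math.cos(math.radians(az1_deg)), math.sin(math.radians(az1_deg)))
    d2 = (math.cos(math.radians(az2_deg)), math.sin(math.radians(az2_deg)))
    bx, by = p2[0] - p1[0], p2[1] - p1[1]
    det = -d1[0] * d2[1] + d2[0] * d1[1]
    if abs(det) < 1e-12:
        raise ValueError("bearings are parallel")
    t1 = (-bx * d2[1] + d2[0] * by) / det
    return (p1[0] + t1 * d1[0], p1[1] + t1 * d1[1])


rng = np.random.default_rng(2)
sig = rng.standard_normal(4800)
print(doa_from_bformat(*encode_bformat(sig, 60.0, 20.0)))
az1 = math.degrees(math.atan2(1.0, 2.0))    # bearing from a microphone at (0, 0) to (2, 1)
az2 = math.degrees(math.atan2(1.0, -2.0))   # bearing from a microphone at (4, 0) to (2, 1)
print(triangulate_2d((0.0, 0.0), az1, (4.0, 0.0), az2))
```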
Research on Widening the Virtual Listening Space in Automotive Environment
Kim, Lae-Hoon; Seo, Jeong-Hun; Shim, Hwan; Sung, Koeng-Mo
This paper presents research on widening the virtual listening space in cars. Car interiors are generally small compared with normal listening environments, which can make listeners feel somewhat confined, so a way of widening the virtual space in cars is needed. Since lateral reflections are among the most important cues for spaciousness in room acoustics, we widen the virtual space in the car using artificial lateral reflections.
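A minimal sketch of adding artificial lateral reflections: delayed, attenuated copies of the signal are fed to the left or right side only. The delays and gains are hypothetical, and a real in-car system would also need to account for the existing cabin reflections.

```python
import numpy as np


def add_lateral_reflections(x, fs, reflections):
    """Return (left, right) feeds: the dry mono signal plus delayed, attenuated copies.

    reflections: list of (delay_ms, gain, side) with side "L" or "R"; the
    values used below are hypothetical, not taken from the paper.
    """
    x = np.asarray(x, dtype=float)
    max_delay = max(int(round(d * 1e-3 * fs)) for d, _, _ in reflections)
    left = np.concatenate([x, np.zeros(max_delay)])
    right = left.copy()
    for delay_ms, gain, side in reflections:
        d = int(round(delay_ms * 1e-3 * fs))
        target = left if side == "L" else right
        target[d:d + len(x)] += gain * x
    return left, right


fs = 48000
impulse = np.zeros(16)
impulse[0] = 1.0
refl = [(5.0, 0.5, "L"), (7.0, 0.4, "R"), (11.0, 0.3, "L"), (13.0, 0.25, "R")]
left, right = add_lateral_reflections(impulse, fs, refl)
print(np.nonzero(left)[0], np.nonzero(right)[0])
```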
Perceptual Distortion Maps for Room Reverberation
Mourjopoulos, John; Zarouchas, Thomas
Using the input (anechoic) audio as a reference, a number of distortion maps are extracted from reverberant audio signals, indicating how room reverberation distorts perceived features of the received signal across time-frequency scales. These maps are simplified to describe the monaural time-frequency/level distortions and the distortion of the spatial cues (i.e., inter-channel cues and coherence), which are very important for sound localization in a reverberant environment. The maps are studied as functions of room parameters (size, acoustics, distance, etc.) and of input signal properties. Overall perceptual distortion ratings are produced and reverberation-resilient signal features are extracted. These results are also compared with subjective evaluations of the relevant features in the reverberant signals.
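The simplest monaural time-frequency level map compares the short-time spectra of the reverberant and anechoic signals tile by tile. The sketch below computes such a map in dB; it uses a plain STFT rather than a perceptual (auditory) front end and omits the spatial-cue maps, so it only approximates the kind of map described.

```python
import numpy as np


def stft_magnitude(x, n_fft=512, hop=256):
    """Magnitude STFT with a Hann window (frames x frequency bins)."""
    window = np.hanning(n_fft)
    frames = np.array([x[i:i + n_fft] * window for i in range(0, len(x) - n_fft + 1, hop)])
    return np.abs(np.fft.rfft(frames, axis=1))


def level_distortion_map(anechoic, reverberant, eps=1e-10, n_fft=512, hop=256):
    """Per time-frequency tile: reverberant level minus anechoic reference level (dB)."""
    n = min(len(anechoic), len(reverberant))
    ref = stft_magnitude(np.asarray(anechoic[:n], dtype=float), n_fft, hop)
    rev = stft_magnitude(np.asarray(reverberant[:n], dtype=float), n_fft, hop)
    return 20 * np.log10((rev + eps) / (ref + eps))


rng = np.random.default_rng(3)
dry = rng.standard_normal(8192)
room = np.zeros(2000)
room[0], room[800], room[1900] = 1.0, 0.6, 0.3   # hypothetical sparse room response
wet = np.convolve(dry, room)[:len(dry)]
dmap = level_distortion_map(dry, wet)
print(dmap.shape, float(np.mean(dmap)))
```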
A New Structure for Stereo Acoustic Echo Cancellation Based on Binaural Cue Coding
Hur, Yoomi; Park, Young-Cheol; Youn, Dae-Hee
Most stereo teleconferencing systems employ acoustic echo cancellers to remove undesired echoes. However, it is difficult for stereo echo cancellers to converge to the true echo path because the cross-correlation between the stereo input signals is high. To solve this problem, we propose a new structure combined with Binaural Cue Coding (BCC). BCC is a method for multichannel spatial rendering based on one down-mixed audio channel and side information. Based on BCC, we propose a new single-channel adaptive filter for stereo echo cancellation, which takes the down-mixed monaural signal as its reference. Simulation results confirm that convergence speed is increased and the misalignment problem is solved. In addition, the proposed structure has better tracking capability.
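The single-channel adaptive filter can be sketched with the normalized LMS algorithm, taking the mono downmix as its reference. In the example the microphone signal is generated directly from the downmix through a hypothetical echo path, an idealisation; the paper's use of BCC side information to relate the downmix to the individual loudspeaker echo paths is not modelled.

```python
import numpy as np


def nlms(reference, desired, n_taps, mu=0.5, eps=1e-6):
    """Normalised LMS: adapt w so that w * reference tracks desired; returns (w, error)."""
    w = np.zeros(n_taps)
    buf = np.zeros(n_taps)  # most recent reference samples, newest first
    err = np.zeros(len(desired))
    for n in range(len(desired)):
        buf = np.roll(buf, 1)
        buf[0] = reference[n]
        err[n] = desired[n] - w @ buf
        w += mu * err[n] * buf / (buf @ buf + eps)
    return w, err


rng = np.random.default_rng(4)
left = rng.standard_normal(5000)
right = 0.8 * left + 0.2 * rng.standard_normal(5000)   # highly correlated stereo pair
downmix = 0.5 * (left + right)                          # single adaptive-filter reference
echo_path = np.array([0.5, -0.3, 0.2, 0.1])             # hypothetical echo path
mic = np.convolve(downmix, echo_path)[:5000]
w, e = nlms(downmix, mic, n_taps=8)
print(np.round(w, 4))
```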
Efficient Binaural Filtering in QMF Domain for BRIR
Emerit, Marc; Faure, Julien; Guerin, Alexandre; Nicol, Rozenn; Pallone, Gregory; Philippe, Pierrick; Virette, David
The MPEG Surround standard includes two "native" binauralization modules for reproducing 3D audio content over headphones. In this paper, we present a novel and efficient Binaural Room Impulse Response (BRIR) modelling algorithm that extends their possibilities. This model is based on a parametric decomposition of the BRIR and is integrated within the subband-domain implementation of the MPEG Surround binaural decoding stage. First, we show that for impulse responses with room effects, our approach significantly reduces computational requirements compared with the native methods. Second, we report results from listening tests comparing different tradeoffs between complexity and quality. We show that with our method, quality increases gracefully as complexity increases.
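A common way to make BRIR filtering cheaper is to keep the early part as an FIR filter and replace the late tail with a parametric model. The sketch below does this broadband in the time domain, fitting an exponential energy decay to the tail; the 80 ms split point and single-band model are assumptions, whereas the paper works in the QMF subband domain with its own decomposition.

```python
import numpy as np


def split_brir(brir, fs, mixing_time_ms=80.0):
    """Split a BRIR into an early part (kept as FIR) and a late tail (to be modelled)."""
    n = int(round(mixing_time_ms * 1e-3 * fs))
    return brir[:n], brir[n:]


def fit_late_decay(late, fs, frame_ms=5.0):
    """Fit log frame energy against time: returns (energy at t = 0, decay rate in 1/s)."""
    n = max(1, int(round(frame_ms * 1e-3 * fs)))
    n_frames = len(late) // n
    frames = np.asarray(late[:n_frames * n], dtype=float).reshape(n_frames, n)
    energy = np.mean(frames ** 2, axis=1) + 1e-20
    t = (np.arange(n_frames) * n + (n - 1) / 2) / fs  # frame centre times
    slope, intercept = np.polyfit(t, np.log(energy), 1)
    return float(np.exp(intercept)), float(-slope)


def synth_late(n, fs, e0, rate, seed=0):
    """Parametric tail: Gaussian noise with an exponentially decaying energy envelope."""
    rng = np.random.default_rng(seed)
    t = np.arange(n) / fs
    return rng.standard_normal(n) * np.sqrt(e0 * np.exp(-rate * t))


fs = 48000
t = np.arange(fs // 2) / fs
brir = np.random.default_rng(5).standard_normal(len(t)) * np.exp(-10.0 * t)  # toy BRIR
early, late = split_brir(brir, fs)
e0, rate = fit_late_decay(late, fs)
model = np.concatenate([early, synth_late(len(late), fs, e0, rate)])
print(len(early), round(rate, 1))
```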
A Parametric Model of Head-Related Transfer Functions for Sound Source Localization
Kim, Jungho; Kim, Youngtae; Ko, Sangchul
A simple and effective parametric model of head-related transfer functions is presented for synthesizing binaural sound in practical 3-D sound reproduction systems. The model is based on a simplified time-domain description of the physics of wave propagation and diffraction, and its components correspond one-to-one with the main characteristics of measured head-related transfer functions, such as sound diffraction, delay and reflection. Their parameters are derived from sets of measurements, enabling the model to capture the perceptually significant features of head-related transfer functions. Finally, simple subjective listening tests verify the perceptual effectiveness of the model, which shows some promise for efficient implementation in real-world applications.
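As an example of a physically motivated parametric HRTF component, the sketch below implements the one-pole/one-zero head-shadow filter of Brown and Duda's spherical-head model. It is a well-known published model used here for illustration, not the model proposed in this paper; the head radius is an assumed typical value.

```python
import math


def head_shadow(f, theta_deg, a=0.0875, c=343.0, alpha_min=0.1, theta_min_deg=150.0):
    """One-pole/one-zero head-shadow filter of a rigid spherical head (Brown-Duda style).

    theta_deg is the angle between the ear axis and the source direction, a is
    the head radius in metres. Returns the complex gain at frequency f (Hz).
    """
    w0 = c / a
    alpha = (1 + alpha_min / 2) + (1 - alpha_min / 2) * math.cos(
        math.radians(theta_deg) / math.radians(theta_min_deg) * math.pi)
    jw = 1j * 2 * math.pi * f
    return (1 + alpha * jw / (2 * w0)) / (1 + jw / (2 * w0))


for theta in (0, 90, 150):
    gain_db = 20 * math.log10(abs(head_shadow(8000.0, theta)))
    print(f"theta={theta:3d} deg  gain at 8 kHz = {gain_db:+.1f} dB")
```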
Binaural Response Synthesis From Center-Of-Head Position Measurements for Stereo Applications
Bharitkar, Sunil; Kyriakakis, Chris
In two-channel (stereo) applications, such as televisions, automotive infotainment and hi-fi systems, the speakers are typically placed close to each other. The sound field generated by such a setup creates an image that is perceived as monophonic and lacking sufficient spatial "presence". Because of this limitation, a stereo expansion technique using head-related transfer functions (HRTFs) may be used to widen the soundstage, so that listeners perceive sound as originating from a wider angle (e.g., +/- 30 degrees relative to the median plane). In this paper, we present a method to synthesize responses at a listener's ears given only two room-response measurements, each obtained between one speaker of a stereo setup and an assumed center-of-head position (where the listener is assumed to be seated). The binaural response synthesis approach uses head-shadowing models (for inter-aural intensity differences) and the Woodworth-Schlosberg delay model. This approach is useful when dummy heads are not readily available for HRTF measurements, and it can be generalized to reflect measurements that would have been obtained over a large corpus of data (viz., human subjects). We also compare the responses obtained with this approach against measurements made with a dummy head and with a few human subjects.
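The Woodworth-Schlosberg delay model gives the interaural time difference of a rigid spherical head for a distant source. The sketch below evaluates it; the 8.75 cm head radius and 343 m/s speed of sound are assumptions.

```python
import math


def woodworth_itd(azimuth_deg: float, a: float = 0.0875, c: float = 343.0) -> float:
    """Woodworth-Schlosberg ITD (s) for a spherical head of radius a.

    ITD = (a / c) * (theta + sin(theta)) for a distant source at azimuth theta
    in the frontal half-plane (|theta| <= 90 degrees); positive means the source
    is nearer the ear on the positive-azimuth side.
    """
    theta = math.radians(azimuth_deg)
    if abs(theta) > math.pi / 2 + 1e-12:
        raise ValueError("formula as written covers |azimuth| <= 90 degrees")
    return a / c * (theta + math.sin(theta))


for az in (0, 15, 30, 60, 90):
    print(f"{az:2d} deg -> ITD = {woodworth_itd(az) * 1e6:6.1f} us")
```

For the ±30° target mentioned in the abstract, these assumed values give an ITD of roughly 260 µs.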
Physical and Filter Pinna Models Based on Anthropometry
Algazi, V. Ralph; Duda, Richard O.; Satarzadeh, Patrick
This paper addresses the fundamental problem of relating the anthropometry of the pinna to the localization cues it creates. The HRTFs for isolated pinnae (which are called PRTFs) are analyzed and modeled for sound sources directly in front of the listener. It is shown that a low-order filter model, with parameters suggested by or derived from pinna anthropometry, provides a good fit to the data. Methods are reported for adjusting the model parameters to fit the PRTF data. It is often possible to estimate the model parameters from a few geometrical measurements of the pinna. However, direct estimation from pinna anthropometry in general remains an unsolved problem, and the nature of this problem is discussed.
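One simple way to link pinna geometry to a filter parameter is to treat a spectral notch as cancellation between the direct sound and a reflection whose path is longer by Δ, giving a first notch at c/(2Δ). The sketch below builds a second-order notch at that frequency using the RBJ audio-EQ-cookbook formulas; the path difference and Q are hypothetical, and this is not the authors' fitted PRTF model.

```python
import cmath
import math


def notch_frequency(path_difference_m: float, c: float = 343.0) -> float:
    """First cancellation frequency when a reflected path is longer by path_difference_m."""
    return c / (2.0 * path_difference_m)


def notch_biquad(f0: float, q: float, fs: float):
    """Second-order notch (RBJ audio-EQ-cookbook form); returns normalised (b, a)."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b = [1.0, -2 * math.cos(w0), 1.0]
    a = [1 + alpha, -2 * math.cos(w0), 1 - alpha]
    return [v / a[0] for v in b], [v / a[0] for v in a]


def biquad_response(b, a, f: float, fs: float) -> complex:
    """Evaluate H(z) = B(z) / A(z) at z = exp(j 2 pi f / fs)."""
    zi = cmath.exp(-2j * math.pi * f / fs)  # z^-1
    return (b[0] + b[1] * zi + b[2] * zi * zi) / (a[0] + a[1] * zi + a[2] * zi * zi)


fs = 44100
f_notch = notch_frequency(0.02)  # hypothetical 2 cm extra reflection path
b, a = notch_biquad(f_notch, q=4.0, fs=fs)
print(f"notch at {f_notch:.0f} Hz, |H| there = {abs(biquad_response(b, a, f_notch, fs)):.2e}")
```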
A Novel Approach to Robotic Monaural Sound Localization
Bou Saleh, Abdallah; Diepold, Klaus; Keyrouz, Fakheredine
The paper presents a novel monaural 3D sound localization technique that robustly estimates the position of a sound source to within 2.5° in azimuth and 5° in elevation. The proposed system, an extension of monaural localization techniques, uses two microphones: one inserted in the ear canal of a humanoid head equipped with an artificial ear, and the other held outside the ear, 5 cm from the inner microphone. The outer microphone is small enough to introduce minimal reflections that might otherwise contribute to localization errors. The system exploits the spectral information of the signals from the two microphones so that a simple correlation mechanism, using a generic set of Head Related Transfer Functions (HRTFs), can localize the sound sources. The low computational requirement provides a basis for real-time robotic applications. The technique was tested through extensive simulations of a noisy reverberant room. The simulation results demonstrated the capability of the monaural system to locate sound sources in a three-dimensional environment with high accuracy, even in the presence of strong noise and distortion.
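The correlation mechanism can be sketched by treating the outer microphone as an estimate of the source spectrum, so that the inner/outer magnitude ratio approximates the HRTF, and then choosing the stored HRTF that correlates best with it. The HRTF set below is random placeholder data on a hypothetical grid; the paper uses a generic measured HRTF set.

```python
import numpy as np


def localize(inner_spec, outer_spec, hrtf_db, eps=1e-12):
    """Pick the direction whose HRTF magnitude (dB) best correlates with inner/outer.

    The outer microphone is treated as an estimate of the free-field source
    spectrum, so 20*log10(|inner| / |outer|) approximates the HRTF magnitude.
    """
    observed = 20 * np.log10((np.abs(inner_spec) + eps) / (np.abs(outer_spec) + eps))
    best_dir, best_r = None, -np.inf
    for direction, h in hrtf_db.items():
        r = np.corrcoef(observed, h)[0, 1]
        if r > best_r:
            best_dir, best_r = direction, r
    return best_dir, best_r


# Hypothetical HRTF set: random magnitude responses on a coarse direction grid.
rng = np.random.default_rng(6)
n_bins = 128
hrtf_db = {(az, el): rng.normal(0.0, 6.0, n_bins) for az in range(0, 360, 30) for el in (-30, 0, 30)}
outer = np.abs(rng.normal(1.0, 0.3, n_bins)) + 0.1   # source spectrum at the outer microphone
inner = outer * 10 ** (hrtf_db[(60, 30)] / 20) * (1 + 0.05 * rng.standard_normal(n_bins))
print(localize(inner, outer, hrtf_db))
```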
Optimized Binaural Modeling for Immersive Audio Applications
Floros, Andreas; Tsakostas, Christos
Recent developments in immersive audio systems mainly originate from binaural audio processing technology. In this work, a novel high-quality binaural modeling engine is presented that is suitable for a wide range of applications in virtual reality, mobile playback and computer games. Based on a set of optimized algorithms for Head-Related Transfer Function (HRTF) equalization, acoustic environment modeling and cross-talk cancellation, it is shown that the proposed binaural engine can achieve significantly improved authenticity of 3D audio representation in real time. A complete binaural synthesis application is also presented that demonstrates the efficiency of the proposed algorithms.
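Cross-talk cancellation is commonly designed per frequency bin as a regularised inverse of the 2×2 loudspeaker-to-ear transfer matrix. The sketch below shows that least-squares inversion with a regularisation constant beta; the plant is random placeholder data, and this is a standard approach rather than the optimized algorithm of the paper.

```python
import numpy as np


def crosstalk_canceller(C, beta=1e-3):
    """Regularised inverse of the 2x2 plant per frequency bin.

    C has shape (n_bins, 2, 2): C[k, ear, speaker] is the transfer function
    from each loudspeaker to each ear. Returns H with C @ H ~ I, computed as
    H = (C^H C + beta I)^-1 C^H; beta limits filter gain at ill-conditioned bins.
    """
    Ch = np.conj(np.swapaxes(C, -1, -2))
    return np.linalg.solve(Ch @ C + beta * np.eye(2), Ch)


rng = np.random.default_rng(7)
n_bins = 4
C = (rng.standard_normal((n_bins, 2, 2)) + 1j * rng.standard_normal((n_bins, 2, 2))) * 0.3
C += np.eye(2)  # ipsilateral paths dominate (hypothetical plant)
H = crosstalk_canceller(C, beta=0.0)
print(np.round(C @ H, 6))
```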
Head-Related Transfer Function Calculation Using Boundary Element Method
Dobrucki, Andrzej; Plaskota, Przemyslaw
Measuring the head-related transfer function (HRTF) is an efficient way of accounting for the influence of the human body on the sound spectrum. The database used to reproduce the sound source position is built from the measurement results. The database is individual to each person, which makes it impossible to build a universal database for all listeners. In this paper, a numerical model of an artificial head is presented. The model allows the HRTF to be determined without measurements. The model includes both geometrical and acoustical parameters. The boundary element method, which is often used to determine acoustic field parameters, was used in this work to calculate the HRTF values.
Applications of the Acoustic Centre
This paper focuses on uses for the acoustic centre concept, which in this paper represents a particular point for a transducer that acts as the origin of its low-frequency radiation or reception. The concept, although new to loudspeakers, has long been employed for microphones when accurate acoustic pressure calibration is required. A theoretical justification of the concept is presented and several calculation methods are discussed. We first apply the concept to subwoofers, for which the acoustic centre is essentially a cabinet dimension away from the centre of the cabinet. This has an influence on its radiation pattern in a normal room with reflecting walls. A second application that we consider is the effective position of a microphone, which is necessary if it is to be used for accurate calibration of acoustic pressure. A final application that we consider is the effective position of the ears on the head at lower frequencies. Calculations show that the acoustic centres of the ears are well away from the head, and the effective ear separation is larger than expected. This has implications for the human localization mechanism. Measurements on a Kemar mannequin show that the separation is even larger than expected from the calculations, and most of this can be understood, but the measurements at the lowest frequencies are somewhat uncertain.
Development of a Finite Element Headphone Model
Biba, Dominik; Opitz, Martin
For the development of high-end headphones, numerical simulation of headphone acoustics is an efficient tool. While discrete element models give good results for frequencies up to a few kilohertz, finite element models are valid at higher frequencies too. A headphone model using finite elements and boundary elements was built in three phases, both numerically and as a real-world sample. The validity of the model was verified by comparing both the radiated sound field and the membrane modal behaviour, using a scanning laser vibrometer. Excellent agreement between measured and computed amplitude frequency responses was achieved.
Development of Camera Mountable 5.0 Surround Microphone and Method of 3Ch to 5Ch Signal Re-Composing System
Ishii, Takeshi; Kawashima, Takako; Kikkawa, Satoshi; Kobayashi, Minoru; Komiyama, Setsu
This paper describes the development of a small 5.0 surround microphone and a 3ch-to-5ch signal re-composing system. The principal reason for this development was broadcasters’ recent demand for a small, lightweight, camera-mounted 5.0 surround microphone for outdoor documentary, drama and sports shooting. We had already produced a 5.0 surround microphone, but using it as a camera-mounted microphone raised a limitation we could not ignore: the number of audio tracks. HD cameras commonly available on the market have only 4 audio tracks. To overcome this limitation, we developed a 3ch-to-5ch signal re-composing system.
Noise-Robust Recognition System Making Use of Body-Conducted Speech Microphone
Ishimitsu, Shunsuke; Nakayama, Masashi; Yanagawa, Hirofumi; Yoshimi, Toshikazu
In recent years, speech recognition systems have been introduced in a wide variety of environments, such as vehicle instrumentation. In such low-SNR environments, a noise signal can be misjudged as speech, dramatically decreasing the recognition rate. This study therefore focuses on a recognition system that uses body-conducted signals. Because body-conducted signals propagate through solids, noise is not introduced into them, so a system with a high recognition rate can be expected even at sites such as engine rooms, which are low-SNR environments.
Revisiting Proximity Effect Using Broadband Signals
Elliq, Mohammed; Lambert, Dominique; Lopes, Manuel; Millot, Laurent; Pelé, Gérard
Experiments mainly studying the proximity effect are presented. Pink noise and music were used as stimuli, and a combo guitar amplifier as the source, to test several omnidirectional and directional microphones. We plot on-axis levels and spectral balances as functions of x, the distance to the source. A proximity effect was found for omnidirectional microphones. On-axis level curves show that the 1/x law holds only poorly. Changes in spectral balance depend on the microphone and also on the stimulus: larger decreases of low frequencies with pink noise, and larger increases of other frequencies with music. For a naked loudspeaker, we found similar on-axis level curves below and above the cut-off frequency, and we propose an explanation. Listening to equalized music recordings will help demonstrate the proximity effect for the tested microphones.
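The 1/x law the measurements are compared against predicts a level drop of 20·log10(x/x₀) dB, about 6 dB per doubling of distance. A minimal sketch for checking measured on-axis levels against it; the function names are hypothetical and no data from the paper is used.

```python
import math

def inverse_distance_level(level_ref_db, x_ref, x):
    """Level predicted by the 1/x law at distance x, from a reference level at x_ref."""
    return level_ref_db - 20 * math.log10(x / x_ref)

def deviation_from_law(distances, measured_db):
    """Measured minus predicted on-axis level, taking the first point as the reference."""
    x0, l0 = distances[0], measured_db[0]
    return [m - inverse_distance_level(l0, x0, x) for x, m in zip(distances, measured_db)]
```

Non-zero deviations at short distances are where the law "holds only poorly"; the sketch does not model the source size or the microphone's own response.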
Anechoic Measurements of Particle-Velocity Probes Compared to Pressure Gradient and Pressure Microphones
de Bree, Hans-Elias; Iwaki, Masakazu; Ono, Kazuho; Sugimoto, Takehiro; Woszczyk, Wieslaw
A number of anechoic measurements of Microflown™ particle velocity probes are compared with measurements of pressure-gradient and pressure microphones made under identical acoustical conditions at varying distances from a wide-range point source. Detailed measurements show specific response changes that depend on the distance to the source, and highlight the importance of calibrating transducers with respect to distance.
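One textbook reason why velocity-type and pressure-gradient responses change with distance: for an ideal point source the particle velocity is u = p/(ρc)·(1 + 1/(jkr)), so near the source it exceeds the plane-wave value by √(1 + 1/(kr)²). A minimal sketch of that boost; the speed of sound is an assumed value and real sources and probes deviate from this ideal.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature

def near_field_boost_db(freq_hz, distance_m, c=SPEED_OF_SOUND):
    """Extra particle-velocity level, re. the plane-wave value, near an ideal point source.

    For a spherical wave u = p / (rho * c) * (1 + 1 / (j * k * r)), so the velocity
    magnitude exceeds the plane-wave value by sqrt(1 + 1 / (k r)^2).
    """
    kr = 2 * math.pi * freq_hz * distance_m / c
    return 10 * math.log10(1 + 1 / kr ** 2)

for r in (0.05, 0.25, 1.0):
    print(f"{r:5.2f} m: {near_field_boost_db(100.0, r):5.1f} dB at 100 Hz")
```

At kr = 1 the boost is 3 dB; it grows without bound as kr shrinks, which is why a calibration made at one distance does not carry over to another.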
Idle Tone Behavior in Sigma Delta Modulation
Perez Gonzalez, Enrique; Reiss, Joshua D.
This paper examines the relationship between various unwanted phenomena that plague audio engineers in the design of Sigma Delta Modulators. This work aims to clarify the difference and relationship between single DC idle tones, long limit cycles, short limit cycles and ‘periodic’ short limit cycles, while extending current knowledge of idle tone behavior. A relationship between the periodic input to the quantizer of a 1-bit Sigma Delta Modulator and the appearance of idle tones is shown. It is shown that, for a large range of input signal magnitudes, the fundamental frequency of idle tones is proportional to the DC input. This finding has also been used to examine idle tone aliasing. Numerous simulations are reported that confirm these findings.
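A toy illustration, not the authors' simulation setup: a first-order, 1-bit modulator driven by a constant input settles into a periodic output pattern. The function names are hypothetical. For the dyadic inputs used here the arithmetic is exact, and the pattern repeats every 16 samples for x = 1/8 and every 8 samples for x = 1/4, so the repetition frequency doubles when the DC input doubles.

```python
def first_order_sdm(x_dc, n_samples, v0=0.0):
    """Simulate a first-order, 1-bit sigma-delta modulator with a constant input."""
    v = v0                       # integrator state
    out = []
    for _ in range(n_samples):
        y = 1.0 if v >= 0 else -1.0  # 1-bit quantizer
        out.append(y)
        v += x_dc - y                # integrate the quantization error
    return out

def idle_pattern_period(x_dc, max_samples=100000, v0=0.0):
    """Samples until the integrator state first returns to v0 (None if not found)."""
    v = v0
    for n in range(1, max_samples + 1):
        y = 1.0 if v >= 0 else -1.0
        v += x_dc - y
        if v == v0:
            return n
    return None
```

Exact state equality is only reliable for inputs exactly representable in binary floating point; for other inputs one would look for the period in the output sequence or its spectrum instead.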
Low Distortion Sound Reproduction Using 8Bit µC and ZePoC-Algorithms
Mathis, Wolfgang; Schnick, Olaf; Wellmann, Jan
The ZePoC encoding algorithm for Class-D amplification allows the signal baseband to be separated completely from all higher-frequency switching artifacts. Real-time ZePoC encoding demands considerable computational power, but in applications where recorded signals are to be reproduced, they can be encoded in advance by a software-defined ZePoC system. Reproducing this pre-encoded signal has very low hardware requirements: no digital-to-analog converter or linear amplifier is needed, and the playback device need only contain memory, a counter for forming the rectangular output signal and, if higher output power is required, an additional switching power stage and filter. A simple system made up of an 8-bit microcontroller at a 16 MHz clock rate could reach a signal-to-noise ratio of 80 dB and a usable frequency range of up to 15 kHz. A test system consisting of an 8-bit RISC processor, external memory and a single-transistor, single-supply switching stage will be presented.
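A minimal sketch of the playback side only, assuming (our assumption) that the pre-encoded data is stored as a sorted list of clock-tick indices at which the rectangular output toggles. How ZePoC derives those instants is not shown; `render_from_edges` is a hypothetical name. At the 16 MHz clock mentioned above, one tick is 62.5 ns.

```python
def render_from_edges(edge_ticks, n_ticks, initial_level=0):
    """Form a two-level output by toggling whenever a counter hits a stored edge time.

    edge_ticks: sorted clock-tick indices (the pre-encoded data held in memory).
    Returns one output level (0 or 1) per clock tick.
    """
    out = []
    level = initial_level
    next_edge = 0                    # index of the next stored edge
    for counter in range(n_ticks):   # free-running counter, one step per clock
        while next_edge < len(edge_ticks) and edge_ticks[next_edge] == counter:
            level ^= 1               # toggle the output on a match
            next_edge += 1
        out.append(level)
    return out

print(render_from_edges([3, 5, 9], 12))
```

On a microcontroller the same loop would typically be a timer compare register reloaded from memory after each match, with the pin toggled in hardware.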
Efficient and High-Quality Equalization Using a Multirate Filterbank and FIR Filters
Hiipakka, Jarmo; Väänänen, Riitta
This paper presents a digital signal processing algorithm for efficient and high-quality audio equalization. In this approach, the original full-band audio signal is first downsampled and separated into two or more sub-band signals using a multirate filterbank, after which the equalization is performed in the downsampled domains. After equalization, the signal is upsampled and recombined into a full-band audio signal. Linear-phase FIR filters, designed from user-controlled parameters, implement the actual equalization. The method yields an implementation with computational savings while still preserving optimal sound quality at any equalization parameter setting.
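Where the savings come from can be seen with simple arithmetic, under assumptions of ours: a sub-band decimated by M needs only N/M taps to match the frequency resolution of an N-tap full-rate FIR, and it runs once every M input samples, so its cost per input sample falls by M². The filterbank's own cost, which the real method must pay, is ignored here; the function name is hypothetical.

```python
def fir_mults_per_input_sample(taps_full_rate, decimation):
    """Multiplications per full-rate input sample for an FIR run at fs / decimation.

    Assumes the same frequency resolution as a taps_full_rate-tap filter at fs,
    i.e. taps_full_rate / decimation taps evaluated every `decimation` samples.
    Filterbank (analysis/synthesis) costs are deliberately ignored.
    """
    taps = taps_full_rate / decimation
    return taps / decimation

for m in (1, 2, 4, 8):
    print(m, fir_mults_per_input_sample(1024, m))
```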
Correction of Crossover Phase Distortion Using Reversed Time All-Pass IIR Filter
Adam, Veronique; Benz, Sebastien
This paper describes an implementation that corrects the group delay distortion arising from the crossover of a two-way loudspeaker system. Having determined an IIR all-pass filter whose group delay response corresponds to that of the crossover to be corrected, we validated in Matlab, and implemented on a DSP, the time-reversal solution proposed by S. A. Azizi, S. R. Powell and P. M. Chau, which allows an IIR filter to be inverted while retaining stability and causality. In addition to validating the theory and calculations, we also carried out preliminary listening tests to evaluate the changes in timbre, sound clarity and spatial localisation resulting from the group delay correction.
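The core of the time-reversal idea can be shown offline: filtering a time-reversed signal with H(z) and reversing the result applies H(1/z), so a causal all-pass followed by its time-reversed copy gives |H|² = 1 with zero phase. This minimal sketch works on a whole buffer at once; the real-time, block-based scheme cited above is not reproduced, and a first-order all-pass stands in for the crossover's phase response.

```python
def allpass1(x, a):
    """First-order all-pass H(z) = (a + z^-1) / (1 + a z^-1), stable for |a| < 1."""
    y = []
    x_prev = y_prev = 0.0
    for xn in x:
        yn = a * xn + x_prev - a * y_prev
        y.append(yn)
        x_prev, y_prev = xn, yn
    return y

def time_reversed_allpass(x, a):
    """Apply the all-pass to the time-reversed signal and reverse back (anti-causal H(1/z))."""
    return allpass1(x[::-1], a)[::-1]

x = [0.0] * 256
x[128] = 1.0
crossover_phase = allpass1(x, 0.5)                      # stand-in for the crossover's phase
corrected = time_reversed_allpass(crossover_phase, 0.5)  # phase terms cancel
```

Leaving room on both sides of the impulse matters: the causal and anti-causal tails must decay inside the buffer, which is exactly the latency a block-based real-time version has to budget for.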
Natural Timbre in Room Correction Systems
Abildgaard Pedersen, Jan; Mortensen, Henrik Green
Room correction systems are often found to produce a timbre that is described as artificial or unnatural. This paper presents a new approach to this problem, based on the finding that part of the influence of a listening room is natural to the human ear and should not be removed by a room correction system. More specifically, the smooth increase of level towards lower frequencies, also referred to as room gain, must be preserved after room correction is applied. In the described system, this is done as an integral part of the automatic target calculator, which also takes into account the main characteristics of the loudspeaker used, e.g. its lower cut-off frequency and directivity index.
Detection and Lateralization of Sinusoidal Signals in Presence of Dichotic Pink Noise
Laine, Petteri; Lehtonen, Heidi-Maria; Pulkki, Ville; Raitio, Tuomo
This paper investigates the ability to lateralize low-frequency sound in the presence of interfering dichotic noise. This is addressed by measuring the detection and lateralization thresholds of four sinusoidal signals (62.5, 125, 250, and 500 Hz) in the presence of uncorrelated pink noise in headphone listening. In the lateralization test, the signals were positioned to the left or right by delaying one of the headphone channels by 0.5 ms. The results show that the lateralization threshold does not depart from the detection threshold at 250 and 500 Hz. Interestingly, below 250 Hz the lateralization threshold rises quickly, and at 62.5 Hz the signal has to be amplified 18 dB above the detection level before being lateralized correctly. This suggests that low-frequency ITD decoding mechanisms are easily distracted by random changes in signal phase. This at least partly explains why the direction of a subwoofer cannot easily be detected when listening to broadband signals in surround sound.
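One way to see how small the phase cue becomes: a fixed interaural delay τ produces an interaural phase difference of 360°·f·τ, so the 0.5 ms delay used in the test yields 11.25° at 62.5 Hz but 90° at 500 Hz. A minimal arithmetic sketch; the function name is hypothetical.

```python
def interaural_phase_deg(freq_hz, itd_s):
    """Interaural phase difference (degrees) produced by a pure delay of itd_s seconds."""
    return 360.0 * freq_hz * itd_s

for f in (62.5, 125, 250, 500):
    print(f"{f:6.1f} Hz: {interaural_phase_deg(f, 0.5e-3):6.2f} deg")
```

A random phase perturbation of a few tens of degrees from the noise is therefore comparable to the whole cue at the lowest frequency, which is consistent with the interpretation given above.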
The Practice and Study of Ear Training on Discrimination of Sound Attributes
Liu, Zhi; Wu, Fan; Yang, Qing
To improve subjects' discrimination of sound attributes, an ear training course was designed. The training included discrimination of pure-tone frequency, frequency changes, level changes, the timbre of different musical instruments, irregularities of frequency response, etc. All items were presented in interleaved order to avoid listening fatigue. Explanations of the underlying psychoacoustic principles were also given, and many tests were conducted. In total, 57 subjects divided into two groups took part in the training course for about 15 weeks. After the ear training, most subjects made great progress, reaching an average correct rate of nearly 85% across all items.
The Training and Analysis on Listening Discrimination of Pure Tone Frequency
Liu, Zhi; Wu, Fan; Yang, Qing
After systematic ear training, more than 90% of people make great progress in discriminating pure-tone frequency. This conclusion is drawn from long-term ear training of two groups of subjects. Several pure tones were selected in octave steps for the training. Fifty-seven subjects took part. The training was conducted once a week, for a total of 15 weeks for one group. The average correct rate increased from around 60% to above 90%. The test results also show that listeners discriminate middle frequencies poorly but high and low frequencies well. The relationship between the improvement and the number of training sessions indicates that frequency training follows an increasing trend similar to that of physical training.
Virtual Hearing Aid – A Computer Application for Simulating Hearing Aids Performance
Czyzewski, Andrzej; Kosikowski, Lukasz; Kostek, Bozena
The virtual hearing aid is a computer application that allows an approximate simulation of hearing aid performance. The application implements algorithms simulating band-pass filters, compressors and perceptual masking strategies for audio signal processing. Individual persons' hearing characteristics were taken into account for this purpose. The experimental part comprises verification of the engineered algorithms implemented in the virtual hearing prosthesis. The paper also contains results of examinations of patients aimed at verifying the applicability of the proposed signal processing strategy to hearing prostheses.
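As an illustration of the compressor component only, a minimal sketch of a generic hard-knee static compression curve: unity gain below a threshold and a 1:ratio slope above it. The threshold and ratio values are hypothetical; the paper's actual compressor design, time constants and fitting to individual hearing characteristics are not reproduced.

```python
def compressor_output_db(input_db, threshold_db, ratio):
    """Static compression curve: unity gain below threshold, 1:ratio slope above it."""
    if input_db <= threshold_db:
        return input_db
    return threshold_db + (input_db - threshold_db) / ratio

def compress_bands(band_levels_db, thresholds_db, ratios):
    """Apply the static curve independently in each band of a multi-band hearing aid."""
    return [compressor_output_db(l, t, r) for l, t, r in zip(band_levels_db, thresholds_db, ratios)]
```

In a hearing aid the per-band thresholds and ratios would be derived from the listener's audiogram, which is the step the simulation personalizes.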
Training Versus Practice in Spatial Audio Attribute Evaluation Tasks
Brookes, Tim; Kassier, Rafael; Rumsey, Francis
Listener training in published studies has tended to focus on simple repetitive practice of experimental tasks without feedback. Time could be saved in listening panel selection and training if a more general training system could be applied to a variety of tasks. For a training system for spatial audio listening skills to prove effective, it must demonstrate that learned skills are transferable, and it must compare favourably with repetitive practice on specific tasks. A novel study comparing a training system with repetitive practice has been extended to include a total of 48 subjects. Transfer is assessed, and practice and training are compared against a control group, for tasks involving transfer of spatial audio training.
Subjective Assessment of Quality of Multimedia Signals by Means of A-B Test
This paper describes an automated method for the subjective assessment of speech, music, image and video quality. In this method, the sound, image or video samples were randomized, paired in A-B sets and then presented to a group of listeners. From the A-B results, a preference matrix was calculated. The preference matrix was converted to a numerical scale in accordance with Thurstone's Case V model of paired comparisons. The method was applied to evaluate the influence of various coding techniques on the quality of multimedia signals.
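A minimal sketch of the Case V conversion, assuming the usual formulation: the proportion pᵢⱼ of trials in which item i was preferred over item j is mapped to a normal deviate zᵢⱼ = Φ⁻¹(pᵢⱼ), and each item's scale value is the mean of its row. The clipping constant `eps` is our assumption, needed because unanimous pairs would give infinite z; the function name is hypothetical.

```python
from statistics import NormalDist

def thurstone_case_v(wins, eps=0.01):
    """Scale values from a paired-comparison win matrix (Thurstone Case V).

    wins[i][j] = number of times item i was preferred over item j.
    """
    n = len(wins)
    z = NormalDist().inv_cdf
    scores = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if i == j:
                continue  # z = 0 for an item compared with itself
            trials = wins[i][j] + wins[j][i]
            p = wins[i][j] / trials if trials else 0.5
            p = min(max(p, eps), 1 - eps)  # avoid infinite z for unanimous pairs
            total += z(p)
        scores.append(total / n)
    return scores

print(thurstone_case_v([[0, 8, 9], [2, 0, 7], [1, 3, 0]]))
```

The resulting scale is an interval scale centred on zero; only differences between items are meaningful.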
Influence of Visual Stimuli on the Sound Quality Evaluation of Loudspeaker Systems
Christensen, Flemming; Karandreas, Alex
Product sound quality evaluation aims to identify relevant attributes and assess their influence on the overall auditory impression. Extending this sound-specific rationale, the present study evaluates overall impression in relation to both hearing and vision, specifically for loudspeakers. To quantify the bias that the appearance of a loudspeaker has on the sound quality evaluation of a naive listening panel, loudspeaker sounds with varying degrees of degradation are paired with positively or negatively biasing visual input, from the actual loudspeakers and, in a separate experiment, from pictures of the same loudspeakers.
The Study of Audio Equipment Evaluations Using the Sound of Music
Ishimitsu, Shunsuke; Makino, Atsushi; Sakamoto, Koji; Sasaki, Katsuhiro; Sugawara, Keitaro; Yanagawa, Hirofumi; Yoshimi, Toshikazu
In this study, we examined the evaluation of audio equipment using the sound of music. Audio amplifiers were chosen as the evaluation targets, and the sound quality differences between them were visualized by wavelet analysis of an actual music signal. We considered the causes of these differences and then tried to relate the sound impression to the analysis results.
Advancements in Impulse Response Measurements by Sine Sweeps
Sine sweeps have long been employed for audio and acoustics measurements, but in recent years (2000 and later) their use has become much more widespread, thanks to the computational capabilities of modern computers. Recent research results now allow a further step in sine sweep measurements, particularly for measuring impulse responses and distortion, and when working with systems that are neither time-invariant nor linear. The paper presents some of these advancements and provides experimental results that quantify the improvement in signal-to-noise ratio and the suppression of pre-ringing, and describes techniques for performing these measurements cheaply with a standard PC, a good-quality sound interface, and currently available loudspeakers and microphones.
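For orientation, a minimal sketch of the widely used exponential sweep and its inverse filter; it is not claimed to include the specific advancements the paper reports. The phase is chosen so the instantaneous frequency is f₁·exp(t·ln(f₂/f₁)/T). The inverse filter is the time-reversed sweep with an amplitude proportional to instantaneous frequency (6 dB/octave), compensating the sweep's pink energy spectrum. Parameter values in the usage are arbitrary.

```python
import numpy as np

def exp_sweep(f1, f2, duration, fs):
    """Exponential sine sweep from f1 to f2 Hz and its inverse (deconvolution) filter."""
    n = int(round(duration * fs))
    t = np.arange(n) / fs
    rate = np.log(f2 / f1)
    # Instantaneous frequency f1 * exp(t * rate / duration): f1 at t = 0, f2 at t = duration.
    phase = 2 * np.pi * f1 * duration / rate * (np.exp(t * rate / duration) - 1)
    sweep = np.sin(phase)
    # Reversed sweep starts at f2; the envelope scales each part by f_inst / f2.
    envelope = np.exp(-t * rate / duration)
    inverse = sweep[::-1] * envelope
    return sweep, inverse

sweep, inverse = exp_sweep(50.0, 3000.0, 0.5, 8000)
# For a recording of the sweep, np.convolve(recording, inverse) places the linear
# impulse response at index len(sweep) - 1, with harmonic distortion products before it.
```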
The Challenges of Testing MP3 Players
Brunet, Pascal; Rimkunas, Zachary; Temme, Steve
MP3 players have been the ‘must-have’ electronic gadget for the past few years. Over 10 million players were sold in 2005, and this figure is predicted to more than double by 2010. But how can manufacturers carry out QA tests on the production line, ensure excellent sound quality and demonstrate compliance with sound pressure level regulations? MP3 player testing is challenging because it combines traditional acoustic analysis techniques with some characteristics unique to MP3 players. Here, we examine the equipment and techniques that MP3 player manufacturers can use to test the sound quality of their products, and discuss the measurement techniques and algorithms that can be used to overcome the challenges inherent in measuring MP3 players.
Tracking Harmonics and Artifacts in Spectra Using Sinusoidal and Spiral Maps
A technique for tracking a harmonic series in a spectrum using a combination of sinusoidal and spiral maps is described. The spiral map enhances patterns that appear when a sinusoid is sampled near the Nyquist rate. The correspondence between the maps facilitates derivation of properties and motivates the use of curves that cut across the sinusoid or spiral. As an application, the spatial separation of a specific musical pitch from an artifact is demonstrated.
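The patterns that appear near the Nyquist rate follow from a standard identity: for integer n, cos(π(1−ε)n) = cos(πn)cos(πεn) + sin(πn)sin(πεn) = (−1)ⁿ cos(πεn), a sign-alternating sequence under a slow envelope. This sketch only checks that identity numerically; it does not implement the authors' sinusoidal or spiral maps.

```python
import math

def near_nyquist(n, eps):
    """Sample n of cos(pi * (1 - eps) * n): a sinusoid just below the Nyquist frequency."""
    return math.cos(math.pi * (1 - eps) * n)

def alternating_envelope_form(n, eps):
    """Equivalent form (-1)^n * cos(pi * eps * n): alternating signs with a slow envelope."""
    return (-1) ** n * math.cos(math.pi * eps * n)
```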
Spatial Distribution Meter: A New Method to Display Spatial Impression Over Time
A widespread method for visualizing the spatial behaviour of stereo signals is the vector-scope or goniometer, which shows the relation between the left and right channels. Well-trained eyes can spot imbalance and mono compatibility problems in these very fast-changing figures. However, this analysis tool retains no information about the past behaviour of the stereo signal, and no details can be seen. In this contribution we introduce a new method for analyzing stereo signals, which is based on the vector-scope but shows the behaviour over time. The final graph looks like a spectrogram/sonogram whose axes are time and angle. Useful applications of this new spatial distribution meter include the analysis of stereo impulse responses as well as the typical applications of a vector-scope.
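A minimal sketch of the time-versus-angle idea, under our own conventions: each sample pair is mapped to a goniometer angle (0° mono, −45° left only, +45° right only, ±90° out of phase), folded into (−90°, 90°] because the trace is point-symmetric, and counted in one histogram per frame. The bin count, frame length and unweighted counting are assumptions; the paper's display may weight samples differently.

```python
import math

def stereo_angle_deg(left, right):
    """Goniometer angle of one sample pair, folded into (-90, 90]; None for silence."""
    mid, side = left + right, right - left
    if mid == 0 and side == 0:
        return None
    theta = math.degrees(math.atan2(side, mid))
    # The goniometer trace is point-symmetric, so fold opposite directions together.
    if theta > 90:
        theta -= 180
    elif theta <= -90:
        theta += 180
    return theta

def spatial_distribution(left, right, frame_len, n_bins=9):
    """One angle histogram per frame: rows are time, columns are angle bins."""
    frames = []
    for start in range(0, len(left) - frame_len + 1, frame_len):
        hist = [0] * n_bins
        for l, r in zip(left[start:start + frame_len], right[start:start + frame_len]):
            theta = stereo_angle_deg(l, r)
            if theta is None:
                continue  # silent samples carry no direction
            k = min(int((theta + 90) / 180 * n_bins), n_bins - 1)
            hist[k] += 1
        frames.append(hist)
    return frames
```

Plotting the rows side by side as a colour map gives the spectrogram-like view described above, with time on one axis and angle on the other.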
Low Level Audio Signal Transfer through Transformers Conflicts with Permeability Behavior Inside their Cores
van der Veen, Menno
At the threshold of audibility, the signal and flux density levels in an amplifier with audio transformers are very small. At those levels, the relative magnetic permeability of the iron transformer core collapses and the inductance of the transformer becomes very small. The impedances connected to the transformer, together with its signal-level- and frequency-dependent inductance, behave as a high-pass filter whose corner frequencies slip into the audio band, resulting in nonlinear signal transfer through the transformer. This research explains deviations in the reproduction of micro-details at the threshold of audibility.
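A simplified first-order picture of the high-pass effect: the magnetizing inductance L shunts the signal path, and with the source and load resistances in parallel (R) the −3 dB corner is R/(2πL). A tenfold drop in inductance at low flux levels therefore moves the corner up tenfold. The 600 Ω and inductance values below are hypothetical, and winding resistance and capacitances are ignored.

```python
import math

def parallel(r1, r2):
    """Resistance of r1 and r2 in parallel."""
    return r1 * r2 / (r1 + r2)

def rl_highpass_corner_hz(resistance_ohm, inductance_h):
    """-3 dB corner of a first-order high-pass: shunt inductance L driven from resistance R."""
    return resistance_ohm / (2 * math.pi * inductance_h)

r = parallel(600.0, 600.0)      # hypothetical 600-ohm source and load
for inductance in (10.0, 1.0):  # hypothetical inductance at a high and a very low level
    print(inductance, "H ->", round(rl_highpass_corner_hz(r, inductance), 1), "Hz")
```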
New Techniques for Measuring Speech Privacy & Efficiency of Sound Masking Systems
Speech privacy is becoming increasingly important in many workplace and security environments, as well as in hospitals and medical centres where patient confidentiality is of critical importance. Traditionally, speech privacy has been measured by means of the Articulation Index, transposed to rate privacy rather than intelligibility (PI = 1 - AI). However, this is an indirect and cumbersome method that usually requires a spreadsheet calculation to yield the Privacy Index rating. The paper discusses the potential use of STI and STIPa as direct measures of speech privacy. The benefits and limitations of the methods are highlighted, together with the results of a number of case studies. It is concluded that, while the method has potential merit, a number of the limiting factors require further research.
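The "spreadsheet calculation" behind PI = 1 - AI can be sketched in the style of the ANSI S3.5-1969 articulation index: each band's signal-to-noise ratio is mapped linearly over a 30 dB range, (SNR + 12)/30 clipped to [0, 1], then weighted by a band-importance factor. The equal weights in the test are hypothetical placeholders; the standard's actual band-importance weights are not reproduced here.

```python
def articulation_index(band_snr_db, band_weights):
    """AI as a weighted sum of band SNRs mapped over a 30 dB range and clipped to [0, 1]."""
    ai = 0.0
    for snr, w in zip(band_snr_db, band_weights):
        contribution = (snr + 12.0) / 30.0
        ai += w * min(max(contribution, 0.0), 1.0)
    return ai

def privacy_index(band_snr_db, band_weights):
    """Privacy Index PI = 1 - AI."""
    return 1.0 - articulation_index(band_snr_db, band_weights)
```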
Onset Detection Method in Piano Music: Sensibility to Threshold Levels
Beracoechea, Jon; Casujus-Quiros, Javier; Ortiz-Berenguer, Luis; Torres-Guijarro, Marisol
Piano music transcription requires an onset detection stage. Every time a new note or chord is played, a new analysis is needed, so correctly detecting whether a new note or chord has been played is critical. Onset detection might be expected to have a simple solution, but several problems arise when attempting to perform it. This paper presents a study of the sensitivity of a detection method to its adjustable parameters. It also compares some results with a simpler method based on the derivative of the energy envelope. The methods have been tested on six piano compositions. In conclusion, accurate automatic onset detection in piano music is not a simple task.
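A minimal sketch of the simpler comparison method, the derivative of the energy envelope, with a threshold set relative to the largest rise (that relative threshold is our assumption). It makes the threshold sensitivity concrete: in the test, a quieter second note is detected at a low threshold ratio and missed at a higher one. The frame-based envelope and all names are hypothetical.

```python
def frame_energies(x, frame_len):
    """Energy of consecutive non-overlapping frames."""
    return [sum(s * s for s in x[i:i + frame_len])
            for i in range(0, len(x) - frame_len + 1, frame_len)]

def detect_onsets(x, frame_len, threshold_ratio):
    """Sample positions where the energy derivative rises above threshold_ratio * its maximum."""
    energy = frame_energies(x, frame_len)
    diff = [0.0] + [energy[k] - energy[k - 1] for k in range(1, len(energy))]
    peak = max(diff)
    if peak <= 0:
        return []
    thr = threshold_ratio * peak
    onsets = []
    for k in range(1, len(diff)):
        if diff[k] > thr and diff[k - 1] <= thr:  # upward threshold crossing
            onsets.append(k * frame_len)          # onset position in samples
    return onsets
```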
Benefits of Using SIP for Audio Broadcasting Applications
De Jaham, Serge
The SIP protocol is gaining popularity for setting up temporary audio links over IP networks in broadcast applications. This paper briefly describes SIP and discusses its distinctive advantages, especially in comparison with proprietary systems. The main and obvious benefit is standardisation, which opens the way to interoperation between different makes of codecs. SIP, as a signalling protocol, readily provides efficient methods for link setup while preserving ease of use. Now a mature technology in the VoIP field, it is supported by a wide range of network devices and includes provisions for specific issues such as firewall and NAT traversal. Lastly, SIP should be the key to the transition from ISDN to IP networks, while providing operating modes at least as flexible as those of ISDN.
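To make the signalling concrete, a minimal sketch that builds a SIP INVITE (RFC 3261) carrying an SDP audio offer (RFC 4566). All addresses, the tag, Call-ID and branch are hypothetical, and a real broadcast codec would offer its own codec list rather than the PCMU payload used here. The branch value starts with the `z9hG4bK` prefix that RFC 3261 requires.

```python
def build_invite(caller, callee, host, rtp_port, call_id, branch):
    """Build a minimal SIP INVITE with an SDP audio offer (illustrative values only)."""
    sdp = "\r\n".join([
        "v=0",
        f"o=- 0 0 IN IP4 {host}",
        "s=Audio contribution link",
        f"c=IN IP4 {host}",
        "t=0 0",
        f"m=audio {rtp_port} RTP/AVP 0",  # static payload type 0 = PCMU
        "a=rtpmap:0 PCMU/8000",
    ]) + "\r\n"
    headers = [
        f"INVITE sip:{callee} SIP/2.0",
        f"Via: SIP/2.0/UDP {host}:5060;branch={branch}",
        "Max-Forwards: 70",
        f"To: <sip:{callee}>",
        f"From: <sip:{caller}>;tag=1928301774",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{caller.split('@')[0]}@{host}>",
        "Content-Type: application/sdp",
        f"Content-Length: {len(sdp.encode())}",
    ]
    return "\r\n".join(headers) + "\r\n\r\n" + sdp

msg = build_invite("studio@example.com", "codec@example.org", "192.0.2.10",
                   5004, "a84b4c76e66710", "z9hG4bK776asdhds")
```

The far end would answer with its own SDP in a 200 OK, and the codecs negotiated there are what makes different makes of codec interoperate.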
Managing the Leap From Synchronous to IP for Radio Broadcasters
Increasingly, radio broadcasters are looking at making the leap from synchronous networks to IP networks for their distribution and contribution links. The advantages of migrating from synchronous to IP networks are numerous, but they are often tempered by concerns about the IP transport mechanism, including latency, lost packets, packet size, protocol selection, jitter and algorithm selection. This paper addresses questions such as which IP protocol is most suitable for real-time audio delivery, which algorithm is best suited to IP to reduce the effects of the latency inherent in an IP network, how to protect against packet loss, and how to deal with the inherent delay involved in packetising audio for delivery over an IP network.
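The packetising trade-off can be quantified with simple arithmetic for uncompressed PCM over RTP: shorter packets cut packetisation delay but raise the packet rate and the share of header overhead. The 40-byte default assumes minimum IPv4 (20), UDP (8) and RTP (12) headers; the function name and the example formats are illustrative.

```python
def packet_stats(samples_per_packet, sample_rate, channels, bytes_per_sample, header_bytes=40):
    """Delay (ms), packet rate, payload bytes and header overhead for PCM over RTP.

    header_bytes defaults to 40: minimum IPv4 (20) + UDP (8) + RTP (12) headers.
    """
    delay_ms = 1000 * samples_per_packet / sample_rate
    packets_per_s = sample_rate / samples_per_packet
    payload = samples_per_packet * channels * bytes_per_sample
    overhead = header_bytes / (header_bytes + payload)
    return delay_ms, packets_per_s, payload, overhead

for n in (48, 240, 960):  # 48 kHz stereo, 24-bit samples
    print(n, packet_stats(n, 48000, 2, 3))
```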
An XML-Based Approach to Audio Connection Management
Foss, Richard; Klinkradt, Brad
An XML-based approach to FireWire audio connection management has been developed that allows connection management applications to be created using a range of implementation tools. The XML connection management requests flow between a client and a server, which can reside on the same or on separate workstations. The server maintains the state of the FireWire audio device configuration as well as information about potential users. XML is also used to control user access and the booking of devices.
Rhythm Based Error Correction Approach for Scalable Audio Streaming Over the Internet
Cuevas-Martinez, Juan C.; Vera Candeas, Pedro; Ruiz Reyes, Nicolas
Multimedia is now the most important kind of information on the Internet, owing to the impressive growth of the web and of streaming technologies. Although connections are becoming faster, the number of potential users can exceed the available bandwidth, a problem that scalable audio streaming makes it possible to address. However, error correction is treated as being of secondary importance for multimedia: some systems use TCP or FEC (Forward Error Correction), which are useless for low-bit-rate coders, and others use nothing at all. This article therefore presents a rhythm-based error correction approach. This solution can avoid sending significant redundant information, leaving almost all error processing to the decoder, without any feedback to the sender.
Sound-Transformation and Re-Mixing in Real-Time
Starting with a short overview of some very basic principles of sound perception, this paper proceeds from the assumption that the recording, storage, editing and reproduction of audio signals have compensated for at least some of these principles and have therefore significantly changed human listening habits. Reflecting on these changes, the paper suggests sound transformation and re-mixing in real time as part of the performance and composition process. Some techniques are explored and a number of implementations are introduced.
Hybrid Time-Scale Modification of Audio
Gournay, Philippe; Lefebvre, Roch; Savard, Patrick-André
This paper presents a novel technique for time-scale modification (TSM) that integrates time-domain and frequency-domain processing. The method relies on frame-by-frame classification to choose between different techniques adapted to different signal types. Provisions are made to switch seamlessly between techniques. The result is a more universal TSM algorithm that yields consistently high-quality results on a wider range of audio signals. The method is tested on mixed-content signals, and the results of formal listening tests are discussed.
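For readers unfamiliar with time-domain TSM, a minimal sketch of its crudest form, plain overlap-add: frames are read at one hop and written at another. It uses no waveform alignment (unlike SOLA/WSOLA), so it produces phasing artifacts on tonal material, which is the kind of weakness that motivates switching techniques per frame. The paper's classifier and frequency-domain branch are not shown, and the function name and frame size are assumptions.

```python
import math

def ola_time_stretch(x, stretch, frame_len=256):
    """Plain overlap-add time-scale modification (no waveform alignment).

    stretch > 1 lengthens the signal. A periodic Hann window with 50 % output
    overlap is used; overlapping windows then sum to exactly one.
    """
    hop_out = frame_len // 2
    hop_in = hop_out / stretch
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    n_frames = int((len(x) - frame_len) / hop_in) + 1
    out = [0.0] * ((n_frames - 1) * hop_out + frame_len)
    for k in range(n_frames):
        start_in = int(round(k * hop_in))  # read position advances by hop_in
        start_out = k * hop_out            # write position advances by hop_out
        for n in range(frame_len):
            out[start_out + n] += x[start_in + n] * window[n]
    return out
```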
New Audio Editor Functionality Using Harmonic Sinusoids
Sandler, Mark; Wen, Xue
This article introduces the application of the harmonic sinusoid model to an audio editor. The harmonic sinusoid model is a parametric model for representing pitched audio events, with tolerance for inharmonicity and pitch evolution. While standard audio editors allow the user to select a time or frequency range to edit, with the harmonic sinusoid parameters estimated, including phase, we are able to select a pitched event and edit it as if it were a separate sound. The user interface is designed to be as simple as one-click selection, while the user is also given further options for better results.
Measurement and Optimization of Acoustic Feedback of Control Elements in Cars
Gruhler, Gerhard; Treiber, Alexander S.
The acoustic quality of control elements in cars is increasingly important to manufacturers seeking to improve the perceived quality and safety of their products. This paper presents methods and tools used in an ongoing research project. The project’s goal is to support the industry in defining suitable parameters and limits and to develop realizable proposals for measuring equipment. Jury tests are used to create the scientific basis for hearing-related benchmarking of the signals.
On the Training of Multilayer Perceptrons for Speech/Non-Speech Classification in Hearing Aids
Alexandre, Enrique; Álvarez, Lorena; Cuadra, Lucas; Rosa-Zurera, Manuel
This paper explores the application of multilayer perceptrons (MLPs) to speech/non-speech classification in digital hearing aids. When properly designed and trained, MLPs can generate an arbitrary classification boundary with relatively low computational complexity. The paper focuses on the key influence of the training process on system performance. An appropriate choice of training algorithm helps provide better classification with fewer neurons in the network, which leads to lower computational complexity. The results are compared with those of two reference algorithms (the Fisher linear discriminant and the k-Nearest Neighbour), along with some comments on computational complexity.
Production and Live Transmission of 22.2 Multichannel Sound with Ultra-High Definition TV
Ando, Akio; Hamasaki, Kimio; Imai, Atsushi; Iwaki, Masakazu; Kitajima, Shoji; Nakayama, Yasushige; Nishiguchi, Toshiyuki; Okumura, Reiko; Otsuka, Yutaka; Shimaoka, Satoko; Sugimoto, Takehiro
A 22.2 multichannel sound system was developed for ultrahigh-definition TV. The improvement in the spatial quality created by this system as compared to that of two-dimensional sound was evaluated and reported in previous papers. The first experiment on large-scale live production and transmission of 22.2 multichannel sound with ultrahigh-definition video was carried out to show the possible application of this system to next-generation broadcasting. In Tokyo, 22.2 multichannel sound was live mixed and transmitted to Osaka using an existing IP optical network. This paper describes in detail the live production and its transmission using the 22.2 multichannel sound system. It also discusses various issues of sound design, capturing, and mixing for three-dimensional sound.
Automated Audio Detection, Segmentation and Indexing, with Application to Post-Production Editing
Avdelidis, Kostantinos A.; Dimoulas, Charalampos A.; Kalliris, George M.; Papanikolaou, George V.; Vegiris, Christos
The current work deals with audio event detection, segmentation and characterization for further use in post-production. Browsing, selecting and characterizing audio-visual content is a tiresome task, especially in audio/video editing applications, where an enormous number of recordings with different characteristics is usually involved. Automated detection, segmentation and general audio classification are essential for flexible and effective audio-visual content management. A multi-resolution scanning procedure based mainly on wavelet processing is proposed, in which various energy-based comparators and signal-complexity metrics have been tested for detection purposes. A variety of audio features, including MPEG-7 audio low-level descriptors, have been considered for event characterization and indexing. Extraction of the detection/characterization results via MPEG-7 description schemes or similar indexing files is also considered.
A New Method for Measuring Time Code Quality
Beckinger, Michael; Müller-Kähler, Florian; Rodigast, René
High-quality time code and word clock synchronization is essential to prevent audio dropouts and flutter in sound studios. Poorly adjusted time code generators or word clock synchronizers require extensive error checks in synchronization networks. For this reason, a new measurement method is presented that enables sound engineers to measure longitudinal time code jitter and to check time code/word clock synchronization.
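The abstract does not say how jitter is computed, so the following is only a generic sketch: given measured timestamps of successive time code frame starts (for example nominally 40 ms apart at 25 frames/s), fit a least-squares straight line and report the RMS and peak-to-peak residuals. Fitting the line, rather than assuming the nominal rate, removes a constant frequency offset between the time code and the measuring clock. The function name is hypothetical.

```python
import math

def timing_jitter(timestamps_s):
    """RMS and peak-to-peak deviation of event times from a least-squares straight line."""
    n = len(timestamps_s)
    mean_i = (n - 1) / 2
    mean_t = sum(timestamps_s) / n
    sxx = sum((i - mean_i) ** 2 for i in range(n))
    sxy = sum((i - mean_i) * (t - mean_t) for i, t in enumerate(timestamps_s))
    slope = sxy / sxx  # estimated actual frame period
    residuals = [t - (mean_t + slope * (i - mean_i)) for i, t in enumerate(timestamps_s)]
    rms = math.sqrt(sum(r * r for r in residuals) / n)
    return rms, max(residuals) - min(residuals)
```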
50 Years of Sound Control Room Design
Sound control room design is an interesting corner of small-room acoustics and exhibits most of the problems found there: frequency-balanced reverberation time, proper distribution of room modes, low-frequency reproduction, sound source and receiver positioning, etc. The control room has a twofold function, which is often overlooked. On the one hand, the control room together with the monitor loudspeakers should reproduce as faithfully as possible the efforts of the sound engineer and the producer in creating a new recording. On the other hand, the control room should mimic the perceived acoustics of an average living room when the final result of the recording is checked, simply because most musical productions are aimed at the living-room listening environment.
Acoustics in Rock and Pop Music Halls
Adelman-Larsen, Niels Werner; Gade, Anders Christian; Thompson, Eric R.
The existing body of literature on the acoustic design of concert halls has focused almost exclusively on classical music, although there are many more performances of rhythmic music, including rock and pop. Objective measurements were made of the acoustics of twenty rock music venues in Denmark, and a questionnaire was used for a subjective assessment of those venues by professional rock musicians and sound engineers. Correlations between the objective and subjective results led, among other things, to a recommendation for reverberation time as a function of hall volume. Since bass frequencies are typically highly amplified, they play an important role in the subjective ratings, and the 63 Hz band must be included in objective measurements and recommendations.
The Flexible Bass Absorber
Adelman-Larsen, Niels Werner; Gade, Anders Christian; Thompson, Eric R.
Multi-purpose concert halls face a dilemma. They host different performance types that require significantly different acoustic conditions to provide the best sound quality to performers, sound engineers and the audience alike. Pop and rock music often contains high levels of bass energy but still requires high definition for good sound quality. Mid- and high-frequency absorption is easily regulated, but adjusting low-frequency absorption has typically been too expensive or has required too much space to be practical in multi-purpose halls. A practical solution to this dilemma has been developed. Measurements were made on a variable and mobile low-frequency absorber. The paper presents the results of sound absorption measurements on the prototype as well as elements of its design.
Improvements to Binary Amplitude Diffusers
Angus, Jamie A. S.; Gehring, Gillian A.; Payne-Johnson, Elizabeth C.
Improved forms of diffusion structures based on absorption-reflection gratings are presented. The theory, design, advantages and limitations of these structures are discussed, and their performance is presented. Two methods of improving performance are suggested. The first structure is based on diffusion-limited aggregation, which models non-regular fractal growth. The second structure is a panel with square absorption patches whose variable size is determined by an m-sequence. Of the two, the second structure performed best. These improved diffusing structures take up less space than phase-reflecting ones, and they add further tools to the acoustic designer's armoury for tackling real acoustic designs that have physical and practical, as well as theoretical, constraints.
An MLS Method for Non-Stationary and Outdoor Acoustic Paths
Angus, Jamie A. S.; Waddington, David C.
The correlation properties of a directly carrier-modulated code sequence modulation signal are exploited to investigate sound propagation in turbulent air. An experiment is described in which the correlation properties of the spread-spectrum signal are demonstrated and used to calculate accurate times of flight that compare well with sonic anemometer measurements of the speed of sound. The results show that a directly carrier-modulated code sequence modulation system can provide significantly improved ways of measuring sound propagation outdoors. Moreover, the technique measures wind speed directly, which can be used to compensate the time of flight, thus allowing acoustic impulse responses to be measured in non-stationary media, such as outdoors, where reliable measurements have previously been difficult to obtain.
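The correlation property the abstract relies on can be illustrated with a baseband maximum length sequence (MLS) instead of the paper's carrier-modulated signal: the circular cross-correlation of the transmitted and received sequences peaks at the propagation delay. This sketch assumes an integer-sample delay, a 127-sample sequence from a 7-stage linear feedback shift register (taps 7 and 6), and a hypothetical 48 kHz sample rate; none of these values come from the paper.

```python
def mls(order=7, taps=(7, 6)):
    """One period of a +/-1 maximum length sequence from a Fibonacci LFSR.

    With order 7 and taps (7, 6) the register cycles through all 127 non-zero
    states, so the sequence period is 2**7 - 1 = 127.
    """
    state = [1] * order
    seq = []
    for _ in range(2 ** order - 1):
        seq.append(1 if state[-1] else -1)  # output the oldest bit as +/-1
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]  # shift in the feedback bit
    return seq

def circular_xcorr(x, y):
    """r[lag] = sum_i x[i] * y[(i + lag) % n] for equal-length sequences."""
    n = len(x)
    return [sum(x[i] * y[(i + lag) % n] for i in range(n)) for lag in range(n)]

def time_of_flight(sent, received, fs):
    """Delay in seconds at which the cross-correlation peaks."""
    r = circular_xcorr(sent, received)
    lag = max(range(len(r)), key=r.__getitem__)
    return lag / fs
```

The sequence period must exceed the longest expected delay, because circular correlation cannot distinguish a delay of d samples from d + 127; at 48 kHz a 127-sample MLS covers only about 2.6 ms. Dividing the known path length by the measured time of flight gives the effective speed of sound along the path, which is where a wind-speed correction would enter.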
Holographic Sound Field Analysis with a Scalable Spherical Microphone-Array
Albrecht, Bernhard; Del Galdo, Giovanni; Husung, Stephan; Lotze, Jörg; Schlesinger, Anton
Room acoustic parameters vary greatly with the positions of the receiver and the source, so exhaustive information about the room acoustics cannot be extracted from independent single-point measurements. Array measurements make it possible to predict the sound field with high spatial resolution and lead to a more precise assessment of the room acoustic properties. We propose an array technique for investigating room acoustics by reconstructing the volumetric sound field from measurements taken on a sphere using methods of Nearfield Acoustical Holography (NAH). A virtual spherical single-microphone array was constructed and successfully tested in room acoustic modal analysis.
A Comparison of Modelling Techniques for Small Acoustic Spaces Such as Car Cabins
This paper reports a case study comparing the relative cost-effectiveness of three modelling techniques applied to small acoustic spaces such as car cabins. The techniques considered are finite element analysis, analytical solutions, and the quasi-analytical ray-trace or image method. A simple test case is used to compare solution times and accuracy.
Acoustical and Musical Design of the Sea Organ in Zadar
The Sea Organ in Zadar, Croatia, is an award-winning urban architectural installation that uses the random kinetic energy of sea waves to produce musical sounds. It contains 35 stopped flue pipes built into subterranean tunnels with outward-facing apertures from which the sound emanates. Each flue pipe is blown by a column of air that is in turn pushed by a column of moving water entering an immersed tube. The pipes are tuned to 9 tones of the diatonic major chords G and C6. The sequence of excited tones is a statistical function of the wave energy delivered to particular pipes, distributed in time and space. This paper presents the acoustical and musical design propositions and solutions as parts of the multidisciplinary design process.
A Wireless PDA-Based Acoustics Measurement Platform
Alexandridis, Petros; Hatziantoniou, Panagiotis; Mourjopoulos, John; Tatlas, Nicolas-Alexander
The proposed platform allows acoustic measurements via a flexible, portable system based on commercially available hardware: a personal digital assistant (PDA) equipped with a wireless adapter and a digital audio capture card, together with a standard personal computer (PC) with wireless network access. Using this hardware, three software applications were implemented: (i) a device driver that handles communication between the digital audio capture card and the PDA, (ii) a PDA application that establishes the WiFi connection with the PC and also provides data recording and suitable presentation of the analysis results, and (iii) a PC application that initiates the playback sequence as dictated by the connected PDA. The system can facilitate fast measurement of large spaces, as illustrated by room response measurements.
Non-Linear Crosstalk in Personal Computer Based Audio Systems
Belcher, R. Allan; Chambers, Jonathon
IEC and AES standards provide comprehensive tests of the performance of audio analogue-to-digital (ADC) and digital-to-analogue (DAC) converters for both consumer and professional applications. It is usually assumed that the ADC is more likely than the DAC to degrade audio sound quality. Tests on two samples of a professional-quality PC-based audio system are presented that show that a stereo DAC can introduce unexpected non-linear effects. These results suggest that the standards should include a measure of interchannel non-linear crosstalk in the stereo DAC. Results are presented, and a DAC-ADC loop test is proposed to enable this measurement to be made without additional test equipment.
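The abstract proposes a DAC-ADC loop test but does not detail the analysis. A minimal sketch, assuming a loopback capture in which one channel is driven with a sine at f0 and the other channel is idle: content in the idle channel at f0 indicates linear crosstalk, while content at 2f0, 3f0 and so on indicates non-linear crosstalk. The function names and the single-bin DFT approach are illustrative assumptions, not the authors' procedure.

```python
import math

def tone_amplitude(x, freq, fs):
    """Amplitude of a sinusoid at `freq` via a single-bin DFT.

    Exact when the record holds an integer number of cycles of `freq`.
    """
    n = len(x)
    re = sum(v * math.cos(2 * math.pi * freq * i / fs) for i, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * freq * i / fs) for i, v in enumerate(x))
    return 2 * math.hypot(re, im) / n

def crosstalk_db(driven, idle, f0, fs, harmonics=(1, 2, 3)):
    """Level in the idle channel at f0 and its harmonics, in dB relative to
    the fundamental in the driven channel."""
    ref = tone_amplitude(driven, f0, fs)
    result = {}
    for h in harmonics:
        a = tone_amplitude(idle, h * f0, fs)
        result[h] = 20 * math.log10(a / ref) if a > 0 else -math.inf
    return result
```

Choosing a record length that holds an integer number of cycles of every analysed frequency (for example 4800 samples of a 1 kHz tone at 48 kHz) avoids leakage between bins; otherwise a window would be needed.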
A Low-Distortion Fast-Settling Audio Oscillator: A Tribute to the late Peter J. Baxandall, Audio Analog Expert
This paper is dedicated to the memory of Peter Baxandall, well known for his work in audio and electronics. It is an exposition and analysis of a low-distortion, fast-settling audio oscillator that he designed and built. Conventional oscillators are shown to suffer from amplitude instability when the thermally variable controlling resistance has a long time constant. The genius of this two-integrator design is that it derives its amplitude stability from the cancellation of two square-wave signals, one fixed in amplitude and the other proportional to the oscillator output, with a threshold. A detailed analysis of the oscillator is presented. The result is an oscillator with distortion below 0.01% and a settling time of approximately one oscillation period. It is particularly useful in automated test equipment.
Direct Current Offset and Balance for Audio Transformers Used with Paralleled Tubes or Solid State Devices
Polisois, Aristide; Touzelet, Pierre
In May 2005 (118th AES Convention), I described a self-compensated transformer for single-ended audio amplifiers, designated SC-OPT, based on the principle that an auxiliary (tertiary) winding carrying the same current as the primary produces an opposing magnetic flux that reduces the overall DC-induced flux to almost zero. In this paper, an extension is discussed in which no external adjustment is needed to balance the DC flux inside the core to zero. The new transformer is fully self-balancing.
Acoustical Issues and Proposed Improvements for NASA Spacesuits
Begault, Durand R.; Hieronymus, James L.
This presentation reviews current acoustical issues relevant to the design of future NASA spacesuits, based on measurements conducted in the current Mark III advanced prototype surface suit, and proposes solutions for improving voice communications. Methods for mitigating problems, including noise from the air supply, structure-borne noise from the suit, and detrimental acoustical reflections, are reviewed.
Design, Construction and Qualification of the New Anechoic Chamber at Laboratorio De Sonido, Universidad Politécnica De Madrid
Blanco-Martin, Elena; Gómez-Alfageme, Juan José; Sánchez-Bote, José Luis
In 2005, a new anechoic chamber was designed and built at the Laboratorio de Sonido of the Universidad Politécnica de Madrid. The chamber has a free volume of 70 cubic meters and is lined with rock wool wedges covered with a porous cotton cloth. The chamber's cutoff frequency is 150 Hz. The chamber has been qualified according to the ISO 3745 procedure for determining the maximum distance between the sound source and the measurement position at which the inverse square law holds within a given tolerance. For the qualification, different types of excitation signals were used: pure tones, broadband noise, narrowband noise and MLS pseudorandom sequences.
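The qualification step compares measured levels along a traverse with the inverse square law. A simplified sketch of that comparison, assuming levels already measured at several distances on one traverse: fit L(r) = A - 20 log10(r) by least squares, compute each point's deviation, and report the largest distance up to which all deviations stay within a tolerance. The function names and the tolerance value are placeholders; ISO 3745 specifies its own fitting procedure and frequency-dependent allowable deviations, which should be taken from the standard.

```python
import math

def inverse_square_deviations(distances_m, levels_db):
    """Deviation (dB) of measured levels from a fitted inverse square law.

    The model is L(r) = A - 20*log10(r); the least-squares estimate of A is
    the mean of L + 20*log10(r) over all points.
    """
    a = sum(l + 20 * math.log10(r) for r, l in zip(distances_m, levels_db)) / len(levels_db)
    return [l - (a - 20 * math.log10(r)) for r, l in zip(distances_m, levels_db)]

def max_valid_distance(distances_m, levels_db, tolerance_db):
    """Largest distance up to which every deviation stays within tolerance."""
    devs = inverse_square_deviations(distances_m, levels_db)
    valid = None
    for r, d in sorted(zip(distances_m, devs)):
        if abs(d) > tolerance_db:
            break  # first out-of-tolerance point ends the valid region
        valid = r
    return valid
```

Because the fit uses every point, a strong deviation far from the source also shifts A; refitting over the valid region only is a possible refinement.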
Time Signature Detection by Using a Multi-Resolution Audio Similarity Matrix
Coyle, Eugene; Gainza, Mikel
A method that estimates the time signature of a piece of music is presented. The approach exploits the repetitive structure of most music, in which the same musical bar is repeated in different parts of a piece. The method utilises a multi-resolution audio similarity matrix, which allows comparisons between longer audio segments (bars) by combining comparisons of shorter segments (fractions of a note). The method depends only on musical structure and does not rely on the presence of percussive instruments or strong musical accents.
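The core idea, that a repeating bar shows up as high similarity along one diagonal of a self-similarity matrix, can be sketched at a single resolution. This simplified sketch assumes one feature vector per frame (for example a short-segment spectrum) and a known number of frames per beat from some beat or onset estimate. It is not the paper's multi-resolution method; the candidate bar lengths and the tie tolerance are hypothetical choices.

```python
import math

def cosine(u, v):
    """Cosine similarity of two feature vectors (0 if either is silent)."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def similarity_matrix(frames):
    return [[cosine(a, b) for b in frames] for a in frames]

def lag_scores(sim, max_lag):
    """Mean similarity along each diagonal; high values mark repetition lags."""
    n = len(sim)
    return {lag: sum(sim[i][i + lag] for i in range(n - lag)) / (n - lag)
            for lag in range(1, max_lag + 1)}

def beats_per_bar(frames, frames_per_beat, candidates=(2, 3, 4, 5, 6, 7), tol=1e-6):
    """Shortest candidate bar length (in beats) whose lag is near the best score."""
    max_lag = max(candidates) * frames_per_beat
    if len(frames) <= max_lag:
        raise ValueError("need more frames than the longest candidate lag")
    scores = lag_scores(similarity_matrix(frames), max_lag)
    best = max(scores[b * frames_per_beat] for b in candidates)
    return min(b for b in candidates if scores[b * frames_per_beat] >= best - tol)
```

Preferring the shortest candidate among near-equal scores matters because a bar that repeats every 3 beats also repeats every 6.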
Signal Processing Parameters for Tonality Estimation
Noland, Katy; Sandler, Mark
All musical audio feature extraction techniques require some form of signal processing as a first step. However, the choice of low-level parameters such as window sizes is often disregarded, and arbitrary values are chosen. We present an investigation into the effects of low-level parameter choice on different tonality estimation algorithms and show that these parameters can make a significant difference to the results. We also show that the choice of parameters is algorithm-specific, so optimisation is required for each method.
Audio Effects for Real-Time Performance Using Beat Tracking
Stark, A. M.; Plumbley, M. D.; Davies, M. E. P.
We present a new class of digital audio effects that can automatically relate parameter values to the tempo of a musical input in real time. Using a beat tracking system as the front end, we demonstrate a tempo-dependent delay effect and a set of beat-synchronous low-frequency oscillator (LFO) effects, including auto-wah, tremolo and vibrato. The effects perform better than might be expected because they are insensitive to certain beat tracker errors. All effects are implemented as VST plug-ins that operate in real time, enabling their use both in live musical performance and for offline modification of studio recordings.
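The tempo-dependent delay and a beat-synchronous LFO can be illustrated with two small functions. This is a minimal sketch, not the paper's VST implementation: it assumes the beat tracker supplies a sorted list of beat times in seconds and that the beat is a quarter note, and the function names and parameters (`note_fraction`, `depth`, `cycles_per_beat`) are hypothetical.

```python
import bisect
import math

def delay_samples(bpm, fs, note_fraction=0.25):
    """Delay length in samples for a note value given as a fraction of a
    whole note (0.25 = quarter note = one beat, under the assumption above)."""
    beat_s = 60.0 / bpm
    return round(fs * beat_s * note_fraction / 0.25)

def tremolo_gain(t, beat_times, depth=0.5, cycles_per_beat=1.0):
    """Beat-synchronous tremolo gain at time t (seconds).

    The LFO phase is interpolated between the two tracked beats surrounding
    t, so each beat falls on an LFO maximum (gain 1.0).
    """
    k = bisect.bisect_right(beat_times, t) - 1
    if k < 0 or k >= len(beat_times) - 1:
        return 1.0  # outside the tracked beats: leave the signal unmodulated
    b0, b1 = beat_times[k], beat_times[k + 1]
    phase = (k + (t - b0) / (b1 - b0)) * cycles_per_beat
    return 1.0 - depth * 0.5 * (1.0 - math.cos(2 * math.pi * phase))
```

At 120 BPM and 48 kHz, a quarter-note delay is 24000 samples and an eighth-note delay 12000. Deriving the LFO phase from the surrounding pair of beats, instead of from a fixed rate, keeps the modulation aligned with the tracked beats when the tempo changes.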
Java Library for Automatic Musical Instruments Recognition
Aniola, Piotr; Lukasik, Ewa
This paper presents an open-source Java library for the analysis and classification of musical instrument sounds. It consists of two main parts: one devoted to feature extraction, the other performing musical instrument recognition and similarity assessment. The project's plug-in-based structure enables further extension of both modules. In the current version, two sound modelling algorithms have been implemented: k-means and Gaussian Mixture Models. The software was created to recognise different exemplars of the same type of instrument and has been validated for electric guitars, guitar amplifiers and violins. The project follows the latest trends in software engineering, enabling developers to easily create highly usable, reliable and extendable programs.
Extraction of Long-Term Rhythmic Structures Using the Empirical Mode Decomposition
Heydarian, Peyman; Reiss, Joshua D.
Long-term musical structures provide information about rhythm, melody and composition. Although highly relevant musically, these structures are difficult to determine using standard signal processing techniques. In this paper, a new technique based on the time-domain empirical mode decomposition is presented. It enables the analysis of long-term metrical structures in musical signals and provides insight into perceived rhythms and their relationship to the signal. In addition, it decomposes a given signal into its constituent oscillations, which can be modified to produce a new version of the signal. The technique is explained, and results are reported and discussed.
Multi Core / Multi Thread Processing in Object Based Real Time Audio Rendering: Approaches and Solutions for an Optimization Problem
Partzsch, Andreas; Reiter, Ulrich
This paper presents considerations, approaches and solutions to the problem of optimizing thread distribution for multi-core processing in object-based real-time audio rendering environments. It explains some basic problems, describes the constraints and finally suggests two approaches based on solving an optimization problem by analyzing a directed graph representing the signal processing flow. The suggested approaches can handle an arbitrary number of CPU cores and are therefore well suited to future processor developments. They are compared with respect to performance and complexity. Depending on the structure of the processing graph, one of these approaches achieves a speedup close to the theoretically possible maximum.
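A common way to bound the achievable speedup of a processing graph is the ratio of total work to the length of its critical path (the most expensive chain of dependent nodes), capped at the number of cores. The sketch below computes that bound for a small hypothetical rendering graph. It is not one of the two distribution approaches from the paper, and the node names and per-node costs are invented for illustration.

```python
from collections import defaultdict

def critical_path(costs, edges):
    """Length of the longest cost-weighted path through a DAG.

    costs: {node: processing cost}; edges: [(producer, consumer), ...].
    Nodes are visited in topological order (Kahn's algorithm), and each
    node starts when its slowest predecessor has finished.
    """
    succ = defaultdict(list)
    indeg = {n: 0 for n in costs}
    for a, b in edges:
        succ[a].append(b)
        indeg[b] += 1
    start = {n: 0.0 for n in costs}
    finish = {}
    ready = [n for n in costs if indeg[n] == 0]
    while ready:
        n = ready.pop()
        finish[n] = start[n] + costs[n]
        for m in succ[n]:
            start[m] = max(start[m], finish[n])
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)
    if len(finish) != len(costs):
        raise ValueError("graph has a cycle")
    return max(finish.values())

def speedup_bound(costs, edges, cores):
    """Upper bound on speedup: min(cores, total work / critical path)."""
    return min(cores, sum(costs.values()) / critical_path(costs, edges))
```

For four source renderers of cost 2 feeding a mixer and an output stage of cost 1 each, the total work is 10 and the critical path is 4, so no more than a 2.5x speedup is possible however many cores are available: the mixer and output stages serialise the tail of the graph, which is why the bound depends on graph structure.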
A Distributed Real-Time Virtual Acoustic Rendering System for Dynamic Geometries
Kajastila, Raine; Lokki, Tapio; Lundén, Peter; Savioja, Lauri; Siltanen, Samuel
A novel room acoustic simulation system capable of producing interactive sound environments in dynamic and complex 3D geometries is introduced. The system is distributed across several modules that share the same 3D geometry, and all changes made by one module are updated in all other modules in real time. The auralization tools of the system include a geometry reduction tool, a beam tracing algorithm, and a sound rendering application. The geometry reduction tool simplifies 3D models for the beam tracing module, which forwards the direct sound and early reflection paths to the sound rendering application. The sound rendering application automatically estimates late reverberation parameters based on the early reflections.
To Create Spatial Auditory Events Via More Channel Headphones Related on Portable 5.1 / 5.0 Surround Reproductions of Sound
König, Florian M.
In the future, “portable surround” devices will succeed “stereo” applications in games and MP3 players. This technical evolution requires headphones that offer 3D sound images with a minimum of elevation effects and a virtual perception of distance to the front and back. Past events, from the 119th AES Convention to the 28th AES Conference, raised further problems, such as a compatible downmix of 5.1 to 2.0 or 4.0 signals, beyond binaural head-related signal processing alone. A standardised variant of the 3.5 mm jack should also be available for stereo or multichannel signal supply. This paper presents the main principles for realising such spatial auditory events via 4-channel headphones, with a direct audio signal supply that remains loudspeaker-compatible.