Robust Acoustic Contrast Control with Reduced In-situ Measurement by Acoustic Modeling *

Personal audio systems generate a local sound field for a listener while attenuating the sound energy at pre-defined quiet zones. In practice, system performance is sensitive to errors in the acoustic transfer functions between the sources and the zones. Regularization is commonly used to improve robustness, however, selecting a regularization parameter is not always straightforward. In this paper a design framework for robust reproduction is proposed, combining transfer function and error modeling. The framework allows a physical perspective on the regularization required for a system, based on the bound of assumed additive or multiplicative errors, which is obtained by acoustic modeling. Acoustic contrast control is separately combined with worst-case and probability-model optimization, exploiting limited knowledge of the potential error distribution. Monte-Carlo simulations show that these approaches give increased system robustness compared to the state of the art approaches for regularization parameter estimation, and experimental results verify that robust sound zone control is achieved in the presence of loudspeaker gain errors. Furthermore, by applying the proposed framework, in-situ transfer function measurements were reduced to a single measurement per loudspeaker, per zone, with limited acoustic contrast degradation of less than 2 dB over 100–3000 Hz compared to the fully measured regularized case.


INTRODUCTION
Personal audio reproduction aims to use a loudspeaker array to render desired sound to a listening position with minimal interference to listeners in other regions.Various methods for sound zone reproduction are discussed and compared in [1] and [2].Several of these algorithms are based on input transfer functions (TFs) between the loudspeaker array and the control microphones, for example: acoustic contrast control (ACC) [3], pressure matching [4,5], planarity control [6], and sound field reproduction ACC [7].These systems are vulnerable to perturbations in the assumed TFs, which might result from inconsistencies in the individual sources' sensitivities, complexities in the spatial responses, source position mismatches, and so on [8].In addition, sound speed changes caused by variations in, e.g., temperature, humidity and airflow, lead to changes in the TFs and eventually lower the system performance [9].Besides these time-variant perturbations within a reproduction system, there are also perturbations from system to system, for example due to variations among loudspeakers of the same model.Furthermore, it might not always be possible to conduct in-situ measurements in the zones to calibrate or adjust installed systems.
In order to reduce performance degradation in TF-based methods under such conditions, robustness should be carefully considered in the algorithm design.Kim et al. [10] and Elliott et al. [8] applied constraints on array audible gain or array effort (AE) to increase system robustness.Coleman [9] studied the effect and selection of a regularization parameter to improve overall performance considering acoustic contrast (AC), planarity, and AE.These methods only utilize the characteristics of the array and the application geometry, without considering potential errors in the system.The AE constraints were also employed by Cheer et al. for practical systems in a car cabin [11] and a mobile device [12].
Park et al. [13] discussed the relationship between TF errors and the resulting AC but did not propose a robust PAPERS ROBUST ACOUSTIC CONTRAST CONTROL BY ACOUSTIC MODELING control solution.Bai et al. [14] incorporated information about the errors in the system by using Monte-Carlo simulations to find the best regularization parameter for a microphone array.This approach has high computational complexity and is constricted by the pre-given solution form, i.e., diagonal loading.
Error-based robust solutions, utilizing both system geometry and error distribution information, can be derived by employing different optimization strategies, e.g., probability-model optimization (PMO) and worst-case optimization (WCO).The concept of utilizing error information by PMO and WCO to increase system robustness was originally investigated for robust beamforming, e.g., [15] and [16].For personal audio, Zhang et al. [17] utilized PMO to achieve robust ACC, assuming multiplicative noise in the TFs.Cai et al. [18] also studied robust ACC, considering additive gain and phase variations, using PMO.A special case of PMO [8] treated the reverberant field above the Schroeder frequency as diffuse, and increased robustness by introducing a spatially-averaged mean square uncertainty into diagonal loading regularization.This approach was experimentally validated in [19].Cai [20] also applied WCO to ACC and discussed its relationship to the PMO approaches.However, the relationship between error-based methods, such as WCO and PMO, and other regularization methods has not been systematically elaborated in the literature.
In addition to errors in the system, the robust control performance depends on both the determined part and the error part of the TF model.The determined part and the error part should both be modeled accurately for the most robust performance.This can be realized by incorporating acoustic modeling.As an example of determined part adjustment, Chang et al. [21] and Tu et al. [22] investigated the incorporation of a rigid sphere model into TFs to help increase robustness against head scattering effects.In our previous work [23] we proposed a general framework for robust sound reproduction making use of acoustic modeling to obtain input for error-based robust optimization (i.e., PMO, WCO).The acoustic modeling described the ideal radiation pattern of the loudspeakers to the control points and additionally represented gain and phase variations in the TFs with an estimated error model.
In this article we extend our work in [23] with three new contributions.First, we present new analysis of the problem of selecting a diagonal loading parameter for robust ACC, including a method based on the maximum singular value of the TF matrix.Second, we include experimental validation results, which evaluate the performance of robust control techniques in the presence of perturbations introduced to the loudspeaker gains.Third, we investigate a practical application of error modeling to reduce the need for in-situ TF measurement.Specifically, for each loudspeaker, we make a single TF measurement to each zone and use a pointsource interpolation and uncertainty model to populate the remaining rows of the TF matrices.
In Sec. 1 we outline the acoustic-modeling based robust reproduction framework and provide the robust ACC solutions derived by two optimization strategies (PMO and WCO).In Sec. 2 the robust performance pattern of a typical ACC system is analyzed by Monte-Carlo simulations, which shows the advantage of acoustic-modeling based methods.Experimental results are presented in Sec. 3, comparing robust control strategies when there are errors in the system due to loudspeaker gain errors and missing information in the TFs.We discuss the results in Sec. 4 and summarize in Sec. 5.

METHODS
In this section we first introduce the framework for robust reproduction based on acoustic modeling.Then, two different robust control strategies (WCO and PMO) are applied to ACC as a typical application of this framework.Finally, a practical "quick" approach is proposed.The approach uses the estimated bounds of the errors to determine the control parameters for robust ACC.

Framework for Robust Sound Reproduction Based on Acoustic Modeling
The proposed framework consists of two distinct stages: acoustic modeling to obtain an acoustic TF model for a real system with uncertainty and robust design to derive source weights for robust control.Acoustic TFs from the sources to the control zones are modeled as source radiation paths with errors.As shown in Fig. 1, after setting the application scenario (including the system geometry and the acoustic environment), a set of spatial impulse responses is acquired by transfer function measurement.Then in the sound propagation analysis module, the sound field radiated by each source is modeled as the superposition of a determined part {G} and a set of potential error { G} that describes variations in the TFs for certain reproduction scenarios: where G, G and G are M × L TF matrices, with M control points in the target zone(s) and L loudspeakers.
To model the determined part, recent technologies for source radiation measurements and modeling can be considered, such as holographic nearfield measurements [24] or sound field separation [25].In this paper we focus on the direct radiation paths and consider the room effects to be errors, although these could additionally be modeled.
The error term obtained in the error estimation module is a certain probability distribution function p(*) or bound b(*) describing the variances in TF amplitude and phase due to potential errors.The error term represents any source directivity components that cannot be fully represented with limited resolution measurements and also the spatial response variances resulting from non-ideal conditions, e.g., the room effect.
With the determined TF matrix G and the error descriptions p(*) or b(*), several error-model-based robust control strategies can be applied.In the following sub-sections, the solutions of robust ACC are derived separately for WCO and PMO.

Acoustic Contrast Control
ACC maximizes the ratio of the spatially averaged sound energy between the listening zone and the quiet zone [3]: where w is the L × 1 loudspeaker weight vector to be obtained, and spatial correlation matrices defined by TF matrices G L and G Q for the listening zone and the quiet zone, respectively.

Worst-Case Optimization
In order to improve the overall performance of ACC applied to all the predictable situations, WCO [20,23] aims to find and optimize the situation giving the worst performance among all the possible variations.Considering uncertainties in the TFs, the quiet zone spatial correlation matrix is written as where * F denotes Frobenius norm.The solution is [15]: where 1 (*) denotes the principal eigenvector, corresponding to the maximum eigenvalue and I is a L × L identity matrix.Eq. ( 4) has a standard regularization form that may be found by applying an energy constraint [8,26].However, the diagonal loading parameter δ Q is derived directly from the error bound R Q F ≤ δ Q , rather than from white noise gain or AE constraints.As mentioned in [23], it might also be useful to apply a weighting to the estimated error bound of R Q F to avoid "over-robust" solutions that maintain robustness but lead to poor AC performance.Errors in the listening zone TFs could be incorporated into the WCO, replacing R L in Eq. ( 4) with (R L − δ L I) [15].However, our previous work [23,Fig. 4] showed that applying a diagonal loading parameter δ L to R L did not obviously increase robust performance.In general, if the ratio δ L /δ Q is much smaller than the potential AC, the effect of diagonal loading on R L can be ignored.

Probability-Model Optimization
Assuming the errors in the acoustic TFs have a predictable distribution, PMO aims to improve the average performance according to that distribution [16].Here we consider two kinds of error: multiplicative error and additive error.

Multiplicative Error PMO
Following [17], multiplicative errors can be incorporated into PMO for ACC.The form of the TF with multiplicative error is G(m,l) = G (m,l) ae jφ , where j = √ −1, G (m,l) denotes the TF between the lth loudspeaker and mth control point in a zone, and a and φ are the amplitude and phase of multiplicative TF errors.The statistical features of the TF errors are σ a = a a 2 p(a) da, μ a = a a p(a) da, and p(a) and p(φ) are the probability density functions describing the amplitude and phase errors, respectively.p(a) and p(φ) can be obtained by acoustic modeling.Assuming errors in each TF are independent and identically distributed, the average spatial correlation matrix for the quiet zone is where ⊗ is pointwise multiplication, and E Q is a L × L matrix with diagonal elements equal to σ a and non-diagonal elements equal to μ 2 a σ φ .The average spatial correlation matrix R L, PMO for the listening zone is obtained in the same way.ACC with PMO for multiplicative error is that has the solution

PAPERS ROBUST ACOUSTIC CONTRAST CONTROL BY ACOUSTIC MODELING
A special case assumes that the amplitudes of the error a are uniformly distributed over the range [a min , a max ] and the phase error is uniformly distributed over the range [φ min , φ max ].In this case, the entries of E Q and E L are calculated as Therefore, PMO introduces regularization, which does not directly have diagonal loading form, into the spatial correlation matrices for both zones.

Additive Error PMO
Following [18], additive errors can also be incorporated into PMO for ACC.The form of the TF with additive error is G(m,l) = G (m,l) + ae j φ , where a and φ are now the amplitude and phase of the additive TF errors.The statistical features of the additive TF errors are σ a , μ a , σ φ and μ φ = φ e −jφ p(φ) dφ.We define E 1,Q as a L × L matrix with elements in the lth column equal to M i=1 G i,l Q , i.e., the sum of TFs between the lth loudspeaker and the M control points in the quiet zone.We further define E 2, Q as a L × L matrix with diagonal elements equal to σ a and non-diagonal elements equal to μ 2 a σ φ .The average spatial correlation matrix of the quiet zone is where μ * φ is the complex conjugate of μ φ .The solution is where As before, a special case assumes uniform error distribution with amplitude a uniformly distributed in the range [0, a max ] and phase uniformly distributed in the range [0, 2π].In this case, In this special case, PMO has a diagonal loading form acting on the spatial correlation matrices for both zones.

Quick Parameter Estimation for Robust ACC
In the following, we refer to errors occurring in the system as the actual errors, while the assumed errors are the estimated errors used for filter calculations.
In the WCO approach, the robust parameter δ Q (Eq.( 4)) used to calculate w WC is frequency-varying and determined by the bound of errors in the spatial correlation matrix, where ε is a scale parameter to prevent "over-robust" performance in WCO.If the assumed error has the form of multiplicative error, then where a max, ME and a min, ME are the maximum and minimum error gains, respectively.If the assumed error has the form of additive error, then where a max, AE is the maximum error amplitude.The parameters a max, ME , a min, ME , a max, AE are determined by b( G), the bound of errors in TFs.Eqs. ( 14) and ( 15) lead to a quick parameter estimation in WCO incorporating prior knowledge, filling in the gap between the outcomes of acoustic modeling (G Q and b( G)), and directly lead to δ Q (Eq.( 13)), which in turn populates the WCO solution (Eq. ( 4)).Furthermore, though PMO is an error distribution based solution, it can also be formulated based on TF error bounds by selectively sacrificing accuracy in the error distribution estimation.Since it is difficult to accurately estimate the actual error, a practical "quick" approach is to use a simple error distribution (e.g., Eqs. ( 9) and ( 12) assuming uniform distribution) to approximate the actual error.This only requires the TF error bound parameters (a max , a min , φ max , φ min ) to calculate loudspeaker weights.These TF error bound parameters have a clear physical meaning and a relationship to the actual error.
These "quick" approaches employing coarse error estimation (that does not fully describe the actual errors) are included in the following simulation and experimental results.

SIMULATIONS
In this section we present simulation results comparing the performance of robust control approaches in the presence of TF errors.

Simulation Setup and Baseline Geometry
As shown in Fig. 2, we use an arc-shaped array with 11 loudspeakers.The loudspeakers are uniformly arranged with a distribution angle of 6 • around a radius of 1.68 m.The control points for the listening and quiet zones are defined on dual circles with 24 microphones in each ring, with radii of 0.083 m and 0.104 m.In the simulations, we suppose each loudspeaker acts as a point source.That is, the spatial response is defined by e jkd /kd, where k is the wave number and d is the distance between a loudspeaker and a receiver.Errors in the system are simulated by perturbing the spatial responses, assuming that the error has multiplicative form.In the Monte-Carlo trials, the amplitude errors are drawn from a Gaussian distribution with standard deviation of 3 dB, and the phase errors are drawn from a uniform distribution between -10 • and +10 • .

Diagonal Loading
The solutions WCO (Eq.( 4)) and PMO for assumed special additive errors (Eq.( 12)), derived above have a similar form to common regularization approaches, which employ diagonal loading and add a small parameter to the diagonal of R Q .Since the diagonal loading form is common among many robust methods, we first directly study the relationship between the value of the diagonal loading parameter applied and the average AC performance, using Monte-Carlo simulations [14] with 1000 trials for each frequency sample.The AC evaluation is defined for a single frequency as which is the ratio of the sound energy in the listening zone and the quiet zone, with p L = G L w and p Q = G Q w.The diagonal loading parameter δ Q was varied from 10 −15 to 10 5 at 1000 logarithmically spaced values, and the resulting AC, averaged over all trials for the error conditions described above, is shown in Fig. 3. Diagonal loading regularization on R Q is seen to be effective in improving robustness with a suitable selection of the parameter.A certain pattern of average AC performance under varying δ Q is observed, which is composed of three stages.If the regularization parameter is too small, the AC performance is reduced ("under-robust"); with the regularization parameter in the proper range, the AC performance stabilizes ("robust") and there is a peak that corresponds to the best regularization parameter (denoted by the solid line in Fig. 3); after that there is a sharp drop in AC and the trend eventually becomes stable (with minimum AE), where the performance is very robust but the AC achieved is relatively low ("over-robust").
The AC performance within 1 dB of the peak is also illustrated in Fig. 3 by the contoured dashed lines.The system has different sensitivity to δ Q over different frequency ranges.The "middle" frequencies, approx.740-2600 Hz and 4500-5200 Hz have relatively broad acceptable δ Q range, while at low frequencies under 430 Hz and spe- cial frequencies such as 2900-4100 Hz and 6300-7700 Hz, where grating lobes exist, the suitable δ Q range is quite narrow, leading to sensitive performance.
It can be observed that the optimal δ Q is proportional to 1/k 2 for most frequencies, since the optimal regularization value is related to the interaction between the system geometry and the errors.As equal amplitude multiplicative errors are assumed in the whole frequency range for the simulated system geometry, δ Q largely follows the square of the adopted source radiation gain 1/kd.However, selecting δ Q is more complex in practice, where the source radiation pattern may not be as simple as a point source, and error will also vary with both frequency and space (e.g., due to room reflections).Incorporating error information into control could therefore give a δ Q close to the AC peak.We explore this in the following section through comparison with alternative methods to calculate a diagonal loading parameter.

Estimation of Regularization Parameters
In this section regularization parameter calculation is described for various robust control strategies, including acoustic-modeling based methods and frequencydependent diagonal loading methods; and the resultant performance is compared.The peak AC at each frequency (i.e., the value corresponding to the solid line in Fig. 2) achieved by Monte-Carlo simulation with full knowledge of error is used as a reference denoted as MCS.

Parameter Calculation
The methods under test, and their corresponding parameters, are defined in this section.Three acoustic-modeling based methods were tested: WCO represents WCO substituting the assumed error bounds (a max,ME = √ 2 and a min,ME = √ 2/2) into Eqs.( 13) and ( 14), δ Q = 0.75 R Q /ε, with the scale parameter ε = 10 2 .

PAPERS ROBUST ACOUSTIC CONTRAST CONTROL BY ACOUSTIC MODELING
PMO M represents PMO for assumed multiplicative errors, assuming the special case of uniform error distribution (Eqs.( 8) and ( 9)).Though the error distribution features are different between the actual error and the assumed error, we directly apply the minimum and maximum values of the error gain and phase as a min , a max , φ min , and φ max in Eq. ( 9).So, a max = √ 2, a min = √ 2/2, and the phase parameters take the bound values φ max = 10 • , φ min = −10 • .
PMO A represents PMO for assumed additive errors assuming the special case (Eq.( 12)).The error distribution features are also different between the actual error and the assumed error.As the actual error in the simulation is multiplicative, PMO A here also represents a situation where the error type is wrongly estimated.The maximum value of additive error gain in Eq. ( 12) among all the potential TF variances is a max, AE .To make the estimated additive error set contain the real multiplicative error set, a max,AE = max{G} × a max,ME , where max{G} is the maximum among all the TFs in G, and a max,ME = μ 2 − 2cos(φ max )μ + 1 with μ = 10 3 20 .For the three acoustic-modeling based methods above, the assumed error models are not exactly the same as the actual error model in simulation.Nevertheless, this is indicative of the practical situation when applying error-based regularization.
In addition to these methods, three frequency-dependent regularization methods were tested: SV denotes regularization based on σ max , the maximal singular value of R Q .It was noted in the discussion of Fig. 3 that over-robust performance occurs after δ Q exceeded σ max .Therefore, we set δ Q = σ max /10.This method is also adopted in, e.g., [27].
EL0 and ELM represent the approaches based on AE control.The AE [8] is defined as: where p L,0 = G L w 0 , in which w 0 = [0 0 0 0 0 1 0 0 0 0 0] is the weighting vector for a reference source at the center of the array.EL0 and ELM limit AE towards 0 dB and the minimal value, respectively.EL0 was chosen to match previous studies (e.g., [19]).SV, EL0, and ELM represent the situations where the geometry is known but the error features are not.

Performance Comparison
In Fig. 4 we plot the mean AC and AE performance over frequency (based on Monte-Carlo simulations, with the same error conditions as above), for no regularization (NR), alongside SV, EL0, ELM, and PMO A , with MCS presented as an upper reference.The parameters estimated for WCO and PMO M gave very similar results to PMO A (except that PMO M was slightly better at low frequencies) and are therefore omitted from the figure for clarity.
In Fig. 4 all robust approaches improve performance over NR.Among them, PMO A has the closest performance to the optimal MCS performance, both in terms of AC and AE, over a broad frequency range 100-8000 Hz, with lim- ited AC degradation, mainly at low frequencies.According to the sensitivity concerns discussed in Sec.2.2, we chose three typical frequencies, namely a low frequency (200 Hz), a middle frequency (1000 Hz), and a grating lobe frequency (3538 Hz), to make a detailed comparison between the different regularization methods.The performance of each approach at these frequencies is listed in Table 1.To aid the interpretation of Table 1, the methods are ranked from "under-robust" (left) to "over-robust" (right).Metrics "mean AC" and "min AC" (denoted by AC and ∧ AC, respectively) are the average and minimum AC performance over 10000 Monte-Carlo trials.Metric " AC" is the difference between "mean AC" and "min AC," i.e., the difference between average performance and worst performance over the 10000 trials.The regularization parameter "δ Q " adopted in each method, which is roughly inversely proportional to "AE," is also shown.It can be noted that, among all the methods, PMO A achieved the best performance for both AC and ∧ AC and has the median " AC" value among all the methods.Comparing δ Q values with reference to Fig. 3, PMO A nearly reaches the optimal regularization parameter, while EL0 is a little "under-robust" and SV is a little "overrobust."Since "over-robust," SV and ELM have smaller " AC" than PMO A , with correspondingly weaker AC and ∧ AC performance.This emphasizes the difficulty in achieving optimal robust control directly from a certain AE limit.The reproduced sound fields using EL0, PMO A , and SV with normalized loudspeaker weights at 1000 Hz are illustrated in Fig. 5.These represent a visualization of a random trial from the 10000 Monte-Carlo simulations.EL0 (representing "under-robust") maintains a deep quiet zone yet does not ensure that the main lobe is transmitted directly to the required direction.On the other hand, SV and PMO A create shallower quiet zones, yet they manipulate the array to emit a relatively sharp main lobe towards the listening zone.While SV (representing "over-robust") is affected by energy lobes across the quiet zone, PMO A still creates sufficient cancellation and is a good compromise between quiet zone cancellation and efficient listening zone focusing under the error conditions.

Summary
Acoustic-modeling based solutions WCO, PMO A , and PMO M (representing WCO and PMO with assumed special error models) are compared with AE control and singular-value-based regularization by simulations.Since PMO A and WCO share a diagonal loading form with regularization methods, we first observed the robust performance by varying the regularization parameter under a certain error through Monte-Carlo simulations.The pattern shows the system's sensitivity to the regularization parameter and that good parameter estimation leads to robust AC and AE performance.By incorporating coarse error information in addition to geometry information, PMO and WCO derived diagonal loading parameters with clear physical meaning and reduced computational complexity compared to the AE-based method.

EXPERIMENTAL VALIDATION
Experimental results are presented in this section to validate the simulation results in a real-world reproduction system and investigate the feasibility of using acousticmodeling based robust control to reduce the need for in-situ measurements.
The experimental reproduction and performance measurement system is introduced in Sec.3.1.Three TF models and error estimation parameters for populating various filters are elaborated in Sec.3.2, and the error conditions are described.Finally, three main observations are made from the experimental performance measurements and are presented in Secs.3.3-3.5.Observation 1 concerns the comparison of different robust strategies against a certain error; Observation 2 explores the potential of using acoustic-modeling based robust control to compensate TF mismatches arising from reduced in-situ measurements; Observation 3 further compares the solution with reduced in-situ measurements and the solution with full in-situ measurements with different levels of error.

Experimental Setup
For comparison with the simulation study, the same loudspeaker array and zone geometry was adopted in the experiments (as shown in Fig. 2).The 11 loudspeakers (Genelec 8020b) were mounted on an arc truncated from a 60-element circular array (see [6]).The 48 monitor microphones (Countryman B3 omni) were assembled in a bicircular array and used to measure each zone at the listening plane.The "playrec" utility in Matlab was used to play and record sound with the loudspeaker and microphone arrays.A multichannel soundcard (MOTU PCIe 424) served as the analog to digital interface, and the microphone inputs were passed through a pre-amplifier (PreSonus Digimax D8).Level differences between microphones were compensated through calibration.The whole system was placed in an acoustically treated recording studio [6] with size 6.55 × To achieve the goal of rendering different audio content to two listeners, one in each sound zone, two target situations were measured: Target A -the left zone as the quiet zone and the right zone as the listening zone; Target B -the left zone as the listening zone and the right zone as the quiet zone.
Though a symmetric geometry setup was employed, the performance was not expected to be identically symmetric, being influenced by the asymmetric room effect, source position mismatches, source sensitivity inconsistencies, and so on.

Transfer Function Modeling and Filter Calculation with Error Estimation
The measurement and pre-processing of TFs between loudspeakers and zones are introduced in this section.The errors manually introduced in realization and the filter calculation are also described.

Transfer Function Measurements
Room impulse responses (RIRs) between each microphone position and each loudspeaker were measured using the maximum length sequence approach (15th order at 48 kHz).Cropping was applied to the measured RIRs to avoid the presence of unnecessary room effects in the TFs.The RIR from the center loudspeaker to the reference microphone ( , Fig. 2) in the listening zone is shown in Fig. 6, (a) without cropping, (b) cropped at 20 ms after the impulse onset, and (c) cropped at 3 ms after the impulse onset.The cropping envelopes, which include a raised cosine ramp at 15 ms duration after the crop time, are also shown.The unprocessed RIR (Fig. 6(a)) contains the direct sound, reflections, and reverberant tail, illustrating the complexity of the TFs in this system realization, whereas the 20 ms cropped RIR (Fig. 6(b)) retains the direct sound and early reflections, and the 3 ms cropped RIR (Fig. 6(c)) retains the direct sound and the first early reflections.
Three TF models, implying different levels of in-situ measurement, are adopted in the loudspeaker weight calculation and are compared in Sec.3.4 and Sec.3.5.
RIR: All the TFs (between each loudspeaker and 48 control points in each zone) used for the filter design are measured in-situ and cropped at 20 ms (i.e., as in Fig. 6(b)).
RIR-PS: Only TFs between each loudspeaker and the two reference microphones are measured and cropped at 3 ms (i.e., as in Fig. 6(c Compared with PS, RIR-PS contains the basic source equalization and partial loudspeaker directivity information.Compared with RIR, RIR-PS lacks detailed source directivity, source inconsistency, and room information but requires less measurements to be made.RIR-PS therefore represents the case where limited measurements are included in the source model.This approach can be compared to previous work that has used either full TF measurement (e.g., [6]) or purely a point-source model (e.g., [28]).
In practice, pseudo-anechoic responses from loudspeakers towards the reference microphone, similar to RIR-PS could potentially be obtained from the manufacturer.Furthermore, instead of the simple point source model, a source directivity model (e.g., [24]) could be measured and applied into the interpolation for better source modeling.

Errors in Realization
Two errors are considered in the experiments:

1) Inconsistency error over the loudspeaker array
We investigate perturbation errors by artificially introducing source inconsistencies in our experiments.Three sets of multiplicative gains for the loudspeaker responses were applied to explore the performance under this systematic error.The gains for each loudspeaker channel were randomly generated under error bounds ±0.1, ±0.5, and ±1.0 and are shown in Fig. 7.Note that the gain sets are not symmetric, contributing to performance differences between the Target A and Target B cases.

2) Mismatch error in TF modeling
Reducing in-situ TF measurements can potentially reduce the amount of time taken to realize a sound zone system.However, mismatches might arise between the modeled TFs for filter design and the real TFs in reproduction.
The artificially applied inconsistency error is adopted as the main error in Observation 1 (Sec.3.3).The mismatch error is considered as the main error in Observation 2 (Sec.3.4).Both errors are considered together in Observation 3 (Sec.3.5).
The quick estimation (e.g., Sec.2.3) was again applied to obtain the control parameters for WCO, PMO A , and PMO M in experiments.With the three types of determined model (RIR, RIR-PS or PS) and the two levels of error estimation (evenly-distributed multiplicative error within ±3 dB or ±1 dB), six sets of WCO/PMO A /PMO M filters were prepared.Due to the similarity in measured AC performance between WCO, PMO A , and PMO M , which was also observed in simulations, WCO is omitted from the figures in the following observations for clarity.

Observation 1: Robust Strategies under Loudspeaker Inconsistency Error
This experimental observation corresponds to the simulation study in Sec.2.3.2.TFs RIR were used for filter calculation.Errors were assumed to be uniform multiplicative errors with ±1 dB error bounds and the set of loudspeaker gain inconsistencies with error bound 0.5 was applied.The methods under test were SV, EL0, PMO A , PMO M , and WCO.In the simulations in Sec.2.3.2, the average/minimum AC performance was evaluated by Monte-Carlo simulation while the experimental observations are analogous to a single case study.Following Fig. 4, Fig. 8 shows the comparison of the five robust strategies.Among all the strategies, NR and EL0 were not effective at most frequencies; SV acted close to PMO A ; PMO A and PMO M performed well at most frequencies and had an obvious advantage at low frequencies for both Target A and Target B. PMO M and PMO A gave AC at or above 15 dB over 300-2800 Hz, and above 10 dB over 125-3000 Hz (for both targets), maintaining acceptable zone separation [29] in these ranges.It can be observed that even though a symmetric geometry was defined, the asymmetric errors led to differences in the performance for Target A and Target B.
Following Table 1, Table 2 shows AC and AE for Target A and Target B at three typical frequencies according to the robust performance pattern (cf., Fig. 3).At 200 Hz and 1000 Hz, PMO M achieved the best AC and also low AE; PMO A and SV both performed much better than EL0 and NR.However, at the grating lobe frequency 3538 Hz, EL0 performs slightly better than the others.However, no method is effective for delivering sound zones at this frequency.
From the case study conducted with the measured RIRs against loudspeaker inconsistency errors generated from a partially known error set, PMO A and PMO M (representing acoustic-modeling based robust ACC) gave competitive performance.

Observation 2: Reducing in-situ Measurement
In this observation the three TF models, RIR, RIR-PS, and PS (introduced in Sec.3.2), were individually adopted for filter calculation and their performance (with no additional inconsistency errors) is compared.Compared with RIR, RIR-PS, and PS lack accurate information about the TFs, which introduces a mismatch.We assume that the differences between RIR and RIR-PS (or PS) can be described by uniform multiplicative error within ±1 dB.The solutions found by applying PMO M to the RIR-PS and PS TFs (denoted RIR-PS-PMO M and PS-PMO M ), are compared with the case of fully measured TFs with maximum-singularvalue dependent regularization (RIR-SV).The AC performance and the listening-zone-averaged frequency response (FR) for these three cases, reproduced for Target A, are shown in Fig. 9.
Considering AC, RIR-SV performed better than RIR-PS-PMO M and PS-PMO M , except for some low frequen-cies where it did not work well.However, the degradations between RIR-SV and RIR-PS-PMO M /PS-PMO M are only 1.12/1.39dB respectively, averaged over 100-3000 Hz.
The FR curves here are normalized by their average value over 100-8000 Hz, and (spatially) averaged over the 48 microphones in the listening zone.The flatness of FR is related to the reproduced sound quality.It can be quantified by measuring the frequency response variance, which is defined as where N f is the number of linear frequency samples, p i is the spatially averaged listening-zone sound pressure at the ith frequency sample, and p is the sound pressure averaged over all the frequency samples.The FRV metric is included as informal listening revealed that non-flat frequency responses for some non-robust filters led to sound quality degradations.The FRVs of RIR-SV, RIR-PS-PMO M, and PS-PMO M are respectively 1.22, 1.65, and 2.56 dB.The flatness of RIR-PS-PMO M is between that of RIR-SV and PS-PMO M because it contains information about the loudspeaker equalization.Taking one path measurement into account, as in RIR-PS, can therefore be expected to improve the reproduced sound quality.The hybrid solution, using acoustic-modeling based robust control to reduce the need for in-situ measurement, could achieve satisfactory reproduction performance with a single measurement per loudspeaker, per zone.The RIR-PS-PMO M method achieved an AC performance above 15 dB over 320-3000 Hz (except a small drop at 800 Hz), and a close FRV performance with RIR-SV, especially over 250-1600 Hz, despite incorporating a simple acoustic model (point-source interpolation and quick error estimation).
Better acoustic modeling may further improve the AC and FRV performance for the reduced-measurement case RIR-PS-PMO M , such as modeling the source radiation pattern.This should, for example, improve the match in FR between the RIR-SV and the reduced-measurement cases above 1600 Hz.

Observation 3: Robustness against Different Levels of Inconsistency Error
Two practical robustness problems are investigated further in this section: regularized filter performance with various levels of error and the sensitivity of the acoustic-modeling based methods to wrongly-estimated error bounds.Three different levels of loudspeaker gain inconsistency error (Sec.3.2.2,Fig. 7) were tested.The AC performance of RIR-SV (upper) and RIR-PS-PMO M (with ±1 dB error bound estimation, lower), averaged across Target A and Target B, is shown in Fig. 10.The corresponding AC performance at three typical frequencies is listed in Table 3, which also includes RIR-PS-PMO M with ±3 dB error bound estimation.As the loudspeaker gain error increased, the overall performance of both RIR-SV and RIR-PS-PMO M dropped.However, the AC performance of both methods was above 10 dB (over 300-3000 Hz)  for all three inconsistency levels.From Table .3, RIR-PS-PMO M performed slightly worse than RIR-SV at 1000 Hz but slightly better at 200 Hz.At the grating lobe frequency (3538 Hz) neither approach achieved good AC, though RIR-SV was better.Overall, the reduced measurement case RIR-PS-PMO M remained competitive with RIR-SV as the reproduced error increased.Furthermore, the two RIR-PS-PMO M filters with different assumed error bounds (±1 dB or ±3 dB) performed closely.That is, the control parameters selected by errorbased methods (e.g., PMO M ) were quite robust to wrongly estimated error bounds, and the actual error in reproduction had a larger effect than the estimated error bound under the system we tested.For example, at 1000 Hz the difference in the inconsistency error bounds (±0.1 and ±0.5) led to a difference of more than 1.1 dB in AC performance, while the difference in estimated error bound (±1 dB ≈±0.1 and ±3 dB ≈±0.4) led to a performance difference within 0.2 dB.This experimental observation is in accordance with our previous simulations on error estimation mismatch in [23].

DISCUSSION
Overall, incorporating increasing amounts of prior knowledge into the system optimization allows for more robust control.With prior knowledge of the TFs, which encapsulate the system geometry and acoustic environment, we can aim to find a frequency-dependent regularization parameter.For some kinds of error, AE-based methods might be appropriate, however, the criterion for setting the AE constraint is not very clear and might make the performance "under-robust" or "over-robust" in reality.From Table 1, for example, a small difference in AE results in a quite different δ Q .The singular-value based method (SV) also requires a threshold to be selected.
Additionally, limited knowledge about a potential error model can be incorporated by focusing on the bounds of the error, rather than the detailed error distribution (e.g., PMO A , PMO M , WCO).In simulation, the "quick" robust methods obtained good AC and AE performance, with reduced computational complexity compared to a Monte-Carlo simulation-based reference method [14].More accurate acoustic modeling with WCO and PMO, might give better performance than the quick methods.However, from Figs. 3 and 4, it seems that any improvements might not be very remarkable compared with the increasing difficulty in error model estimation and complexity in calculation.In this case, the proposed "quick" methods offer a compelling means to calculate the regularization term.Furthermore, the proposed methods directly use physical acoustic modeling with a clear optimization goal (WCO or PMO) to choose the regularization parameter.
In practice, the error bound information required in the "quick" approaches can be obtained from both measurement and simulation, depending on the application scenario.For example, the bound value for the loudspeaker inconsistency in Observation 1 could be estimated by measuring a set of loudspeakers of the same model or directly provided by the loudspeaker manufacturer.Other error bounds could be obtained by simulation, for example by applying position mismatches.
With a full error analysis or extensive measurements, there is potential to increase the robustness of AC performance, mainly at low frequencies.However, the maximal performance loss (at low frequencies) resulting from "quick" parameter estimation was only 1.5 dB for simulations of the application geometry studied in our paper.
As an application of acoustic-modeling based robust control, we proposed a novel method to narrow the gap between no measurement and full measurement of TFs as input to the optimization.We measure a single TF per loudspeaker in each zone, interpolate the sound pressure to other microphone positions, and compensate the mismatch by acousticmodeling based robust control.In this way the TF measurement was reduced from 11 × 2 × 48 RIRs to 11 × 2 RIRs.Experimental results revealed a trade-off between reducing measurements, which saves time and simplifies PAPERS ROBUST ACOUSTIC CONTRAST CONTROL BY ACOUSTIC MODELING the equipment requirements, and the limited sacrifice in performance.Opportunities remain to improve the accuracy of the acoustic modeling by considering loudspeaker directivity (that might be obtained directly from the manufacturer), the room effect, and more detailed estimation of the individual sources of error.

CONCLUSION AND FUTURE WORK
In summary we have proposed a new framework for robust sound zone reproduction design, which uses acoustic modeling to derive the entries for calculating robust control parameters.We formulated and implemented robust ACC based on WCO and PMO for multiplicative and additive error as examples of the proposed framework.We illustrated the performance improvements and comparisons by both simulation and experimental studies and demonstrated the effectiveness of acoustic-modeling based robust ACC, with "quick" error estimation.In experimental observations, incorporation of acoustic modeling gave comparable performance to the state-of-the-art methods for regularization parameter selection in robust filter design, and enabled the number of in-situ measurements to be reduced.
Future work should conduct subjective evaluation of the reproduced audio through various robust strategies, investigate the effects of different loudspeaker directivities on the robust optimization, introduce room modeling techniques alongside source radiation analysis to make better acoustic modeling of both the determined part and stochastic part for reflective room reproduction, and explore the accuracy needed in acoustic modeling for robust ACC to further reduce in-situ measurements.

Fig. 2 .
Fig. 2. Simulation geometry with 11 loudspeakers oriented towards the origin in an arc array and 48 monitor microphones (reference microphone denoted by star, other microphones denoted by dot) in each zone.The listening zone and the quiet zone are symmetric with respect to the loudspeaker array.

Fig. 3 .
Fig. 3. Simulated AC performance with varying regularization parameter δ Q on R Q , by Monte-Carlo simulation over 1000 trials for each frequency sample between 100-8000 Hz.Solid line denotes the peak value of the mean AC performance and dashed contours show peak AC -1 dB.

Fig. 5 .
Fig. 5. Simulated sound pressure level of ACC sound field at 1 kHz with EL0 (AE limited to 0 dB), PMO A (PMO for additive error with quick parameter estimation), and SV (maximum singular-value-based regularization).The illustrated sound field corresponds to a single Monte-Carlo simulation trial.

Fig. 6 .
Fig. 6.Measured RIR: (a) unprocessed; (b) cropped 20 ms; (c) cropped 3 ms.The length shape of the cropping windows, including the crop limit and a 15 ms raised-cosine taper, are illustrated by the dashed lines.The maximal amplitude value is 0.30.
)).The TFs to the other microphone positions are obtained by interpolation through the point source model, H m,l = H ref,l × e − jk(d m,l −d ref,l ) × d ref,l /d m,l , where d ref, l and d m,l are the distances from the lth loudspeaker to the reference and mth monitor microphones, respectively.d ref, l /d m,l and (d m,l − d ref, l )/c are assumed to be the gain and delay differences between the TF from the lth loudspeaker to the mth control point (H m,l ) and that to the reference control point (H ref, l ) in a zone, and c is the speed of sound.PS: All the TFs are calculated by the point source model e − jkd m,l /d m,l , without any measurement.The distances (d ref, l and d m,l ) used in RIR-PS and PS are those according to the pre-defined geometry (in Sec.3.1) rather than experimentally measured distances.

Fig. 8 .
Fig.8.Measured AC performance for Target A and Target B with NR (no regularization), SV (maximum singular-value-based regularization), EL0 (AE limited to 0 dB), PMO A (PMO for additive error with quick parameter estimation), and PMO M (PMO for multiplicative error with quick parameter estimation), with loudspeaker gain inconsistency error bound of 0.5 and RIR TFs.
PAPERS ROBUST ACOUSTIC CONTRAST CONTROL BY ACOUSTIC MODELING

Fig. 9 .
Fig. 9. Measured AC performance of RIR-SV (full in-situ measurement with SV regularization), RIR-PS-PMO M (single channel in-situ measurement with PMO M ), and PS-PMO M (point source model with PMO M ).AC and normalized listening zone FR are shown for Target A (without artificial gain errors applied).

Fig. 10 .
Fig. 10.Robustness against artificial gain inconsistencies with multiplicative error bounds of 0.1, 0.5, 1.0.Mean AC (Target A and Target B) is shown for RIR-SV and RIR-PS-PMO M with 1 dB error bound estimation.

Table 1
and Fig. 4 also demonstrate that very close AE values can give very different AC performance, for example comparing SV and PMO A at 1000 Hz.ZHU ET AL.PAPERS

Table 1 .
Simulated performance of various robust strategies at 200, 1000, and 3538 Hz.The mean/minimal AC (AC/∧ AC) and their difference ( AC) over 10000 Monte-Carlo trials, array effort (AE), and regularization parameter (δ Q ) are shown.The methods are ordered in terms of their performance, from "under-robust" (left) to "over-robust" (right).

Table 2 .
Measured AC and AE of various robust strategies, averaged over Target A and Target B, with loudspeaker inconsistency gain with multiplicative error bound of 0.5 using RIR TFs.

Table 3 .
Measured RIR-SV and RIR-PS-PMO M performance (AC averaged over Target A and Target B) for loudspeaker gain inconsistency with multiplicative error bounds of 0.1, 0.5, and 1.0.PMO M 1 dB, PMO M 3 dB denote PMO M with 1 dB or 3 dB error bound estimation. PAPERS