We recently presented our first version of a head-mounted microphone array in this paper:
J. Ahrens, H. Helmholz, D. L. Alon, S. V. Amengual Garí, “Spherical Harmonic Decomposition of a Sound Field Based on Microphones Around the Circumference of a Human Head,” in Proc. of IEEE WASPAA, New Paltz, NY, USA, Oct. 2021 [ pdf ]
It produces an ambisonic representation of the captured sound field, which may be particularly interesting when binaural rendering is targeted because it allows for headtracking about all three rotation axes. One option would be to integrate it into an augmented reality headset. Imagine combining this with a panoramic video recorded from outward-facing cameras on that same device!
A human head is, of course, not the only conceivable object to mount the microphone array on. The method can deal with any object that is approximately round. Panoramic camera arrays of any form factor may be an equally interesting application.
We proudly announce that the Journal of the Acoustical Society of America (JASA) has just published our proposition of the equatorial microphone array (EMA): DOI: 10.1121/10.0005754 [ pdf ]
The EMA is essentially a spherical microphone array (SMA) but with microphones only along the equator. Our JASA article demonstrates how a spherical harmonic decomposition can be obtained from the EMA signals. We have focused on binaural rendering of the signals so far.
The main advantage of the EMA is that it requires far fewer microphones than an SMA for the same spherical harmonic order. The image shows an 8th-order SMA that employs 110 microphones on a Lebedev grid (left; you can get away with 81 microphones on a quasi-uniform grid, too) and an 8th-order EMA with 17 microphones (right).
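The microphone counts quoted above can be cross-checked with a short calculation. This is a minimal sketch that assumes the usual counting argument: an order-N spherical harmonic decomposition has (N+1)^2 coefficients, which sets the lower bound for an SMA, while 2N+1 microphones along the equator are assumed to suffice for the EMA. The function names are hypothetical, and practical grids (such as the 110-point Lebedev grid) may need more microphones than the lower bound.

```python
def sma_min_microphones(order: int) -> int:
    """Lower bound for an SMA: one microphone per SH coefficient, (N+1)^2."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2


def ema_min_microphones(order: int) -> int:
    """Assumed lower bound for an EMA: 2N+1 microphones along the equator."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return 2 * order + 1


if __name__ == "__main__":
    for n in (1, 4, 8):
        print(n, sma_min_microphones(n), ema_min_microphones(n))
```

For order 8 this gives 81 and 17, matching the quasi-uniform SMA grid and the EMA mentioned above.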
The EMA clearly has a few limitations. In particular, it cannot preserve monaural elevation cues because it cannot, in principle, capture elevation information. Elevated sound incidence will cause a distortion of the magnitude of the binaural output signals by a few dB. Interestingly, the EMA can actually preserve ILD and ITD even for elevated sources. It was shown in different experiments that this typically also leads to the perception of elevation. Otherwise, there are no limitations for the EMA. Close and far sources and the like work just as well as with SMAs.
Last but not least: We are not aware that it has been proven that SMAs are actually capable of preserving monaural elevation information. This is not surprising given that spatial aliasing typically occurs in the frequency range where most monaural HRTF elevation cues are located. It could therefore be that even SMAs are not really capable of overcoming the limitations of EMAs. Be sure that we will follow up on this. We’re currently building a prototype.
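To get a feeling for where spatial aliasing sets in, a common rule of thumb for arrays on a sphere of radius R is that an order-N representation is usable up to about kR = N, i.e. f = N c / (2 pi R). The sketch below evaluates this rule; it is only a rough estimate, and the speed of sound (343 m/s), the head-sized radius (0.0875 m), and the function name are assumptions made for illustration, not values from our articles.

```python
import math


def aliasing_frequency_hz(order: int, radius_m: float, c: float = 343.0) -> float:
    """Rule-of-thumb upper frequency limit from kR = N (assumption, not exact)."""
    if radius_m <= 0:
        raise ValueError("radius must be positive")
    return order * c / (2.0 * math.pi * radius_m)


if __name__ == "__main__":
    # Hypothetical head-sized array of order 8.
    print(round(aliasing_frequency_hz(8, 0.0875)))
```

With these assumed values the estimate lands at roughly 5 kHz, which is indeed within the range where monaural elevation cues are usually expected.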
Thanks to Facebook Reality Labs Research for funding our work on the EMA!
The sources are mostly loudspeakers, the human voice, and musical instruments that are used in classical orchestras. Here’s an example set of balloon plots of the directivity of a loudspeaker:
Most directivity data that are available are incomplete in one way or another. For example, a given directivity might only be available at a limited discrete set of frequencies or only along certain contours such as circles. This technically prevents the popular spherical harmonic (SH) representations from being computed from the data.
We describe in the article
J. Ahrens and S. Bilbao, “Computation of Spherical Harmonic Representations of Source Directivity Based on the Finite Distance Signature,” IEEE Transactions on Audio, Speech and Language Processing 29, pp. 83-92, 2021 [ pdf ]
how such SH representations can actually be obtained by means of interpolation of the magnitude and manual fitting of a suitable phase. The results are what we term complete directivities, i.e., directivities that are valid at every frequency, angle, and distance. Actually, we even store the directivities in time domain, which is usually not possible with incomplete representations.
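To illustrate what interpolating the magnitude across frequency can look like, here is a minimal sketch that interpolates a magnitude given in dB linearly over a logarithmic frequency axis. This is one plausible choice made for illustration and not necessarily the procedure used in the article; the function name and the sample values are hypothetical.

```python
import math
from bisect import bisect_right


def interpolate_magnitude_db(freqs_hz, mags_db, f_hz):
    """Interpolate a dB magnitude linearly over log-frequency.

    freqs_hz must be sorted in ascending order and strictly positive.
    Outside the measured range, the nearest measured value is held.
    """
    if len(freqs_hz) != len(mags_db) or not freqs_hz:
        raise ValueError("need equally long, non-empty input lists")
    if f_hz <= freqs_hz[0]:
        return mags_db[0]
    if f_hz >= freqs_hz[-1]:
        return mags_db[-1]
    i = bisect_right(freqs_hz, f_hz) - 1  # index of the lower neighbour
    f0, f1 = freqs_hz[i], freqs_hz[i + 1]
    t = math.log(f_hz / f0) / math.log(f1 / f0)  # position in the interval, 0..1
    return (1.0 - t) * mags_db[i] + t * mags_db[i + 1]


if __name__ == "__main__":
    # Hypothetical octave-band magnitudes for a single direction.
    freqs = [500.0, 1000.0, 2000.0]
    mags = [-3.0, -6.0, -12.0]
    print(interpolate_magnitude_db(freqs, mags, 1500.0))
```

Applying such an interpolation per direction fills in the frequencies between the measured ones; a suitable phase still has to be fitted separately.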
A large part of our database is based on measurement data that are publicly available. We also have a considerable amount of data that you cannot find for download anywhere else.
Abstract: This thesis investigates the physical and perceptual properties of selected methods for (Local) Sound Field Synthesis ((L)SFS). First, the mathematical foundations of the approaches are discussed. Special attention is drawn to the implementation of the methods in the discrete-time domain as a consequence of digital signal processing. The influence of their parametrisation on the properties of the synthesised sound field is examined on a qualitative level. A geometric model is developed to predict spatial aliasing artefacts caused by the spatial discretisation of the deployed loudspeaker arrays. In agreement with numerical sound field simulations, the geometric model shows an increase of synthesis accuracy for LSFS compared to conventional SFS approaches. However, the difference in accuracy gets smaller the closer the listener is located to the active loudspeakers. With the help of binaural synthesis, the different (L)SFS approaches are assessed within listening experiments targeting their spatial and timbral fidelity. The results show that LSFS performs at least as well as conventional methods for azimuthal sound source localisation. A significant increase of timbral fidelity is achieved with distinct parametrisations of the LSFS approaches.
Abstract: Filters with constant phase shift in conjunction with +3/6 dB amplitude slope per octave frequently occur in sound field synthesis and sound reinforcement applications. These ideal filters, known as (half) differentiators, exhibit zero group delay and +45/90 degree phase shift. It is well known that certain group delay distortions in electro-acoustic systems are audible for trained listeners and critical audio stimuli, such as transient, impulse-like and square wave signals. It is of interest if linear distortion by a constant phase shift is audible as well. For that, we conducted a series of ABX listening tests, diotically presenting non-phase shifted references against their treatments with different phase shifts. The experiments revealed that for the critical square waves the phase shift can be clearly detected, with detectability generally depending on the amount of constant phase shift. Here, -90 degree (Hilbert transform) is comparably easier to detect than other phase shifts. For castanets, lowpass filtered pink noise and percussion the detection rate tends towards guessing for most listeners, although trained listeners were able to discriminate treatments in the first two cases based on changed pitch, attack and roughness cues. Our results motivate the application of constant phase shift filters to ensure that even the most critical signals are technically reproduced as well as possible. In the paper, we furthermore give analytical expressions for the discrete-time infinite impulse response of an arbitrary constant phase shifter and for practical filter design.
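A constant phase shift is easy to try out on a short, periodic test signal. The following is a minimal sketch that applies the shift in the DFT domain: positive-frequency bins are rotated by +phi and negative-frequency bins by -phi. It is not the filter design from the paper, the function name is hypothetical, and the plain O(N^2) DFT is only meant for short signals. Scaling the DC and Nyquist bins by cos(phi) is a choice made here to keep the output real.

```python
import cmath
import math


def constant_phase_shift(x, phi_deg):
    """Apply a frequency-independent phase shift to one period of a real signal."""
    n = len(x)
    phi = math.radians(phi_deg)
    # Forward DFT (direct evaluation, fine for short signals).
    spectrum = [
        sum(x[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
        for m in range(n)
    ]
    shifted = []
    for m, value in enumerate(spectrum):
        if m == 0 or (n % 2 == 0 and m == n // 2):
            # DC and Nyquist have no sign of frequency: keep them real.
            shifted.append(value * math.cos(phi))
        elif m < n / 2:
            shifted.append(value * cmath.exp(1j * phi))   # positive frequencies
        else:
            shifted.append(value * cmath.exp(-1j * phi))  # negative frequencies
    # Inverse DFT; the imaginary part is numerical noise only.
    return [
        sum(shifted[m] * cmath.exp(2j * math.pi * m * k / n) for m in range(n)).real / n
        for k in range(n)
    ]


if __name__ == "__main__":
    n = 32
    cosine = [math.cos(2 * math.pi * 3 * k / n) for k in range(n)]
    hilbert = constant_phase_shift(cosine, -90.0)  # turns the cosine into a sine
    print(max(abs(hilbert[k] - math.sin(2 * math.pi * 3 * k / n)) for k in range(n)))
```

With -90 degrees the sketch acts as a Hilbert transformer, so a cosine comes out as a sine of the same frequency and amplitude.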
Winter, F.; Schultz, F.; Firtha, G.; Spors, S. (2019), “A Geometric Model for Prediction of Spatial Aliasing in 2.5D Sound Field Synthesis,” In: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 27, no. 6
Abstract: The avoidance of spatial aliasing is a major challenge in the practical implementation of Sound Field Synthesis. Such methods aim at a physically accurate reconstruction of a desired sound field inside a target region using a finite ensemble of loudspeakers. In the past, different theoretical treatises of the inherent spatial sampling process led to anti-aliasing criteria for simple loudspeaker array arrangements, e.g. lines and circles, and fundamental sound fields, e.g. plane and spherical waves. Many criteria were independent of the listener’s position inside the target region. Within this article, a geometrical framework based on ray-approximation of the underlying synthesis problem is proposed. Unlike former approaches, this model predicts spatial aliasing artefacts for arbitrary convex loudspeaker arrays and as a function of the listening position and the desired sound field. Anti-aliasing criteria for distinct listening positions and extended listening areas are formulated based on the established predictions. For validation, the model is applied to different analytical Sound Field Synthesis approaches: The predicted spatial structure of the spatial aliasing agrees with numerical simulation of the synthesised sound fields. Moreover, it is shown within this framework, that the active prioritisation of a control region using so-called Local Sound Field Synthesis approaches does indeed reduce spatial aliasing artefacts. For the scenario under investigation, a method for Local Wave Field Synthesis achieves an artefact-free synthesis up to a frequency which is between 2.9 and 17.3 times as high as for conventional Wave Field Synthesis.
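For comparison, the classical position-independent criterion for a linear array with loudspeaker spacing dx is often quoted as f = c / (2 dx). The sketch below evaluates this worst-case rule only; it is not the geometric model proposed in the article, and the function name, the speed of sound (343 m/s), and the example spacing are assumptions made for illustration.

```python
def classical_aliasing_frequency_hz(spacing_m: float, c: float = 343.0) -> float:
    """Worst-case anti-aliasing limit c / (2 dx) for a uniformly spaced linear array."""
    if spacing_m <= 0:
        raise ValueError("spacing must be positive")
    return c / (2.0 * spacing_m)


if __name__ == "__main__":
    # Hypothetical loudspeaker spacing of 17 cm.
    print(classical_aliasing_frequency_hz(0.17))
```

A spacing of 17 cm yields a limit of about 1 kHz under this rule, independent of where the listener is, which is exactly the kind of position-independent statement the geometric model refines.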
Science relies on the traceability and replicability of studies. Aiming for sustainability, this is important for the authors themselves as well as for the research community once results become published. Ideally the entire research process, from initial concepts to publication, should be performed under the open science paradigm. Recent efforts in the open source software community led to convenient tools for research data management. Nowadays, it is almost self-evident that researchers are engaged in the professional typesetting process using the mature LaTeX front-end with graphical packages like TikZ. Furthermore, version-control systems such as Git are probably used by a large part of the community for open and closed source projects. Besides that, the open source programming language Python and its various open tools for code development are gradually becoming predominant, supporting the open science paradigm. Jupyter notebooks are rapidly gaining importance in the workflow for prototyping, documentation and education. Documentation tools like “Sphinx” and free hosting platforms like “Read the Docs” are emerging front-ends that allow versioned technical documentation with hyperlinks. In this contribution we discuss and demonstrate a current reliable workflow for open science/research starting from puzzling ideas to publishing results.
Sound Field Synthesis (SFS) aims at the production of wave fronts within a large target region enveloped by a massive number of loudspeakers. Nowadays, these techniques are known as Wave Field Synthesis (WFS) as an implicit solution of the SFS problem and as explicit solutions, like Ambisonics in the spherical domain and the Spectral Division Method in the Cartesian domain. Research and development on Ambisonics and WFS has been ongoing since the 1970s and the late 1980s, respectively, being most lively in the last decade due to the DSP power available. This resulted in many SFS systems at research institutes with different rendering methods, thus complicating comparability and reproducibility. In order to pool the outcomes of different SFS approaches, the Matlab/Octave based Sound Field Synthesis Toolbox was initiated in 2010 as an open source project by the authors. This toolbox was later accompanied by online theoretical documentation giving an overview of the SFS approaches and citing the reference literature. In 2013 porting of the SFS Toolbox to Python was initiated, serving as a convenient framework together with Jupyter notebooks. In this contribution we discuss and demonstrate the concepts, workflows and capabilities of the SFS Toolbox and its documentation as a fundamental component for open research on SFS.
At the 45th Annual German Acoustic Conference (DAGA) we presented further thoughts on the links between NFC-HOA and WFS. See the accompanying GitHub repository for the manuscript, slides and extended calculus. Schultz, F.; Firtha, G.; Winter, F.; Spors, S. (2019): “On the Connections of High-Frequency Approximated Ambisonics and Wave Field Synthesis.” In: Proc. of the 45th DAGA, Rostock, p. 1446-1449.