Title HRTF Customization Technique to Implement in Virtual Audio
Author Shin, Ki Hoon
Type KAIST Ph.D. Dissertation
Year of Pub. 2008
File Shin Ki Hoon Ph.D. dissertation (신기훈박사논문)
The ability of humans to use sonic cues to estimate the spatial location of a sound source is of great practical and research importance. Although considerable effort has been made over the past century to understand the basic mechanisms of the human auditory system, the role of each pinna cavity in auditory localization remains unclear. Recent advances in computational power and acoustic measurement techniques have made it possible to empirically measure, analyze, and synthesize the spectral cues that influence spatial hearing. However, current 3-D sound technologies cannot compensate for individual differences in spectral cues to render a fully immersive sound field, and developing an effective method for HRTF (head-related transfer function) customization remains the biggest open problem in virtual audio synthesis.
Besides source direction, HRTFs are also known to depend on source distance when the source lies close to the head. However, most HRTF databases currently accessible on the Web contain HRTFs for distant sources only, because measuring HRTFs for nearby sources involves several technical challenges, such as equalizing the increasing interference from the loudspeaker as it is positioned closer to the ears. In this study, HRTFs for nearby sources (located within 1 m of the head center) were measured using a B&K HATS (Head And Torso Simulator) dummy head microphone system and a special acoustic point source, in order to construct a database and analyze how their characteristics vary with the direction and distance of the source relative to the head center.
To provide more realistic and immersive audio when listening to music or watching live broadcast TV on PMPs (portable media players) such as MP3 or DMB (digital multimedia broadcasting) players, most high-quality PMPs offer 3-D sound rendering based on HRTFs despite their limited memory. Empirical HRTFs usually require 256 or 512 values to be stored per ear to simulate a single source direction, which demands far more memory than most PMPs can spare, even when simulating sounds only on the horizontal plane. A number of truncation techniques that model these empirical HRTFs using only a few parameters are already available. This study explores some of the conventional HRTF modeling techniques and suggests a different modeling technique, known as the minimum realization method, that can decompose an HRTF into a more stable set of parameters with enhanced modeling accuracy.
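The idea of replacing a long impulse response with a few state-space parameters can be sketched as follows. This is a minimal illustration only, assuming a generic Hankel-matrix SVD realization (in the style of Kung's method); it is not necessarily the formulation used in the dissertation. The function names `hankel_realization` and `impulse_response`, the synthetic test signal, and the chosen model order are all hypothetical.

```python
import numpy as np

def hankel_realization(h, order):
    """Fit a state-space model (A, B, C, D) of the given order to an
    impulse response h using a truncated SVD of its Hankel matrix."""
    h = np.asarray(h, dtype=float)
    D = h[0]                       # direct feed-through term
    markov = h[1:]                 # h[k] = C A^(k-1) B for k >= 1
    n = len(markov) // 2
    # Hankel matrix with entries H[i, j] = markov[i + j]
    H = np.array([markov[i:i + n] for i in range(n + 1)])
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    root = np.sqrt(s[:order])
    obs = U[:, :order] * root              # extended observability matrix
    ctrl = root[:, None] * Vt[:order]      # extended controllability matrix
    # Shift invariance of the observability matrix: obs[1:] = obs[:-1] @ A
    A = np.linalg.pinv(obs[:-1]) @ obs[1:]
    B = ctrl[:, :1]
    C = obs[:1, :]
    return A, B, C, D

def impulse_response(A, B, C, D, length):
    """Simulate the impulse response of the state-space model."""
    out = np.empty(length)
    out[0] = D
    x = B.copy()
    for k in range(1, length):
        out[k] = (C @ x).item()
        x = A @ x
    return out

# Synthetic stand-in for a 256-tap HRIR (assumption: two damped resonances,
# which is an exactly fourth-order system; real HRIRs are not this simple).
k = np.arange(256)
h = 0.9 ** k * np.cos(0.3 * k) + 0.5 * 0.8 ** k * np.sin(1.1 * k)

A, B, C, D = hankel_realization(h, order=4)
h_hat = impulse_response(A, B, C, D, len(h))
max_err = np.max(np.abs(h - h_hat))
print("max reconstruction error:", max_err)
```

An order-r single-input, single-output model can be written in a canonical form with 2r + 1 free parameters, so a low-order fit of this kind stores far fewer numbers than the 256 or 512 samples of the raw response. How low the order can go for measured HRTFs depends on the accuracy required and is not something this sketch establishes.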
The human ability to perceive the elevation of a sound and to distinguish whether it is coming from the front or the rear depends strongly on the monaural spectral features of the pinnae. For an effective virtual auditory display, this study proposes a novel HRTF customization method that can be based on any individual HRTF library of substantial size. The pinna responses were extracted from the median-plane HRIRs (head-related impulse responses) of 45 individuals in the CIPIC HRTF database of UC Davis and modeled by PCA (principal components analysis) in the time domain, which decomposed them into four or five basic temporal shapes (basis functions) per elevation. Each listener then tuned the weight on each basis function computed for a specific elevation angle, so that the pinna response in the KEMAR (Knowles Electronic Manikin for Acoustic Research) HRIR measured at the same angle was replaced with a linear combination of the tuned basis functions, while listening to the filtered stimulus over headphones. In this way, four individuals with normal hearing sensitivity were able to create sets of customized HRIRs that outperformed the KEMAR HRIRs in producing vertical effects with reduced front/back ambiguity in the median plane. Since the monaural spectral features of the pinnae are almost independent of source azimuth when source elevation with respect to the ear canal is kept fixed, similar vertical effects could also be generated in sagittal planes simply by varying the ITD (interaural time difference) according to the source direction and the size of each individual's head.
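The decomposition and tuning steps described above can be sketched with a plain SVD-based PCA. This is an illustrative sketch on synthetic data, not the dissertation's code or the CIPIC measurements: the array sizes, the random "pinna responses", the helper names `pca_basis` and `synthesize`, and the weight offsets are all assumptions.

```python
import numpy as np

def pca_basis(responses, n_components):
    """Time-domain PCA of responses with shape (n_subjects, n_samples).

    Returns the mean response, the orthonormal basis functions, each
    subject's weights, and the fraction of variance explained."""
    mean = responses.mean(axis=0)
    centered = responses - mean
    _, s, Vt = np.linalg.svd(centered, full_matrices=False)
    basis = Vt[:n_components]              # rows are basis functions
    weights = centered @ basis.T           # projection of each subject
    explained = (s[:n_components] ** 2).sum() / (s ** 2).sum()
    return mean, basis, weights, explained

def synthesize(mean, basis, weights):
    """Rebuild a response as the mean plus a weighted sum of basis functions."""
    return mean + weights @ basis

# Synthetic stand-in for the pinna responses of 45 subjects at one
# elevation (assumption: 32 samples each, built from 5 hidden shapes).
rng = np.random.default_rng(0)
shapes = rng.standard_normal((5, 32))
data = rng.standard_normal((45, 5)) @ shapes
data += 0.01 * rng.standard_normal((45, 32))

mean, basis, weights, explained = pca_basis(data, n_components=5)
print("variance explained by 5 components:", explained)

# Reconstruct one subject from only five weights.
recon = synthesize(mean, basis, weights[0])

# "Tuning": a listener would adjust the weights by ear; here a
# hypothetical offset is applied to the population-average weights.
tuned_weights = weights.mean(axis=0) + np.array([0.5, -0.2, 0.0, 0.1, 0.0])
custom_response = synthesize(mean, basis, tuned_weights)
```

In a listening procedure of the kind described, the weights would be adjusted interactively while the listener judges the perceived elevation, and the resulting `custom_response` would stand in for the pinna portion of the manikin HRIR at that elevation. How the pinna portion is delimited within the HRIR is not specified here and is left out of the sketch.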
The minimum realization method can also be applied to model the shoulder/torso response of the KEMAR, so that the above procedure for obtaining customized pinna responses requires less memory in real applications. The proposed methods for HRTF modeling and customization can be implemented in a wide range of entertainment and educational applications, such as home theaters, PC games, aviation simulators, and mobile devices with limited memory, including cellular phones and MP3/DMB players.