Title Modeling, Customization, and Interpolation of Head-Related Impulse Responses based on Principal Components Analysis
Author Hwang, Sungmok
Type KAIST Ph.D. Dissertation
Year of Pub. 2009
File PhD_Hwang.pdf
Virtual Auditory Display (VAD), defined as a system or technology that generates spatialized sound and conveys it to a listener, has attracted much attention in many application fields. Head-Related Transfer Functions (HRTFs), which describe how sound waves are transformed by a listener's physical structures such as the head, pinnae, shoulders, and torso, play an important role in rendering high-fidelity VAD. Recently, three issues, namely the modeling, customization, and interpolation of HRTFs, have come into the spotlight for high-fidelity VAD and its real-time implementation.
This thesis addresses these three issues using general basis functions. The general basis functions are obtained from Principal Components Analysis (PCA) of HRTFs or of Head-Related Impulse Responses (HRIRs), the HRIR being the Fourier transform pair of the HRTF. The main advantage of general basis functions is that the dimension of the dataset can be reduced effectively without loss of meaningful information. Four kinds of PCA models, based on HRIRs, complex-valued HRTFs, augmented HRTFs, and log-magnitudes of HRTFs, are investigated and their modeling performance is compared. In terms of the number of Principal Components (PCs) needed to model the HRTFs or HRIRs with the same accuracy, all four PCA models perform almost identically. Systematic elevation dependencies are observed in the weights of the PCs (PCWs). The physical meaning of each PC and the elevation dependencies of the PCWs are explained in detail. In addition, the contribution of each PC to vertical perception or front-back discrimination is clarified. The PCs obtained in this thesis capture both inter-elevation and inter-subject variation, and the degree to which each PC contributes to each of these variations is also investigated. A numerical error analysis and a series of subjective listening tests verify that the PCs obtained from PCA of the CIPIC HRTF database can serve as general basis functions for modeling an arbitrary subject’s HRTFs or HRIRs.
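As a rough illustration of the PCA modeling idea described above, the following sketch extracts basis functions and weights from a matrix of impulse responses and reconstructs the responses from a few PCs. It is not the thesis implementation: the data are synthetic random responses standing in for the CIPIC database, and the function names `pca_basis` and `reconstruct`, the array shapes, and the choice of five PCs are all assumptions made for the example.

```python
import numpy as np

def pca_basis(hrirs, n_pcs):
    """PCA of responses stacked as rows: hrirs has shape (n_responses, n_taps).

    Returns the mean response, the first n_pcs basis functions (as rows),
    the weights of each response on those basis functions (the PCWs),
    and the fraction of variance the kept PCs explain.
    """
    mean = hrirs.mean(axis=0)
    centered = hrirs - mean
    # Rows of vt are the principal components of the centered data.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_pcs]
    weights = centered @ basis.T
    explained = (s[:n_pcs] ** 2).sum() / (s ** 2).sum()
    return mean, basis, weights, explained

def reconstruct(mean, basis, weights):
    """Rebuild responses from the mean plus a weighted sum of PCs."""
    return mean + weights @ basis

# Synthetic stand-in data (assumption): 200 responses of 64 taps that lie
# close to a 5-dimensional subspace, plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 5))
mixing = rng.normal(size=(5, 64))
hrirs = latent @ mixing + 0.01 * rng.normal(size=(200, 64))

mean, basis, weights, explained = pca_basis(hrirs, n_pcs=5)
approx = reconstruct(mean, basis, weights)
rel_err = np.linalg.norm(hrirs - approx) / np.linalg.norm(hrirs - mean)
print(f"explained variance: {explained:.4f}, relative error: {rel_err:.4f}")
```

With real measurements, each row would be one HRIR (one subject, one direction, one ear), and the number of PCs would be chosen from the explained-variance curve rather than fixed in advance.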
This thesis also addresses HRIR customization for synthesizing stationary sounds, in which a subject tunes the weight on each PC at each static source position. However, tuning many PCWs is a very exhausting and time-consuming task, so the number of PCWs to be tuned is reduced according to the magnitude of the inter-subject variation in each PCW at each elevation. To verify the feasibility of the proposed method, the customization is carried out by three subjects. At each elevation, only three PCWs are tuned by the subjects, and the remaining PCWs are set to the mean values over all subjects in the CIPIC HRTF database. The subjective listening tests show no statistically significant difference in localization error between the individual and customized HRIRs, whereas a statistically significant difference is observed between the individual and KEMAR HRIRs.
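The selection step described above can be sketched as follows, under stated assumptions: at one elevation, the PCs whose weights vary most across subjects are the ones left for the listener to tune, and every other weight is fixed to the across-subject mean. The standard deviation is used here as the measure of inter-subject variation, which is an assumption; the function names, the synthetic weights, and the tuned values are likewise hypothetical.

```python
import numpy as np

def select_tunable(pcws, n_tune=3):
    """Indices of the n_tune PCs with the largest inter-subject spread.

    pcws has shape (n_subjects, n_pcs) and holds the weights of all
    subjects at a single elevation.
    """
    spread = pcws.std(axis=0)
    return np.argsort(spread)[::-1][:n_tune]

def customized_weights(pcws, tuned_values, n_tune=3):
    """Mean weights across subjects, with the selected PCs overridden
    by the values a listener chose during tuning."""
    idx = select_tunable(pcws, n_tune)
    w = pcws.mean(axis=0).copy()
    w[idx] = tuned_values
    return w, idx

# Synthetic weights (assumption): 45 subjects, 8 PCs, where PCs 0, 2 and 5
# are given a much larger spread across subjects than the others.
rng = np.random.default_rng(1)
scales = np.array([10.0, 1.0, 8.0, 1.0, 1.0, 6.0, 1.0, 1.0])
pcws = rng.normal(size=(45, 8)) * scales

tuned = np.array([1.0, -2.0, 0.5])  # hypothetical values set by a listener
w, idx = customized_weights(pcws, tuned)
print("tuned PC indices:", sorted(idx.tolist()))
```

The resulting weight vector would then be combined with the basis functions and the mean response to form the customized HRIR at that elevation.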
A simple but effective HRIR interpolation method based on the general basis functions is proposed. The PCW of each subject can be decomposed into an elevation dependency common to all subjects and an inter-subject variation. In this thesis, only the inter-subject variation in the PCW is modeled as a simple linear function of elevation. A quantitative error analysis shows that the proposed method interpolates HRIRs more accurately than the conventional linear and spline methods, and that the improvement is more prominent for larger angular spans, i.e. lower spatial resolution.
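One way to read the decomposition above is sketched below for a single PC. The common elevation dependency is assumed to be known on a dense elevation grid (for example, as the across-subject mean in a database), a subject's weights are measured only at a few elevations, and only the subject's deviation from the common curve is interpolated linearly. This is an interpretation of the abstract, not the thesis algorithm; the function name, the sine-shaped common curve, and the elevation grids are assumptions.

```python
import numpy as np

def interpolate_pcw(elev_dense, common, elev_sparse, w_sparse, elev_target):
    """Estimate one subject's PCW at elev_target.

    common:   common elevation dependency sampled on elev_dense
    w_sparse: the subject's measured weights at elev_sparse
    Only the deviation from the common curve is interpolated linearly.
    """
    common_at_sparse = np.interp(elev_sparse, elev_dense, common)
    deviation = w_sparse - common_at_sparse
    dev_target = np.interp(elev_target, elev_sparse, deviation)
    return np.interp(elev_target, elev_dense, common) + dev_target

# Synthetic example (assumption): a curved common dependency plus a
# deviation for this subject that is exactly linear in elevation.
elev_dense = np.linspace(0.0, 180.0, 33)          # 5.625 degree steps
common = 3.0 * np.sin(np.deg2rad(elev_dense))
w_true = common + (0.5 + 0.01 * elev_dense)

elev_sparse = elev_dense[::8]                     # 0, 45, 90, 135, 180
w_sparse = w_true[::8]

w_proposed = interpolate_pcw(elev_dense, common, elev_sparse, w_sparse, elev_dense)
w_linear = np.interp(elev_dense, elev_sparse, w_sparse)  # plain linear baseline

err_proposed = np.abs(w_proposed - w_true).max()
err_linear = np.abs(w_linear - w_true).max()
print(f"max error, decomposed: {err_proposed:.2e}, plain linear: {err_linear:.2e}")
```

In this constructed case the decomposed estimate is exact because the deviation was made linear by design; with measured data the benefit depends on how well the linear model fits the real inter-subject variation.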
A novel HRIR customization method for synthesizing both stationary and moving sounds is proposed, based on subjective tuning of the inter-subject variations. All median-plane HRIRs in the upper hemisphere can be customized by subjectively tuning three parameters at each of three static positions: 0°, 70°, and 180° of elevation. In other words, the customized median-plane HRIRs are obtained by tuning just nine parameters. A series of subjective listening tests validates that the proposed method provides effective sound cues for synthesizing both stationary and moving sounds, and that localization performance with the customized HRIRs is significantly better than with the non-individual (KEMAR) HRIRs.