University of Surrey


Separation of sound sources : a machine audition perspective.

Litwic, Lukasz (2015) Separation of sound sources : a machine audition perspective. Doctoral thesis, University of Surrey.

thesis_Source_Separation_Systems_Litwic.pdf - Thesis (version of record)
Available under License Creative Commons Attribution Non-commercial Share Alike.

Download (6MB) | Preview
Author_Deposit_Agreement_Litwic.pdf - Thesis (version of record)
Available under License Creative Commons Attribution Non-commercial Share Alike.

Download (165kB) | Preview


Speech separation by machines has been extensively studied for many decades, and several algorithms and systems have been proposed. Since the speech separation task for machines is often likened to the speech separation task performed (remarkably well) by the human auditory system, several analogies with it can be found in the proposed systems. This thesis takes a localised view of a few aspects of the speech separation task and explores some of these analogies from a machine audition perspective.

The first part of the thesis presents algorithms for binaural localisation and separation of speech sources based solely on analysis of the Interaural Phase Difference (IPD) cue. The IPD cue encodes the time delay between two microphones, which can be used to establish the spatial locations of the sources in the mixture. One well-known problem with processing the IPD cue is its periodic nature: a single IPD value can represent several spatial locations of the corresponding source. This phase ambiguity problem has been studied for human auditory processing as well as for machines, but mostly from a source localisation perspective. Relatively little attention has been given to the phase ambiguity that arises from the interaction of the IPDs between the sources present in the mixture. The investigations presented in the thesis explore the use of IPDs by machines for robust source localisation and separation. Firstly, an algorithm for source localisation is introduced. The algorithm combines a Maximum Likelihood Sample Consensus (MLESAC) based search for line patterns, which correspond to speech sources, with a Cross-phasogram representation of the IPDs on which the search is performed. Next, a study of the impact of phase ambiguity on separation performance is presented. A source separation algorithm called Localisation based Mask for Source Separation (LOCUS) is introduced. The LOCUS algorithm models the IPDs using a Gaussian Mixture Model (GMM). The analysis of the IPD interaction between different sources is shown to improve the initialisation of the GMM and, in consequence, to provide performance gains over state-of-the-art binaural separation methods.

The second part of the thesis focuses on using the harmonicity cue for speech separation. Harmonicity is a feature of voiced speech and therefore intuitively seems a powerful cue that could enhance the separation of speech sources. However, in a multi-speaker scenario the segregation of harmonic components is not trivial, as it relies heavily on the underlying multi-source pitch determination algorithm. The proposed system uses an approach in which the speech sources are first reconstructed using the LOCUS algorithm and then fed into a single-source pitch determination algorithm. This makes it possible to use well-established single-source pitch determination algorithms, which are known for the robustness and accuracy of the pitch trajectories they provide. Based on this approach, the Pitch based Harmonicity Mask for Source Separation (PRIMUS) algorithm is introduced. The approach is analogous to other separation systems found in the literature; however, there has been little formal validation of some of the algorithmic choices that such an approach requires. Therefore, a detailed review is presented, followed by experimental studies of all the stages of the algorithm, from the reconstruction of speech sources to the calculation of the corresponding separation masks. The final evaluation covers the PRIMUS and the JANUS (Joint Localisation and Harmonicity Mask for Source Separation) algorithms, where the JANUS algorithm computes a set of joint separation masks combining the outputs of the LOCUS and the PRIMUS algorithms. The experimental results show improvements in separation performance over state-of-the-art binaural separation methods.
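
The phase ambiguity mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the thesis: the microphone spacing, the speed of sound, the free-field delay model and the function names `wrap_phase`, `ipd` and `candidate_delays` are all assumptions made for illustration. The sketch computes the wrapped IPD produced by a given inter-microphone delay, then lists every delay within the physically possible range that would produce the same wrapped IPD.

```python
import math

# Assumed values for illustration only (not from the thesis).
SPEED_OF_SOUND = 343.0   # metres per second
MIC_SPACING = 0.15       # metres between the two microphones
MAX_DELAY = MIC_SPACING / SPEED_OF_SOUND  # largest possible delay, seconds


def wrap_phase(phi):
    """Wrap an angle in radians to the interval (-pi, pi]."""
    return math.atan2(math.sin(phi), math.cos(phi))


def ipd(delay_s, freq_hz):
    """Wrapped phase difference produced by a pure delay at one frequency."""
    return wrap_phase(2.0 * math.pi * freq_hz * delay_s)


def candidate_delays(ipd_rad, freq_hz, max_delay_s=MAX_DELAY):
    """All delays in [-max_delay_s, max_delay_s] consistent with a wrapped IPD.

    Adding any whole number of periods to the delay leaves the wrapped
    phase unchanged, so every such shift that stays in range is a candidate.
    """
    period = 1.0 / freq_hz
    base = ipd_rad / (2.0 * math.pi * freq_hz)
    k_min = math.ceil((-max_delay_s - base) / period)
    k_max = math.floor((max_delay_s - base) / period)
    return [base + k * period for k in range(k_min, k_max + 1)]


if __name__ == "__main__":
    true_delay = 3e-4  # seconds, hypothetical source delay
    for freq in (500.0, 4000.0):
        delays = candidate_delays(ipd(true_delay, freq), freq)
        print(freq, [round(d * 1e3, 3) for d in delays])  # delays in ms
```

Under these assumed values, the low-frequency IPD maps to a single delay, while the high-frequency IPD is consistent with several delays, which is the ambiguity a localisation or separation method has to resolve across frequencies.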

Item Type: Thesis (Doctoral)
Divisions : Theses
Authors :
Date : 30 October 2015
Funders : N/A
Contributors :
Depositing User : Lukasz Litwic
Date Deposited : 09 Nov 2015 09:46
Last Modified : 31 Oct 2017 17:44





© The University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom.
+44 (0)1483 300800