Recently I attended a series of six talks each by Jiri Navratil of the IBM Thomas J. Watson Research Center and Frédéric Bimbot of IRISA. Both are among the best researchers I have met: very helpful and extremely humble. In one of Jiri’s opening talks he mentioned an adage from the speech, language and speaker recognition literature that interested me. It applies not only to speech processing research but to recognition problems in general.
[Jiri Navratil speaking. Photo taken by me with his permission]
It went like this:
It is easier to reject imposters than it is to accept true speakers.
People’s voices are distinctive. That is, a person’s speech exhibits distinctive characteristics that indicate the identity of the speaker. We are all familiar with this and we all use it in our everyday lives to help us interact with others. Of course, from time to time we might notice that a person sounds very much like another person we know. Or we might even momentarily mistake one person for another because of the sound of the person’s voice. But this similarity between the voices of different individuals is not what the technical challenge in speaker recognition is all about.
The challenge in speaker recognition is variance, not similarity. That is, the challenge is to decode a highly variable speech signal into the characteristics that indicate the speaker’s identity. These variations are formidable and myriad. The principal cause of variance is the speaker.
An explanation for why the speaker’s variability is such a vexing problem is that the use of speech, unlike fingerprints or handprints or retinal patterns, is to a very large degree a result of what the person “does” rather than “who the person is”. Speech is a “performing art and each performance is unique”.
The passages above are excerpts from G. R. Doddington et al., “The NIST speaker recognition evaluation: Overview, methodology, systems, results, perspective,” Speech Communication, vol. 31, pp. 225–254, 2000.
Over the course of their talks, Dr Navratil spoke mainly on Acoustics and Phonotactics in Language Identification, while Dr Bimbot covered Gaussian Mixture Models and Universal Background Models.
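For readers new to Dr Bimbot’s topic, here is a minimal Python sketch of how a GMM-UBM speaker verification system is commonly described in textbooks. It is not taken from either talk. It assumes the usual formulation: a test utterance is scored by the average per-frame log-likelihood ratio between a Gaussian mixture model of the claimed speaker and a universal background model (UBM), and the identity claim is accepted if that score exceeds a threshold. The names `DiagGMM`, `llr_score` and `verify`, the toy one-dimensional features and the threshold of 0 are all hypothetical. The speaker model is also set by hand here, whereas real systems typically adapt it from the UBM.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


class DiagGMM:
    """Hypothetical minimal diagonal-covariance Gaussian mixture model."""

    def __init__(self, weights: List[float], means: List[Vector], variances: List[Vector]):
        self.weights = weights      # mixture weights w_k, assumed to sum to 1
        self.means = means          # per-component mean vectors mu_k
        self.variances = variances  # per-component diagonal variances var_k

    def log_likelihood(self, x: Vector) -> float:
        # log p(x) = log sum_k w_k * N(x; mu_k, diag(var_k)), via log-sum-exp for stability.
        terms = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            log_n = sum(
                -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
                for xi, m, v in zip(x, mu, var)
            )
            terms.append(math.log(w) + log_n)
        peak = max(terms)
        return peak + math.log(sum(math.exp(t - peak) for t in terms))


def llr_score(frames: Sequence[Vector], speaker: DiagGMM, ubm: DiagGMM) -> float:
    # Average per-frame log-likelihood ratio: claimed-speaker model vs. background model.
    total = sum(speaker.log_likelihood(x) - ubm.log_likelihood(x) for x in frames)
    return total / len(frames)


def verify(frames: Sequence[Vector], speaker: DiagGMM, ubm: DiagGMM, threshold: float = 0.0) -> bool:
    # Accept the identity claim only if the score clears the (assumed) threshold.
    return llr_score(frames, speaker, ubm) > threshold


if __name__ == "__main__":
    # Toy 1-D "features"; real systems use e.g. cepstral feature vectors.
    ubm = DiagGMM([0.5, 0.5], [[-2.0], [2.0]], [[1.0], [1.0]])
    speaker = DiagGMM([1.0], [[2.0]], [[1.0]])
    print(verify([[2.0], [1.8]], speaker, ubm))    # True: frames near the claimed speaker's model
    print(verify([[-2.0], [-1.9]], speaker, ubm))  # False: frames only the background model explains
```

The threshold is where the trade-off lives. Raising it rejects more impostors but also more genuine speakers whose session-to-session variability pulls their scores down. Lowering it does the reverse.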