
Dr Izzet Burak Yildiz



Learning and Recognizing Human Speech

joint work with Stefan Kiebel
Even the most sophisticated speech recognition algorithms today fall short of human performance on tasks such as recognition in noisy environments, adaptation to speed fluctuations, and generalization to different speakers and accents. It is therefore important to understand the mechanisms of human speech recognition and to develop neurobiologically plausible models that can replicate some of these capabilities. In this project, we use dynamical systems theory and a Bayesian inference scheme to propose a hierarchical model of speech learning and recognition. This hierarchical approach is inspired by the song recognition system of birds, which shares anatomical and functional similarities with the human speech recognition system. We expose the model to a benchmark speech recognition task and show that it can learn words rapidly and recognize them accurately and robustly, even under adverse conditions and across different speakers.
Furthermore, we used the model to provide computational explanations of what may cause differences between people in (i) adapting to new accents, and (ii) learning a second language. Intuitively, the computational results show that success in learning a second language critically depends on being able to expect deviations from the already learned internal dynamics (i.e. the representation of the first language).
In summary, inspired by the song recognition mechanism of birds, we propose a hierarchical Bayesian model for learning and online recognition of human speech. The model is shown to be robust under adverse conditions and it has the potential to explain some of the neural mechanisms underlying the behavioral results of accent adaptation and second language learning.

Re-Visiting the Echo State Property

joint work with Herbert Jaeger and Stefan Kiebel
An echo state network (ESN) consists of a large, randomly connected neural network, the reservoir, which is driven by an input signal and projects to output units. During training, only the connections from the reservoir to these output units are learned. A key prerequisite for output-only training is the echo state property (ESP), which means that the effect of initial conditions should vanish as time passes. In this project, we use analytical examples to show that a widely used criterion for the ESP, the spectral radius of the weight matrix being smaller than unity, is not sufficient to guarantee the echo state property. We obtain these examples by investigating local bifurcation properties of standard ESNs. Moreover, we provide new sufficient conditions for the echo state property of standard sigmoid and leaky integrator ESNs. We furthermore suggest an improved technical definition of the echo state property, and discuss what practitioners should (and should not) observe when they optimize their reservoirs for specific tasks.
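The washout of initial conditions described above can be illustrated with a minimal sketch. It assumes a tanh reservoir with update x(t+1) = tanh(W x(t) + W_in u(t)) and scales W so that its largest singular value is below one, which is the classical sufficient condition for the ESP and not one of the new conditions derived in this project. The function names, reservoir size, input distribution and scaling constant are all hypothetical choices made for illustration.

```python
import numpy as np


def make_reservoir(n_units=50, sigma_max=0.9, seed=0):
    """Random reservoir whose largest singular value equals sigma_max (assumed setup)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n_units, n_units))
    # np.linalg.norm(W, 2) is the spectral norm, i.e. the largest singular value.
    W *= sigma_max / np.linalg.norm(W, 2)
    W_in = rng.uniform(-1.0, 1.0, size=n_units)
    return W, W_in


def run(W, W_in, x0, inputs):
    """Drive the reservoir with a scalar input sequence and return the final state."""
    x = x0.copy()
    for u in inputs:
        x = np.tanh(W @ x + W_in * u)
    return x


def state_gap(n_steps=200, seed=0):
    """Distance between two trajectories that share inputs but start from different states."""
    rng = np.random.default_rng(seed + 1)
    W, W_in = make_reservoir(seed=seed)
    inputs = rng.uniform(-1.0, 1.0, size=n_steps)
    xa = rng.uniform(-1.0, 1.0, size=W.shape[0])
    xb = rng.uniform(-1.0, 1.0, size=W.shape[0])
    gap_start = np.linalg.norm(xa - xb)
    gap_end = np.linalg.norm(run(W, W_in, xa, inputs) - run(W, W_in, xb, inputs))
    return gap_start, gap_end


if __name__ == "__main__":
    start, end = state_gap()
    print(f"initial gap: {start:.3e}, gap after 200 steps: {end:.3e}")
```

Because tanh is 1-Lipschitz, each step shrinks the gap between the two trajectories by at least the factor sigma_max, so the reservoir forgets its initial state. Scaling by the spectral radius alone gives no such contraction bound, which is the gap between common practice and a guarantee that the paragraph above points to.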

Generation and Recognition of Birdsongs

joint work with Stefan Kiebel
How do birds sing acoustically complex and elegant melodies? This question has kept researchers busy for decades, and we still lack a complete understanding of the mechanisms underlying birdsong. Perhaps a more intriguing question is: how do birds perceive a song produced by another bird? Birdsongs carry important information that allows females to choose their mates and males to repel other males from their territories. Listeners should therefore be able to identify even fine acoustic structure, as it can reveal the skills and strength of the singing bird. In this project, we use dynamical systems theory to propose a biologically plausible, hierarchical model for birdsong generation and introduce a novel Bayesian inversion scheme for online recognition of birdsongs. This framework gives valuable insight into communication between birds of the same species; moreover, it can be used to derive novel machine learning applications such as speech recognition.
For an introductory blog entry for this project, please visit: blog
The results of this project have been published in:
Yildiz IB, Kiebel SJ (2011) A Hierarchical Neuronal Model for Generation and Online Recognition of Birdsongs. PLoS Comput Biol 7(12): e1002303.

Stephanstraße 1A
04103 Leipzig


Last update: Jun 2, 2012 9.13.10 pm
Copyright © 2012 Max Planck Institute for Human Cognitive and Brain Sciences