Stochastic Systems Group

SSG Seminar Abstract

Itakura-Saito nonnegative matrix factorization and friends for music signal decomposition

Cedric Fevotte

Over the last 10 years, nonnegative matrix factorization (NMF) has become a popular unsupervised dictionary learning/adaptive data decomposition technique with applications in many fields. In particular, much research on this topic has been driven by applications in audio, where NMF has been applied with success to automatic music transcription and single-channel source separation. In this setting the nonnegative data is formed by the magnitude or power spectrogram of the sound signal and is decomposed as the product of a dictionary matrix, whose columns are elementary spectra representative of the data, and an activation matrix, which contains the expansion coefficients of the data frames in the dictionary.
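The setting described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not material from the talk: the names `power_spectrogram`, `V`, `W`, `H` and `K`, the Hann window, and the frame and hop sizes are all hypothetical choices, and the input is random noise standing in for a sound signal.

```python
import numpy as np

def power_spectrogram(signal, frame_len=256, hop=128):
    """Return an F x N nonnegative matrix of squared STFT magnitudes.

    Hypothetical helper: window type, frame length and hop are assumptions.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # One windowed frame per column.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)],
        axis=1,
    )
    spectrum = np.fft.rfft(frames, axis=0)  # shape (frame_len // 2 + 1, n_frames)
    return np.abs(spectrum) ** 2

rng = np.random.default_rng(0)
signal = rng.standard_normal(4096)   # stand-in for an audio signal
V = power_spectrogram(signal)        # nonnegative data, F x N

K = 4                                # number of dictionary elements (assumed)
W = rng.random((V.shape[0], K))      # dictionary: one elementary spectrum per column
H = rng.random((K, V.shape[1]))      # activations: one row per dictionary element
V_hat = W @ H                        # approximation with the same shape as V
```

Here `W` and `H` are random placeholders; an NMF algorithm would adjust them so that `V_hat` fits `V` under a chosen measure of fit.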

After a general overview of NMF and a focus on majorization-minimization (MM) algorithms for NMF, the presentation will discuss model selection issues in the audio setting, pertaining to 1) the choice of time-frequency representation (essentially, magnitude or power spectrogram), and 2) the measure of fit used for the computation of the factorization. We will give arguments in support of factorizing the power spectrogram with the Itakura-Saito (IS) divergence. In particular, IS-NMF is shown to be connected to maximum likelihood estimation of variance parameters in a well-defined statistical model of superimposed Gaussian components, and this model is in turn shown to be well suited to audio.
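As a rough illustration of IS-NMF with an MM-style algorithm, the sketch below uses the Itakura-Saito divergence d(x|y) = x/y - log(x/y) - 1 summed over all entries, together with multiplicative updates carrying a square-root exponent, which is one commonly cited MM form for this divergence. It is a minimal sketch under those assumptions and may differ from the algorithms presented in the talk; the function names `is_divergence` and `is_nmf`, the random initialization, and the synthetic test data are all hypothetical.

```python
import numpy as np

def is_divergence(V, V_hat):
    """Itakura-Saito divergence between positive matrices, summed over entries."""
    R = V / V_hat
    return float(np.sum(R - np.log(R) - 1.0))

def is_nmf(V, K, n_iter=100, seed=0):
    """Sketch of IS-NMF via multiplicative MM-style updates (assumed form).

    V must be strictly positive. Returns W (F x K), H (K x N) and the
    cost recorded after each iteration.
    """
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, K)) + 1e-3   # positive random initialization (assumption)
    H = rng.random((K, N)) + 1e-3
    costs = []
    for _ in range(n_iter):
        V_hat = W @ H
        # Update activations with the dictionary held fixed.
        H *= np.sqrt((W.T @ (V / V_hat**2)) / (W.T @ (1.0 / V_hat)))
        V_hat = W @ H
        # Update dictionary with the activations held fixed.
        W *= np.sqrt(((V / V_hat**2) @ H.T) / ((1.0 / V_hat) @ H.T))
        costs.append(is_divergence(V, W @ H))
    return W, H, costs

# Synthetic "power spectrogram": a low-rank variance matrix times exponential
# noise, loosely mimicking the squared magnitude of Gaussian components.
rng = np.random.default_rng(1)
W_true = rng.random((20, 3)) + 0.1
H_true = rng.random((3, 50)) + 0.1
V = (W_true @ H_true) * rng.exponential(size=(20, 50))
V = np.maximum(V, 1e-12)            # guard against exact zeros

W, H, costs = is_nmf(V, K=3, n_iter=50)
```

The recorded `costs` should not increase from one iteration to the next, which is the property that motivates the MM construction. Note also that the IS divergence is scale invariant, d(cx|cy) = d(x|y), so low-energy entries of the spectrogram count as much as high-energy ones.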

The presentation will then briefly address variants of IS-NMF, namely IS-NMF with regularization of the activation coefficients (Markov model, group sparsity), online IS-NMF, automatic relevance determination for model order selection, and multichannel IS-NMF. Audio source separation demos will be played.

Cedric Fevotte obtained the State Engineering degree and the MSc degree in Control and Computer Science from Ecole Centrale de Nantes (France) in 2000, and then the PhD degree in 2003. As a PhD student he was with the Signal Processing Group at Institut de Recherche en Communication et Cybernetique de Nantes (IRCCyN), where he worked on time-frequency approaches to blind source separation. From 2003 to 2006 he was a research associate with the Signal Processing Laboratory at the University of Cambridge (Engineering Dept), where he developed Bayesian approaches to sparse component analysis with applications to audio source separation. He was then a research engineer with the start-up company Mist-Technologies (now Audionamix) in Paris, designing mono/stereo to 5.1 surround sound upmix solutions. In Mar. 2007, he joined Telecom ParisTech, first as a research associate and then, in Nov. 2007, as a CNRS tenured research scientist. His research interests generally concern statistical signal processing and unsupervised machine learning, and in particular applications to blind source separation and music signal processing. He is the scientific leader of the TANGERINE project (Theory and applications of nonnegative matrix factorization), funded by the French research funding agency ANR.
