Learning Algorithms and Applications


Andrzej  CICHOCKI    &   Shun-ichi  AMARI






Chapter 1: Introduction to Blind Signal Processing: Problems and Applications


 Blind Signal Processing (BSP) is now one of the hottest and exciting topics in the fields of neural computation, advanced statistics, and signal processing with solid theoretical foundations and many potential applications. In fact, BSP has become a very important topic of research and development in many areas, especially biomedical engineering, medical imaging, speech enhancement, remote sensing, communication systems, exploration seismology, geophysics, econometrics, data mining, neural networks, etc.  The blind signal processing techniques principally do not use any training data and do not assume a priori knowledge about parameters of convolutive, filtering and mixing systems. BSP includes three major areas: Blind Signal Separation and Extraction (BSS/BSE), Independent Component Analysis (ICA), and Multichannel Blind Deconvolution (MBD) and Equalization which are the main subjects of the book. In this chapter are formulated fundamental problems of the BSP, given important definitions and described basic mathematical and physical models. Moreover, several potential and promising applications are reviewed. 



Keywords:  Blind Source Separation (BSS), Blind Source Extraction (BSE), Independent Component Analysis (ICA), Multichannel Blind Deconvolution (MBD), Basic definitions and models, Applications.   





Chapter 2: Solving a System of Linear Algebraic Equations and Related Problems


 In modern signal and image processing fields like biomedical engineering, computer tomography (image reconstruction from projections), automatic control, robotics, speech and communication, linear parametric estimation, models such as auto-regressive moving-average (ARMA) and linear prediction (LP) have been extensively utilized. In fact, such models can be mathematically described by an overdetermined system of linear algebraic equations. Such systems of equations are often contaminated by noise or errors, thus the problem of finding an optimal and robust with respect noise solution arises if some a priori information about the error is available. On the other hand, wide classes of extrapolation, reconstruction, estimation, approximation, interpolation and inverse problems can be converted to minimum norm problems of solving underdetermined systems of linear equations. Generally speaking, in signal processing applications, the overdetermined system of linear equations describes filtering, enhancement, deconvolution and identification problems, while the underdetermined case describes inverse and extrapolation problems. This chapter provides a tutorial to the problem of solving large overdetermined and underdetermined systems of linear equations, especially when there is an uncertainty in parameter values and/or the systems are contaminated by noise. A special emphasis is placed in on-line fast adaptive and iterative algorithms for arbitrary noise statistics. This chapter also gives several illustrative examples that demonstrate the characteristics of the developed novel algorithms.



Keywords: Least Squares (LS) problem, Extended Total Least Squares (TLS), Data Least Squares

(DLS), Least Absolute Deviation (LAD), 1-norm solution, Solving of system of linear equations with non-negativity constraints, Non-negative Matrix Factorization (NMF), Regularization, Sparse signal representation, Sparse solutions, Minimum Fuel Problem (MFP), Focuss algorithms, Amari-Hopfield recurrent neural networks for on-line solutions.  





Chapter 3: Principal/Minor Component Analysis and Related Problems 


Neural networks with unsupervised learning algorithms organize themselves in such a way that they can detect or extract useful features, regularities, correlations of data or signals or separate or decorrelate some signals with little or no prior knowledge of the desired results. Normalized (constrained) Hebbian and anti-Hebbian learning rules are simple variants of basic unsupervised learning algorithms; in particular, learning algorithms for principal component analysis (PCA), singular value decomposition (SVD) and minor component analysis (MCA) belong to this class of unsupervised rules.  Recently, many efficient and powerful adaptive algorithms have been developed for PCA, MCA and SVD and their extensions The main objective of this chapter is a derivation and overview of the most important adaptive algorithms.  


Keywords: PCA, MCA, SVD, Subspace methods, Automatic dimensionality reduction, AIC and MDL criteria, Power method, Robust PCA, Multistage PCA for blind source separation.   





Chapter 4: Blind Decorrelation and Second Order Statistics for Robust Blind Identification


 Temporal, spatial and spatio-temporal decorrelations play important roles in signal processing. These techniques are based only on second-order statistics (SOS). They are the basis for modern subspace methods of spectrum analysis and array processing and often used in a preprocessing stage in order to improve convergence properties of adaptive systems, to eliminate redundancy or to reduce noise. Spatial decorrelation or prewhitening is often considered as a necessary (but not sufficient) condition for the stronger stochastic independence criteria. After prewhitening, the BSS or ICA tasks usually become somewhat easier and well-posed (less ill-conditioned), because the subsequent separating (unmixing) system is described by an orthogonal matrix for real-valued signals and a unitary matrix for complex-valued signals and weights. Furthermore, spatio-temporal and time-delayed decorrelation can be used to identify the mixing matrix and perform blind source separation of colored sources. In this chapter, we discuss and analyze a number of efficient and robust adaptive and batch algorithms for spatial whitening, orthogonalization, spatio-temporal and time-delayed blind decorrelation. Moreover, we discuss several promising robust algorithms for blind identification and blind source separation of non-stationary and/or colored sources. 



Keywords: Robust whitening, Robust orthogonalization, Gram-Schmidt orthogonalization,, Second order statistics (SOS) blind identification, Multistage EVD/SVD for BSS, Simultaneous diagonalization, Joint approximative diagonalization, SOBI and JADE algorithms, Blind source separation for non-stationary signals, Natural gradient, Atick-Redlich formula, Gradient descent with Frobenius norm constraint.





  Chapter 5: Sequential Blind Signal Extraction  


There are three main objectives of this chapter: 

(a) To present  simple neural networks (processing units) and propose unconstrained extraction and deflation criteria  that do not require either  a priori knowledge of source signals or the whitening of mixed signals. These criteria lead to simple, efficient, purely local and biologically plausible learning rules (e.g., Hebbian/anti-Hebbian type learning algorithms).

(b) To prove that the proposed criteria have no spurious equilibriums. In other words, the most learning rules discussed in this chapter always reach desired solutions, regardless of initial conditions (see appendixes for proof).

(c) To demonstrate with computer simulations the validity and high performance for practical use of the derived learning algorithms.

 In this chapter there are used two different models and approaches. The first approach is based on higher order statistics (HOS) which assume that sources are mutually statistically independent and they are non-Gaussian (expect at most one) and as criteria of independence, we will use some measures of non-Gaussianity. The second approach based on the second order statistics (SOS) assumes that source signals have some temporal structure, i.e., the sources are colored with different autocorrelation functions or equivalently different shape spectra. Special emphasis will be given to blind source extraction (BSE) in the case when sensor signals are corrupted by additive noise using the bank of band pass filters.



 Keywords: Basic criteria for blind source extraction, Kurtosis, Gray function, Cascade neural network, Deflation procedures, KuickNet, Fixed-point algorithms, Blind extraction with reference signal, Linear predictor and band-pass filters for BSS, Statistical analysis, Log likelihood, Extraction of sources from convolutive mixture, Stability, Global convergence.  





 Chapter 6: Natural Gradient Approach to Independent Component Analysis


 In this chapter, fundamental signal processing and information theoretic approaches are presented together with learning algorithms for the problem of adaptive blind source separation (BSS) and Independent Component Analysis (ICA). We discuss recent developments of adaptive learning algorithms based on the natural gradient approach in the general linear, orthogonal and Stiefel manifolds. Mutual information, Kullback-Leibler divergence, and several promising schemes are discussed and reviewed in this chapter, especially for signals with various unknown distributions and unknown number of sources. Emphasis is given to an information-theoretical and information-geometrical unifying approach, adaptive filtering models and associated on-line adaptive nonlinear learning algorithms. We discuss the optimal choice of nonlinear activation functions for various distributions, e.g., Gaussian, Laplacian, impulsive and uniformly-distributed signals based on a generalized-Gaussian-distributed model. Furthermore, families of efficient and flexible algorithms that exploit non-stationarity of signals are also derived.  



Keywords: Kullback-Leibler divergence, Natural gradient concept, Derivation and analysis of

 natural gradient algorithms, Local stability analysis, Nonholonomic constraints, Generalized Gaussian and Cauchy distributions, Pearson model. Natural gradient algorithms for non-stationary sources. Extraction of arbitrary group of sources, Semi-orthogonality constraints, Stiefel manifolds.   





Chapter 7: Locally Adaptive Algorithms for ICA and their Implementations 


The main purpose of this chapter is to describe and overview models and to present a family of practical and efficient associated adaptive or locally adaptive learning algorithms which have special advantages of efficiency and/or simplicity and straightforward electronic implementations. Some of the described algorithms have special advantages in the cases of noisy, badly scaled or ill-conditioned signals. The developed algorithms are extended for the case when the number of sources and their statistics are unknown.  Finally, problem of an optimal choice of nonlinear activation function and general local stability conditions are also discussed. In particular, we focus on simple locally adaptive Hebbian/anti-Hebbian learning algorithms and their implementations using multi-layer neural networks are proposed.



 Keywords: Modified Jutten-Herault algorithm, robust local algorithms for ICA/BSS, Multi-layer network for ICA, Flexible ICA for unknown number of sources, Generalized EASI algorithms, and Generalized stability conditions. 





 Chapter 8: Robust Techniques for BSS and ICA with Noisy Data


 In this chapter we focus mainly on approaches to blind separation of sources when the measured signals are contaminated by large additive noise. We extend existing adaptive algorithms with equivariant properties in order to considerably reduce the bias caused by measurement noise for the estimation of mixing and separating matrices. Moreover, we propose dynamical recurrent neural networks for simultaneous estimation of the unknown mixing matrix, source signals and reduction of noise in the extracted output signals. The optimal choice of nonlinear activation functions for various noise distributions assuming a generalized-Gaussian-distributed noise model is also discussed. Computer simulations of selected techniques are provided that confirms their usefulness and good performance.  The main objective of this chapter is to present several approaches and derive learning algorithms that are more robust with respect to noise than the techniques described in the previous chapters or that can reduce the noise in the estimated output vector of independent components




  Keywords: Bias removal techniques, Wiener filters with references convolutive noise, Noise cancellation and reduction, Cumulants based cost functions and equivariant algorithms, Blind source separation with more sensors than sources, Robust extraction of arbitrary group of sources, Recurrent neural network for noisy data, Amari-Hopfield neural network.




 Chapter 9: Multichannel Blind Deconvolution:  Natural Gradient Approach


 The main objective of this chapter is to review and extend existing adaptive natural gradient algorithms for various multichannel blind deconvolution models.  Blind separation/deconvolution of source signals has been a subject under consideration for more than two decades. There are significant potential applications of blind separation/deconvolution in various fields, for example, wireless telecommunication systems, sonar and radar systems, audio and acoustics, image enhancement and biomedical signal processing (EEG/MEG signals). In these applications, single or multiple unknown but independent temporal signals propagate through a mixing and filtering medium. The blind source separation/deconvolution problem is concerned with recovering independent sources from sensor outputs without assuming any a priori knowledge of the original signals, except certain statistical features.  In this chapter, we present using various models and assumptions, relatively simple and efficient, adaptive and batch algorithms for blind deconvolution and equalization for single-input/multiple-output (SIMO) and multiple-input/multiple-output (MIMO) dynamical minimum phase and non-minimum phase systems. The basic relationships between standard ICA/BSS (Independent Component Analysis and Blind Source Separation) and multichannel blind deconvolution are discussed in detail.  They enable us to extend algorithms derived in the previous chapters. In particular, the natural gradient approaches for instantaneous mixture to convolutive dynamical models. We also derive a family of equivariant algorithms and analyze their stability and convergence properties. Furthermore, a Lie group and Riemannian metric are introduced on the manifold of FIR filters and using the isometry of the Riemannian metric, the natural gradient on the FIR manifold is described. Based on the minimization of mutual information, we present then a natural gradient algorithm for the causal minimum phase finite impulse response (FIR) multichannel filter. Using information back-propagation, we also discuss an efficient implementation of the learning algorithm for the non-causal FIR filters. Computer simulations are also presented to illustrate the validity and good learning performance of the described algorithms.



 Keywords: Basic models for blind equalization and multichannel deconvolution,  Fractionally Sampled system, SIMO and MIMO models, Equalization criteria, Separation-deconvolution criteria, Relationships between BSS/ICA and multichannel blind deconvolution (MBD), Natural gradient algorithms for MBD, Information Back-propagation. 





 Chapter 10: Estimating Functions and Superefficiency for ICA and Deconvolution


 Chapter 10 introduces the method of estimating functions to elucidate the common structures in most of the ICA/BSS and MBD algorithms. We use information geometry for this purpose, and define estimating functions in semiparametric statistical models which include unknown functions as parameters. Differences in most existing algorithms are only in the choices of estimating functions. We then give error analysis and stability analysis in terms of estimating functions.  This makes it possible to design various adaptive methods for choosing unknown parameters included in estimating functions, which control accuracy and stability. The Newton method is automatically derived by the standardized estimating functions. First the standard BSS/ICA problem is formulated in the framework of the semiparametric model and a family of estimating functions. Furthermore, the present chapter will discuss and extend further convergence and efficiency of the batch estimator  and natural gradient learning for blind separation/deconvolution via the  semiparametric statistical model and estimating functions and standardized estimating functions derived by using  efficient score functions elucidated  recently by Amari et al. We present the geometrical properties of the manifold of the FIR filters based on the Lie group structure and formulate the multichannel blind deconvolution problem within the framework of the semiparametric model deriving a family of estimating functions for blind deconvolution. We then analyze the efficiency of the batch estimator based on estimating function - obtaining its convergence rate. Finally, we show that both batch learning and on-line natural gradient learning are superefficient under given nonsingular conditions.



 Keywords: Estimating functions, Semiparametric statistical models, Superefficiency, Likelihood, Score functions, Batch estimator, Information geometry, Stability analysis.





  Chapter 11: Blind Filtering and Separation Using a State-Space Approach


 The state-space description of dynamical systems is a powerful and flexible generalized model for blind separation and deconvolution or more generally for filtering and separation. There are several reasons why the state-space models are advantageous for blind separation and filtering. Although transfer function models in the Z-domain or the frequency domain are equivalent to the state-space models in the time domain for any linear, stable time-invariant dynamical system, using transfer function directly it is difficult to exploit internal representation of real dynamical systems. The main advantage of the state-space description is that it not only gives the internal description of a system, but there are various equivalent canonical types of state-space realizations for a system, such as balanced realization and observable canonical forms. In particular, it is possible to parameterize some specific classes of models which are of interest in applications. In addition, it is relatively easy to tackle the stability problem of state-space systems using the Kalman filter. Moreover, the state-space model enables a much more general description than the standard finite impulse response (FIR) convolutive filtering models discussed in the Chapter 9. In fact, all the known filtering models, such as the AR, MA, ARMA, ARMAX and Gamma filtering, could also be considered as special cases of flexible state-space models. In this chapter, we briefly review adaptive learning algorithms based on the natural gradient approach and give some perspective and new insight into multiple-input multiple-output blind separation and filtering in the state-space framework. 



 Keywords: Linear basic state space model, Natural gradient algorithm for state space model, Estimation of output and state space matrices, Comparison of various algorithms, Kalman filter, Two stage blind separation/filtering approach. 





  Chapter 12: Nonlinear State Space Models - Semi-Blind Signal Processing


 In this chapter we attempt to extend and generalize the results discussed in the previous chapters to nonlinear dynamical models. However, the problem is not only very challenging but intractable in the general case without   a priori knowledge about the mixing and filtering nonlinear process. Therefore, in this chapter we consider very briefly only some simplified nonlinear models. In addition, we assume that some information about the mixing and separating system and source signals is available.  In practice, special nonlinear dynamical models are considered in order to simplify the problem and solve it efficiently for specific applications. Specific examples include the Wiener model, the Hammerstein model and Nonlinear Autoregressive Moving Average models.



  Keywords: Semi-blind separation and filtering, Wiener and Hammerstein models, Nonlinear Autoregressive Moving Average (NARMA) model, Hyper radial basis function (HRBF) neural network.





  Appendix A: Mathematical Preliminaries



 In this appendix some mathematical background needed for complete understanding of the text are quickly reviewed. Many useful definitions, formulas for matrix algebra and matrix differentiation are given


Keywords:   Matrix inverse update rules, Matrix differentiation, Differentiations of scalar cost function with respect to a vector, Trace, Matrix differentiation of trace of matrices, Matrix expectation, Properties of determinant, Moore-Penrose pseudo inverse, Discrimination measures, Distance measures .




 Appendix B:  Glossary of Symbols and Abbreviations


Appendix B contains the list of basic symbols, notation and abbreviations used in the book.






The list of references contains more than 1350 publications.






Accompanying CD-ROM includes electronic, interactive version of the book with hyperlinks, full-color figures and text. The black and white electronic version with hyperlinks is also provided.

 In addition MATLAB user friendly demo package for performing ICA and BSS/BSE is included