Frequently Asked Questions, Tips & Tricks for ICALAB

by A. Cichocki         10-Mar-2004

Q1. Which algorithm should I apply if my data have a very small number of samples?

A1. In many cases you can obtain reasonable results even if your data contain only, say, 200 samples.
First, try the TICA (ThinICA) algorithm with 100 or more time delays (click Advanced Options and change the default number of DELAYS from 10 to 100).
Good performance can also be obtained with the SOBI algorithm, but its number of time-delayed covariance matrices should likewise be increased, from the default 4 to at least 50.
Alternatively, you can use the SONS algorithm with adjusted parameters: a subwindow size of 100 and 50 time-delayed covariance matrices.
A more advanced technique is to apply suitable preprocessing to the observed data. For example, simple differentiation (first or second order) or high-pass filtering often gives surprisingly good results. Please note that such preprocessing amplifies additive noise, so it is not suitable for noisy data. An alternative approach is to preprocess the sensor data with multi-subband filters: the output signals of the consecutive band filters are unfolded, automatically building up more samples of the observed data. This technique gives good results for sparse sources only.
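The sample-multiplying effect of this unfolding can be sketched in NumPy (a simplified stand-in that splits the spectrum with FFT masks rather than ICALAB's actual filter banks):

```python
import numpy as np

def unfold_subbands(x, K):
    """Split each channel into K spectral bands (FFT masking -- an assumed
    stand-in for a real filter bank) and stack the band outputs in time,
    multiplying the number of available samples by K."""
    n = x.shape[1]
    X = np.fft.rfft(x, axis=1)
    edges = np.linspace(0, X.shape[1], K + 1).astype(int)
    bands = []
    for k in range(K):
        Xk = np.zeros_like(X)
        Xk[:, edges[k]:edges[k + 1]] = X[:, edges[k]:edges[k + 1]]
        bands.append(np.fft.irfft(Xk, n=n, axis=1))
    return np.hstack(bands)              # shape: (channels, K*n)

x = np.random.default_rng(0).standard_normal((4, 200))
print(unfold_subbands(x, 5).shape)       # -> (4, 1000)
```

Because the band filtering is linear and identical for every channel, each unfolded block still obeys x = A*s with the same mixing matrix, which is what makes the extra samples usable.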

Q2. What should I do if the algorithm is very slow for my data?

A2. Please check how many samples your data contain. With more than 20000 samples, almost all algorithms implemented in ICALAB can be relatively slow, since they are implemented as batch algorithms and the processed data matrices become huge.
It is then recommended to divide your data into several blocks (windows) of 1000-10000 samples each and perform the calculations block by block. Alternatively, you can downsample the data by taking, for example, every second or every tenth sample. Note, however, that for very long recordings the mixing matrix may change over time, so a linear model with a fixed mixing matrix is no longer valid; in such cases it is necessary to divide the data into smaller blocks, assuming the mixing matrix is time-invariant within each block.
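Both options amount to simple indexing when the data are stored as a channels-by-samples array; a NumPy sketch:

```python
import numpy as np

x = np.random.default_rng(0).standard_normal((8, 50000))  # channels x samples

# Option 1: split into non-overlapping blocks of 5000 samples,
# then run the chosen algorithm on each block separately.
blocks = [x[:, i:i + 5000] for i in range(0, x.shape[1], 5000)]

# Option 2: downsample, e.g. keep every tenth sample.
x_ds = x[:, ::10]

print(len(blocks), blocks[0].shape, x_ds.shape)  # -> 10 (8, 5000) (8, 5000)
```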
Moreover, it is recommended to choose one of the faster algorithms in the ICALAB package.

Q3. Which algorithm should I use for noisy data?

A3. If the problem is ill-conditioned (i.e., the condition number of the mixing matrix is high, say larger than 1000), the problem is challenging and it is rather difficult to estimate the mixing matrix precisely, except in special cases.
However, for a low condition number of the mixing matrix, typically less than 20, several of the implemented algorithms give quite impressive results even for an SNR as low as 0 dB.
If your data are corrupted by large additive noise, first try the SOBI-RO and SOBI algorithms with at least 100 covariance matrices. You can also try the TICA algorithm with the number of time delays increased to at least 50.

-- Why so many matrices should be jointly diagonalized?

Well, there is a current trend in ICA/BSS to investigate the “average eigen-structure” of a large set of data matrices formed as functions of the available data (typically, covariance or cumulant matrices for different time delays). In other words, the objective is to extract reliable information (e.g., estimates of the sources and/or the mixing matrix) from the eigen-structure of a possibly large set of data matrices. However, since in practice we have only a finite number of samples of signals corrupted by noise, the data matrices do not exactly share the same eigen-structure. Furthermore, determining the eigen-structure from one or even two data matrices usually leads to poor or unsatisfactory results, because such matrices, usually based on an arbitrary choice, may have degenerate eigenvalues, which leads to a loss of the information contained in the other data matrices. Therefore, in order to provide robustness and accuracy in the statistical sense, it is necessary to consider the average eigen-structure by taking a possibly large set of data matrices into account simultaneously.
Please also note that for noisy data we should avoid algorithms with prewhitening when the number of sensors is equal to the number of sources, since prewhitening amplifies the noise, especially for ill-conditioned problems.
If the number of observations is larger than the number of sources, you can use PCA or Factor Analysis as preprocessing for model reduction and denoising. Furthermore, you can apply subband filters in the preprocessing, which dramatically reduce the influence of noise.

Q4. How can I simply compare the consistency of two or more algorithms on the same observed data?

- In other words, how can I quickly check whether two or more algorithms estimate exactly the same sources, especially when the number of sources is large, say more than 20?

A4. Run the first algorithm and save the estimated separating matrix as W1 = W. Next, run the second algorithm and save the estimated separating matrix as W2 = W, and then compute the global matrix G12 = W1*pinv(W2) (or W2*pinv(W1)), where pinv denotes the pseudo-inverse. If the separating matrix W is square and non-singular, you can use the standard matrix inverse instead of the pseudo-inverse.
If G12 is a generalized permutation matrix (or, in some sense, very close to one), then both algorithms estimate the same mixing (or separating) matrix and, consequently, the same components. A generalized permutation matrix has exactly one non-zero (or, in the non-ideal case, one dominant) element in each row and each column.
Even the same algorithm may give different results depending on the initial conditions, because it can get stuck in different local minima. The above approach can be used to check the consistency of any ICA algorithm.
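The whole check can be sketched in a few lines of NumPy (ICALAB itself is a MATLAB toolbox, so this is an outside-the-toolbox illustration; the 0.9 dominance threshold is an arbitrary choice):

```python
import numpy as np

def is_generalized_permutation(G, dominance=0.9):
    """Check whether each row and column of G has a single dominant element:
    the largest |entry| of every row and column must carry at least the
    `dominance` fraction of that row's/column's energy."""
    M = np.abs(G)
    row_ok = np.all(M.max(axis=1)**2 >= dominance * (M**2).sum(axis=1))
    col_ok = np.all(M.max(axis=0)**2 >= dominance * (M**2).sum(axis=0))
    return bool(row_ok and col_ok)

# Two separating matrices that differ only by scaling and permutation:
W1 = np.array([[2.0, 1.0], [1.0, -1.0]])
P = np.array([[0.0, 3.0], [1.5, 0.0]])     # scaled permutation
W2 = P @ W1
G12 = W1 @ np.linalg.pinv(W2)
print(is_generalized_permutation(G12))     # -> True: consistent results
```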

Q5. How can I evaluate the performance of blind separation of sources if a mixing matrix is completely unknown?

A5. There are several approaches to solve this problem. If the primary sources are known to be statistically independent you can apply any independence test to check whether the estimated "independent" components are really mutually independent. In a similar way, we can check whether the estimated sources are spatio-temporally decorrelated or whether the sources are sufficiently sparse.
If the sources are partially dependent, you can apply the concept of multi-resolution subband decomposition ICA or shortly MSD-ICA (please see the next questions).

Q6. How can we blindly detect and identify for which frequency sub-bands (or, more generally, for which sub-bands or multi-bands) the corresponding components are really independent? In other words, how can we check whether MSD-ICA really extracts the true source signals?

A6. For real-world problems, ICA methods alone do not allow us to determine whether we have estimated the true (real) sources. This depends on the validity of the assumed linear model and on the mutual statistical independence of the source signals.
Only under the strong assumption that the original sources are independent can we do this. However, real-world source signals are often not independent.

-- How can we find out whether we have extracted the true sources or not?

In some cases a solution is possible using the surprisingly simple concept of proper preprocessing and/or linear decomposition of the raw data into a mixture of sub-components (in the time, frequency or time-frequency domain) and running the algorithms several times.
Let us assume that the wide-band source signals are linear superpositions of several narrow-band sub-components:

si = si1 + si2 + ... + siK

Such a decomposition can be modeled in the time, frequency or time-frequency domain using any suitable linear transform. It is assumed that some source sub-components are virtually independent, but not necessarily all of them.
Let us assume that we have first decomposed the wide-band observed sensor data (for example, by subband filtering) into narrow-band, non-overlapping or partially overlapping signals.
For each frequency subband we can use a specific ICA algorithm, for example SANG, PEARSON or FICA.
We obtain a set of demixing (separating) matrices W0, W1, W2, ..., WL, where W0 is the separating matrix for the raw sensor data x and Wj is the separating matrix for the j-th subband xj.
We should save them for further processing. In order to identify for which frequency subbands the corresponding sources or sub-components are independent, we need to compute the global matrices defined as:

Gij = Wi * [Wj]+ = Wi * pinv(Wj)

for all possible pairs i and j, where Wj is the estimated separating matrix for the j-th frequency subband and [Wj]+ = pinv(Wj) is its pseudo-inverse, which is equal to the estimated mixing matrix (neglecting scaling and permutation of columns). If the specific sub-components of interest are mutually independent for at least two sub-bands (or, more generally, two subsets of multi-bands), say subband "p" and subband "q", then the global matrix

Gpq = Wp * [Wq]+ = P

will be a sparse generalized permutation matrix P with a special structure: only one non-zero (or strongly dominant) element in each row and each column.
This follows from the simple observation that in such a case both matrices Wp and Wq represent pseudo-inverses of the same true mixing matrix A (ignoring the non-essential and unavoidable arbitrary scaling and permutation of the columns), under the assumption that the sources in the two subbands are independent.
In this way we can blindly identify the essential and very important information of which frequency subbands or multi-bands of the primary sources' sub-components are independent; moreover, we can correctly identify the mixing matrix A representing the mixing process.
This concept can be generalized for any linearly transformed data or pre-processed signals, for example, the time-frequency or otherwise transformed data. For each transformed data, we can easily estimate the mixing and/or separating matrices.
We can check, for example, pairwise for different multi-bands, transforms or representations, whether the product of two matrices, say:

Wq * Ap = Wq * [Wp]+ = Wq * pinv(Wp) = Pqp

forms a sparse generalized permutation matrix P or not. If it does, this means with high probability that their sub-components are independent and that the same mixing process holds for each such pair:

xq(t) = A * sq(t),

xp(t) = A * sp(t)

where A is the same fixed mixing matrix and sp(t), sq(t) are mutually independent sources.
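The pairwise screening over all subbands can be sketched in NumPy (again outside ICALAB; perm_score and the 0.9 threshold are ad-hoc choices, not toolbox functions):

```python
import numpy as np
from itertools import combinations

def perm_score(G):
    """Smallest fraction of row/column energy carried by the largest
    element of each row/column (1.0 for a perfect generalized permutation)."""
    M = np.abs(G)
    r = np.min(M.max(axis=1)**2 / (M**2).sum(axis=1))
    c = np.min(M.max(axis=0)**2 / (M**2).sum(axis=0))
    return min(r, c)

def independent_subband_pairs(Ws, threshold=0.9):
    """Screen all subband pairs (p, q): keep those whose global matrix
    Wp @ pinv(Wq) is close to a generalized permutation matrix."""
    return [(p, q) for p, q in combinations(range(len(Ws)), 2)
            if perm_score(Ws[p] @ np.linalg.pinv(Ws[q])) >= threshold]

# Toy check: W0 and W1 invert the SAME mixing matrix (up to permutation),
# W2 is unrelated, so only the pair (0, 1) should survive.
A = np.array([[1.0, 2.0], [0.0, 1.0]])
W0 = np.linalg.inv(A)
W1 = np.array([[0.0, 1.0], [1.0, 0.0]]) @ W0
W2 = np.array([[1.0, 1.0], [1.0, -1.0]])
print(independent_subband_pairs([W0, W1, W2]))   # -> [(0, 1)]
```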

Q7. How to evaluate the performance index, quality and consistency of separation in ICALAB if the true mixing matrix is unknown?

A7. Please also read the previous question. In a similar way we can check the consistency of various ICA algorithms. Let us assume that two different algorithms, say algorithm "q" and algorithm "p", generate two different separating matrices:

Wq and Wp

If the result of multiplication

Wq * [Wp]+ = Wq * pinv(Wp) = P

is a generalized permutation matrix, or close to one, then both algorithms give consistent results.
This technique is useful when the number of components is large, so that checking consistency by comparing and visualizing the components manually would be very time consuming. Of course, we can apply a more sophisticated multimodal approach in which we check the consistency of the algorithms over different windows or blocks of data.

Q8. How can we detect or identify the stationarity of the data, in the sense that the mixing matrix is fixed?

A8. Let us apply the same ICA algorithm to two different time windows, say window "p" and window "q". If the product of the two matrices

Wq * [Wp]+ = Wq * pinv(Wp) = Pqp

is a generalized permutation matrix, then both sub-windows are described by the same mixing matrix.

Remark: In order to obtain consistent results, it is recommended to use normalized matrices Wp and Wq, i.e., matrices whose row vectors have unit length.
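Assuming the rows of each separating matrix are the vectors to normalize, the remark amounts to:

```python
import numpy as np

def normalize_rows(W):
    """Scale each row of a separating matrix to unit Euclidean length."""
    return W / np.linalg.norm(W, axis=1, keepdims=True)

W = np.array([[3.0, 4.0], [0.0, 2.0]])
Wn = normalize_rows(W)
print(np.linalg.norm(Wn, axis=1))   # -> [1. 1.]
```

Row scaling only changes the (arbitrary) scale of the estimated components, so this normalization loses nothing while making the permutation-matrix comparison well defined.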

In a similar way, we can estimate the performance of a specific ICA/BSS algorithm with respect to a reference algorithm that we are confident gives reliable results, without knowledge of the true mixing matrix.

Q9. I know that my sources are dependent. Can I still recover or estimate them blindly?

A9. This depends on what a priori information you have about the source signals. If the sources are sparse, you can apply Sparse Component Analysis (SCA). This is usually a two-stage procedure: at the first stage we estimate the mixing matrix A, and then we estimate the sparse components using linear-programming optimization.
If your sources are smooth, you can use so-called Smooth Component Analysis (SmoCA), in which we try to recover components that are as smooth as possible, or the components with the best linear predictability.
For many scenarios, we can recover source signals using a simple preprocessing of observed (sensor) data, even if the original sources are partially dependent.
For example, if the low-frequency sub-components are statistically dependent while the high-frequency sub-components are independent, we can apply identical high-pass filters to all channels and use the outputs of these filters to estimate the mixing matrix.
In many cases it is sufficient to use a first- or second-order differentiator to enhance the high-frequency components (such preprocessing works well, for example, for dependent natural images).
In contrast, if the high-frequency sub-components are dependent (for example, when all sensors are distorted by the same interference or random noise) while the low-frequency components are independent, we can use low-pass filters. In the more general case, we can apply subband decomposition as explained above.
In order to recover the original (dependent) sources, we project the raw sensor data x through the separating matrix W estimated from the preprocessed data; the corresponding estimated mixing matrix is A = inv(W).
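The reason this works is that time-invariant filtering commutes with instantaneous mixing, so the filtered data obey the same linear model with the same matrix A. A minimal NumPy sketch, in which inv(A) stands in for the separating matrix an ICA algorithm would estimate from the filtered data:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))          # unknown mixing matrix
s = rng.standard_normal((3, 500))        # sources (possibly dependent)
x = A @ s                                # raw sensor data

# High-pass preprocessing (first-order differentiation) commutes with mixing:
# the filtered data obey the SAME linear model with the SAME matrix A.
x_hp = np.diff(x, axis=1)
assert np.allclose(x_hp, A @ np.diff(s, axis=1))

# Hence a separating matrix W estimated from x_hp (here inv(A) stands in for
# the ICA estimate) also separates the UNFILTERED raw data:
W = np.linalg.inv(A)
s_rec = W @ x
print(np.allclose(s_rec, s))             # -> True
```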
Summarizing: standard ICA algorithms are not able to estimate the original sources if they are statistically dependent, i.e., when the independence assumption does not hold. However, using simple preprocessing, we may in many cases be able to reconstruct the original sources and estimate the mixing and separating matrices. In other words, the ICALAB Toolbox enables blind separation of sources that do not fulfill the independence assumption, for a rather wide class of signals.
First of all, this can be done by applying second-order statistics, exploiting the spatio-temporal decorrelation of the sources, and applying linear-predictability and smoothness criteria (please see the book).
On the other hand, if each unknown source can be decomposed into narrow-band sub-signals, and if the sources are independent in at least one sub-band, then we can apply the preprocessing sub-band filters and run standard ICA algorithms on the transformed mixed signals, provided such sub-bands can be identified from some a priori knowledge.
In the simplest case, the source signals can be modeled or decomposed into low-frequency and high-frequency components.
In practice, the high-frequency components are often independent. If this assumption holds, we can use the High Pass Filter option to extract the high-frequency sub-components and apply any standard ICA algorithm to the preprocessed sensor (observed) signals.

Q10. Is SOBI or AMUSE algorithm an ICA algorithm?

A10. The SOBI and AMUSE algorithms exploit only second-order statistics (SOS), i.e., time-delayed covariance matrices, and in general they do not perform ICA. In other words, there is no guarantee that the components estimated by SOBI or AMUSE are independent; they are only spatio-temporally uncorrelated (please try, for example, benchmarks like "acsin10d").
In general, the second-order statistics algorithms (AMUSE, EVD2, SOBI, SOBI-RO, SOBI-BPF) are not, strictly speaking, ICA algorithms, since they do not use, explicitly or implicitly, any criterion of statistical independence.
Nevertheless, they perform the blind separation of (colored) source signals that have temporal structure and different spectra quite efficiently. The estimated sources generally have lower complexity than the mixed signals; in other words, the estimated sources have the best possible linear predictability.
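For readers who want to see the mechanics, here is a minimal NumPy sketch of the AMUSE idea (whitening plus EVD of a single time-delayed covariance; an illustration, not the ICALAB implementation):

```python
import numpy as np

def amuse(x, lag=1):
    """AMUSE sketch: whiten, then eigendecompose one symmetrized
    time-delayed covariance matrix of the whitened data."""
    x = x - x.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(x @ x.T / x.shape[1])     # zero-lag covariance
    Q = E @ np.diag(1.0 / np.sqrt(d)) @ E.T         # whitening matrix
    z = Q @ x
    R = z[:, lag:] @ z[:, :-lag].T / (z.shape[1] - lag)
    _, V = np.linalg.eigh(0.5 * (R + R.T))          # rotation from EVD
    return V.T @ Q                                  # separating matrix

# Two colored sources with clearly different spectra
t = np.arange(2000)
s = np.vstack([np.sin(0.05 * t), np.sin(0.4 * t)])
A = np.array([[1.0, 0.6], [0.4, 1.0]])
W = amuse(A @ s)
G = W @ A   # close to a generalized permutation matrix -> separation succeeded
```

Because only time-delayed covariances enter, the separated outputs are guaranteed to be spatio-temporally uncorrelated, not independent, which is exactly the distinction made above.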
It is worth emphasizing that in the literature the terms second-order statistics (SOS) BSS and higher-order statistics (HOS) ICA are often confused or interchanged. Although they refer to the same or similar models, they are generally based on different criteria and may produce completely different, inconsistent results.
In the general case, especially for real-world problems, the objectives of ICA and SOS BSS are somewhat different. In fact, the objective of BSS is to estimate the original source signals even if they are not completely mutually statistically independent (see the next question), while the objective of ICA is to determine a transformation that makes the output signals as independent as possible.
It should be noted that ICA methods use higher-order statistics (HOS) in most cases, while SOS BSS methods use only second-order statistics. The second-order methods assume that the sources have some temporal structure, while the higher-order methods assume mutual independence and non-Gaussianity. Thus, second-order statistics methods generally do not perform independent component analysis. Another difference is that higher-order statistics methods cannot be applied to Gaussian signals, while second-order methods have no such constraint. In fact, SOS BSS methods do not really replace ICA, and vice versa, since each approach is based on different criteria, assumptions and often different objectives.

Q11. Is ICA and Blind Source Separation (BSS) the same concept ?

A11. Although the mixing model is usually the same, BSS is a more general concept. ICA can be considered one of the methods of BSS, under the assumption that the sources of interest are independent.
Although many different source separation algorithms are available, their principles can be summarized by the following four approaches:

a. The most popular approach exploits some measure of signal independence, non-Gaussianity or sparseness as the cost function. When the original sources are assumed to be statistically independent and without temporal structure, higher-order statistics (HOS) are essential (implicitly or explicitly) to solve the BSS problem. In such a case, the method does not allow more than one Gaussian source.

b. If the sources have temporal structure, then each source has non-vanishing temporal correlation, and conditions less restrictive than statistical independence can be used; for example, second-order statistics (SOS) are sufficient to estimate the mixing matrix and the sources. Several methods have been developed along this line. Note that SOS methods cannot separate sources with identical power-spectrum shapes, nor i.i.d. (independent and identically distributed) sources.

c. The third approach exploits nonstationarity (NS) properties together with second-order statistics (SOS). Here we are interested in second-order nonstationarity, in the sense that the source variances vary in time. Nonstationarity was first taken into account by Matsuoka et al., who showed that a simple decorrelation technique can perform the BSS and ICA tasks in some cases.
In contrast to the other approaches, methods based on nonstationarity information allow the separation of colored Gaussian sources with identical power-spectrum shapes. However, they cannot separate sources with identical nonstationarity properties. There are some recent works on nonstationary source separation.

d. The fourth approach exploits the various diversities of the signals: typically time, frequency (spectral or “time coherence”) and/or time-frequency diversity, or, more generally, joint space-time-frequency (STF) diversity.

Remark: In fact, the concept of space-time-frequency diversity is widely used in wireless communication systems. Signals can be separated easily if they do not overlap in the time, frequency or time-frequency domain. When signals do not overlap in the time domain, one signal stops (is silent) before another begins; such signals are easily separated when the receiver is accessible only while the signal of interest is sent. This multiple-access method is called TDMA (Time Division Multiple Access). If two or more signals do not overlap in the frequency domain, they can be separated with bandpass filters; the method based on this principle is called FDMA (Frequency Division Multiple Access). Both TDMA and FDMA are used in many modern digital communication systems. Of course, if the source power spectra overlap, spectral diversity is not sufficient to extract the sources, and we need to exploit other kinds of diversity. If the source signals have different time-frequency diversity, and the time-frequency signatures of the sources do not (completely) overlap, then they can still be extracted from one (or more) sensor signals by masking the individual source signals or interference in the time-frequency domain and then synthesizing them back from that domain. However, in such cases some a priori information about the source signals is necessary, so the separation is not completely blind but only semi-blind.

More sophisticated or advanced approaches use combinations (parallel and/or multi-stage processing) or integrations of all the above-mentioned approaches (HOS, SOS, NS and STF, i.e., space-time-frequency, diversity) in order to separate or extract sources with various statistical properties and to reduce the influence of noise and undesirable interference. Methods that exploit the temporal structure of sources (mainly second-order correlations) and/or the nonstationarity of sources lead to the SOS BSS methods. In contrast to BSS methods based on HOS, the second-order statistics based methods usually do not need to involve the probability distributions of the sources or nonlinear activation functions.

Q12. What is the difference between PCA, FA (Factor Analysis) and ICA ?

A12. Generally, all three techniques use a similar linear model, x = A*s + v, where x is the vector of observations, A is a mixing matrix, s is the vector of hidden components and v is additive noise.

The principal components are random variables of maximal variance constructed from linear mixtures of the input features.
Factor analysis is a generalization of PCA based explicitly on a maximum-likelihood model. The main difference is that factor analysis allows the noise to have an arbitrary diagonal covariance matrix, while PCA assumes the noise is spherical. In addition to estimating the subspace, factor analysis estimates the noise covariance matrix.
Both PCA and FA are based on second-order statistics.
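As a plain-NumPy sketch of this second-order character, the principal components can be read off the SVD of the centered data; the resulting components are mutually uncorrelated, but in general not independent:

```python
import numpy as np

def pca(x, k):
    """Return the k principal components: projections of the centered data
    onto the directions of maximal variance (left singular vectors)."""
    xc = x - x.mean(axis=1, keepdims=True)
    U, _, _ = np.linalg.svd(xc, full_matrices=False)
    return U[:, :k].T @ xc

x = np.random.default_rng(1).standard_normal((3, 1000))
y = pca(x, 2)
C = y @ y.T / y.shape[1]                      # covariance of the components
print(np.allclose(C, np.diag(np.diag(C))))    # -> True: decorrelated
```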
The independent components are random variables of minimum entropy constructed from linear combinations of the input features. The entropy is normalized by the variance of the component, so the absolute scale does not matter. According to information theory, such variables should be as independent as possible. This feature-extraction technique is closely related to projection pursuit, commonly used for visualization. In principle, ICA is based on higher-order statistics and the minimization of mutual information.


Q13. What are the current trends and open problems in ICA and BSS ?

A13. There are several active topics, e.g., large-scale problems, the processing of noisy data, and the development of extended and generalized models for BSS and ICA.
There are many generalizations and extensions of the ICA concept. For example, in local ICA the available sensor data are suitably preprocessed or transformed and split into clusters in space, in time, in frequency and/or in the time-frequency domain, and linear ICA is then applied to each cluster locally. More generally, an optimal local ICA can be implemented as the result of the mutual interaction of two processes: a suitable clustering (or splitting) process, and the application of ICA to each cluster.
A globally linear model, as implied by conventional ICA, may be insufficient to represent multivariate data in many situations. A combination of several local ICAs can provide a suitable approach in such cases. An important question is then how to find an appropriate partitioning of the data space, together with a proper choice of the local number of independent components (ICs).

Despite the success of standard ICA in many applications, its basic assumptions may not hold, so some caution should be taken when using standard ICA to analyze real-world problems, especially in biomedical signal processing. In fact, by definition, standard ICA algorithms are not able to estimate statistically dependent original sources, that is, when the independence assumption is violated. A natural extension and generalization of ICA is multiresolution subband decomposition ICA (MSD-ICA), which considerably relaxes the assumption of mutual independence of the primary sources. The key idea in this approach is the assumption that the wide-band source signals are dependent, but that some transformed (e.g., frequency multi-band) sub-components are independent.
The basic concept of MSD-ICA is to divide the sensor signal spectra into sub-spectra or subbands, and then to treat those sub-spectra individually for the purpose at hand. The subband signals can be ranked and processed independently.
MSD-ICA can be formulated as the task of estimating a mixing and/or separating matrix on the basis of a suitable subband decomposition of the sensor signals, by applying a classical ICA algorithm not to the raw sensor data but to one or several preselected subbands for which the source sub-components are independent.
In the simplest case, the source signals can be modeled or decomposed into their low- and high-frequency sub-components. In practice, the high-frequency sub-components are often found to be mutually independent.
In order to separate the original sources in such a case, we can use a High Pass Filter (HPF) to extract the high-frequency sub-components and then apply any standard ICA algorithm to the preprocessed sensor (observed) signals. In the preprocessing stage, more sophisticated methods, such as block transforms, multirate subband filter banks or wavelet transforms, can be applied.

Q14. Is there any algorithm that provides automatic ranking or ordering of the estimated components? I want to extract only a few significant components from a large number of observations. Which algorithm can I apply?

A14. Although, in general, all ICA and BSS algorithms have the two ambiguities of arbitrary permutation and scaling, some algorithms are able to deliver ordered estimated components automatically and on-line.
For example, the AMUSE algorithm orders the estimated sources according to increasing complexity, or decreasing linear predictability. Such ordering is possible because it uses SVD (singular value decomposition) instead of the standard EVD (eigenvalue decomposition). Of course, we can always order the estimated components in post-processing using one of several criteria, e.g., decreasing normalized kurtosis, sparseness, linear predictability, Hurst exponent, entropy, values of normalized cumulants, etc.
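As an example of such post-processing, ranking by decreasing |normalized kurtosis| can be sketched as (plain NumPy, not an ICALAB function):

```python
import numpy as np

def rank_by_kurtosis(Y):
    """Order estimated components (rows of Y) by decreasing |normalized
    kurtosis|, where kurtosis = E[y^4]/E[y^2]^2 - 3 (zero for Gaussians)."""
    Yc = Y - Y.mean(axis=1, keepdims=True)
    m2 = (Yc**2).mean(axis=1)
    kurt = (Yc**4).mean(axis=1) / m2**2 - 3.0
    return np.argsort(-np.abs(kurt)), kurt

rng = np.random.default_rng(0)
Y = np.vstack([
    rng.standard_normal(5000),        # Gaussian: kurtosis ~ 0
    rng.laplace(size=5000),           # super-Gaussian: kurtosis ~ 3
    rng.uniform(-1, 1, size=5000),    # sub-Gaussian: kurtosis ~ -1.2
])
order, kurt = rank_by_kurtosis(Y)
print(order)                          # -> [1 2 0]
```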
You can extract an arbitrary group of components using, e.g., the TICA, SIMBEC or FastICA algorithms. Which components are extracted depends on the implemented criterion and the initial conditions. We can run an algorithm several times, starting from various initial conditions, to extract the desired components.

Q15. From observations that represent a linear mixture of hundreds of sources, I want to extract only some components with specific stochastic properties, for example the sparsest ones. Which algorithm should I use?

A15. There are several algorithms you can use in this case, for example TICA, FICA or SIMBEC.

Q16. I use ICALAB for image processing. I want to know whether preprocessing such as zero-mean and whitening is done automatically, or whether I need to select it somewhere?

A16. Some preprocessing, such as zero-mean and whitening, is done automatically; other steps, such as high-pass filtering, bandpass filtering and differentiation, can be applied manually.

Q17. How can the advanced parameters of the SONS algorithm be chosen optimally?

A17. The optimal choice of parameters depends, of course, on the data you want to process. If your sources have some temporal structure, like speech signals, you can choose relatively large sub-windows, say from 500 to 2000 samples, with the number of time delays from 1 to 2. If you want to perform independent component analysis, the sub-windows should be relatively small, say from 10 to 100 samples, and you should use only a few time delays.

Q18. I am testing and comparing the performance of various algorithms. Which factors should I consider to make a fair and unbiased comparison?

A18. This depends on the criteria you use in your comparison, e.g., performance index, complexity or computation time.

Please note that when evaluating the performance index for noisy data, the following factors may play a crucial role:

a. For noisy data, the performance strongly depends on the condition number of the mixing matrix. For a condition number larger than, say, 20, all algorithms may give rather poor performance under large noise. To make a comparison "fair", we need to use the same fixed mixing matrix in all tests.

b. The main problem with noisy data is prewhitening, which dramatically deteriorates performance if the number of sources equals the number of sensors. In other words, standard prewhitening is generally not robust with respect to noise.

c. The performance of some algorithms strongly depends on the chosen free parameters. For example, for noisy data, algorithms based on joint diagonalization, such as SOBI, depend strongly on the number of jointly diagonalized matrices; for SOBI or TICA, the number of time delays (jointly diagonalized matrices) should then be at least 100 to 200.

Q19. I found your ICALAB package and I use it. It works very well, but unfortunately it is limited to 100 channels. Is it possible to extend it to more channels, say 1000?

A19. We limited the number of channels for two main reasons:

a. The performance of most algorithms deteriorates with more than 100 sources, and some algorithms are not reliable for very large-scale problems.

b. The visualization of the data in ICALAB is principally designed for fewer than 250 channels. We are making efforts to extend the ICALAB package to a virtually unlimited number of sources. If you need more channels, please write to us.

Q20. I have difficulties in successfully separating or extracting the sources of some benchmarks included in ICALAB. What kinds of tricks or algorithms should I use to separate them?

A20. Different algorithms are based on various criteria and assumptions regarding the sources, so they may lead to different results. One simple explanation could be that the original sources are not completely independent, or that some of them are Gaussian, so some processing or filtering of the observed data is necessary to estimate the true mixing matrix. Using appropriate preprocessing, we are able to estimate the original sources for most of the provided benchmarks, even if they are not completely independent.
The ICA/BSS algorithms are pure mathematical formulas: powerful, but rather mechanical procedures. It is a misunderstanding that not much is left for the user to do once the machinery has been optimally implemented. The successful and efficient use of ICALAB strongly depends on a priori knowledge, on common sense, and on the appropriate use of the preprocessing and post-processing tools. In other words, it is in the preprocessing of the data and the post-processing of the models where expertise is truly needed (see the book).




Copyright ©2004 Advanced Brain Signal Processing Lab., BSI, Riken