Canonical Correlation

Canonical correlation analysis (CCA) is a general procedure for investigating the relationships between two sets of variables, and a way of making sense of their cross-covariance matrix. In simple terms, it identifies and measures the linear associations between two sets of variables, that is, between two multidimensional variables. CCA determines a set of canonical variates: orthogonal linear combinations of the variables within each set that best explain the variability both within and between sets.

Canonical correlation is appropriate in the same situations where multiple regression would be, but where there are multiple inter-correlated outcome variables. It finds two bases, one for each set of variables, that are optimal with respect to correlations and, at the same time, it finds the corresponding correlations. In other words, it finds the two bases in which the correlation matrix between the two sets is diagonal and the correlations on the diagonal are maximized. The dimensionality of these new bases is equal to or less than the smaller of the dimensionalities of the two sets.
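To make the "two bases" idea concrete, here is a minimal sketch in Python with NumPy. It is one common way to compute CCA (QR decomposition of each centered data set followed by an SVD), not necessarily the method used by any of the sources. The function name `canonical_correlations`, the synthetic data, and the sample size are all assumptions made for illustration, and the sketch assumes both data matrices have full column rank.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Sketch of CCA for data matrices X (N x n) and Y (N x m).

    Assumes both centered matrices have full column rank.
    Returns the canonical correlations and the weight matrices A, B.
    """
    # Center each variable so that inner products behave like covariances.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the two data sets.
    Qx, Rx = np.linalg.qr(Xc)
    Qy, Ry = np.linalg.qr(Yc)
    # The singular values of Qx^T Qy are the canonical correlations.
    U, s, Vt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    # Map the singular vectors back to weights on the original variables.
    A = np.linalg.solve(Rx, U)
    B = np.linalg.solve(Ry, Vt.T)
    return s, A, B

# Synthetic data (illustrative only): 3 X-variables and 2 Y-variables
# that share one latent signal z.
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))
X = z @ np.array([[1.0, 0.5, -0.3]]) + rng.normal(size=(500, 3))
Y = z @ np.array([[0.8, -0.6]]) + rng.normal(size=(500, 2))

rho, A, B = canonical_correlations(X, Y)

# Canonical variates: the data expressed in the two new bases.
U = (X - X.mean(axis=0)) @ A
V = (Y - Y.mean(axis=0)) @ B

# Cross-correlation between the two sets of variates; it should be
# diagonal, with the canonical correlations on the diagonal.
cross = U.T @ V
print(rho)
print(np.round(cross, 3))
```

With 3 variables in one set and 2 in the other, the sketch returns two canonical correlations, which matches the statement that the number of dimensions is at most the smaller of the two set sizes.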

If we have two vectors X = (X1, …, Xn) and Y = (Y1, …, Ym) of random variables, and there are correlations among the variables, then canonical correlation analysis will find the linear combinations of the Xi and of the Yj that have maximum correlation with each other.
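Written as an optimization problem, the first pair of canonical variates can be stated as follows. The weight vectors a and b and the covariance symbols Σ_XX, Σ_YY and Σ_XY are notation introduced here for illustration; they do not appear in the text above.

```latex
% a \in \mathbb{R}^n and b \in \mathbb{R}^m are weight vectors (assumed notation);
% \Sigma_{XX}, \Sigma_{YY} are the covariance matrices of X and Y,
% and \Sigma_{XY} is their cross-covariance matrix.
\rho_1
  = \max_{a,\,b} \operatorname{corr}\!\left(a^{\top} X,\; b^{\top} Y\right)
  = \max_{a,\,b}
    \frac{a^{\top} \Sigma_{XY}\, b}
         {\sqrt{a^{\top} \Sigma_{XX}\, a}\;\sqrt{b^{\top} \Sigma_{YY}\, b}}
```

Later pairs solve the same problem, with the added constraint that each new variate is uncorrelated with the ones already found.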

[1] Wikipedia
[2] UCLA: Statistical Consulting Group
[3] Linkoping University
