While PCA is particularly suitable for quantitative data, CA is recommended for the following types of input data, each of which is examined more closely below: frequencies, contingency tables, probabilities, categorical data, and mixed quantitative/qualitative data.
In the case of frequencies (i.e. the ijth table entry indicates the frequency of occurrence of attribute j for object i), the row and column ``profiles'' are of interest: it is the relative magnitudes that matter. Use of a weighted Euclidean distance, termed the $\chi^2$ distance, gives, for example, a zero distance between the following 5-coordinate vectors, which have identical profiles of values: (2,7,0,3,1) and (8,28,0,12,4). Probability-type values can be constructed here by dividing each value in a vector by the sum of that vector's values.
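The zero distance between the two example vectors can be checked numerically. The following is a minimal sketch, assuming the usual form of the $\chi^2$ distance between row profiles (squared profile differences weighted by the inverse of the column masses); the function name `chi2_distance` and the third, contrasting row are hypothetical and are not taken from the text.

```python
import numpy as np

def chi2_distance(table, i, k):
    """Chi-squared distance between the profiles of rows i and k of a frequency table."""
    table = np.asarray(table, dtype=float)
    # Row profiles: each row divided by its own sum.
    profiles = table / table.sum(axis=1, keepdims=True)
    # Column masses: column sums divided by the grand total.
    col_mass = table.sum(axis=0) / table.sum()
    # Columns with zero mass carry no information and would divide by zero.
    keep = col_mass > 0
    diff = profiles[i, keep] - profiles[k, keep]
    return np.sqrt(np.sum(diff ** 2 / col_mass[keep]))

table = [
    [2, 7, 0, 3, 1],     # first vector from the text
    [8, 28, 0, 12, 4],   # second vector from the text (4 times the first)
    [5, 1, 1, 1, 5],     # hypothetical row with a different profile
]

d_same = chi2_distance(table, 0, 1)   # identical profiles
d_diff = chi2_distance(table, 0, 2)   # different profiles
print(d_same, d_diff)
```

The first distance vanishes because both rows reduce to the same profile, whereas the hypothetical third row lies at a strictly positive distance from either.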
A particular type of frequency of occurrence data is the contingency table: a table crossing (usually two) sets of characteristics of the population under study. As an example, an $n \times m$ contingency table might give frequencies of the existence of n different metals in stars of m different ages. CA allows the study of the two sets of variables which constitute the rows and columns of the contingency table. In its usual variant, PCA would privilege either the rows or the columns by standardizing; if, however, we are dealing with a contingency table, both rows and columns are equally interesting. The ``standardizing'' inherent in CA (a consequence of the $\chi^2$ distance) treats rows and columns in an identical manner. One byproduct is that the row and column projections in the new space may both be plotted on the same output graphic. (The lack of an analogous direct relationship between row projections and column projections in PCA precludes doing this in the latter technique.)
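The symmetric treatment of rows and columns can be illustrated with a short sketch of CA computed through a singular value decomposition of the standardized residuals, which is one common formulation and not necessarily the algorithm used elsewhere in this text. The function name `correspondence_analysis` and the small metals-by-ages table are hypothetical and serve only as an illustration.

```python
import numpy as np

def correspondence_analysis(N):
    """Return row coordinates, column coordinates and principal inertias of table N."""
    N = np.asarray(N, dtype=float)
    P = N / N.sum()                 # correspondence matrix (relative frequencies)
    r = P.sum(axis=1)               # row masses
    c = P.sum(axis=0)               # column masses
    # Standardized residuals: rows and columns are scaled in the same way.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    # Principal coordinates of rows and columns in the same factor space.
    row_coords = (U * sv) / np.sqrt(r)[:, None]
    col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]
    return row_coords, col_coords, sv ** 2

# Hypothetical counts: 3 metals (rows) observed in stars of 4 age classes (columns).
N = np.array([
    [20, 15,  8,  3],
    [10, 12, 14,  9],
    [ 2,  6, 12, 18],
])

rows, cols, inertia = correspondence_analysis(N)
# Analysing the transposed table simply exchanges the roles of rows and columns.
rows_T, cols_T, inertia_T = correspondence_analysis(N.T)
print(inertia, inertia_T)
```

Because the same scaling is applied to both margins, analysing the transposed table yields the same principal inertias, and the first two columns of `rows` and `cols` can be drawn on a single scatter plot.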
Categorical data may be coded by the ``scoring'' of 1 (presence) or 0 (absence) for each of the possible categories. Such coding is known as complete disjunctive coding. CA of an array of such complete disjunctive data is referred to as Multiple Correspondence Analysis (MCA); such a coding of categorical data is, in fact, closely related to contingency table type data.
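A minimal sketch of complete disjunctive coding follows. The helper `disjunctive_coding`, the variable names, and the five-object data set are all hypothetical. The final step shows one way in which the coding relates to contingency tables: the cross-product of the indicator blocks of two variables is the contingency table crossing them.

```python
import numpy as np

def disjunctive_coding(columns):
    """Code each categorical variable as 0/1 indicator columns, one per category."""
    blocks, labels = [], []
    for name, values in columns.items():
        cats = sorted(set(values))
        # 1 where the object takes this category, 0 otherwise.
        block = np.array([[1 if v == cat else 0 for cat in cats] for v in values])
        blocks.append(block)
        labels += [f"{name}={cat}" for cat in cats]
    return np.hstack(blocks), labels

# Hypothetical catalogue of five objects with two categorical variables.
data = {
    "spectral_class": ["A", "G", "G", "K", "A"],
    "variable":       ["yes", "no", "no", "yes", "no"],
}

Z, labels = disjunctive_coding(data)
print(labels)
print(Z)

# Cross-product of the two indicator blocks: spectral class (rows) by variability (columns).
contingency = Z[:, :3].T @ Z[:, 3:]
print(contingency)
```

Each row of `Z` contains exactly one 1 per original variable, which is what makes the coding complete and disjunctive.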
Dealing with a complex astronomical catalogue may well give rise in practice to a mixture of quantitative (real-valued) and qualitative data. One possibility for the analysis of such data is to ``discretize'' the quantitative values, and to treat them thereafter as categorical. In this way a homogeneous set of variables, considerably larger than the initially given set, is analysed.
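The discretization step might be sketched as follows. Cutting at quantiles so that the classes are roughly equally populated is only one possible choice and is an assumption made here, as are the function name `discretize`, the number of classes, and the magnitude values.

```python
import numpy as np

def discretize(values, n_bins=3):
    """Cut a quantitative variable at its quantiles and return 0/1 indicator columns."""
    values = np.asarray(values, dtype=float)
    # Interior cut points, e.g. the 1/3 and 2/3 quantiles for three classes.
    edges = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    # Class index in 0 .. n_bins - 1 for each value.
    codes = np.digitize(values, edges)
    # One indicator column per class.
    return np.eye(n_bins, dtype=int)[codes]

# Hypothetical magnitudes for six objects.
magnitudes = [12.1, 14.3, 9.8, 15.0, 11.2, 13.4]
Z_mag = discretize(magnitudes, n_bins=3)
print(Z_mag)
```

A single real-valued variable thus becomes `n_bins` indicator columns, which can be placed alongside the indicator columns of the genuinely categorical variables before CA is applied.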