Cluster analysis (or clustering) is the machine learning task of locating meaningful groups within data, such that similarity within clusters is greater than similarity between clusters. A number of similarity measures may be applied. In my work, I adopt the non-parametric density-based definition, in which data points are assumed to be realisations of an unknown probability density. Clusters are then defined as regions of high density separated by regions of low density.
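As a minimal illustration of this definition (an illustrative sketch, not the method developed in this work), the following one-dimensional example estimates a density with a Gaussian kernel density estimator and places the cluster boundary at the minimiser of the estimated density between the two modes. The data, bandwidth, and search interval are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 1-D sample: two high-density regions (clusters) around 0 and 6
x = np.concatenate([rng.normal(0.0, 0.5, 200), rng.normal(6.0, 0.5, 200)])

def kde(grid, data, h):
    # Gaussian kernel density estimate with bandwidth h
    z = (grid[:, None] - data[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (len(data) * h * np.sqrt(2.0 * np.pi))

grid = np.linspace(x.min(), x.max(), 500)
dens = kde(grid, x, h=0.3)

# Density-based clustering places the boundary in the low-density valley:
# here we search between the two modes (interval chosen for illustration)
interior = (grid > 1.0) & (grid < 5.0)
split = grid[interior][np.argmin(dens[interior])]
labels = (x > split).astype(int)
```

The split point falls in the low-density region near 3, and thresholding at it recovers the two high-density groups exactly on this well-separated sample.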
For discrete and mixed observations, this definition is inappropriate, since each possible outcome of each discrete attribute would carry a point mass and so appear as its own region of high density. We therefore first obtain an appropriate continuous representation of such data before applying our clustering algorithms.
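To make the problem concrete, the sketch below shows one simple way to obtain a continuous representation of a mixed sample: "jittering" an integer-valued attribute with small uniform noise so that its distribution admits a density. This is an assumption chosen for illustration, not necessarily the representation used in this work, and the data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical mixed sample: one continuous attribute and one integer-valued
# attribute. Under the density-based definition, the integer column places all
# of its mass on a few atoms, so each distinct value looks like its own cluster.
cont = rng.normal(0.0, 1.0, 6)
counts = np.array([0, 1, 0, 5, 6, 5])

# Illustrative continuous representation: jitter the discrete attribute with
# small uniform noise so that no two observations coincide exactly
jittered = counts + rng.uniform(-0.5, 0.5, counts.shape)

X = np.column_stack([cont, jittered])
```

After jittering, the second column is continuous (all values distinct) while each observation stays within half a unit of its original integer value, so the group structure of the discrete attribute is preserved.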
Further, in high dimensions, the reliability of density estimation deteriorates severely. For this reason, we seek low-density separators that can be computed from the densities of one-dimensional projections alone. To ensure computational tractability, these separators are restricted to be linear.
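The idea of finding a low-density linear separator from one-dimensional projections can be sketched as follows. This is a naive grid search over projection directions, with the kernel density estimated only on each 1-D projection; the data, bandwidth, quantile trimming, and direction grid are all assumptions for the example, not the optimisation procedure used in this work.

```python
import numpy as np

rng = np.random.default_rng(1)
# Two 2-D Gaussian clusters, well separated along the diagonal
X = np.vstack([rng.normal([0.0, 0.0], 0.5, (150, 2)),
               rng.normal([4.0, 4.0], 0.5, (150, 2))])

def kde(grid, data, h=0.3):
    # 1-D Gaussian kernel density estimate: the only density we ever compute
    z = (grid[:, None] - data[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (len(data) * h * np.sqrt(2.0 * np.pi))

best = None
for theta in np.linspace(0.0, np.pi, 18, endpoint=False):
    v = np.array([np.cos(theta), np.sin(theta)])   # candidate unit direction
    p = X @ v                                      # 1-D projection of the data
    # Trim the tails so the minimiser lies between the bulk of the projections
    grid = np.linspace(np.quantile(p, 0.1), np.quantile(p, 0.9), 200)
    dens = kde(grid, p)
    i = np.argmin(dens)
    if best is None or dens[i] < best[0]:
        best = (dens[i], v, grid[i])

min_dens, v, b = best
# The hyperplane {x : v.x = b} is the lowest-density linear separator found
labels = (X @ v > b).astype(int)
```

Only one-dimensional densities are ever estimated, yet the resulting hyperplane partitions the full-dimensional sample into its two clusters.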
For some datasets, linear separators are insufficient to partition the clusters. To overcome this, we use kernel methods to map the data into a feature space (possibly of very high or even infinite dimension) in which the clusters are linearly separable. A linear separator for the mapped data in the feature space then corresponds to a non-linear separator in the original data space.
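The effect of such a mapping can be shown with an explicit (finite-dimensional) feature map rather than an implicit kernel; the data and the map are assumptions for illustration. Two concentric rings cannot be separated by any line in the plane, but appending the squared norm as a third coordinate makes them linearly separable, and the linear threshold in feature space corresponds to a circle in the original space.

```python
import numpy as np

rng = np.random.default_rng(2)
# Two concentric rings: not linearly separable in the original 2-D space
t = rng.uniform(0.0, 2.0 * np.pi, 200)
r = np.concatenate([rng.normal(1.0, 0.1, 100), rng.normal(3.0, 0.1, 100)])
X = np.column_stack([r * np.cos(t), r * np.sin(t)])

# Explicit feature map phi(x) = (x1, x2, ||x||^2): in this 3-D feature space
# the rings sit at different heights along the third coordinate
phi = np.column_stack([X, (X**2).sum(axis=1)])

# A linear separator in feature space (threshold on the third coordinate at 4,
# i.e. radius 2) corresponds to a circular separator in the original space
labels = (phi[:, 2] > 4.0).astype(int)
```

Kernel methods achieve the same effect implicitly: the feature map is never computed, only inner products k(x, y) = phi(x)·phi(y) between mapped points.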