interactive_pcacrossval

PURPOSE

PCA cross-validation aims to determine the "best" number of principal components (PCs) to retain in a PCA model.

SYNOPSIS

This is a script file.

DESCRIPTION

 PCA cross-validation aims to determine the "best" number of principal components (PCs) to retain in a PCA model.

 Sense of "best": the number of PCs that gives the minimum mean reconstruction error.

 What is reconstruction error?
 - PCA scores are calculated using the loadings matrix: Y = X*L, where
   . X is the testing dataset (one spectrum per row)
   . Y is the PCA scores dataset (one row of scores per spectrum)
   . L is the loadings matrix (one loading vector per column), calculated from the training dataset
 - Spectra can be reconstructed (with error) as X_hat = Y*L' = X*L*L'
 - The reconstruction error is calculated as error = mean_all_i(norm^2(X_hat_i-X_i)), where
   . mean_all_i(.) is the mean over all spectra in the testing dataset
   . X_i is the i-th spectrum (row) of the testing dataset
   . X_hat_i is the i-th reconstructed spectrum (row) of the testing dataset
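 The calculation above can be sketched in a few lines of MATLAB. This is only an illustration, not the code of
 this script: the variable names (Xtrain, Xtest, nPC), the random stand-in data, the use of svd to obtain the
 loadings, and the mean-centring with the training mean are all assumptions that the description does not state.

```matlab
% Illustrative sketch; data and names are hypothetical, not taken from the script.
Xtrain = randn(40, 8);      % training spectra, one per row
Xtest  = randn(10, 8);      % testing spectra, one per row
nPC    = 3;                 % number of PCs to keep

% Assumption: both datasets are centred with the TRAINING mean.
mu  = mean(Xtrain, 1);
Xtr = bsxfun(@minus, Xtrain, mu);
Xte = bsxfun(@minus, Xtest,  mu);

% Loadings from the training dataset: right singular vectors, one per column.
[~, ~, V] = svd(Xtr, 'econ');
L = V(:, 1:nPC);

Y     = Xte * L;            % scores of the testing spectra:  Y = X*L
X_hat = Y * L';             % reconstructed spectra:          X_hat = X*L*L'

% Mean over all testing spectra of the squared norm of each row residual.
err = mean(sum((X_hat - Xte).^2, 2));
fprintf('Mean reconstruction error with %d PCs: %g\n', nPC, err);
```

 Because the columns of L are orthonormal, X*L*L' is the projection of each spectrum onto the subspace spanned
 by the first nPC loadings, and err is the mean squared distance to that subspace.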

 Why cross-validation?
 If you measure the reconstruction error using the same dataset for training and testing, the error can only
 decrease (or stay the same) as you add more PCs to Y.
 However, if you split the dataset into training and testing datasets, you try to reconstruct samples that have been
 left out of training. It may happen that adding PCs degrades the generalization of the model (loadings).
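
 The training-only case can be checked numerically. The sketch below is an assumption-laden illustration (random
 stand-in data, loadings from svd, mean-centred spectra; none of these names come from the script): it computes
 the reconstruction error on the same data used to derive the loadings, for an increasing number of PCs.

```matlab
% Illustrative sketch; X and errTrain are hypothetical names.
X  = randn(30, 6);                          % stand-in dataset, one spectrum per row
Xc = bsxfun(@minus, X, mean(X, 1));         % assumption: mean-centred data

[~, ~, V] = svd(Xc, 'econ');                % loadings, one per column
maxPC     = size(V, 2);
errTrain  = zeros(1, maxPC);

for nPC = 1:maxPC
    L = V(:, 1:nPC);
    R = Xc * L * L' - Xc;                   % residual X_hat - X on the training data itself
    errTrain(nPC) = mean(sum(R.^2, 2));
end

disp(errTrain);                             % never increases from one entry to the next
```

 With all PCs retained, L*L' is the identity matrix, so the last entry of errTrain is zero up to rounding. This
 is why the training error alone cannot indicate how many PCs to keep.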

 For further information, consult the Pirouette help file on "PCA cross-validation".

-----
 The meaning of k-fold:
 
 First, suppose you have a dataset with, say, 500 spectra.
 
   * 10-fold means that the cross-validation will split the 500 spectra 10 times into 450 training spectra and 50
     testing spectra (the split is not sequential: spectra are drawn at random).
   * 20-fold means 20 different training and testing datasets of 475 and 25 spectra, respectively.
   * 500-fold means 500 different training and testing datasets of 499 spectra and 1 spectrum, respectively, i.e.,
     500-fold, in this case, is equivalent to leave-one-out.
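
 The split sizes above can be reproduced with a short sketch. It shows one possible way to build random k-fold
 partitions (a random permutation plus cyclic fold labels); this scheme and all variable names are assumptions,
 not necessarily how this script does it.

```matlab
% Illustrative sketch; the partitioning scheme is an assumption.
n      = 500;                         % number of spectra in the example
ks     = [10 20 500];                 % values of k to try
nTrain = zeros(size(ks));
nTest  = zeros(size(ks));

for j = 1:numel(ks)
    k     = ks(j);
    idx   = randperm(n);              % random, non-sequential order of the spectra
    fold  = mod(0:n-1, k) + 1;        % fold label (1..k) for each shuffled position
    count = zeros(1, n);              % how many times each spectrum is used for testing

    for f = 1:k
        testIdx  = idx(fold == f);    % spectra left out in this split
        trainIdx = idx(fold ~= f);    % spectra used to calculate the loadings
        count(testIdx) = count(testIdx) + 1;
        % ... calculate loadings from trainIdx, reconstruction error on testIdx ...
    end

    nTrain(j) = numel(trainIdx);
    nTest(j)  = numel(testIdx);
    fprintf('%d-fold: %d training / %d testing spectra per split\n', k, nTrain(j), nTest(j));
end
```

 For n = 500 this gives 450/50, 475/25 and 499/1 training/testing spectra, and every spectrum is left out for
 testing exactly once per value of k.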
 
 

 j.trevisan@lancaster.ac.uk

CROSS-REFERENCE INFORMATION

This function calls:
This function is called by:
Generated on Thu 18-Feb-2010 12:47:47 by m2html © 2003