
crossvalind

PURPOSE

CROSSVALIND generates cross-validation indices

SYNOPSIS

function [tInd,eInd] = crossvalind(method,N,varargin)

DESCRIPTION

   INDICES = CROSSVALIND('Kfold',N,K) returns randomly generated indices
   for a K-fold cross-validation of N observations. INDICES contains equal
   (or approximately equal) proportions of the integers 1 through K that
   define a partition of the N observations into K disjoint subsets.
   Repeated calls return different randomly generated partitions. K
   defaults to 5 when omitted. In K-fold cross-validation, K-1 folds are
   used for training and the remaining fold is used for evaluation. This
   process is repeated K times, leaving out a different fold for
   evaluation each time.
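
   The K-fold scheme above can be sketched in base MATLAB as follows. This
   is an illustrative sketch, not CROSSVALIND's actual implementation; the
   values of N and K are example assumptions. Each observation gets a fold
   label from 1 to K in near-equal proportions, and the labels are shuffled
   with RANDPERM so the partition is random.

```matlab
% Illustrative K-fold sketch (not crossvalind's own code).
N = 23;                         % number of observations (example value)
K = 5;                          % number of folds (example value)
labels = mod(0:N-1, K)' + 1;    % 1,2,...,K,1,2,...: fold sizes differ by at most 1
indices = labels(randperm(N));  % shuffle the labels to get a random partition
% Each fold serves once as the evaluation set; the other K-1 folds train.
for k = 1:K
    test  = (indices == k);     % logical mask of the evaluation fold
    train = ~test;              % the remaining K-1 folds
    fprintf('fold %d: %d train, %d test\n', k, sum(train), sum(test));
end
```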

   [TRAIN,TEST] = CROSSVALIND('HoldOut',N,P) returns logical index vectors
   for cross-validation of N observations by randomly selecting
   approximately P*N observations to hold out for the evaluation set. P
   must be a scalar between 0 and 1. P defaults to 0.5 when omitted,
   corresponding to holding out 50% of the observations. Using holdout
   cross-validation within a loop is similar to running K-fold
   cross-validation once outside the loop, except that the evaluation
   subsets of different iterations are not guaranteed to be disjoint.
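
   A holdout split can be sketched as below. This is a minimal sketch
   under stated assumptions, not the function's own code: N and P are
   example values, and "approximately P*N" is assumed here to mean
   ROUND(P*N). TRAIN is simply the complement of TEST.

```matlab
% Illustrative holdout sketch (not crossvalind's own code).
N = 150;                      % number of observations (example value)
P = 0.3;                      % fraction held out for evaluation (example value)
nTest = round(P*N);           % assumption: "approximately P*N" = round(P*N)
perm = randperm(N);           % random ordering of 1..N
test = false(N,1);
test(perm(1:nTest)) = true;   % first nTest shuffled observations are held out
train = ~test;                % all other observations are used for training
fprintf('%d train, %d test\n', sum(train), sum(test));
```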

   [TRAIN,TEST] = CROSSVALIND('LeaveMOut',N,M), where M is an integer,
   returns logical index vectors for cross-validation of N observations by
   randomly selecting M of the observations to hold out for the evaluation
   set. M defaults to 1 when omitted. Using LeaveMOut cross-validation
   within a loop does not guarantee disjoint evaluation sets; use K-fold
   instead when disjoint evaluation sets are required.
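
   The following sketch illustrates why repeated LeaveMOut draws are not
   disjoint. It assumes each repetition draws M observations independently
   with RANDPERM (an assumption for illustration; N, M and nRep are example
   values). Because 50 draws of 2 select 100 observations out of only 20,
   some observations must be evaluated more than once.

```matlab
% Illustrative leave-M-out sketch (not crossvalind's own code).
N = 20; M = 2; nRep = 50;       % example values
timesHeldOut = zeros(N,1);      % how often each observation is held out
for r = 1:nRep
    perm = randperm(N);
    test = false(N,1);
    test(perm(1:M)) = true;     % M randomly chosen evaluation observations
    train = ~test;
    timesHeldOut = timesHeldOut + test;
end
% Independent draws: some observations are held out repeatedly, others
% possibly never, so the evaluation sets overlap across repetitions.
disp(timesHeldOut');
```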

   [TRAIN,TEST] = CROSSVALIND('Resubstitution',N,[P,Q]) returns logical
   index vectors for cross-validation of N observations by randomly
   selecting P*N observations for the evaluation set and Q*N
   observations for training. Sets are selected in order to minimize the
   number of observations that are used in both sets. P and Q are scalars
   between 0 and 1. Q=1-P corresponds to holding out (100*P)%, while P=Q=1
   corresponds to full resubstitution. [P,Q] defaults to [1,1] when omitted.
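
   One way to select the two sets with the smallest possible overlap is to
   take the evaluation set from the front of a single random permutation
   and the training set from its back. This is a hypothetical sketch, not
   necessarily how CROSSVALIND does it; N, P and Q are example values.
   The overlap is then MAX(0, nTest + nTrain - N), which is the minimum
   achievable; with P=Q=1 both sets contain every observation.

```matlab
% Illustrative [P,Q] resubstitution sketch (not crossvalind's own code).
N = 10; P = 0.6; Q = 0.7;       % example values
nTest  = round(P*N);            % 6 evaluation observations
nTrain = round(Q*N);            % 7 training observations
perm = randperm(N);
test  = false(N,1);
test(perm(1:nTest)) = true;               % evaluation set: front of perm
train = false(N,1);
train(perm(end-nTrain+1:end)) = true;     % training set: back of perm
% Shared observations: max(0, nTest + nTrain - N) = 3 here.
overlap = sum(train & test);
```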

   [...] = CROSSVALIND(METHOD,GROUP,...) takes the group structure of the
   data into account. GROUP is a grouping vector that defines the class for
   each observation. GROUP can be a numeric vector, a string array, or a
   cell array of strings. The partition of the groups depends on the type
   of cross-validation: for K-fold, each group is divided into K subsets of
   approximately equal size; for all other methods, approximately equal
   numbers of observations from each group are selected for the evaluation
   set. In both cases the training set contains at least one observation
   from each group.
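
   The K-fold case with a GROUP vector amounts to stratification: each
   group is split into K near-equal parts on its own. The sketch below
   assumes a cell array of strings as GROUP and uses UNIQUE to map names
   to integer codes; the data and K are example values, and this is an
   illustration rather than CROSSVALIND's implementation.

```matlab
% Illustrative stratified K-fold sketch (not crossvalind's own code).
group = [repmat({'a'},7,1); repmat({'b'},5,1)];   % example grouping vector
K = 3;                                            % example number of folds
[~, ~, g] = unique(group);      % integer group code for each observation
N = numel(g);
indices = zeros(N,1);
for c = 1:max(g)
    members = find(g == c);                   % observations in group c
    n = numel(members);
    foldsForGroup = mod(0:n-1, K)' + 1;       % near-equal fold labels
    indices(members) = foldsForGroup(randperm(n));  % shuffle within group
end
```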

   [...] = CROSSVALIND(METHOD,GROUP,...,'CLASSES',C) restricts the
   observations to those whose group value is listed in C. C can be a
   numeric vector, a string array, or a cell array of strings, but it must
   be of the same form as GROUP. If one output argument is specified, it
   contains the value 0 for observations belonging to excluded classes. If
   two output arguments are specified, both contain the logical value
   false for observations belonging to excluded classes.
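
   The effect of 'CLASSES' on the outputs can be sketched as follows,
   assuming ISMEMBER is used to find the retained observations. For
   brevity the two-output part is a plain (unstratified) holdout, and all
   data, P and K are example values; this is not the function's own code.

```matlab
% Illustrative 'CLASSES' sketch (not crossvalind's own code).
group = {'Cancer';'Benign';'Control';'Cancer';'Benign';'Control';'Cancer';'Control'};
C = {'Control','Cancer'};       % classes to keep (example)
keep = ismember(group, C);      % true for retained observations
idx = find(keep);
% Two-output form: excluded observations are false in both TRAIN and TEST.
P = 0.5;
nTest = round(P*numel(idx));
shuffled = idx(randperm(numel(idx)));
test = false(size(group));
test(shuffled(1:nTest)) = true;
train = keep & ~test;
% Single-output form (K-fold): excluded observations keep the value 0.
K = 3;
indices = zeros(size(group));
foldLabels = mod(0:numel(idx)-1, K)' + 1;
indices(idx) = foldLabels(randperm(numel(idx)));
```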

   [...] = CROSSVALIND(METHOD,GROUP,...,'MIN',MIN) sets the minimum number
   of observations that each group has in the training set. MIN defaults
   to 1. Setting a large value for MIN can help to balance the training
   groups, but introduces partial resubstitution (some observations appear
   in both the training and evaluation sets) when a group does not have
   enough observations. You cannot set MIN when using K-fold
   cross-validation.
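
   The partial resubstitution caused by MIN can be pictured with the
   sketch below. It assumes that, after a random holdout, any group with
   fewer than MIN training observations has some of its held-out
   observations added back to the training set. This is a hypothetical
   illustration with example data, not CROSSVALIND's actual rule.

```matlab
% Illustrative MIN sketch (not crossvalind's own code).
group = [ones(8,1); 2*ones(3,1)];   % example: group 2 has only 3 observations
MIN = 2;                            % example minimum per group in training
N = numel(group);
perm = randperm(N);
test = false(N,1);
test(perm(1:6)) = true;             % random holdout of 6 observations
train = ~test;
for c = unique(group)'
    members = find(group == c);
    deficit = MIN - sum(train(members));
    if deficit > 0
        % Reuse held-out members for training too (partial resubstitution).
        extra = members(test(members));
        train(extra(1:min(deficit, numel(extra)))) = true;
    end
end
overlap = sum(train & test);        % observations used in both sets
```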

   Examples:

      % Create a 10-fold cross-validation to compute classification error.
      load fisheriris
      indices = crossvalind('Kfold',species,10);
      cp = classperf(species);
      for i = 1:10
          test = (indices == i); train = ~test;
          % Classify the held-out fold using the other nine folds.
          predicted = classify(meas(test,:),meas(train,:),species(train,:));
          classperf(cp,predicted,test);   % accumulate results in cp
      end
      cp.ErrorRate

      % Approximate a leave-one-out prediction error estimate.
      load carbig
      x = Displacement; y = Acceleration;
      N = length(x);
      sse = 0;
      for i = 1:100
          [train,test] = crossvalind('LeaveMOut',N,1);
          yhat = polyval(polyfit(x(train),y(train),2),x(test));
          sse = sse + sum((yhat - y(test)).^2);
      end
      CVerr = sse / 100   % mean squared error over the 100 random draws

      % Hold out 60% of the cancer data for testing, ignoring the 'Benign'
      % observations. Assume groups are the true labels of the observations.
      labels = {'Cancer','Benign','Control'};
      groups = labels(ceil(rand(100,1)*3));
      [train,test] = crossvalind('holdout',groups,0.6,'classes',...
          {'Control','Cancer'});
      sum(test)  % Number of observations allocated for testing
      sum(train) % Number of observations allocated for training

   See also CLASSPERF, CLASSIFY, GRP2IDX, KNNCLASSIFY, SVMCLASSIFY.

Generated on Thu 18-Feb-2010 12:47:47 by m2html © 2003