## Just Do it a Lot and Take the Average!

### 16th February 2016

As part of our MRes course, we are currently producing a poster on an aspect of the Simulation Master Class given by Professor Barry Nelson in January. The aspect that our group have chosen to focus on is Sample Average Approximation (SAA), one of the many methods of Simulation Optimisation currently being researched. As mentioned in Optimisation and Error in Simul?ation, Simulation Optimisation is used when the objective function $f(x)$ cannot be calculated explicitly but can be estimated. The particular cases in which SAA can be applied are those for which $f(x)$ is the expectation of some function that involves a random element, $\xi$, whose distribution is unknown and does not depend on $x$: $$f(x) = E[f(x,\xi)]$$
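To make this setup concrete, here is a minimal Python sketch using a toy objective of my own choosing (not from the paper): $f(x,\xi) = (x-\xi)^2$ with $\xi \sim N(\mu, \sigma^2)$. This toy problem has the closed form $f(x) = E[f(x,\xi)] = (x-\mu)^2 + \sigma^2$, so the Monte Carlo estimate can be checked against the truth.

```python
import random

# Hypothetical toy objective: f(x, xi) = (x - xi)^2 with xi ~ Normal(MU, SIGMA).
# Then f(x) = E[f(x, xi)] = (x - MU)^2 + SIGMA^2, minimised at x* = MU.
MU, SIGMA = 2.0, 1.0

def f_sample(x, xi):
    """One noisy evaluation of the objective."""
    return (x - xi) ** 2

def f_true(x):
    """Closed form of E[f(x, xi)] -- available only because this example is simple."""
    return (x - MU) ** 2 + SIGMA ** 2

random.seed(0)
xis = [random.gauss(MU, SIGMA) for _ in range(100_000)]
x = 1.5
estimate = sum(f_sample(x, xi) for xi in xis) / len(xis)
print(estimate, f_true(x))  # the estimate should be close to f_true(1.5) = 1.25
```

In a real simulation, of course, `f_true` is unavailable and the average over sampled $\xi_i$ is all we have.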

We did not cover SAA in detail in the Master Class, but the poster has made us look at it more closely. In it I have
found a very interesting topic mathematically, though it can be quite restricted in its application. The main paper I
have read was a review of the subject by Sujin Kim, Raghu Pasupathy and Shane G. Henderson, from the National University of
Singapore, Virginia Tech and Cornell University respectively.

The basis of the SAA method is to approximate $f(x)$ by a series of sample functions $f_n(x)$ that
converge uniformly to the true objective function as $n\rightarrow\infty$. For this, we need a sample path of realisations
of the source of randomness, $\boldsymbol{\xi} = (\xi_1,\xi_2,...,\xi_n)$. In practice, $\xi_i$ is the set of pseudorandom
numbers generated by the simulation in the $i^{th}$ run. The sample functions are then defined by
$$f_n(x) = \frac{1}{n}\sum_{i=1}^n f(x,\xi_i).$$
The hope is that as $n$ gets larger and larger, $f_n(x)$ tends to $f(x)$ in some respect. The property
that we are looking for is uniform convergence, but this needs to be linked to the Law of Large Numbers.
Fortunately for us, such a theorem does exist, known as the Uniform Law of Large Numbers (ULLN). This states that if:

- $D$ is a compact set
- $f(x,\xi)$ is continuous at each $x\in D$ for almost all $\xi$ and a measurable function of $\xi$ at each $x$
- there exists a function $d(\xi)$ with finite expectation such that $|f(x,\xi)|\leq d(\xi)$ for all $x\in D$

then $E[f(x,\xi)]$ is continuous in $x$ and
$$\sup_{x\in D} \left|\frac{1}{n}\sum_{i=1}^nf(x,\xi_i) - E[f(x,\xi)]\right|\rightarrow 0$$
almost surely as $n\rightarrow\infty$. This is exactly what we require, and under certain conditions this is exactly what
we get for the sample functions. It is crucial that the ULLN holds as it is this that allows the optimum value of the sample
problems, $v_n^*$, to converge to the true problem's optimal value, $v^*$. Note that if $f(\cdot,\xi)$ is convex on a compact
convex set $D$, the ULLN holds automatically.
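To see the uniform convergence in action, the sketch below uses a toy objective of my own choosing, $f(x,\xi) = (x-\xi)^2$ with $\xi \sim N(0,1)$, so that $f(x) = x^2 + 1$, and estimates the supremum deviation $\sup_{x\in D}|f_n(x) - f(x)|$ over a grid on the compact set $D = [-1,1]$ as $n$ grows.

```python
import random

# Toy objective (my own illustration): f(x, xi) = (x - xi)^2 with xi ~ N(0, 1),
# so the true objective is f(x) = x^2 + 1 on the compact set D = [-1, 1].
random.seed(1)
D = [i / 50 - 1 for i in range(101)]  # grid over D = [-1, 1], step 0.02

def sup_deviation(n):
    """Grid approximation of sup over D of |f_n(x) - f(x)| for one sample path."""
    xis = [random.gauss(0, 1) for _ in range(n)]
    return max(
        abs(sum((x - xi) ** 2 for xi in xis) / n - (x ** 2 + 1))
        for x in D
    )

devs = {n: sup_deviation(n) for n in (100, 10_000)}
print(devs)  # the supremum deviation shrinks as n grows
```

The deviation at $n = 10{,}000$ comes out roughly an order of magnitude smaller than at $n = 100$, consistent with uniform convergence of $f_n$ to $f$.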

The ULLN on its own is not sufficient for the method to give the optimal solution. The guide goes through a rather wonderful
bit of theory that states what conditions are required for this to be true. In essence, we are after some form of convergence
of the sets of optimal points of the sample functions, $\Pi^*_n$, to the set of optimal points of $f(x)$, $\pi^*$. The notion
of distance used is the deviation of a set $A$ from a set $B$: the largest distance from an element of $A$ to the set $B$. Under
particular conditions, $\Pi^*_n$ converges to $\pi^*$ just as desired.
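This one-sided set deviation, $\sup_{a\in A}\inf_{b\in B}|a-b|$, is easy to compute for finite sets; a small sketch, with hypothetical sets chosen purely for illustration:

```python
def deviation(A, B):
    """Largest distance from an element of A to the set B:
    sup over a in A of inf over b in B of |a - b| (finite sets of reals)."""
    return max(min(abs(a - b) for b in B) for a in A)

# Note the asymmetry: the deviation of A from B need not equal that of B from A.
print(deviation({0.0, 3.0}, {0.0, 1.0}))  # 2.0 -- the point 3.0 is far from B
print(deviation({0.0, 1.0}, {0.0, 3.0}))  # 1.0 -- the point 1.0 is near 0.0's partner
```

The asymmetry is deliberate: for SAA we only need every sample-optimal point to end up close to *some* true optimum, not the other way around.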

The best-case scenario from the SAA performance point of view is when $f(x)$ has a single optimum point $x^*$. This is
guaranteed by strict convexity (convexity being what is often strived for in the field of optimisation), but it is not a
requirement. In this situation, given a few more conditions, such as $f(\cdot,\xi)$ and $\nabla_xf(\cdot,\xi)$
being Lipschitz continuous almost surely on the compact set $D$ and $f$ satisfying quadratic growth close to $x^*$,
the distance between the optimal points of $f_n$, $X_n^*$, and $x^*$ is bounded in probability by $n^{-1/2}$. That is, for
any $\epsilon>0$, there exists an $M>0$ such that
$$Pr(\sqrt{n}||X_n^*-x^*||>M)\leq\epsilon$$
for all $n$. This is the best possible performance that SAA can achieve.
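The $n^{-1/2}$ rate can be seen numerically. For the assumed toy objective $f(x,\xi) = (x-\xi)^2$ with $\xi \sim N(0,1)$ (my own example, not from the paper), $f_n$ is minimised exactly at the sample mean, $X_n^* = \bar{\xi}_n$, while $x^* = 0$, so $\sqrt{n}\,\|X_n^*-x^*\|$ should stay bounded rather than grow with $n$:

```python
import random
import statistics

random.seed(2)

def scaled_error(n, reps=100):
    """Average of sqrt(n) * |X_n^* - x^*| over independent sample paths,
    where X_n^* = mean of the xi's is the exact minimiser of f_n here."""
    errs = []
    for _ in range(reps):
        xis = [random.gauss(0, 1) for _ in range(n)]
        x_n_star = statistics.fmean(xis)  # minimiser of the sample-average problem
        errs.append(n ** 0.5 * abs(x_n_star - 0.0))  # x* = 0 for this toy problem
    return statistics.fmean(errs)

e_small, e_large = scaled_error(100), scaled_error(10_000)
print(e_small, e_large)
# both hover near E|N(0,1)| ~ 0.8 rather than growing with n
```

Here $\sqrt{n}\,\bar{\xi}_n$ is exactly $N(0,1)$-distributed, which is why the scaled error stabilises around $\sqrt{2/\pi}\approx 0.8$ at both sample sizes.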

In cases where there are multiple optima, computational limitations mean that, despite the theory, the best we can hope for
is a local optimum of $f(x)$. This is problematic, and means that in practice the method is no better than other simulation
optimisers.

I do have a problem with SAA. Whilst it has some rather nice mathematics behind it, I am not quite sure of the point of
it. It is very limited in its applicability because of the many conditions that are required for it to work at all, never
mind well. The properties required can also be difficult to prove. On top of this, the one thing that was always
reiterated in the paper was that SAA is appropriate when Infinitesimal Perturbation Analysis (IPA) is valid. I discussed
this in my blog post Optimisation and Error in Simul?ation. But Barry Nelson referred
to IPA as a gold standard of Simulation Optimisation. If IPA is so good, why would you use something else when it is
applicable? The paper even concludes that simple SAA does not perform well. However, perhaps alternative approaches based
on the same idea will produce good results in the future.

#### References

[1] *A Guide to Sample-Average Approximation*,
Sujin Kim, Raghu Pasupathy and Shane G. Henderson (2011)