Scenario Generation ft. Copulas

In February, we started the Masterclasses in the STOR-i programme and the second one we had, called “Modelling with Stochastic Programming”, was given by Dr. Stein Wallace from the Norwegian School of Economics. He covered a range of topics within Stochastic Programming and one of them was how to generate scenarios using copula-based methodologies. As I’d never thought of copulas as a tool to generate scenarios, I decided to talk about it, briefly, in this blog post.

Stochastic Programming

Let me start by explaining what Stochastic Programming is. In recent years, Stochastic Programming has become a useful tool for modelling and analysing decision problems under uncertainty. These models are usually based on multivariate probability distributions that express the uncertainty in the input data. Moreover, most applications work with discrete probability distributions, which are described by a list of scenarios (i.e., realisations) and their associated probabilities. Since, in most cases, the multivariate distribution doesn’t come in a form suitable for an optimisation model, we need a process that turns the distribution into scenarios. This process is called Scenario Generation.

Commonly used scenario-generation methods describe the marginal distributions by their first four moments and the multivariate structure by the correlation matrix (usually Pearson’s correlation). However, correlation is limited because it only measures linear relationships. Non-linear dependencies aren’t captured, and no information about the shape of the multivariate distribution is given. For elliptical distributions, such as the normal and Student’s t distributions, this might not be a problem but, for more complex ones, these methods may prove inadequate. That is why we use copulas.
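To make the limitation of correlation concrete, here is a small sketch of my own (not from the Masterclass). It assumes NumPy is available; the variable names and the choice of y = x^2 are purely illustrative. Although y is completely determined by x, the Pearson correlation comes out close to zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# x is standard normal; y depends on x perfectly, but not linearly.
x = rng.standard_normal(100_000)
y = x**2

# Pearson correlation only picks up the linear part of the dependence.
pearson = np.corrcoef(x, y)[0, 1]
print(f"Pearson correlation between x and x^2: {pearson:.3f}")
```

A method that only matches the correlation matrix would treat these two variables as if they were unrelated.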

But what are copulas?

A copula is a function that allows us to separate the multivariate structure from its marginal distributions. Mathematically, we have

F\left(x_{1},\ldots,x_{n}\right) = C\left(F_{1}(x_{1}), \ldots, F_{n}(x_{n})\right),

where F is the n-dimensional cumulative distribution function (cdf) with marginal distributions F_{1},\ldots, F_{n} and C: [0,1]^n \rightarrow [0,1] is the copula. In addition, if F_{i}, \; i=1, \ldots, n, are continuous, C is unique. One important feature of copulas is that they are invariant under strictly increasing transformations of the margins: if we change the margins in this way, the copula won’t change. Consequently, any statistical property that depends only on the copula is unaffected by such a transformation.
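The invariance property can be checked numerically. The sketch below is my own illustration, assuming NumPy; the helper `ranks` and the two transformations (exp and cubing, both strictly increasing) are arbitrary choices. Since the empirical version of a copula depends on the data only through the ranks, it is enough to check that the ranks are the same before and after transforming the margins.

```python
import numpy as np

def ranks(v):
    # Rank 1 is the smallest value; assumes there are no ties.
    return np.argsort(np.argsort(v)) + 1

rng = np.random.default_rng(1)

# Two dependent variables.
x1 = rng.standard_normal(1000)
x2 = 0.8 * x1 + rng.standard_normal(1000)

# Ranks before and after strictly increasing transformations of each margin.
ranks_before = np.vstack([ranks(x1), ranks(x2)])
ranks_after = np.vstack([ranks(np.exp(x1)), ranks(x2**3)])

# Identical ranks mean the dependence structure is unchanged,
# even though the marginal distributions are now very different.
same_copula = np.array_equal(ranks_before, ranks_after)
print(same_copula)
```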

For scenario generation problems, we are interested in the empirical copula, which is given by

C\left(\frac{k_{1}}{n_{S}},\ldots,\frac{k_{n}}{n_{S}}\right)=\frac{|A|}{n_{S}},

where |A| is the cardinality of the set A= \left\{s: \text{rank}(x_{is})\leq k_{i}, \; \forall i \in \{1,\ldots,n\}\right\}, x_{is} is the value of margin i in scenario s, \text{rank}(x_{is}) is its rank among the n_{S} values of margin i, and n_{S} is the number of scenarios.
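The formula above translates almost directly into code. This is a minimal sketch assuming NumPy and no ties in the data; the function names `ranks` and `empirical_copula` and the tiny data set are my own, chosen only for illustration.

```python
import numpy as np

def ranks(v):
    # Rank 1 is the smallest value; assumes there are no ties.
    return np.argsort(np.argsort(v)) + 1

def empirical_copula(x, k):
    """Evaluate C(k_1/n_S, ..., k_n/n_S) = |A| / n_S.

    x : array of shape (n, n_S); x[i, s] is the value of margin i in scenario s.
    k : sequence of n integer rank thresholds k_1, ..., k_n.
    """
    x = np.asarray(x)
    n_s = x.shape[1]
    r = np.array([ranks(row) for row in x])              # rank of x[i, s] within margin i
    in_a = np.all(r <= np.asarray(k)[:, None], axis=0)   # scenario s belongs to the set A
    return in_a.sum() / n_s

# Two margins, four scenarios.
x = np.array([[0.3, 1.2, -0.5, 2.0],
              [10.0, 40.0, 20.0, 30.0]])

print(empirical_copula(x, [2, 2]))  # 0.5: scenarios 1 and 3 have both ranks <= 2
print(empirical_copula(x, [4, 4]))  # 1.0: every scenario qualifies
```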

Enough of maths. What are the advantages of copulas in scenario generation? Well, copulas allow us to decouple the margins from the multivariate structure and thus model the two independently of each other. So, we’re able to generate the margins using standard methods for univariate distributions rather than for multivariate ones. This separation opens up new approaches to scenario generation, such as

  • Combining different copulas and margins
  • Introducing asymmetry
  • Using principal components
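As an illustration of the first item, here is a rough sketch of how one might pair a Gaussian copula with two margins that have nothing to do with the normal distribution. Everything specific in it is my own assumption: the correlation of 0.7, the exponential and uniform margins, and the names `demand` and `price`. It uses NumPy plus `math.erf` for the normal cdf.

```python
import math
import numpy as np

def gaussian_copula_sample(rho, n_s, rng):
    # Draw correlated normals, then push each coordinate through the
    # standard normal cdf to obtain dependent uniforms on [0, 1].
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal(np.zeros(2), cov, size=n_s)
    normal_cdf = np.vectorize(lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0))))
    return normal_cdf(z)  # shape (n_s, 2)

rng = np.random.default_rng(2)
u = gaussian_copula_sample(rho=0.7, n_s=5000, rng=rng)

# Apply the inverse cdf of whichever margin we like to each coordinate.
demand = -np.log1p(-u[:, 0]) / 0.5   # exponential margin with rate 0.5 (mean 2)
price = 10.0 + 10.0 * u[:, 1]        # uniform margin on [10, 20]

scenarios = np.column_stack([demand, price])
print(scenarios[:3])
```

Swapping in another copula only changes how `u` is produced; swapping in other margins only changes the two inverse-cdf lines.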

Method

One possible method is presented in the paper referenced at the end of the post. The authors’ goal is to generate n_{S} samples from a given multivariate distribution of dimension n. The outcomes will, then, have a copula associated with them, called the scenario copula. The steps are briefly presented below:

  1. Create the scenario copula, that is, n_{S} scenarios. Each of the scenarios will consist of ranks of values to use from each of the n margins.
  2. Generate the values of each margin.

Having the copula (multivariate structure) and the values of the margins, we just need to associate the margins in the way specified by the coupling of ranks.
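To show how the coupling of ranks works mechanically, here is a simplified sketch of my own, assuming NumPy. It is not the algorithm from the paper: for step 1, I simply take the ranks of a plain random sample as a stand-in for the scenario copula, and for step 2, I use evenly spaced quantiles of an exponential and a uniform margin. All names and parameter values are hypothetical.

```python
import numpy as np

def ranks(v):
    # 0-based ranks (0 = smallest), convenient for indexing; assumes no ties.
    return np.argsort(np.argsort(v))

def couple(rank_matrix, margin_values):
    # Scenario s takes, for margin i, the value whose rank is rank_matrix[i, s].
    return np.array([np.sort(vals)[r] for r, vals in zip(rank_matrix, margin_values)])

rng = np.random.default_rng(3)
n_s = 200

# Step 1 (stand-in): ranks of a dependent sample define the scenario copula.
z = rng.multivariate_normal([0.0, 0.0], [[1.0, -0.6], [-0.6, 1.0]], size=n_s).T
rank_matrix = np.array([ranks(row) for row in z])   # shape (2, n_s)

# Step 2: generate the values of each margin independently of step 1.
p = (np.arange(n_s) + 0.5) / n_s                    # evenly spaced probability levels
margin_values = [
    -np.log1p(-p) / 0.5,    # exponential quantiles, rate 0.5
    10.0 + 10.0 * p,        # uniform quantiles on [10, 20]
]

# Final step: associate the margin values as specified by the ranks.
scenarios = couple(rank_matrix, margin_values)      # shape (2, n_s)
print(scenarios[:, :3])
```

The resulting scenarios have exactly the margin values from step 2 and exactly the rank structure from step 1, which is the separation the method relies on.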

Conclusion

The use of copulas in scenario generation allows us to combine different approaches since now we’re able to separate the structure from the univariate marginal distributions. For instance, it’s now possible to sample from the distribution to obtain an approximation of the multivariate structure without having to rely on the same sample for the margins. In addition, the margins can be set up with the most suitable methods. Such methods may not be applicable in a multivariate setting.

What now?

If you found this post interesting and want to read more about this topic, the method is presented in this paper

A heuristic approach for generating scenarios for two-stage stochastic programs is presented in the paper

I hope you liked it. See you in my next post!