This document gives a brief introduction to Global Sensitivity Analysis (GSA) and a workflow to perform GSA in R with the SAFE (Sensitivity Analysis For Everybody) toolbox (see References 1-2), using an actuarial pricing model as an example.
We consider both the case where the model is run in R and the case where it is run in a different environment (such as Excel); in the latter case we guide the user through the steps to load into R the simulated model results produced in Excel.
The model is an actuarial pricing model where we test the influence of variations in four inputs (i.e. Frequency Trend, Severity Trend, Exposure Trend and Loss Development Pattern) on the uncertainty of the output (Losses).
But first, a definition: Sensitivity Analysis (SA) is a set of mathematical techniques which investigate how uncertainty in the output of a numerical model can be attributed to variations of its input factors.
Benefits:
Better understanding of the model: evaluation of model behaviour beyond the default set-up.
“Sanity check” of the model: does the model meet the expectations (model validation)?
Prioritization of investments for uncertainty reduction: identify sensitive inputs for computer-intensive calibration, acquisition of new data, etc.
More transparent and robust decisions: understand the main impacts of uncertainty on the modelling outcome and thus on decisions.
Let’s say we want to test how the uncertainty of 4 model inputs (or assumptions) influences the variability of the model output.
An input factor is any element that can be changed before running the model. In general, input factors can be equations implemented in the model, set-up choices needed to execute the model on a computer, parameters, and input data.
In our model example the input factors could be continuous and discrete variables, or the distribution of an input (in which case we want to investigate how changing the distribution of that input influences the uncertainty of the output).
The output can be any variable that is obtained after the model’s execution.
Before evaluating the model, we sample the inputs within their range of variability (Input Sampling step) and then run the model so that in each simulation all 4 inputs vary simultaneously. This yields a probability distribution for every output of interest. Sensitivity analysis with the method of choice is then performed to obtain a set of sensitivity indices for each output: one index per input, showing the relative influence of that input factor on the output (Reference 4).
Which SA method to use depends on the question you want SA to answer.
In general, SA can have three purposes:
Ranking (or Factor Prioritization) to rank the input factors based on their relative contribution to the output variability.
Screening (or Factor Fixing) to identify the input factors, if any, which have a negligible influence on the output variability.
Mapping to determine the region of the input variability space which produces output values of interest (e.g. extreme values).
There are 3 main steps in the GSA workflow:
Input Sampling
Model Evaluation
Post Processing (actual GSA routine)
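The three steps can be sketched in a few lines of base R. This is only an illustration under stated assumptions: `toy_model` is a hypothetical stand-in (not the actuarial model), random uniform sampling replaces the SAFE sampling routine, and the squared correlation is a crude placeholder for a real sensitivity index.

```r
# Minimal base-R sketch of the three GSA steps on a hypothetical toy model
set.seed(1)
n_toy <- 1000   # number of model evaluations
m_toy <- 3      # number of input factors

# Step 1 - Input Sampling: all inputs vary simultaneously in [0, 1]
x_toy <- matrix(runif(n_toy * m_toy), nrow = n_toy, ncol = m_toy)

# Step 2 - Model Evaluation: input 1 dominates, input 3 has no effect at all
toy_model <- function(x) 5 * x[, 1] + 1 * x[, 2] + 0 * x[, 3]
y_toy <- toy_model(x_toy)

# Step 3 - Post Processing: one (crude) index per input,
# here the squared linear correlation between each input and the output
s_toy <- apply(x_toy, 2, function(xi) cor(xi, y_toy)^2)
print(round(s_toy, 2))
```

The ranking of the three values is what matters here; a dedicated GSA method would replace the last step.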
But before starting, there are a few choices to make:
Possibly use more than one SA method to verify the results consistency.
Install and load the packages below.
library(caTools)
library(calibrater) # Install from tar file, also available at: https://people.maths.bris.ac.uk/~mazjcr/calibrater_0.51.tar.gz
library(SAFER) # Install from zip
library(ggplot2)
The Input Sampling step can be done either in R or in Excel. If it is done in Excel, skip to Step 5b.
The distribution of the inputs (DistrFun) and their range (DistrIn) can be chosen by expert judgement, from available data (e.g. portfolio or market data) or from the literature.
DistrFun <- "unif" # Inputs distribution
DistrIn <- list( c(0, 1), c(0, 1), c(0, 1), c(0, 1)) # Range of each input
x_labels <- c("Freq. trend","Sev. trend","Exposure trend", "Dev. pattern") # Name of inputs
The number of model evaluations (N) typically increases in proportion to the number of input factors (M) and also depends on the SA method chosen. As a rule of thumb, the most frugal methods (e.g. Elementary Effect Test) may require more than 10 model evaluations per input factor, and the more expensive methods (e.g. Variance-Based) more than 1000 model evaluations per input factor (see References 3-4 for further details).
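As a quick arithmetic check of this rule of thumb, the sketch below computes the implied lower bounds on N for the 4 inputs of this example. The variable names are illustrative only, and the per-input figures are the approximate thresholds quoted above, not exact requirements.

```r
# Rough lower bounds on the number of model evaluations N for M = 4 inputs
m_inputs <- 4
n_per_input <- c(frugal = 10, expensive = 1000)  # evaluations per input factor
n_lower_bound <- n_per_input * m_inputs
print(n_lower_bound)

# Evaluations per input factor implied by a sample size of 500
print(500 / m_inputs)
```

With 500 samples and 4 inputs there are 125 evaluations per input factor, which sits between the two thresholds.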
SampStrategy <- "lhs" # Sampling strategy for All-At-a-Time (AAT) sampling: Latin hypercube
# (another option is random uniform)
N <- 500 # Sample size
M <- length(DistrIn) # Number of inputs
X_s <- AAT_sampling(SampStrategy, M, DistrFun, DistrIn, N) # Sample inputs space
colnames(X_s) <- x_labels # Set columns names
write.csv(X_s, file = "Input_samples.csv", row.names = FALSE)
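To make the idea behind Latin hypercube sampling concrete, here is a minimal base-R sketch for uniform inputs on [0, 1]. The function `lhs_unif` is a hypothetical helper written for this illustration; it is not the routine used inside `AAT_sampling`, which may differ in detail.

```r
# Minimal Latin hypercube sketch: for each input, [0, 1] is cut into n
# equal-width strata and exactly one point is drawn in each stratum,
# with the strata visited in an independent random order per input.
lhs_unif <- function(n, m) {
  cols <- lapply(seq_len(m), function(j) (sample(n) - runif(n)) / n)
  do.call(cbind, cols)  # n rows (samples) by m columns (inputs)
}

set.seed(42)
x_lhs <- lhs_unif(10, 4)

# Check the stratification: each column has one point per stratum
strata <- apply(x_lhs, 2, function(col) sort(ceiling(col * 10)))
print(strata[, 1])
```

Compared with plain random sampling, this spreads the points evenly along each input axis, which is why it is a common choice when N is small.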
Y <- actuarial_model(X_s) # Where 'actuarial_model' is your chosen model
X1 <- X_s[,1]
X2 <- X_s[,2]
X3 <- X_s[,3]
X4 <- X_s[,4]
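The function `actuarial_model` is not defined in this document. For readers who want to run the workflow end to end before plugging in their own model, the sketch below defines a purely hypothetical stand-in, `actuarial_model_demo`. Every coefficient, the base loss and the projection horizon are invented for illustration and carry no actuarial meaning.

```r
# Hypothetical stand-in for a pricing model (all numbers are invented).
# x: matrix with 4 columns in [0, 1], one row per simulation.
actuarial_model_demo <- function(x) {
  freq_trend <- 0.05 * x[, 1]      # assumed annual frequency trend, 0 to 5%
  sev_trend  <- 0.10 * x[, 2]      # assumed annual severity trend, 0 to 10%
  exp_trend  <- 0.05 * x[, 3]      # assumed annual exposure trend, 0 to 5%
  dev_factor <- 1 + 0.50 * x[, 4]  # assumed development factor, 1.0 to 1.5
  base_loss  <- 1                  # assumed base losses (in million)
  years      <- 3                  # assumed projection horizon
  base_loss * ((1 + freq_trend) * (1 + sev_trend) * (1 + exp_trend))^years * dev_factor
}

set.seed(3)
x_check <- matrix(runif(5 * 4), ncol = 4)
y_check <- actuarial_model_demo(x_check)
print(y_check)
```

The only property that matters for the workflow is the interface: a matrix of sampled inputs goes in, and one output value per row comes out.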
Run the model in Excel. Then load the file with the output simulations (one row per simulation and one column per input sampled and per simulated output).
M <- 4 # Define number of input factors (if model was run in Excel)
DataSA <- read.csv("Results_anonym_500samples.csv", header = TRUE, colClasses = rep("numeric", M + 1)) # 1 output column + M input columns
head(DataSA) # Display first rows to check format
## output X1 X2 X3 X4
## 1 0.12222604 0.25 0.25 0.50 0.2
## 2 0.68218255 1.00 0.00 0.25 0.2
## 3 0.00000000 0.25 0.50 0.50 0.0
## 4 0.01440512 1.00 0.50 0.50 0.6
## 5 0.32099952 0.00 0.50 0.75 0.6
## 6 1.86312468 0.00 1.00 0.75 0.4
If data contain NA or errors, as in this case, remove the corresponding rows.
idxn <- is.na(DataSA$output) # Get index of rows with NA
Y <- DataSA$output[!idxn] # Assign to Y output without NA, do the same for Xi:
X1 <- DataSA$X1[!idxn]
X2 <- DataSA$X2[!idxn]
X3 <- DataSA$X3[!idxn]
X4 <- DataSA$X4[!idxn]
X <- matrix(c(X1,X2,X3,X4), nrow = length(X1), ncol = M)
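The filtering above only drops rows where the output is missing. If the inputs can also contain missing values, one possible alternative is `complete.cases`, which drops a row when any column is NA. The sketch below uses a small made-up data frame (`demo_df`, not the real results file) to show the idea.

```r
# Toy data frame with a missing output (row 2) and a missing input (row 3)
demo_df <- data.frame(output = c(0.12, NA, 0.00, 1.86),
                      X1 = c(0.25, 1.00, 0.25, 0.00),
                      X2 = c(0.25, 0.00, NA, 1.00))

keep_rows <- complete.cases(demo_df)  # TRUE where no column is NA
clean_df <- demo_df[keep_rows, ]

Y_demo <- clean_df$output                       # output vector
X_demo <- as.matrix(clean_df[, c("X1", "X2")])  # input matrix, one column per input
print(nrow(clean_df))
```

Building the input matrix from the cleaned data frame also keeps the rows of the inputs and the output aligned by construction.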
Use the SAFER function scatter_plots to visualise inputs/output.
x_labels <- c("Freq. trend","Sev. trend","Exposure trend", "Dev. pattern")
sz_tx <- 12 # Font size for plots
N <- length(Y) # Get number of samples (now without NA)
colnames(X) <- x_labels # Set column names
scatter_plots(X, Y, prnam = x_labels) + ylab("Losses (in million ?)") +
xlab("Input value") + theme(text = element_text(size=sz_tx))
Question: From these scatter plots, which input factor would you say is most influential? Why?
Are there input factors that are not influential at all?
Let’s now apply Sensitivity Analysis, for example Regional Sensitivity Analysis (RSA), which aims at identifying regions in the input space corresponding to particular regions (e.g. high or low) of the output.
RSA requires sorting the output and then splitting it into different groups. Afterwards, we identify the regions of the input space which produced the output in each group.
So let’s divide the output into n_groups groups, where each group contains the same number of samples.
n_groups <- 5 # Number of groups into which the output is split, default = 10
flag <- 2 # Statistic used to define the RSA index (scalar):
# flag = 1: stat = median (default)
# flag = 2: stat = maximum
rsa_gr <- RSA_indices_groups(X, Y, n_groups, flag)
# Outputs
mvd <- rsa_gr$stat # mvd: maximum vertical distance between CDFs (sensitivity index) (see Steps 10-11)
idx <- rsa_gr$idx # idx: index which divides the output into different groups
Yk <- rsa_gr$Yk # Yk: output limit of each group
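To show what such an index measures, here is a minimal base-R sketch of the grouping idea. It is written from the description in this document and is not the SAFER implementation: `rsa_groups_demo` is a hypothetical helper, and the assumption that the statistic is taken over all pairs of groups may not match `RSA_indices_groups` exactly.

```r
# Sketch of RSA with groups: split samples into equal-size groups by output
# rank, build the empirical CDF of each input within each group, and
# summarise the maximum vertical distances between pairs of group CDFs.
rsa_groups_demo <- function(x, y, n_groups = 5, stat = max) {
  n <- length(y)
  grp <- ceiling(rank(y, ties.method = "first") * n_groups / n)  # group label per sample
  group_pairs <- combn(n_groups, 2)                              # all pairs of groups
  sapply(seq_len(ncol(x)), function(j) {
    xi <- x[, j]
    grid <- sort(unique(xi))  # common grid on which the CDFs are compared
    cdfs <- sapply(seq_len(n_groups), function(g) ecdf(xi[grp == g])(grid))
    mvd_pairs <- apply(group_pairs, 2,
                       function(p) max(abs(cdfs[, p[1]] - cdfs[, p[2]])))
    stat(mvd_pairs)           # e.g. max or median over the pairs
  })
}

# Toy check: the output depends on input 1 only, input 2 is inert
set.seed(7)
x_rsa <- matrix(runif(2000 * 2), ncol = 2)
y_rsa <- x_rsa[, 1]^2
idx_rsa <- rsa_groups_demo(x_rsa, y_rsa, n_groups = 5, stat = max)
print(round(idx_rsa, 2))
```

An index close to 1 means the groups occupy clearly different regions of that input, while an index close to 0 means the input's distribution is roughly the same in every output group.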
Let’s now replot the results with the function scatter_plots, where each group of inputs corresponds to a range of the output (as estimated in Step 8) and is plotted in a different colour.
scatter_plots(X, Y, ngr = n_groups, prnam = x_labels) +
ylab("Losses (in million ?)") + xlab("Input value") +
theme(text = element_text(size=sz_tx))
Here the CDFs of each input are plotted (where the inputs are divided among different groups depending on the range of output they produce, as in Step 8).
RSA_plot_groups(X, idx, Yk, prnam = x_labels) + xlab("Input value") +
theme(text = element_text(size=sz_tx))