This document provides a workflow to apply Global Sensitivity Analysis (GSA) to the PiWind catastrophe model.
Global Sensitivity Analysis is a set of mathematical techniques which investigate how uncertainty in the output of a numerical model can be attributed to variations of its input factors.
The benefits of applying GSA are:
Better understanding of the model: Evaluate the model behaviour beyond default set-up
“Sanity check” of the model: Test whether the model “behaves well” (model validation)
Priorities for uncertainty reduction: Identify the important inputs on which to focus computationally-intensive calibration, acquisition of new data, etc.
More transparent and robust decisions: Understand the main impacts of input uncertainty on the modelling outcome and thus on model-informed decisions
GSA investigates how the uncertainty of the selected model input factors influences the variability of the model output.
An 'input factor' is any element that can be changed before running the model. In general, input factors could be the equations implemented in the model, set-up choices needed for the model execution on a computer, the model's parameters and forcing input data.
In our example of the PiWind model, the input factors subject to GSA will be the wind intensity in the hazard footprint maps; a parameter of the vulnerability curves; and the exposure data in the modelled domain.
An 'output' is any variable that is obtained after the model execution.
In the PiWind model the output is the expected loss due to windstorms.
The key steps of GSA are summarised in the figure below.
Before executing the model, we sample the inputs from their ranges of variability and then run the model repeatedly, so that in each simulation all the inputs vary simultaneously (Input Sampling step). A probability distribution is obtained for every output of interest, after which GSA is performed to obtain a set of sensitivity indices for each output. The sensitivity indices measure the relative influence of each input factor on the output (References 4, 5).
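To make these steps concrete, here is a minimal sketch of the workflow in Python, with a toy `run_model` function standing in for the catastrophe model; the function, the ranges and all names below are illustrative only:

```python
import numpy as np

# Toy stand-in for the catastrophe model: maps a vector of input
# factors to a scalar output (e.g. an expected loss).
def run_model(x):
    return x[0] ** 2 + 2 * x[1] + 0.1 * x[2]

rng = np.random.default_rng(seed=42)
n_samples = 100
lower = np.array([0.0, 0.0, 0.0])  # illustrative ranges of variability
upper = np.array([1.0, 2.0, 3.0])

# Input Sampling step: all inputs vary simultaneously in each simulation.
X = rng.uniform(lower, upper, size=(n_samples, 3))

# Model evaluation: one output value per sampled input combination.
# Y is then an empirical probability distribution of the output,
# from which sensitivity indices are estimated.
Y = np.array([run_model(x) for x in X])
```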
In general, GSA can have three specific purposes:
Ranking (or Factor Prioritization) to rank the input factors based on their relative contribution to the output variability. This makes it possible to prioritise resources for uncertainty reduction, i.e. to know which input factor(s) to focus on.
Screening (or Factor Fixing) to identify the input factors, if any, which have a negligible influence on the output variability. The input factors which are found to have negligible influence can be fixed to their default values.
Mapping to determine the regions of the inputs' variability space which produce output values of interest (e.g. extreme values). For example, this can be useful when you want to know which values of your inputs produce an output below or above a threshold of interest, as in the sketch below.
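A minimal mapping sketch (reusing the illustrative `X` and `Y` from above) just filters the input samples by an output threshold:

```python
import numpy as np

# Mapping: which input combinations produce outputs above a threshold?
threshold = np.percentile(Y, 90)   # e.g. focus on the top 10% of outputs
X_extreme = X[Y > threshold]       # the input sub-sample of interest
print(X_extreme.min(axis=0), X_extreme.max(axis=0))  # region producing extremes
```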
Before applying GSA, let's have an overview of the catastrophe model used here.
PiWind is a wind storm model for a small area of the UK, centred on the town of Melton Mowbray. The data is mocked up to illustrate data formats and functionality, and is in no way meant to be a usable risk model.
Let's plot the area peril cells on a map of the UK. For this model, the area perils form a simple uniform square grid.
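A sketch of such a plot with matplotlib; the coordinates and grid size below are placeholders, not the model's actual values:

```python
import matplotlib.pyplot as plt
import numpy as np

# Placeholder bounds roughly around Melton Mowbray (illustrative only).
lon_min, lon_max = -1.05, -0.65
lat_min, lat_max = 52.65, 52.95
n_cells = 10  # cells per side of the square grid

# Uniform square grid of area peril cells.
lons = np.linspace(lon_min, lon_max, n_cells + 1)
lats = np.linspace(lat_min, lat_max, n_cells + 1)

fig, ax = plt.subplots()
ax.vlines(lons, lat_min, lat_max, colors="grey", linewidth=0.5)
ax.hlines(lats, lon_min, lon_max, colors="grey", linewidth=0.5)
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
ax.set_title("PiWind area peril grid (illustrative)")
plt.show()
```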
The main components of the PiWind model are:
For each event, PiWind generates a hazard footprint, which gives an appropriate hazard metric at each grid point across the entire area affected by the event. In the following we will use as hazard metric the maximum 3-second peak gust experienced at every location during the course of the windstorm (stored as the “footprint hazard”).
As an example, let's visualise the hazard footprints of 5 different events.
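A sketch of such a visualisation, assuming the footprints have been loaded into a long-format pandas DataFrame with columns `event_id`, `x`, `y` and `gust`; the file name and column names are assumptions, not the actual PiWind data format:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical file: one row per (event, grid cell) with the gust value.
footprints = pd.read_csv("footprint_long.csv")

event_ids = footprints["event_id"].unique()[:5]
fig, axes = plt.subplots(1, 5, figsize=(15, 3), sharey=True)
for ax, eid in zip(axes, event_ids):
    # Reshape this event's records into a 2-D grid for plotting.
    grid = (footprints[footprints["event_id"] == eid]
            .pivot(index="y", columns="x", values="gust"))
    im = ax.imshow(grid, origin="lower", cmap="viridis")
    ax.set_title(f"Event {eid}")
fig.colorbar(im, ax=axes, label="3-second peak gust (m/s)")
plt.show()
```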
Vulnerability curves link the hazard metric (here the 3-second peak gust) to a Mean Damage Ratio (MDR), i.e. the proportion of the total value (e.g. in terms of replacement cost) that would be lost for the asset being analysed. In reality, properties exhibit high variability in the damage they sustain from the same hazard, due to many unknown and un-modellable factors, even when located very close to each other. This is accounted for by defining a probability distribution of losses around the mean damage ratio at each hazard point (also known as “secondary uncertainty”).
The PiWind model has separate vulnerability curves for Residential, Commercial and Industrial occupancies. Let's visualise these curves.
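A sketch of such a plot; the sigmoid curves below are hypothetical shapes for illustration only, not the model's actual vulnerability data:

```python
import matplotlib.pyplot as plt
import numpy as np

gust = np.linspace(0, 80, 200)  # 3-second peak gust (m/s)

# Hypothetical sigmoid-shaped vulnerability curves; midpoint and
# steepness differ by occupancy (illustrative values only).
curves = {
    "Residential": 1 / (1 + np.exp(-(gust - 45) / 6)),
    "Commercial":  1 / (1 + np.exp(-(gust - 50) / 7)),
    "Industrial":  1 / (1 + np.exp(-(gust - 55) / 8)),
}
for occupancy, mdr in curves.items():
    plt.plot(gust, mdr, label=occupancy)
plt.xlabel("3-second peak gust (m/s)")
plt.ylabel("Mean Damage Ratio")
plt.legend()
plt.show()
```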
The model's keys lookup is model-specific logic that maps a set of exposure attributes onto the model's grid and vulnerability types. A unique mapping is made for each location, coverage and peril combination. It also provides informative messages about any exposures that will not be modelled; an exposure may not be modelled if there is insufficiently detailed address information, or if it is not within the geographic scope of the model.
To run the model we need some test exposure data. Let's have a look at an example Location and Account file.
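A minimal sketch with pandas; the file names below are assumed stand-ins for the example OED exposure files shipped with PiWind:

```python
import pandas as pd

# Assumed file names for the PiWind example exposure data (OED format).
locations = pd.read_csv("SourceLocOEDPiWind.csv")
accounts = pd.read_csv("SourceAccOEDPiWind.csv")
print(locations.head())
print(accounts.head())
```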
The financial module calculates losses after taking into account the impact of insurance company policy terms and conditions, to provide the net loss that the (re)insurance entity will ultimately be responsible for. The (re)insurance company enters a list of all the policies it has underwritten, with information about the location and risk characteristics, such as occupancy type, age, construction material, building height and replacement cost of the building, as well as policy terms and conditions. The catastrophe model then runs the entire event set across the portfolio and calculates a loss for every event and every policy.
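As a toy illustration of how policy terms transform ground-up losses into net losses, the sketch below applies a simple per-policy deductible and limit; this is a simplification for intuition, not the Oasis financial module's actual implementation:

```python
import numpy as np

def apply_policy_terms(ground_up_loss, deductible, limit):
    """Net loss after a simple deductible and limit (toy example)."""
    return np.clip(ground_up_loss - deductible, 0.0, limit)

# Ground-up losses for one policy across a set of events.
gul = np.array([0.0, 5_000.0, 25_000.0, 120_000.0])
print(apply_policy_terms(gul, deductible=10_000.0, limit=75_000.0))
# Net losses: 0, 0, 15000, 75000
```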
The input factors in this case study are:
footprint input = multiplier applied to the wind intensity in the hazard footprints [range: 0.8 - 1.2]
vulnerability input = multiplier applied to the probability of maximum damage (the probabilities for each intensity are then renormalised) [range: 0.1 - 10]
exposure input = % of buildings whose position is swapped with that of a neighbouring building three cells apart [range: 10 - 50%]
We use random sampling to generate N = 100 combinations of the uncertain input factors, assuming a uniform distribution for each input over the ranges given above. For each sampled input factor combination, we run the PiWind model and save the associated model output, as sketched below.
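A sketch of this sampling-and-evaluation step; `run_piwind` is a placeholder standing in for a full PiWind run with the three perturbations applied (the toy formula inside it only keeps the sketch runnable):

```python
import numpy as np

rng = np.random.default_rng(seed=1)
N = 100

# Uniform samples over each input factor's range of variability.
footprint_mult = rng.uniform(0.8, 1.2, size=N)    # wind intensity multiplier
vuln_mult      = rng.uniform(0.1, 10.0, size=N)   # max-damage probability multiplier
exposure_frac  = rng.uniform(0.10, 0.50, size=N)  # fraction of swapped buildings

X = np.column_stack([footprint_mult, vuln_mult, exposure_frac])

def run_piwind(f_mult, v_mult, e_frac):
    """Placeholder for a perturbed PiWind run returning the expected loss.
    The formula below is a toy stand-in, not the actual model."""
    return 1e6 * f_mult * (1 + 0.1 * v_mult) * (1 + 0.05 * e_frac)

Y = np.array([run_piwind(*x) for x in X])  # one output per sampled combination
```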
Let’s now compute the sensitivity indices: for example we can use a Regional Sensitivity Analysis (RSA) approach.
RSA requires sorting the output samples and splitting them into a certain number of groups (defined by the user). RSA then identifies the sub-samples in the input space that produced the outputs in each group, and computes the cumulative distribution function (CDF) of each sub-sample. Finally, the sensitivity indices are defined as the (mean) maximum vertical distance between the CDFs of the various groups (see schematic below).
Here we divide the output samples into 5 groups, where each group contains the same number of samples.
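A minimal numpy sketch of this RSA procedure, reusing `X` and `Y` from the sampling step above; following the description above, the index is computed here as the mean, over all pairs of groups, of the maximum vertical distance between their CDFs:

```python
import numpy as np

def rsa_indices(X, Y, n_groups=5):
    """Regional Sensitivity Analysis: mean maximum vertical distance
    between the per-group empirical CDFs of each input factor."""
    N, M = X.shape
    order = np.argsort(Y)                     # sort samples by output value
    groups = np.array_split(order, n_groups)  # equally sized groups
    grid = np.sort(X, axis=0)                 # CDF evaluation points per input

    indices = np.zeros(M)
    for i in range(M):
        # Empirical CDF of input i within each group, on a common grid.
        cdfs = np.array([
            np.searchsorted(np.sort(X[g, i]), grid[:, i], side="right") / len(g)
            for g in groups
        ])
        # Maximum vertical distance between every pair of group CDFs,
        # averaged over all pairs.
        dists = [np.max(np.abs(cdfs[a] - cdfs[b]))
                 for a in range(n_groups) for b in range(a + 1, n_groups)]
        indices[i] = np.mean(dists)
    return indices

print(rsa_indices(X, Y, n_groups=5))
```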
Below the CDFs of each group for each input are plotted, along with the sensitivity indices.
In order to assess the robustness of the estimated sensitivity indices, bootstrapping is performed (here we resample 100 times). The 95% confidence intervals of the sensitivity indices are plotted below.
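A sketch of the bootstrap, resampling the input-output pairs with replacement and recomputing the indices each time (reusing `rsa_indices` from above):

```python
import numpy as np

rng = np.random.default_rng(seed=2)
n_boot = 100
boot = np.empty((n_boot, X.shape[1]))
for b in range(n_boot):
    # Resample (input, output) pairs with replacement.
    idx = rng.integers(0, len(Y), size=len(Y))
    boot[b] = rsa_indices(X[idx], Y[idx], n_groups=5)

# 95% confidence intervals of the sensitivity indices.
ci_lower = np.percentile(boot, 2.5, axis=0)
ci_upper = np.percentile(boot, 97.5, axis=0)
print(ci_lower, ci_upper)
```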
The results show that we can confidently say that the vulnerability is the most influential input factor, followed by footprint, and then exposure.
Please cite this document as:
Noacco V., Pianosi F., Wagener T. (2020) Workflow to apply Global Sensitivity Analysis to the PiWind model. Available at: https://safe-insurance.uk/GSA_SAFE_OASIS_PiWind.html
This work has been funded by the UK Natural Environment Research Council (NERC):
KE Fellowship: NE/R003734/1