
Workflow to apply Global Sensitivity Analysis to the PiWind model

Valentina Noacco, Francesca Pianosi, Thorsten Wagener (University of Bristol)

License: MIT

This document provides:

  • a brief introduction to Global Sensitivity Analysis (GSA);
  • an introduction to the OASIS LMF PiWind model, which is a toy UK windstorm model (Reference 1);
  • a workflow to apply GSA to PiWind using the SAFE (Sensitivity Analysis For Everybody) toolbox (References 2-3).

What is Global Sensitivity Analysis, and why should we use it?

Global Sensitivity Analysis is a set of mathematical techniques which investigate how uncertainty in the output of a numerical model can be attributed to variations of its input factors.

The benefits of applying GSA are:

  1. Better understanding of the model: Evaluate the model behaviour beyond default set-up

  2. “Sanity check” of the model: Test whether the model "behaves well" (model validation)

  3. Priorities for uncertainty reduction: Identify the important inputs on which to focus computationally-intensive calibration, acquisition of new data, etc.

  4. More transparent and robust decisions: Understand the main impacts of input uncertainty on the modelling outcome and thus model-informed decisions

How Global Sensitivity Analysis works

GSA investigates how the uncertainty of the selected model input factors influences the variability of the model output.

An 'input factor' is any element that can be changed before running the model. In general, input factors could be the equations implemented in the model, set-up choices needed for the model execution on a computer, the model's parameters and forcing input data.

In our example of the PiWind model, the input factors subject to GSA will be the wind intensity in the hazard footprint maps; a parameter of the vulnerability curves; and the exposure data in the modelled domain.

An 'output' is any variable that is obtained after the model execution.

In the PiWind model the output is the expected loss due to windstorms.


The key steps of GSA are summarised in the figure below.

Before executing the model, we sample the inputs from their ranges of variability and then run the model repeatedly, so that in each simulation all the inputs vary simultaneously (Input Sampling step). This yields a probability distribution for every output of interest. GSA is then performed to obtain a set of sensitivity indices for each output. The sensitivity indices measure the relative influence of each input factor on the output (References 4,5).

[Figure: key steps of GSA]

What can we learn from GSA?

In general, GSA can have three specific purposes:

  1. Ranking (or Factor Prioritization) to rank the input factors based on their relative contribution to the output variability. This helps to prioritise resources for uncertainty reduction, because it shows which input factor(s) to focus on.

  2. Screening (or Factor Fixing) to identify the input factors, if any, which have a negligible influence on the output variability. The input factors which are found to have negligible influence can be fixed to their default values.

  3. Mapping to determine the regions of the inputs' variability space which produce output values of interest (e.g. extreme values). For example, this is useful when you want to know which values of your inputs produce an output below or above a threshold of interest.
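The three purposes can be illustrated with a short NumPy sketch. Everything in it is an assumption made for illustration: the sensitivity indices are made-up numbers for three generic inputs (x1, x2, x3), the screening threshold is arbitrary, and the toy model used for mapping is not PiWind.

```python
import numpy as np

# Made-up sensitivity indices for three generic inputs (illustration only).
names = ["x1", "x2", "x3"]
indices = np.array([0.35, 0.60, 0.05])

# Ranking: order the inputs from most to least influential.
ranking = [names[i] for i in np.argsort(indices)[::-1]]

# Screening: flag inputs whose index falls below a chosen threshold.
screening_threshold = 0.10  # assumption: chosen by the analyst
negligible = [n for n, s in zip(names, indices) if s < screening_threshold]

# Mapping: find which sampled input values give an output above a threshold.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 3))
Y = 3.0 * X[:, 1] + X[:, 0]  # toy model, not PiWind
high = Y > 3.0               # samples with an output of interest
x2_range_for_high_output = (X[high, 1].min(), X[high, 1].max())

print("ranking:", ranking)
print("negligible:", negligible)
print("x2 range giving Y > 3:", x2_range_for_high_output)
```

In this toy case an output above 3 is only possible when x2 exceeds 2/3, which is the kind of information the mapping step is meant to reveal.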

Before applying GSA, let's have an overview of the catastrophe model used here.

PiWind model structure


PiWind is a windstorm model for a small area of the UK, centred on the town of Melton Mowbray. The data is mocked up to illustrate data formats and functionality, and is in no way meant to be a usable risk model.

Let's plot the area peril cells on a map of the UK. For this model, the area perils are a simple uniform grid in a square.

Main components of PiWind:

The main components of the PiWind model are:

1. Hazard footprint maps:

For each event, PiWind generates a hazard footprint, which gives the value of an appropriate hazard metric at each grid point across the entire area affected by the event. In the following, the hazard metric is the maximum 3-second peak gust experienced at each location during the course of the windstorm (stored as “footprint hazard”).

As an example, let's visualise the hazard footprints of 5 different events.

2. Vulnerability curves:

Vulnerability curves link the hazard metric (here the 3-second peak gust) to a Mean Damage Ratio (MDR), i.e. the proportion of the total value (e.g. in terms of replacement cost) that would be lost for the asset being analysed. In reality, properties show a high degree of variability in the damage they sustain from the same hazard, even when located very close to each other, because of many unknown factors that cannot be modelled. This is accounted for by defining a probability distribution of losses around the mean damage ratio at each hazard point (this is also known as “secondary uncertainty”).
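The relationship between the damage distribution and the MDR can be sketched as follows. The damage bins, probabilities and insured value are made-up numbers for a single wind-speed level and are not taken from the PiWind vulnerability files.

```python
import numpy as np

# Hypothetical damage bins (as damage ratios) and their probabilities at one
# wind-speed level; the numbers are made up for illustration.
damage_ratio = np.array([0.05, 0.30, 0.80])
probability = np.array([0.60, 0.30, 0.10])  # sums to 1

# Mean Damage Ratio: probability-weighted average of the damage ratios.
mdr = float(np.dot(probability, damage_ratio))

# Secondary uncertainty: individual properties draw different damage ratios
# from the same distribution, even under the same hazard.
rng = np.random.default_rng(0)
sampled = rng.choice(damage_ratio, size=10_000, p=probability)

insured_value = 250_000.0  # hypothetical replacement cost
expected_loss = mdr * insured_value

print(f"MDR = {mdr:.2f}, sample mean = {sampled.mean():.3f}")
print(f"expected loss = {expected_loss:.0f}")
```

The sampled damage ratios scatter widely around the MDR, while their average converges to it as the number of samples grows.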

The PiWind model has separate vulnerability curves for Residential, Commercial and Industrial occupancies. Let's visualise these curves.

3. Exposure data:

This is model-specific logic that maps a set of exposure attributes onto the model-specific grid and vulnerability type. A unique mapping is made for each location, coverage and peril combination. This also provides informative messages about any exposures that will not be modelled. An exposure may not be modelled if the address information is insufficiently detailed, or if the exposure is not within the geographic scope of the model.

To run the model we need some test exposure data. Let's have a look at an example Location file and an example Account file.

   ACCNTNUM       LOCNUM  POSTALCODE  STATECODE  COUNTYCODE   LATITUDE  LONGITUDE  BLDGSCHEME  BLDGCLASS  OCCSCHEME  ...  WSCV5DED  WSCV6DED  WSCV7DED  WSCV8DED  WSCV9DED  WSCV10DED  WSSITELIM  WSSITEDED  WSCOMBINEDLIM  WSCOMBINEDDED
0     11111  10002082046    LE13 0HL          1           1  52.766981  -0.895470         RMS          0          R  ...         0         0         0         0         0          0          0          0              0              0
1     11111  10002082047    LE13 0HL          1           1  52.766980  -0.895366         RMS          0          R  ...         0         0         0         0         0          0          0          0              0              0
2     11111  10002082048    LE13 0HL          1           1  52.766978  -0.895248         RMS          0          R  ...         0         0         0         0         0          0          0          0              0              0
3     11111  10002082049    LE13 0HL          1           1  52.766961  -0.895474         RMS          0          R  ...         0         0         0         0         0          0          0          0              0              0
4     11111  10002082050    LE13 0HL          1           1  52.766958  -0.895353         RMS          0          R  ...         0         0         0         0         0          0          0          0              0              0

5 rows × 50 columns

   ACCNTNUM  POLICYNUM  POLICYTYPE  UNDCOVAMT     PARTOF  MINDEDAMT  MAXDEDAMT  BLANDEDAMT  BLANLIMAMT
0     11111     Layer1           2     500000    5000000          0          0           0         0.3
1     11111     Layer2           2    5500000  100000000          0          0           0         0.3

4. Financial module:

The financial module calculates losses after taking into account the insurance company's policy terms and conditions, to provide the net loss that the (re)insurance entity will ultimately be responsible for. The (re)insurance company enters a list of all the policies it has underwritten, with information about the location and risk characteristics (such as occupancy type, age, construction material, building height and replacement cost of the building) as well as the policy terms and conditions. The catastrophe model then runs the entire event set across the portfolio and calculates the loss that every event in the model causes to every policy.
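As a rough illustration of how policy terms turn a ground-up loss into a net loss, the sketch below applies a simple deductible and limit. The function `net_loss`, the policy terms and the event losses are hypothetical; they are not the OASIS financial module and do not reproduce the layer structure of the Account file above.

```python
def net_loss(ground_up_loss, deductible, limit):
    """Loss after a simple deductible and limit (hypothetical policy terms)."""
    return min(max(ground_up_loss - deductible, 0.0), limit)


# Made-up ground-up losses for three events hitting one policy.
event_losses = [20_000.0, 150_000.0, 900_000.0]

# One net loss per event, as in "a loss from every event to every policy".
net = [net_loss(x, deductible=50_000.0, limit=500_000.0) for x in event_losses]

print(net)
```

The first event falls entirely within the deductible, the second is reduced by it, and the third is capped by the limit.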

Workflow for the application of GSA to the PiWind model

Step 1: Define input factors subject to GSA

The input factors in this case study are:

  • footprint (wind intensity is varied)
  • vulnerability (probability of maximum damage is varied)
  • exposure (% of buildings with a wrong position is varied)

Step 2: Sample inputs space

We use random sampling to generate N = 100 combinations of the uncertain input factors. To sample the input factors, we define their distribution (uniform) and their ranges (-20% to +20% for footprint, -20% to +20% for vulnerability, and 10% to 50% for exposure).
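A minimal sketch of this sampling step with NumPy is given below. It is not the SAFE toolbox sampling routine: the variable names and the random seed are assumptions, and the bounds are the ones quoted in this step.

```python
import numpy as np

N = 100  # number of input combinations, as stated in the text

# Lower and upper bounds of each uniform distribution (in %).
bounds = {
    "footprint": (-20.0, 20.0),      # change in wind intensity
    "vulnerability": (-20.0, 20.0),  # change in probability of maximum damage
    "exposure": (10.0, 50.0),        # buildings with a wrong position
}

rng = np.random.default_rng(seed=0)  # fixed seed, for reproducibility only
lower = np.array([lo for lo, _ in bounds.values()])
upper = np.array([hi for _, hi in bounds.values()])

# Each row is one input combination; all factors vary simultaneously.
X = rng.uniform(lower, upper, size=(N, len(bounds)))

print(X.shape)
print(X[:3])
```

Each row of `X` is then passed to one model run, so that N model runs are needed in total.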

Examples of varying the input factors


Step 3: Run the model with the perturbed inputs

For each sampled combination of input factors, we run the PiWind model and save the associated model output.
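The structure of this step is a simple loop over the input samples. In the sketch below, `run_model` is a hypothetical stand-in that returns a made-up loss, so that the example runs without PiWind; in the real workflow it would be replaced by a call that perturbs the model files and executes PiWind.

```python
import numpy as np


def run_model(footprint_mult, vulnerability_mult, exposure_pct):
    """Hypothetical stand-in for one PiWind run; returns a made-up loss."""
    return (1e6 * footprint_mult**2 * np.sqrt(vulnerability_mult)
            * (1.0 - 0.002 * exposure_pct))


rng = np.random.default_rng(0)
N = 100
X = np.column_stack([
    rng.uniform(0.8, 1.2, N),    # footprint multiplier
    rng.uniform(0.8, 1.2, N),    # vulnerability multiplier (illustrative range)
    rng.uniform(10.0, 50.0, N),  # exposure, % of buildings moved
])

# One model run per input combination; outputs are stored in sample order.
Y = np.empty(N)
for i, (fp, vu, ex) in enumerate(X):
    Y[i] = run_model(fp, vu, ex)

print(Y.shape, Y.min(), Y.max())
```

Keeping `X` and `Y` in the same row order is what allows the later steps to link each output value back to the inputs that produced it.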

Step 4: Check model behaviour by visualising input/output samples

Load input and output data

footprint input = multiplier to wind intensity [range: 0.8 - 1.2]

vulnerability input = multiplier to the probability of maximum damage (then the probabilities for each intensity are normalised) [range: 0.1 - 10]

exposure input = % of buildings whose position is swapped with that of a neighbouring building three cells away [range: 10 - 50%]
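The vulnerability perturbation (multiply the probability of maximum damage, then normalise) can be sketched as below. The matrix layout is an assumption: one row per intensity level, one column per damage bin, with the last column holding the maximum-damage bin. The function name and the numbers are illustrative and are not the PiWind file format.

```python
import numpy as np


def perturb_max_damage(prob, multiplier):
    """Scale the probability of the maximum-damage bin, then renormalise.

    Assumes `prob` has one row per intensity level and one column per damage
    bin, with the last column being the maximum-damage bin.
    """
    out = np.asarray(prob, dtype=float).copy()
    out[:, -1] *= multiplier
    # Renormalise so that each intensity level is again a probability distribution.
    return out / out.sum(axis=1, keepdims=True)


# Made-up damage probabilities for two intensity levels and three damage bins.
prob = np.array([[0.7, 0.2, 0.1],
                 [0.2, 0.5, 0.3]])
perturbed = perturb_max_damage(prob, multiplier=2.0)

print(perturbed)
print(perturbed.sum(axis=1))
```

A multiplier above 1 shifts probability mass towards maximum damage at every intensity level, and a multiplier below 1 shifts it away.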

Step 5: Compute sensitivity indices with Regional Sensitivity Analysis

Let’s now compute the sensitivity indices: for example we can use a Regional Sensitivity Analysis (RSA) approach.

RSA requires sorting the output samples and then splitting them into a certain number of groups (defined by the user). RSA then identifies the sub-samples in the input space that produced the outputs in each group and computes the cumulative distribution function (CDF) of each sub-sample. Finally, the sensitivity indices are defined as the (mean) maximum vertical distance between the CDFs of the various groups (see schematic below).

[Figure: schematic of Regional Sensitivity Analysis]

Here we divide the output samples into 5 groups, where each group contains the same number of samples.
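The RSA procedure with equal-size groups can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the SAFE toolbox implementation: the function name `rsa_group_indices`, the toy model, and the choice to average the maximum CDF distance over all pairs of groups are assumptions made here.

```python
import numpy as np


def rsa_group_indices(X, Y, n_groups=5):
    """Mean over group pairs of the maximum vertical distance between CDFs."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    order = np.argsort(Y)                      # sort samples by output value
    groups = np.array_split(order, n_groups)   # (almost) equal-size groups
    indices = np.zeros(X.shape[1])
    for i in range(X.shape[1]):
        grid = np.sort(X[:, i])                # points where CDFs are evaluated
        # Empirical CDF of input i within each output group.
        cdfs = [np.searchsorted(np.sort(X[g, i]), grid, side="right") / len(g)
                for g in groups]
        # Maximum vertical distance for every pair of groups.
        dists = [np.max(np.abs(cdfs[a] - cdfs[b]))
                 for a in range(n_groups) for b in range(a + 1, n_groups)]
        indices[i] = np.mean(dists)
    return indices


# Toy example (not PiWind): the output depends strongly on x1, weakly on x2,
# and not at all on x3.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(500, 3))
Y = 5.0 * X[:, 0] + X[:, 1]
indices = rsa_group_indices(X, Y, n_groups=5)

print(indices)
```

An input with no influence gives nearly overlapping CDFs across the groups and therefore a small index, while an influential input gives clearly separated CDFs and an index close to 1.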

Below the CDFs of each group for each input are plotted, along with the sensitivity indices.

Step 6: Assess robustness by bootstrapping

In order to assess the robustness of the estimated sensitivity indices, bootstrapping is performed (here we resample 100 times). The 95% confidence intervals of the sensitivity indices are plotted below.
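The bootstrapping step can be sketched generically as follows. The helper `bootstrap_ci` and the toy data are assumptions, and `abs_correlation` is only a stand-in sensitivity measure used to keep the example short; in the workflow above the RSA indices would be passed in its place.

```python
import numpy as np


def bootstrap_ci(index_fn, X, Y, n_boot=100, level=0.95, seed=0):
    """Bootstrap mean and confidence bounds of the indices from index_fn."""
    rng = np.random.default_rng(seed)
    n = len(Y)
    reps = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        reps.append(index_fn(X[idx], Y[idx]))
    reps = np.array(reps)
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(reps, [alpha, 1.0 - alpha], axis=0)
    return reps.mean(axis=0), lower, upper


def abs_correlation(X, Y):
    """Stand-in sensitivity measure: |correlation| of each input with Y."""
    return np.array([abs(np.corrcoef(X[:, i], Y)[0, 1])
                     for i in range(X.shape[1])])


# Toy data (not PiWind): strong effect of x1, weak effect of x2, none of x3.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(200, 3))
Y = 5.0 * X[:, 0] + X[:, 1]
mean, lower, upper = bootstrap_ci(abs_correlation, X, Y, n_boot=100)

for name, m, lo, hi in zip(["x1", "x2", "x3"], mean, lower, upper):
    print(f"{name}: {m:.2f} [{lo:.2f}, {hi:.2f}]")
```

When the confidence intervals of two inputs do not overlap, their ranking can be considered robust to the particular sample used.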

The results show with confidence that vulnerability is the most influential input factor, followed by footprint and then exposure.

References

Please cite this document as:

Noacco V., Pianosi F., Wagener T. (2020) Workflow to apply Global Sensitivity Analysis to the PiWind model. Available at: https://safe-insurance.uk/GSA_SAFE_OASIS_PiWind.html

Acknowledgements

This work has been funded by the UK Natural Environment Research Council (NERC):

KE Fellowship: NE/R003734/1