My colleague Claire Souch recently discussed the most important step in model blending: individual model validation. Once models are found suitable—capable of modeling the risks and contracts you underwrite, suited to your claims history and business operations, and well supported by good science and clear documentation—why might you blend their output?
Blending Theory
In climate modeling, the use of multiple models in “ensembles” is common. No single model provides the absolute truth, but individual models’ biases and eccentricities can be partly canceled out by blending their outputs.
This same logic has been applied to modeling catastrophe risk. As Alan Calder, Andrew Couper, and Joseph Lo of Aspen Re note, blending is most valid when there are “wide legitimate disagreements between modeling assumptions.” Blending cannot reduce the uncertainty that comes from all models relying on a common, limited historical dataset, nor the uncertainty associated with inherent randomness; it can, however, reduce the uncertainty arising from differing modeling assumptions and input data.
Caution is necessary, however. Weather forecasting benefits from many widely accepted and adopted models, so by the law of large numbers blending reduces error. In catastrophe modeling, by contrast, far fewer points of view are available and easily accessible. A blended view therefore runs a greater risk of being skewed by an outlier, so users must validate models and choose their weights carefully.
Blending Weights
Users have four basic choices for using multiple valid models:
- Blend models with equal weightings, without determining if unequal weights would be superior
- Blend models with unequal weightings, with higher weights on models that match claims data better
- Blend models with unequal weightings, with higher weights on models with individual components that are deemed more trustworthy
- Use one model, optionally retaining other models for reference points
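Mechanically, the first three options all come down to a weighted average of each model's output. The sketch below illustrates this with invented model names, per-event losses, and weights; it is not any vendor's actual methodology.

```python
# Illustrative sketch of blending per-event modeled losses from two
# hypothetical cat models. All names and figures are invented.

def blend_losses(model_losses, weights=None):
    """Weighted average of modeled losses, event by event.

    model_losses: dict mapping model name -> list of losses (same event order).
    weights: dict mapping model name -> weight; defaults to equal weighting.
    """
    names = list(model_losses)
    if weights is None:
        # Option 1: equal weightings, no judgment on which model is "better"
        weights = {name: 1.0 / len(names) for name in names}
    total = sum(weights[n] for n in names)
    n_events = len(model_losses[names[0]])
    return [
        sum(weights[n] * model_losses[n][i] for n in names) / total
        for i in range(n_events)
    ]

# Hypothetical per-event losses (in $m) from two models for the same events
losses = {"model_a": [10.0, 40.0, 120.0], "model_b": [14.0, 30.0, 150.0]}

equal = blend_losses(losses)                                    # equal weights
unequal = blend_losses(losses, {"model_a": 0.7, "model_b": 0.3})  # option 2/3
```

The interesting work is not in this arithmetic but in justifying the weights, which is what the approaches below address.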
On the surface, equal weightings might seem like the least biased approach; the user is making no judgment as to which model is “better.” But reasoning out each model’s strengths is precisely what should occur in the validation process. If the models match claims data equally well and seem equally robust, equal weights are justified. However, blindly averaging losses does not automatically improve results, particularly with so few models available.
Users could determine weights based on the historical accuracy of the model. In weather forecasting, this is referred to as “hindcasting.” RMS’ medium-term rate model, for example, is actually a weighted average of thirteen scientific models, with higher weights given to models demonstrating more skill in forecasting the historical record.
Similarly, cat model users can compare the modeled loss from an event with the losses actually incurred. This requires detailed claims data and users with a strong statistical background, but does not require a deep understanding of the models. An event-by-event approach can find weaknesses in the hazard and vulnerability modules. However, even longstanding companies lack a long history of reliable, detailed claims data to test a model’s event set and frequencies.
Weights could also differ because of the perceived strengths of model components. Using modelers’ published methodologies and model runs on reference exposures, expert users can score individual model components and aggregate those scores into an overall measure of the model’s trustworthiness. This requires strong scientific understanding, but the resulting weights can be applied consistently across the company, because a model’s credibility is independent of the exposure.
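Such component scoring might be aggregated as sketched below. The component names, expert scores, and importance weights are all invented for illustration; any real scheme would define its own scoring rubric.

```python
# Illustrative sketch (invented scores and weights): aggregating expert
# scores for individual model components (e.g. hazard, vulnerability,
# financial modules) into one per-model credibility score.

def aggregate_score(component_scores, component_weights):
    """Weighted average of expert scores (here on a 0-10 scale)."""
    total_w = sum(component_weights.values())
    return sum(
        component_weights[c] * score for c, score in component_scores.items()
    ) / total_w

# Hypothetical expert scores for one model's components
scores = {"hazard": 8.0, "vulnerability": 6.0, "financial": 9.0}
importance = {"hazard": 0.5, "vulnerability": 0.3, "financial": 0.2}

credibility = aggregate_score(scores, importance)  # ~7.6 on a 0-10 scale
```

Because the score depends only on the model itself, not on any particular portfolio, it can be computed once and reused across the company.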
Finally, users may simply choose not to blend, instead occasionally running a second or third model and investigating when its results differ materially from the primary model’s.
So what to do?
Ultimately, each risk carrier must weigh its own risk appetite and resources when choosing whether to blend multiple models. No approach is definitively superior. However, all users should recognize that blending affects the integrity of modeled losses; in our next blog, we’ll discuss why this happens and how these effects vary with the chosen blending methodology.