Paper Title
A Data Science Approach to Risk Assessment for Automobile Insurance Policies
Paper Authors
Abstract
To determine a suitable premium for an automobile insurance policy, one needs to take three factors into account: the risk associated with the drivers and cars on the policy, the operational costs of managing the policy, and the desired profit margin. The premium should then be some function of these three values. We focus on risk assessment using a Data Science approach. Instead of using the traditional frequency and severity metrics, we predict the total claims that a new customer will make, using historical data from current and past policies. Given multiple features of a policy (age and gender of the drivers, value of the car, previous accidents, etc.), one can try to provide personalized insurance policies based specifically on these features, as follows. We can compute the claims made per year for every past and current policy with identical features and then average these claim rates. Unfortunately, there may not be enough samples to obtain a robust average. We can instead include policies that are "similar" in order to obtain enough samples for a robust average. We therefore face a trade-off between personalization (using only closely similar policies) and robustness (extending the domain far enough to capture sufficient samples). This is known as the Bias-Variance Trade-off. We model this problem, determine the optimal trade-off between the two (i.e., the balance that provides the highest prediction accuracy), and apply it to the claim rate prediction problem. We demonstrate our approach using real data.
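The trade-off described in the abstract can be illustrated with a minimal sketch. It is not the authors' model: it assumes that "similar" policies are the k nearest ones under Euclidean distance, that the prediction is the plain average of their claim rates, and that k is chosen by lowest mean squared error on held-out policies. The function names, the two features, and the data are hypothetical; the data are synthetic and stand in for the real data mentioned in the abstract.

```python
import random


def distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def predict_claim_rate(features, rates, query, k):
    """Average the claim rates of the k policies closest to `query`.

    Small k = more personalized (lower bias, higher variance);
    large k = more robust (higher bias, lower variance).
    """
    order = sorted(range(len(features)),
                   key=lambda i: distance(features[i], query))
    nearest = order[:k]
    return sum(rates[i] for i in nearest) / len(nearest)


def choose_k(train_x, train_y, valid_x, valid_y, candidates):
    """Return the candidate k with the lowest MSE on held-out policies."""
    def mse(k):
        errors = [(predict_claim_rate(train_x, train_y, q, k) - y) ** 2
                  for q, y in zip(valid_x, valid_y)]
        return sum(errors) / len(errors)
    return min(candidates, key=mse)


# Synthetic stand-in data (assumption): each policy is
# (driver age in years, car value in thousands); claim rates are made up.
rng = random.Random(0)
policies = [(rng.uniform(18, 80), rng.uniform(5, 50)) for _ in range(300)]
claims = [max(0.0, 2.0 - 0.02 * age + 0.01 * value + rng.gauss(0, 0.5))
          for age, value in policies]

# Hold out the last 100 policies to score each neighbourhood size.
train_x, valid_x = policies[:200], policies[200:]
train_y, valid_y = claims[:200], claims[200:]

best_k = choose_k(train_x, train_y, valid_x, valid_y,
                  candidates=[1, 5, 20, 50, 200])
print("selected neighbourhood size:", best_k)
```

With k = 1 the prediction relies on the single most similar policy, while k = 200 averages over every training policy and ignores the features entirely. The validation error picks a point between these extremes. In practice the features would need to be put on comparable scales before distances are computed, which this sketch skips.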