🏆 Announcing our community engagement prizes!
🗨 Join the office hours on Discord! (Wednesday 2PM CET)
🔦 Overview
In this challenge, you will act as an insurance company, where you build a pricing model and compete against other players (other insurance companies) for profit. In other words, the player that maximises competitive profit is the winner.
The market in this challenge will be a cheapest-wins market. That means every insurance company offers every customer an annual premium price, and the customer will always pick the company that offers them the cheapest price (e.g., using a price comparison website).
In order to create your pricing model, you are given historical insurance data: 60K real historical car insurance policies for 4 consecutive years. Each policy concerns one vehicle, its drivers, and an accident history over the 4 years. This data has been provided by a large car insurance provider in a European country and is a uniform sample from their entire portfolio.
You are asked to produce a model to price contracts for incoming policies for the 5th year.
The company that makes the most profit in this market wins the challenge.
💵 Cheapest-wins market
As a player or team you represent one insurance company. You will have to provide a premium quote for every policy (or customer) you encounter, and so does every other company. But the customer will pick the cheapest price offered to them. This is illustrated below:
Now that companies 1 and 2 each have a set amount of revenue, they have to pay out the cost of the claims associated with policies 1-4. So it will look like:
So, once the claims are taken into account in the cheapest-wins market, we can see that Company 1 wins as it has the most competitive profit.
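To make the mechanics concrete, below is a minimal sketch of a single cheapest-wins market (the numbers are made up for illustration; this is not the official evaluation code): each policy goes to the company quoting the lowest premium, and that company then pays any claims on the policies it won.

```python
import numpy as np

# Premiums quoted by three illustrative companies for four policies
# (rows: companies, columns: policies). All numbers are made up.
premiums = np.array([
    [100.0, 120.0,  90.0, 135.0],   # Company 1
    [110.0, 100.0, 130.0, 140.0],   # Company 2
    [130.0, 125.0, 120.0, 160.0],   # Company 3
])
claims = np.array([0.0, 80.0, 50.0, 200.0])  # actual claim cost of each policy

winners = premiums.argmin(axis=0)            # cheapest quote wins each policy
profit = np.zeros(premiums.shape[0])
for policy, company in enumerate(winners):
    # The winner collects the premium it quoted and pays the policy's claims.
    profit[company] += premiums[company, policy] - claims[policy]

print(profit)   # [75. 20. 0.] -- Company 1 ends up with the most competitive profit
```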
🏇 Leaderboards
As an insurance company in this market, you will be responsible for a portfolio of policies that pay you annual premiums. In exchange, you cover their risk. If they make a claim, you will have to pay.
Therefore, to make money, you must:
- Estimate the expected loss for your portfolio, so your contracts are profitable
- Come up with a pricing strategy that allows you to compete with others, so that you win contracts
To reach these goals we provide you with two leaderboards.
Root Mean Squared Error (RMSE) leaderboard
This leaderboard always displays your best RMSE submission.
It uses your predict_expected_claim function and measures how well you can estimate the risk of each policy. It does this by computing the root mean squared error (RMSE) between your premiums and the actual cost of the claims.
RMSE is minimised by the model that predicts the expected claim of each contract most accurately.
This allows you to compare the quality of your loss-prediction model with your competitors'; however, there is no explicit reward for performing well on this leaderboard.
This leaderboard is refreshed immediately upon submission, and it always uses the same data, disjoint from the data used for the other leaderboard.
Please note: your RMSE score is computed in 4 stages. When you submit a model, your model makes predictions for:
- Year 1 with access to data from year 1.
- Year 2 with access to data from years 1 - 2.
- Year 3 with access to data from years 1 - 3.
- Year 4 with access to data from years 1 - 4.
Predictions from steps 1 - 4 are then used in the standard RMSE formula to compute your final RMSE score.
In this way you are expected to use past data to inform the present; for example, predictions for year 3 can be informed by what happened in years 1 - 2.
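The staged computation can be pictured with the sketch below. The single-argument predict_expected_claim signature and the `year` / `claim_amount` column names are simplifying assumptions of ours; the official interface is defined in the starter notebooks.

```python
import numpy as np
import pandas as pd

def staged_rmse(data: pd.DataFrame, predict_expected_claim) -> float:
    """Sketch of the 4-stage RMSE evaluation: predictions for year t may
    only use information from years 1..t (column names are assumptions)."""
    preds, actuals = [], []
    for year in (1, 2, 3, 4):
        visible = data[data["year"] <= year]          # data available at this stage
        current = visible[visible["year"] == year]    # policies to price in this year
        preds.append(np.asarray(predict_expected_claim(current)))
        actuals.append(current["claim_amount"].to_numpy())
    y_pred, y_true = np.concatenate(preds), np.concatenate(actuals)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```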
Competitive profit leaderboard (Updated every Saturday at 10pm CET)
By default this leaderboard uses your most recent successful submission, but you can choose another submission through this form, which is also linked at the top of the leaderboards.
This leaderboard uses your predict_premium function. It measures your average competitive profit in a market of size 10 when playing against other players.
- Each week uses a new set of data to ensure that you don't price the same policy many times, like a real market.
- This measures competitive profit. That means your profit is averaged over many markets that you play in.
- To make sure that results are stable, we keep putting you in markets until your leaderboard rank no longer changes from market to market.
Note: This leaderboard is updated every Saturday at 10PM CET with your most recent submission.
If you do consistently well on this leaderboard, you will likely do well in the final evaluation.
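A common pattern is to build predict_premium on top of your expected-claim model and add a loading (profit margin), as in the sketch below. The single-DataFrame signature and the 20% loading are illustrative assumptions; the real function signatures are defined in the starter kit, and choosing the loading is the heart of your pricing strategy.

```python
def predict_premium(X, predict_expected_claim):
    """Illustrative pricing strategy: expected claim plus a fixed 20% loading.

    A larger loading means more profit per policy you win but fewer policies
    won in a cheapest-wins market; a smaller loading wins more policies at
    thinner (possibly negative) margins."""
    return predict_expected_claim(X) * 1.2
```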
⚖ Evaluation metric
The final evaluation metric for this challenge is competitive profit: how much money your company makes in a realistic market. However, we also provide a leaderboard based on root mean squared error.
Root mean squared error (RMSE)
Given a set of observed claims \(y_1, \dots, y_N\) and a set of expected claims \(\hat{y}_1, \dots, \hat{y}_N\) predicted by your model, the RMSE is computed as:
\(\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2}\)
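As a quick numeric check of the formula, here is a minimal NumPy sketch (the claim amounts are invented for illustration):

```python
import numpy as np

y_true = np.array([0.0, 500.0, 0.0, 1200.0])    # observed claim amounts
y_pred = np.array([100.0, 400.0, 50.0, 900.0])  # your predicted expected claims
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
print(rmse)   # approximately 167.7
```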
Competitive profit (the final metric)
The evaluation process is as follows:
- Compute average profit rank. First the average competitive profit that your model makes in a market of size 10 (i.e. with 9 other random players) is computed. This gives your model a profit rank.
- Compute realistic competitive profit. In a realistic market, models that don't perform well don't exist (i.e. they go bankrupt). So to compute the realistic competitive profit, we place your model in a market of size 10 with 9 other models picked from the top 10% of the ranking obtained in step 1.
Two important notes:
- The profit rank in step 1 is not used in the leaderboard. Only the ranking in step 2 is used in the leaderboard.
- Rankings from both steps are generated as a result of a large number of runs of different random markets. On average, you can expect your model to have competed against every other model present at least once.
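The two-step procedure can be sketched on synthetic data as follows. Everything here (the synthetic claims, the number of markets, and representing each "model" as a fixed vector of premium quotes) is our own simplification for illustration; only the overall structure mirrors the description above: step 1 ranks models against random opponents, step 2 re-evaluates against opponents from the top 10%.

```python
import numpy as np

rng = np.random.default_rng(0)
n_models, n_policies, market_size = 100, 100, 10

# Each "model" is reduced to a fixed vector of premium quotes for the same policies.
claims = rng.exponential(100.0, size=n_policies)
premiums = claims * rng.uniform(0.8, 1.5, size=(n_models, n_policies))

def market_profit(participants):
    """Profit of each participant in one cheapest-wins market."""
    quotes = premiums[participants]                       # (market_size, n_policies)
    winners = quotes.argmin(axis=0)                       # cheapest quote wins each policy
    won = winners[None, :] == np.arange(len(participants))[:, None]
    return ((quotes - claims) * won).sum(axis=1)

def average_profit(model, pool, n_markets=200):
    """Average competitive profit of `model` against opponents drawn from `pool`."""
    rivals_pool = [m for m in pool if m != model]
    total = 0.0
    for _ in range(n_markets):
        rivals = rng.choice(rivals_pool, size=market_size - 1, replace=False)
        total += market_profit([model, *rivals])[0]
    return total / n_markets

everyone = list(range(n_models))

# Step 1: profit rank against random opponents.
step1 = sorted(everyone, key=lambda m: average_profit(m, everyone), reverse=True)

# Step 2: realistic competitive profit, against opponents from the top 10% of step 1.
survivors = step1[: n_models // 10]
step2 = {m: average_profit(m, survivors) for m in everyone}
```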
🚓 Market rules
There are two rules that your submissions to the profit leaderboard, and your final submission, must follow:
- Non-negative training profit. Your models must be profitable on the training data. That is, the sum of your premiums must not be less than the sum of the claims.
- Participation rule. Your model must participate (i.e. win at least 1 policy) in 5% or more of the markets it is placed in.
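The first rule is easy to check before you submit. The sketch below assumes the training data has a `claim_amount` column and that `premiums` holds one premium per training row; the official validation may differ in detail.

```python
import pandas as pd

def check_training_profit(training_data: pd.DataFrame, premiums) -> None:
    """Rule 1 sketch: total premiums charged on the training data must be
    at least the total claims (column name is an assumption)."""
    total_premiums = float(pd.Series(premiums).sum())
    total_claims = float(training_data["claim_amount"].sum())
    if total_premiums < total_claims:
        raise ValueError(
            f"Non-negative training profit rule violated: "
            f"premiums {total_premiums:.0f} < claims {total_claims:.0f}"
        )
```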
📊 Weekly Market Feedback
In a real insurance market, every time you participate you get some feedback. In this game, thousands of markets are run each week, and you will get feedback about your performance in those markets.
You get two types of feedback:
- A plot and some KPIs
- Summary statistics about policies you have won
Below you can see examples of both of these. For more details on how they are computed please see here.
Example feedback plot (See here for details)
One example of the six feedback tables (see here for details)
💾 Dataset
You can download the dataset from the resources tab.
The dataset contains a total of 100K real historical car insurance policies over 5 years in the recent past.
This has been provided by a large car insurance provider in a European country and is a uniform sample from their entire portfolio.
You can find the data dictionary under the resources tab.
The majority of the data concerns third-party liability, but other types of car insurance (e.g. theft) are also present.
For this challenge, the data is split in the following way:
Training data
This is 60K policies with 4 years of history (~240K rows). It can be downloaded from the resources tab.
RMSE leaderboard
This contains 5K policies with 4 years of history (~20K rows).
10 weekly profit leaderboards
This contains a total of 30K policies with 4 years of history (~115K rows). It is split into 10 weeks such that:
- Weeks 1 - 5 each use approximately 7K rows of data from 15K policies with 4 years of history
- Weeks 6 - 10 each use approximately 20K rows of data from 30K policies with 4 years of history
No row of the data appears twice throughout the 10 weeks of leaderboards.
Test data
The final test dataset, where the final evaluation takes place, includes 100K policies for the 5th year (100K rows). To simulate a real insurance company, your training data will contain the history for some of these policies, while others will be entirely new to you.
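Assuming the training data comes as a CSV (the file name and column names below are illustrative; check the data dictionary in the resources tab), a first look at the 4-year panel might be:

```python
import pandas as pd

df = pd.read_csv("training.csv")   # file name assumed; download from the resources tab

print(df.shape)                                   # roughly 240K rows: 60K policies x 4 years
print(df["year"].value_counts().sort_index())     # one row per policy per year (column name assumed)
print(df.groupby("id_policy")["claim_amount"]     # total claims per policy over 4 years
        .sum().describe())                        # (column names assumed)
```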
📨 How to submit
There are two methods of submission available in this challenge:
- Submission through Google Colaboratory notebooks.
- Submission from a .zip file.
Submission via Colaboratory notebooks
- Python. Visit the starter notebook here.
- R. Visit the starter notebook here. Here is an end-to-end walkthrough for a simple R model.
The notebooks are self-contained and submissions are made through the notebooks themselves.
Submission via a ZIP file
Visit the starter-kit available here and follow the instructions.
Once you have prepared your .zip file, you can then submit it using the link in the top right corner of the page here.
Note: baseline models are automatically excluded from the profit leaderboards.
🏆 Prizes
| | Amount | Prize sponsor |
|---|---|---|
| 1st | $6000 USD | |
| 2nd | $3500 USD | |
| 3rd | $1500 USD | |
| 4th | $1000 USD | |
Additional community engagement prizes
In addition to the 4 cash prizes for performance in the challenge, we also have community engagement prizes, which are:
- A shiny mug
- A great T-shirt
For more information on how to win these, please see here.
📅 Timeline
Launch: 18 December 2020 - 5:00pm UTC
Deadline: 07 March 2021 - 11:59pm UTC
Team formation deadline: 31 January 2021 - 11:59pm UTC
🌐 Research sponsors
This research is supported in part by the following institutions.
📞 Contact