Ebonus: from BIG to SMALL DATA (Usage-Based Insurance)
Friday, 14 March 2014

As the cost of technology has steadily declined over the last three years, prompting insurers to explore usage-based insurance, data storage and security have begun to pose serious problems. Anticipating a rapid increase in the number of “connected cars” (vehicles with wireless internet access), many suppliers of telematics solutions have been considering standards suited to the “Big Data” era that avoid the need for costly, energy-hungry data-storage facilities.



“Scoring” or “Continuous Recording”?

To understand how data is recorded and secured, we must first examine its source: the device that issues it. In telematics insurance, two kinds of data issuers, or data loggers, have been tried:

- “continuous recording” solutions that send a large quantity of information, collected throughout each journey, to a computer server. This enormous volume of data is then processed and analyzed by powerful computers that build a general picture of drivers’ behavior, which is the main advantage of this option: the analysis allows the insurer to define a general marketing strategy and make commercial offers targeted at the identified risks. However, because data storage and security are expensive, this type of solution is only practical in experimental phases with a limited number of vehicles. For commercial deployment on a larger scale, the recorded parameters are generally limited to simple data such as departure and arrival times, driving time, or the vehicle’s position, acceleration, and deceleration.

- “scoring” solutions that calculate a grade inside the box itself, based on criteria predefined by the insurer (for example: “do not travel between midnight and 6am”). While this option is highly economical in terms of data volume, the insurer receives no underlying information with which to adjust its offer; this is the main drawback of scoring. Moreover, defining the parameters of the scoring algorithm often requires a large volume of data in the first place, usually gathered during a preliminary test phase using continuous recorders.
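To make the contrast concrete, here is a minimal Python sketch of the two logger types. The class names, the `Sample` fields and the night-driving rule are hypothetical illustrations, not ebonus code: the continuous recorder keeps every reading, while the scoring box keeps only two counters and exposes a single grade derived from the insurer’s predefined criterion.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Sample:
    """One reading per second from the box (hypothetical fields)."""
    hour: int           # local hour of day, 0-23
    speed_kmh: float


@dataclass
class ContinuousRecorder:
    """Keeps every reading; storage grows with driving time."""
    log: list[Sample] = field(default_factory=list)

    def record(self, s: Sample) -> None:
        self.log.append(s)


@dataclass
class ScoringBox:
    """Keeps only the counters needed for one insurer-defined rule."""
    seconds_total: int = 0
    seconds_at_night: int = 0   # seconds driven between midnight and 6am

    def record(self, s: Sample) -> None:
        self.seconds_total += 1
        if 0 <= s.hour < 6:
            self.seconds_at_night += 1

    def grade(self) -> float:
        # Share of driving time that respects the rule (1.0 = fully compliant)
        if self.seconds_total == 0:
            return 1.0
        return 1.0 - self.seconds_at_night / self.seconds_total


# One hour of driving at 11pm followed by half an hour at 1am
trip = [Sample(hour=23, speed_kmh=80.0)] * 3600 + [Sample(hour=1, speed_kmh=70.0)] * 1800
recorder, box = ContinuousRecorder(), ScoringBox()
for reading in trip:
    recorder.record(reading)
    box.record(reading)

print(len(recorder.log))       # 5400
print(round(box.grade(), 3))   # 0.667
```

After this trip the recorder holds 5,400 readings while the box holds two integers, but the insurer only ever sees the grade, which is exactly the information gap described above.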

The quantity of data transmitted shows that these two types of recorders are completely different and do not meet the same need. The challenge of Big Data is to define the quantity and level of detail of the data required to meet insurers’ current and future needs.

«It is becoming urgent to ask the right questions and define one’s real need, knowing that this need (and this is the difficult part) will evolve rapidly over time, just like the behavior of motorists.»


Statistics: an efficient way to model road risks

Predicting and assessing risks from large volumes of data is not an easy task. Fortunately, mathematical tools such as statistics provide effective ways to identify trends, detect differences, and understand behavior. Road safety organizations and insurers already rely on this approach when they publish their annual figures and make their decisions. Statistical tools are also a simple and effective way to reduce the volume of data to be stored, which raises the question of whether it would make more sense to aggregate all the data inside the boxes before transfer.



This is the method favored by ebonus, for three good reasons:

  • The first reason was just mentioned: with virtually the same information, the volume of data to transmit is minimal, thus reducing the cost of transferring, storing, and securing the data.
  • The second reason relates to privacy. Because the data are aggregated directly in the box before transmission, there can be no invasion of privacy. In France, this is the conclusion the CNIL reached in its recommendations published on April 8, 2010.
  • The third and final reason is that aggregating data on board the vehicle is not constrained by the limited memory of telematics boxes. Because the data are aggregated on the fly, statistics can be updated every second, indefinitely; with continuous recording, by contrast, data are collected at fixed intervals (every minute or every kilometer, for example) and over a limited period (one month, for example).
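On-the-fly aggregation can be sketched as a fixed-size histogram that is updated once per reading. In this hypothetical Python example, the speed bin edges are borrowed from the Di distribution defined later in this article, and readings above the last edge are assumed to fall into an open-ended top bin; memory usage depends only on the number of bins, not on how long the car is driven.

```python
from __future__ import annotations

from bisect import bisect_right


class StreamingHistogram:
    """Fixed-size histogram updated one reading at a time.

    Memory depends only on the number of bins, never on how long
    the vehicle has been driven.
    """

    def __init__(self, edges: list[float]) -> None:
        # edges: sorted lower bound of each bin; the last bin is open-ended
        self.edges = list(edges)
        self.counts = [0] * len(self.edges)

    def update(self, value: float) -> None:
        # Find the bin whose lower bound is <= value
        i = bisect_right(self.edges, value) - 1
        if i >= 0:  # readings below the first edge are ignored
            self.counts[i] += 1


# Speed bins in km/h: [0,50), [50,90), [90,110), [110,130), [130, +inf)
speed_hist = StreamingHistogram([0, 50, 90, 110, 130])
for speed in [30.0, 45.0, 72.0, 95.0, 128.0, 140.0, 140.0]:  # one reading per second
    speed_hist.update(speed)

print(speed_hist.counts)  # [2, 1, 1, 1, 2]
```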



Finally, if these statistics are kept over a longer period, they could help insurers extend their legal protection contracts by offering, in the event of an at-fault accident, additional support to customers who can demonstrate a history of exemplary driving.




Ebonus: from BIG DATA to SMALL DATA?

When it comes to driving, evaluating a risk precisely often requires correlating several variables, such as trip duration, speed, acceleration, or road type. Ebonus meets this need by providing the insurer with multidimensional statistical distributions built from the following 9 variables:


- day of travel (day of the week)
- time of travel (time of day)
- type of road
- speed
- longitudinal and lateral accelerations
- braking
- length of trips
- distance traveled
- calendar period (holidays,...)


Moreover, each of these distributions can be reconfigured remotely as needed by redefining the following parameters (non-exhaustive list):

- Nv: number of variables (1, 2, 3, …)
- Tv: type of each variable (speed, road type, …)
- Iv: intervals of values associated with each variable («Road», [0..30 km/h], [4pm..6pm], …)
- Fe: sampling frequency (1 Hz, 10 Hz, …)
- Da: total acquisition duration before transfer (1 day, 1 month, 1 year, …)

according to the formula: D = { Nv, [Tv], [Iv], Fe, Da }
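One way to picture the formula is as a small configuration record that the insurer could push to the box. The Python sketch below is a hypothetical encoding of D (field names, units and the validation rule are assumptions, not the ebonus format). It counts the cells, one counter per combination of bins, and uses `dataclasses.replace` to mimic a remote reconfiguration such as shortening Da for a field test. The example values reuse the Di distribution defined further down, with the speed boundaries read as five bins.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from math import prod


@dataclass(frozen=True)
class DistributionConfig:
    """Hypothetical encoding of D = { Nv, [Tv], [Iv], Fe, Da }."""
    nv: int                   # Nv: number of variables
    tv: tuple[str, ...]       # Tv: type of each variable
    iv: tuple[tuple, ...]     # Iv: bins (intervals or categories) for each variable
    fe_hz: float              # Fe: sampling frequency in Hz
    da_s: int                 # Da: acquisition duration before transfer, in seconds

    def __post_init__(self) -> None:
        # Nv, Tv and Iv must describe the same number of variables
        if not (self.nv == len(self.tv) == len(self.iv)):
            raise ValueError("Nv must match the lengths of Tv and Iv")

    def cell_count(self) -> int:
        # One counter per combination of bins across all variables
        return prod(len(bins) for bins in self.iv)


# Di, with the speed boundaries read as five bins (assumption)
di = DistributionConfig(
    nv=3,
    tv=("speed", "road_type", "time"),
    iv=(
        ("0-50", "50-90", "90-110", "110-130", "130+"),
        ("Roads", "National - Regional/State", "Areas"),
        ("Day 7am-9pm", "Night 9pm-7am"),
    ),
    fe_hz=1.0,
    da_s=30 * 24 * 3600,
)
print(di.cell_count())  # 30

# Remote reconfiguration: same variables, but transfer every 10 minutes
field_test = replace(di, da_s=10 * 60)
print(field_test.da_s, field_test.cell_count())  # 600 30
```

Changing Da only alters when the counters are transferred, not how many there are, which is why the payload size does not depend on acquisition time.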

Thus, an insurer who wishes to gather specific data before launching an offer on an untapped market segment can easily start with, for example, a mileage-based offer, while collecting a distribution of 3 or 4 variables to validate the project.

Likewise, for a field test, the aggregation module can be configured to transmit its statistical results every 10 minutes, whereas for a commercial offer the same transfer could take place every month or every quarter, for example.

If we now look more closely at the volume of data these statistics represent, we quickly see that storing a 3-dimensional distribution such as the one defined below (see the following illustration):

Di = { 3, [Speed, Road type, Time], [(0, 50, 90, 110, 130 km/h), («Roads», «National – Regional/State», «Areas»), (Day [7am..9pm], Night [9pm..7am])], 1 second, 30 days }

requires no more than 160 bytes, roughly the size of a single SMS text message, regardless of the total acquisition time: the statistics stay the same size no matter how long it took to gather the data. This means that a single 50-kilobyte MMS can carry about 300 such 3-dimensional distributions from the vehicle at practically no cost (50,000 / 160 ≈ 312), and a single 50-gigabyte hard drive is enough to store one such set of 300 distributions for each of 1 million connected cars (300 × 160 bytes × 1,000,000 = 48 GB).
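The size claim can be checked with simple arithmetic. The Python sketch below assumes a hypothetical layout of one unsigned 32-bit counter (seconds observed) per cell and treats the speed boundaries as five bins, the last one open-ended above 130 km/h. Under those assumptions Di fits in 120 bytes, comfortably within the 160 bytes quoted, and a counter sampled at 1 Hz would not overflow for more than a century.

```python
# Hypothetical layout for Di: one unsigned 32-bit counter per cell,
# each counting the seconds spent in that (speed, road type, time) combination.
SPEED_BINS = 5          # 0-50, 50-90, 90-110, 110-130, 130+ km/h (assumed)
ROAD_TYPES = 3          # "Roads", "National - Regional/State", "Areas"
TIME_SLOTS = 2          # Day [7am..9pm], Night [9pm..7am]
BYTES_PER_COUNTER = 4   # uint32
SAMPLING_HZ = 1         # Fe = 1 second

cells = SPEED_BINS * ROAD_TYPES * TIME_SLOTS
payload_bytes = cells * BYTES_PER_COUNTER
print(cells, payload_bytes)  # 30 120

# The payload size is fixed; only the counter values grow with Da.
# Days of continuous driving before a single uint32 counter could overflow:
max_days = (2**32 - 1) // (SAMPLING_HZ * 86_400)
print(max_days)  # 49710
```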

Suffice it to say that with ebonus, we are much closer to SMALL DATA than to BIG DATA.



Finally, raw data and aggregated data need not be subject to the same regulatory constraints. In France, for example, the CNIL limits the retention of raw geolocation data, whose exploitation could go well beyond insurers’ needs, to the time strictly necessary to calculate the insurance premium (1). Although this constraint significantly reduces flexibility, it is equally legitimate to ask: what is the point of storing massive amounts of data in a format we may not be able to use in the future?


More information:

e-bonus, a Pay-As-You-Drive technology optimized for data centers that respects your privacy.

Contact us now.

