The more data is shared and opened up, the more likely it is that some private data is revealed unintentionally. In 2006, AOL released a list of 20 million Web search queries from which user no. 4417749 was uniquely identified as Thelma Arnold, a 62-year-old widow who lives in Lilburn, Ga. [1]. In 2008, a team of researchers developed a method to de-anonymize the Netflix Prize dataset, which contained anonymous movie ratings of 500,000 Netflix subscribers, using public IMDb ratings [2]. In early 2018, the fitness tracking app Strava revealed the locations of secret US army bases [3].
These are examples where organizations created services or published data with good intentions but, for various reasons, revealed too much. This happens to some extent all the time, all around us; one just has to look carefully. In this post we present a real-world example where a company offering personal care services in Finland revealed too much on its public web page.
This post is structured as follows: First the company’s business model is introduced. Then the necessary details of the “leak” are laid out, and finally a brief analysis of the data is presented.
The company
The company offers personal care services in Finland and, increasingly, also internationally in 100+ locations. The business itself is a traditional personal care service, but the twist is that the individual shops do not take reservations. Instead, customers have to enter the shop and join an electronic priority queue to be served.
A priority queue is a type of queue where a customer with higher priority is served before customers with lower priority irrespective of their time spent in the queue. If two customers have the same priority, they are served according to the order in which they joined the queue.
The priority of a customer is defined by their membership type. To borrow the Amex card colors as labels, platinum members have the highest priority, followed by gold members, green members and finally non-members. In addition to the expedited service, the membership also comes with extra discounts and some number of free products and services of the customer's choice. A membership costs between 400 and 800 euros annually.
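The queueing rule described above can be sketched in a few lines of Python. This is only an illustration of the rule, not the company's implementation: the class name `ShopQueue`, the numeric ranks and the customer labels are all assumptions made for the example.

```python
import heapq
from itertools import count

# Lower rank = served first. The tier names follow the Amex-style labels
# used in the text; the numeric ranks are an assumption.
PRIORITY = {"platinum": 0, "gold": 1, "green": 2, "non-member": 3}


class ShopQueue:
    """Priority queue with first-come, first-served tie-breaking."""

    def __init__(self):
        self._heap = []
        self._arrivals = count()  # increasing counter records the join order

    def join(self, customer, membership):
        # Tuples compare element by element: rank first, then join order.
        entry = (PRIORITY[membership], next(self._arrivals), customer)
        heapq.heappush(self._heap, entry)

    def serve_next(self):
        _, _, customer = heapq.heappop(self._heap)
        return customer

    def __len__(self):
        return len(self._heap)


queue = ShopQueue()
queue.join("A", "non-member")
queue.join("B", "green")
queue.join("C", "gold")
queue.join("D", "green")

served = [queue.serve_next() for _ in range(len(queue))]
print(served)  # ['C', 'B', 'D', 'A']
```

The gold member C is served first even though she arrived third, while the two green members B and D keep their arrival order.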
The data
To save customers the trouble and frustration of walking into an already crowded shop, the company created a web service for checking the current queue length and composition at a particular shop. For example, a customer could see online that his favorite shop currently has five people in the queue and that three of them have higher priority than he has. Having only a green membership, he could look for another shop nearby that has only non-members in the queue. This way he could greatly reduce his waiting time.
However, the company gave out much more information about the customers in the queues than just the counts by membership type. If one bothered to look at the JSON object of the queue data before June 2017, one could also find each customer's internal ID, name and time spent in the queue. Such details enable one to perform a simple analysis of the company's customer base. This is done in the next section.
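To make the difference concrete, here is a minimal sketch of how detailed queue entries could be reduced to the counts the web page actually needed. The field names (`id`, `name`, `membership`, `waited_min`) and all values are invented for illustration; the real JSON format is not reproduced here.

```python
from collections import Counter

# Hypothetical queue payload: field names and values are made up and do
# not reflect the company's actual format.
detailed_queue = [
    {"id": 101, "name": "Customer One", "membership": "gold", "waited_min": 12},
    {"id": 102, "name": "Customer Two", "membership": "green", "waited_min": 9},
    {"id": 103, "name": "Customer Three", "membership": "non-member", "waited_min": 7},
    {"id": 104, "name": "Customer Four", "membership": "green", "waited_min": 3},
    {"id": 105, "name": "Customer Five", "membership": "platinum", "waited_min": 1},
]


def summarize(entries):
    """Keep only the number of queued customers per membership type."""
    return dict(Counter(entry["membership"] for entry in entries))


summary = summarize(detailed_queue)
print(summary)
```

The summary carries everything a customer needs to choose a shop, and nothing that identifies the people standing in the queue.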
The analysis
During the first half of 2017, customers joined a queue for the company's services roughly 220,000 times, i.e. about 1,200 times per day. These visits were made by some 20,000 unique member customers and perhaps around 80,000 unique non-members. Of the unique members, 70 % were green, 28 % gold and only 2 % platinum. This makes sense, since a green membership costs only a little more than what a non-member would pay for using the services once a month.
Although a platinum membership grants the customer ten more free services than a green membership, the number of times a customer uses the services stays roughly constant across membership types, as shown in Table 1. Given that a platinum membership is twice as expensive as a green one, platinum customers end up wasting hundreds of euros.
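The round figures above are easy to check. The sketch below assumes that the observation period runs from 1 January to 30 June 2017 and that the shops were open every day, which is a simplification.

```python
from datetime import date

visits = 220_000   # approximate number of queue entries, from the text
members = 20_000   # approximate number of unique members, from the text

# Assumed observation period: 1 January to 30 June 2017, every day counted.
days = (date(2017, 7, 1) - date(2017, 1, 1)).days
visits_per_day = visits / days

# Split the unique members by the shares given in the text.
shares_pct = {"green": 70, "gold": 28, "platinum": 2}
members_by_tier = {tier: members * pct // 100 for tier, pct in shares_pct.items()}

print(days, round(visits_per_day))  # 181 1215
print(members_by_tier)  # {'green': 14000, 'gold': 5600, 'platinum': 400}
```

So "1,200 times per day" is the rounded daily average, and the 2 % of platinum members correspond to only about 400 people.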
Table 1: Quantiles of the number of times a unique (member) customer queued in one of the company’s shops during the first half of 2017.
In fact, almost all members pay more for their membership than the services they use are worth: for green members the median overpayment could be around 200 euros, for gold 250 euros and for platinum 500 euros annually. The overpayment is, in effect, the price of expedited service, less the value of the free products they receive from the company (20-60 euros in total).
Having expedited service does not change the fact that if the shop is full, even a platinum member has to queue. On the other hand, if the shop is not full, even a non-member is served instantly. However, the higher the customer's priority, the less likely they are to wait longer than a customer with lower priority. The improvement is largest when a non-member becomes a green member: the probability of shorter waiting times is increased by 20 %, as seen in Table 2.
Table 2: Probability of reduced waiting time compared to customers of different membership types. The customers need not be in the same queue.
Upgrading from green membership to gold increases the probability of shorter waiting times by roughly 10 %. Further upgrades bring virtually no improvement in waiting times. This is also obvious from Table 3, where the waiting times of gold and platinum members are practically identical. Again, the largest benefit goes to a non-member customer who becomes a green member: a drop of roughly 40 % in waiting times.
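One way to read such a probability is as a pairwise comparison: pick one customer at random from each membership type and check who waited less. The sketch below follows that reading, which is an assumption about the method. The function name `prob_shorter_wait` and the waiting times are made up for illustration and are not taken from the data.

```python
from itertools import product


def prob_shorter_wait(waits_a, waits_b):
    """Share of (a, b) pairs in which a waited strictly less than b."""
    pairs = list(product(waits_a, waits_b))
    return sum(a < b for a, b in pairs) / len(pairs)


# Made-up waiting times in minutes, for illustration only.
green = [0, 5, 10, 20]
non_member = [0, 10, 15, 30]

p_green_faster = prob_shorter_wait(green, non_member)
p_non_member_faster = prob_shorter_wait(non_member, green)
print(p_green_faster, p_non_member_faster)  # 0.5625 0.3125
```

The two probabilities do not add up to one, because pairs with equal waiting times (for example, two customers who were both served instantly) count for neither side.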
Table 3: Quantiles of waiting times in minutes by membership type.
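For readers who want to build a similar table from their own data, quartiles by group take only a few lines with Python's standard library. The waiting times below are invented for illustration and do not come from the company's data.

```python
import statistics

# Made-up waiting times in minutes per membership type, illustration only.
waits = {
    "non-member": [0, 5, 10, 15, 20, 30, 45],
    "green": [0, 0, 5, 10, 10, 20, 25],
}

# statistics.quantiles with n=4 returns the three quartile cut points.
quartiles = {
    tier: statistics.quantiles(minutes, n=4) for tier, minutes in waits.items()
}

for tier, (q1, q2, q3) in quartiles.items():
    print(f"{tier}: 25 % = {q1}, 50 % = {q2}, 75 % = {q3}")
```

The middle cut point is the median, so comparing the 50 % column across membership types gives the kind of comparison made in the text.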
To finish the analysis, we note that the most common name found in the data is Mikko K., which is shared by over 70 unique members. Why the company shared this information in the first place is beyond my comprehension.
The original post was published on LinkedIn.