
AT&T Privacy Policy Change Risks and the De-Anonymization of Data

AT&T recently modified its privacy policy so that it can sell aggregated data about its subscribers’ usage for marketing purposes.

This follows in the footsteps of other carriers. In its blog post, AT&T states that this data is anonymized. To anonymize the data, AT&T claims it removes “name, address and telephone number that can reasonably be used to identify you”.

However, AT&T keeps information such as city, state and ZIP code, as well as cell and Wi-Fi location data, websites visited and applications installed, among other seemingly innocuous data points.

On the surface one would assume this is sufficient: they have removed all data that would “reasonably” be used to identify you.

In the world of privacy and Big Data, “reasonable” can be a weasel word: a term used to make the consumer feel better that in reality gives the company license to access and share more information than you would suspect.

The data that makes up what is termed Personally Identifiable Information (PII) is no longer static. Data that can identify you has become a moving target. It used to be limited to obvious information such as name, Social Security number and address, but it has become increasingly easy to uniquely identify people with composites of otherwise seemingly innocuous data.

Three Data Points to Define You

Research has shown that 87% of Americans can be uniquely identified with only their ZIP code, birth date and sex. For example, in my ZIP code there are only 207 males who share my birth year and only one male with my specific birth date.
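To make the idea concrete, here is a minimal sketch of how one might measure uniqueness in a data set that has had names stripped out. The records and the helper names (`unique_fraction`, `by_birth_year`) are made up for illustration and are not drawn from any real data set or from the research cited above. The sketch simply counts how many rows have a ZIP, birth date and sex combination that nobody else shares, and then shows how coarsening birth date to birth year reduces that share.

```python
from collections import Counter

# Hypothetical "anonymized" records: no names, only (zip, birth_date, sex).
records = [
    ("97201", "1975-03-14", "M"),
    ("97201", "1975-03-14", "M"),
    ("97201", "1975-08-02", "M"),
    ("97201", "1982-11-30", "F"),
    ("97209", "1982-11-30", "F"),
    ("97209", "1990-01-05", "M"),
]


def unique_fraction(rows):
    """Share of rows whose combination of values occurs exactly once."""
    counts = Counter(rows)
    unique_rows = sum(1 for row in rows if counts[row] == 1)
    return unique_rows / len(rows)


def by_birth_year(rows):
    """Coarsen the birth date to the birth year only."""
    return [(zip_code, birth_date[:4], sex) for zip_code, birth_date, sex in rows]


full = unique_fraction(records)
coarse = unique_fraction(by_birth_year(records))

print(f"Unique with full birth date: {full:.0%}")
print(f"Unique with birth year only: {coarse:.0%}")
```

Any row that is unique on these three fields can be matched to a named record from another source, such as a voter roll, that carries the same three fields.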

In an online survey I could ask you highly personal questions and claim your answers are completely anonymous, adding only that we need generic demographic data that will be “aggregated”, including your ZIP code, birth date and sex. Most people would not be suspicious.

To take it a step further, I don’t even have to ask for your ZIP code. I can guess it with 98% accuracy simply from your IP address, which is often also included in “anonymous” data collections.

Device Identifiers Also Identify Users

But the data collected and provided to marketers can also include location data that is collected from both cell information and “Wi-Fi location”.

To collect “Wi-Fi location” data, they are also collecting the unique MAC addresses of Wi-Fi routers in a given area. That is how Wi-Fi positioning works: it looks at the unique MAC address and signal strength of each Wi-Fi router nearby.

The MAC addresses that appear most often for a user would be those of their work and home routers, providing a pretty good indication of the user’s home and work addresses.
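A rough sketch of that inference follows. Everything in it is an assumption for illustration: the observation format, the MAC addresses, and the hour ranges I use for "night" and "workday" are invented, and I have no knowledge of what AT&T's data actually looks like. The point is only that a simple frequency count split by time of day is enough to pick out two routers that matter.

```python
from collections import Counter

# Hypothetical observations for one device: (hour_of_day, router MAC seen).
observations = [
    (1, "aa:aa:aa:00:00:01"),
    (2, "aa:aa:aa:00:00:01"),
    (23, "aa:aa:aa:00:00:01"),
    (10, "bb:bb:bb:00:00:02"),
    (11, "bb:bb:bb:00:00:02"),
    (14, "bb:bb:bb:00:00:02"),
    (13, "cc:cc:cc:00:00:03"),  # e.g. a coffee shop at lunch
    (22, "cc:cc:cc:00:00:03"),
]

# Assumed time windows; real schedules obviously vary.
NIGHT = set(range(22, 24)) | set(range(0, 6))
WORKDAY = set(range(9, 17))


def most_common_mac(obs, hours):
    """Return the router MAC seen most often during the given hours, or None."""
    counts = Counter(mac for hour, mac in obs if hour in hours)
    return counts.most_common(1)[0][0] if counts else None


home = most_common_mac(observations, NIGHT)
work = most_common_mac(observations, WORKDAY)

print("Likely home router:", home)
print("Likely work router:", work)
```

Once the two routers are known, any database that maps router MAC addresses to physical locations turns them into street addresses.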

Beyond your home and work Wi-Fi hardware, additional identifiers may also appear in this anonymized data, including the IMEI, IMSI, MEID and IP addresses, among others.

Although these identifiers may not identify you directly, they are linked to your device, and making the connection between the two is not difficult. Many mobile applications actually pass some of these identifiers to remote servers, where they are stored.

For example, Aldo Cortesi successfully de-anonymized Apple UDIDs that were being used by OpenFeint by linking the unique device ID to Facebook profiles.
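The general shape of this kind of linkage is just a join on the shared identifier. The sketch below is not a reconstruction of Cortesi's actual technique. It is a generic illustration with invented device IDs, apps and profile names, assuming an attacker holds one data set of "anonymous" usage keyed by device ID and a second source that ties some device IDs to a profile.

```python
# Hypothetical "anonymous" usage log keyed only by a device identifier.
usage_log = [
    {"device_id": "dev-001", "app": "game_a", "zip": "97201"},
    {"device_id": "dev-002", "app": "game_b", "zip": "97209"},
    {"device_id": "dev-001", "app": "banking", "zip": "97201"},
]

# Hypothetical second source that ties a device identifier to a named profile.
profile_links = {"dev-001": "profile/alice.example"}


def reidentify(log, links):
    """Group 'anonymous' app usage under any profile linked to the same device."""
    linked = {}
    for row in log:
        profile = links.get(row["device_id"])
        if profile is not None:
            linked.setdefault(profile, []).append(row["app"])
    return linked


linked_usage = reidentify(usage_log, profile_links)
print(linked_usage)
```

Only one leaked mapping per device is needed. Every other data set that uses the same identifier is then tied to the same person.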

Although Apple has moved away from the use of UDIDs, the data still exists, and given that roughly 70% of iOS applications were sending this data to remote servers for storage, it is still a problem.

Given that most of the UDID data was passed over unsecured connections, odds are that it was also stored unencrypted, so a simple data breach such as the one at BlueToad can help link app data and users.

Your Mobility Traces Make You Unique

Another study, from MIT, analyzed anonymous data collected over fifteen months from one and a half million users.

The study found that a specific user’s “mobility traces” were highly unique: using only four spatio-temporal points, the researchers could identify 95% of the individuals, even when the location data was coarse.

Unique in the Crowd: The privacy bounds of human mobility
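A toy version of the idea, with invented traces rather than the study's data or method: each pseudonymous user is a set of (hour, cell tower) points, and every extra point an observer knows about a person shrinks the list of users who could be that person. The function name `candidates` and the tower labels are hypothetical.

```python
# Hypothetical traces: pseudonymous user -> set of (hour, cell tower) points.
traces = {
    "u1": {(8, "A"), (9, "B"), (12, "C"), (18, "A"), (20, "D")},
    "u2": {(8, "A"), (9, "B"), (12, "E"), (18, "A"), (20, "D")},
    "u3": {(8, "F"), (9, "B"), (12, "C"), (18, "G"), (20, "D")},
}


def candidates(all_traces, known_points):
    """Return the users whose trace contains every known point."""
    known = set(known_points)
    return sorted(user for user, points in all_traces.items() if known <= points)


# Each additional known point narrows the set of matching users.
one_point = candidates(traces, [(9, "B")])
two_points = candidates(traces, [(9, "B"), (8, "A")])
three_points = candidates(traces, [(9, "B"), (8, "A"), (12, "C")])

print(one_point)
print(two_points)
print(three_points)
```

The known points could come from anywhere: a geotagged photo, a check-in, or a receipt with a time and store location.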

 

Although the process of re-identifying anonymous data seems complex and time-consuming, when it is implemented by large-scale systems designed to identify patterns in big data, we begin to see that our anonymous and aggregated data may not be so anonymous once researchers are able to connect the dots.

How Secure Is Shared Data?

As our data is shared with third parties, it also raises the question of exactly what data is being shared and how those third parties are securing it.

If one of these parties suffered a data breach and this information fell into the wrong hands, could it be used with malicious intent? I sent a request to AT&T asking for examples of the aggregate data, along with some additional questions. I have not heard back yet.

Ken Westin

Your Pundit of Paranoia