We often take for granted that “anonymous data” really is anonymous. According to this article from Wired, it isn't:
“Last year, Netflix published 10 million movie rankings by 500,000 customers, as part of a challenge for people to come up with better recommendation systems than the one the company was using. The data was anonymized by removing personal details and replacing names with random numbers, to protect the privacy of the recommenders.
Arvind Narayanan and Vitaly Shmatikov, researchers at the University of Texas at Austin, de-anonymized some of the Netflix data by comparing rankings and timestamps with public information in the Internet Movie Database, or IMDb.
Their research (.pdf) illustrates some inherent security problems with anonymous data, but first it’s important to explain what they did and did not do.”
Read the rest of the article here
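To make the idea in the quote concrete, here is a minimal Python sketch of matching an "anonymized" record against a public profile by comparing ratings and dates. It is not the researchers' actual method, only a toy illustration: the data, the names (anonymized, public_profile, similarity, best_match), and the three-day date tolerance are all made up for this example.

```python
from datetime import date

# Hypothetical "anonymized" data set: random subscriber id -> {movie: (rating, date)}.
anonymized = {
    1001: {
        "Movie A": (5, date(2005, 3, 1)),
        "Movie B": (2, date(2005, 3, 9)),
        "Movie C": (4, date(2005, 4, 2)),
    },
    1002: {
        "Movie A": (3, date(2005, 6, 1)),
        "Movie D": (4, date(2005, 6, 3)),
    },
}

# Hypothetical public reviews posted under a real name on another site.
public_profile = {
    "Movie A": (5, date(2005, 3, 2)),
    "Movie C": (4, date(2005, 4, 4)),
}


def similarity(record, profile, max_days=3):
    """Count movies where the rating matches and the dates are close (assumed tolerance)."""
    score = 0
    for movie, (rating, when) in profile.items():
        if movie in record:
            other_rating, other_when = record[movie]
            if other_rating == rating and abs((other_when - when).days) <= max_days:
                score += 1
    return score


def best_match(records, profile):
    """Return the subscriber id whose record agrees most with the public profile."""
    scores = {sid: similarity(rec, profile) for sid, rec in records.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    match, scores = best_match(anonymized, public_profile)
    print(match, scores)
```

In this toy data, subscriber 1001 agrees with the public profile on two movies and subscriber 1002 on none, so the "random number" 1001 is linked back to a named person, along with the rating for Movie B that was never posted publicly.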