To determine what types of practices are invasive enough to compel users to complain, we requested data on users' complaints from several outlets: the Federal Trade Commission (FTC), the Privacy Rights Clearinghouse (PRC), the California Office of Privacy Protection (COPP), and TRUSTe. All four organizations gave us quantitative data for complaints made in the five-year period between 2004 and 2008, inclusive.
TRUSTe gave us aggregate information, such as the number of complaints per year, broken down by type. The FTC, PRC, and COPP gave us data for individual complaints, such as the date, company, and type of complaint. In addition to these full data sets, we also received random samples from the FTC and PRC that included the free-text fields in which users explain why they are complaining. The FTC and PRC removed any personally identifiable information before disclosure.
FTC Data Request
Consumer complaints filed at the FTC are categorized with codes relating to the various statutes it enforces. In addition to the statute code, each complaint is also tagged with a statute violation code. The FTC data is not organized hierarchically, so records categorized under the General Privacy statute may have a violation code from one of the GLB violations, such as GLB2. Records may also be double-coded with multiple statutes or violations.
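Because the coding is not hierarchical, any filtering must check statute codes and violation codes independently. The sketch below illustrates this with hypothetical field names (the actual FTC schema may differ):

```python
# Each record can carry several statute codes and several violation
# codes; the two lists are independent (hypothetical field names).
records = [
    {"id": 1, "statutes": ["GP"], "violations": ["GLB2"]},
    {"id": 2, "statutes": ["GP", "CS"], "violations": ["GP5"]},
    {"id": 3, "statutes": ["GLB"], "violations": ["GLB1"]},
]

# General Privacy records whose violation code comes from the GLB family
gp_with_glb = [
    r for r in records
    if "GP" in r["statutes"]
    and any(v.startswith("GLB") for v in r["violations"])
]
print([r["id"] for r in gp_with_glb])  # -> [1]
```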
The treemap visualization below shows the categories within the FTC complaint database. Larger colored areas represent Statute Codes (such as "telemarketing" or "general privacy"), and individual boxes represent industry-specific Product Codes. Clicking on the image will reveal the specific category names. Each category is represented by a box of equal size (categories are not weighted by volume).
We made a request to the FTC under the Freedom of Information Act (FOIA) for all complaints filed under the General Privacy (GP), Gramm-Leach-Bliley (GLB), and CAN-SPAM (CS) statute codes for the five-year period between 2004 and 2008, inclusive. This query returned 51,532 records. Of these, we focused primarily on the General Privacy (GP) statute, which comprised about 7,350 complaints. The largest portion of this dataset carried the GP5 ("other") violation code.
To get a better understanding of the user complaints, we sent further FOIA requests for the free-text fields of a sample of complaints within the GP5 violation code. One request sought the free text for a random sample of 200 complaints marked with the GP5 violation code in which the website complained about was among the top 10 of our list of most-visited websites.
Our analysis of the quantitative data revealed a significant number of complaints about data brokers and the websites that serve as portals to them, such as ZabaSearch. Therefore, we also requested free-text fields for a random sample of 200 complaints with the GP5 violation code in which the company complained about was one of the following: ZabaSearch.com, intelius.com, whitepages.com, addresses.com, or anywho.com.
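The selection criteria for this request amount to a filter-then-sample step. The sketch below is only an illustration with hypothetical field names; the actual samples were drawn by the agencies themselves:

```python
import random

# Data-broker portals named in the request
BROKER_SITES = {"zabasearch.com", "intelius.com", "whitepages.com",
                "addresses.com", "anywho.com"}

def sample_gp5_broker_complaints(records, n=200, seed=0):
    """Filter to GP5 complaints about the listed sites, then draw a
    random sample of up to n records (hypothetical field names)."""
    pool = [r for r in records
            if r["violation"] == "GP5"
            and r["company"].lower() in BROKER_SITES]
    return random.Random(seed).sample(pool, min(n, len(pool)))
```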
The quantitative analysis of GP5 revealed that a majority of complaints concerned user control and the public display of personal information. Users thus seem most concerned with their ability to control the collection and use of information about them.
PRC Data Request
The PRC also categorizes the complaints it receives from users. It has 40 different categories, such as Collection Agencies, Genetics, and Wiretapping. We requested the records of all complaints made within the same five-year period, 2004 to 2008, in the two categories most pertinent to our research: Cyberspace and Database/Info Broker. We received 2,202 records from this request. These records did not include any fields containing information about the user.
We also requested the free-text fields for a sample of complaints from the PRC within the same two categories. We received 250 records. These free-text fields were stripped of all personally identifiable information before disclosure.
Free-text Complaint Coding
We categorized the free-text complaints using a set of tags capturing the user's concern, the type of data involved, and the type of company. We ran through a pilot set of complaints with a limited set of tags, discussed our findings, and then developed a revised set of tags that better captured the types of data involved and the concerns of the users. The revised set of tags was then applied to all the complaints. Two people did the coding, with a 10% overlap. Within this overlap, we had an average agreement of 92% across all the tags, indicating a high degree of inter-coder reliability.
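One simple way to compute this kind of per-tag agreement over the doubly coded overlap is sketched below. This illustrates the statistic, not necessarily the exact procedure we used, and the field names are hypothetical:

```python
def percent_agreement(coder_a, coder_b, tags):
    """Average, across tags, of the fraction of complaints on which two
    coders agree about whether the tag applies. coder_a and coder_b map
    complaint id -> set of tags that coder applied."""
    ids = coder_a.keys() & coder_b.keys()  # the doubly coded overlap
    per_tag = [
        sum((t in coder_a[i]) == (t in coder_b[i]) for i in ids) / len(ids)
        for t in tags
    ]
    return sum(per_tag) / len(per_tag)
```

A value near 1.0 across all tags corresponds to the high average agreement reported above; chance-corrected measures such as Cohen's kappa are a common alternative.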