How to find and remove PII from Google Analytics

Personally Identifiable Information (PII) is data that can be used to identify a specific individual. Across the clients we’ve worked with, an alarming number have either unknowingly uploaded PII to their analytics platform or are in breach of GDPR. We’re going to dive into detail about PII and how to audit your Google Analytics account for it, to help you become more compliant and avoid some nasty ramifications.

Disclaimer: Do not take this article as legal advice. These are just guidelines to help make you more compliant. PII and GDPR compliance are complicated issues and you should also consult a legal professional to ensure you are fully compliant.

If Google finds PII in your analytics account, one of two things typically happens:

  • Google may delete all data in the affected account
  • Google may delete data from the affected time period

In the UK, the ICO (Information Commissioner’s Office) may find you in direct violation of GDPR, as you will not have explicitly stated in your Privacy Policy that this PII is being transferred to Google and why. This could land you with a hefty fine of up to 4% of global annual turnover or €20 million, whichever is greater.

Those are two very compelling reasons to get your house in order: the risk to revenue from lost data and a very hefty fine…

Throughout this article we’ll cover the following topics:

  • What is considered PII
  • How to audit your Google Analytics account for PII
  • How to stop PII getting sent to GA and remove it from your account

What is considered PII (Personally Identifiable Information)?

As we’re dealing primarily with Google here it makes sense to see what they have to say about the subject. Google has defined in this article some of the best practices to avoid sending personally identifiable information as well as a handy description. Here’s an excerpt:

 

“To protect user privacy, Google policies mandate that no data be passed to Google that Google could use or recognize as personally identifiable information (PII). PII includes, but is not limited to, information such as email addresses, personal mobile numbers, and social security numbers. Because laws across countries and territories vary, and because Google Analytics can be used in many ways, consult an attorney if you are in doubt whether certain information might constitute PII or not.”

Verified Data have created a PII Reference Guide detailing a lot of the most common PII types. They’ve done a fantastic job of clustering these on a global level and by country level as well. I’d keep this close when you’re auditing your account. It’s a fantastic resource!

How to audit your Google Analytics account for PII

Google suggests checking the following areas to avoid sending PII. 

We’ll walk you through how to check these reports in Google Analytics below:

  • User IDs
  • Page URLs and Titles
  • PII Entered by users
  • Data Import
  • Additional Analytics Features
  • Geolocation

User IDs

One of the best features of Universal Analytics is the ability to assign a User ID to your users. This allows you to join together the online/offline journey and identify your customers on a more granular level. Though always remember:

It’s terribly easy to use customers’ emails, user logins and phone numbers as a User ID. This is against Google’s terms of service and is PII.

To check the values of your User IDs go to Google Analytics > Audience > User Explorer. Here you’ll be able to check which values your business is using.

The safest option is to use your CRM’s customer identifier, which tends to be an arbitrary number. It’s also possible to use an identifier based on PII as long as it has been hashed before it lands in Google Analytics. The only exception is Protected Health Information, which you cannot use. The minimum hashing requirement set by Google is SHA256, and they recommend using a salt of at least 8 characters.
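To make that a little more concrete, here is a minimal Python sketch of salted SHA256 hashing for a User ID. The salt value, the function name and the choice to lowercase the identifier first are all assumptions made for illustration, so adapt them to your own setup and keep the real salt secret.

```python
import hashlib

# Hypothetical salt: use your own secret value of at least 8 characters
# and keep it out of source control.
SALT = "k3ep-th1s-s3cret"


def hash_user_id(raw_identifier: str, salt: str = SALT) -> str:
    """Return a salted SHA-256 hex digest suitable for use as a User ID."""
    # Normalise first so the same customer always produces the same hash.
    normalised = raw_identifier.strip().lower()
    return hashlib.sha256((salt + normalised).encode("utf-8")).hexdigest()


hashed = hash_user_id("Jane.Doe@example.com")
```

The hashing needs to happen before the value is sent to Google Analytics, typically on your server or in your data layer, so the raw email never leaves your systems.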

Page URLs

This is where I find the most PII in client accounts. It’s typically down to forms being submitted with GET requests instead of POST requests, which appends everything filled out in the form to the URL as a series of parameters. Google uses your page path information to generate content reports, and the vast majority of information entered into forms is personal, so you can immediately see the problem.
Navigate to the All Pages report (Behavior > Site Content > All Pages), shown in the screenshot below:

You then want to open up the advanced search option and start hunting. As mentioned before there are quite a few different items that can be classified as PII. I’ve provided a few examples below to help you on your way.

Searching For Emails

Use the advanced search feature which allows you to search by Regex and plug the following in:

([a-zA-Z0-9_\.-]+)@([\da-zA-Z\.-]+)\.([a-zA-Z\.]{2,6})
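If you’d rather run the same check over an exported list of page paths, here is a rough Python sketch using the regex above. The sample paths and the function name are made up for illustration. One thing worth knowing is that the “@” is often URL-encoded as %40 in page paths, so the sketch decodes each path before matching.

```python
import re
from urllib.parse import unquote

# The same pattern as above, compiled once for reuse.
EMAIL_RE = re.compile(r"([a-zA-Z0-9_\.-]+)@([\da-zA-Z\.-]+)\.([a-zA-Z\.]{2,6})")


def paths_with_emails(page_paths):
    """Return the page paths that appear to contain an email address."""
    # Decode first so that an encoded "@" (%40) is caught as well.
    return [path for path in page_paths if EMAIL_RE.search(unquote(path))]


# Hypothetical page paths, as they might appear in an All Pages export.
sample_paths = [
    "/contact/thanks?email=jane.doe%40example.com",
    "/newsletter?e=john@example.co.uk",
    "/products/shoes?colour=red",
]
flagged = paths_with_emails(sample_paths)
```

In the Google Analytics interface itself there is no decoding step, so it’s worth running a second search for %40 alongside the regex.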

Phone Numbers

You can use normal search functionality for this one, plug these in to see if there are any parameters that match:

  • phone=
  • mobile=
  • tel=
  • telephone=
  • mob=

Names

Same as above mon amie:

  • firstname=
  • lastname=
  • surname=
  • name=

Postcode & Address

  • postcode=
  • zip=
  • zipcode=
  • address=
  • house=

Password

  • password=
  • pass=
  • login=

Try your own on for size and give your page paths a manual check to see if there’s anything PIIshy around…. (I’ll see myself out).
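The parameter searches above can also be run in one pass over an exported list of page paths. This is only a sketch: the parameter list is simply the examples from this section, and the function name is hypothetical.

```python
from urllib.parse import parse_qsl, urlsplit

# Parameter names taken from the lists above; extend with your own.
SUSPECT_PARAMS = {
    "phone", "mobile", "tel", "telephone", "mob",
    "firstname", "lastname", "surname", "name",
    "postcode", "zip", "zipcode", "address", "house",
    "password", "pass", "login",
}


def suspect_params(page_path):
    """Return the sorted suspect parameter names found in a page path."""
    query = urlsplit(page_path).query
    found = {
        key.lower()
        for key, _ in parse_qsl(query, keep_blank_values=True)
        if key.lower() in SUSPECT_PARAMS
    }
    return sorted(found)


hits = suspect_params("/signup?FirstName=Jane&postcode=AB1+2CD&utm_source=x")
```

Note that this matches whole parameter names, whereas a plain search for name= in the interface will also match things like username=. Depending on what you’re hunting for, either behaviour might be the one you want.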

PII Entered by Users

This may be a little bit trickier to find depending on your current analytics setup. There are a few different places where users may enter PII. The top two are search fields and form fields on your website.

If you followed the steps to identify PII in the All Pages report then you should have captured the most offending instances. However, if you are capturing your site search or form fields, these appear in separate reports.

Form field data as mentioned can be captured in one of two ways:

  • Through some event data if you are sending form field data to GA
  • In the URL if your form is using a GET request instead of a POST

POST is always the preferred method of submitting forms. If your developers have used a GET request to send form data then this will end up as part of the URL and get captured in your page reports. Ask your developer to change your form if this is the case. Typically you can just update the form method to POST instead of GET to make this change.
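To see why the form method matters, here is a small Python illustration of where the same form data ends up in each case. The field names and the /contact/thanks path are invented for the example.

```python
from urllib.parse import urlencode

# Hypothetical form submission.
form_data = {"firstname": "Jane", "email": "jane.doe@example.com"}

# GET: the fields are serialised into the query string, so they become
# part of the page URL that Google Analytics records.
get_url = "/contact/thanks?" + urlencode(form_data)

# POST: the same fields travel in the request body and the URL stays clean.
post_url = "/contact/thanks"
post_body = urlencode(form_data)
```

With GET, the recorded page path would be /contact/thanks?firstname=Jane&email=jane.doe%40example.com, which is exactly the kind of entry the searches earlier in this article are looking for.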

To view what your users are searching for, navigate to Behavior > Site Search > Search Terms. You can then use the same steps we used for the All Pages report to unearth those gremlins.

Data Import

Data Import allows dimension widening in your reports. Through this process, you can add additional product or user information to your reports through a custom upload. This can be done through the Google Analytics interface or by a web app. Make sure you’re not uploading anything you shouldn’t be with regard to your customer data. Some more info on Data Import can be found here.

Data will most likely be uploaded against Custom Dimensions when using this feature, but you’ll need to work with your analytics team to identify where these are. You can also check the Data Import interface in the Admin section of Google Analytics to see where you are uploading data to and then check these reports. Use searches similar to the ones above to find PII.

Additional Analytics Features

There are numerous features that we can leverage in GA and it’s important that we check at least the most popular for any PII. We recommend that for each of the following areas you either manually search for PII using the interface or create an automated report using the Google Sheets add-on and run some logic over the resulting data.
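As an idea of what “some logic” could look like, here is a minimal Python sketch that scans a CSV export of a report for cells that look like email addresses. The column names and sample rows are hypothetical, and the sketch only checks for emails, so treat it as a starting point rather than a complete audit.

```python
import csv
import io
import re

EMAIL_RE = re.compile(r"[a-zA-Z0-9_.-]+@[\da-zA-Z.-]+\.[a-zA-Z.]{2,6}")


def scan_report(csv_text):
    """Yield (row_number, column, value) for each cell resembling an email."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Row 1 is the header, so the first data row is row 2.
    for row_number, row in enumerate(reader, start=2):
        for column, value in row.items():
            if value and EMAIL_RE.search(value):
                yield row_number, column, value


# Hypothetical export of an events report.
export = """Event Category,Event Action,Event Label
form,submit,newsletter
form,submit,jane.doe@example.com
"""
hits = list(scan_report(export))
```

Because it reports the column as well as the row, the same function can be pointed at exports of custom dimensions, campaign dimensions or event dimensions without changes.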

All Custom Dimensions

Adding custom dimensions is a powerful way to understand your customers’ behaviour and characteristics in much deeper detail. CDs can be added during a browsing session or from your back-end using the Measurement Protocol. Companies sometimes add postcodes, usernames and other restricted information.

Campaign Dimensions: Source, Medium, Keyword, Campaign, Content

Campaign parameters are sometimes overlooked when analysts audit for PII. While not a big offender compared to others on the list, PII can sometimes creep into these reports, as campaign parameters still tend to be built by humans. These reports can be found under Acquisition > Campaigns > All Campaigns.

Event Dimensions: Event Category, Event Action and Event Label

Event data will be any custom data you are collecting across your web setup. It’s easy to set up some form of tracking that may accidentally pull in unwanted information from the DOM. Depending on the setup, you may need to dig deep into GTM to find the root cause of these issues. Check the event report under Behavior > Events > Overview for any troublemakers.

Geo-location & IP Address

As standard, Google will collect geo-location information about visitors on your website. This is derived from the visitor’s IP address and tends to be only an approximation of location. For example, ISPs will sometimes route traffic through a switch that may be in a neighbouring city, making this information much less accurate. Also, companies will sometimes have their own predefined regions or sales areas. You can improve the accuracy of location or upload your own areas through a data upload or through some GTM magic, but there are a few rules you need to follow to avoid PII.

You should avoid any “fine-grained” information that may expose an individual. This covers any area smaller than 1 square mile, anything that contains longitude or latitude information, and specific postcodes. To someone reading this from London, this might seem over the top, but it becomes a lot easier to identify individuals living in the Highlands with this information.

It’s also worth noting that under GDPR, IP addresses are considered personal data. Luckily enough, it’s a pretty straightforward process to anonymize IP data using either GTM or hard-coded Universal Analytics. Himanshu Sharma has created an excellent guide on the process which you can find here.
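To give a feel for what anonymization does to an address, here is a small Python sketch that mimics the masking Google documents for this feature: the last octet of an IPv4 address, and the last 80 bits of an IPv6 address, are set to zero. This is purely an illustration of the effect, not how you switch the feature on, and the function name is my own.

```python
import ipaddress


def anonymize_ip(ip: str) -> str:
    """Zero the last octet (IPv4) or the last 80 bits (IPv6) of an address."""
    addr = ipaddress.ip_address(ip)
    # Keep the first 24 bits of IPv4, or the first 48 bits of IPv6.
    prefix = 24 if addr.version == 4 else 48
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)


masked_v4 = anonymize_ip("203.0.113.42")
masked_v6 = anonymize_ip("2001:db8:85a3:8d3:1319:8a2e:370:7348")
```

Here masked_v4 comes out as 203.0.113.0 and masked_v6 as 2001:db8:85a3::, so the address still points to a rough network location but no longer to a single connection.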

How To Stop PII Being Sent To Google Analytics and Remove It

If you’ve found some PII in your account through this process, it’s imperative to first stop it from being sent to your analytics account and then fix the issue at its root cause. This is important because the same information may also be getting sent to other third parties present on your site.

We’ll stand on the shoulders of giants here as there has been great content already created on the topic. Simo Ahava has created an excellent guide to stopping PII from entering Google Analytics. Brian Clifton has also expanded upon the above solution which can be found here.
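The general idea behind this kind of fix is to scrub the offending values out of the URL before the hit is sent. The Python sketch below only illustrates that redaction logic. It is not taken from either guide, which deal with doing this in the browser, and the blocked parameter list and function name are assumptions.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of parameters whose values should never be recorded.
BLOCKED_PARAMS = {"email", "phone", "firstname", "lastname", "postcode", "password"}


def redact_url(url):
    """Replace the values of blocked query parameters with a placeholder."""
    parts = urlsplit(url)
    cleaned = [
        (key, "REDACTED" if key.lower() in BLOCKED_PARAMS else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(cleaned)))


clean = redact_url("https://example.com/thanks?email=jane%40example.com&plan=pro")
```

Keeping the parameter name and replacing only the value means you can still see in your reports that a redaction happened and on which pages, which helps when chasing down the root cause.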

Obviously, these articles only stop further information from arriving in Google Analytics. What about the stuff that’s already there, you ask? Google have recently created a data deletion request feature in Google Analytics. This can be found in Admin > Property > Data Deletion Requests. It allows you to delete the fields that contain PII for a given date range.

I hope that was useful. If there are any additions you’d like to see, or you have any questions, just drop them in the comments below.

Feel free to reach out directly and I’d be happy to help!