Google Analytics 4 and IP Address
A closer look at the privacy and product implications of the deprecation of IP address logging in GA4.
Google continues to make substantial privacy-related changes to its platforms, announcing back in March that its new Google Analytics platform (GA4) will no longer log or store IP addresses.
Details were released as part of a wider announcement regarding the deprecation of Google’s legacy Universal Analytics (UA) platform, which will stop collecting new data in July 2023, with all customers transitioning over to the new GA4 platform thereafter. As of March 2022, GA4 no longer logs or stores IP addresses.
GA4 was introduced back in October 2020, so marketers have had time to plan for the transition, although a looming hard deprecation date is likely to cause some anxiety among UA customers. The elimination of IP address logging, however, came as more of a surprise. Legacy versions of Google Analytics have always logged IP addresses, and GA4 initially opted to use only anonymised IP addresses. Google has now gone one step further by removing IP address logging altogether.
Outdated Legacy
Google sees its legacy analytics platform as increasingly outdated, primarily because it relies on third-party cookies and tracking methods that are increasingly out of sync with current consumer and regulatory privacy demands. This move can be seen as a continuation of Google’s wider product re-engineering efforts to address these concerns. Google does, of course, argue that GA4 will offer better and more powerful analytics capabilities despite the elimination of IP address logging. Google Analytics product director Russ Ketchum put it this way: ‘we’re not logging or storing IP addresses because we no longer need to’.
Schrems II?
It is probably not entirely coincidental that Google is changing IP address logging in Google Analytics just as this exact combination of product and data type faces significant legal scrutiny across Europe, while the Schrems II fallout gathers pace. Throughout 2022, the Austrian, French and Italian DPAs have made sweeping statements about the legality of Google Analytics use as it pertains to EU>US international data transfers, specifically citing IP address logging as a concern. Privacy advocates such as NOYB/Max Schrems are also signalling confidence that many other European regulators will follow suit over the coming months. These developments are starting to have a small but tangible impact on some marketers’ appetite to use GA at all in Europe. More clearly compliant (EU-based) competitors may be seeing a tiny chink in the armour of Google Analytics’ hegemony in this space.
Given this context, it seems reasonable that Google would seek to limit its potential exposure to legal issues in the EU by removing problematic identifiers where possible from its analytics platform. In the EU context, Google now derives geolocation data from IP addresses using EU based servers, before forwarding what it calls ‘coarse’ data (city, continent, country, region) to the analytics server. In this way GA4, in Europe at least, appears to be taking a ‘proxyfication’ type approach as recently envisaged by the CNIL, essentially using EU proxy servers for location ‘pseudonymisation’.
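The geolocation flow described above can be sketched as follows. This is a minimal illustration under stated assumptions and not Google’s implementation. `GEO_TABLE` is a hypothetical stand-in for a real IP-geolocation database, and its networks are RFC 5737 documentation ranges rather than real allocations. The event field names (`ip`, `city`, `region`, `country`, `continent`) are also invented for the example. What matters is the ordering: location is derived at the EU-side edge, and the IP is dropped before the event is forwarded.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

# Hypothetical lookup table standing in for a real IP-geolocation database.
# The networks are RFC 5737 documentation ranges, not real locations.
GEO_TABLE = {
    ipaddress.ip_network("192.0.2.0/24"): ("Paris", "Ile-de-France", "FR", "Europe"),
    ipaddress.ip_network("198.51.100.0/24"): ("Vienna", "Vienna", "AT", "Europe"),
}


@dataclass(frozen=True)
class CoarseLocation:
    city: Optional[str]
    region: Optional[str]
    country: Optional[str]
    continent: Optional[str]


def coarse_location(ip: str) -> CoarseLocation:
    """Map an IP to coarse fields only; unknown addresses yield all-None."""
    addr = ipaddress.ip_address(ip)
    for network, (city, region, country, continent) in GEO_TABLE.items():
        if addr in network:
            return CoarseLocation(city, region, country, continent)
    return CoarseLocation(None, None, None, None)


def forward_event(event: dict) -> dict:
    """Runs on a (hypothetical) EU-side edge server: derive the coarse
    location, then drop the IP before the event goes to the analytics server."""
    loc = coarse_location(event["ip"])
    outgoing = {k: v for k, v in event.items() if k != "ip"}
    outgoing.update(
        city=loc.city, region=loc.region, country=loc.country, continent=loc.continent
    )
    return outgoing
```

In this sketch the IP is still used at the edge, even though it is never forwarded.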
The elephant in the room here is, of course, the Trans-Atlantic Data Privacy Framework (Privacy Shield 2.0), still expected to appear in the coming months. If Schrems II-type considerations did play into Google’s decision to eliminate IP address logging, that hardly suggests Google expects certainty on EU>US data transfers any time soon.
How GA4 will maintain its reporting functionality without logging IPs:
IP addresses have a range of functions in analytics reporting and can also play a role in marketing features. Common uses of IP addresses include:
- Determine a user’s current geographic location for content customization or regional compliance (e.g. GDPR or CCPA applicability)
- Ingredient for statistical IDs (IP address + user agent string, for instance) or device fingerprinting
- Ingredient for cross-device mapping (linking multiple cookies and/or MAIDs together as likely belonging to the same person or household)
- Frequency capping
- Attribution reporting
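To make the ‘statistical ID’ ingredient concrete, here is a hypothetical sketch of its simplest variant: a salted hash of the IP address plus the user agent string. It does not show how Google or any particular vendor computes IDs, and the function name and salt are assumptions for illustration. It shows why the IP matters. The same browser on a different network yields a different ID, and without the IP, the user agent alone is shared by many users.

```python
import hashlib


def statistical_id(ip: str, user_agent: str, salt: str = "hypothetical-daily-salt") -> str:
    """Hypothetical 'statistical ID': salted SHA-256 of IP + user agent,
    truncated to 16 hex chars. Illustrative only; not any vendor's method."""
    material = f"{salt}|{ip}|{user_agent}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()[:16]


if __name__ == "__main__":
    ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
    home = statistical_id("192.0.2.10", ua)
    office = statistical_id("198.51.100.20", ua)
    print(home == statistical_id("192.0.2.10", ua))  # True: same inputs, same ID
    print(home == office)  # False: same browser, different network, different ID
```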
In short, Google’s transition to more contemporary forms of user path analysis has rendered the IP address far less useful, especially for traditional analytics and conversion reporting. Google now uses an approach it calls data modeling (as in ‘modeled conversions’, ‘modeled location attributes’, etc.), and has published specification details and comparisons with legacy UA. Data modeling uses a wide range of use-case-specific attributes to make inferences about an individual device’s path, location, transactions and so on. It is not deterministic in the way that cookie- or MAID-based tracking is. Even so, companies like Google can now make these inferences with a very high level of confidence, and the inferences are much more reliable in environments where cookies and MAIDs are less consistently available.
Impact on geolocation data accuracy?
Google, of course, implies that GA4’s modeling processes will more than make up for the analytics capabilities sacrificed with the deprecation of IP address logging, although it offers no substantive discussion to support this. As discussed, Google derives ‘coarse geolocation data’ from IP addresses on a proxy server, sending only city, country, continent and other high-level data to the analytics server. This change is therefore likely to affect the accuracy of geolocation analytics.
A 2018 study (Clifton, B & Wan, H) compared the geolocation analytics performance of UA using standard IP addresses (aip=off) and anonymised IP addresses with the last octet removed (aip=on), across 3 million global sessions on an identical test set-up. The study found no notable differences between the two at continent and country level. At city level, however, the difference was significant: anonymised IPs averaged 76.7% accuracy against the non-anonymised configuration. We cannot necessarily extrapolate these findings to the modeled data approach used by GA4, but data models that rely on geolocation accuracy at a finer granularity than city level may struggle with an IP-less GA4.
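The aip=on treatment described in the study (last octet removed) can be sketched with Python’s standard `ipaddress` module. This sketch handles IPv4 only and zeroes the final octet. The function names are hypothetical. Every address in a /24 collapses to the same value, so up to 256 addresses become indistinguishable. That is one plausible reason city-level lookups degrade while country-level lookups largely survive.

```python
import ipaddress


def anonymise_ipv4(ip: str) -> str:
    """Hypothetical helper: zero the last octet of an IPv4 address,
    mirroring the 'last octet removed' (aip=on) behaviour in the study."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for IPv6 or malformed input
    block = ipaddress.IPv4Network(f"{addr}/24", strict=False)
    return str(block.network_address)


def indistinguishable_addresses(ip: str) -> int:
    """How many addresses share the same anonymised value (256 for a /24)."""
    return ipaddress.IPv4Network(f"{anonymise_ipv4(ip)}/24").num_addresses


if __name__ == "__main__":
    print(anonymise_ipv4("203.0.113.77"))               # 203.0.113.0
    print(indistinguishable_addresses("203.0.113.77"))  # 256
```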
Is modeled data a workaround for cookies?
Broadly speaking, yes, inasmuch as this data replaces the essential reporting capabilities that marketers have derived from cookies for the last 20 years. But modeled data is also a black box that lies entirely server side. At this stage, we do not know how unique the data are to a device or person. In theory, a company could expand and contract device-level fidelity at will, without leaving any evidence of the change in the consumer’s browser or the client’s dashboard. Statistical ID providers have commonly chosen different levels of fidelity depending on the use case, each striking its own balance between certainty and scale. The closer to ‘certainty’ an inferred ID is (with 85% being a critical threshold), the more ‘cookie-like’ it becomes, with important ePrivacy and GDPR implications.
We also know that individual user IDs can be fed into GA4. The modeled data can then be combined with explicit personal data, in which case the inferred paths would become personal data under the GDPR. But if a company does not feed IDs into GA4, is the resulting data personal data? We do not have a clear answer at this stage. A conservative approach would be to assume the data is personal data until Google makes clear statements on this point and substantiates a ‘no personal data’ assertion.
One final note on IP addresses: while GA4 will not ‘log’ IP addresses, it still seems that GA4 ‘uses’ IP addresses, at least to make initial inferences about the user, including anti-fraud determinations and location. The location inferences in particular will remain even after the IP is dropped.
Compliance considerations:
- The elimination of IP addresses is clearly helpful from a GDPR compliance standpoint, but it seems risky at this stage to assume this means personal data has been eliminated from the platform. Until we hear otherwise from Google, we cannot assume that GA4 does not involve the processing of personal data.
- From the perspective of international transfers, we must therefore continue to assume that using GA4 in Europe will still constitute an international transfer of data to the US, and all post-Schrems II compliance obligations will still need to be satisfied (Export Mechanism, TIAs and possible Supplementary Measures) for the processing.
- However, the removal of IP addresses will likely relieve some of the pressure on international transfer risk assessments. The IP address is a widely used ‘selector’ for data collection in US intelligence surveillance programmes, and regulatory action in Europe has so far focused on this particular data type.
- NOYB has a queue of complaints outstanding with various EU DPAs, which will most likely continue to use GA data transfers as their illustrative example. We will await the next tranche of DPA rulings on GA and international data transfers to see whether any evidence surfaces that regulators view GA4 differently now that IP addresses are not being logged. At the time of writing, the Italian DPA is the latest supervisory authority to reprimand publishers over the unlawful use of GA. Its ruling from late June still refers to the IP address as a key problematic data type, with no reference to product changes in this area.
- GA4 can be used with a wide range of data insertions and marketing service integrations, so clear rules should be established with marketing teams to ensure that GA4 is used in a manner consistent with the privacy team’s compliance position.
- Businesses that are particularly sensitive to privacy risk may consider abandoning GA altogether. The French supervisory authority, the CNIL, has gone as far as publishing a list of alternative audience measurement tools that have already demonstrated to it that they can be configured in a GDPR/ePrivacy-compliant manner. Most of the listed tools are developed by EU providers. Even so, the CNIL notes in a disclaimer that companies should verify the international transfer arrangements with the measurement providers themselves.
- Interestingly, the CNIL has also recently suggested another possible route to the continued, compliant use of GA: a proxy server. A proxy server avoids any direct contact between the user’s browser and Google’s servers and essentially functions as a pseudonymisation engine, provided it meets the following criteria prescribed by the CNIL:
- the absence of transfer of the IP address to the servers of the measurement tool;
- the replacement of the user identifier by the proxy server;
- the deletion of the referring site information external to the site;
- the deletion of any parameter contained in the URLs collected (for example the UTMs, but also the URL parameters allowing the internal routing of the site);
- the reprocessing of information that can contribute to the generation of a fingerprint;
- the absence of any collection of cross-site or deterministic identifiers (CRM, unique ID);
- the deletion of any other data that may lead to re-identification.
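Several of the CNIL criteria above translate directly into code. The following hypothetical sketch shows the transformation step such a proxy might apply to each hit before forwarding it. The field names, `PROXY_KEY` and the allowlist approach are all assumptions, and the sketch makes no claim to satisfy the CNIL’s requirements. It drops the IP and replaces the client identifier with a keyed-hash surrogate. It also removes external referrers and strips URL parameters. Finally, it keeps only allowlisted fields, which discards fingerprinting inputs and cross-site or CRM identifiers.

```python
import hashlib
import hmac
from urllib.parse import urlsplit, urlunsplit

# Hypothetical secret known only to the proxy, never to the analytics vendor.
PROXY_KEY = b"hypothetical-secret-held-only-by-the-proxy"


def strip_url(url: str) -> str:
    """Drop the query string (UTMs, internal routing params) and fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def pseudonymise_hit(hit: dict, site_host: str) -> dict:
    """Apply a subset of the CNIL-style criteria to one (hypothetical) hit."""
    # Replace the user identifier with a proxy-generated surrogate (keyed hash).
    surrogate = hmac.new(
        PROXY_KEY, hit["client_id"].encode("utf-8"), hashlib.sha256
    ).hexdigest()[:20]

    # Delete referrer information external to the site; strip internal ones.
    referrer = hit.get("referrer", "")
    internal_ref = strip_url(referrer) if urlsplit(referrer).hostname == site_host else ""

    # Allowlist: everything else (ip, user_agent, screen, crm_id, ...) is dropped.
    return {
        "client_id": surrogate,
        "page": strip_url(hit["url"]),
        "referrer": internal_ref,
    }
```

Because the surrogate is stable, it is a pseudonym rather than anonymous data. Rotating `PROXY_KEY` would limit linkability at the cost of returning-visitor metrics.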
Unfortunately, the CNIL appears to have set such a high bar for pseudonymisation that GA is unlikely to provide meaningful analytics reporting from such a setup.