Google Analytics’ Bot & Spider Filtering
Google Analytics recently released a feature called “Bot and Spider Filtering”. You might have read about it already but haven’t had a chance to implement it or process how it might affect your organization or even your effectiveness as a digital analyst. E-Nor recommends turning on Google Analytics’ Bot & Spider Filter as a best practice for all Google Analytics accounts (except for raw data Views – which we’ll explain at the end of the post).*
As an added bonus, it is super easy to do!
What is Bot & Spider Filtering?
Bot and Spider Filtering removes sessions from bots and spiders (as defined by Google) from your Google Analytics data. Note, though, that like traditional Google Analytics filters, the Bot & Spider Filter works on a “going forward” basis; in other words, past data will be unaffected. It also does not actually prevent bots or spiders from visiting your site; it only filters them out of your data.
Typically, traffic from bots and spiders will be negligible and barely noticeable in your Google Analytics data. For example, in E-Nor’s recent test using the new filter, bots and spiders accounted for less than 0.5% of all sessions.
So then why should you care? There are cases where bot and spider activity can seriously skew results in Google Analytics. Unexpected, short-lived spikes can throw data for short time frames out of whack, and significant time can be spent diagnosing spikes that turn out to be bot or spider traffic.
How E-Nor Diagnosed Spikes in Google Analytics Sessions
Two key questions triggered our investigations:
- Was the activity out of the norm? Did it stand out compared to the same metrics in prior periods?
- Could we assign a reasonable explanation to it? For example, was content changed on the site or could some outside event lead to the activity?
When the answers are: 1. Yes, the activity is abnormal; and 2. No, we can’t figure out a reasonable explanation as to why – we need to go on a Google Analytics fishing expedition…
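The first question (“was the activity out of the norm?”) can be made a little more concrete with a quick check against prior periods. The snippet below is only a minimal sketch, not the method E-Nor used: the function name `is_out_of_norm`, the z-score approach and the threshold of 3 standard deviations are all assumptions chosen for illustration, and the session counts are made up.

```python
from statistics import mean, stdev


def is_out_of_norm(current, prior_values, z_threshold=3.0):
    """Return True if `current` is far from the prior periods' average.

    Hypothetical helper: "far" means more than `z_threshold` sample
    standard deviations from the mean of `prior_values`.
    """
    if len(prior_values) < 2:
        raise ValueError("need at least two prior periods to compare against")
    mu = mean(prior_values)
    sigma = stdev(prior_values)
    if sigma == 0:
        # Prior periods were all identical, so any difference stands out.
        return current != mu
    return abs(current - mu) / sigma > z_threshold


# Illustrative (made-up) weekly session counts for one page.
prior_weeks = [100, 110, 95, 105, 102]
print(is_out_of_norm(1300, prior_weeks))  # a spike
print(is_out_of_norm(104, prior_weeks))   # an ordinary week
```

A check like this only answers the first question. Whether there is a reasonable explanation for the change still requires knowing what happened on the site and around it.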
Case 1: We observed a Top Pages report with several new, unexpected entries. None of the typical reasons why the report would have new top pages had occurred – like an actual new page, a very popular campaign linked to a page or sudden public interest in the topic of a particular page. The organization this report belonged to primarily provided informational content, so their concern was the level of engagement with that information. These weird occurrences were a big deal.
E-Nor examined metrics for the pages in question (and for sessions including those pages). We observed the following anomalies compared both to past data and to other pages during the same time period. We subsequently concluded that bot activity was probably responsible, even though this activity at first seemed atypical for bots!
We checked many metrics, and the ones that stood out were:
- Pageviews and entrances increased by atypical amounts compared to prior time periods for several pages.
- The bounce rate dropped dramatically, e.g. from 74% to 42%.
- These same pages were also the top landing pages when they had not been in prior periods.
- Sessions involving these pages included a very high number of pageviews of the page in question.
- Session duration for sessions including the pages increased abnormally, e.g. from 4:36 to 28:37 (minutes:seconds).
- Pages per Session increased sharply, from 2 to 13 pages per session.
- Browser and Browser Version were unusual: sessions including these pages came primarily from Internet Explorer versions 7, 8 and 6, rather than the typical IE 11, IE 9 and Chrome for this site.
- Locations were primarily Russia, Indonesia, Argentina, Thailand, Mexico and other countries atypical for this site, where sessions typically occur mostly in the United States.
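To show how comparisons like the ones in the list above could be checked quickly, here is a small sketch that computes the relative change of each metric between a baseline period and the period under investigation. It uses the figures quoted in Case 1; the function names, the dictionary keys and the 40% threshold are assumptions made for this example, not part of Google Analytics or of E-Nor's process.

```python
def relative_change(baseline, current):
    """Fractional change from baseline to current (0.5 means +50%)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (current - baseline) / baseline


def flag_metrics(baseline, current, threshold=0.4):
    """Return {metric: change} where the absolute relative change exceeds threshold.

    The 0.4 (40%) default is an arbitrary, illustrative cut-off.
    """
    flagged = {}
    for name, base_value in baseline.items():
        change = relative_change(base_value, current[name])
        if abs(change) > threshold:
            flagged[name] = change
    return flagged


# Figures quoted in Case 1; durations converted from mm:ss to seconds.
baseline = {
    "bounce_rate": 0.74,
    "session_duration_s": 4 * 60 + 36,
    "pages_per_session": 2,
}
current = {
    "bounce_rate": 0.42,
    "session_duration_s": 28 * 60 + 37,
    "pages_per_session": 13,
}

flagged = flag_metrics(baseline, current)
for name, change in flagged.items():
    print(f"{name}: {change:+.0%}")
```

Numeric metrics are only part of the picture. The browser and location anomalies in the list are categorical, and are easier to spot by comparing the top values of each dimension side by side.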
Case 2: For a B2B high tech company, we observed, again, a deviation from the prior period visible across many pages in a management report. Our report to senior management indicated huge interest from a major customer in jobs, rather than products, and huge interest from another customer who hadn’t been in the “top customers” report in the past. For the latter, we learned that a news item caused the unusual activity from that customer, so that was explained. For the former, we did not discover a solid reason for the change, so we went on a Google Analytics fishing expedition.
Our expedition across many, many metrics revealed these anomalies compared to prior periods:
- The Source/Medium for the affected sessions was “(direct) / (none)”, which was suspicious because the Landing Page was not one someone would bookmark or type into the browser.
- All sessions had the same Landing Page.
- The Bounce Rate for that page skyrocketed.
- The City and Region were both “(not set)” while the Country was the United States.
- The Operating System was also “(not set)”.
- The Browser was also “(not set)”.
With its high bounce rate and frequent occurrence of “(none)” and “(not set)”, Case 2 was an example of atypical activity likely to be indicative of bots.
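The Case 2 pattern can also be expressed as a simple rule over exported session rows. The sketch below assumes each session is a dictionary with hypothetical keys (`source_medium`, `city`, `region`, `operating_system`, `browser`); those names and the sample rows are invented for illustration and are not a Google Analytics export format. Matching this signature is a hint that a session deserves a closer look, not proof that it came from a bot.

```python
NOT_SET = "(not set)"


def matches_case2_signature(session):
    """True if a session shows the Case 2 pattern: direct traffic with no
    city, region, operating system or browser recorded."""
    return (
        session["source_medium"] == "(direct) / (none)"
        and session["city"] == NOT_SET
        and session["region"] == NOT_SET
        and session["operating_system"] == NOT_SET
        and session["browser"] == NOT_SET
    )


def suspicious_share(sessions):
    """Fraction of sessions that match the Case 2 signature."""
    if not sessions:
        return 0.0
    matches = sum(1 for s in sessions if matches_case2_signature(s))
    return matches / len(sessions)


# Made-up sample rows, for illustration only.
suspect = {
    "source_medium": "(direct) / (none)",
    "city": NOT_SET,
    "region": NOT_SET,
    "operating_system": NOT_SET,
    "browser": NOT_SET,
}
ordinary = {
    "source_medium": "google / organic",
    "city": "San Jose",
    "region": "California",
    "operating_system": "Windows",
    "browser": "Chrome",
}
sessions = [suspect, ordinary, dict(suspect), dict(ordinary)]

print(f"{suspicious_share(sessions):.0%} of sessions match the Case 2 signature")
```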
References & More
For more information, always go to the source: Google’s announcement about Bot & Spider Filtering
And please don’t forget to annotate your Google Analytics data and please don’t apply the Bot & Spider Filter to your Raw Data View.*
*What is a Raw Data View?
A Raw Data View is a Google Analytics View with no configuration. For example, no Filters are applied and no Goals are set. The Raw Data View acts as a back-up in two ways. First, if we need to validate the configuration of other Views, it might be easier to compare them to a Raw Data View. Second, if a View becomes too complex to reverse engineer, it might be easier to just copy the Raw Data View and then apply configuration, like Filters and Goals, fresh.