Posts Tagged ‘sampling’

Nov 07
2014

This is the video version of our sampled-data post, for those of you who prefer watching over reading!
For sites with heavy traffic, Google Analytics might sample your data, which can make it hard to gain insights (the data might even become unusable). Don’t worry though: there are plenty of options for working around that challenge, and the video explains a couple.

Do you have any solutions? Tell us in the comments.

Oct 09
2014


Google Analytics is a very powerful tool, but I think we’d all agree it might be a bit much to expect it to process monster amounts of data on demand and return reports instantaneously without any tradeoff. Some heavily trafficked sites expecting instant reports from their oceans and oceans of data (which obviously take time to process) instead find themselves running into sampling in their reports: GA makes its calculations based on a smaller sample of the overall data so it can return a report instantly. The problem is, sometimes that sample isn’t statistically significant or sufficiently representative of the data, so any insights drawn from it aren’t… well… accurate.

In general, sampling isn’t an issue if all you’re looking at are the standard out-of-box reports, because they are all unsampled. However, when you leverage GA’s segmentation capabilities (which is where the real beauty of deep insights resides), sampling might come into play whenever a data set exceeds the sampling threshold — roughly 250,000 to 500,000 sessions, depending on the query — within the selected time period.

Sampled data is just that: sampled. It’s not fully representative of the actual data. While Google Analytics uses an intelligent algorithm to minimize the adverse effects of sampling, the reality is that a dataset built from a 5% sample of your actual data really isn’t usable. What counts as usable depends on the nature of your data and the type of analysis being performed, but in general it’s best to keep the sample size as high as possible. These reports will undoubtedly be used as reference points for marketing decisions, so it’s important that they’re accurate and provide actionable insights.

Is the Core Reporting API a solution to this dilemma? Not entirely. The API doesn’t solve sampling on its own, even if you have GA Premium, because it has the same sampling thresholds applied to it as GA Standard.

So what to do?

Hold tight: here are four solutions to help you get clean data and clean insights again!

1. Reduce the date range.

The first solution is to reduce the date range. When you’re looking at a report for which you’ve met or crossed the sampling threshold, the interface indicates that the report is being sampled. Instead of looking at an entire month at once, it may help to look at a smaller timeframe, such as a week. That way only a subset of the data is being viewed, so the report you pull contains fewer sessions, keeping you under the sampling threshold. You would have to look at subsequent weeks one at a time, which is a bit tedious, but once that’s done you can aggregate this and other date ranges of the same report outside of GA into a single report. Read on to the next solution to find out what to do with all those reports.

Note: The only way to export unsampled data directly is to be a Google Analytics Premium customer. There are some third-party tools available for non-GA premium users discussed below. These tools are designed to reduce but not eliminate the effects of sampling.

2. Query Partitioning.

One way of reducing the effects of sampling is to break up the timeframe into smaller timeframes. For example, a year of data can be pulled as 12 separate months (12 separate queries), or a month of data as 4 separate weeks (4 separate queries). So instead of pulling data for all of 2014, I can pull Jan. 2014, then Feb. 2014, and so on. Obviously, we all have better things to do… A feature called query partitioning, available in tools such as ShufflePoint and Analytics Canvas (more details below), does the above for you automatically. The tools partition the query, programmatically loop through the desired timeframe, and aggregate the results back into a single report. When you pull the report, the tool appears to make one query, but in reality it makes many queries behind the scenes, depending on how granularly you configure the partitioning. It may take some experimenting to find a balance between speed and accuracy (sample size).
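To make the idea concrete, here is a minimal sketch of the date-splitting step these tools automate. The function name is hypothetical; each sub-range it produces would be issued as its own API query, and the results aggregated afterwards.

```javascript
// Split a date range into per-month sub-ranges, clipped to the overall range.
// Each sub-range would become one separate (less-sampled) API query.
function partitionByMonth(startDate, endDate) {
  var ranges = [];
  var cursor = new Date(startDate.getTime());
  while (cursor <= endDate) {
    var monthStart = new Date(cursor.getFullYear(), cursor.getMonth(), 1);
    var monthEnd = new Date(cursor.getFullYear(), cursor.getMonth() + 1, 0); // last day of month
    ranges.push({
      start: monthStart < startDate ? startDate : monthStart,
      end: monthEnd > endDate ? endDate : monthEnd
    });
    cursor = new Date(cursor.getFullYear(), cursor.getMonth() + 1, 1);
  }
  return ranges;
}

// All of 2014 becomes 12 separate queries:
var parts = partitionByMonth(new Date(2014, 0, 1), new Date(2014, 11, 31));
console.log(parts.length); // 12
```

Splitting by week instead of month would give smaller samples-per-query (better accuracy) at the cost of more queries (slower reports), which is exactly the speed/accuracy tradeoff mentioned above.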

More detail about the tools:

  • ShufflePoint has a drag-and-drop interface that supports Google Analytics and a few other Google products. The nice thing about ShufflePoint is that it uses Excel’s web-querying capability, so you can write SQL-like queries to retrieve your data, make built-in calculations and display the data essentially any way you want.
  • Analytics Canvas is another tool that allows you to connect to the Google Analytics API without coding. It uses a “canvas” on which you construct a visual flowchart of the query and the subsequent transformations and joins of your data, showing the series of modifications that will take place. It also allows for automating data extraction from BigQuery. If you use Google Sheets for your data, Analytics Canvas has a Chrome add-on that lets you create dashboards within Sheets.

Both tools can extract your data from Google Analytics, analyze it, and create reports.

3. Download Unsampled Reports.

If you are a Google Analytics Premium user, you can download unsampled reports (you will have to export them). Google Analytics also just announced an exciting new feature available on Premium accounts called Custom Tables, which allows you to create a custom table with metrics and dimensions of your choice (although there are some limitations). In other words, you can essentially designate a report that would otherwise be sampled as a “Custom Table,” which is then available to you as an unsampled report, similar to the out-of-box reports. You can create up to 100 Custom Tables. This is awesome because you won’t have to worry about sampled data for the reports you use often.

4. BigQuery.

If you have Google Analytics Premium, it integrates with Google BigQuery, which allows for moving massive datasets and super-fast SQL-like querying. It runs on Google’s cloud infrastructure and can process data on the order of billions of rows. GA Premium allows your data to be exported daily into BigQuery. While the Core Reporting API samples data at the same thresholds as GA Standard, BigQuery gives you access to unsampled, hit-level data instead of the aggregate-level data in the user interface, which in turn opens doors to very powerful and previously impossible analysis!

Here are a few examples of the types of analysis possible with BigQuery, to help illustrate its use:

  • What is the average amount of money spent by users per visit?
  • What is the sequence of hits (sequence of clicks, pages, events, etc)?
  • What other products are purchased by those customers who purchased a specific product?
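To make the first question concrete, here is a small sketch in plain JavaScript over made-up hit-level rows (the field names are illustrative, not BigQuery’s actual GA export schema), showing the kind of per-visit aggregation that hit-level data makes possible:

```javascript
// Hypothetical hit-level rows, one object per hit, tagged with the visit
// they belong to. In the GA user interface you only see aggregates; with
// hit-level data you can roll hits up by visit yourself.
var hits = [
  { visitId: "v1", type: "PAGE", revenue: 0 },
  { visitId: "v1", type: "TRANSACTION", revenue: 30 },
  { visitId: "v2", type: "PAGE", revenue: 0 },
  { visitId: "v2", type: "TRANSACTION", revenue: 50 },
  { visitId: "v3", type: "PAGE", revenue: 0 }
];

// Average amount spent per visit: total revenue / number of distinct visits.
function avgSpendPerVisit(hits) {
  var revenue = 0;
  var visits = {};
  hits.forEach(function (h) {
    revenue += h.revenue;
    visits[h.visitId] = true;
  });
  return revenue / Object.keys(visits).length;
}

console.log(avgSpendPerVisit(hits)); // (30 + 50) / 3 visits
```

In practice you would express this same grouping as a SQL query over your exported tables in BigQuery rather than in application code; the sketch just shows the per-visit logic.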


There you have it, four solutions to help you deal with sampling! Happy Reporting!

Aug 28
2014


It’s easy to sign up for cool tools without really reading the full terms of service, especially when those services are free. Google Analytics is one of those “cool tools”: really robust, really useful, sometimes so useful that our business depends on it. But do we really know the limits?

You’re garnering leads, conversion rates are increasing, the Key Performance Indicators you’ve chosen are showing progress, and the hits to your website are skyrocketing! Life is good, when suddenly an error message starts flashing on your reporting dashboard: you’ve exceeded 10 million hits per month!

Why did you hit the data limit? Well, it’s part of the terms of service we neglect to read :( . The Google Analytics terms of service state, “…the service is provided without charge to you for up to 10 million Hits per month per account.”

Is it really that big of a deal? If you’re hitting the limit, probably. At this volume your data most likely matters, and Google Analytics will automatically start sampling it. That means not all your data will be available in the interface, which could lead to inaccuracies and limit the insights you can derive.

Given all the work and investment you’ve put into your digital properties (websites, mobile sites, mobile apps) to become one of the leading businesses in your industry, it’s obviously important to analyze your data accurately and have as much access to it as you can.

You have three overarching solutions:

1. Upgrade to Google Analytics Premium

The first is the easiest and most straightforward way to overcome these limits. Upgrading to Google Analytics Premium gives you 1 billion hits per month, not to mention access to “amped-up” reporting features for your business. Plus, you won’t be analyzing your data alone: Premium also includes technical and implementation support, which at this volume can be a big help.

It is indeed a paid service, but it’s worth it to avoid the hassle and get the proper processing power and support you need for your growing organization. For more information and consultation about Google Analytics Premium, contact E-Nor.

2. Limit Hits to Google Analytics by Setting Your Own Sample Rate

Since Google Analytics Standard only allows a limited number of hits, the second option is to send fewer hits to your analytics account. A “hit” is a pageview, event, or any other interaction sent to GA. The reason this helps is that a single visit can register as multiple hits: if a user views several pages, fills out a form, and purchases a product, each of those interactions is reported as a separate hit, even though they all belong to one unique visitor and even one visit. Setting your own sample rate reduces the hits while keeping the control in your hands (rather than letting the system choose the sampling for you once you’ve passed the limit).

This is done at the code level. Talk to your developers about setting a sample rate using the _setSampleRate method in the tracking code. This method takes the percentage of visitors you want to track. For instance, with a sample rate of 10, roughly 1 in every 10 visitors is tracked. Say you have 100,000 visits a day: at a sample rate of 10, only about 10,000 of those visits will send hits to Google Analytics, reducing your overall hit volume.
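As a sketch, here is where _setSampleRate sits in the classic (ga.js) tracking code; the property ID is a placeholder, and the queue is stubbed so the snippet is self-contained:

```javascript
// Stub of the classic ga.js command queue; on a real page this line is part
// of the standard GA snippet.
var _gaq = _gaq || [];

_gaq.push(['_setAccount', 'UA-XXXXX-Y']); // placeholder property ID
_gaq.push(['_setSampleRate', '10']);      // track roughly 10% of visitors
_gaq.push(['_trackPageview']);
```

Note that _setSampleRate must be set before the pageview is tracked. If you are on Universal Analytics (analytics.js) instead, the equivalent is the sampleRate field passed when creating the tracker.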

3. Selective Tracking To Minimize Hits

If you’re only interested in certain types of hits, you may be able to get rid of what’s unnecessary. For example, since every Event counts as a hit, you may want to re-strategize and figure out which Events are really necessary. Instead of tracking every unique user interaction as an Event, pick and choose what’s important to track so that the tracking code doesn’t record excessive hits and inflate your hit count.

If you run a video-heavy website, you may not need to track everything: when the user starts the video, pauses it, reaches the midway point, and/or reaches the end. You might simply track when the user starts the video and when they’ve finished watching it. This cuts down the number of times the tracking code fires and lowers your hit count.
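With classic ga.js event tracking, that pared-down approach might look like the following sketch (the _gaq queue is stubbed so the snippet is self-contained; category and action names are just examples):

```javascript
// Stub of the classic ga.js command queue; on a real page this is defined
// by the standard GA snippet.
var _gaq = _gaq || [];

// Only two milestones are tracked instead of play/pause/25%/50%/75%/complete,
// so each video view costs at most two hits.
function onVideoStart(videoTitle) {
  _gaq.push(['_trackEvent', 'Videos', 'Start', videoTitle]);
}

function onVideoComplete(videoTitle) {
  _gaq.push(['_trackEvent', 'Videos', 'Complete', videoTitle]);
}
```

Wiring these two handlers to your player's start and end callbacks, and dropping the intermediate milestone handlers, is what reduces the hit volume.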

If you’re an ecommerce website with a large product catalog, you don’t have to track every single interaction; perhaps only product-page visits, shares, social interactions, add-to-carts, and so on. Once again, pick and choose the metrics that best reflect your website’s performance.

Stay tuned for our more in depth article on tips to deal with sampling.