Factors Affecting Web Analytics Accuracy and Performance

One question I am asked again and again is, “Why do the stats from provider A not match the stats from provider B?” It usually comes up when someone compares Woopra to another analytics provider such as Google.

I have touched on this topic a couple of times before, but today we’re going to expand on it:

Different Tracking Mechanisms

There are a variety of analytics systems in use today. Some, like Woopra, use a JavaScript snippet that sends information back to a central datacenter. Some, like Mint, store information in your own local database. Still others, like Analog, read the log files that most Web servers collect and store by default. And finally, some services have you install a simple image file, which keeps count of the number of times a page is loaded.

No method is wrong, but each comes with a different degree of accuracy and reliability, and each collects a different amount of data. For the purposes of this discussion I’m going to assume that we have chosen a JavaScript-based tracking system, and then compare and contrast it with the other methods.

JavaScript Benefits vs. Server Log File Analysis

JavaScript running in the visitor’s browser lets your site gather relevant information about each visit. However, your Web server (like Apache or Microsoft IIS) also gathers some raw statistics by default as Web pages are accessed. These are stored in what is called a “log file”.

  • Without customization, server log files do not collect much of the data that JavaScript typically gathers, for example:
    • User names, custom data (customer purchases, comments, etc.), outgoing click data, and more
    • Information appended by a service provider (like Woopra) such as business name associated with visitor IP, etc.
  • Log files are susceptible to inaccuracy due to caching and proxy servers, and also because they record visits from bots (search engine crawlers from Google or Yahoo, or spam bots) as if they were human.
  • Processing of log files can take many hours, and puts strain on the server doing the analysis. (Not good if it’s your Web server.)
  • Log files can eat up gigabytes of space on server hard drives, and the constant collection of data uses more power and consumes your machine’s resources.
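
To make the bot problem concrete, here is a minimal sketch of log file analysis in Python. It assumes the server writes the common "combined" log format, and the short list of crawler markers is a hypothetical placeholder, not a complete bot list. It counts page views separately for apparent humans and apparent bots:

```python
import re
from collections import Counter

# Combined Log Format (assumed here):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical, deliberately short list of crawler markers.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")


def is_bot(user_agent: str) -> bool:
    """Guess whether a request came from a crawler, based on its user agent."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def count_page_views(lines):
    """Return (human_views, bot_views) as Counters keyed by request path."""
    human, bots = Counter(), Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        target = bots if is_bot(match["agent"]) else human
        target[match["path"]] += 1
    return human, bots


# Two made-up log lines: one browser visit and one crawler visit.
SAMPLE = [
    '203.0.113.5 - - [10/Oct/2008:13:55:36 -0700] "GET /index.html HTTP/1.1" '
    '200 2326 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/3.0"',
    '66.249.66.1 - - [10/Oct/2008:13:55:40 -0700] "GET /index.html HTTP/1.1" '
    '200 2326 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

if __name__ == "__main__":
    human_views, bot_views = count_page_views(SAMPLE)
    print("human:", dict(human_views))
    print("bots:", dict(bot_views))
```

Without the user agent check, both sample requests would be counted as visits to the same page, which is exactly how raw log stats end up inflated.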

Frankly, my recommendation is to turn off your server’s log files to save resources, because 99% of people will never, ever use them anyway.

JavaScript Benefits vs. Local Stats Collection

There are systems you can download and install on your own server to track visitors to your Website. At first glance, these seem like a nice option because there may only be a one-time fee and a quick installation. However, these systems have a number of significant shortcomings compared with a third-party service based on JavaScript tracking.

  • As with log file analysis, collecting stats on your own server will put tremendous load on it. There will be a noticeable decrease in your server’s capacity, especially under heavy load. If you have to upgrade your machine to maintain performance, it’s not really free.
  • A sudden traffic spike from a site such as Digg, Slashdot or Lifehacker will almost certainly crash a server that is trying to both serve pages and track every visitor.
  • There are long term storage and archiving issues associated with stats data. You must be prepared to deal with them for years if you wish to maintain historical data. This is an unenviable task.
  • These systems are generally far less sophisticated, and collect less information, than a service provider can.

JavaScript Benefits vs. Image Counting Statistics

The first Web Analytics provider architectures relied on simple image downloads to increment a counter. It worked quite well in the early days, though there are significant drawbacks now:

  • Images can be cached, which means repeat visits are missed, and sometimes even new visitors are missed when they come from an ISP that has cached the image to save bandwidth.
  • Without executing JavaScript there is certain information that simply can’t be gathered. Additionally, an image file can’t “ping” back to the service provider to report whether the visitor is still on a page or what their status is.
  • A single JavaScript include can perform multiple functions, for example the Woopra Chat feature, while an image link can do nothing other than increment the counter.
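
The caching problem in the first point can be reduced, though not eliminated, in two common ways: the pixel can be served with headers that ask caches not to store it, and each page can request it under a unique URL. The Python sketch below illustrates both. The function names, the `stats.example.com` address and the `page` and `nonce` parameters are all hypothetical, chosen only for illustration:

```python
import base64
import uuid
from urllib.parse import urlencode

# A 1x1 transparent GIF, the classic "tracking pixel" payload.
PIXEL_GIF = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)


def pixel_response():
    """Return (headers, body) for a pixel that asks caches not to store it."""
    headers = {
        "Content-Type": "image/gif",
        "Content-Length": str(len(PIXEL_GIF)),
        # Standard HTTP headers that discourage browser and proxy caching.
        "Cache-Control": "no-store, no-cache, must-revalidate, max-age=0",
        "Pragma": "no-cache",
        "Expires": "0",
    }
    return headers, PIXEL_GIF


def pixel_url(base_url: str, page: str) -> str:
    """Build a pixel URL with a random nonce so every request looks unique."""
    query = urlencode({"page": page, "nonce": uuid.uuid4().hex})
    return f"{base_url}?{query}"


if __name__ == "__main__":
    headers, body = pixel_response()
    print(headers["Cache-Control"], len(body), "bytes")
    # Hypothetical pixel endpoint, used only for illustration.
    print(pixel_url("https://stats.example.com/pixel.gif", "/pricing"))
```

Even with both measures in place, the pixel still only records that a page was requested. It cannot report anything that requires code to run in the browser.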

JavaScript’s Potential Drawbacks

JavaScript isn’t a perfect solution either. There are a few potential pitfalls.

  • JavaScript could be disabled in a visitor’s Web browser. The good news is that the general population of the internet does not disable it. Fewer than 3% fall into that category, and I’m certain this is far fewer than those who are affected by the problems with other methods.
  • Many analytics providers deliver their JavaScript slowly. This is because they typically serve it from a central server within their network, and then track the visits on cheap, underpowered machines. The resulting delays mean that data is never collected for pages that visitors abandon quickly.

Incidentally, that last reason is why all third-party JavaScript you use on your site should be placed at the very bottom of the page. This way, if the service provider is running a little slow, it doesn’t prevent your page from loading quickly.

To Sum Things Up

No system is perfectly accurate; however, at Woopra, our system of dedicated servers in Tier 1 data centers with premium backbone connectivity does all of the heavy lifting for our clients. We keep the servers loaded to well within their limits, ensuring they respond rapidly when collecting visitor data (meaning fewer dropped visits and faster page loading) and keeping the load off your server.

We’ve also partnered with the third-largest CDN provider on the planet to host our JavaScript payloads, meaning that every visitor, from anywhere on Earth, is served by the closest possible point of presence. That speeds up both page loading for the end visitor and the collection of their data.

Finally, unlike other service methodologies, Woopra continually receives a ping from visitors on your site (because Woopra is a LIVE and real-time service provider), enabling us to be 100% certain that visitors are actually still on your site. We believe that all of this adds up to the most accurate user reporting in the business.

Got a question for John P? Hit him up on Twitter: @johnpoz
