Google Analytics vs Server Logs
Mar 6, 2014 by Paul White
Every media property measures its stats in one way or another. Properties use their stats to set advertising rates and to measure their growth. Unfortunately, some properties still prefer stats that show them numbers they like rather than the truth. Here I will break down the two types of stats tracking (analytics and server logs) and how each should be used.
What are Stats based on Server Logs?
Every request to your website is recorded in your server logs. The server log files are typically plain ASCII text, with one line per log entry. It doesn't matter what is requested from your server (photo, webpage, style sheet, video, etc.); each request is written to the server logs. A stats server then reads the server logs and organizes the information into a format that is easier for users to view (charts, tables). It also tells you how many visitors and how many impressions (page views) your website receives.
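To make the "one line per request" idea concrete, here is a minimal sketch of what a stats server does with a log file. It assumes the Common Log Format that Apache and nginx can write; your server may use a different layout (IIS, for example, uses its own format), and the sample lines, the `summarize` function and the list of asset extensions are made up for illustration.

```python
import re
from collections import Counter

# Assumed layout: Common Log Format. Other servers may log differently.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

# Extensions counted as assets rather than pages (illustrative, not exhaustive).
ASSET_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".gif", ".ico", ".mp4")

def summarize(lines):
    """Return (unique_ips, page_views, requests_per_ip) for an iterable of log lines."""
    requests_per_ip = Counter()
    page_views = 0
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        requests_per_ip[match["ip"]] += 1
        path = match["path"].split("?", 1)[0].lower()
        if not path.endswith(ASSET_EXTENSIONS):
            page_views += 1  # anything that is not an asset counts as a page
    return len(requests_per_ip), page_views, requests_per_ip

# Made-up sample entries using documentation-only IP addresses.
sample = [
    '203.0.113.5 - - [03/Feb/2014:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 5120',
    '203.0.113.5 - - [03/Feb/2014:10:00:01 +0000] "GET /style.css HTTP/1.1" 200 900',
    '198.51.100.7 - - [03/Feb/2014:10:05:00 +0000] "GET /about.html HTTP/1.1" 200 2048',
]
unique_ips, page_views, _ = summarize(sample)
print(unique_ips, page_views)
```

Note that the sketch counts every IP address it sees, whether the request came from a person or a program.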
What is wrong with Stats based on Server Logs?
The problem is that not every request to your website comes from an organic visitor (a real person). Your website is also visited by search engines and various bots (many of them with malicious intent). Since the only visitors your advertisers care about are the real ones, it's important to measure only your organic visitors. Stats servers based on your server logs cannot reliably tell the two apart. This is where an analytics program like Google Analytics comes in.
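As a rough illustration of why this is hard from the log side, the sketch below flags requests by the user agent string they send. It assumes your log format records the user agent at all, and the marker list and function name are my own picks. It only catches bots that identify themselves honestly; a malicious bot can send any user agent it likes.

```python
# Substrings that well-behaved crawlers tend to include in their user agent.
# Illustrative list only; it is an assumption, not a complete catalogue.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")

def is_probable_bot(user_agent):
    """Guess whether a request came from a self-identified bot."""
    ua = user_agent.strip().lower()
    if ua in ("", "-"):
        return True  # a missing user agent is rarely a real browser
    return any(marker in ua for marker in BOT_MARKERS)

print(is_probable_bot("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))
print(is_probable_bot("Mozilla/5.0 (Windows NT 6.1; rv:27.0) Gecko/20100101 Firefox/27.0"))
```

The first call reports a bot and the second does not, but a rogue bot that copies the Firefox string would pass as a visitor.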
How does Google Analytics work?
How big of a gap is there in the data?
You are probably wondering how much of a difference there is between your server logs and the data that Google Analytics reports. Years ago the difference wasn't that big, but as more spiders and bots have come online, things have changed.
The following charts are for the same website over the same date range (Feb 1st - Feb 28th, 2014).
SmarterStats (based on Server Logs)
Comparing the data
Let's take Feb 3rd, 2014.
Google says I had 743 unique visitors to my website.
Server logs say I had 2159 unique visitors to my website.
When setting advertising rates, I would use the 743 number from Google Analytics. The 2159 number is not useless, though: it is the number of unique IP addresses that visited the website that day. Search engines often use dozens of IPs when scanning your site, and rogue bots tend to use even more. The inflated number is likely the result of rogue bots scanning the site for security weaknesses and/or attempting to post comment spam through the site's forms.
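To put the Feb 3rd gap in proportion, here is the arithmetic as a short sketch. The two counts are the ones quoted above; the variable names are mine, and treating every extra IP as non-organic is a simplification, since one IP address is not exactly one visitor.

```python
ga_unique_visitors = 743   # Google Analytics, Feb 3rd 2014
log_unique_ips = 2159      # server logs, same day

extra = log_unique_ips - ga_unique_visitors   # IPs the logs count beyond Analytics
share = extra / log_unique_ips                # fraction of the log total
ratio = log_unique_ips / ga_unique_visitors   # how inflated the log number is

print(f"{extra} extra IPs, {share:.1%} of the log total, {ratio:.1f}x the Analytics figure")
# prints: 1416 extra IPs, 65.6% of the log total, 2.9x the Analytics figure
```

In other words, quoting the server log number to an advertiser would overstate that day's audience nearly threefold.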
How to reduce the number of Rogue Bots on your site
If your website is built on a common framework (WordPress, Joomla) or uses common web tools to manage its data (phpMyAdmin), it is likely attacked daily for known weaknesses in that software. However, these rogue bots often operate from IPs outside the USA. The easiest way to eliminate these bots is to firewall them from your server. If you don't want to firewall every country outside the USA, you can firewall only the specific countries that tend to host these IPs.
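Country-level firewalling comes down to checking an address against a list of IP ranges. The sketch below shows that check with Python's standard `ipaddress` module. The ranges are documentation-only placeholders, not real country data, and `is_blocked` is a hypothetical helper; in practice you would load a published per-country range list into the firewall itself rather than check addresses in application code.

```python
import ipaddress

# Placeholder ranges (reserved for documentation), standing in for a country list.
BLOCKED_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("198.51.100.0/24", "203.0.113.0/24")
]

def is_blocked(ip):
    """Return True if the address falls inside any blocked range."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in BLOCKED_RANGES)

print(is_blocked("203.0.113.9"))  # inside a placeholder range
print(is_blocked("192.0.2.1"))    # outside both placeholder ranges
```

Blocking at the firewall keeps these requests from ever reaching the web server, which also keeps them out of the server logs.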
I firewall the following countries from my server, which has cut down 95% of the bots and spam.
When discussing your website's traffic with potential advertisers, always use your Google Analytics data. But don't ignore your server logs, as they can often alert you to security and performance issues within your website's architecture.