Tuesday, November 19, 2013

Crunching Billions of Log Lines With LogParser

Yesterday three different people asked me which browsers our "users" use to access our content, and in what numbers. Since our products load on about 50 to 80 million pages a day, we have a pretty good sample size, and a hell of a lot of log files to collect and analyze. Because I already summarize this info on each server daily, it was rather simple to gather these summaries and tally them for the month.

The numbers below come from one of our services that handles an average of about 50,000,000 calls per day. These calls are spread across several web servers, so I combined the daily summaries from all servers into one summary per day, then combined those daily summaries for last month (October 2013). This service represents only about 5-8% of our total daily calls, but it is the first one called by many of our products, so it is the best place from which to gather info like user agent distribution.
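The roll-up itself is just adding counts per user agent, first across servers and then across days. Here is a minimal Python sketch of that idea. It is not the script I actually run: the `combine` helper, the server names and all the numbers in it are made up for illustration.

```python
from collections import Counter


def combine(summaries):
    """Add up hit counts per user agent across several summaries."""
    total = Counter()
    for summary in summaries:
        total.update(summary)  # Counter.update adds counts, it does not replace them
    return total


# Hypothetical per-server summaries for one day (made-up numbers).
server_a = {"Chrome": 120, "Firefox": 80}
server_b = {"Chrome": 100, "Internet Explorer": 200}

# Step 1: combine the servers into one summary per day.
day_1 = combine([server_a, server_b])
day_2 = combine([{"Chrome": 50, "Safari": 30}])

# Step 2: combine the daily summaries into one summary for the month.
month = combine([day_1, day_2])

for agent, hits in month.most_common():
    print(f"{agent}\t{hits:,}")
```

Because each level only adds up small summaries, the monthly tally never has to touch the raw log files again.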

UserAgent                UAHits  UAPercent
Internet Explorer   516,408,427     34.05%
Chrome              318,859,924     21.02%
Firefox             262,120,296     17.28%
Apple IOS           165,269,836     10.90%
Safari              136,577,103      9.01%
Android              77,221,373      5.09%
Other Mobile         10,372,620      0.68%
Other User Agents     9,694,239      0.64%
Monitoring            5,527,439      0.36%
Search Bot            5,097,159      0.34%
Opera                 3,938,622      0.26%
IEMobile              2,698,171      0.18%
BlackBerry            1,161,637      0.08%
No User Agent         1,119,479      0.07%
Monitoring2             307,882      0.02%
Gaming Device           152,633      0.01%
CMS                      44,012      0.00%
wget                      6,034      0.00%
curl                      1,784      0.00%

Total Hits        1,516,579,920

For our users Internet Explorer reigns supreme, but IE's share of hits is down quite a bit from my sample last spring, when it represented just over 38% of the total. Since then iOS slipped a little from 11.3% to 10.9%, and Android rose from 3.12% to 5.09%. In total nearly 17% of our users access our content with "mobile" devices (phones, tablets, etc.). I suspect this is a little lower than the average in some corners of the Internet, but since the majority of our users access our content during the day on weekdays (which makes me question their productivity at work...), it's no surprise that mobile is fairly low and desktop browsers are higher.
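For anyone who wants to check the mobile figure, here is the arithmetic as a short Python sketch. The hit counts are copied from the table above. Treating exactly these five rows as "mobile" is an assumption made for this example, though it does land on the "nearly 17%" quoted above.

```python
# Hits copied from the October 2013 table above.
total_hits = 1_516_579_920

# Assumption for this example: these five rows make up the "mobile" group.
mobile_hits = {
    "Apple IOS": 165_269_836,
    "Android": 77_221_373,
    "Other Mobile": 10_372_620,
    "IEMobile": 2_698_171,
    "BlackBerry": 1_161_637,
}

mobile_total = sum(mobile_hits.values())
mobile_share = 100 * mobile_total / total_hits

print(f"{mobile_total:,} mobile hits = {mobile_share:.2f}% of total")
# prints "256,723,637 mobile hits = 16.93% of total"
```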

I've written much about my love affair with Microsoft's logparser, and that love continues. All of these 1.5 billion-plus log lines were crunched with logparser using only the CPUs of the servers that served the content, which are running 24/7 anyway. The bottom line is that this info was gathered (along with a lot of other info) for free! That's right: instead of spending thousands or even tens of thousands of dollars on fancy third-party log analysis tools, I've leveraged available CPU time, free software (logparser itself) and a little ingenuity to collect a great deal of useful information from a rather large number of log files.