Tuesday, November 19, 2013

Crunching Billions of Log Lines With LogParser

Yesterday three different people came to me asking what type and how many browsers are our "users" using to access our content. Since our products load on about 50 - 80 million pages a day we have a pretty good sample size. And a hell of a lot of log files to collect and analyze. Since I already summarize this info on each server daily it was rather simple to gather these summaries and tally them for the month.

These calls are from one of our services that handles an average of about 50,000,000 calls per day. Of course these calls are spread across several web servers, so I took the daily summary from each server & combined them per day, then took these daily summaries and combined them for last month (October 2013). In total this represents only about 5-8% of our total daily calls, but this particular service is the first called by many of our products so it is the best place from which to gather info like user agent distribution.

UserAgent UAHits UAPercent
Internet Explorer 516,408,427 34.05%
Chrome 318,859,924 21.02%
Firefox 262,120,296 17.28%
Apple IOS 165,269,836 10.90%
Safari 136,577,103 9.01%
Android 77,221,373 5.09%
Other Mobile 10,372,620 0.68%
Other User Agents 9,694,239 0.64%
Search Bot 5,097,159 0.34%
Opera 3,938,622 0.26%
Monitoring 5,527,439 0.36%
IEMobile 2,698,171 0.18%
BlackBerry 1,161,637 0.08%
No User Agent 1,119,479 0.07%
Monitoring2 307,882 0.02%
Gaming Device 152,633 0.01%
CMS 44,012 0.00%
wget 6,034 0.00%
curl 1,784 0.00%

Total Hits 1,516,579,920

For our users Internet Explorer reigns supreme, but the percentage of hits for IE is down quite a bit from my sample last spring where it represented just over 38% of the total. Since then IOS slipped a little from 11.3% to 10.9%, and Android rose from 3.12% to 5.09%. In total nearly 17% of our users access our content with "mobile" devices (includes phones, tablets, etc.). I suspect this is a little lower than the average of some corners of the Internet, but since the majority of our users access our content during the day on weekdays (makes me question their productivity while at work....) it's no surprise it's fairly low & desktop browsers are higher.

I've written much about my love affair with Microsoft's logparser, and that love continues. All of these over 1.5 billion log lines were crunched with logparser using only the servers' CPU's which served the content and are running 24/7 anyway. The bottom line is this info was gathered (along with a lot of other info) for free! That's right, instead of spending thousands or even tens of thousands of dollars for fancy third party log analyzing tools I've leveraged available CPU time, free software (logparser itself) and a little ingenuity to collect a great deal of useful information from a rather large amount of log files.

Friday, May 17, 2013

Improving Web Services (Part II)

Last year I wrote about how we improved web server performance with some fairly small changes, and about how I keep an eye on these metrics with reports I create analyzing my web server logs with Microsoft's LogParser. This is a follow-up to those.

Recently we did an upgrade to our platform. One of the "improvements" was our amazing DBA (he is truly amazing!) did was to tighten up some of the SQL stored procedures used for returning dynamic data to our video players (playlists, video meta data, etc.). These "player services" get hit around 300,000,000 to 400,000,000 times per day, so even a small improvement can have far-reaching impact.

As I'm sure is common across regions of the web traffic is lower at certain times. Ours is no different, so I leverage lower CPU load in the middle of the night to crunch my web server logs across my fleet of web servers. As this RRD Tool graph shows CPU load is considerably lower overnight, except when the server is processing its own log file analysis. Which takes about an hour or so on each server. It's also worth noting that average response times are not negatively affected during this time - I know, I keep a close eye on that!

Among the various pieces of data gleaned by this log processing is the time (in milliseconds) each response takes, as recorded by the server. This is very valuable information to me as I can definitively know impacts of various factors; like systems deployments (such as the one that spurred this post...), performance under various load conditions (peak times vs. slow times), performance during operations or maintenance windows (crunching logs, system updates, patches, etc.), and last but not least when people come to me saying anecdotally  "customers are saying our system is slow..." I can show them with absolute certainty, both historically and at any point in time (I have some really good methods of running ad hoc reports to get up-to-the minute stats), how our system is performing or has performed.

So any time we roll out a change of any kind I look at the data to understand the impact(s), if any. After this deployment of the new and improved SQL stored procedures I'm seeing approximately a 30% decrease in response times. That's a huge improvement!

Besides loading faster (at the client side) this is also causing a noticeably lower load on both the front end web servers and database servers. Therefore we have more available capacity or head room with the same number of servers, or I could potentially shut down some of our AWS EC2 servers saving money. Now we have set the bar even higher for performance of our systems, and any future "improvements" or modifications can be accurately measured against this.

I love the fact that I have such good insight into these systems and can measure any impact of changes or varying load with great accuracy!

Sunday, March 31, 2013

Using Log Parser to Extract User Agents From Web Log Files

I've been meaning to write a follow-up (or two...) to my I LOVE LogParser post from a few months ago. The time has finally arrived.

Every day I collect and analyze (at least at a high level) somewhere around 1/2 a billion (yes, billion) web server log lines. And about that many more from a couple CDNs. Needless to say that's quite a bit of information. Besides the mundane stuff like number of hits, bandwidth served, hits per file type, etc. I've recently buckled down and written a few pretty good scripts with LogParser to extract and count User Agents.

I know even this isn't all that sexy or sophisticated, and that numerous companies selling analytic have already solved this, but, since I have the data right at my finger tips why should I pay someone else tens of thousands of dollars to analyze my logs and give me a pretty web interface. Yeah, I'll admit that would be nice, but for what I'm after I'm able to get what I need with just a little elbow grease.

This pursuit actually began several months ago when my boss came to me and asked how many and what types of Android devices were hitting our services. Since our product runs on numerous sites around the web we get all kinds of traffic. And, of course, many people (our partners, etc.) all say your product has to run on this Android device, or that Android device. But with so many out there all running so many different OS versions it's crazy. This image (from this phandroid.com article) shows it quite well.

Figure 1 - Android Fragmentation.
At this point I must give credit where credit is due. The LogParser queries below are by no means unique, nor are they original. The best one I found was a little dated. So I took it, made it a little prettier, and adapted it for three distinct outputs. First, is my daily User Agents Summary report (below). This is a high level report showing us the type and distribution of browsers, which are hitting our sites. While others publish  similar information regularly this applies directly to us, to our products, which gives us good, reliable information we can both use to convey to our partners, but also so we know where to focus time and energy on development and QA resources.

The numbers in this summary report and others in this post come from a single web server (I have lots more) for one day (March 28, 2013 UTC). So, this is current as of this posting. (See below for the exact LogParser query I'm using for this summary report.)

UserAgent UAHits UAPercent
Internet Explorer 2,611,065 38.03%
Chrome 1,322,187 19.26%
Firefox 1,241,372 18.08%
Apple IOS 775,693 11.30%
Safari 620,618 9.04%
Android 214,479 3.12%
Other Mobile 25,171 0.37%
Opera 19,262 0.28%
Other User Agents 15,342 0.22%
IEMobile 6,188 0.09%
No User Agent 5,061 0.07%
BlackBerry 4,481 0.07%
Search Bot 3,407 0.05%
Gaming Device 590 0.01%

This summary is great and very useful, but we certainly need some detail. Since the detail report is over 100 lines long I'm only going to show about a dozen lines here. Again, this detail is great for us so we know which versions of which browsers are being used to access our content at any given point.

UserAgent UAHits UAPercent
IE 9 1,170,034 16.99
Firefox 19 968,039 14.06
Chrome 25 846,618 12.30
IE 8 815,316 11.84
Safari 5 622,317 9.04
iPad 553,625 8.04
IE 10 331,290 4.81
Chrome 26 289,195 4.20
IE 7 286,986 4.17
iPhone 234,653 3.41
Android 4.1 72,227 1.05
Android 2 64,688 0.94
Android 4.0 54,877 0.80

Finally, the thing I was really after - what type of Android devices are being used to access our content? Just like the detail report this is only a partial list.

UserAgent UAHits PercentOfAndroid
Other Android Devices 52,470 24.11
Samsung Galaxy S III 25,903 11.90
Motorola Droid RAZR 16,558 7.61
Samsung Galaxy Tab 12,285 5.64
Samsung Galaxy S II 9,589 4.41
Samsung Galaxy Nexus 7 6,772 3.11
Samsung Galaxy Nexus 6,458 2.97
Samsung Galaxy Note II 6,270 2.88
HTC EVO 6,177 2.84
B&N Nook 4,271 1.96
Motorola Droid Bionic 3,736 1.72
Asus Transformer Pad 3,279 1.51
Motorola Droid RAZR M 3,140 1.44
Motorola Droid X 2,479 1.14
Motorola Droid 4 2,308 1.06
Motorola Xoom 2,278 1.05
Kindle Fire 1,498 0.69
Kindle Fire HD 728 0.33

It's no surprise that there are dozens and dozens of Android devices that are all browsing the web and hitting sites like ours. One little surprise is that Barnes and Nobel's Nook registered higher than the Kindle Fire. So many devices so little time.

Here's the Log Parser query I'm using for the User Agent summary (above).

select case strcnt(cs(user-agent),'Android') when 1 THEN 'Android'
else case strcnt(cs(user-agent),'BlackBerry') when 1 THEN 'BlackBerry'
else case strcnt(cs(user-agent),'iPad') when 1 THEN 'Apple IOS' when 2 THEN 'Apple IOS'
else case strcnt(cs(user-agent),'iPhone') when 1 THEN 'Apple IOS' when 2 THEN 'Apple IOS'
else case strcnt(cs(user-agent),'iPod') when 1 THEN 'Apple IOS' when 2 THEN 'Apple IOS'
else case strcnt(cs(user-agent),'Opera') when 1 THEN 'Opera'
else case strcnt(cs(user-agent),'Chrome') when 1 THEN 'Chrome'
else case strcnt(cs(user-agent),'Safari') when 1 THEN 'Safari'
else case strcnt(cs(user-agent),'IEMobile') when 1 THEN 'IEMobile'
else case strcnt(cs(user-agent),'MSIE') when 1 THEN 'Internet Explorer'
else case strcnt(cs(user-agent),'Firefox') when 1 THEN 'Firefox'
else case strcnt(cs(user-agent),'Googlebot') when 1 THEN 'Search Bot' when 2 THEN 'Search Bot'
else case strcnt(cs(user-agent),'Yahoo!+Slurp') when 1 THEN 'Search Bot' when 2 THEN 'Search Bot'
else case strcnt(cs(user-agent),'bingbot') when 1 THEN 'Search Bot' when 2 THEN 'Search Bot'
else case strcnt(cs(user-agent),'Yandex') when 1 THEN 'Search Bot' when 2 THEN 'Search Bot'
else case strcnt(cs(user-agent),'Baiduspider') when 1 THEN 'Search Bot' when 2 THEN 'Search Bot'
else case strcnt(cs(user-agent),'loc.gov') when 1 THEN 'Search Bot' when 2 THEN 'Search Bot'
else case strcnt(cs(user-agent),'crawler@alexa.com') when 1 THEN 'Search Bot' when 2 THEN 'Search Bot'
else case strcnt(cs(user-agent),'Mobile') when 1 THEN 'Other Mobile'
else case strcnt(cs(user-agent),'PlayStation') when 1 THEN 'Gaming Device'
else case strcnt(cs(user-agent),'Nintendo') when 1 THEN 'Gaming Device'
else case strcnt(cs(user-agent),'curl') when 1 THEN 'curl'
else case strcnt(cs(user-agent),'wget') when 1 THEN 'wget'
else case strcnt(cs(user-agent),'-') when 1 THEN 'No User Agent'
ELSE 'Other User Agents' End End End End End End End End End End End End End End End End End End End End End End End
AS UserAgent, count(cs(User-Agent)) AS UAHits, MUL(PROPCOUNT(*),100) AS UAPercent
INTO D:\Reports\UserAgent_Summary.csv
FROM D:\Logs\.log