Friday, October 5, 2012

AWS VPC VPN with SonicWALL NSA and PRO Series Firewalls

Amazon recently announced (see also), "You can now create Hardware VPN connections to your VPC using static routing."  This is great news, as it greatly expands the types of devices from which a point-to-point IPSec VPN can be created to your Virtual Private Cloud.  Previously only dynamic routing was supported, which required BGP and a device that supports it (like a Cisco ISR).  Now that VPC supports static routing, devices like Cisco ASA 5500 firewalls, and even Microsoft Windows Server 2008 R2 (or later), can be used.  And, as I finally got working, so can SonicWALL firewalls (I connected with an NSA 2400, but I'm sure others will work as well).

Here's what I did to get my statically routed point-to-point IPSec VPN set up between my Amazon Virtual Private Cloud (VPC) and a SonicWALL NSA 2400.

First, create a VPC.  Here is a great step-by-step guide to create a VPC: How to Create an Amazon VPC.

In the VPC Management Console click on VPN Connections, select your VPN (you may only have one), then click Download Configuration. Next to Vendor select Generic, then Download.


This file contains all the critical information you'll need, like pre-shared keys, IP addresses, etc.


Connect to your SonicWALL's web interface and perform the following.

Step 1 - Create Address Object
Go to Network, select Address Objects.  In the Address Objects section, click the Add button and configure with these settings:
  • Name: VPC LAN (this is arbitrary)
  • Zone Assignment: VPN
  • Type: Network
  • Network: the subnet portion of the VPC CIDR
  • Netmask: the subnet mask portion of the VPC CIDR
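
If you aren't sure how to split a CIDR block into the Network and Netmask values the SonicWALL asks for, here is a minimal Python sketch using the standard ipaddress module. The 10.0.0.0/16 block is only a hypothetical example; substitute the CIDR of your own VPC.

```python
import ipaddress
from typing import Tuple


def split_cidr(cidr: str) -> Tuple[str, str]:
    """Return (network address, dotted netmask) for a CIDR block."""
    # strict=True raises ValueError if host bits are set (e.g. 10.0.0.5/16),
    # which catches a mistyped VPC CIDR early.
    net = ipaddress.ip_network(cidr, strict=True)
    return str(net.network_address), str(net.netmask)


if __name__ == "__main__":
    # Hypothetical VPC CIDR, for illustration only.
    network, netmask = split_cidr("10.0.0.0/16")
    print("Network:", network)
    print("Netmask:", netmask)
```

The first value goes in the Network field and the second in the Netmask field of the address object.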

Step 2 - Create New VPN Policy
Go to VPN, select Settings, and add a new policy using the following information:

  • General Tab
    • Authentication Method: IKE using Preshared Secret
    • Name: Any name you choose
    • IPsec Primary Gateway: IP address from downloaded config
    • IPsec Secondary Gateway: Secondary IP address from config
    • Shared Secret: Shared secret from config
  • Network Tab
    • Local Networks: Select appropriate setting for your environment
    • Destination Networks: VPC LAN from previous step
  • Proposals Tab
    • IKE (Phase 1) Proposal
      • Exchange: Main Mode
      • DH Group: Group 2
      • Encryption: AES-128
      • Authentication: SHA1
      • Life Time: 28800
    • IPsec (Phase 2) Proposal
      • Protocol: ESP
      • Encryption: AES-128
      • Authentication: SHA1
      • DH Group: Group 2
      • Life Time: 28800
  • Advanced Tab
    • Set as required for your environment.

Once all the settings are correct you should be able to see the tunnel status in both your SonicWALL and AWS Console. Test connections over the tunnel using ICMP ping or other methods.

VPN Status from SonicWALL
VPN Status from AWS Console



One Guy, Two Blog Posts


I'm not much into self-aggrandizement, but WTH, here goes.

This evening while watching a disappointing baseball game I wrote a little about what I do at work, which includes overseeing a network of systems that served over 58,000,000 page views yesterday (a new single-day record for us with much more traffic soon...). To load our content on this many pages my systems answered somewhere around 1 billion calls in 24 hours. This is a pretty good overview of what I do while sitting behind a computer all day, and what I've been working on for the last four years.

A few of my fellow employees used to work for a quasi-competitor of ours and recently went to lunch with some of their former colleagues who, when they found out our company only has one "IT guy," were amazed. Who is that masked man, anyway....

For those fellow geeks out there this is a little more insight into my world.

Enjoy!

I LOVE LogParser

Recently I wrote about some code improvements we deployed that had a huge impact, causing our web services calls to be answered much quicker and saving us a fair amount of money each month. In that post I talked about using Cacti to graph server statistics (like CPU and IIS stats), and service response times. That post also included this spreadsheet showing how we lowered service response times considerably.

Figure 1 - Spreadsheet comparing average load time before and after code deployment.
This post is about how I use Microsoft's LogParser to extract very useful information from my IIS web server logs. About LogParser Microsoft says:
Log parser is a powerful, versatile tool that provides universal query access to text-based data such as log files, XML files and CSV files, as well as key data sources on the Windows® operating system such as the Event Log, the Registry, the file system, and Active Directory®. You tell Log Parser what information you need and how you want it processed. The results of your query can be custom-formatted in text based output, or they can be persisted to more specialty targets like SQL, SYSLOG, or a chart.
Most software is designed to accomplish a limited number of specific tasks. Log Parser is different... the number of ways it can be used is limited only by the needs and imagination of the user. The world is your database with Log Parser.
I use LogParser for a variety of tasks, but this is all about getting some good summary data from my IIS logs. (See Log Parser Rocks! More than 50 Examples! for some great examples of using LogParser - I've used this as a reference a lot!)

Overnight, when load on my IIS servers is considerably lower than during the day, I have scheduled jobs that summarize information from the logs, then zip them and write them to S3 for longer-term storage. For a little perspective, my IIS servers handle a cumulative total of somewhere around 500,000,000 to 600,000,000 requests per day (yes, over 1/2 a BILLION).  Since every request is logged (and I log every possible parameter in IIS) that's over 1/2 a billion log lines in total. Each and every day. Each of my IIS servers logs between 10 and 25 million requests per day - these log files come in at about 5-10 GB per server per day.

Figure 2 - Windows Explorer view showing rather large IIS log files on one server.
I guess I should address the question of why I log all requests and why I spend the time to analyze the logs. Besides the fact that I'm a bit of a data junkie, and that it gives me bragging rights that I collect so much data, the main reason is that it gives me a lot of very useful information. These summaries tell me the number of requests, bandwidth served, average response time (both by hour and by minute - see spreadsheet above), number and distribution of server responses (i.e. HTTP responses like 200, 302, 404, 500, 503, etc.), website referrers and much more. Then I use this summary data to compare servers - I can compare them to one another for a day, and over time. Of major significance is the fact that I can get a fairly quick and granular view of the impact on our systems of changes like code deployments (see my post, "Improving Web Server Performance in a Big Way With Small Changes" for more on this).

I'm not going to bore you with all my logparser commands. After all I'm not doing anything special & what most of them do has been well documented elsewhere. I am going to show a couple of things I do with logparser, both in my daily summarization of logs and on-demand to view the immediate effect of changes.

Note: I'm using line breaks in the following commands for readability.

Hits By Hour
logparser -i:W3C -o:CSV
"SELECT TO_LOCALTIME(QUANTIZE(TO_TIMESTAMP(date, time), 3600)) AS DateTime,
count(*) as Requests,
MUL(PROPCOUNT(*), 100) AS Percentage,
SUM(sc-bytes) AS TotalBytesSent,
MIN(time-taken) AS MinTime,
MAX(time-taken) AS MaxTime,
AVG(time-taken) AS AvgTime,
Div(TotalBytesSent,Requests) AS BytesPerReq
INTO D:\LogFiles\IIS\Reports\%IISTYPE%\%IISTYPE%_%IISDT%_%COMPUTERNAME%_%RTIME%_HitsByHourSummary.csv
FROM D:\LogFiles\IIS\%IISSITE%\ex%IISDT%*.log
GROUP BY DateTime ORDER BY DateTime"
Hits By Minute
logparser -i:W3C -o:CSV
"SELECT TO_LOCALTIME(QUANTIZE(TO_TIMESTAMP(date, time), 60)) AS DateTime,
count(*) as Requests,
MUL(PROPCOUNT(*), 100) AS Percentage,
SUM(sc-bytes) AS TotalBytesSent,
MIN(time-taken) AS MinTime,
MAX(time-taken) AS MaxTime,
AVG(time-taken) AS AvgTime,
Div(TotalBytesSent,Requests) AS BytesPerReq
INTO D:\LogFiles\IIS\Reports\%IISTYPE%\%IISTYPE%_%IISDT%_%COMPUTERNAME%_%RTIME%_HitsByMinuteSummary.csv
FROM D:\LogFiles\IIS\%IISSITE%\ex%IISDT%*.log
GROUP BY DateTime ORDER BY DateTime"
Breakdown
  • logparser -i:W3C -o:CSV - this tells logparser the input file is W3C and to output results as CSV
  • SELECT TO_LOCALTIME(QUANTIZE(TO_TIMESTAMP(date, time), 60)) AS DateTime - this little gem breaks down the results per minute (using 60) or per hour (3600); you could specify any number of seconds for this - at times when I'm feeling randy I'll break it out into 5 minute intervals with 300
  • count(*) as Requests - this gives me a total count of the log lines
  • MUL(PROPCOUNT(*), 100) AS Percentage - assigns a percentage of total calls to the specified time period
  • SUM(sc-bytes) AS TotalBytesSent - total bytes sent (server to client) from server
  • MIN(time-taken) AS MinTime - lowest time (in milliseconds) for a response
  • MAX(time-taken) AS MaxTime - highest time (in milliseconds) for a response
  • AVG(time-taken) AS AvgTime - average time (in milliseconds) for a response (this is the one I'm really after with this command - more on this in a minute...)
  • Div(TotalBytesSent,Requests) AS BytesPerReq - bytes per request or how much data was sent to the client on average for each request (another very useful one)
  • INTO D:\LogFiles\IIS\Reports\%IISTYPE%\%IISTYPE%_%IISDT%_%COMPUTERNAME%_%RTIME%_HitsByMinuteSummary.csv - into summary file (variables are used in the scheduled job and should be self-explanatory. One to note, however, is the %IISDT% [DT is for date] variable which I've documented in Get Yesterday's date in MS DOS Batch file)
  • FROM D:\LogFiles\IIS\%IISSITE%\ex%IISDT%*.log - from file (once again variables used in batch process)
  • GROUP BY DateTime ORDER BY DateTime - finally group by and order by clauses
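
For anyone who finds it easier to read code than SQL, here is a rough Python sketch of what the query above computes. It is not how LogParser works internally, and it assumes the log lines have already been parsed into (timestamp, sc-bytes, time-taken) tuples; the function names are made up for illustration. It also skips the TO_LOCALTIME conversion, and the average is a plain float here, which may not match LogParser's rounding.

```python
from collections import defaultdict
from datetime import datetime, timedelta

_EPOCH = datetime(1970, 1, 1)


def quantize(ts, seconds):
    """Round a timestamp down to the start of its bucket, like QUANTIZE()."""
    elapsed = int((ts - _EPOCH).total_seconds())
    return _EPOCH + timedelta(seconds=elapsed - elapsed % seconds)


def summarize(records, seconds=60):
    """Group (timestamp, sc_bytes, time_taken) records into time buckets.

    Returns one dict per bucket, ordered by bucket start time, with the
    same column names the query uses.
    """
    buckets = defaultdict(list)
    for ts, sc_bytes, time_taken in records:
        buckets[quantize(ts, seconds)].append((sc_bytes, time_taken))

    total_requests = sum(len(rows) for rows in buckets.values())
    summary = []
    for bucket in sorted(buckets):  # ORDER BY DateTime
        rows = buckets[bucket]
        requests = len(rows)  # count(*)
        total_bytes = sum(b for b, _ in rows)  # SUM(sc-bytes)
        times = [t for _, t in rows]
        summary.append({
            "DateTime": bucket,
            "Requests": requests,
            "Percentage": 100.0 * requests / total_requests,
            "TotalBytesSent": total_bytes,
            "MinTime": min(times),
            "MaxTime": max(times),
            "AvgTime": sum(times) / requests,
            "BytesPerReq": total_bytes // requests,
        })
    return summary


if __name__ == "__main__":
    # Made-up sample records, not real log data.
    sample = [
        (datetime(2012, 10, 5, 13, 25, 3), 1000, 20),
        (datetime(2012, 10, 5, 13, 25, 41), 3000, 40),
        (datetime(2012, 10, 5, 13, 26, 10), 500, 25),
    ]
    for row in summarize(sample, 60):
        print(row)
```

Passing 3600 instead of 60 gives the hourly version, and 300 gives the 5 minute intervals mentioned above.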
Results
Here's a look at the results of this query (the first one, hits by hour) with a summary showing the particularly useful average time per request.

Figure 3 - results of logparser hits by hour query.
Wait, There's More
I use the hits by minute query for both summary data and to take a near real-time look at a server's performance. This is particularly useful to check the impact of load or code changes. In fact, just today we had to push out some changes to a couple of SQL stored procedures, and to determine whether these changes had a negative (or positive) impact on the web servers, I ran the following:
logparser -i:W3C -o:TSV "SELECT TO_LOCALTIME(QUANTIZE(TO_TIMESTAMP(date, time), 60)) AS DateTime, count(*) as Requests, SUM(sc-bytes) AS TotalBytesSent, Div(TotalBytesSent,Requests) AS BytesPerReq, MIN(time-taken) AS MinTime, MAX(time-taken) AS MaxTime, AVG(time-taken) AS AvgTime FROM D:\LogFiles\IIS\W3SVC1\u_ex%date:~12,2%%date:~4,2%%date:~7,2%*.log GROUP BY DateTime ORDER BY DateTime"
NOTES
  • First, notice I'm outputting it as a TSV and not specifying an output (with the INTO command) file. This is because I want the results displayed to the command prompt where I'm running it and want it tab separated to make it easier to view.
  • Second, and I'm quite proud of this, is "u_ex%date:~12,2%%date:~4,2%%date:~7,2%*.log". These variables parse today's date and format it in the default YYMMDD used by IIS logs. See the results of echo u_ex%date:~12,2%%date:~4,2%%date:~7,2%*.log command.
Figure 4 - results of echo u_ex%date:~12,2%%date:~4,2%%date:~7,2%*.log
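
To make the offsets a little less magic, here is a small Python sketch of the same substring trick. It assumes %date% expands to something like "Fri 10/05/2012" (the US English format); on a machine with different regional settings the offsets would pick out the wrong characters. The function names are just for illustration, and the second function shows an alternative that doesn't depend on the date format at all.

```python
from datetime import date


def iis_log_pattern(date_text):
    """Mimic u_ex%date:~12,2%%date:~4,2%%date:~7,2%*.log.

    %date:~N,2% means "2 characters starting at offset N", so for
    "Fri 10/05/2012" the pieces are the last two digits of the year,
    the month, and the day.
    """
    yy = date_text[12:14]  # %date:~12,2%
    mm = date_text[4:6]    # %date:~4,2%
    dd = date_text[7:9]    # %date:~7,2%
    return "u_ex{}{}{}*.log".format(yy, mm, dd)


def iis_log_pattern_for(day):
    """Build the same pattern from a date object, independent of locale."""
    return "u_ex{}*.log".format(day.strftime("%y%m%d"))


if __name__ == "__main__":
    print(iis_log_pattern("Fri 10/05/2012"))
    print(iis_log_pattern_for(date(2012, 10, 5)))
```

Both calls build the same YYMMDD file pattern for October 5, 2012.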
So, when we pushed the code change today I ran the above command on each of my front-end servers to determine if there was a negative (or positive) impact. By breaking the results, particularly the average time per response, out by minute I could see if there was any impact immediately after the code was pushed. Which there wasn't - at least that's what I determined after a little while of thinking there had been a negative impact of the code push....

Notice the lines at 13:25 and 13:39 where the average time per response increased to 149 and 273 respectively. It turns out that during the minute or so while logparser was running, the average response time climbed quite a bit (even though, watching Task Manager, there didn't seem to be much of a noticeable CPU load hit).
Figure 5 - results of logparser by minute command
So, by using logparser I'm able to summarize hundreds of millions of IIS log lines each day, and on-demand when needed to keep a good pulse on just what my servers are doing.


Improving Web Server Performance in a Big Way With Small Changes


A week ago today we pushed out some code to our IIS and MS SQL servers that has had a huge impact.  In a good way - a very good way.

First, a little background. I work for an online video company that has "widgets" and video "players" that load on thousands of websites millions of times a day.  Each time one of these entities loads numerous calls get made to our systems for both static and dynamic content. One of the highest volume services is what we call "player services," where calls are made to display playlists, video meta data, etc. based on our business rules. For example, can a particular video be displayed on this website, etc. Anyway, these player services get hit a lot in a given day, about 350,000,000 times (or more) a day. That's over 1/3 of a billion (yes, billion, with a B) times a day to just this one service (this represents about 1/3 of the total hits to the systems I oversee.  You do the math...).

We do have a scalable infrastructure which has steadily been growing over the past few years.  In fact, we've grown tremendously as you can see by this Quantcast graph.

Figure 1 - Traffic growth over the past few years.
Over the years we've stumbled a few times, but have learned as we've grown. (I remember when we had our first Drudge link & it caused all kinds of mayhem. Recently we had five links on Drudge simultaneously and barely felt it.) We've also had many breakthroughs that have made things more efficient and/or saved us money.  This one did both.

We have fairly predictable daily traffic patterns where weekdays are heavier than weekends, and daytime is heavier than nighttime. Using a couple of my favorite tools (RRDTool & Cacti) I am able to graph all kinds of useful information over time to know how my systems are performing. The graph in figure 2 shows the number of web requests per second to a particular server. It also shows the relative traffic pattern over the course of a week.

Figure 2 - Cacti RRDTool graph showing daily traffic pattern over one week for one server.
I use a great little Cacti plug-in called, "CURL BWTEST Response Time Graph Template" to monitor response times for various calls. As figure 3 shows, our average service response times would climb during peak traffic times each day. I monitor these closely every day and if the response times get too high I can add more servers to reduce the response time to an acceptable level.

Figure 3 - Cacti CURL graph showing service call response time.
Here's the exciting part, finally. As you can see in figure 3 we no longer have increasing response times during the day when our load increases. How did we do it, you ask? Well, a couple of really sharp guys on our team, our DBA and a .NET developer, worked on several iterations of changes to both SQL stored procedures and front-end code that now deliver the requested data quicker and more efficiently. Believe it or not this took a little work. Earlier I alluded to stumbling a few times as we've grown, and that happened here.  A couple weeks ago we made a few additions to our code and deployed it. The first time we basically took down our whole system for a few minutes until we could back it out. Not only did that scare the crap out of us it made us look bad.

So we went back to the drawing board and worked on the "new and improved" addition to our code. After a couple more false starts we finally cracked this one. Without boring you with the details our DBA made the stored procedures much more efficient and our front-end developer made some tweaks to his code & the combination was dynamite.

As figure 3 shows, we aren't suffering from increased response times for calls under heavier load conditions. This in and of itself is fantastic; after all, quicker responses equal quicker load times for our widgets and players, which equals a better user experience, which ultimately should translate to increased revenue. But this isn't the only benefit of this "update." The next improvement is the fact that overall the responses are quicker. Much quicker. As you can see in figure 4, on Sept. 14 (before the code deployment) the average response time (broken out per hour) increased dramatically during peak traffic times and averaged a little over 100 ms for the day. After the deployment, looking at data from Oct. 4, not only did the average response time stay flat during the higher traffic times, but it is down considerably overall. Responses now average 25 ms at all hours of the day, even under heavy load. This is a great double whammy improvement! In fact, Oct. 4 was a much heavier load day than Sept. 14, so not only were the responses quicker, the server handled way more traffic.

Figure 4 - Spreadsheet comparing average load time before and after code deployment.
So far I've discussed the benefits on the front-end web servers, but we've also seen a dramatic improvement on the back-end database servers that service the web servers. Figure 5 shows the CPU utilization over the past four weeks of one of our DB servers and how since the deployment a week ago it has averaged almost half what it did before the deployment. This enabled me to shut down a couple of database servers to save quite a bit of money. I've also been able to shut down several front-end servers.

Figure 5 - Cacti graph of database server CPU utilization over 4 weeks.
Due to this upgrade I have been able to shut down a total of 12 servers, saving us over $5000 per month; made the calls return quicker causing our widgets and players to load faster - at all hours of the day; and made our infrastructure even more scalable. Win, win, win!

For more information, and if you're feeling especially geeky see my post, "I LOVE LogParser" for details on how I am able to use Microsoft's logparser to summarize hundreds of millions of IIS log lines each day, and on-demand when needed to keep a good pulse on just what my servers are doing.

Wednesday, October 3, 2012

Enable Quick Launch in Windows 7, Windows 2008 and 2008 R2

Call me old fashioned.  Say I'm stuck in the past.  Whatever.  I just don't like a lot of the things Microsoft has done to the Windows interface/desktop over the years.  Every time I get a new computer or start up a new server certain things have to be done to make it usable, i.e. not annoying to me.  One of those things is to enable the Quick Launch bar.

So to add Quick Launch to any Windows 7 or Windows 2008 server do the following.
  1. Right-click an empty area of the taskbar, select Toolbars, then click New toolbar.
  2. In the dialog box, enter %AppData%\Microsoft\Internet Explorer\Quick Launch, then click Select Folder.
Now you can customize Quick Launch by displaying small icons, no labels, moving it, etc.

Monday, October 1, 2012

Windows Command Line WhoIs

I regularly find myself trying to find the owner of a domain or needing other information, like authoritative name servers.  A few command line whois.exe programs exist out there for Windows, but the one I like best is the one by Mark Russinovich at Sysinternals.  You can visit the previous link to download or use wget http://download.sysinternals.com/files/WhoIs.zip at the command line (assuming you have wget for Windows.)

One little trick I like to do is place whois.exe in the Windows system32 directory (c:\windows\system32 for example), because this directory is normally in your system path.  This way whois can be executed in a command prompt from any directory.

Here's a look at whois microsoft.com: