W3C extended log format fields and IIS 6.0

In a previous article, I gave an overview of the World Wide Web Consortium (W3C) extended log format, in relation to Internet Information Services (IIS) 6.0.

This time, I'd like to cover what each field provides, again in relation to IIS and a Web site, for statistical and debugging purposes.

What fields are available

Again, we've covered what fields are available in the W3C extended log format in a previous article.

Read "A brief overview of IIS 6.0 and the W3C extended log format" for more information about these fields.

Essential fields

In the W3C extended log format, there are a number of essential fields, which are listed below.

  • Date ( date )
  • Time ( time )
  • URI Stem ( cs-uri-stem )

At a minimum, these three fields should be tracked.

Date and Time should be tracked in order to determine when a request took place. Even if you use daily logs, keeping Date means you can still analyze requests across multiple logs.

URI Stem is equally essential, since it shows which files are being requested, and how often.
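To make this concrete, here is a minimal Python sketch (not an IIS tool) that reads a W3C extended log using its #Fields directive, counts requests per day and URI Stem, and turns Date and Time into a single timestamp. The sample log lines are made up; the sketch assumes space-separated values, which is how IIS writes them, and treats Date and Time as UTC, which is what IIS uses in this format.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical log excerpt: directives start with '#', entries follow #Fields.
SAMPLE = """#Software: Microsoft Internet Information Services 6.0
#Version: 1.0
#Fields: date time cs-uri-stem
2006-03-01 10:15:02 /default.htm
2006-03-01 10:15:09 /images/logo.gif
2006-03-02 08:01:44 /default.htm
2006-03-02 08:02:10 /default.htm
"""


def parse_w3c(lines):
    """Yield one dict per entry, keyed by the names in the latest #Fields directive."""
    fields = []
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#"):
            yield dict(zip(fields, line.split()))


def timestamp(entry):
    """Combine the date and time fields into a timezone-aware UTC datetime."""
    stamp = datetime.strptime(entry["date"] + " " + entry["time"], "%Y-%m-%d %H:%M:%S")
    return stamp.replace(tzinfo=timezone.utc)


entries = list(parse_w3c(SAMPLE.splitlines()))
hits = Counter((e["date"], e["cs-uri-stem"]) for e in entries)
for (day, stem), count in sorted(hits.items()):
    print(day, stem, count)
print("first request:", timestamp(entries[0]).isoformat())
```

Because the parser keys each entry on the #Fields directive rather than on fixed column positions, the same code keeps working if you later enable more fields.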

More essential fields

Of course, the first list isn't quite enough, which is why most of the time you'll want to have the following as well.

  • URI Query ( cs-uri-query )
  • Protocol Status ( sc-status )
  • Referer ( cs(Referer) )

While URI Stem may be enough for older, static sites, URI Query is equally essential for today's dynamic sites, which rely on parameters being passed in the query string.
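As a rough illustration, the Python sketch below rebuilds the requested path from URI Stem and URI Query and counts which query-string parameters each page receives. The entries are hypothetical; the one detail taken from IIS is that an empty field is logged as "-".

```python
from collections import Counter
from urllib.parse import parse_qs

# Hypothetical (cs-uri-stem, cs-uri-query) pairs; IIS logs "-" for an empty field.
ENTRIES = [
    ("/products.aspx", "id=12&view=full"),
    ("/products.aspx", "id=7"),
    ("/products.aspx", "-"),
    ("/about.htm", "-"),
]


def request_path(stem, query):
    """Rebuild the requested path; '-' means no query string was sent."""
    return stem if query == "-" else stem + "?" + query


def parameter_usage(entries):
    """Count how often each query-string parameter name appears, per stem."""
    usage = Counter()
    for stem, query in entries:
        if query != "-":
            for name in parse_qs(query, keep_blank_values=True):
                usage[(stem, name)] += 1
    return usage


for stem, query in ENTRIES:
    print(request_path(stem, query))
print(parameter_usage(ENTRIES))
```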

Protocol Status records the HTTP status code returned for the request (200 for success, 404 for not found, and so on). This shows whether each request succeeded, which lets us determine which requests were invalid, or may be causing other errors.
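For example, a short Python sketch like the following could summarize Protocol Status by class and list the stems that failed. The (stem, status) pairs are hypothetical, standing in for values pulled out of a real log.

```python
from collections import Counter

# Hypothetical (cs-uri-stem, sc-status) pairs taken from a log.
ENTRIES = [
    ("/default.htm", "200"),
    ("/old-page.htm", "404"),
    ("/old-page.htm", "404"),
    ("/app/report.aspx", "500"),
    ("/images/logo.gif", "304"),
]


def status_classes(entries):
    """Count responses by class: 2xx, 3xx, 4xx (client errors), 5xx (server errors)."""
    return Counter(status[0] + "xx" for _, status in entries)


def failing_requests(entries):
    """Count requests per (stem, status) that ended in a 4xx or 5xx status."""
    return Counter(
        (stem, status) for stem, status in entries if status.startswith(("4", "5"))
    )


print(status_classes(ENTRIES))
print(failing_requests(ENTRIES).most_common())
```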

Whether Referer is essential depends on your point of view. I've decided to include it, since it helps determine where both invalid and valid requests are coming from. It can also help determine the path visitors take through your own site.
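Combining Referer with Protocol Status is one way to track down broken links. This Python sketch, using made-up entries, counts which referring pages send visitors to URLs that return 404; a Referer of "-" means the client didn't send one, so those entries are skipped.

```python
from collections import Counter

# Hypothetical (cs-uri-stem, sc-status, cs(Referer)) triples.
ENTRIES = [
    ("/old-page.htm", "404", "http://www.example.com/links.htm"),
    ("/old-page.htm", "404", "http://www.example.com/links.htm"),
    ("/old-page.htm", "404", "-"),
    ("/default.htm", "200", "http://search.example.net/?q=iis"),
]


def broken_link_sources(entries):
    """Count which referring pages lead to 404 responses, and for which stem."""
    return Counter(
        (referer, stem)
        for stem, status, referer in entries
        if status == "404" and referer != "-"
    )


for (referer, stem), count in broken_link_sources(ENTRIES).most_common():
    print(count, referer, "->", stem)
```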

Slightly less essential fields

Now that we've gone through the essential fields, in two parts, we can look at the slightly less essential ones. Not everyone will need these fields, but most of the time I think you'll want to include them.

  • Client IP Address ( c-ip )
  • User Agent ( cs(User-Agent) )

One could argue that Client IP Address and User Agent are very essential indeed. However, depending upon the purpose of the site, they may not be.

For example, on an intranet or personal site, there may be no reason to track either field, since the browsers used to view the site, as well as the IP addresses, may already be known and controlled.

However, for sites that are available to a larger, external audience, these two fields are indeed quite essential. Because of the first case, though, I've placed them both here.

Client IP Address can be used to determine which IPs, if any, are resulting in traffic, either good or bad. User Agent can be used to determine what software is being used to view the site, and thereby allow for proper testing.
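A minimal Python sketch of both uses follows: counting requests per Client IP Address and per User Agent. The IPs and agent strings are hypothetical. IIS writes spaces in the User-Agent value as "+", so the sketch turns them back into spaces for display; note that a literal "+" in the original header can't be told apart after logging.

```python
from collections import Counter

# Hypothetical (c-ip, cs(User-Agent)) pairs as they might appear in a log.
ENTRIES = [
    ("192.0.2.10", "Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1)"),
    ("192.0.2.10", "Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1)"),
    ("198.51.100.7", "Mozilla/5.0+(Windows;+U;+Windows+NT+5.1)+Firefox/1.5"),
]


def readable_agent(agent):
    """Undo the space-to-plus substitution for display purposes."""
    return agent.replace("+", " ")


requests_per_ip = Counter(ip for ip, _ in ENTRIES)
agents = Counter(readable_agent(agent) for _, agent in ENTRIES)

print(requests_per_ip.most_common())
for agent, count in agents.most_common():
    print(count, agent)
```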

Semi-essential fields

Semi-essential fields are fields that I consider essential for more in-depth analysis of traffic and server status. These fields are listed below.

  • Bytes Sent ( sc-bytes )
  • Bytes Received ( cs-bytes )
  • Time Taken ( time-taken )

As we discussed in the previous article, sc means server to client, and cs means client to server. Bytes Received isn't as essential as Bytes Sent, since the majority of traffic will be sent by the server, but I've grouped them together here. Both are semi-essential, since they help determine not only how much is being transmitted, but also, combined with Client IP Address, who is pulling down a lot of traffic.
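To see who is pulling down the most traffic, a Python sketch along these lines could total Bytes Sent and Bytes Received per Client IP Address. The values are hypothetical, and the sketch assumes both byte fields contain plain integers.

```python
from collections import defaultdict

# Hypothetical (c-ip, sc-bytes, cs-bytes) values from a log.
ENTRIES = [
    ("192.0.2.10", "15320", "412"),
    ("192.0.2.10", "880412", "398"),
    ("198.51.100.7", "2048", "1650"),
]


def traffic_per_client(entries):
    """Sum bytes sent (server to client) and received (client to server) per IP."""
    totals = defaultdict(lambda: [0, 0])
    for ip, sent, received in entries:
        totals[ip][0] += int(sent)
        totals[ip][1] += int(received)
    return totals


# Largest downloaders first.
ranked = sorted(traffic_per_client(ENTRIES).items(), key=lambda kv: -kv[1][0])
for ip, (sent, received) in ranked:
    print(ip, "sent:", sent, "received:", received)
```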

Time Taken ties in directly with these, since typically the more bytes being transferred, the longer the request takes. So, most of the time, files with a larger Bytes Sent value will also have a larger Time Taken value. For dynamic files, however, a large Time Taken value may instead mean that the file is performing poorly. IIS records Time Taken in milliseconds.
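One way to separate "large" from "slow" is to compare average Bytes Sent and average Time Taken per URI Stem. In this Python sketch the entries are hypothetical, and both the 1000 ms slowness threshold and the 100,000-byte size cut-off are arbitrary values chosen for illustration.

```python
from collections import defaultdict

# Hypothetical (cs-uri-stem, sc-bytes, time-taken in ms) values.
ENTRIES = [
    ("/downloads/setup.exe", "5242880", "4200"),
    ("/downloads/setup.exe", "5242880", "3900"),
    ("/app/report.aspx", "6144", "2600"),
    ("/default.htm", "4096", "15"),
]

SLOW_MS = 1000  # Arbitrary threshold chosen for this example.


def averages_per_stem(entries):
    """Return {stem: (average sc-bytes, average time-taken in ms)}."""
    sums = defaultdict(lambda: [0, 0, 0])  # bytes, milliseconds, request count
    for stem, size, taken in entries:
        s = sums[stem]
        s[0] += int(size)
        s[1] += int(taken)
        s[2] += 1
    return {stem: (b / n, ms / n) for stem, (b, ms, n) in sums.items()}


def slow_small_pages(averages, max_bytes=100_000):
    """Stems that are slow even though they are small: candidates for tuning."""
    return sorted(
        stem for stem, (size, ms) in averages.items() if ms > SLOW_MS and size < max_bytes
    )


avg = averages_per_stem(ENTRIES)
print(avg)
print(slow_small_pages(avg))
```

Here the large download is slow simply because it is large, while the small report page is slow for some other reason, which is the case worth investigating.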

Even more semi-essential

We've got a few more fields to cover that are semi-essential, before we list 'the rest.'

  • Server Port ( s-port )
  • Method ( cs-method )
  • Protocol Substatus ( sc-substatus )
  • Protocol Version ( cs-version )

Method is a nice enough field, since it helps determine how the content is being requested. Is someone GETting the file? Is information being POSTed back? Is something just requesting the HEAD?
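A quick Python sketch of a Method breakdown, using hypothetical entries: count each method overall, then see which stems are receiving POSTs.

```python
from collections import Counter

# Hypothetical (cs-method, cs-uri-stem) pairs.
ENTRIES = [
    ("GET", "/default.htm"),
    ("GET", "/contact.aspx"),
    ("POST", "/contact.aspx"),
    ("HEAD", "/default.htm"),
]

methods = Counter(method for method, _ in ENTRIES)
post_targets = Counter(stem for method, stem in ENTRIES if method == "POST")

print(methods.most_common())
print(post_targets.most_common())
```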

Server Port may or may not be essential, depending on which ports are being used. For most sites it'll just be port 80, but you may have other ports in use as well, such as 443 for HTTPS.

Protocol Version (for example, HTTP/1.1) can give some insight into what software is being used to view files, which may help in determining what technologies, such as compression, can be used.

Finally, Protocol Substatus may provide additional information, depending upon the Protocol Status that was returned. Win32 Status, which we haven't mentioned yet, is similar, but I wouldn't consider it as essential, which is why I'm not including it here.
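Protocol Status and Protocol Substatus are easiest to read together, in the dotted form IIS uses in its documentation (for example, 401.2). This Python sketch joins the two fields and counts each combination; the values are hypothetical.

```python
from collections import Counter

# Hypothetical (sc-status, sc-substatus) pairs from a log.
ENTRIES = [("401", "2"), ("401", "1"), ("200", "0"), ("401", "2")]


def full_status(status, substatus):
    """Join status and substatus into the dotted form, e.g. '401.2'."""
    return f"{status}.{substatus}"


counts = Counter(full_status(status, sub) for status, sub in ENTRIES)
print(counts.most_common())
```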

The rest

Finally, we've got the rest. Of course, this is just how I rate them, so if someone can convince me otherwise, please do. Otherwise, these are the fields, listed in the order IIS shows them, that I consider less essential than all the rest.

  • User Name ( cs-username )
  • Service Name ( s-sitename )
  • Server Name ( s-computername )
  • Server IP Address ( s-ip )
  • Win32 Status ( sc-win32-status )
  • Host ( cs-host )
  • Cookie ( cs(Cookie) )

User Name is a good field if your site uses authentication, as many intranet sites do. In these cases, it can help you determine who's doing what. For the average external site, though, most requests are anonymous, so this field won't provide much helpful information.

Service Name is the site name used in IIS, which is typically something in the format W3SVC#. This can be helpful when going over multiple log files, but otherwise may not provide helpful information. Likewise, Server Name provides the name of the computer. This may be helpful after moving a site to a computer with a different name, in order to determine performance changes, but otherwise most sites won't need it.
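When logs from several sites or servers are merged, Service Name and Server Name become the grouping keys. In this Python sketch the site names, computer names, and timings are all hypothetical; it counts requests and average Time Taken for each site and server pair.

```python
from collections import Counter

# Hypothetical entries merged from two servers' logs:
# (s-sitename, s-computername, time-taken in ms).
ENTRIES = [
    ("W3SVC1", "WEB01", "120"),
    ("W3SVC1", "WEB02", "80"),
    ("W3SVC1", "WEB02", "100"),
    ("W3SVC2", "WEB01", "40"),
]

requests = Counter((site, server) for site, server, _ in ENTRIES)
total_ms = Counter()
for site, server, taken in ENTRIES:
    total_ms[(site, server)] += int(taken)

for key in sorted(requests):
    print(key, requests[key], "requests, avg", total_ms[key] / requests[key], "ms")
```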

Ditto for Server IP Address, which, for most sites, will be fairly consistent. Again, it may help in determining performance, or in comparing one server to another (such as across multiple logs from the same day).

Win32 Status records the Windows status code for the request (0 means success), which may provide additional information about a failure, but Protocol Status is often enough.

Host records the Host header the client sent. It can help rebuild the full URI if you have multiple domains pointing to a single site, or show which of those domains is performing better.
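Here is a minimal Python sketch of rebuilding full URIs from Host, URI Stem, and URI Query, plus a per-domain request count. The entries are hypothetical, and the "http" scheme is an assumption: none of these three fields records it, although Server Port could help infer it.

```python
from collections import Counter

# Hypothetical (cs-host, cs-uri-stem, cs-uri-query) triples; '-' marks an empty field.
ENTRIES = [
    ("www.example.com", "/default.htm", "-"),
    ("example.com", "/products.aspx", "id=7"),
    ("www.example.com", "/products.aspx", "id=12"),
]


def rebuild_uri(host, stem, query, scheme="http"):
    """Rebuild the requested URI; the scheme is assumed, not read from the log."""
    uri = f"{scheme}://{host}{stem}"
    return uri if query == "-" else f"{uri}?{query}"


for host, stem, query in ENTRIES:
    print(rebuild_uri(host, stem, query))
print(Counter(host for host, _, _ in ENTRIES).most_common())
```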

Finally, you've got Cookie. Depending on your site, this may be essential indeed, but I'm not that big a fan. However, if your site relies on cookies, this field will obviously be worth that much more.

Final thoughts

Hopefully, this gives an idea of the benefit of various W3C extended log format fields. Of course, this doesn't cover actual analysis of these fields, which we'll cover in another article.