Log file analysis of our Windows-based Apache Web sites

In our previous articles, we walked through installing Apache on a Windows XP home computer.  This time, we'll set up our log files for analysis and install a way to view the log file information.

Log files are created by Web servers to track page views and visitors.  For example, if we go to a page on one of our local Web sites with Firefox, like http://website.localhost/, Apache adds the following lines to a file called access.log.

127.0.0.2 - - [18/Feb/2006:09:29:43 -0600] "GET / HTTP/1.1" 200 94
127.0.0.2 - - [18/Feb/2006:09:29:43 -0600] "GET /favicon.ico HTTP/1.1" 404 291
127.0.0.2 - - [18/Feb/2006:09:29:43 -0600] "GET /favicon.ico HTTP/1.1" 404 291

This tells us that a request for the root page came in from 127.0.0.2 (the loopback address we assigned to website.localhost, which shows up here because we're browsing from the same machine).  Apache returned the file successfully (status 200) and sent 94 bytes (the total size of the file, in this case).  The same visitor also requested a file called favicon.ico twice, but Apache couldn't find it and returned a 404, or file not found, error.  The 291 bytes recorded for those requests is the size of the error page Apache sent back, not of favicon.ico itself.
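To make the individual fields concrete, here is a minimal Python sketch that splits one of the lines above into its parts.  The regular expression and the name parse_common_line are my own illustrative choices, not something Apache or this guide provides, and the sketch assumes the log is in Apache's default "common" format as shown above.

```python
import re

# Apache "common" log format:
# host ident user [time] "request" status bytes
COMMON_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)


def parse_common_line(line):
    """Return a dict of fields for one log line, or None if it doesn't match."""
    match = COMMON_PATTERN.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # Apache writes "-" instead of a number when no response body was sent.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields


if __name__ == "__main__":
    sample = ('127.0.0.2 - - [18/Feb/2006:09:29:43 -0600] '
              '"GET /favicon.ico HTTP/1.1" 404 291')
    print(parse_common_line(sample))
```

The two dashes after the address are the ident and user fields, which are empty on our local sites because nobody is logging in.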

While we can certainly read through this file line by line, there's an easier way to handle these.

Before we install a program to look through these, let's go ahead and set up our log files.  In order to do this, we're going to open up the httpd.conf file once again.

Setting up our log files

If you followed our previous guides, you may have a shortcut to httpd.conf in the C:\home\ folder.  Otherwise, you can open it via the Start menu, or by browsing to the file in Windows Explorer.

In the httpd.conf file, we're going to search for the word "logs".  If we repeat the search enough times, we'll end up at a few lines that state:

# ErrorLog: The location of the error log file.
# If you do not specify an ErrorLog directive within a <VirtualHost>
# container, error messages relating to that virtual host will be
# logged here.  If you *do* define an error logfile for a <VirtualHost>
# container, that host's errors will be logged there and not here.

So, if we want to, we can declare where we want to store log files in the VirtualHost container (which, recall, is at the end of the httpd.conf file).  First, though, if we go down a little further, we see some information about the format the log files will be stored in (LogFormat).

Find this line:

CustomLog logs/access.log common

and change it to:

#CustomLog logs/access.log common

In other words, we've commented it out.  Then, scroll down a couple lines and change

#CustomLog logs/access.log combined

to:

CustomLog logs/access.log combined

In other words, we've uncommented this line.  By commenting out one line and uncommenting another, we've increased the amount of information stored in our log files.  If we save the httpd.conf file, restart Apache (using the Services control panel), and hit one of our pages again, we'll notice a much longer line of information.

127.0.0.2 - - [18/Feb/2006:09:37:55 -0600] "GET / HTTP/1.1" 200 94 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1"

We now have referrer information and user agent information.  The first is helpful for understanding where people came from to get to the page, and the second is useful for understanding who is making the request.
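As a rough illustration of the two extra fields, the following Python sketch pulls the referrer and user agent out of a "combined" line like the one above.  The pattern and the name parse_combined_line are assumptions made for this example; it also assumes the quoted fields contain no embedded double quotes, which holds for our sample but not necessarily for every browser.

```python
import re

# Apache "combined" format = the common format plus two quoted fields:
# host ident user [time] "request" status bytes "referrer" "user agent"
COMBINED_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def parse_combined_line(line):
    """Return a dict of string fields for one line, or None if it doesn't match."""
    match = COMBINED_PATTERN.match(line.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = ('127.0.0.2 - - [18/Feb/2006:09:37:55 -0600] "GET / HTTP/1.1" 200 94 '
              '"-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.1) '
              'Gecko/20060111 Firefox/1.5.0.1"')
    fields = parse_combined_line(sample)
    print("Referrer:  ", fields["referrer"])
    print("User agent:", fields["agent"])
```

A referrer of "-" means the browser didn't send one, which is what we'd expect when we type the address in directly rather than following a link.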

Now that we've added more information to the logs, we're going to change where the log files for our subdomains are stored.

The first thing to note is that if we tell Apache to store our files in a folder that doesn't exist, it won't be able to create the folder, and won't be able to start.  So, we need to determine where we want to store our log files.  For now, let's go ahead and create a folder for each of our Web sites, in the current log file folder.  With our current settings, this would be C:\Program Files\Apache Group\Apache\logs\.  In this folder, create a website and a website2 folder.
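If you'd rather script this step than click through Windows Explorer, a minimal Python sketch might look like the following.  It assumes Python is installed, which this guide hasn't covered, and the function name create_log_folders is purely illustrative.

```python
from pathlib import Path


def create_log_folders(log_root, site_names):
    """Create one sub-folder per site under log_root and return the paths."""
    created = []
    for name in site_names:
        folder = Path(log_root) / name
        # exist_ok=True makes the call safe to repeat if the folder already exists.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

With the settings in this guide, the call would be create_log_folders(r"C:\Program Files\Apache Group\Apache\logs", ["website", "website2"]).  Writing under Program Files may require an administrator account.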

Now, go to the VirtualHost containers at the bottom of the httpd.conf file.  Add the ErrorLog and TransferLog lines shown below to each container; the other lines should already be there.

<VirtualHost 127.0.0.2>
ServerName website.localhost
DocumentRoot C:\home\website\public_html
ErrorLog logs/website/error.log
TransferLog logs/website/access.log

</VirtualHost>

<VirtualHost 127.0.0.3>
ServerName website2.localhost
DocumentRoot C:\home\website2\public_html
ErrorLog logs/website2/error.log
TransferLog logs/website2/access.log

</VirtualHost>

Save the httpd.conf file, and restart Apache.  If Windows tells you that it can't start the service, make sure that you've created the folders and typed everything in correctly.  If we look in the website and website2 folders, we'll see that empty error.log and access.log files have been created.  There is also one of each of these files in the main logs folder.  If we now hit our Web sites, we'll notice that the size of the log files will increase slightly, and we'll have information about what content we viewed.

Now, let's hit each of our three Web sites a couple of times: http://localhost/, http://website.localhost/, and http://website2.localhost/.

Now, we have effectively created some data, albeit data from only a small number of pages.  Yet, it is data nonetheless, and since it's so small, we'll be able to easily look through our log files for information.
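With this little data, even a few lines of Python are enough for a first look.  The sketch below tallies how many requests ended in each status code; the name count_statuses and the pattern it uses are assumptions for illustration, and it simply looks for the three-digit status that follows the quoted request in each line.

```python
import re
from collections import Counter

# The status code is the three-digit number right after the quoted request.
STATUS_PATTERN = re.compile(r'"[^"]*" (\d{3}) ')


def count_statuses(lines):
    """Count how many log lines ended in each HTTP status code."""
    counts = Counter()
    for line in lines:
        match = STATUS_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample_lines = [
        '127.0.0.2 - - [18/Feb/2006:09:29:43 -0600] "GET / HTTP/1.1" 200 94',
        '127.0.0.2 - - [18/Feb/2006:09:29:43 -0600] "GET /favicon.ico HTTP/1.1" 404 291',
        '127.0.0.2 - - [18/Feb/2006:09:29:43 -0600] "GET /favicon.ico HTTP/1.1" 404 291',
    ]
    for status, total in sorted(count_statuses(sample_lines).items()):
        print(status, total)
```

To run it against a real log, pass an open file instead of the sample list, for example open(r"C:\Program Files\Apache Group\Apache\logs\website\access.log").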

Before we go on, however, note that if you ever need to delete log files, you can do so by stopping Apache, deleting the files, and then starting Apache.

Installing a log file analyzer

For now, we're going to install a fairly simple log analyzer, Analog.  You can download Analog at http://www.analog.cx/, or the nearest mirror (which I highly recommend you use).  For now, I recommend downloading the zip file, if possible.  Remember to download this into the folder you downloaded the Apache installer into.  The download is about 2 MB.

In this case, I've downloaded Analog 6.0.  Once you've downloaded Analog, extract the main Analog folder to C:\home\.  We'll keep it there only temporarily, for ease, since we'll only be using Analog temporarily.

Once you've extracted the folder here, open up C:\home\analog 6.0\analog.cfg with Notepad.  The first uncommented line will read as follows.

LOGFILE logfile.log

We're going to put the following lines in its place.

LOGFILE "C:\Program Files\Apache Group\Apache\logs\access.log" http://localhost
LOGFILE "C:\Program Files\Apache Group\Apache\logs\website\access.log" http://website.localhost
LOGFILE "C:\Program Files\Apache Group\Apache\logs\website2\access.log" http://website2.localhost

Now save this file and run analog.exe.  Once it finishes, it will have created an errors.txt file, and a Report.html (both of which will be overwritten every time analog.exe is run).

Assuming you made the change above, you should be able to open the Report.html file and see the number of times you hit each Web site.  If we hit the sites some more and run analog.exe again, we'll see that our new visits are recorded.

And with that, we've successfully installed a very basic log file analyzer.  Unfortunately, it doesn't give us the most interesting of results.  At this point, however, we'll leave things as they are.  Note that if we add any additional Web sites, we'll need to go back into the analog.cfg file and add them in as additional LOGFILE lines.
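Since each new site needs its own LOGFILE line, it can help to generate those lines rather than type them.  The following Python sketch does that under the folder layout used in this guide (one sub-folder per site inside the Apache logs folder); the function name analog_logfile_lines and the (folder, URL prefix) pairs are assumptions for illustration, not part of Analog.

```python
import ntpath  # applies Windows path rules even if this runs on another OS


def analog_logfile_lines(log_root, sites):
    """Build one Analog LOGFILE line per (folder, url_prefix) pair.

    An empty folder name stands for the main site, whose access.log
    sits directly in log_root.
    """
    lines = []
    for folder, url_prefix in sites:
        parts = [log_root] + ([folder] if folder else []) + ["access.log"]
        lines.append('LOGFILE "%s" %s' % (ntpath.join(*parts), url_prefix))
    return lines


if __name__ == "__main__":
    sites = [
        ("", "http://localhost"),
        ("website", "http://website.localhost"),
        ("website2", "http://website2.localhost"),
    ]
    log_root = r"C:\Program Files\Apache Group\Apache\logs"
    for line in analog_logfile_lines(log_root, sites):
        print(line)
```

The printed lines can then be pasted into analog.cfg in place of the existing LOGFILE entries.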

June 4, 2006: You may also find WebLog Expert Lite (currently 3.6) to be of some use.  This program can be found at http://www.weblogexpert.com/ and comes in both a free (Lite) version and a commercial (non-Lite) version.  Setup of the program is so easy that it hardly requires a tutorial.

View all of the steps to creating a local Web server, for development.