UNIX Tutorials, Tips, Tricks and Shell Scripts

SHADOW: The Free Apache Web Server Access Log Analyzer Tool - Part I


The SHADOW Apache Access Log Analyzer Tool, as its name indicates, is used to parse the access log from a website running the Apache web server. I want to make this clear upfront in case you were looking for a tool that analyzes a different Apache log file.

When I originally wrote the SHADOW shell script over 10 years ago, before "website analytics" became such a hot buzzword, it was not intended to be an Apache access log analyzer tool per se. It is a tool, and it certainly does analyze access logs from an Apache web server, but the original (big picture) intent was simply to write a shell script that would parse out some key information about visitors who came to the website.  The desired information included:

  • How did the visitor find the website (organic search, direct, paid search, referral or social)?
  • If the source was paid search...
    ...where exactly did they come from?
    ...what was the actual search term they typed in to trigger the website's ad to be displayed?
    ...from a back-end advertising campaign perspective, what paid search keyword phrase was matched to the actual search phrase the visitor typed in?
  • Had they visited the site before?  If they were a return visitor, how many times have they been to the site and how long has it been since their original and last visits?
  • What network domain were they connecting from?
  • What did they do while they were on the site? In other words, what pages did they view and what actions (if any) did they perform?
  • Most importantly, did the visitor perform the desired objectives for the site? For example, did they make a purchase or subscribe to a newsletter or blog?

The most important item from this list is the second bullet related to paid search.  Since this shell script was developed for use with a shiny new website...okay, maybe more new than shiny...the quickest method for reaching and "inviting" the right people was by using paid search advertising, which was also just in its infancy at the time.

You may be wondering why I am writing a series of articles about an Apache access log analyzer shell script.  There are two primary reasons:

(1) Educational: I do not claim that this rather lengthy shell script for tracking website visitors is an example of superior shell scripting skills and techniques. Remember, it was written 10 years ago when my shell scripting knowledge could best be described as primitive. It is, however, a shell script that serves as a complete application or solution.  This differs from the type of articles I typically write, which have an in-depth focus on a specific shell programming topic, such as the for and while loop constructs.

I recognize and will be the first person to acknowledge that parts of the SHADOW shell script could have been written better, which also would have certainly reduced the lines of code needed to achieve the same high-level objectives, but if you have a sincere desire to learn UNIX and Linux shell scripting then investing some time to understand how and why SHADOW works would likely prove beneficial to you.  It may inspire you to write your own log file analyzer or possibly influence design decisions as you write your own shell scripts.  Here are some of the shell scripting concepts and constructs demonstrated by the SHADOW shell script:

  • command line argument processing
  • shell script variables
  • runtime logging
  • single and multi-line (if-then-else) conditional statements
  • how to read and write to files (includes opening and closing files)
  • while and for loops
  • command return codes
  • sending email from a shell script
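To make a few of these constructs concrete, here is a minimal sketch. It is not taken from the SHADOW source; the function names (log_msg, count_lines, main) and the runtime log location are assumptions invented for illustration. It shows command line argument processing, runtime logging, an if-then-else test, a while loop that reads a file, a for loop, and command return codes.

```shell
#!/bin/sh
# Illustrative sketch only; names and paths are hypothetical, not SHADOW's.

LOGFILE="${TMPDIR:-/tmp}/sketch_run.log"   # runtime log (assumed location)

# Runtime logging: append a timestamped message to the log file.
log_msg() {
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$LOGFILE"
}

# Read a file line by line with a while loop and print the line count.
# Returns 1 (a non-zero return code) if the file cannot be read.
count_lines() {
    file=$1
    if [ ! -r "$file" ]; then
        log_msg "ERROR: cannot read $file"
        return 1
    fi
    n=0
    while IFS= read -r line; do
        n=$((n + 1))
    done < "$file"
    echo "$n"
}

# Process each command line argument with a for loop and check the
# return code of count_lines for every file.
main() {
    if [ "$#" -eq 0 ]; then
        echo "usage: sketch.sh access_log [access_log ...]" >&2
        return 2
    fi
    status=0
    for f in "$@"; do
        if total=$(count_lines "$f"); then
            log_msg "$f: $total lines"
            echo "$f: $total lines"
        else
            status=1
        fi
    done
    return $status
}
```

Adding main "$@" as the last line would turn the sketch into a runnable script. Keeping the work inside functions makes each piece easy to try out on its own from an interactive shell.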

(2) Free and Simple Apache Log Analyzer: Regardless of the obvious improvements that could be made to the SHADOW shell script, it serves as a functioning albeit basic application that has performed its job well over the years and will continue to do so until it has been re-written and made available to a wider audience in its new life as an online website analytics tool (web app).  If you are searching for a ready-to-go and free tool for analyzing Apache access log files and have a basic understanding of the UNIX or Linux Operating System AND shell scripting then this relatively simple shell script may be the right solution for you.

To be truthful with you, I honestly did not expect this tool to last as long as it has but (as you can probably understand) sometimes in a startup environment you go with what works for as long as it makes sense, since there are typically outstanding tasks that have no current solutions.


How SHADOW Works: The Non-Technical Overview

SHADOW is a relatively basic shell script that analyzes and parses Apache access logs.  It was never intended or designed to be a soup-to-nuts, full-featured website analytics tool.  It was intended and designed to be a simple, but automated and useful, Apache log parser and single report generator (that's right, only one report).  The information provided by the single SHADOW report, as alluded to earlier, is used to understand key information about website visitors, and more importantly those visitors who arrived by way of paid search advertising.  Considering SHADOW's simplicity, I will only provide an outline of what it does do since listing what it does NOT do would take a significant amount of time.

Before looking at what SHADOW does, it would be helpful to understand a brief list of SHADOW-related terms:

SHADOW - the Apache log parser and analyzer tool written in shell script
FOOTPRINTS - the pages viewed by a website visitor, and the time each one was viewed
RETURN VISITOR (RV) - a visitor that has been to the website before
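
As a rough illustration of what FOOTPRINTS look like in practice, the sketch below groups page views by visitor. It assumes the log uses the Apache combined log format and it identifies a visitor by client IP address. Both are assumptions, as is the function name footprints; none of this is taken from the SHADOW source.

```shell
#!/bin/sh
# Illustrative sketch only; assumes Apache combined log format lines such as:
# 203.0.113.7 - - [10/Oct/2023:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"

# Print each visitor (client IP) once, followed by the time and path of
# every page that visitor requested, in the order the requests were logged.
# Reads the named log files, or standard input when no file is given.
footprints() {
    awk '
    {
        ip = $1
        split($4, t, ":")                    # $4 looks like [10/Oct/2023:13:55:36
        stamp = t[2] ":" t[3] ":" t[4]       # keep only hour:minute:second
        if (!(ip in seen)) {                 # remember first-seen order of visitors
            seen[ip] = 1
            order[++n] = ip
        }
        pages[ip] = pages[ip] "  " stamp "  " $7 "\n"   # $7 is the requested path
    }
    END {
        for (i = 1; i <= n; i++)
            printf "%s\n%s", order[i], pages[order[i]]
    }' "$@"
}
```

Calling footprints access_log (where access_log is whatever file your server writes) prints one block per visitor, which mirrors the idea of a single grouping of FOOTPRINTS per visitor per day.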

What SHADOW does...at a High Level:

  • runs on both UNIX and Linux...it is a shell script after all!
  • creates just ONE daily website traffic report containing an information block for each website visitor for that day, regardless of whether the visitor was there once or multiple times
  • the information block for each website visitor contains the following:

    • the visitor header section, which is used to determine...
      ...how the visitor found the site - organic search, direct, paid search, referral or social
      ...if the visitor came from a search engine's search results page, what keywords the visitor used (if supplied by the search engine)
      ...if the visitor came from paid advertising, what ad the visitor clicked through on and, if applicable, what keywords were used
      ...the visitor's network domain (if it can be determined)
      ...if the visitor is a RETURN VISITOR (RV) or not
    • the visitor's FOOTPRINTS for the entire day appear in a single grouping regardless of how many times they visited the site
  • ignores visits from bots, crawlers and the like and excludes them from the daily website traffic report
  • lets you know how many people visited your website that day
  • automatically sends the website traffic report out via email for human consumption and analysis
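
A few of the steps above can be sketched in shell. This is a hedged illustration, not the SHADOW implementation: the function names (classify_source, search_terms, count_visitors), the site lists, the bot patterns, and the idea that paid ads land on URLs tagged with utm_medium=cpc are all assumptions chosen for the example. The log is again assumed to be in the Apache combined log format.

```shell
#!/bin/sh
# Illustrative sketch only; patterns and names are assumptions, not SHADOW's.

# Classify a visit from its referrer URL ($1) and requested path ($2).
# Assumes paid ads point at landing URLs tagged with utm_medium=cpc.
classify_source() {
    referrer=$1
    request=$2
    case "$request" in
        *utm_medium=cpc*) echo "paid search"; return ;;
    esac
    case "$referrer" in
        ""|-) echo "direct" ;;
        *://*google.*|*://*bing.*|*://*duckduckgo.*) echo "organic search" ;;
        *://*facebook.com*|*://*twitter.com*|*://*linkedin.com*) echo "social" ;;
        *) echo "referral" ;;
    esac
}

# Extract the q= search phrase from a referrer URL, if the search engine
# supplied one. Only "+" is decoded to a space; %XX escapes are left as-is.
search_terms() {
    printf '%s\n' "$1" | sed -n 's/.*[?&]q=\([^&]*\).*/\1/p' | tr '+' ' '
}

# Count distinct client IPs, skipping lines that look like bot traffic.
# The pattern is matched anywhere on the line, which is crude but simple.
# Reads the named log files, or standard input when no file is given.
count_visitors() {
    grep -Eiv 'bot|crawl|spider|slurp' "$@" | awk '{ print $1 }' | sort -u | wc -l | tr -d ' '
}
```

For example, classify_source 'https://www.google.com/search?q=unix' /index.html would report an organic search visit under these assumptions. A real tool would need a much longer and regularly maintained list of search engines, social sites and bot signatures.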


Is SHADOW Right for You?

If you would benefit from knowing the information presented in the previous section for each visitor to your website, then SHADOW could be right for you.  It works well if you:

  • want to know how visitors found your website and where they are connecting from
  • need to determine the effectiveness of your paid search marketing campaigns
    RELATED TIP: Search engines have started to hide search terms used by the website visitors who find your site using organic search, even if you use the search engine's own (in-house) web analytics tool.
  • want to understand what visitors are doing while on your website...based on their FOOTPRINTS
  • need to know if a visitor has been to your website before (is a RETURN VISITOR), when they visited, and how many times they visited
  • do not have the time, resources or desire to learn or use a complex interactive web analytics tool and would prefer to just receive a daily email to track website traffic and visitors
  • do not want to add HTML code to your website so that you can perform website traffic analysis

SHADOW, in its present incarnation, may NOT be the right Apache log analyzer tool for you if you are interested in knowing additional website traffic analytics data such as:

  • total pageviews
  • average number of pageviews per visit
  • average visit duration
  • bounce rate
  • % of first time visitors
  • visitor demographic or geo information
  • device category breakdown (desktop, mobile, tablet)

If you are interested in more detailed metrics about your website traffic such as these, then SHADOW's next incarnation may be a more attractive solution for you...


The Evolution: SHADOW Becomes an Online Web Analytics Tool

Using SHADOW for over a decade to track website visitors and also to assist with overall website traffic analysis has provided extensive insight regarding what improvements could be made, without going overboard, to make the tool even more useful and easier to use.

SHADOW: The Apache Web Server Access Log Analyzer Tool will experience a rebirth as a web app and will become SHADOW3: The Automated Website Analytics Tool.

These are the features currently planned for SHADOW3:

  • the same functionality currently provided by SHADOW, the "offline" Apache log analyzer shell script, which is covered in the non-technical overview section above
  • unlike SHADOW, there will be no script to download and run
  • no HTML code to install on your website
  • online real-time search interface to retrieve all historical website traffic data for a specific visitor
  • limit the daily website traffic report to a subset of website visitors who meet certain criteria or perform a certain task during their visit, such as subscribing to a blog or making a purchase
  • when desired, all of a website visitor's FOOTPRINTS for every day they have been to the site will appear in a single view/report (in SHADOW, this is a manual operation)
  • configurable triggers that generate automated ad hoc reports sent via email
  • automatic parsing of website visitor source and search related data

Although these are the primary enhancements planned for SHADOW3, any suggestions you have for additional functionality would be considered and appreciated!  You can send them using the contact information located in the section below entitled "How to learn more about SHADOW and/or use it to meet your needs in the website traffic analytics space."


What will be included in the next article in the SHADOW shell script series

The next article in this series, which has been creatively entitled "SHADOW: The Free Apache Web Server Access Log Analyzer Tool - Part II" (it took much intellectual and creative energy to add the second "I" after "Part"), will include:

  • the source code for the SHADOW shell script with explanations
  • How SHADOW Works: The Technical Overview (emphasis on "Technical" vs the "Non-Technical" overview above)
  • SHADOW Installation and Configuration Guide so that you can set up and use it in your environment
  • excerpts from an actual SHADOW daily website traffic report with explanations
  • status of the SHADOW3 launch


How to learn more about SHADOW and/or use it to meet your needs in the website traffic analytics space

If you would like to be notified when a new article in the SHADOW shell script series is posted or receive information about the availability of SHADOW3: The Automated Website Analytics Tool, please provide your contact information below.  You can also send suggestions or feedback for SHADOW3, or its predecessor SHADOW, to shadow@livefirelabs.com.



Do you need a better understanding of shell scripting concepts and constructs? Either of these online courses is a good place to start...

UNIX and Linux Operating System Fundamentals contains a very good "Introduction to UNIX Shell Scripting" module, and should be taken if you are new to the UNIX and Linux operating system environments or need a refresher on key concepts.

UNIX Shell Scripting is a good option if you are already comfortable with UNIX or Linux and just need to sharpen your knowledge about shell scripting and the UNIX shell in general.

Both courses include access to a real server in our Internet Lab for completing the course's hands-on exercises, which are used to reinforce the key concepts presented in the course. Any questions you may have while taking the course are answered by an experienced UNIX technologist.

Thanks for reading, and happy shell scripting!