Posted by: bjb

I needed to install awstats into an existing web installation recently, and finding the info needed for that was a bit annoying. The documentation I could find gets into the nitty gritty without giving you the big picture.

So here is the big picture for awstats. Because it is meant to be a “big picture”, I’m putting the configuration discussion last: I want to cover the overall view of how the system works before getting into configuration specifics.

Overview for awstats

awstats is a script for analyzing web server logs (it has been extended to analyze other types of logs, like mail logs). It parses the logs, stores the statistics, and lets you view the results as graphs and charts on a web page. It is a venerable old tool (meaning it doesn’t quite fit into modern ways of handling log files, init scripts, script parameters or whatever), and it is designed to be lean so it can analyze quite large logfiles without bogging down the whole system. Because of that, the parser for the log lines is a bit simple and can get confused; a confusing line is just thrown away, and the rest of the file still gets processed.

awstats.pl is a Perl script. On my Debian system it got installed as /usr/lib/cgi-bin/awstats.pl. It can run as a CGI script, but doesn’t have to.

After configuring, you use it in two stages:

  1. analyze the web server logs
  2. generate the results page.

Stage 1

In stage 1, you run awstats.pl -update on the log file. This produces a bunch of .txt files: one for each time period (usually a month, but it could be a year), and generally one series of .txt files for each set of logfiles for a domain or virtualhost. If one log file spans two calendar months (say, covers Jan 28 to Feb 3), it will produce two .txt files, one for January and one for February. When you process the next logfile (which might span Feb 3 to Feb 10), no new .txt files are created; the existing one for February is updated.

Generally, the documentation assumes you will not be trying to “catch up” with your old log files. If you want to run your old log files through awstats, you need to analyze them in chronological order, because awstats.pl is meant to run on the same logfiles over and over and to process only the items that are new since the last session. It does this by storing a date and comparing each log record against that date to decide whether the record is old or new. I wrote a script that processes all the old log files in order (catchup.py).

Also, as far as I know, awstats doesn’t understand compressed files, so you will have to uncompress the logfiles before analyzing them. My script handles that too, but for that it needs write permission in the logfile directory.
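My catchup.py isn’t reproduced here, but the core idea can be sketched like this. This is a sketch, not the actual script: the rotated-logfile naming pattern, the awstats.pl path, and the config name are assumptions you would adapt to your system.

```python
#!/usr/bin/env python3
"""Sketch of a catchup script: feed old rotated logs to awstats
in chronological order, decompressing as needed."""
import gzip
import re
import subprocess
from pathlib import Path

def rotation_key(path):
    """Sort key assuming Debian-style rotation: access.log.12.gz is
    older than access.log.2.gz, and plain access.log is newest."""
    m = re.search(r'\.(\d+)(?:\.gz)?$', path.name)
    return -int(m.group(1)) if m else 0   # unnumbered file sorts last (newest)

def catchup(logdir, config):
    logs = sorted(Path(logdir).glob('access.log*'), key=rotation_key)
    for log in logs:
        if log.suffix == '.gz':            # awstats can't read gzip directly,
            plain = log.with_suffix('')    # so write an uncompressed copy
            plain.write_bytes(gzip.open(log, 'rb').read())  # needs write perms
            log = plain
        subprocess.run(['/usr/lib/cgi-bin/awstats.pl',
                        '-config=' + config, '-update',
                        '-LogFile=' + str(log)], check=True)
```

The important part is the sort: each file must be handed to awstats.pl only after every file older than it has been processed, or the “old record” check will discard the out-of-order lines.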

The “chronological ordering” requirement implies that all the things logging to a given log file had better agree on the time. If one app is logging in local time (say -0500) and another is logging in UTC, then generally only the records from the later-looking clock will be picked up by awstats.pl; the records that appear to jump backward in time will be regarded as “corrupted” and ignored.

You can run this stage as a CGI script, but it can also be run from a cron job. Running it as a cron job means you don’t have to give your web server user permission to write to its DocumentRoot. Running it as a CGI script means you can see the very latest statistics (right up to the moment before you run the update), but if you don’t run it often enough, you may miss analyzing some of the web server logs (e.g., if they get rotated away before you run awstats.pl on them). If that happens, you have the relatively painful task of trying to fix the mess, or of just abandoning stats for those months. You could run it as a cron job and still allow web users to run it as well, to avoid losing info when logs are rotated.
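For the cron route, a crontab entry along these lines runs the update nightly (the time, the awstats.pl path, and the config name are illustrative; -config=sourcerer.ca makes awstats.pl read /etc/awstats/awstats.sourcerer.ca.conf):

```
# m h dom mon dow   command
5 0 * * *   /usr/lib/cgi-bin/awstats.pl -config=sourcerer.ca -update
```

One entry per virtualhost, each with its own -config name, keeps the per-domain stats separate.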

Stage 2

Once you have the web server logs digested into statistics in .txt files, then you can view the results. There are two ways to view the results:

  1. dynamically, via a cgi script
  2. statically, as pre-generated static html pages

To see the results dynamically, you need to configure your web server to call the cgi script.
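Assuming the stock Debian cgi-bin setup, the dynamic report is then reached with a URL like this (hostname and config name illustrative):

```
http://www.sourcerer.ca/cgi-bin/awstats.pl?config=sourcerer.ca
```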

To see the results statically, you need to make a place for the generated html, and then call awstats.pl -output for each report you might want to have available. There are quite a few reports, and you need to do it for each time period as well. awstats supplies a script (on my Debian system, /usr/share/awstats/tools/awstats_buildstaticpages.pl) that will generate all the reports for a given time period (i.e. a month), so you just have to loop over the months. And over virtualhosts, if you’re doing it for more than one web server/domain name.
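That double loop can be sketched as follows. The hostnames, month range, and output directory are assumptions; the sketch just prints the commands (a dry run) so you can inspect them before actually running them, e.g. by piping the output to sh.

```python
#!/usr/bin/env python3
"""Sketch: build static awstats pages for every month of every
virtualhost, by generating one awstats_buildstaticpages.pl command each."""

VHOSTS = ['sourcerer.ca', 'example.org']       # your virtualhosts (assumed)
MONTHS = [(2011, m) for m in range(1, 13)]     # the periods you have stats for

def build_commands(vhosts, months, outdir='/var/www/awstats'):
    cmds = []
    for vhost in vhosts:
        for year, month in months:
            cmds.append(['/usr/share/awstats/tools/awstats_buildstaticpages.pl',
                         '-config=' + vhost,
                         '-month=%02d' % month,
                         '-year=%d' % year,
                         '-dir=' + outdir,
                         '-awstatsprog=/usr/lib/cgi-bin/awstats.pl'])
    return cmds

if __name__ == '__main__':
    for cmd in build_commands(VHOSTS, MONTHS):
        print(' '.join(cmd))   # swap print for subprocess.run(cmd) to execute
```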

Configuration Considerations

There are two things to configure with awstats: awstats itself (a config file for each “web site”) and the web server that you will use to view the results (if you are going to view them that way). Below, I discuss only the configuration of awstats itself.

The awstats.pl script is configured with files named /etc/awstats/awstats.domainname.conf (again, this is for my Debian system). You would copy the awstats example conf file to a file with your domain name in the middle, e.g.:

cp /etc/awstats/awstats.conf /etc/awstats/awstats.sourcerer.ca.conf

And then edit the file to have the configuration you want.

awstats works best if you have a separate series of web server logfiles for each host for which you want graphs. If you have some virtualhosts, you might want to configure them each to have their own log files.
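With Apache, giving each virtualhost its own log series is one CustomLog line per vhost, something like this (filename illustrative; the combined format matches awstats LogFormat=1):

```
<VirtualHost *:80>
    ServerName www.sourcerer.ca
    CustomLog /var/log/apache2/sourcerer.ca-access.log combined
</VirtualHost>
```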

On my Debian system, using apache2 for a web server, all the log files go into the same directory, /var/log/apache2. The catchup.py script can handle this, and it would be easy to make a set of cron commands that each update a different virtual host. At the moment I have all the stats files going into one directory and all the static html files into another. Maybe I should have a directory per virtualhost for the html files, though; they are getting quite numerous. A directory per virtualhost also means you can more easily apply different access policies to the different domains.

The things I changed in the awstats.conf file for my purposes were:

LogFile
LogFormat
SiteDomain
HostAliases
DirData

There are lots of other options, but customizing those was enough to get some charts to start with.

LogFile is used if you don’t specify -LogFile on the command line. The catchup script passes the -LogFile argument on the command line, but the cron jobs that keep the stats updated can probably just use the name of the most recent logfile, domain-access.log.

LogFormat: it’s important to match LogFormat to the actual format your logfiles are written in, or every line will be classified as corrupted. I used format 1 for my apache2 logfiles. awstats has four predefined log formats, or you can specify a custom log format in exquisite detail, field by field.

SiteDomain is the name of your site as your web server knows it.

HostAliases is a list of other names for “self” for the web server (for the domain being analyzed).

DirData is the directory where the statistical output will go (all the .txt files).
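Put together, the edits to the copied conf file look something like this. The values are illustrative, continuing the sourcerer.ca example above; DirData in particular is whatever directory you chose for the .txt files.

```
LogFile="/var/log/apache2/sourcerer.ca-access.log"
LogFormat=1
SiteDomain="sourcerer.ca"
HostAliases="www.sourcerer.ca localhost 127.0.0.1"
DirData="/var/lib/awstats"
```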

The web sites I administer have hired a service to monitor themselves, and I added those user agents to the robots file (/usr/share/awstats/lib/robots.pm) so that they would be counted as robots. Adding them to SkipHosts instead just meant they weren’t counted and didn’t show up in the stats at all.

Last words

Hopefully that will give you an idea of what you’re aiming for as you follow the other, more detailed explanations of how to set up awstats. Remember, when you lose the data in a logfile and have to leave it behind: it’s only stats. You’ll manage without them. The stats are approximate anyway; very little attempt is made in the program to give an exact account of the activity, and records are thrown away and not counted almost every time awstats is run. So don’t sweat it if you lose a log file or two on the way. Once the cron jobs are set up and time passes, you’ll get fairly good coverage of the activity of the web server. Keep tuning your web app (e.g., ensure that logged times use the same timezone across all apps), look at the config options for awstats and tune your .conf files for more interesting reports, and eventually you’ll have a great resource for security monitoring, marketing analysis, and web site usability and effectiveness reviews.

In fact, you can probably just set up awstats and let it accumulate statistics over time, and not bother with the catchup script. I did it because I had old logs, and I wanted to see what a year’s worth of web log statistics looked like in awstats and other packages. It did help me choose a web log analysis package, and choose among the various extra options for awstats not discussed above, but it was also time-consuming.

Posted by: bjb

I found a site with some not-just-good, but right-out excellent Twisted documentation.

http://krondo.com/blog/?page_id=1327

Posted by: bjb

The official Debian kernel building tools are a thing of wonder. But they didn’t do what I wanted, which was to build the exact version of the kernel that I’m running. I guess they are only ever used to build the latest version.

debian bug 649394

Here is the best documentation I found for this task. It refers to this which is also pretty good.

Also, reportbug failed (it was unable to get the list of open bugs for this package from the Bug Tracking System), so I used debian-bug from the debian-el package (as noted at the bottom of this page). To actually send the mail, use ctrl-c ctrl-s in the mail buffer (or ctrl-c ctrl-c if you want to send the email and exit emacs).

UPDATE:

Maybe I misunderstood … maybe the -5 is not the patch level I’m aiming for. We shall see.

UPDATE:

No, the -5 is the “ABI” level, and has nothing to do with the Debian patch level. So there was no bug. I was supposed to build with all the patches. Live and learn …

Posted by: bjb

X starts and promptly exits.

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=646987

But only under the Xen Hypervisor.

This time the keyboard device is there even under the hypervisor, but xinit “cannot invoke xkbcomp” under the hypervisor. It’s there in /usr/bin/xkbcomp, but xinit cannot “invoke” it under the hypervisor while it can invoke it when it’s not running under the hypervisor. Mysterious.

Posted by: bjb

I had to shop for a power supply for my desktop. I visited a couple of computer part retailers and looked at their lists of power supplies. There are a lot of brands out there, and they make competing claims about what to look for: “One rail! Better than two or more!” “4 rails! Better than 1 rail!” “80+ Bronze” “780 Watts — Peak” “620 Watts — Continuous” So I looked for some help in interpreting all this.

I found jonnyguru.com. What a great site! They explain all the features you might want in a power supply, and explain why vendors make the competing claims (like 1 rail vs 4 rails) in plain English. Check out the FAQ for everything you need to know, concisely. They have some very nice reviews too — worth a read just to admire the review.

Posted by: bjb

Today is International Software Freedom Day. Enjoy your free software! While you still have it.

Consider becoming a supporting member of the Free Software Foundation and/or the Electronic Frontier Foundation if you want to do something more concrete towards supporting free software.

Posted by: bjb

I recently worked on some C code on an embedded platform with this declaration:

__attribute__((critical)) void somefunc (void) {
    /* function body ... */
}

I’d never seen anything like that.

It turns out that “critical” (and “atomic” and a few other keywords) are part of the OpenMP spec, where multiprocessing support is being built into compilers. This has been moving into gcc since 2005 (at least, that’s when I first see “omp” mentioned in the changelogs).

Dunno when it will be available on x86 though … it didn’t work on my desktop:

bjb@blueeyes:~/junk/foo$ gcc try1.c -o try1
try1.c:12: warning: 'critical' attribute directive ignored
try1.c:23: warning: 'critical' attribute directive ignored
bjb@blueeyes:~/junk/foo$ gcc --version
gcc (Debian 4.4.5-8) 4.4.5
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

bjb@blueeyes:~/junk/foo$

There was no warning on the customer platform, but its presence did not produce a different executable than the same source code without it.

Posted by: bjb

The next “activation date” came and went, with no DSL service. The reason given is that my address did not exactly match the address on my phone bill. I gave the address as it should be when I ordered the service over the phone, but the customer service rep(s) on the other end insisted on putting the letter after the number as a “unit” number or “suite” number. I said I don’t write 999 unit A, I write 999A. But they didn’t know how to enter that into their (or their supplier’s) system.

I wonder why that is, when I had DSL service through NCF, which uses TekSavvy for its upstream, and that worked on the first try?

Why is TekSavvy so certain this time that the activation will work, when it failed the last two times?

Why did TekSavvy decline my offer to fax my phone bill (with address) to their office?

If they’ve done something to ensure that it will work this time — why didn’t they do that the first time?

Posted by: bjb

I’m having a rocky start with TekSavvy. Although I ordered DSL service a while ago, the order got messed up and I’ve had to call a few times to try to sort it out. Today, NCF cutoff day, I find that the order has been so badly mangled that I have to start over from scratch and be internetless for a week.

Two of the three calls I’ve made were handled by newbie customer service reps (they volunteered the info). One insisted that the only way to switch between payment types was to start over (but he didn’t tell me that it would delay the start date). He also didn’t cancel the first order, which confused TekSavvy no end; I even got a call from them asking about it. But the person who called didn’t make it better, so when I spoke to the next newbie (today), we had to start over, again.

Posted by: bjb

I’ve ordered a new DSL supplier and have cancelled the old one — the transfer date is June 24. So if I go offline June 24, that might be why. I’ll be back.

I haven’t got my new static IP address yet, nor my new IPv6 subnet. Stay tuned! Hopefully I’ll find out what they are before June 24 (so I can put them in DNS on time).

NCF (National Capital FreeNet) has been great — but I wanted a native IPv6 supplier. So, I’m trying out TekSavvy. TekSavvy is NCF’s upstream, as it happens.

I will try to stay in touch with NCF by visiting the fora and asking/answering questions there, if I see anything I can respond to.