06/27: Enough git for julython
I’ll be giving a talk tonight at OPAG called “Enough git for julython”. That’s right, julython is coming up in a few days. I want to help boost the Ottawa, Canada participation by removing a possible barrier to participation: I will be showing people how to use git and github.
The talk is at Shopify tonight, Thursday June 27, at 19:30.
A site that promotes alternatives to the software and cloud services that the US government (and others) uses as its own databases for mining.
05/31: EFF save podcasting campaign
The EFF is raising funds to pay for a challenge to Personal Audio’s patent, which the company is using to squeeze podcasters. Now might be a good time to contribute, or even join and make regular contributions.
It looks like they’ve already reached their goal, but it doesn’t hurt to support or join the EFF in this and other causes.
Also, in order to make their case that the patent is baseless, they have issued a call for prior art. If you can contribute information, that will also help them win.
05/25: python's setuptools
One of the nice things about the Ottawa Python Author’s Group irc channel (oftc.net, #opag) is that they occasionally mention a great but under-advertized reference, like this one for setuptools:
I needed to install awstats into an existing web installation recently, and finding the info needed for that was a bit annoying. The documentation I could find gets into the nitty gritty without giving you the big picture.
So here is the big picture for awstats. Because it is meant to be a “big picture”, I’m putting the configuration discussion last. I want to cover the overall view of how the system works before getting into configuration specifics.
Overview for awstats
awstats is a script for analyzing web server logs (it has been extended to analyze other types of logs, such as mail logs). It analyzes the logs, stores the statistics, and lets you see the results as graphs and charts on a web page. It is a venerable old tool, meaning it doesn’t quite fit into modern ways of handling log files, init scripts, script parameters or whatever. It is also designed to be lean, so it can analyze quite large logfiles without bogging down the whole system. The price is that the parser for the log lines is a bit simple and can get confused; when that happens, the offending line is thrown away but the rest of the file still gets processed.
awstats.pl is a perl script. On my Debian system it got installed into /usr/lib/cgi-bin/awstats.pl. It can run as a cgi-bin script, but doesn’t have to.
After configuring, you use it in two stages:
- analyze the web server logs
- generate the results page.
In stage 1, you run awstats.pl -update on the log file. This will produce a bunch of .txt files. There will be a .txt file for each time period (usually a month, but could be a year). There will generally be a .txt file series for each set of logfiles for a domain or virtualhost. If one log file spans two calendar months (say, it covers Jan 28 to Feb 3), then it will produce two .txt files: one for January and one for February. When you process the next logfile (which might span Feb 3 to Feb 10), no new .txt files will be created, but the existing one for February will be updated.
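To make the month bookkeeping concrete, here is a minimal Python sketch that lists which calendar months a logfile touches, given its first and last dates. The function name and the year 2013 are made up for illustration; nothing here comes from awstats itself.

```python
from datetime import date

def months_covered(first_day, last_day):
    """Return the (year, month) pairs touched by a log spanning the two dates."""
    months = []
    year, month = first_day.year, first_day.month
    while (year, month) <= (last_day.year, last_day.month):
        months.append((year, month))
        # Step to the next calendar month, rolling over the year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months

# The two logfiles from the example above (the year is an assumption).
print(months_covered(date(2013, 1, 28), date(2013, 2, 3)))   # [(2013, 1), (2013, 2)]
print(months_covered(date(2013, 2, 3), date(2013, 2, 10)))   # [(2013, 2)]
```

The first logfile touches two months, so two stats files are involved; the second only touches February, which already has one.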
Generally, the documentation assumes you will not be trying to “catch up” with your old log files. If you want to run your old log files through awstats, you will need to analyze them in chronological order, as awstats.pl is meant to run on the same logfiles over and over, and only process the new items since last session. It does this by storing a date and comparing each log record to the date to find out if it is old or new. I wrote a script that processes all the old log files in order (catchup.py).
Also, as far as I know, awstats doesn’t understand compressed files, so you will have to uncompress the logfiles before analyzing them. My script handles that too, but for that it needs write permission in the logfile directory.
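The shape of a catch-up run can be sketched roughly as follows. This is not the catchup.py script itself, just an illustration under assumptions: it assumes logrotate-style names (access.log, access.log.1, access.log.2.gz, where a higher number means an older file), and the helper names are invented. The -config, -update and -LogFile options are the ones awstats.pl documents, but check them against your installed version.

```python
import gzip
import re
import shutil
from pathlib import Path

AWSTATS = "/usr/lib/cgi-bin/awstats.pl"  # Debian location mentioned above

def rotation_index(name):
    """access.log -> 0, access.log.1 -> 1, access.log.2.gz -> 2."""
    match = re.search(r"\.log\.(\d+)(?:\.gz)?$", name)
    return int(match.group(1)) if match else 0

def oldest_first(names):
    """Assumes higher rotation numbers are older, so they sort to the front."""
    return sorted(names, key=rotation_index, reverse=True)

def uncompressed(path):
    """Gunzip a .gz file next to itself (needs write permission there)."""
    path = Path(path)
    if path.suffix != ".gz":
        return path
    target = path.with_suffix("")  # access.log.2.gz -> access.log.2
    with gzip.open(path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target

def update_command(config, logfile):
    """Build (but do not run) one awstats update command."""
    return ["perl", AWSTATS, f"-config={config}", "-update", f"-LogFile={logfile}"]

names = ["access.log", "access.log.2.gz", "access.log.1", "access.log.10.gz"]
for name in oldest_first(names):
    print(name)
```

To actually run the updates you would pass each command to subprocess.run(command, check=True), in the order oldest_first returns, after calling uncompressed on each file.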
The “chronological ordering” requirement implies that everything that logs to that log file had better agree on the time. If one app is logging in local time (say -0500) and another is logging in UTC, then generally only the records that are 5 hours later will be picked up by awstats.pl. The other records will be regarded as “corrupted” and ignored.
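A toy model of that effect, in Python. It is a simplification of the behaviour described above, not awstats's real code: it assumes the comparison uses only the wall-clock part of each timestamp and ignores the UTC offset, and the sample records are invented.

```python
from datetime import datetime

def wall_clock(stamp):
    """Parse an Apache-style timestamp but keep only the wall-clock part."""
    return datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z").replace(tzinfo=None)

def accepted(stamps):
    """Keep records that are not older than the newest one already accepted."""
    kept, newest = [], None
    for stamp in stamps:
        when = wall_clock(stamp)
        if newest is None or when >= newest:
            kept.append(stamp)
            newest = when
    return kept

records = [
    "27/Jun/2013:10:00:00 -0500",  # app A, local time
    "27/Jun/2013:15:00:01 +0000",  # app B, UTC (really one second later)
    "27/Jun/2013:10:00:02 -0500",  # app A again: now looks five hours old
    "27/Jun/2013:15:00:03 +0000",  # app B again
]
for stamp in accepted(records):
    print(stamp)
```

In this model the third record is dropped: once a UTC-stamped line has been seen, every local-time line looks like it belongs to the past.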
You can run this stage as a cgi script — but it can also be run by a cron job. Running it as a cron job means you don’t have to give your web server user permission to write to its DocumentRoot. Running it as a cgi script means you can see the very latest statistics (right up to the moment before you run the update) — but if you don’t do it often enough, you may miss analyzing some of the web server logs (eg, if they get rotated before you run awstats.pl on them). If that happens you have the relatively painful task of trying to fix the mess, or just abandoning stats for those months. You could run it as a cron job and still allow web users to run it as well, to avoid losing info when logs are rotated.
Once you have the web server logs digested into statistics in .txt files, then you can view the results. There are two ways to view the results:
- dynamically, via a cgi script
- statically, as pre-generated static html pages
To see the results dynamically, you need to configure your web server to call the cgi script.
To see the results statically, you need to make a place for the generated html, and then call awstats.pl -output for each report you might want to have available. There are quite a few reports, and you need to do it for each time period as well. awstats supplies a script (on my Debian system it went here: /usr/share/awstats/tools/awstats_buildstaticpages.pl) that will generate all the reports for a given time period (i.e. month), so you just have to loop over the months, and over the virtualhosts if you’re doing it for more than one web server/domain name.
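The loop over months and virtualhosts could look something like this Python sketch. The output directory, the second hostname and the function name are hypothetical. The options (-config, -awstatsprog, -dir, -month, -year) are the ones I believe the script's usage message lists, so check them against your copy before relying on this.

```python
from itertools import product

BUILD = "/usr/share/awstats/tools/awstats_buildstaticpages.pl"  # path from above
AWSTATS = "/usr/lib/cgi-bin/awstats.pl"                          # path from above

def static_page_commands(configs, months, out_root):
    """Yield one command per (virtualhost config, (year, month)) pair."""
    for config, (year, month) in product(configs, months):
        yield [
            "perl", BUILD,
            f"-config={config}",
            f"-awstatsprog={AWSTATS}",
            f"-dir={out_root}/{config}",   # one output directory per virtualhost
            f"-month={month:02d}",
            f"-year={year}",
        ]

commands = list(static_page_commands(
    ["sourcerer.ca", "example.org"],   # example.org is a made-up second host
    [(2013, 4), (2013, 5)],
    "/var/www/awstats",                # hypothetical output directory
))
for command in commands:
    print(" ".join(command))
```

This only builds and prints the commands. Handing each one to subprocess.run(command, check=True) would run them, and the per-host -dir value is just one way to keep the generated files apart.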
There are two things to configure with awstats: one is awstats itself (a config file for each “web site”) and one is the web server that you will use to view the results (if that is how you are going to view the results). Below, I discuss only configuration of awstats itself.
The awstats.pl script is configured with files in /etc/awstats/awstats.domainname.conf (again, this is for my Debian system). You would copy the awstats example conf file to a file with your domain name in the middle, eg:
cp /etc/awstats/awstats.conf /etc/awstats/awstats.sourcerer.ca.conf
And then edit the file to have the configuration you want.
awstats works best if you have a separate series of web server logfiles for each host for which you want graphs. If you have some virtualhosts, you might want to configure them each to have their own log files.
On my Debian system using apache2 for a web server, all the log files go into the same directory, /var/log/apache2. The catchup.py script can handle this, and it would be easy to make a set of cron commands that will each update a different virtual host. At the moment, I have all the stats files going into one directory and all the static html files going into another. Maybe I should have a directory per virtualhost for the html files, though; they are getting quite numerous. A directory per virtualhost means you can more easily apply different access policies to the different domains.
The things I changed in the awstats.conf file for my purposes were:
- LogFile
- LogFormat
- SiteDomain
- HostAliases
- DirData
There are lots of other options, but customizing those was enough to get some charts to start with.
LogFile is used if you don’t specify -LogFile on the command line. The catchup script uses the -LogFile argument on the command line, but the cron jobs that keep the stats updated can probably use the most recent logfile name domain-access.log.
LogFormat — it’s important to match the LogFormat to the actual format that your logfiles are written in, or every line will be classified as corrupted. I used format 1 for my apache2 logfiles. awstats has 4 predefined log formats, or you can specify a custom log format in exquisite detail field by field.
SiteDomain is the name of your site as your web server knows it.
HostAliases is a list of other names for “self” for the web server (for the domain being analyzed).
DirData is the directory where the statistical output will go (all the .txt files).
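If you have several virtualhosts, editing each copy by hand gets tedious. Here is a rough Python sketch that applies those five settings to the text of a conf file. The template below is a trimmed stand-in rather than the real awstats.conf, and the values (including the logfile name and the DirData path) are examples I made up, not recommendations.

```python
import re

def apply_overrides(conf_text, overrides):
    """Replace the first `Key=value` line for each override; append the rest."""
    remaining = dict(overrides)
    lines = []
    for line in conf_text.splitlines():
        match = re.match(r"^(\w+)\s*=", line)   # comment lines never match
        if match and match.group(1) in remaining:
            key = match.group(1)
            line = f"{key}={remaining.pop(key)}"
        lines.append(line)
    # Append any option the template did not mention at all.
    lines.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(lines) + "\n"

template = '''# trimmed stand-in for /etc/awstats/awstats.conf
LogFile="/var/log/apache2/access.log"
LogFormat=4
SiteDomain=""
HostAliases="localhost 127.0.0.1"
DirData="/var/lib/awstats"
'''

overrides = {
    "LogFile": '"/var/log/apache2/sourcerer.ca-access.log"',  # hypothetical name
    "LogFormat": "1",
    "SiteDomain": '"sourcerer.ca"',
    "HostAliases": '"www.sourcerer.ca localhost 127.0.0.1"',
    "DirData": '"/var/lib/awstats"',
}
print(apply_overrides(template, overrides), end="")
```

Writing the result to /etc/awstats/awstats.sourcerer.ca.conf would give the same effect as the cp-and-edit step, as long as the option names match what your version's example conf uses.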
The web sites I administer are monitored by a hired service, and I added that service’s user agents to the robots file (/usr/share/awstats/lib/robots.pm) in order to count them. Adding them to SkipHosts just meant they weren’t counted and didn’t show up in the stats at all.
Hopefully that will give you an idea of what you’re aiming for as you follow the other, more detailed explanations of how to set up awstats. Remember, when you lose the data in a logfile and have to leave it behind — it’s only stats. You’ll manage without them. The stats are approximate anyway — very little attempt is made in the program to give an exact account of the activity. Records are thrown away and not counted almost every time awstats is run — so don’t sweat it if you lose a log file or two on the way. Once the cron jobs are set up and time passes, you’ll get fairly good coverage of the activity of the web server. Keep tuning your web app (eg, ensure that times logged use the same timezone across all apps), look at the config options for awstats and tune your .conf files for more interesting reports, and eventually you’ll have a great resource for security monitoring, marketing analysis and for web site usability and effectiveness reviews.
In fact, you probably can just set up awstats and let it accumulate statistics over time — don’t bother with the catchup script. I did it because I did have old logs, and I wanted to see what a year’s worth of web log statistics looked like in awstats and other packages. It did help me with choosing a web log analysis package, and in choosing among the various extra options for awstats not discussed above — but it was also time-consuming.
The official Debian kernel building tools are a thing of wonder. But they didn’t do what I wanted, which was to build the exact version of the kernel that I’m running. I guess they are only ever used to build the latest version.
Also, reportbug failed (it was unable to get the list of open bugs for this package from the Bug Tracking System) — I used debian-bug in debian-el package (as noted at the bottom of this page). To actually send the mail, use ctrl-c ctrl-s in the mail buffer (or ctrl-c ctrl-c if you want to send the email and exit emacs).
Maybe I misunderstood … maybe the -5 is not the patch level I’m aiming for. We shall see.
No, the -5 is the “ABI” level, and has nothing to do with the Debian patch level. So there was no bug. I was supposed to build with all the patches. Live and learn …
X starts and promptly exits.
But only under the Xen Hypervisor.
This time the keyboard device is there even under the hypervisor, but xinit “cannot invoke xkbcomp” under the hypervisor. It’s there in /usr/bin/xkbcomp, but xinit cannot “invoke” it under the hypervisor while it can invoke it when it’s not running under the hypervisor. Mysterious.
09/27: power supply shopping
I had to shop for a power supply for my desktop. I visited a couple of computer part retailers and looked at their lists of power supplies. There are a lot of brands out there, and they make competing claims about what to look for: “One rail! Better than two or more!” “4 rails! Better than 1 rail!” “80+ Bronze” “780 Watts - Peak” “620 Watts - Continuous”. So I looked for some help in interpreting all this.
I found jonnyguru.com. What a great site! They explain all the features you might want in a power supply, and explain why vendors make the competing claims (like 1 rail vs 4 rails) in plain English. Check out the FAQ for everything you need to know, concisely. They have some very nice reviews too — worth a read just to admire the review.
09/17: Software Freedom Day
Today is International Software Freedom Day. Enjoy your free software! While you still have it.