Resources and terminology: web site statistics
By Makiko Itoh
There are many reasons to count the number of visitors to your web site. Perhaps you want to sell banner ads to generate some revenue. Or maybe you need to know if your search engine submissions have taken effect. Or maybe you're simply curious to see who's looking at your work! Here is an overview of the different ways of counting visitors to your site, and what trails they leave behind.
There are three ways of counting and tracking web site visitors. The first method involves analyzing the access logs that are automatically generated by the server software, parsing the results and displaying them in charts or graphs. These statistical analysis programs can either reside on the server itself, or on your own computer.
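To give a feel for what "parsing the access logs" means, here is a minimal sketch in Python. It assumes the server writes the Apache "combined" log format, which this article does not specify; your server's layout may differ, and the sample line, IP address, and URLs are made up for illustration.

```python
import re

# Assumed layout: Apache "combined" log format. Adjust the pattern
# if your server writes its logs differently.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

# A made-up log line for illustration only.
sample = ('192.0.2.10 - - [10/Oct/2000:13:55:36 -0700] '
          '"GET /index.html HTTP/1.0" 200 2326 '
          '"http://www.example.com/links.html" "Mozilla/4.08"')

entry = parse_line(sample)
print(entry["ip"], entry["path"], entry["status"], entry["referrer"])
```

A real analysis program does much more than this, but every report it produces starts from fields like these.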
The second method is commonly called a counter. Counters work by counting the number of times a specific file is requested. Most remote-hosted counter scripts work this way; the "count" data is then stored in a database on the remote server. The file requested can be a graphic or an include. (An include is a file that is parsed on the server side and inserted into the HTML document before it is delivered to the client. In plain English, it is a bit of data inserted on the fly at a specified place in the web page.)
The third method involves setting a cookie in the user's browser, which allows you to track individual visitors. Setting cookies requires some customized scripting and is beyond the scope of this article, so let's talk about the log analysis and counter methods.
Remote-hosted counters may be the most familiar type of page access counter. They are the small graphics that appear on many pages, provided by services such as Sitemeter and TheCounter.com (see the Resources section for a list).
The main advantage of remote-hosted counters is that they are very easy to set up and can be put on any web page, even on free hosting sites such as Geocities. Usually, all that is involved is setting up an account and then copying and pasting the appropriate code onto the page you want to track. They are well suited to small and personal web sites and home pages; they are great, for example, if you are a weblogger, since most weblogs start from just one page. However, there are some significant drawbacks to counters that are worth considering.
Another drawback of graphical counters is that some people consider them to be a bit tacky. This is purely a matter of personal taste, and you can always choose a counter based on whether its graphic suits your taste or not. Be aware though that there really is nothing that's totally "free": that little graphic you are putting on your page is an advertising vehicle for the counter provider. (Some services such as MyComputer.com say outright that they require you to put their advertising graphic on your pages.) They are usually advertising for-pay "enterprise" options of their counter services, or some other web-related business.
It's also possible to "hide" the graphics required by the counters. If the particular counter script requires you to place a small graphic on your page, for example, you can make it practically invisible by specifying the IMG size as height="1" width="1", or by placing it in a hidden layer. Be careful though, since this may violate the basic agreement with the counter provider. Several counter service providers will allow you to hide the image in this way if you pay them a small fee.
Another option to consider is a counter script that is hosted on your own site. There are many CGI scripts that accomplish this, and you have more control over what is actually shown on your page. If you have the ability to use PHP or ASP for instance, you can also use counter scripts in those languages. The more useful counter scripts can track sessions, or a user's whole visit from the time she enters the site to the time she goes away (or is idle for a certain number of minutes or more).
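The session idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular counter script works: the visitor identifiers, the timestamps (in seconds), and the 30-minute idle cutoff are all made up.

```python
IDLE_TIMEOUT = 30 * 60  # seconds of inactivity that end a session (assumed value)

def count_sessions(requests, idle_timeout=IDLE_TIMEOUT):
    """Count sessions from (visitor_id, time_in_seconds) pairs.

    A visitor starts a new session on their first request, or when more
    than idle_timeout seconds have passed since their previous request.
    """
    last_seen = {}
    sessions = 0
    for visitor, when in sorted(requests, key=lambda r: r[1]):
        previous = last_seen.get(visitor)
        if previous is None or when - previous > idle_timeout:
            sessions += 1
        last_seen[visitor] = when
    return sessions

# Hypothetical requests: visitor "a" returns after a long idle gap.
requests = [("a", 0), ("a", 600), ("b", 700), ("a", 2401)]
print(count_sessions(requests))  # prints 3
```

Here visitor "a" counts as two sessions because the last request arrives more than 30 minutes after the previous one.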
A note about the visible counters with numbers you may see on other people's pages: just because a counter shows a large number does not necessarily mean that page was actually accessed that much! Counter "start numbers" can often be manually set to any number by the web site administrator.
If you have a larger site and want to analyze the actual logs, for a big advertising campaign for example, then the stat analysis method may be preferred. Many hosting providers offer preinstalled stat analysis packages at little or no cost with their hosting plans; this can be something to factor in when you are comparing hosting providers. On a dedicated server that my company runs, for example, we installed Urchin, which generates very attractive graphical reports that are accessible via a web page. Other leading server-hosted stat analysis programs include WebTrends and WUsage.
The last category is log analysis software that runs on your own PC. Basically, you download the raw logs from your server, and run them through the analyzer, which then generates various kinds of reports and graphs for you. Two programs in this category are the free Analog, which is available for most major operating systems, and FastStatsAnalyzer.
If you have access to the raw logs on your server, these can be very useful tools. The reports are far more meaningful than those you can get with counters, and you can set up all kinds of filters to pick apart your statistics. The results can then be output to various formats, such as comma-delimited format for importing into databases, or HTML. If you are a web developer or web master who is in charge of maintenance of multiple sites, it may be possible to offer periodic statistics as an added service to your clients.
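As a rough illustration of the comma-delimited output mentioned above, the following Python sketch writes a small page-view table as CSV text. The page names and counts are invented, and the column headings are only one possible choice.

```python
import csv
import io

def to_csv(page_counts):
    """Return page view counts as comma-delimited text, busiest page first."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["page", "views"])
    for page, views in sorted(page_counts.items(), key=lambda item: -item[1]):
        writer.writerow([page, views])
    return buffer.getvalue()

# Hypothetical counts for two pages.
report = to_csv({"/about": 45, "/": 120})
print(report)
```

Text in this form can be imported into most spreadsheet and database programs.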
When someone visits your site, they leave a trail of valuable information. Here are some statistics to look for, roughly in order of usefulness:
Referrers. This tells you the page the visitor was on before they got to your site. This can tell you many things, such as who is linking to you, and whether your advertising and promotional efforts are paying off. This may be the most important statistic to have when you are trying to increase traffic to your site. It will also tell you whether your search engine listings are effective. (Note that some log analyzers will also give you separate stats for referrals from major search engines.)
Another use of referrer information is to find out if anyone is linking to files on your site (such as any graphics you are offering) without permission, or even "stealing" whole pages and putting them within frames on their own site. It's a good idea to always keep a sharp eye on such bandwidth and content thieves, and stats are the only means of doing this.
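A simple way to see who is sending you traffic is to tally referrers by host name and ignore links from your own pages. The Python sketch below assumes you have already pulled the referrer strings out of your logs; the host names, including MY_HOST, are placeholders.

```python
from collections import Counter
from urllib.parse import urlparse

MY_HOST = "www.example.com"  # hypothetical: replace with your own host name

def external_referrers(referrers, my_host=MY_HOST):
    """Tally referring hosts, skipping empty referrers and your own site."""
    counts = Counter()
    for ref in referrers:
        host = urlparse(ref).netloc.lower()
        if host and host != my_host:
            counts[host] += 1
    return counts

# Made-up referrer strings; "-" stands for a request with no referrer.
referrers = [
    "http://search.example.net/results?q=web+stats",
    "http://www.example.com/index.html",
    "-",
    "http://links.example.org/page.html",
    "http://search.example.net/results?q=counters",
]

for host, count in external_referrers(referrers).most_common():
    print(host, count)
```

An unfamiliar host that shows up often is worth a visit, whether it turns out to be a friendly link or someone framing your pages.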
Errors. When someone requests a file and it's not found, you will see an error entry in your logs. That can alert you to missing files and broken links.
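For instance, pairing each "not found" error with the page that linked to the missing file shows where the broken link lives. This Python sketch assumes the log entries have already been reduced to (path, status, referrer) values; the sample entries are invented.

```python
from collections import defaultdict

def broken_links(entries):
    """Map each missing path (status 404) to the set of pages that linked to it."""
    missing = defaultdict(set)
    for path, status, referrer in entries:
        if status == 404:
            missing[path].add(referrer)
    return dict(missing)

# Hypothetical entries: (requested path, status code, referring page).
entries = [
    ("/index.html", 200, "-"),
    ("/old-page.html", 404, "http://www.example.com/links.html"),
    ("/old-page.html", 404, "http://links.example.org/page.html"),
    ("/images/logo.gif", 200, "http://www.example.com/index.html"),
]

for path, sources in broken_links(entries).items():
    print(path, "is linked from", sorted(sources))
```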
Most visited pages, most requested files, and entry/exit pages. What are the most popular sections of your site? Which page(s) are people getting to first? Which page is the major exit point? Also, is anyone accessing a particular file again and again, such as a GIF? If they are, they might be linking directly to it, stealing your bandwidth.
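These three reports can be sketched with a few counters in Python. As a simplifying assumption, each visitor is treated as making a single visit, and the hits are listed in the order they happened; the visitor labels and paths are made up.

```python
from collections import Counter

def page_report(hits):
    """Return (views, entry_pages, exit_pages) from ordered (visitor, path) pairs."""
    views = Counter(path for _, path in hits)
    first, last = {}, {}
    for visitor, path in hits:
        first.setdefault(visitor, path)  # keep only the first page seen
        last[visitor] = path             # overwritten until the final page
    return views, Counter(first.values()), Counter(last.values())

# Hypothetical hits, in time order.
hits = [
    ("a", "/"), ("a", "/about"),
    ("b", "/gallery"), ("b", "/"),
    ("c", "/"), ("c", "/about"),
]

views, entries, exits = page_report(hits)
print("Most visited:", views.most_common(1))
print("Top entry page:", entries.most_common(1))
print("Top exit page:", exits.most_common(1))
```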
Unique Visitors and repeat visitors. This method keeps track of the IP address of visitors, and is time-based. If someone with the same IP address visits your site within a specified amount of time, they are counted as only one visitor, and accesses after the first one are counted as repeat visits. This is a fairly good way of keeping track of people, but is not necessarily accurate, since most people do not have static or fixed IP addresses. (If you have a dialup connection, a new IP address is assigned to you by your ISP every time you go online. If you have a broadband connection such as cable or DSL and have not specifically asked for a static IP address, then your IP address changes whenever you go offline or when your ISP just changes them, which it does periodically.) If you really want to track individual visitors as accurately as possible, you would have to set cookies.
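The time-based counting described above might look something like this in Python. The 24-hour window is an assumed value (analyzers differ), and the IP addresses and timestamps are invented.

```python
def unique_and_repeat(hits, window=24 * 3600):
    """Return (unique_visitors, repeat_accesses) from (ip, time_in_seconds) pairs.

    The first access from an IP address opens a window and counts as one
    unique visitor. Further accesses from that IP inside the window count
    as repeats; an access after the window has passed opens a new one.
    """
    window_start = {}
    unique = repeat = 0
    for ip, when in sorted(hits, key=lambda h: h[1]):
        start = window_start.get(ip)
        if start is None or when - start >= window:
            unique += 1
            window_start[ip] = when
        else:
            repeat += 1
    return unique, repeat

# Hypothetical accesses: the first address comes back an hour later,
# and again more than a day later.
hits = [
    ("192.0.2.1", 0),
    ("192.0.2.1", 3600),
    ("192.0.2.2", 4000),
    ("192.0.2.1", 90000),
]
print(unique_and_repeat(hits))  # prints (3, 1)
```

Note that the sketch inherits the weakness discussed above: two different people who are given the same IP address look like one visitor, and one person whose address changes looks like two.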
However, hit counts are very important for one reason: advertising. Currently, web advertising rates for banner ads and the like are typically based on CPM, or cost per 1,000 impressions (page views).
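CPM is conventionally quoted as a price per 1,000 impressions, so the arithmetic is simple. In this Python sketch the traffic figure and the rate are made-up numbers, not typical market values.

```python
def ad_revenue(page_views, cpm_rate):
    """Revenue for a number of page views at a given price per 1,000 views."""
    return page_views / 1000 * cpm_rate

# Hypothetical example: 50,000 page views sold at a rate of 5.00 per thousand.
print(ad_revenue(50000, 5.00))  # prints 250.0
```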
Unusual activity in general. Be on the lookout for an unusual volume or kind of activity, and especially for anything that looks like suspicious activity by spiders or robots out to harvest email addresses and such. If you suspect anything but don't know how to handle it yourself, talk to the tech support for your server.
Maybe the most important thing to look for is trends. For example, are hit counts to your site increasing, decreasing or staying even? Did your recent banner ad campaign yield good results? How about the newsletter you just started - is it increasing visitors to your site?
Ultimately, you want to use your stats to get the most out of your web site. For more information about how to utilize the information you glean from your web site statistics, see this Wise-Women article by Gisele Glosser about ways to promote your site.
See if your host provider offers any of these with their web site hosting plans, and factor that into the price/performance evaluation when selecting a host.
LiveStats Statistics Server
WebTrends (WebTrends also offers WebTrends Live, a for-pay remote-hosted counter service that can track multiple pages.)
To use these, you need access to your "raw logs" that are generated by the server software. If in doubt, ask your host provider.
FastStatsAnalyzer (shareware, $99; for Windows only)
Just a few of the many available. Many of these companies offer free and pay options, with the pay options offering more statistics. Also see note about WebTrends Live above.
Hitometer from Website Garage
SuperStats and Counter from MyComputer.com
Stats for All
The CGI Resource Index lists many counter scripts, written in Perl and other languages. The PHP Resource Index is a sister site for PHP scripts.
HotWired's Web Monkey has several articles that explain some of the terminology here such as hits, bandwidth, IP addresses, and much more.