Web Site Statistics: How, Why and What to Count

By Makiko Itoh

There are many reasons to count the number of visitors to your web site. Perhaps you want to sell banner ads to generate some revenue. Or maybe you need to know if your search engine submissions have taken effect. Or maybe you're simply curious to see who's looking at your work! Here is an overview of the different ways of counting visitors to your site, and what trails they leave behind.

Tracking Visitors

There are three ways of counting and tracking web site visitors. The first method involves analyzing the access logs that are automatically generated by the server software, parsing the results and displaying them in charts or graphs. These statistical analysis programs can either reside on the server itself, or on your own computer.
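
To give a feel for what a log analyzer does under the hood, here is a minimal Python sketch that splits one access log line into its fields. It assumes your server writes the Apache-style "combined" log format; the sample line, the `parse_line` name and the field names are made up for illustration, and your server's format may differ.

```python
import re
from typing import Optional

# Combined Log Format (assumed):
# host ident user [time] "request" status bytes "referrer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_line(line: str) -> Optional[dict]:
    """Return the fields of one access-log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # A size of "-" means that no response body was sent.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


# Made-up sample line for illustration.
SAMPLE = (
    '192.0.2.10 - - [10/Oct/2000:13:55:36 -0700] '
    '"GET /index.html HTTP/1.0" 200 2326 '
    '"http://www.example.com/links.html" "Mozilla/4.08 [en] (Win98; I)"'
)

entry = parse_line(SAMPLE)
print(entry["host"], entry["path"], entry["status"], entry["referrer"])
```

A real analyzer runs something like this over every line in the log and then tallies the fields into reports.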

The second method is commonly called a counter. Counters work by counting the number of times a specific file is requested. Most remote-hosted counter scripts work this way; the "count" data is then stored in a database on the remote server. The file requested can be a graphic or an include. (An include is a file that is parsed on the server side and then inserted into the HTML document before it's delivered to the client. In plain English, it's a bit of data that's inserted on the fly at a specified place in the web page.)

The third method involves setting a cookie in the user's browser, which allows you to track individual visitors. Cookie setting requires some customized scripting and is beyond the scope of this article, so let's talk about the log analysis and counter methods.

Remote Hosted Graphical Counters

These may be the most familiar type of page access counter. They are the small graphics that appear on many pages, provided by services such as Sitemeter and TheCounter.com (see the Resources section for a list).

The main advantage of remote hosted counters is that they are very easy to set up and can be put on any web page, even one on a free hosting site such as Geocities. Usually, all that is involved is setting up an account and then copying and pasting the appropriate code onto the page you want to track. They are well suited to small and personal web sites and home pages; they are great if you are a weblogger, for example, since most weblogs start from just one page. However, counters have some significant drawbacks that are worth considering.

Some counter methods completely omit some users. For example, if the counter relies on JavaScript, it will not count any users who are surfing with browsers that do not support JavaScript, have it disabled, or have problems with it. Such counters might undercount WebTV users, for example, or completely miss text-only browsers such as Lynx. You must also install the counter script and file (usually a graphic) on every page that you want to be tallied.

Another drawback of graphical counters is that some people consider them to be a bit tacky. This is purely a matter of personal taste, and you can always choose a counter based on whether its graphic suits your taste. Be aware, though, that nothing is totally "free": that little graphic you are putting on your page is an advertising vehicle for the counter provider. (Some services, such as MyComputer.com, say outright that they require you to put their advertising graphic on your pages.) The providers are usually advertising the for-pay "enterprise" options of their counter services, or some other web-related business.

It's also possible to "hide" the graphics required by the counters. If a particular counter script requires you to place a small graphic on your page, for example, you can make it practically invisible by specifying the IMG size as height="1" width="1", or by placing it in a hidden layer. Be careful, though, since this may violate your agreement with the counter provider. Several counter service providers will allow you to hide the image in this way if you pay them a small fee.

Counter scripts

Another option to consider is a counter script that is hosted on your own site. There are many CGI scripts that accomplish this, and you have more control over what is actually shown on your page. If you have the ability to use PHP or ASP for instance, you can also use counter scripts in those languages. The more useful counter scripts can track sessions, or a user's whole visit from the time she enters the site to the time she goes away (or is idle for a certain number of minutes or more).

A note about the visible counters with numbers you may see on other people's pages: just because a counter shows a large number does not necessarily mean that page was actually accessed that much! Counter "start numbers" can often be manually set to any number by the web site administrator.

Server-installed log analysis tools

If you have a larger site and want to analyze the actual logs, for a big advertising campaign for example, then the stat analysis method may be preferable. Many hosting providers offer preinstalled stat analysis packages at little or no cost with their hosting plans; this can be something to factor in when you are comparing hosting providers. On a dedicated server that my company runs, for example, we installed Urchin, which generates very attractive graphical reports that are accessible via a web page. Other leading server-hosted stat analysis programs include WebTrends and WUsage.

Log analysis tools for your PC

The last category is log analysis software that runs on your own PC. Basically, you download the raw logs from your server, and run them through the analyzer, which then generates various kinds of reports and graphs for you. Two programs in this category are the free Analog, which is available for most major operating systems, and FastStatsAnalyzer.

If you have access to the raw logs on your server, these can be very useful tools. The reports are far more meaningful than those you can get with counters, and you can set up all kinds of filters to pick apart your statistics. The results can then be output to various formats, such as comma-delimited format for importing into databases, or HTML. If you are a web developer or web master who is in charge of maintenance of multiple sites, it may be possible to offer periodic statistics as an added service to your clients.

What statistics to look for

When someone visits your site, they leave a trail of valuable information. Here are some statistics to look for, roughly in order of usefulness:

Referrers. This tells you the page the visitor was on before they got to your site. It can tell you many things, such as who is linking to you and whether your advertising and promotional efforts are paying off. This may be the most important statistic to have when you are trying to increase traffic to your site. It will also tell you whether your search engine listings are effective. (Note that some log analyzers will also give you separate stats for referrals from major search engines.)

Another use of referrer information is to find out if anyone is linking to files on your site (such as any graphics you are offering) without permission, or even "stealing" whole pages and putting them within frames on their own site. It's a good idea to always keep a sharp eye out for such bandwidth and content thieves, and referrer stats are the most direct means of doing this.
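
Both uses of referrer data can be sketched in a few lines of Python. This is only an illustration, assuming you already have (requested path, referrer) pairs pulled out of your logs; `MY_HOST`, the function names and the sample records are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse

MY_HOST = "www.example.com"  # hypothetical: your own site's host name
IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png")


def external_host(referrer: str) -> str:
    """Return the referring host, or "" for no referrer or your own site."""
    host = urlparse(referrer).netloc.lower()
    return "" if host == MY_HOST else host


def referring_hosts(records):
    """Count requests per external referring host."""
    counts = Counter()
    for _path, referrer in records:
        host = external_host(referrer)
        if host:
            counts[host] += 1
    return counts


def hotlinked_images(records):
    """Count image requests that came from pages on other sites."""
    counts = Counter()
    for path, referrer in records:
        host = external_host(referrer)
        if host and path.lower().endswith(IMAGE_EXTENSIONS):
            counts[(path, host)] += 1
    return counts


# Made-up (path, referrer) pairs; "-" means no referrer was sent.
records = [
    ("/index.html", "http://search.example.net/?q=web+stats"),
    ("/index.html", "http://search.example.net/?q=counters"),
    ("/images/logo.gif", "http://www.example.com/index.html"),
    ("/images/logo.gif", "http://other.example.org/page.html"),
    ("/about.html", "-"),
]

print(referring_hosts(records).most_common())
print(hotlinked_images(records))
```

An image request whose referrer is a page on someone else's site is only a hint of hotlinking, not proof, so it is worth looking at the referring page before acting on it.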

Errors. When someone requests a file and it's not found, you will see an error entry in your logs. That can alert you to missing files and broken links.
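
As a rough sketch of how error entries lead you to broken links, the Python below groups "not found" requests (status 404) by the page that referred to them. It assumes records of (path, status, referrer) already extracted from the logs; the names and sample data are hypothetical.

```python
from collections import defaultdict


def broken_links(records):
    """Map each missing path (status 404) to the pages that linked to it."""
    missing = defaultdict(set)
    for path, status, referrer in records:
        if status == 404:
            missing[path].add(referrer)
    return dict(missing)


# Made-up (path, status, referrer) records.
records = [
    ("/index.html", 200, "-"),
    ("/old-page.html", 404, "http://www.example.com/links.html"),
    ("/old-page.html", 404, "http://other.example.org/favorites.html"),
    ("/images/missing.gif", 404, "http://www.example.com/index.html"),
]

for path, pages in broken_links(records).items():
    print(path, "is linked from", sorted(pages))
```

If the referring page is on your own site, you can fix the link yourself; if it is on someone else's, a redirect or a note to the site owner may be the better fix.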

Browser/platform usage, or USER AGENT information. This tells you what kind of browser and computer your visitors are using. This is very important information to have when you are constructing a site. For example, let's say your statistics show that you have a substantial number of visitors who are using older browsers. Then, you might have to think twice about making your site Flash-only or heavily dependent on JavaScript, for example. However, be wary of relying on this information totally, since some browsers such as Opera and iCab have the ability to "masquerade" as other browsers.
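
Here is a deliberately crude Python sketch of how user agent strings can be bucketed into browser families. The marker list and the sample strings are illustrative assumptions, not a complete or authoritative classification. Opera is checked first because, as noted above, it can identify itself as another browser while still including its own name; a browser that masquerades completely cannot be detected this way at all.

```python
from collections import Counter

# Checked in order. Opera comes first because it can identify itself as MSIE
# while still including "Opera" in the string. (Illustrative list only.)
BROWSER_MARKERS = [
    ("Opera", "Opera"),
    ("MSIE", "Internet Explorer"),
    ("Lynx", "Lynx"),
    ("Mozilla", "Netscape/Mozilla"),
]


def browser_family(user_agent: str) -> str:
    """Return a rough browser family for one user agent string."""
    for marker, family in BROWSER_MARKERS:
        if marker in user_agent:
            return family
    return "Other"


# Made-up sample of user agent strings taken from a log.
agents = [
    "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)",
    "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98) Opera 5.12 [en]",
    "Lynx/2.8.4rel.1 libwww-FM/2.14",
    "Mozilla/4.08 [en] (Win98; I)",
    "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)",
]

print(Counter(browser_family(agent) for agent in agents).most_common())
```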

Most visited pages, most requested files, and entry/exit pages. What are the most popular sections of your site? Which page(s) are people getting to first? Which page is the major exit point? Also, is anyone accessing a particular file again and again, such as a GIF? If they are, they might be linking directly to it and stealing your bandwidth.
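
These three tallies can be sketched in Python as follows, assuming the log entries have already been grouped into visits, each one a list of pages in the order they were requested. The `page_stats` name and the sample visits are hypothetical.

```python
from collections import Counter


def page_stats(visits):
    """Tally page views, entry pages and exit pages over a list of visits."""
    views, entries, exits = Counter(), Counter(), Counter()
    for pages in visits:
        if not pages:
            continue
        views.update(pages)        # every page requested during the visit
        entries[pages[0]] += 1     # first page of the visit
        exits[pages[-1]] += 1      # last page of the visit
    return views, entries, exits


# Made-up visits, each listing the pages one visitor saw in order.
visits = [
    ["/index.html", "/articles.html", "/contact.html"],
    ["/articles.html", "/index.html"],
    ["/index.html"],
]

views, entries, exits = page_stats(visits)
print("most visited:", views.most_common(1))
print("top entry page:", entries.most_common(1))
print("top exit page:", exits.most_common(1))
```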

Unique Visitors and repeat visitors. This statistic is based on the IP address of visitors, and is time-based. If someone with the same IP address visits your site within a specified amount of time, they are counted as only one visitor, and accesses after the first one are counted as repeat visits. This is a fairly good way of keeping track of people, but is not necessarily accurate, since most people do not have static (fixed) IP addresses. (If you have a dialup connection, a new IP address is assigned to you by your ISP every time you go online. If you have a broadband connection such as cable or DSL and have not specifically asked for a static IP address, then your IP address may change whenever you go offline or when your ISP reassigns addresses, which it does periodically.) If you really want to track individual visitors as accurately as possible, you would have to set cookies.
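
The time-window idea can be sketched in Python like this. It is one possible reading of the rule described above, not the method of any particular stats package: the 30-minute window, the `count_visits` name and the sample data are assumptions.

```python
def count_visits(hits, window_seconds=1800):
    """Count unique visitors and repeat visits.

    hits is an iterable of (ip_address, unix_timestamp) pairs. An access from
    an IP address opens a window; further accesses from that address inside
    the window are repeat visits, and the first access after the window has
    run out is counted as a new visitor.
    """
    window_start = {}  # ip -> timestamp that opened the current window
    unique = 0
    repeats = 0
    for ip, timestamp in sorted(hits, key=lambda hit: hit[1]):
        start = window_start.get(ip)
        if start is None or timestamp - start >= window_seconds:
            window_start[ip] = timestamp
            unique += 1
        else:
            repeats += 1
    return unique, repeats


# Made-up accesses: (IP address, seconds since the start of the log).
hits = [
    ("192.0.2.1", 0),
    ("192.0.2.1", 600),    # same address, inside the window: repeat visit
    ("192.0.2.2", 700),    # different address: new visitor
    ("192.0.2.1", 4000),   # window has expired: counted as a new visitor
]

print(count_visits(hits))
```

The last access in the sample shows the weakness: the same person coming back later, or coming back with a newly assigned IP address, is counted as somebody new.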

Hits. This is the most often mentioned statistic. It can sound impressive if someone says that they get 1,000 hits in one day, for example, but this can be a bit misleading. A hit is the access of a single file; therefore, if you have an HTML page that has 5 graphics on it, an external JavaScript file and an external CSS file, then one view of that page counts as 8 hits: the HTML file itself plus the 7 files it pulls in. (Files that are pulled into a page by server-side includes are normally assembled on the server before the page is sent, so they generally do not add hits of their own; only files the browser requests separately do.) To confuse things even further, some counter programs claim to count "page views", or accesses to a single page. They can claim this because their counting method depends on access of one file per page, as described above.
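
To make the difference between hits and page views concrete, here is a small Python sketch that counts both from a list of requested paths. Treating only `.html`, `.htm` and extensionless paths as pages is an assumption for illustration; real analyzers let you configure this.

```python
import posixpath

# Assumption: these extensions (or no extension at all) mean "a page".
PAGE_EXTENSIONS = {"", ".html", ".htm"}


def hits_and_page_views(paths):
    """Return (hits, page_views) for a list of requested paths."""
    hits = 0
    page_views = 0
    for path in paths:
        hits += 1                        # every requested file is a hit
        path = path.split("?", 1)[0]     # ignore any query string
        extension = posixpath.splitext(path)[1].lower()
        if extension in PAGE_EXTENSIONS:
            page_views += 1
    return hits, page_views


# One view of a page with 5 graphics, one script file and one style sheet.
paths = (
    ["/index.html"]
    + [f"/images/pic{number}.gif" for number in range(1, 6)]
    + ["/site.js", "/site.css"]
)

print(hits_and_page_views(paths))
```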

However, hit counts are very important for one reason: advertising. Currently, web advertising rates for banner ads and such are typically based on CPM, or cost per 1,000 impressions (that is, per 1,000 page views on which the ad appears).
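
CPM pricing (cost per thousand impressions) is simple arithmetic, sketched here in Python. The traffic figure and the rate are invented purely for illustration.

```python
def cpm_revenue(impressions: int, cpm_rate: float) -> float:
    """Return the ad revenue for a number of impressions at a given CPM rate."""
    return impressions / 1000 * cpm_rate


# Hypothetical numbers: 50,000 ad impressions in a month at a $5 CPM.
monthly_impressions = 50_000
rate_in_dollars = 5.0

print(cpm_revenue(monthly_impressions, rate_in_dollars))
```

This is why an inflated hit count does not help you with advertisers: what they pay for is the number of times the ad is actually shown.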

Unusual activity in general. Be on the lookout for any unusual volume or kind of activity, and especially for anything that looks like suspicious activity by spiders or robots out to harvest email addresses and such. If you suspect anything but don't know how to handle it yourself, talk to the tech support for your server.

Putting your stats to work

Maybe the most important thing to look for is trends. For example, are hit counts to your site increasing, decreasing or staying even? Did your recent banner ad campaign yield good results? How about the newsletter you just started - is it increasing visitors to your site?
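
One simple way to put a number on a trend is to compare the most recent week with the week before it. The Python sketch below does that for a list of daily counts; the weekly comparison, the `weekly_change` name and the sample figures are assumptions, and most stats packages offer their own trend reports.

```python
def weekly_change(daily_counts):
    """Percent change of the latest 7 days against the 7 days before them.

    daily_counts is a list of daily totals, oldest first. Returns None when
    the earlier week had no traffic, since no percentage can be computed.
    """
    if len(daily_counts) < 14:
        raise ValueError("need at least 14 days of counts")
    previous = sum(daily_counts[-14:-7])
    latest = sum(daily_counts[-7:])
    if previous == 0:
        return None
    return (latest - previous) * 100 / previous


# Made-up daily page views: a week at 100 a day, then a week at 120 a day.
daily_page_views = [100] * 7 + [120] * 7

print(weekly_change(daily_page_views))
```

If you know the date a banner campaign or newsletter started, comparing the weeks on either side of it gives a rough idea of whether it made a difference.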

Ultimately, you want to use your stats to get the most out of your web site. For more information about how to utilize the information you glean from your web site statistics, see this Wise-Women article by Gisele Glosser about ways to promote your site.

Resources

Server-installed log analysis scripts or programs

See if your host provider offers any of these with their web site hosting plans, and factor that into the price/performance evaluation when selecting a host.

LiveStats Statistics Server
WebTrends (WebTrends also offers WebTrends Live, a for-pay remote-hosted counter service that can track multiple pages.)
Urchin
WUsage

Log analysis programs

To use these, you need access to your "raw logs" that are generated by the server software. If in doubt, ask your host provider.

Analog (freeware)
FastStatsAnalyzer (shareware, $99; for Windows only)

Remote-hosted counters and statistics services

Just a few of the many available. Many of these companies offer free and pay options, with the pay options offering more statistics. Also see note about WebTrends Live above.

Extreme Counter
Hitbox
Hitmatic
Hitometer from Website Garage
Sitemeter
SuperStats and Counter from MyComputer.com
Stats for All
TheCounter.com

Other resources

The CGI Resource Index lists many counter scripts, written in Perl and other languages. The PHP Resource Index is a sister site for PHP scripts.

HotWired's Webmonkey has several articles that explain some of the terminology used here, such as hits, bandwidth, IP addresses, and much more.

Makiko (Maki) is principal of PRODOK Engineering, a company near Zürich, Switzerland, specializing in low-paper workflow solution consulting for clients worldwide. She is a graphic designer with more than 5 years' experience in print design prior to switching almost exclusively to the web 4 years ago; currently she also provides CSS/JavaScript consulting for PRODOK clients. She is also an Adobe Certified Expert in Photoshop, having used it and Illustrator since about 1990. She has written for PlanetPDF.com, Digital-Web.com, and WebReview.com.