ONLINE EVENT -- Day 2 -- Types of Tools and Data Collection Methods

Latest post 10-23-2007 12:35 PM by Jesse. 14 replies.

10-17-2007 8:51 AM

When we talk about different "types" of Web stats tools, we're referring to the way the tools collect their data.

Today, I'm going to focus on two main types of tools: Log Files and Page Tagging solutions.

Log File Tools

These tools use your Web server’s log files to analyze visitor behavior. Every time someone requests a page on your site, your Web server writes an entry to its log file recording the visitor's IP address, the site that referred them, the time of the request, the browser used, and other information. To use one of these tools, you first need access to your server’s log files (you may need to contact your ISP for help with this). The log files are then imported into the tool, which processes them with its own algorithms and presents your traffic data.
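
To make this concrete, here is a minimal Python sketch of how a tool might read one log entry. It assumes the server writes the common Apache "combined" log format, which may not match your server's configuration, and the sample line and the name parse_log_line are made up for illustration.

```python
import re
from datetime import datetime

# Pattern for one entry in the Apache "combined" log format (an assumption;
# your server may be configured to write a different format).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_log_line(line):
    """Return a dict of fields for one log entry, or None if it doesn't match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    return entry


# Made-up sample entry, for illustration only.
SAMPLE_LINE = (
    '192.0.2.10 - - [17/Oct/2007:08:51:00 -0700] '
    '"GET /donate.html HTTP/1.1" 200 5120 '
    '"http://www.example.org/links.html" "Mozilla/5.0 (Windows; U) Firefox/2.0"'
)

entry = parse_log_line(SAMPLE_LINE)
print(entry["ip"], entry["path"], entry["status"], entry["referrer"])
```

Each field named above (who made the request, the referrer, the time, the browser) ends up as one key in the resulting dictionary.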

Key Advantages of Log File Tools:

• Capture Robot/Spider Activity: Because these tools can separate robot activity from human activity (unlike page tagging solutions), robot hits can be filtered out of your traffic reports. Robot activity can also be used to monitor SEO efforts and how often search engines index your site.
• Server Error Code Reporting: Most log files record error code information, which allows you to identify site functionality and design issues that would be hard to detect through other methods.

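
As a rough sketch of both advantages, the Python below splits hits into human and robot traffic and tallies error codes. The list of robot hints is illustrative and far from complete, the entries are made up, and the names is_robot and summarize are hypothetical; real tools use much larger robot lists.

```python
from collections import Counter

# Substrings that commonly appear in crawler user-agent strings.
# This short list is illustrative, not exhaustive.
ROBOT_HINTS = ("bot", "crawler", "spider", "slurp")


def is_robot(user_agent):
    """Guess whether a user-agent string belongs to a robot."""
    agent = user_agent.lower()
    return any(hint in agent for hint in ROBOT_HINTS)


def summarize(entries):
    """Count human and robot hits, and tally error responses by page."""
    human_hits = 0
    robot_hits = 0
    errors = Counter()
    for entry in entries:
        if is_robot(entry["agent"]):
            robot_hits += 1
        else:
            human_hits += 1
        # Status codes of 400 and above indicate client or server errors.
        if entry["status"] >= 400:
            errors[(entry["status"], entry["path"])] += 1
    return human_hits, robot_hits, errors


# Made-up, already-parsed log entries for illustration only.
SAMPLE_ENTRIES = [
    {"agent": "Mozilla/5.0 (Windows; U) Firefox/2.0", "status": 200, "path": "/"},
    {"agent": "Googlebot/2.1 (+http://www.google.com/bot.html)", "status": 200, "path": "/"},
    {"agent": "Mozilla/4.0 (compatible; MSIE 7.0)", "status": 404, "path": "/old-page.html"},
    {"agent": "Mozilla/5.0 (compatible; Yahoo! Slurp)", "status": 404, "path": "/old-page.html"},
]

human_hits, robot_hits, errors = summarize(SAMPLE_ENTRIES)
print("human hits:", human_hits)
print("robot hits:", robot_hits)
for (status, path), count in errors.most_common():
    print(status, path, count)
```
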
Key Disadvantages of Log File Tools:

• Browser Caching: Pages served from the cache on the user's hard drive never reach your Web server, so log file tools cannot see them. For example, when a user clicks the browser's "Back" or "Forward" button, the page is usually loaded from the cache, so these actions are not recorded in the log files, and this leads to inaccuracies in data collection.
• Identifying Unique Visitors: Because server logs will often show different users as having the same IP address (for example, people behind a shared office network or proxy), log files do not reliably measure unique visitors.
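
The unique-visitor problem is easy to see with a small Python sketch. The requests below are made up: three people make four requests, but two of them share one office IP address. Combining the IP address with the browser string is a common heuristic, not a fix, and the function names are hypothetical.

```python
# Each tuple is (ip_address, user_agent) for one request. Made-up data:
# three different people, two of whom share an office IP address.
SAMPLE_REQUESTS = [
    ("203.0.113.5", "Firefox/2.0"),    # office worker A
    ("203.0.113.5", "MSIE 7.0"),       # office worker B, same office network
    ("203.0.113.5", "Firefox/2.0"),    # office worker A again
    ("198.51.100.7", "Safari/419.3"),  # home user
]


def unique_by_ip(requests):
    """Count distinct IP addresses."""
    return len({ip for ip, _agent in requests})


def unique_by_ip_and_agent(requests):
    """Count distinct (IP address, browser) pairs."""
    return len(set(requests))


print("unique visitors by IP:", unique_by_ip(SAMPLE_REQUESTS))
print("unique visitors by IP + browser:", unique_by_ip_and_agent(SAMPLE_REQUESTS))
```

Counting by IP alone reports two visitors where there were three. Adding the browser string happens to recover the right number here, but it would still merge two office workers who use the same browser.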

Page Tagging Tools

Page tagging tools generally provide more accurate data than log files. As opposed to log files that depend on server information, page tags use a client-side data collection methodology. Page tagging tools require you to insert JavaScript tags in each page you wish to track.

Key Advantages of Page Tagging Tools:
• Accuracy: While not perfect, these tools provide greater accuracy than server log files since they collect data directly at the end-user level. For example, data entered in form fields can be more easily collected through page tags.
• Speed of data reporting: Virtually instantaneous reporting, as the data is parsed more easily and reported in real time.

Key Disadvantages of Page Tagging tools:
• Cookie rejection: Cookies are used to store information about visitor behaviors and to identify repeat visitors. If visitors disable cookies, the data collected is less accurate: "repeat visitor" rates will be undercounted while "new visitor" rates will be overcounted.
• Page tagging issues: Inserting page tags in each page of your site can be time consuming if you have a large site, and the tool will NOT track any visitor data in pages that are missing tags. However, page tagging can be done much more quickly by placing the tag in a shared include file or in your footer template, assuming your site uses either of these options.
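
Here is a small Python sketch of the cookie rejection effect. It assumes a simplified model in which each visit carries either a cookie ID or nothing at all (None); the data and the name count_new_and_repeat are made up for illustration.

```python
def count_new_and_repeat(visits):
    """Classify each visit as new or repeat based on the cookie ID it carries.

    A visit with no cookie (None) cannot be linked to earlier visits,
    so it is always counted as new.
    """
    seen_ids = set()
    new_visits = 0
    repeat_visits = 0
    for cookie_id in visits:
        if cookie_id is not None and cookie_id in seen_ids:
            repeat_visits += 1
        else:
            new_visits += 1
            if cookie_id is not None:
                seen_ids.add(cookie_id)
    return new_visits, repeat_visits


# The same three people each visit twice and all accept cookies.
WITH_COOKIES = ["a", "b", "c", "a", "b", "c"]
# Same visits, but person "c" rejects cookies, so no ID is ever stored.
ONE_REJECTS = ["a", "b", None, "a", "b", None]

print("all accept cookies:", count_new_and_repeat(WITH_COOKIES))
print("one rejects cookies:", count_new_and_repeat(ONE_REJECTS))
```

With everyone accepting cookies the tool sees three new and three repeat visits. When one person rejects cookies it sees four new and two repeat visits, even though the actual behavior was identical.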

Which type of tool should you use? Which is better?

This is a highly debated question in the Web analytics industry. Clearly, there are advantages and limitations in using either type of tool.

Personally, I like combining a page tagging tool (such as Google Analytics) with a log file tool (such as AWStats). This "hybrid" approach gives me the advantages of both types of data collection methodologies.

But no matter what type of tool you use, be sure you take the time to understand the limitations of your own tool and explore any possible workaround solutions.


Questions for the Community:

-- What tool do you use and what are the limitations (if any) that you've encountered with it in doing Web analytics for your site?

-- Do you have any strategies for getting around these limitations?


Yann

RE: ONLINE EVENT -- Day 2 -- Types of Tools and Data Collection Methods

10-17-2007 11:34 AM

Hi, Yann,

To be honest, the biggest limitation I've run up against is finding information on how to use the data. I've gotten around that limitation by simply working with the parts that I could understand first and extending my knowledge organically.

At first, the information that was offered was overwhelming. I just dug in where I could understand things, and slowly things have become clearer.

That doesn't address the finer points of working with these tools. I am learning a lot myself from these postings and thank you for the information.

Best wishes,

RE: ONLINE EVENT -- Day 2 -- Types of Tools and Data Collection Methods

10-17-2007 1:04 PM

I currently use two solutions to track metrics.

Google Analytics, which is phenomenal, but is limited in the following ways:

1. A Google Account is required, even to be granted access to view the stats. This makes it kind of awkward when dealing with clients (I build websites for Non-Profits) who don't have a Google account.

2. Their "Terms" state that they can start charging money for the service at any time. It sounds kind of like the "crack dealer" strategy to me. They get you hooked for free, then start charging you money.

3. The Map feature is not very impressive. It will only show users per country, and there is no way to embed it in your website.

4. It doesn't track IP addresses :(

The other one I use is StatsInsight.com. It is very inexpensive, and gives me something that GA doesn't: A "visitor's map" (complete with cut/paste code) that I can insert on the client's website. This way they can see points on a map of the world indicating where each visitor comes from.

Others worth mentioning: StatCounter.com - only tracks the last 100 visits with the free version. They have a Google Map to plot visits, and you can go very in-depth with each visitor.

RE: ONLINE EVENT -- Day 2 -- Types of Tools and Data Collection Methods

10-17-2007 1:16 PM

I have a question about Page Tagging and javascript.

I have the NoScript add-on with Firefox. It seems that nearly 100% of pages today have javascript, and whether I permit it or not does not seem to affect my viewing of most of the sites. What does the javascript typically do in such cases? Is it just page tagging? How do I know if I get any benefit from allowing the scripts?

RE: ONLINE EVENT -- Day 2 -- Types of Tools and Data Collection Methods

10-17-2007 1:28 PM

Regarding this observation about Google:

2. Their "Terms" state that they can start charging money for the service at any time. It sounds kind of like the "crack dealer" strategy to me. They get you hooked for free, then start charging you money.

I am quite uncomfortable about Google's terms of service (though I'm already addicted to Gmail, I will not use any other services), and I've ranted about it a couple of times on TechSoup, with a deafening silence in response. So maybe my posting here will kill this thread too:



11. Content licence from you
11.1 You retain copyright and any other rights you already hold in Content which you submit, post or display on or through, the Services. By submitting, posting or displaying the content you give Google a perpetual, irrevocable, worldwide, royalty-free, and non-exclusive licence to reproduce, adapt, modify, translate, publish, publicly perform, publicly display and distribute any Content which you submit, post or display on or through, the Services. This licence is for the sole purpose of enabling Google to display, distribute and promote the Services and may be revoked for certain Services as defined in the Additional Terms of those Services.
11.2 You agree that this licence includes a right for Google to make such Content available to other companies, organizations or individuals with whom Google has relationships for the provision of syndicated services, and to use such Content in connection with the provision of those services.

RE: ONLINE EVENT -- Day 2 -- Types of Tools and Data Collection Methods

10-17-2007 1:37 PM

I have a question about Page Tagging and javascript.

I have the NoScript add-on with Firefox. It seems that nearly 100% of pages today have javascript, and whether I permit it or not does not seem to affect my viewing of most of the sites. What does the javascript typically do in such cases? Is it just page tagging? How do I know if I get any benefit from allowing the scripts?

Hi Jesse,

Unfortunately, I'm not familiar with the NoScript Firefox extension since I've never used it. But I would think that if you have this extension enabled, Firefox should normally block the JavaScript code on any site that uses it. Is this what the extension is supposed to do?

What do you mean when you ask "is it just page tagging?" Are you asking if JavaScript page tags would interact with the NoScript extension?

If you can clarify your questions for me that would be great.

Thanks,

Yann

RE: ONLINE EVENT -- Day 2 -- Types of Tools and Data Collection Methods

10-17-2007 4:21 PM


11. Content licence from you
11.1 You retain copyright and any other rights you already hold in Content which you submit, post or display on or through, the Services. By submitting, posting or displaying the content you give Google a perpetual, irrevocable, worldwide, royalty-free, and non-exclusive licence to reproduce, adapt, modify, translate, publish, publicly perform, publicly display and distribute any Content which you submit, post or display on or through, the Services. This licence is for the sole purpose of enabling Google to display, distribute and promote the Services and may be revoked for certain Services as defined in the Additional Terms of those Services.
11.2 You agree that this licence includes a right for Google to make such Content available to other companies, organizations or individuals with whom Google has relationships for the provision of syndicated services, and to use such Content in connection with the provision of those services.


Wow, Jesse,

Those are some brutal terms. If those are the terms for Gmail, I have basically given Google licenses to most of my business's projects, since just about every aspect of my business communication is handled through Gmail, including transfer of proprietary files.

Hopefully they have an "Additional Terms" addendum!

I better go re-read the terms right now... Uh-oh, I just found out that AL GORE sits on the board at Google... CODE RED!

RE: ONLINE EVENT -- Day 2 -- Types of Tools and Data Collection Methods

10-18-2007 10:11 AM

Yann,

The NoScript extension blocks scripts by default. It has an icon in the (Windows) taskbar that indicates whether scripts on a page are cleared, blocked or partially blocked. When you load a page it recognizes script calls, and if you have not allowed scripts for the site from which they would run, it blocks them. You can categorically allow sites, or categorically block them ("mark as untrusted"), or you can temporarily allow them.

Some sites behave very poorly if scripts are not allowed, and some sites have many scripts, often several from the likes of DoubleClick and Google, that have no effect on the pages. So for the scripts, served either from the visited website or from another site, that have no apparent effect on the functionality of the pages, is it always just page-tagging? Whether for page tagging or not, is there any reason for the visitor to allow such scripts? How much privacy or security is one risking by allowing such activities?

My question is based on my lack of skill with javascript -- I tried to learn it a few years ago, but did not make much progress. If I knew a little more, I might just answer my own question by deciphering the scripts on pages.

RE: ONLINE EVENT -- Day 2 -- Types of Tools and Data Collection Methods

10-18-2007 2:33 PM

Hi Jesse,

Thanks for those clarifications.

So for the scripts, served either from the visited website or from another site, that have no apparent effect on the functionality of the pages, is it always just page-tagging?

No, it's not. There are so many different kinds of scripts that sites employ for so many different purposes, including those that work at the server-end (aka "server-side" scripts) and those that work at the browser level (aka "client-side" scripts). And not all of these scripts are designed to affect the front-end functionality and/or appearance of a site. So it's very likely that even when using a script-blocker such as "NoScript", you won't notice any apparent problems in the functionality of the site. Everything about the site could look perfectly normal.

Whether for page tagging or not, is there any reason for the visitor to allow such scripts?

As I'm sure you know, many of these scripts are meant to make the site more interactive (like having pop-up windows, rollover effects, etc.) and therefore enhance the visitor's experience with the site. The problem is that many people don't want to be bothered by pop-ups and other script-related annoyances, so it's understandable that they would disable such scripts to eliminate these disturbances.

Personally, I don't mind allowing scripts. They don't really bother me all that much on most sites that I visit.

But coming back to page tagging...

If a person who prevents scripts from running goes to a site that uses page tagging to collect visitor data, something like the NoScript script-blocker will stop the JavaScript page tag from running, and the tag is what sends information about the visit back to the tool. So such a visitor won't be counted in the measurements, and this affects the accuracy of the data collection to some degree.

When you visit a site that relies on page tags and you've disabled scripts from running, I guess you're just helping to skew their analytics data a little bit more.

How much privacy or security is one risking by allowing such activities?

I'm not sure exactly how much of a risk is involved, but the security risks today are less of an issue than they were in the past. Maybe someone with more knowledge about this can fill us in?

Hope this helps!

Yann

RE: ONLINE EVENT -- Day 2 -- Types of Tools and Data Collection Methods

10-19-2007 2:08 AM

Paranoia concerning the Google license is much like the "common sense" purveyed by Rush Limbaugh. A full understanding of the actual contractual terms cited in your post, and the limits on the contractual relationship under law, would put most people at ease.

I would be much more afraid of discussing sensitive business information on a cell phone (where the provider industry has an 80+ year history of knee-jerk cooperation with government and law enforcement information requests, and where every conversation is reconstructable through the magic of digital technology) than of discussing the same information in an email sent through Gmail.

I certainly wouldn't use Google's website builder and host to run an intranet! That would place everything squarely under the terms cited:

...Content which you submit, post or display on or through, the Services....


With respect to the pertinent questions: Are the substance of an email and attachments "Content"? Is an email communication "submit[ted]"?

As the learned author of the Google terms of service must know, there are contractual terms that are unenforceable even when agreed to! In many states of the US, for example, no contract can waive the "implied warranty of merchantability" imputed to every product!

Without doubt, the expectation of email service users is that the actual substance of emails and attachments won't be "sold" by the service provider. In Hawaii at least, selling the actual substance of emails and attachments would subject Google to a class action that would most likely be sustained by the Courts.

If Google has good legal advice (who could imagine they don't?) they won't subject themselves to this significant risk of financial exposure!

RE: ONLINE EVENT -- Day 2 -- Types of Tools and Data Collection Methods

10-19-2007 2:33 AM

I noticed while reading this discussion that it was limited to log file analysis and page tagging. Real ecommerce selling (and non-profits on the web are selling, one way or another), has long found both these forms of analysis to lack the fineness of detail necessary for productive analysis of customer (visitor) behavior. The "big league" in visitor behavior analysis is now accomplished by definition of "custom" events, logged at the application level.

By way of example, an ecommerce site which I am hoping to take live in the next few weeks logs more than 40 types of events. Included are such minutiae as hovering on specific links, scrolling the page, changing the displayed text or image size, and many other "events" that won't ever appear in a log file or data returned by "tag" scripts.
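
For readers who haven't seen this approach, here is a minimal Python sketch of the general idea of logging custom events at the application level. It is not the system described above: the class name EventLog, the event names, and the visitor IDs are all hypothetical, and a real site would write events to a database or file rather than keep them in memory.

```python
import json
import time
from collections import Counter


class EventLog:
    """Collects application-defined events in memory."""

    def __init__(self):
        self.events = []

    def record(self, event_type, visitor_id, **details):
        """Store one event with a timestamp and any extra details."""
        self.events.append({
            "type": event_type,
            "visitor": visitor_id,
            "time": time.time(),
            "details": details,
        })

    def counts_by_type(self):
        """Return how many times each event type was recorded."""
        return Counter(event["type"] for event in self.events)


# Hypothetical events of the kind mentioned above.
log = EventLog()
log.record("link_hover", "visitor-1", link="/donate.html")
log.record("page_scroll", "visitor-1", percent=75)
log.record("text_resize", "visitor-2", size="large")
log.record("link_hover", "visitor-2", link="/volunteer.html")

print(log.counts_by_type())
print(json.dumps(log.events[0]["details"]))
```

Because the application decides what counts as an event, it can capture behavior that never produces a page request or a tag call.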

RE: ONLINE EVENT -- Day 2 -- Types of Tools and Data Collection Methods

10-19-2007 1:48 PM

On the subject of use licensing by Google, I suspect that if you look at any of the large services that host videos, pictures and blogs, you'll find that their terms contain the same conditions. Here's a "real" (read physical) world analogy:

You are an artist and you create a poster that informs people about a cause, event, etc. You print the poster and arrange to have it displayed in the shop window of a bookstore in your neighborhood. The moment you drop off the poster you have provided an implicit "use license" to the shop owner. I don't think anyone would argue that the shop owner has the right to:
1. Move the poster
2. Use the poster to illustrate the types of things they will display
3. Use the poster to promote their own involvement with whatever the cause is

What Google is claiming the right to do is along those lines. You may have an argument if you are paying for the service. If it is provided for free then their use of your content is your "cost" for the service. I for one don't believe email messages are "content". I'm sure they have a definition somewhere that defines what falls under the submitted content clauses.

RE: ONLINE EVENT -- Day 2 -- Types of Tools and Data Collection Methods

10-19-2007 1:53 PM

Yes, kula -- I limited the discussion to those two main types of tools.

I actually thought about discussing other technologies here. But given that this is a nonprofit community, I figured most people would relate best to log files and page tagging solutions -- these mainstream tool types -- and decided not to get into the more e-commerce-oriented Web analytics.

But you're right. There are more sophisticated analytics systems that are being employed by organizations. Such tools offer very deep measurement and segmentation of visitor/customer behavior at a much more granular level.

Indeed, these are exciting times in the world of Web analytics. I can't wait to see where the tools of today will be 5 years from now.

One thing is certain: Web analytics will continue to gain prominence in the corporate world, and will eventually become "standard operating procedure" for every company. Just like you need a Finance and Accounting department, no company will be able to survive without a Web Analytics unit as well.

In fact, I wouldn't be surprised if this becomes a central function of every Marketing department.

Yann

RE: ONLINE EVENT -- Day 2 -- Types of Tools and Data Collection Methods

10-20-2007 10:23 AM

Some of the defense of Google's terms is nonsense. The defense is made with the implication that it is based on "full understanding of the actual contractual terms ... and the limits on the contractual relationship under law." But, for a big example, there is no issue of merchantability.
Merchantability: "of commercially acceptable quality : salable." Nobody is claiming that Google services are of poor quality; quite to the contrary, it is the quality of the services that leads people into a compromised, if not really dangerous, position. The fact that Google does not seem to be widely abusing its contractual position does not tell you what they are allowed to do. We know that they want to publish all the world's books. Maybe they want to publish things that you write.

RE: ONLINE EVENT -- Day 2 -- Types of Tools and Data Collection Methods

10-23-2007 12:35 PM

gizmoproject.com has similar terms of use, but the content that they are interested in is only comments, etc., about their products and services. I think their language is colorful:

"... if, at any time you upload or post User Materials, including but not limited to comments, suggestions, problem reports, bug reports and design ideas to the Site you automatically grant Gizmo Project a non-exclusive, royalty-free, perpetual license of all rights throughout the universe to use, edit, modify, include, incorporate, adapt, record and reproduce the User Materials including, without limitation, all trademarks associated therewith, in any manner whatsoever, in or out-of-context, in all languages, in all media now known and hereafter devised, and to use the User Materials in advertising, promotion and publicity for the Site, Gizmo Project and its products and services, in any and all media now known or hereafter devised."