optimized counting of web statistics

From: Billy extyeightysix <exty86(at)gmail(dot)com>
To: Postgresql Performance list <pgsql-performance(at)postgresql(dot)org>
Subject: optimized counting of web statistics
Date: 2005-06-28 20:39:05
Message-ID: 22455e7050628133978b17cd4@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hola folks,

I have a web statistics Pg database (user agent, urls, referrer, etc)
that is part of an online web survey system. All of the data derived
from analyzing web server logs is stored in one large table with each
record representing an analyzed webserver log entry.

Currently all reports are generated when the logs are being analyzed
and before the data ever goes into the large table I mention above.
Well, the time has come to build an interface that will allow a user
to make ad-hoc queries against the stats and that is why I am emailing
the performance list.

I need to allow the user to specify any fields and values in a query.
For example,

"I want to see a report about all users from Germany that have flash
installed" or
"I want to see a report about all users from Africa that are using FireFox 1"

I do not believe that storing all of the data in one big table is the
correct way to go about this. So, I am asking for suggestions,
pointers and any kind of info that anyone can share on how to store
this data set in an optimized manner.

Also, I have created a prototype and done some testing using the
colossal table. Actually finding all of the rows that satisfy the
query is pretty fast and is not a problem. The bottleneck in the
whole process is actually counting each data point (how many times a
url was visited, or how many times a url referred the user to the
website). So more specifically I am wondering if there is way to store
and retrieve the data such that it speeds up the counting of the
statistics.

Lastly, this will become an open source tool that is akin to urchin,
awstats, etc. The difference is that this software is part of a suite
of tools for doing online web surveys and it maps web stats to the
survey respondent data. This can give web site managers a very clear
view of what type of people come to the site and how those types use
the site.

Thanks in advance,

exty

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Billy extyeightysix 2005-06-28 20:43:46 Re: optimized counting of web statistics
Previous Message Sebastian Hennebrueder 2005-06-28 20:38:54 Re: tricky query