Re: Database design advice

From: Neil Saunders <n(dot)j(dot)saunders(at)gmail(dot)com>
To: pgsql-novice(at)postgresql(dot)org
Subject: Re: Database design advice
Date: 2005-10-27 13:01:48
Message-ID: ddcd549e0510270601yc5bf7f5nf05100d0ea27e2c3@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

Following the advice given in this thread, I'd tought I'd just post
back what I'd come up with for the benefit of interested parties.

After a bit of searching I found a little program called pgLogd
http://www.digitalstratum.com/pglogd/

Which is a lightweight deamon that saves Apache access logs to a
PostgreSQL table. In order to generate statistics for property views
(A single page, passed a GET variable), I will use an intermediate
table as suggseted by Daniel which would be a updated via script that
is cronned to run every night.

With appropriate indexes on the date field in the primary log table,
speed shouldn't be significantly affected as the table grows. It also
has the advantage of retaining all data to allow for wider offline
analysis to assist with site trend analysis etc. The 'filtered' data
table will stay lean and mean, containing only number of hits per
property per month.

Knid Regards,

Neil Saunders.

On 10/25/05, Neil Saunders <n(dot)j(dot)saunders(at)gmail(dot)com> wrote:
> Apologies for the delayed reply - Thank you all for the advice.
>
> I'll get on to installing crosstab tonight. I really like the idea of
> dumping each page view to a table and creating a view to present that
> data, but since this data will be used to render a graph every time a
> user logs in I've got this feeling that this will prove a bottleneck
> if (when!!) the site becomes successful and the query starts to slow
> to a crawl. Of course my fears are totally without substance, but I'd
> rather spend more time thinking about the design than undo-ing a bad
> design mistake in the future.
>
> Thanks once again for you input!
>
> Kind Regards,
>
> Neil.
>
>
> On 10/21/05, Michael Glaesemann <grzm(at)myrealbox(dot)com> wrote:
> >
> > On Oct 22, 2005, at 0:47 , Neil Saunders wrote:
> >
> > > I'm writing a property rental web application, and one of the metrics
> > > I wish to track is the number of page views per month for each
> > > property.
> > >
> > > I originally envisaged a table with a column for each month (one row
> > > per property), for which the relevant column would be incremented each
> > > time a property is viewed, depending on the month. But this raises
> > > questions as to the best way to maintain this table (i.e. Create a new
> > > month column each month) I'd prefer to keep all logic in the database,
> > > so would prefer not to use a cron job to do this.
> >
> > I strongly suggest *not* making a column for each month, but rather a
> > row for each month--or even for each page view for greater
> > granularity. Databases are a great way of storing raw information and
> > transforming it into different kinds of summaries, one of which could
> > be a cross tab of properties and the hits per month. In this case
> > your raw data is tracking when a page view occurred, so you'd want to
> > have a table that captures this information: property, timestamp (or
> > date). Take a look at tablefunc in contrib about how to generate
> > cross tabs. I've found them very helpful for generating this kind of
> > summary (though in my case it's been orders per month).
> >
> > Depending on your performance needs, you may want to generate an
> > interim table that pre-calculates your totals per month. For example,
> > if your property view tracking table is
> >
> > create table property_views (
> > property_id integer not null references properties (property_id)
> > , view_timestamp timestamp(0) with time zone not null
> > ) without oids;
> >
> > you'd have a table
> >
> > create table property_views_per_month as
> > select property_id
> > , date_trunc('month', view_timestamp) as view_month
> > , count(property_id) as view_count
> > from property_views
> > group by property_id, view_month;
> >
> > You'd then generate your crosstab from the property_views_per_month
> > table. (Of course, you'd have to drop and recreate or otherwise
> > update this table periodically, as it doesn't capture up-to-date
> > data). This is an optimization step, however, so unless you find that
> > you need the data faster, you can just use a view and generate the
> > crosstab from the view, e.g.,
> >
> > create view property_views_per_month_view as
> > select property_id
> > , date_truc('month', view_timestamp) as view_month
> > , count(property_id) as view_count
> > from property_views
> > group by property_id, view_month;
> >
> > Hope this helps!
> >
> > Michael Glaesemann
> > grzm myrealbox com
> >
> >
>

In response to

Browse pgsql-novice by date

  From Date Subject
Next Message Kevin Crenshaw 2005-10-27 14:29:51 Problem inserting a row containing GUIDs
Previous Message Kevin Crenshaw 2005-10-27 09:43:10 Re: vacuuming and manu updates