Re: Need for speed 3

From: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
To: "Ulrich Wisser" <ulrich(dot)wisser(at)relevanttraffic(dot)se>, pgsql-performance(at)postgresql(dot)org
Cc: "Nicholas E(dot) Wakefield" <nwakefield(at)KineticNetworks(dot)com>, "Barry Klawans" <bklawans(at)jaspersoft(dot)com>, "Daria Hutchinson" <dhutchinson(at)greenplum(dot)com>
Subject: Re: Need for speed 3
Date: 2005-09-01 16:37:53
Message-ID: BF3C7C71.EBC5%llonergan@greenplum.com
Lists: pgsql-performance

Ulrich,

On 9/1/05 6:25 AM, "Ulrich Wisser" <ulrich(dot)wisser(at)relevanttraffic(dot)se> wrote:

> My application basically imports Apache log files into a Postgres
> database. Every row in the log file gets imported in one of three (raw
> data) tables. My columns are exactly as in the log file. The import is
> run approx. every five minutes. We import about two million rows a month.

Bizgres Clickstream does this job using an ETL (extract, transform, and load)
process to transform the weblogs into an optimized schema for reporting.

> After every import the data from the current day is deleted from the
> reporting table and recalculated from the raw data table.

This is something the optimized ETL in Bizgres Clickstream also does well.
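The delete-and-recalculate step Ulrich describes can be sketched as follows. This is a minimal illustration of the pattern, not the Bizgres ETL itself; it uses SQLite in place of Postgres so it runs standalone, and the table and column names (raw_log, report, day, url, hits) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical raw table: one row per Apache log line, columns as in the log.
cur.execute("CREATE TABLE raw_log (day TEXT, url TEXT, hits INTEGER)")
# Hypothetical reporting table: one aggregated row per (day, url).
cur.execute("CREATE TABLE report (day TEXT, url TEXT, hits INTEGER)")

# Simulate an import run appending rows to the raw table.
cur.executemany(
    "INSERT INTO raw_log VALUES (?, ?, ?)",
    [("2005-09-01", "/index.html", 1),
     ("2005-09-01", "/index.html", 1),
     ("2005-09-01", "/about.html", 1)],
)

# After each import: drop the current day from the reporting table
# and recalculate it from the raw data.
today = "2005-09-01"
cur.execute("DELETE FROM report WHERE day = ?", (today,))
cur.execute(
    "INSERT INTO report "
    "SELECT day, url, SUM(hits) FROM raw_log WHERE day = ? GROUP BY day, url",
    (today,),
)

rows = cur.execute("SELECT url, hits FROM report ORDER BY url").fetchall()
print(rows)  # [('/about.html', 1), ('/index.html', 2)]
```

Because the whole day is deleted and rebuilt each time, the step is idempotent: rerunning it after a partial import produces the same reporting rows.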

> What do you think of this approach? Are there better ways to do it? Is
> there some literature you recommend reading?

I recommend the Bizgres Clickstream docs; you can get them from Bizgres CVS,
and a live HTML link will shortly be available on the website.

Bizgres is free - among other enhancements, it also improves COPY performance
by almost 2x.

- Luke
