Re: suggestions to improve postgresql suitability for data-mining

From: "Christopher Kings-Lynne" <chriskl(at)familyhealth(dot)com(dot)au>
To: "Fabien COELHO" <coelho(at)cri(dot)ensmp(dot)fr>, "PostgreSQL Developers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: suggestions to improve postgresql suitability for data-mining
Date: 2003-07-23 01:33:54
Message-ID: 0efc01c350ba$7ad71140$2800a8c0@mars
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> II) SQL
> -------
>
> The first idea is to ask SQL to do the job with a 'group by' clause:
>
> SELECT area, type, month, SUM(amount), COUNT(*)
> FROM client AS c, invoice AS i
> WHERE c.id=i.client
> GROUP BY area, type, month;
>
> As I am just interested in reading the data, without any transaction, I
> tuned a little bit the database parameters (fsync=false, more shared_mem
> and sort_mem).
>
> It works, but it is quite slow and it requires a lot of disk space.
> Indeed, the result of the join is big, and the aggregation seems to
> require an external sort step so as to sum up data one group after the
> other.
>
> As the resulting table is very small, I wish the optimizer would have
> skipped the sort phase, so as to aggregate the data as they come after the
> join. All may be done on the fly without much additionnal storage (well,
> with some implementation efforts). Maybe it is the "hash evaluation of
> group by aggregates" item listed in the todo list.

As of 7.4CVS it does do this. You will find this much faster in 7.4
release.

Chris

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2003-07-23 03:01:11 Re: reprise on Linux overcommit handling
Previous Message Christopher Kings-Lynne 2003-07-23 01:18:53 Re: php with postgres