Re: cost and actual time

From: "Christopher Kings-Lynne" <chriskl(at)familyhealth(dot)com(dot)au>
To: "Manfred Koizar" <mkoi-pg(at)aon(dot)at>, "Chantal Ackermann" <chantal(dot)ackermann(at)biomax(dot)de>
Cc: "Josh Berkus" <josh(at)agliodbs(dot)com>, <pgsql-performance(at)postgresql(dot)org>, <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: cost and actual time
Date: 2003-02-21 01:45:24
Message-ID: 07e301c2d94a$e6eec450$6500a8c0@fhp.internal
Lists: pgsql-performance

I nominate Manfred for support response award of the week!

Chris

----- Original Message -----
From: "Manfred Koizar" <mkoi-pg(at)aon(dot)at>
To: "Chantal Ackermann" <chantal(dot)ackermann(at)biomax(dot)de>
Cc: "Josh Berkus" <josh(at)agliodbs(dot)com>; <pgsql-performance(at)postgresql(dot)org>;
<tgl(at)sss(dot)pgh(dot)pa(dot)us>
Sent: Thursday, February 20, 2003 6:00 PM
Subject: Re: [PERFORM] cost and actual time

> On Wed, 19 Feb 2003 10:38:54 +0100, Chantal Ackermann
> <chantal(dot)ackermann(at)biomax(dot)de> wrote:
> >Nested Loop: 53508.86 msec
> >Merge Join: 113066.81 msec
> >Hash Join: 439344.44 msec
>
> Chantal,
>
> you might have reached the limit of what Postgres (or any other
> database?) can do for you with these data structures. Time for
> something completely different: Try calculating the counts in
> advance.
>
> CREATE TABLE occ_stat (
>     did INT NOT NULL,
>     gid INT NOT NULL,
>     cnt INT NOT NULL
> ) WITHOUT OIDS;
>
> CREATE INDEX occ_stat_dg ON occ_stat(did, gid);
> CREATE INDEX occ_stat_gd ON occ_stat(gid, did);
>
> There is *no* UNIQUE constraint on (did, gid). You get the numbers
> you're after by
> SELECT did, sum(cnt) AS cnt
> FROM occ_stat
> WHERE gid = 'whatever'
> GROUP BY did
> ORDER BY cnt DESC;
>
> occ_stat is initially loaded by
>
> INSERT INTO occ_stat
> SELECT did, gid, count(*)
> FROM g_o INNER JOIN d_o ON (g_o.sid = d_o.sid)
> GROUP BY did, gid;
>
> Doing it in chunks
> WHERE sid BETWEEN a::bigint AND b::bigint
> might be faster.
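>
> For example, one chunk of the load could look like this (the bounds
> below are only placeholders; note that sid has to be qualified
> inside the join):
>
> INSERT INTO occ_stat
> SELECT did, gid, count(*)
> FROM g_o INNER JOIN d_o ON (g_o.sid = d_o.sid)
> WHERE g_o.sid BETWEEN 0::bigint AND 99999::bigint
> GROUP BY did, gid;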
>
> You have to block any INSERT/UPDATE/DELETE activity on d_o and g_o
> while you do the initial load. If it takes too long, see below for
> how to do it in the background; hopefully the load task will catch up
> some day :-)
>
> Keeping occ_stat current:
>
> CREATE RULE d_o_i AS ON INSERT
>     TO d_o DO (
>     INSERT INTO occ_stat
>     SELECT NEW.did, g_o.gid, 1
>     FROM g_o
>     WHERE g_o.sid = NEW.sid);
>
> CREATE RULE d_o_d AS ON DELETE
>     TO d_o DO (
>     INSERT INTO occ_stat
>     SELECT OLD.did, g_o.gid, -1
>     FROM g_o
>     WHERE g_o.sid = OLD.sid);
>
> On UPDATE do both. Create a set of similar rules for g_o.
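>
> Spelled out, that UPDATE rule for d_o could be something like this
> (untested; the rule name is made up; it removes the old contribution
> and adds the new one):
>
> CREATE RULE d_o_u AS ON UPDATE
>     TO d_o DO (
>     INSERT INTO occ_stat
>     SELECT OLD.did, g_o.gid, -1
>     FROM g_o
>     WHERE g_o.sid = OLD.sid;
>     INSERT INTO occ_stat
>     SELECT NEW.did, g_o.gid, 1
>     FROM g_o
>     WHERE g_o.sid = NEW.sid);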
>
> These rules will create a lot of duplicates on (did, gid) in occ_stat.
> Updating existing rows and inserting only new combinations might seem
> obvious, but this method has concurrency problems (cf. the thread
> "Hard problem with concurrency" on -hackers). So occ_stat calls for
> reorganisation from time to time:
>
> BEGIN;
> SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
> CREATE TEMP TABLE t (did INT, gid INT, cnt INT) WITHOUT OIDS;
>
> INSERT INTO t
> SELECT did, gid, sum(cnt)
> FROM occ_stat
> GROUP BY did, gid
> HAVING count(*) > 1;
>
> DELETE FROM occ_stat
> WHERE t.did = occ_stat.did
> AND t.gid = occ_stat.gid;
>
> INSERT INTO occ_stat SELECT * FROM t;
>
> DROP TABLE t;
> COMMIT;
> VACUUM ANALYZE occ_stat; -- very important!!
>
> Now this should work, but the rules could kill INSERT/UPDATE/DELETE
> performance. Depending on your rate of modifications you might be
> forced to push the statistics calculation to the background.
>
> CREATE TABLE d_o_change (
>     sid BIGINT NOT NULL,
>     did INT NOT NULL,
>     cnt INT NOT NULL
> ) WITHOUT OIDS;
>
> ... ON INSERT TO d_o DO (
>     INSERT INTO d_o_change VALUES (NEW.sid, NEW.did, 1));
>
> ... ON DELETE TO d_o DO (
>     INSERT INTO d_o_change VALUES (OLD.sid, OLD.did, -1));
>
> ... ON UPDATE TO d_o
>     WHERE OLD.sid != NEW.sid OR OLD.did != NEW.did
>     DO both (see the sketch below)
>
> And the same for g_o.
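>
> The abbreviated UPDATE rule, spelled out (untested; the rule name is
> made up, and the INSERT and DELETE rules above only need CREATE RULE
> names added in the same way):
>
> CREATE RULE d_o_change_u AS ON UPDATE
>     TO d_o
>     WHERE OLD.sid != NEW.sid OR OLD.did != NEW.did
>     DO (
>     INSERT INTO d_o_change VALUES (OLD.sid, OLD.did, -1);
>     INSERT INTO d_o_change VALUES (NEW.sid, NEW.did, 1));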
>
> You need a task that periodically scans [dg]_o_change and does ...
>
> BEGIN;
> SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
> SELECT <any row (or some rows) from x_o_change>;
> INSERT INTO occ_stat <see above>;
> DELETE <the selected row(s) from x_o_change>;
> COMMIT;
>
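> For d_o_change and a single sid the transaction could look like this
> (42 stands for whatever sid the SELECT happened to return):
>
> BEGIN;
> SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
> SELECT sid FROM d_o_change LIMIT 1;  -- say it returns 42
>
> -- fold all logged changes for that sid into occ_stat
> INSERT INTO occ_stat
> SELECT c.did, g_o.gid, c.cnt
> FROM d_o_change c INNER JOIN g_o ON (g_o.sid = c.sid)
> WHERE c.sid = 42;
>
> DELETE FROM d_o_change WHERE sid = 42;
> COMMIT;
>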
> Don't forget to VACUUM!
>
> If you invest a little more work, I guess you can combine the
> reorganisation into the loader task ...
>
> I have no idea whether this approach is better than what you have now.
> With a high INSERT/UPDATE/DELETE rate it may lead to a complete
> performance disaster. You have to try ...
>
> Servus
> Manfred
