
Re: cost and actual time

From: "Christopher Kings-Lynne" <chriskl(at)familyhealth(dot)com(dot)au>
To: "Manfred Koizar" <mkoi-pg(at)aon(dot)at>,"Chantal Ackermann" <chantal(dot)ackermann(at)biomax(dot)de>
Cc: "Josh Berkus" <josh(at)agliodbs(dot)com>,<pgsql-performance(at)postgresql(dot)org>, <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: cost and actual time
Date: 2003-02-21 01:45:24
Message-ID: 07e301c2d94a$e6eec450$6500a8c0@fhp.internal
Lists: pgsql-performance
I nominate Manfred for support response award of the week!

Chris

----- Original Message -----
From: "Manfred Koizar" <mkoi-pg(at)aon(dot)at>
To: "Chantal Ackermann" <chantal(dot)ackermann(at)biomax(dot)de>
Cc: "Josh Berkus" <josh(at)agliodbs(dot)com>; <pgsql-performance(at)postgresql(dot)org>;
<tgl(at)sss(dot)pgh(dot)pa(dot)us>
Sent: Thursday, February 20, 2003 6:00 PM
Subject: Re: [PERFORM] cost and actual time


> On Wed, 19 Feb 2003 10:38:54 +0100, Chantal Ackermann
> <chantal(dot)ackermann(at)biomax(dot)de> wrote:
> >Nested Loop: 53508.86 msec
> >Merge Join: 113066.81 msec
> >Hash Join:  439344.44 msec
>
> Chantal,
>
> you might have reached the limit of what Postgres (or any other
> database?) can do for you with these data structures.  Time for
> something completely different:  Try calculating the counts in
> advance.
>
>     CREATE TABLE occ_stat (
>         did INT NOT NULL,
>         gid INT NOT NULL,
>         cnt INT NOT NULL
>     ) WITHOUT OIDS;
>
>     CREATE INDEX occ_stat_dg ON occ_stat(did, gid);
>     CREATE INDEX occ_stat_gd ON occ_stat(gid, did);
>
> There is *no* UNIQUE constraint on (did, gid).  You get the numbers
> you're after by
>     SELECT did, sum(cnt) AS cnt
>       FROM occ_stat
>      WHERE gid = 'whatever'
>      GROUP BY did
>      ORDER BY cnt DESC;
>
> occ_stat is initially loaded by
>
>     INSERT INTO occ_stat
>     SELECT did, gid, count(*)
>       FROM g_o INNER JOIN d_o ON (g_o.sid = d_o.sid)
>      GROUP BY did, gid;
>
> Doing it in chunks
>     WHERE sid BETWEEN a::bigint AND b::bigint
> might be faster.
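>
> Putting those two pieces together, one chunk of the initial load
> might look like this (a sketch only; a and b stand for whatever sid
> boundaries you choose, as above):
>
>     INSERT INTO occ_stat
>     SELECT did, gid, count(*)
>       FROM g_o INNER JOIN d_o ON (g_o.sid = d_o.sid)
>      WHERE g_o.sid BETWEEN a::bigint AND b::bigint
>      GROUP BY did, gid;
>
> Repeat with the next range until all sids are covered.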
>
> You have to block any INSERT/UPDATE/DELETE activity on d_o and g_o
> while you do the initial load.  If it takes too long, see below for
> how to do it in the background; hopefully the load task will catch up
> some day :-)
>
> Keeping occ_stat current:
>
>     CREATE RULE d_o_i AS ON INSERT
>         TO d_o DO (
>             INSERT INTO occ_stat
>             SELECT NEW.did, g_o.gid, 1
>               FROM g_o
>              WHERE g_o.sid = NEW.sid);
>
>     CREATE RULE d_o_d AS ON DELETE
>         TO d_o DO (
>             INSERT INTO occ_stat
>             SELECT OLD.did, g_o.gid, -1
>               FROM g_o
>              WHERE g_o.sid = OLD.sid);
>
> On UPDATE do both.  Create a set of similar rules for g_o.
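>
> "Do both" on UPDATE could be sketched as one rule with two commands
> (untested; it just combines the DELETE and INSERT cases above):
>
>     CREATE RULE d_o_u AS ON UPDATE
>         TO d_o DO (
>             INSERT INTO occ_stat
>             SELECT OLD.did, g_o.gid, -1
>               FROM g_o
>              WHERE g_o.sid = OLD.sid;
>             INSERT INTO occ_stat
>             SELECT NEW.did, g_o.gid, 1
>               FROM g_o
>              WHERE g_o.sid = NEW.sid);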
>
> These rules will create a lot of duplicates on (did, gid) in occ_stat.
> Updating existing rows and inserting only new combinations might seem
> obvious, but this method has concurrency problems (cf. the thread
> "Hard problem with concurrency" on -hackers).  So occ_stat calls for
> reorganisation from time to time:
>
>     BEGIN;
>     SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
>     CREATE TEMP TABLE t (did INT, gid INT, cnt INT) WITHOUT OIDS;
>
>     INSERT INTO t
>     SELECT did, gid, sum(cnt)
>       FROM occ_stat
>      GROUP BY did, gid
>     HAVING count(*) > 1;
>
>     DELETE FROM occ_stat
>      WHERE t.did = occ_stat.did
>        AND t.gid = occ_stat.gid;
>
>     INSERT INTO occ_stat SELECT * FROM t;
>
>     DROP TABLE t;
>     COMMIT;
>     VACUUM ANALYZE occ_stat;  -- very important!!
>
> Now this should work, but the rules could kill INSERT/UPDATE/DELETE
> performance.  Depending on your rate of modifications you might be
> forced to push the statistics calculation to the background.
>
>     CREATE TABLE d_o_change (
>         sid BIGINT NOT NULL,
>         did INT NOT NULL,
>         cnt INT NOT NULL
>     ) WITHOUT OIDS;
>
>     ... ON INSERT TO d_o DO (
>         INSERT INTO d_o_change VALUES (NEW.sid, NEW.did, 1));
>
>     ... ON DELETE TO d_o DO (
>         INSERT INTO d_o_change VALUES (OLD.sid, OLD.did, -1));
>
>     ... ON UPDATE TO d_o
>         WHERE OLD.sid != NEW.sid OR OLD.did != NEW.did
>         DO both
>
> And the same for g_o.
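>
> That is, a change table keyed on (sid, gid) with analogous rules; the
> table itself would be (sketch):
>
>     CREATE TABLE g_o_change (
>         sid BIGINT NOT NULL,
>         gid INT NOT NULL,
>         cnt INT NOT NULL
>     ) WITHOUT OIDS;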
>
> You need a task that periodically scans [dg]_o_change and does ...
>
>     BEGIN;
>     SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
>     SELECT <any row (or some rows) from x_o_change>;
>     INSERT INTO occ_stat <see above>;
>     DELETE <the selected row(s) from x_o_change>;
>     COMMIT;
>
> Don't forget to VACUUM!
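>
> For d_o_change, one pass of that task could be spelled out like this
> (a sketch; the <...> values are the ones remembered from the SELECT,
> to be filled in by the client):
>
>     BEGIN;
>     SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
>     -- pick one (sid, did) and its summed count; remember the values
>     SELECT sid, did, sum(cnt) AS cnt
>       FROM d_o_change
>      GROUP BY sid, did
>      LIMIT 1;
>     INSERT INTO occ_stat
>     SELECT <did>, g_o.gid, <cnt>
>       FROM g_o
>      WHERE g_o.sid = <sid>;
>     DELETE FROM d_o_change
>      WHERE sid = <sid> AND did = <did>;
>     COMMIT;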
>
> If you invest a little more work, I guess you can combine the
> reorganisation into the loader task ...
>
> I have no idea whether this approach is better than what you have now.
> With a high INSERT/UPDATE/DELETE rate it may lead to a complete
> performance disaster.  You have to try ...
>
> Servus
>  Manfred
>

