From: Bradley Baetz <bbaetz(at)student(dot)usyd(dot)edu(dot)au>
To: pgsql-sql(at)postgresql(dot)org
Subject:
Date: 2002-02-23 10:21:07
Message-ID: Pine.LNX.4.44.0202232045320.2350-100000@tomato.home
Lists: pgsql-hackers pgsql-sql

I'm having some problems with query optimisation using PostgreSQL 7.2.

Basically, I'm hacking on porting bugzilla to pgsql, and I'm trying
various queries on a very small subset of the schema:

CREATE TABLE bugs (
    bug_id serial NOT NULL,        -- serial replaces MySQL's auto_increment
    assigned_to integer NOT NULL,
    reporter integer NOT NULL,
    PRIMARY KEY (bug_id)
);
CREATE INDEX bugs_assigned_to_idx ON bugs(assigned_to);
CREATE INDEX bugs_reporter_idx ON bugs(reporter);

CREATE TABLE cc (
    bug_id integer NOT NULL,
    who integer NOT NULL
);
CREATE INDEX cc_who_idx ON cc(who);
CREATE UNIQUE INDEX cc_bug_id_who_idx ON cc(bug_id,who);

I've used a perl script to insert 25000 users, 10000 bugs, and 30000 cc
entries randomly, and I then ran ANALYZE on the database. The script just
uses rand (plus some hashes to check that uniqueness still holds), so the
data should be fairly evenly distributed.
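
For reference, here is a rough SQL-only sketch of the data generation (it
is not the perl script I actually used; it needs generate_series(), which
7.2 doesn't have, and it assumes user ids simply run from 1 to 25000):

INSERT INTO bugs (assigned_to, reporter)
SELECT (random() * 24999)::int + 1,   -- assigned_to: random user id 1..25000
       (random() * 24999)::int + 1    -- reporter:    random user id 1..25000
FROM generate_series(1, 10000) AS g;

-- Draw slightly more than 30000 random (bug_id, who) pairs, then keep at
-- most 30000 distinct ones so the unique index on cc(bug_id, who) holds.
INSERT INTO cc (bug_id, who)
SELECT DISTINCT bug_id, who
FROM (SELECT (random() * 9999)::int + 1  AS bug_id,
             (random() * 24999)::int + 1 AS who
      FROM generate_series(1, 31000) AS g) AS draws
LIMIT 30000;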

Now I want to search for all bugs which user #86 is cc'd on:

bugs=> explain analyze SELECT bugs.bug_id FROM bugs LEFT JOIN cc ON
bugs.bug_id = cc.bug_id WHERE cc.who=86;
NOTICE: QUERY PLAN:

Merge Join  (cost=0.00..1817.34 rows=10030 width=12) (actual time=1516.75..2951.11 rows=1 loops=1)
  ->  Index Scan using bugs_pkey on bugs  (cost=0.00..201.30 rows=10030 width=4) (actual time=0.33..135.67 rows=10000 loops=1)
  ->  Index Scan using cc_bug_id_who_idx on cc  (cost=0.00..1065.97 rows=30000 width=8) (actual time=0.37..2522.77 rows=30000 loops=1)
Total runtime: 2951.37 msec

EXPLAIN
bugs=>

However, if I disable merge joins:

bugs=> set enable_mergejoin=0;
SET VARIABLE
bugs=> explain analyze SELECT bugs.bug_id FROM bugs LEFT JOIN cc ON
bugs.bug_id = cc.bug_id WHERE cc.who=86;
NOTICE: QUERY PLAN:

Hash Join  (cost=1212.86..3288.98 rows=10030 width=12) (actual time=1024.50..1237.59 rows=1 loops=1)
  ->  Seq Scan on bugs  (cost=0.00..159.30 rows=10030 width=4) (actual time=0.16..70.25 rows=10000 loops=1)
  ->  Hash  (cost=463.00..463.00 rows=30000 width=8) (actual time=284.51..284.51 rows=0 loops=1)
        ->  Seq Scan on cc  (cost=0.00..463.00 rows=30000 width=8) (actual time=0.13..159.54 rows=30000 loops=1)
Total runtime: 1237.78 msec

EXPLAIN
bugs=>

Then the time taken more than halves, from 2951ms to 1237ms. Is this a bug
in the optimiser?

Also, in this case it would be better to use an inner join rather than a
left join. Since the WHERE condition on cc.who can never be true for the
NULL-extended rows that the left join produces, an inner join should give
the same results, shouldn't it? Using an inner join makes the query take
0.98 msec, so it would be good if postgres could do that optimisation
automatically, assuming it is valid. (This particular query is generated
by a perl script, and some cases do need the left join. It should probably
be fixed on that side, but it would be nice if pg could do it
automatically.)
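
For reference, the inner-join form I mean would be something like this
(not taken verbatim from the script, but equivalent for this case):

SELECT bugs.bug_id
FROM bugs INNER JOIN cc ON bugs.bug_id = cc.bug_id
WHERE cc.who = 86;

Since only bugs.bug_id is selected, and it equals cc.bug_id, the join could
even be dropped entirely (SELECT bug_id FROM cc WHERE who = 86), assuming
every cc.bug_id actually exists in bugs; there's no foreign key in this
cut-down schema to guarantee that.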

Thanks,

Bradley
