Re: Query optimization problem

From: Yeb Havinga <yebhavinga(at)gmail(dot)com>
To: Zotov <zotov(at)oe-it(dot)ru>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Query optimization problem
Date: 2010-07-15 07:58:50
Message-ID: 4C3EBFBA.1040408@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello Zotov,

Somehow the equivalence d2.basedon=d1.id is not used in the slow query,
probably because the equivalence constant value would be used inside a
not-base expression (the OR). You can see that the equivalence values
*are* used by changing the or to an and and compare both queries. The
only thing you can do to guarantee the planner has all information to in
cases like this it explicitly name the equivalence inside OR
expressions, e.g.

SELECT d1.ID, d2.ID
FROM DocPrimary d1
JOIN DocPrimary d2 ON d2.BasedOn=d1.ID
WHERE (d1.ID=234409763 and d2.basedon=234409763) or (d2.ID=234409763) ;

regards,
Yeb Havinga

PS: the analyze time of the slow query showed 0.007ms?

Zotov wrote:
> I have a query:
>
> SELECT d1.ID, d2.ID
> FROM DocPrimary d1
> JOIN DocPrimary d2 ON d2.BasedOn=d1.ID
> WHERE (d1.ID=234409763) or (d2.ID=234409763)
>
> i think what QO(Query Optimizer) can make it faster (now it seq scan
> and on
> million records works 7 sec)
>
> SELECT d1.ID, d2.ID
> FROM DocPrimary d1
> JOIN DocPrimary d2 ON d2.BasedOn=d1.ID
> WHERE (d2.BasedOn=234409763) or (d2.ID=234409763)
>
>
> ----------------------
> Slow Query
> ----------------------
> test=# EXPLAIN (ANALYZE on, VERBOSE on, COSTS on, BUFFERS off )SELECT
> d1.ID,
> d2.ID
> test-# FROM DocPrimary d1
> test-# JOIN DocPrimary d2 ON d2.BasedOn=d1.ID
> test-# WHERE (d1.ID=234409763) or (d2.ID=234409763);
> QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------
>
> Hash Join (cost=58.15..132.35 rows=2 width=8) (actual
> time=0.007..0.007
> rows=0 loops=1)
> Output: d1.id, d2.id
> Hash Cond: (d2.basedon = d1.id)
> Join Filter: ((d1.id = 234409763) OR (d2.id = 234409763))
> -> Seq Scan on public.docprimary d2 (cost=0.00..31.40 rows=2140
> width=8) (actual time=0.002..0.002 rows=0 loops=1)
> Output: d2.id, d2.basedon
> -> Hash (cost=31.40..31.40 rows=2140 width=4) (never executed)
> Output: d1.id
> -> Seq Scan on public.docprimary d1 (cost=0.00..31.40
> rows=2140
> width=4) (never executed)
> Output: d1.id
>
> ------------------
> Fast Query
> ------------------
> test=# EXPLAIN (ANALYZE on, VERBOSE on, COSTS on, BUFFERS off )SELECT
> d1.ID,
> d2.ID
> test-# FROM DocPrimary d1
> test-# JOIN DocPrimary d2 ON d2.BasedOn=d1.ID
> test-# WHERE (d2.BasedOn=234409763) or (d2.ID=234409763);
> QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------------
>
> Nested Loop (cost=8.60..58.67 rows=12 width=8) (actual
> time=0.026..0.026
> rows=0 loops=1)
> Output: d1.id, d2.id
> -> Bitmap Heap Scan on public.docprimary d2 (cost=8.60..19.31
> rows=12
> width=8) (actual time=0.023..0.023 rows=0 loops=1)
> Output: d2.id, d2.basedon
> Recheck Cond: ((d2.basedon = 234409763) OR (d2.id =
> 234409763))
> -> BitmapOr (cost=8.60..8.60 rows=12 width=0) (actual
> time=0.018..0.018 rows=0 loops=1)
> -> Bitmap Index Scan on basedon_idx (cost=0.00..4.33
> rows=11 width=0) (actual time=0.008..0.008 rows=0 loops=1)
> Index Cond: (d2.basedon = 234409763)
> -> Bitmap Index Scan on id_pk (cost=0.00..4.26 rows=1
> width=0) (actual time=0.003..0.003 rows=0 loops=1)
> Index Cond: (d2.id = 234409763)
> -> Index Scan using id_pk on public.docprimary d1
> (cost=0.00..3.27
> rows=1 width=4) (never executed)
> Output: d1.id, d1.basedon
> Index Cond: (d1.id = d2.basedon)
>
>
> --------------------------------------------
> PGver: PostgreSQL 9.0b x86
> OS: Win7 x64
>
> ---------------------
> Create table query:
> ---------------------
>
> CREATE TABLE docprimary
> (
> id integer NOT NULL,
> basedon integer,
> CONSTRAINT id_pk PRIMARY KEY (id)
> );
> CREATE INDEX basedon_idx
> ON docprimary
> USING btree
> (basedon);
>
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Patrick Donlin 2010-07-15 14:41:22 Identical query slower on 8.4 vs 8.3
Previous Message Ryan Wexler 2010-07-15 06:18:10 Re: performance on new linux box