Re: CLUSTER and a problem

From: Andrzej Zawadzki <zawadaa(at)wp(dot)pl>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: CLUSTER and a problem
Date: 2009-09-15 20:10:49
Message-ID: 4AAFF4C9.5050903@wp.pl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Andrzej Zawadzki wrote:
> Tom Lane wrote:
>
>> Andrzej Zawadzki <zawadaa(at)wp(dot)pl> writes:
>>
>>
>>> # EXPLAIN ANALYZE SElect telekredytid from kredytyag
>>> WHERE TRUE
>>> AND kredytyag.id = 3064776
>>> AND NOT EXISTS
>>> (SELECT 1 FROM
>>> ( SELECT * FROM kredyty kr
>>> where telekredytid = 328650
>>> ORDER BY kr.datazaw DESC LIMIT 1 )
>>> kred where kred.bank = 2);
>>>
>>>
>> So this is the slow bit:
>>
>>
>>
>>> -> Subquery Scan kred (cost=0.00..778.06 rows=1 width=0) (actual
>>> time=2045556.496..2045556.496 rows=0 loops=1)
>>> Filter: (kred.bank = 2)
>>> -> Limit (cost=0.00..778.05 rows=1 width=3873) (actual
>>> time=2045556.492..2045556.492 rows=0 loops=1)
>>> -> Index Scan Backward using kredyty_datazaw on
>>> kredyty kr (cost=0.00..1088490.39 rows=1399 width=3873) (actual
>>> time=2045556.487..2045556.487 rows=0 loops=1)
>>> Filter: (telekredytid = 328650)
>>>
>>>
>> It's doing a scan in descending datazaw order and hoping to find a row
>> that has both telekredytid = 328650 and bank = 2. Evidently there isn't
>> one, so the indexscan runs clear to the end before it can report that the
>> NOT EXISTS is true. Unfortunately, you've more or less forced this
>> inefficient query plan by wrapping some of the search conditions inside a
>> LIMIT and some outside. Try phrasing the NOT EXISTS query differently.
>> Or, if you do this type of query a lot, a special-purpose index might be
>> worthwhile. It would probably be fast as-is if you had an index on
>> (telekredytid, datazaw) (in that order).
>>
>>
> That's no problem - we already has changed this query:
> SELECT * FROM kredyty kr
> where kr.telekredytid = 328652
> and kr.bank = 2
> AND NOT EXISTS (SELECT * from kredyty k2 WHERE k2.bank<>2
> and k2.creationdate > kr.creationdate)
> Works good.
>
> But in fact this wasn't my point.
> My point was: why operation CLUSTER has such a big and bad impact on
> planer for this query?
> Like I sad: before CLUSTER query was run in xx milliseconds :-)
>
OK I've got it :-)
I've prepared test database (on fast disks - CLUSTER took 2h anyway ;-)

Step 1:
qstest=# CREATE UNIQUE INDEX kredyty_desc_pkey ON kredyty using btree
(id desc);
CREATE
INDEX
Step 2:
qstest=# CLUSTER kredyty USING kredyty_desc_pkey;
CLUSTER
Step 3:
qstest=# ANALYZE kredyty;
ANALYZE
Step 4:
qstest=# EXPLAIN ANALYZE SELECT telekredytid FROM kredytyag
WHERE TRUE
AND kredytyag.id = 3064776
AND NOT EXISTS
(
SELECT 1 FROM
(
SELECT * FROM kredyty kr
where telekredytid = 328652
ORDER BY kr.datazaw DESC LIMIT 1
)
kred where kred.bank = 2)
;

QUERY
PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Result (cost=833.09..841.38 rows=1 width=4) (actual
time=70.050..70.050 rows=0
loops=1)

One-Time Filter: (NOT
$0)


InitPlan

-> Subquery Scan kred (cost=833.07..833.09 rows=1 width=0)
(actual time=48.223..48.223 rows=0
loops=1)
Filter: (kred.bank =
2)

-> Limit (cost=833.07..833.08 rows=1 width=3975) (actual
time=48.206..48.206 rows=0
loops=1)
-> Sort (cost=833.07..835.66 rows=1035 width=3975)
(actual time=48.190..48.190 rows=0
loops=1)
Sort Key: kr.datazaw
Sort Method: quicksort Memory: 25kB
-> Index Scan using kredyty_telekredytid_idx on
kredyty kr (cost=0.00..827.90 rows=1035 width=3975) (actual
time=48.163..48.163 rows=0 loops=1)
Index Cond: (telekredytid = 328652)
-> Index Scan using kredytyag_pkey on kredytyag (cost=0.00..8.29
rows=1 width=4) (actual time=21.798..21.798 rows=0 loops=1)
Index Cond: (id = 3064776)
Total runtime: 70.550 ms
(14 rows)

qstest=#

So, I was close - bad index... DESCending is much better.
Thanks to Grzegorz Ja\skiewicz hi has strengthened me in the conjecture.

I'm posting this - maybe someone will find something useful in that case.

ps. query was and is good :-)

--
Andrzej Zawadzki

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Grzegorz Jaśkiewicz 2009-09-16 09:34:42 Re: CLUSTER and a problem
Previous Message Alan McKay 2009-09-15 18:10:52 statement stats extra load?