Skip site navigation (1) Skip section navigation (2)

Re: Botched estimation in eqjoinsel_semi for cases without reliable ndistinct

From: Casey Allen Shobe <casey(dot)shobe(at)messagesystems(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Botched estimation in eqjoinsel_semi for cases without reliable ndistinct
Date: 2012-01-12 18:03:25
Message-ID: 20120112130325.00fc8782@cshobe.int.messagesystems.com (view raw or flat)
Thread:
Lists: pgsql-bugs
On Wed, 11 Jan 2012 19:40:34 -0500, Andres wrote:
> Unfortunately the pastebins where raptelan provided plans expired by
> now... Perhaps he can provide them again?

Sure, the original curiosity I noticed was that adjusting the block
size of results returned by the CTE had widely different effects, 5000
seemed to be some sort of "magic number", while either 6000 or 4000
worked poorly.  Originally, our design used blocks of 100,000.

I then noticed a regression with the 5,000 block size.  The difference
in the queries between a prototype (that was pretty fast (5-10s) and
what was generated (slow, (5m+)) was that the order of columns in one
of the join conditions in the main query was reversed.  ON
source.column = joined.column was slow, while ON joined.column =
source.column was fast.  Apparently this is enough to get a different
possible plan to hit the planner first, while another slow plan with
nearly identical estimates is hitting the planner first in other cases.

Attached are three files - one shows the fast plan, another the slow
plan, and another with the query in question.  The ON clause where
reversal makes a difference is the td_13 one.  I use a different range
in the CTE for both queries as otherwise filesystem cache makes the
timings look better, but in both cases, the CTE returns exactly 5,000
results.

Yes, the query could be written a lot better, but currently it's
generated this way to conform to an expectation of user-defined custom
where clauses that do not qualify column names.  Breaking that
compatibility and redoing this better is a longer-term plan.

Please let me know if I'm omitting any important details.

Regards,
-- 
Casey Allen Shobe | Senior Software Engineer/DBA
Message Systems | http://messagesystems.com
casey(dot)shobe(at)messagesystems(dot)com | 443-656-3311 x248

Attachment: 230final.sql
Description: text/x-sql (3.9 KB)

In response to

pgsql-bugs by date

Next:From: Tom LaneDate: 2012-01-12 18:19:14
Subject: Re: BUG #6393: cluster sometime fail under heavy concurrent write load
Previous:From: Tom LaneDate: 2012-01-12 17:24:28
Subject: Re: BUG #6393: cluster sometime fail under heavy concurrent write load

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group