Re: <> join selectivity estimate question

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: <> join selectivity estimate question
Date: 2017-09-13 22:49:17
Message-ID: CAEepm=24be0mEWQeqhOnq0KjQOdTtYCNc29vQGA1ZyvG99_EpQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Sep 6, 2017 at 11:14 PM, Ashutosh Bapat
<ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
> On Fri, Jul 21, 2017 at 4:10 AM, Thomas Munro
> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>> That just leaves the question of whether we should try to handle the
>> empty RHS and single-value RHS cases using statistics. My intuition
>> is that we shouldn't, but I'll be happy to change my intuition and
>> code that up if that is the feedback from planner gurus.
>
> Empty RHS can also result from dummy relations, which are produced by
> constraint exclusion, so maybe that's an interesting case. Single
> value RHS may be interesting with a partitioned table where all rows in
> a given partition end up with the same partition key value. But maybe
> those are just different patches. I am not sure.

Can you elaborate on the constraint exclusion case? We don't care
about the selectivity of an excluded relation, do we?

Any other views on the empty and single value special cases, when
combined with [NOT] EXISTS (SELECT ... WHERE r.something <>
s.something)? Looking at this again, my feeling is that they're too
obscure to spend time on, but others may disagree.
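
To make the two special cases concrete, here is a toy model of the
semantics only. It is a sketch, not PostgreSQL code: the function names
are made up for illustration, relations are plain Python lists, and
NULLs are ignored. It shows why an empty or single-value RHS is where a
<> semi/anti join stops matching (or excluding) nearly every outer row.

```python
def semi_join_ne(outer, inner):
    """Outer rows r for which EXISTS (SELECT 1 FROM inner s WHERE r <> s)."""
    return [r for r in outer if any(r != s for s in inner)]


def anti_join_ne(outer, inner):
    """Outer rows r for which NOT EXISTS (SELECT 1 FROM inner s WHERE r <> s)."""
    return [r for r in outer if not any(r != s for s in inner)]


outer = [1, 2, 3]

# Empty RHS: EXISTS is never true, so the semi join keeps nothing
# and the anti join keeps everything.
empty_semi = semi_join_ne(outer, [])
empty_anti = anti_join_ne(outer, [])

# Single-value RHS: only outer rows equal to that value fail the <> test.
single_semi = semi_join_ne(outer, [2, 2])
single_anti = anti_join_ne(outer, [2, 2])

# Two or more distinct RHS values: every outer row differs from at least one.
multi_semi = semi_join_ne(outer, [1, 2])
multi_anti = anti_join_ne(outer, [1, 2])
```

With two or more distinct values on the RHS the semi join keeps every
outer row and the anti join keeps none, which is the general case; only
the empty and single-value cases behave differently.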

>> Please find attached a new version, and a test script I used, which
>> shows a bunch of interesting cases. I'll add this to the commitfest.
>
> I added some "stable" tests to your patch taking inspiration from the
> test SQL file. I think those will be stable across machines and runs.
> Please let me know if those look good to you.

Hmm. But they show actual rows, not plan->plan_rows, and although the
former is interesting as a sanity check the latter is the thing under
test here. It seems like we don't have fine enough control of
EXPLAIN's output to show estimated rows but not cost. I suppose we
could try to capture EXPLAIN's output somehow (plpgsql dynamic
execution or spool output from psql?) and then pull out just the row
estimates, maybe with extra rounding to cope with instability.
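
As a rough sketch of the "capture and pull out the row estimates" idea,
the post-processing could look something like the following. This is
only an illustration in Python, not something from the patch: the helper
names, the "rows~" output format, and the sample plan text are all made
up, and it assumes the usual text-format node lines of the form
"(cost=a..b rows=N width=W)".

```python
import re

# Matches a plan node line in text-format EXPLAIN output (assumed shape).
_NODE = re.compile(
    r"^(?P<label>.*?)\s+\(cost=\d+\.\d+\.\.\d+\.\d+ rows=(?P<rows>\d+) width=\d+\)"
)


def round_sig(n, digits=2):
    """Round a non-negative integer to `digits` significant digits (half up)."""
    if n < 10 ** digits:
        return n
    factor = 10 ** (len(str(n)) - digits)
    return ((n + factor // 2) // factor) * factor


def extract_row_estimates(explain_text, digits=2):
    """Keep only node labels and rounded row estimates; drop cost and width."""
    out = []
    for line in explain_text.splitlines():
        m = _NODE.match(line)
        if m:  # lines such as "Hash Cond: ..." carry no estimate and are skipped
            rows = round_sig(int(m.group("rows")), digits)
            out.append(f"{m.group('label')} rows~{rows}")
    return out


# Hypothetical sample, written by hand for illustration only.
SAMPLE = """Hash Join  (cost=1.09..2.22 rows=5 width=8)
  Hash Cond: (a.x = b.x)
  ->  Seq Scan on a  (cost=0.00..1.05 rows=1234 width=4)"""

result = extract_row_estimates(SAMPLE)
```

Rounding to a couple of significant digits is one way to absorb small
differences in the estimates between machines and runs; how coarse it
needs to be would have to be found by experiment.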

--
Thomas Munro
http://www.enterprisedb.com
