Re: [HACKERS] <> join selectivity estimate question

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] <> join selectivity estimate question
Date: 2017-12-01 00:16:41
Message-ID: CAEepm=1J7F+VpSViTLs_Te+Fuzrg3qJK6osEABd=2yMMHF-JMA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Dec 1, 2017 at 10:41 AM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> On Fri, Dec 1, 2017 at 4:05 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> Hmm, do you have an example of the better but still-funky estimates
>> handy? Like an EXPLAIN plan?
>
> Sure. Here's some EXPLAIN ANALYZE output from scale 3 TPCH + a few
> indexes[1]. There's a version from HEAD with and without commit
> 7ca25b7d.

So, in that plan we saw anti-join estimate 1 row but really there were
13462. If you remove most of Q21 and keep just the anti-join between
l1 and l3, then you try removing different quals, you can see the the
problem is not the <> qual:

select count(*)
from lineitem l1
where not exists (
select *
from lineitem l3
where l3.l_orderkey = l1.l_orderkey
and l3.l_suppkey <> l1.l_suppkey
and l3.l_receiptdate > l3.l_commitdate
)
=> estimate=1 actual=8998304

select count(*)
from lineitem l1
where not exists (
select *
from lineitem l3
where l3.l_suppkey <> l1.l_suppkey
)
=> estimate=1 actual=0

select count(*)
from lineitem l1
where not exists (
select *
from lineitem l3
where l3.l_orderkey = l1.l_orderkey
)
=> estimate=1 actual=0

select count(*)
from lineitem l1
where not exists (
select *
from lineitem l3
where l3.l_orderkey = l1.l_orderkey
and l3.l_receiptdate > l3.l_commitdate
)
=> estimate=1 actual=294884

The = and <> quals see to be estimated well, but when that filter on
l3 is added we go off the rails. It removes about 37% of the rows in
l3, and means that we sometimes don't find a match, so the anti-join
produces some rows.

--
Thomas Munro
http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2017-12-01 00:33:04 Re: Combine function returning NULL unhandled?
Previous Message Michael Paquier 2017-11-30 22:59:26 Re: Commit fest 2017-11