Re: <> join selectivity estimate question

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: <> join selectivity estimate question
Date: 2017-03-17 22:49:13
Views: Raw Message | Whole Thread | Download mbox
Lists: pgsql-hackers

On Sat, Mar 18, 2017 at 6:14 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> After a bit more thought, it seems like the bug here is that "the
> fraction of the LHS that has a non-matching row" is not one minus
> "the fraction of the LHS that has a matching row". In fact, in
> this example, *all* LHS rows have both matching and non-matching
> RHS rows. So the problem is that neqjoinsel is doing something
> that's entirely insane for semijoin cases.
> It would not be too hard to convince me that neqjoinsel should
> simply return 1.0 for any semijoin/antijoin case, perhaps with
> some kind of discount for nullfrac. Whether or not there's an
> equal row, there's almost always going to be non-equal row(s).
> Maybe we can think of a better implementation but that seems
> like the zero-order approximation.

Right. If I temporarily hack neqjoinsel() thus:

result = 1.0 - result;
+ if (jointype == JOIN_SEMI)
+ result = 1.0;

... then I obtain sensible row estimates and the following speedups
for TPCH Q21:

8 workers = 8.3s -> 7.8s
7 workers = 8.2s -> 7.9s
6 workers = 8.5s -> 8.2s
5 workers = 8.9s -> 8.5s
4 workers = 9.5s -> 9.1s
3 workers = 39.7s -> 9.9s
2 workers = 36.9s -> 11.7s
1 worker = 38.2s -> 15.0s
0 workers = 47.9s -> 24.7s

The plan is similar to the good plan from before even at lower worker
counts, but slightly better because the aggregation has been pushed
under the Gather node. See attached.

Thomas Munro

Attachment Content-Type Size
hacked_q21_4workers.txt text/plain 5.9 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2017-03-17 22:59:51 Re: <> join selectivity estimate question
Previous Message Andres Freund 2017-03-17 22:44:44 Introduce expression initialization hook?