| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
|---|---|
| To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
| Cc: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: <> join selectivity estimate question |
| Date: | 2017-03-17 16:59:40 |
| Message-ID: | 14543.1489769980@sss.pgh.pa.us |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> The relevant code is in neqsel(). It estimates the fraction of rows
> that will be equal, and then does 1 - that number. Evidently, the
> query planner thinks that l1.l_suppkey = l2.l_suppkey would almost
> always be true, and therefore l1.l_suppkey <> l2.l_suppkey will almost
> always be false. I think the presumed selectivity of l1.l_suppkey =
> l2.l_suppkey is being computed by var_eq_non_const(), but I'm a little
> puzzled by that function is managing to produce a selectivity estimate
> of, essentially, 1.
No, I believe it's going through neqjoinsel and thence to eqjoinsel_semi.
This query will have been flattened into a semijoin.
I can reproduce a similarly bad estimate in the regression database:
regression=# explain select * from tenk1 a where exists(select * from tenk1 b where a.thousand = b.thousand and a.twothousand <> b.twothousand);
QUERY PLAN
-------------------------------------------------------------------------
Hash Semi Join (cost=583.00..1067.25 rows=1 width=244)
Hash Cond: (a.thousand = b.thousand)
Join Filter: (a.twothousand <> b.twothousand)
-> Seq Scan on tenk1 a (cost=0.00..458.00 rows=10000 width=244)
-> Hash (cost=458.00..458.00 rows=10000 width=8)
-> Seq Scan on tenk1 b (cost=0.00..458.00 rows=10000 width=8)
(6 rows)
The problem here appears to be that we don't have any MCV list for
the "twothousand" column (because it has a perfectly flat distribution),
and the heuristic that eqjoinsel_semi is using for the no-MCVs case
is falling down badly.
regards, tom lane
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Michael Banck | 2017-03-17 17:12:11 | Re: [patch] reorder tablespaces in basebackup tar stream for backup_label |
| Previous Message | Corey Huinker | 2017-03-17 16:56:00 | Re: \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless) |