Quick Links

Re: <> join selectivity estimate question

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: <> join selectivity estimate question
Date:	2017-03-17 16:14:29
Message-ID:	CA+TgmoYQXgkxcH0jAcaUC6dtqLz5PsV5JT8bp6Oh4XPPsOTzPw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Fri, Mar 17, 2017 at 1:54 AM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> SELECT *
> FROM lineitem l1
> WHERE EXISTS (SELECT *
> FROM lineitem l2
> WHERE l1.l_orderkey = l2.l_orderkey);
>
> -> estimates 59986012 rows, actual rows 59,986,052 (scale 10 TPCH)
>
> SELECT *
> FROM lineitem l1
> WHERE EXISTS (SELECT *
> FROM lineitem l2
> WHERE l1.l_orderkey = l2.l_orderkey
> AND l1.l_suppkey <> l2.l_suppkey);
>
> -> estimates 1 row, actual rows 57,842,090 (scale 10 TPCH)

The relevant code is in neqsel(). It estimates the fraction of rows
that will be equal, and then does 1 - that number. Evidently, the
query planner thinks that l1.l_suppkey = l2.l_suppkey would almost
always be true, and therefore l1.l_suppkey <> l2.l_suppkey will almost
always be false. I think the presumed selectivity of l1.l_suppkey =
l2.l_suppkey is being computed by var_eq_non_const(), but I'm a little
puzzled by that function is managing to produce a selectivity estimate
of, essentially, 1.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

<> join selectivity estimate question at 2017-03-17 05:54:46 from Thomas Munro

Responses

Re: <> join selectivity estimate question at 2017-03-17 16:59:40 from Tom Lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Osahon Oduware	2017-03-17 16:22:05	Re: QGIS Seem To Bypass PostgreSQL/PostGIS User Privileges/Permissions
Previous Message	Giuseppe Broccolo	2017-03-17 16:13:33	Re: QGIS Seem To Bypass PostgreSQL/PostGIS User Privileges/Permissions