Re: 200 = 199 + 1?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Marko Tiikkaja <marko(at)joh(dot)to>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 200 = 199 + 1?
Date: 2017-10-01 21:27:53
Message-ID: 29652.1506893273@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> I experimented a bit with the attached patch, which modifies
> eqjoinsel_semi in two ways. First, if the number-of-distinct-values
> estimate for the inner rel is just a default rather than having any
> real basis, it replaces it with inner_rel->rows, effectively assuming
> that the inside of the IN or EXISTS is unique.

That might or might not be a good idea ...

> Second, it drops the
> fallback to selectivity 0.5 altogether, just applying the nd1 vs nd2
> heuristic all the time.

... but this probably isn't. A review of the history shows me that
this change amounts to reverting 3f5d2fe30, which doesn't seem like
a good plan because that was fixing user-reported misbehavior.
The problem is still that the nd2/nd1 calculation can produce garbage
if nd2 or nd1 is made-up. It's possible that we can get away with using
nd2 = inner_rel->rows as a suitable substitute for a default nd2, but
I'm much less happy to assume something like that for nd1.

Now, there's still not much to defend the 0.5 selectivity in particular;
according to the commit log for 3f5d2fe30, I used that because it
reproduced the behavior of previous versions that didn't understand what
a semijoin was at all. So we could perhaps substitute some other rule
there, but I don't know what.

I also note that the precise behavior of HEAD in this area only dates
back to ca5f88502, which hasn't even shipped in a release yet, so it
surely doesn't have a lot of field experience justifying it.

Other useful-though-not-terribly-recent discussions in this area include

https://www.postgresql.org/message-id/flat/201201112110.40403.andres%40anarazel.de

https://www.postgresql.org/message-id/13290.1335976455@sss.pgh.pa.us

Given that eqjoinsel_semi has been more or less a constant source of
issues, maybe we need to think about a wholesale redesign rather than
just incremental tweaking. But I have few ideas about what would be
better.

regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2017-10-01 21:47:52 Re: pg_ctl kill support for KILL signal was Re: [COMMITTERS] pgsql: Add test for postmaster crash restarts.
Previous Message Daniel Gustafsson 2017-10-01 21:27:30 Re: [PATCH] Off-by-one error in logical slot resource retention