Re: planner missing a trick for foreign tables w/OR conditions

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Eric Ridge <e_ridge(at)tcdi(dot)com>
Subject: Re: planner missing a trick for foreign tables w/OR conditions
Date: 2013-12-17 17:28:33
Message-ID: 12558.1387301313@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Mon, Dec 16, 2013 at 6:59 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> The hard part is not extracting the partial qual. The hard part is
>> trying to make sure that adding this entirely-redundant scan qual doesn't
>> catastrophically degrade join size estimates.

> OK, I had a feeling that's where the problem was likely to be. Do you
> have any thoughts about a more principled way of solving this problem?
> I mean, off-hand, it's not clear to me that the comments about this
> being a MAJOR HACK aren't overstated.

Well, the business about injecting the correction by adjusting a cached
selectivity is certainly a hack, but it's not one that I think is urgent
to get rid of; I don't foresee anything that's likely to break it soon.

> I might be missing something, but I suspect it works fine if every
> path for the relation is generating the same rows.

I had been thinking it would fall down if there are several OR conditions
affecting different collections of rels, but after going through the math
again, I'm now thinking I was wrong and it does in fact work out. As you
say, we do depend on all paths generating the same rows, but since the
extracted single-rel quals are inserted as plain baserestrictinfo quals,
that'll be true.

A bigger potential objection is that we're opening ourselves to larger
problems with estimation failures due to correlated qual conditions, but
again I'm finding that the math doesn't bear that out. It's reasonable
to assume that our estimate for the extracted qual will be better than
our estimate for the OR as a whole, so our adjusted size estimates for
the filtered base relations are probably OK. And the adjustment to the
OR clause selectivity means that the size estimate for the join comes
out exactly the same. We'll actually be better off than with what is
likely to happen now, which is that people manually extract the simplified
condition and insert it into the query explicitly. PG doesn't realize
that that's redundant and so will underestimate the join size.
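
To make the arithmetic above concrete, here is a rough sketch in Python.
It is not planner code: it assumes a simplified model in which the join
size is just the product of the input row counts and independent clause
selectivities, and the function name, parameter names, and numbers are all
made up for illustration.

```python
def join_rows(rows_a, rows_b, sel_join, sel_or,
              sel_extracted=1.0, adjust_or=False):
    """Estimate join size under a simple independence model (assumption).

    sel_extracted: selectivity of the single-rel qual pulled out of the OR
                   and applied as a scan filter on relation A.
    adjust_or:     if True, divide the OR clause's selectivity by
                   sel_extracted so the filter is not counted twice.
    """
    filtered_a = rows_a * sel_extracted
    effective_or = sel_or / sel_extracted if adjust_or else sel_or
    return filtered_a * rows_b * sel_join * effective_or


# Hypothetical numbers, chosen only to show the effect.
ROWS_A, ROWS_B = 1_000_000, 10_000
SEL_JOIN, SEL_OR, SEL_EXTRACTED = 0.0001, 0.02, 0.05

# No extraction at all: the OR is evaluated only at the join.
baseline = join_rows(ROWS_A, ROWS_B, SEL_JOIN, SEL_OR)

# Extraction plus the compensating adjustment to the OR's selectivity.
with_adjustment = join_rows(ROWS_A, ROWS_B, SEL_JOIN, SEL_OR,
                            sel_extracted=SEL_EXTRACTED, adjust_or=True)

# Redundant qual written into the query by hand, no adjustment.
manual_duplicate = join_rows(ROWS_A, ROWS_B, SEL_JOIN, SEL_OR,
                             sel_extracted=SEL_EXTRACTED, adjust_or=False)

print(baseline, with_adjustment, manual_duplicate)
```

In this model the adjusted estimate equals the baseline, while the
hand-written duplicate comes out too low by a factor of sel_extracted.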

So at this point I'm pretty much talked into it. We could eliminate the
dependence on indexes entirely, and replace this code with a step that
simply tries to pull single-base-relation quals out of ORs wherever it can
find one. You could argue that the produced quals would sometimes not be
worth testing for, but we could apply a heuristic that says to forget it
unless the estimated selectivity of the extracted qual is less than,
I dunno, 0.5 maybe. (I wonder if it'd be worth inserting a check that
there's not already a manually-generated equivalent clause, too ...)
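
A minimal sketch of that extraction step, again in Python rather than
planner C. The data model is an assumption made for illustration: an OR
clause is a list of arms, each arm is a list of (relation, predicate text)
conjuncts, and the selectivity estimator is a toy that treats predicates
as independent. None of these names exist in PostgreSQL.

```python
from typing import Dict, List, Optional, Tuple

Conjunct = Tuple[str, str]      # (relation name, predicate text)
Arm = List[Conjunct]            # conjuncts ANDed together


def extract_for_rel(or_arms: List[Arm], rel: str) -> Optional[List[List[str]]]:
    """Pull out the part of an OR that mentions only `rel`.

    Returns one list of predicates per arm, or None when some arm has no
    predicate on `rel` (that arm would not restrict `rel`, so nothing
    useful can be extracted).
    """
    extracted = []
    for arm in or_arms:
        own = [pred for (r, pred) in arm if r == rel]
        if not own:
            return None
        extracted.append(own)
    return extracted


def toy_selectivity(extracted: List[List[str]],
                    per_pred: Dict[str, float]) -> float:
    """Toy estimate: AND multiplies, OR combines as 1 - prod(1 - s)."""
    miss_all = 1.0
    for arm in extracted:
        arm_sel = 1.0
        for pred in arm:
            arm_sel *= per_pred[pred]
        miss_all *= 1.0 - arm_sel
    return 1.0 - miss_all


def should_add(or_arms: List[Arm], rel: str,
               per_pred: Dict[str, float], threshold: float = 0.5) -> bool:
    """Apply the 'forget it unless selectivity < threshold' heuristic."""
    extracted = extract_for_rel(or_arms, rel)
    return extracted is not None and toy_selectivity(extracted, per_pred) < threshold


# (a.x = 1 AND b.y > 0) OR (a.x = 3 AND b.z = 4), with made-up selectivities.
arms = [[("a", "a.x = 1"), ("b", "b.y > 0")],
        [("a", "a.x = 3"), ("b", "b.z = 4")]]
sels = {"a.x = 1": 0.01, "a.x = 3": 0.01, "b.y > 0": 0.9, "b.z = 4": 0.1}

print(extract_for_rel(arms, "a"), should_add(arms, "a", sels))
print(extract_for_rel(arms, "b"), should_add(arms, "b", sels))
```

Here the qual extracted for a is selective enough to keep, while the one
for b is extractable but fails the threshold, and a relation that is
missing from any arm yields nothing at all.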

A very nice thing about this is we could do this step ahead of relation
size estimate setting and thus remove the redundant work that currently
happens in set_plain_rel_size when the optimization fires. Which is
another aspect of the current code that's a hack, so getting rid of it
would be a net reduction in hackiness.

regards, tom lane
