Re: planner missing a trick for foreign tables w/OR conditions

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Eric Ridge <e_ridge(at)tcdi(dot)com>
Subject: Re: planner missing a trick for foreign tables w/OR conditions
Date: 2013-12-17 18:04:27
Message-ID: CA+Tgmoa_bhRSVxAnu-rv1FK_0BtDB-GwDXwU6epzArFVpHABaA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Dec 17, 2013 at 12:28 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I had been thinking it would fall down if there are several OR conditions
> affecting different collections of rels, but after going through the math
> again, I'm now thinking I was wrong and it does in fact work out. As you
> say, we do depend on all paths generating the same rows, but since the
> extracted single-rel quals are inserted as plain baserestrictinfo quals,
> that'll be true.

OK.

> A bigger potential objection is that we're opening ourselves to larger
> problems with estimation failures due to correlated qual conditions, but
> again I'm finding that the math doesn't bear that out. It's reasonable
> to assume that our estimate for the extracted qual will be better than
> our estimate for the OR as a whole, so our adjusted size estimates for
> the filtered base relations are probably OK. And the adjustment to the
> OR clause selectivity means that the size estimate for the join comes
> out exactly the same. We'll actually be better off than with what is
> likely to happen now, which is that people manually extract the simplified
> condition and insert it into the query explicitly. PG doesn't realize
> that that's redundant and so will underestimate the join size.

I had not thought of that, but it seems like a valid point.

> So at this point I'm pretty much talked into it. We could eliminate the
> dependence on indexes entirely, and replace this code with a step that
> simply tries to pull single-base-relation quals out of ORs wherever it can
> find one. You could argue that the produced quals would sometimes not be
> worth testing for, but we could apply a heuristic that says to forget it
> unless the estimated selectivity of the extracted qual is less than,
> I dunno, 0.5 maybe.

This is an area where I think things are very different from local and
remote tables. For a local table, the cost of transmitting a row from
one plan node to another is basically zero. For a remote table, it's
potentially far higher, although unfortunately it's hard to know
exactly. But if the qual is cheap to evaluate, and we're getting back
a lot of rows, I suspect even eliminating 5-10% of them could work out
to a win. With a local table, 50% sounds like a reasonable number.

Another point to ponder is that there could be places where this
actually changes the plan significantly for the better. Pre-filtering
one side of a join might, for example, reduce the amount of data on
one side to the point where a hash join is chosen over some other
strategy. I don't know that this will actually help all that many
people but the best case is pretty dramatic for those it does help:
the partial qual might be almost as selective as the join condition
itself.

> (I wonder if it'd be worth inserting a check that
> there's not already a manually-generated equivalent clause, too ...)

Sounds a little too clever IMHO.

> A very nice thing about this is we could do this step ahead of relation
> size estimate setting and thus remove the redundant work that currently
> happens in set_plain_rel_size when the optimization fires. Which is
> another aspect of the current code that's a hack, so getting rid of it
> would be a net reduction in hackiness.

I'm not sure that would save anything measurable performance-wise, but
the hackiness reduction would be nice to have.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Josh Berkus 2013-12-17 18:13:55 Re: Extension Templates S03E11
Previous Message Tom Lane 2013-12-17 17:54:40 Re: Extension Templates S03E11