Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dmitry Astapov <dastapov(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
Date: 2021-05-12 15:54:06
Message-ID: 1099349.1620834846@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dmitry Astapov <dastapov(at)gmail(dot)com> writes:
> I am trying to understand the behaviour of the query planner regarding the
> push-down of the conditions "through" the join.

I think your mental model is wrong. What's actually happening here is
that the planner uses equivalence classes to deduce implied conditions.
That is, we have the join condition a.adate = b.bdate and then you've
added the where condition a.adate = '2021-05-12'. Transitivity implies
that b.bdate = '2021-05-12', so we deduce that condition and are able
to apply it at the relation scan of b. Furthermore, having restricted
both a.adate and b.bdate to the same constant value at the scan level,
we no longer need to apply the join condition a.adate = b.bdate at all.
This is important not only to avoid the (probably minor) inefficiency
of rechecking the join condition, but because if we believed that all
three conditions were independently applicable, we'd come out with a
serious underestimate of the size of the join result.

> In my experiments, I was never able to get an execution plan that "pushes
> down" any condition apart from (=) through to the right side of the join,

None of the argument sketched above works for non-equality conditions.
There are some situations where you could probably figure out how to
use transitivity to deduce some implied condition, but cleaning things
up so that you don't have redundant conditions fouling up the join
size estimates seems like a hard problem.

Another issue is that we could easily expend a lot of cycles on deductions
that lead nowhere, because once you try to open up the mechanism to
consider operators other than equality, there will be a lot of things that
it looks at and then fails to do anything with. The equivalence class
mechanism is tied into the same logic that considers merge and hash joins,
so we are expending lots of cycles anytime we see an equality operator,
and not so much for other operators.

> Equally surprising is that I was unable to find documentation or past
> mailing list discussions of this or similar topic, which leads me to
> believe that I am just not familiar with the proper terminology and can't
> come up with the right search terms.

src/backend/optimizer/README has a discussion of equivalence classes.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Dilger 2021-05-12 15:59:18 Re: Granting control of SUSET gucs to non-superusers
Previous Message Aleksander Alekseev 2021-05-12 15:25:14 Re: Extending amcheck to check toast size and compression