Re: Reduce "Var IS [NOT] NULL" quals during constant folding

From: Andrei Lepikhov <lepihov(at)gmail(dot)com>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, David Rowley <dgrowleyml(at)gmail(dot)com>, Tender Wang <tndrwang(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Reduce "Var IS [NOT] NULL" quals during constant folding
Date: 2025-07-03 09:08:54
Message-ID: a5f93486-fcc8-45a5-a62e-86051fdd7142@gmail.com
Lists: pgsql-hackers

On 3/7/2025 02:30, Richard Guo wrote:
> On Wed, Jul 2, 2025 at 6:44 PM Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
>> I apologise for the confusion in my previous message. I am not
>> suggesting that we postpone this. Instead, I would like an explanation
>> of why you believe that accessing the table statistics earlier could
>> negatively impact planner performance. As I mentioned before, I have
>> only envisioned rare instances where join eliminations may reduce the
>> number of relations and clause evaluations resulting in a constant.
>
> I wonder how you arrived at the conclusion that these cases are rare.
> If they truly are, then why have we invested so much effort in
> optimizing for them?
There is no direct connection between effort and frequency; effort
depends primarily on personal motivation. As you may have noticed, much
of the effort goes into convincing the community.
From the Postgres perspective, these specific cases should be rare: the
planner's code stays simple on the assumption that writing an
appropriate query is the user's responsibility.
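The reduction named in the subject line can be sketched roughly as follows. A "Var IS [NOT] NULL" qual folds to a constant only when the referenced column is declared NOT NULL and no outer join can null the Var. This is a minimal C sketch under stated assumptions: `RelNotNullInfo`, `reduce_nulltest()`, `NullTestKind` and `QualResult` are hypothetical stand-ins, not the planner's real structures or the code of the patch under discussion.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for planner structures; not PostgreSQL's real API. */
typedef enum { NT_IS_NULL, NT_IS_NOT_NULL } NullTestKind;
typedef enum { QUAL_UNCHANGED, QUAL_TRUE, QUAL_FALSE } QualResult;

typedef struct
{
    int         natts;        /* number of user columns in the relation */
    const bool *attnotnull;   /* attnotnull[i] is true if column i+1 is NOT NULL */
} RelNotNullInfo;

/*
 * Try to fold "Var IS [NOT] NULL" into a constant.  This is only safe when
 * the column is declared NOT NULL and the Var cannot be nulled by an outer
 * join beneath the qual; otherwise the qual is left alone.
 */
static QualResult
reduce_nulltest(const RelNotNullInfo *rel, int attno,
                NullTestKind kind, bool nullable_by_outer_join)
{
    if (attno < 1 || attno > rel->natts)
        return QUAL_UNCHANGED;    /* system column or out of range */
    if (!rel->attnotnull[attno - 1] || nullable_by_outer_join)
        return QUAL_UNCHANGED;    /* the column may really be NULL */
    return (kind == NT_IS_NOT_NULL) ? QUAL_TRUE : QUAL_FALSE;
}
```

The outer-join check matters: a NOT NULL column read from the nullable side of a LEFT JOIN can still produce NULLs, so folding there would change query results.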

>
> I also wonder why you think we should collect all catalog information
> at the very early stage of the planner, given that most of it is only
> used much later -- after RelOptInfos have been created. If the goal
> is to avoid redundant catalog retrieval for the same relation in
> get_relation_info(), perhaps adding a caching mechanism within that
> function would be a more targeted solution. I don't see a strong
> reason for moving get_relation_info() to the very beginning of the
> planner.
This suggests there is still room for further exploration and
discussion. For starters, the 'Redundant NullTest' issue is not the only
concern. Postgres also applies pull-up transformations blindly, without
consulting the cost model. Yet each pull-up has its corner cases, and in
practice we often see new complaints arise after a new pull-up technique
is committed. One possible solution I envision is to examine indexes
and/or make rough initial estimates to avoid problematic pull-up cases.

--
regards, Andrei Lepikhov
