Re: Wired if-statement in gen_partprune_steps_internal

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Wired if-statement in gen_partprune_steps_internal
Date: 2021-04-07 14:07:23
Message-ID: CA+HiwqFx6BBk17wt7A1Dkmo_1E-QhTOavOWL7nFCnCMOR1O3Jw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Apr 7, 2021 at 6:53 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> On Wed, 7 Apr 2021 at 21:04, Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
> >
> > On Wed, Apr 7, 2021 at 4:43 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> > > However, it does change the meaning of what PARTCLAUSE_MATCH_STEPS
> > > does. If we ever needed to expand what PARTCLAUSE_MATCH_STEPS does,
> > > then we'll have less flexibility with the newly updated code. For
> > > example if we needed to return multiple steps and only combine them at
> > > the top level then we now can't. I feel there's a good possibility
> > > that we'll never need to do that, but I'm not certain of that.
> > >
> > > I'm keen to hear your opinion on this.
> >
> > That's a good point. So maybe gen_partprune_steps_internal() should
> > continue to return a list of steps, the last of which would be an
> > intersect step to combine the results of the earlier multiple steps.
> > We should still fix the originally reported issue that
> > gen_prune_steps_from_opexps() seems to needlessly add an intersect
> > step.
>
> I was hoping you'd just say that we'll likely not need to do that and
> if we ever did we could adapt the code at that time. :)
>
> Thinking more about it, these steps we're talking about are generated
> from a recursive call to gen_partprune_steps_internal(). I'm finding
> it very hard to imagine that we'd want to combine steps generated in
> some recursive call with steps from outside that same call. Right now
> we recuse into AND BoolExprs OR BoolExprs. I'm struggling to think of
> why we'd want to combine a set of steps we generated processing some
> of those with steps from outside that BoolExpr. If we did, we might
> want to consider teaching canonicalize_qual() to fix it beforehand.
>
> e.g.
>
> postgres=# explain select * from ab where (a = 1 and b = 1) or (a = 1
> and b = 2);
> QUERY PLAN
> ---------------------------------------------------
> Seq Scan on ab (cost=0.00..49.55 rows=1 width=8)
> Filter: ((a = 1) AND ((b = 1) OR (b = 2)))
> (2 rows)
>
> If canonicalize_qual() had been unable to rewrite that WHERE clause
> then I could see that we might want to combine steps from other
> recursive quals. I'm thinking right now that I'm glad
> canonicalize_qual() does that hard work for us.
> (I think partprune.c
> could handle the original WHERE clause as-is in this example
> anyway...)

Actually, I am not sure that canonicalization always makes things
better for partprune.c. I can show examples where canonicalization
causes partprune.c as it is today to not be able to prune as optimally
as it could have with the original ones.

create table ab (a int, b int) partition by range (a, b);
create table ab0 partition of ab for values from (1, 1) to (1, 2);
create table ab1 partition of ab for values from (1, 2) to (1, 3);
create table ab2 partition of ab for values from (1, 3) to (1, 4);
create table ab3 partition of ab for values from (2, 1) to (2, 2);

explain select * from ab where (a = 1 and b = 1) or (a = 1 and b = 2);
QUERY PLAN
---------------------------------------------------------------
Append (cost=0.00..148.66 rows=3 width=8)
-> Seq Scan on ab0 ab_1 (cost=0.00..49.55 rows=1 width=8)
Filter: ((a = 1) AND ((b = 1) OR (b = 2)))
-> Seq Scan on ab1 ab_2 (cost=0.00..49.55 rows=1 width=8)
Filter: ((a = 1) AND ((b = 1) OR (b = 2)))
-> Seq Scan on ab2 ab_3 (cost=0.00..49.55 rows=1 width=8)
Filter: ((a = 1) AND ((b = 1) OR (b = 2)))
(7 rows)

explain select * from ab where (a = 1 and b = 1) or (a = 1 and b = 3);
QUERY PLAN
---------------------------------------------------------------
Append (cost=0.00..148.66 rows=3 width=8)
-> Seq Scan on ab0 ab_1 (cost=0.00..49.55 rows=1 width=8)
Filter: ((a = 1) AND ((b = 1) OR (b = 3)))
-> Seq Scan on ab1 ab_2 (cost=0.00..49.55 rows=1 width=8)
Filter: ((a = 1) AND ((b = 1) OR (b = 3)))
-> Seq Scan on ab2 ab_3 (cost=0.00..49.55 rows=1 width=8)
Filter: ((a = 1) AND ((b = 1) OR (b = 3)))
(7 rows)

I would've expected the 1st query to scan ab0 and ab1, whereas the 2nd
query to scan ab0 and ab2. But in the canonicalized version, the
AND's 2nd arm is useless for multi-column range pruning, because it
only provides clauses for the 2nd key. With the original version,
both arms of the OR have ANDed clauses covering both keys, so pruning
with that would have produced the desired result.

So, if I am not entirely wrong, maybe it is exactly because of
canonicalization that partprune.c should be looking to peek across
BoolExprs.

--
Amit Langote
EDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Euler Taveira 2021-04-07 14:09:41 Re: Why is specifying oids = false multiple times in create table is silently ignored?
Previous Message Andy Fan 2021-04-07 14:04:11 Cost model improvement for run-time partition prune