Re: Runtime pruning problem

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Yuzuko Hosoya <hosoya(dot)yuzuko(at)lab(dot)ntt(dot)co(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Runtime pruning problem
Date: 2019-07-31 02:29:56
Message-ID: CA+HiwqHwGnUuC6-4Lx+xS7Qr6hnqJnRHpapFwpSiDuY-s3Hhxg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jul 31, 2019 at 8:31 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> writes:
> > On Wed, 31 Jul 2019 at 10:56, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> The portion of this below the Append is fine, but I argue that
> >> the Vars above the Append should say "part", not "part_p1".
> >> In that way they'd look the same regardless of which partitions
> >> have been pruned or not.
>
> > That seems perfectly reasonable for Append / MergeAppend that are for
> > scanning partitioned tables. What do you propose we do for inheritance
> > and UNION ALLs?
>
> For inheritance, I don't believe there would be any change, precisely
> because we've historically used the parent rel as reference.

I may be missing something, but Vars above an Append/MergeAppend,
whether it's scanning a partitioned table or a regular inheritance
table, always refer to the first child subplan, which may or may not
be for the inheritance parent in its role as a child, not the Append
parent.

create table parent (a int);
alter table only parent add check (a = 1) no inherit;
create table child1 (a int check (a = 2)) inherits (parent);
create table child2 (a int check (a = 3)) inherits (parent);

explain (costs off, verbose) select * from parent where a > 1 order by 1;
QUERY PLAN
───────────────────────────────────────
Sort
Output: child1.a
Sort Key: child1.a
-> Append
-> Seq Scan on public.child1
Output: child1.a
Filter: (child1.a > 1)
-> Seq Scan on public.child2
Output: child2.a
Filter: (child2.a > 1)
(10 rows)

I think this is because we replace the original targetlist of such
nodes by a dummy one using set_dummy_tlist_references(), where all the
parent Vars are re-stamped with OUTER_VAR as varno. When actually
printing the EXPLAIN VERBOSE output, ruleutils.c considers the first
child of Append as the OUTER referent, as set_deparse_planstate()
states:

/*
* We special-case Append and MergeAppend to pretend that the first child
* plan is the OUTER referent; we have to interpret OUTER Vars in their
* tlists according to one of the children, and the first one is the most
* natural choice.

If I change set_append_references() to comment out the
set_dummy_tlist_references() call, I get this output:

explain (costs off, verbose) select * from parent where a > 1 order by 1;
QUERY PLAN
───────────────────────────────────────
Sort
Output: a
Sort Key: a
-> Append
-> Seq Scan on public.child1
Output: child1.a
Filter: (child1.a > 1)
-> Seq Scan on public.child2
Output: child2.a
Filter: (child2.a > 1)
(10 rows)

Not parent.a as I had expected. That seems to be because parent's RTE
is considered unused in the plan. One might say that the plan's
Append node belongs to that RTE, but then Append doesn't have any RT
index attached to it, so it escapes ExplainPreScanNode()'s walk of the
plan tree to collect the indexes of "used RTEs". I changed
set_rtable_names() to get around that as follows:

@@ -3458,7 +3458,7 @@ set_rtable_names(deparse_namespace *dpns, List
*parent_namespaces,
/* Just in case this takes an unreasonable amount of time ... */
CHECK_FOR_INTERRUPTS();

- if (rels_used && !bms_is_member(rtindex, rels_used))
+ if (rels_used && !bms_is_member(rtindex, rels_used) && !rte->inh)

and I get:

explain (costs off, verbose) select * from parent where a > 1 order by 1;
QUERY PLAN
───────────────────────────────────────
Sort
Output: parent.a
Sort Key: parent.a
-> Append
-> Seq Scan on public.child1
Output: child1.a
Filter: (child1.a > 1)
-> Seq Scan on public.child2
Output: child2.a
Filter: (child2.a > 1)
(10 rows)

> For setops we've traditionally used the left input as reference.
> Maybe we could do better, but I'm not very sure how, since SQL
> doesn't actually provide any explicit names for the setop result.
> Making up a name with no basis in the query probably isn't an
> improvement, or at least not enough of one to justify a change.

I too am not sure what we should about Appends of setops, but with the
above hacks, I get this:

explain (costs off, verbose) select * from child1 union all select *
from child2 order by 1;
QUERY PLAN
───────────────────────────────────────
Sort
Output: "*SELECT* 1".a
Sort Key: "*SELECT* 1".a
-> Append
-> Seq Scan on public.child1
Output: child1.a
-> Seq Scan on public.child2
Output: child2.a
(8 rows)

whereas currently it prints:

explain (costs off, verbose) select * from child1 union all select *
from child2 order by 1;
QUERY PLAN
───────────────────────────────────────
Sort
Output: child1.a
Sort Key: child1.a
-> Append
-> Seq Scan on public.child1
Output: child1.a
-> Seq Scan on public.child2
Output: child2.a
(8 rows)

Thanks,
Amit

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Melanie Plageman 2019-07-31 03:07:21 Re: Avoiding hash join batch explosions with extreme skew and weird stats
Previous Message Andres Freund 2019-07-31 02:20:53 Re: tap tests driving the database via psql