Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
Date: 2019-03-09 14:04:30
Message-ID: CAKJS1f_Wt_tL3S32R3wpU86zQjuHfbnZbFt0eqm=qcRFcdbLvw@mail.gmail.com
Lists: pgsql-hackers

On Mon, 4 Mar 2019 at 16:26, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I was a bit surprised to find that I didn't need to fool around
> with lying about whether [Merge]Append can project. I've not dug
> into the exact reason why, but I suspect it's that previous changes
> made in support of parallelization have resulted in ensuring that
> we push the upper tlist down to the children anyway, at some earlier
> stage.

I can get a plan that does end up with Result nodes above a Scan node.

create table rangep (a int, b int) partition by range (a);
create table rangep1 partition of rangep for values from (0) to (1000000);
explain select r1::text from rangep r1 inner join rangep r2 on
r1::text = r2::text order by r1::text;

However, if we modify is_projection_capable_plan() in a similar way to
how ExecSupportsMarkRestore() has been modified (a rough sketch of what
I mean is below), then we'd need to ensure that any changes made to the
Append/MergeAppend's tlist are also made to the underlying node's
tlist. I'm unsure if that's worth the complexity. What do you think?

> I haven't looked into whether this does the right things for parallel
> planning --- possibly create_[merge]append_path need to propagate up
> parallel-related path fields from the single child?

I looked at this. For Append, with the current code in
add_paths_to_append_rel() we won't increase parallel_workers beyond
what the single child rel already has. The current code does:

parallel_workers = Max(parallel_workers, fls(list_length(live_childrels)));

so fls(1) == 1. However, I wonder if it's better to set that
explicitly in create_append_path() for the
list_length(pathnode->subpaths) == 1 case.
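
Explicitly, I mean something of this shape inside create_append_path()
(an untested sketch only, not what's in the attached patch):

    if (list_length(pathnode->subpaths) == 1)
    {
        Path   *child = (Path *) linitial(pathnode->subpaths);

        /*
         * sketch: the Append will be removed anyway, so take the child's
         * worker count rather than relying on the fls() calculation.
         */
        pathnode->path.parallel_workers = child->parallel_workers;
    }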

The other issue I found when looking into the parallel code was that
partial paths with pathkeys get made but are never used. The reason is
that Append can't normally do anything with them, since it can't
maintain their sort order, and MergeAppend is not a parallel-aware
node. I ended up adding some new code at the bottom of
generate_append_paths() to add these paths as single-subpath Append
nodes. This also required changing create_append_path() so that it
records the child's pathkeys when dealing with a single-subpath
Append.
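
That create_append_path() part looks roughly like this (sketch only;
the attached patch has the actual change):

    /*
     * sketch: with a single subpath the Append will be removed, so the
     * child's sort order survives and we can record its pathkeys.
     */
    if (list_length(pathnode->subpaths) == 1)
        pathnode->path.pathkeys =
            ((Path *) linitial(pathnode->subpaths))->pathkeys;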

> Also, I wonder why you didn't teach ExecSupportsMarkRestore that a
> single-child MergeAppend can support mark/restore. I didn't add such
> code here either, but I suspect that's an oversight.

Added code for that.
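
It's roughly of this shape in ExecSupportsMarkRestore() (a sketch; see
the patch for the real version):

    case T_MergeAppend:
        {
            MergeAppendPath *mpath = (MergeAppendPath *) pathnode;

            /*
             * sketch: a single-subpath MergeAppend is removed at plan
             * time, so defer to its only child.
             */
            if (list_length(mpath->subpaths) == 1)
                return ExecSupportsMarkRestore(
                    (Path *) linitial(mpath->subpaths));
        }
        return false;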

> One other remark is that the division of labor between
> create_[merge]append_path and their costsize.c subroutines
> seems pretty unprincipled. I'd be inclined to push all the
> relevant logic into costsize.c, but have not done so here.
> Moving the existing cost calculations in create_mergeappend_path
> into costsize.c would better be done as a separate refactoring patch,
> perhaps.

I've not touched that, but happy to do it as a separate patch.

I've attached my changes to your v14 patch.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
drop-useless-merge-appends-15.patch application/octet-stream 59.3 KB
