Re: Early Sort/Group resjunk column elimination.

From: Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
To: James Coleman <jtc331(at)gmail(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Julien Rouhaud <rjuju123(at)gmail(dot)com>
Subject: Re: Early Sort/Group resjunk column elimination.
Date: 2021-07-20 15:47:57
Message-ID: 2961622.EcX9pJ86yZ@aivenronan
Lists: pgsql-hackers

On Friday, July 16, 2021, 17:37:15 CEST, James Coleman wrote:
> Thanks for hacking on this; as you won't be surprised, given that I made the
> original suggestion, I'm particularly interested in this for the
> incremental sort benefits, but I also find the other examples you gave
> compelling.
>
> Of course I haven't seen code yet, but my first intuition is to try to
> avoid adding extra nodes and teach the (hopefully few) relevant nodes
> to remove the resjunk entries themselves. Presumably in this case that
> would mostly be the sort nodes (including gather merge).
>
> One thing to pay attention to here is that we can't necessarily remove
> resjunk entries every time in a sort node since, for example, in
> parallel mode the gather merge node above it will need those entries
> to complete the sort.

Yes, that is actually a concern, especially since the Gather Merge node is
already handled specially when applying a projection.
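To make the Gather Merge constraint concrete, here is a minimal sketch in C built on a made-up two-column row type. Row, sort_and_project() and the parent_merges flag are all hypothetical illustrations, not PostgreSQL or patch code. The node sorts on a resjunk key and then drops that key from its output, unless a merging parent still needs it to combine the per-worker sorted streams.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical two-column row: "val" is what the query returns,
 * "sortkey" is a resjunk column needed only for ORDER BY. */
typedef struct Row {
    int val;
    int sortkey;
} Row;

/* qsort comparator on the resjunk sort key (ascending). */
static int
cmp_sortkey(const void *a, const void *b)
{
    int ka = ((const Row *) a)->sortkey;
    int kb = ((const Row *) b)->sortkey;

    return (ka > kb) - (ka < kb);
}

/* Sort rows in place on the resjunk key, then write the output columns
 * into out[]. If a merging parent (such as Gather Merge) sits above, the
 * key must be kept so the parent can merge sorted streams, giving 2
 * columns per output row; otherwise the resjunk column is dropped and
 * each output row has 1 column. Returns the output row width. */
static int
sort_and_project(Row *rows, size_t n, bool parent_merges, int *out)
{
    int width = parent_merges ? 2 : 1;

    assert(rows != NULL || n == 0);
    qsort(rows, n, sizeof(Row), cmp_sortkey);
    for (size_t i = 0; i < n; i++)
    {
        out[i * width] = rows[i].val;
        if (parent_merges)
            out[i * width + 1] = rows[i].sortkey;
    }
    return width;
}
```

In the parallel case, keeping the key costs one extra column per row. That is exactly the width the elimination is meant to save, but it cannot be avoided below a merge.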

>
> I'm interested to see what you're working on with a patch.

I am posting this proof of concept for the record, but I don't think its
numerous problems can be solved easily. I tried to teach Sort to perform a
limited form of projection, but that brings its own slate of problems...

Quick list of problems with the current implementation, leaving aside the fact
that it's quite hacky in a few places:

* Result nodes are added on top of numerous types of non-projection-capable
paths, since the final target above them includes resjunk columns that should
be eliminated.
* Handling appendrels seems difficult, as both ordered and unordered Appends
are generated at the same time against the same target.
* I'm having trouble understanding the usefulness of building physical
tlists for SubqueryScans.

The second patch is a very hacky attempt to eliminate some of the generated
Result nodes. The idea is to bypass the expression interpreter entirely when a
"simple" projection merely reduces the number of columns, and to teach Sort
and Result to perform it. To do this, I added a parameter to
is_projection_capable_path so that the test depends on the actual requested
target: a Sort node is considered projection-capable only for a "simple"
projection.
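A "simple" projection as described above could be detected with a check like the following minimal sketch. The expression and node kinds and the projection_capable_for_target() function are hypothetical simplifications for illustration. They are not the patch's actual change to is_projection_capable_path(). In the sketch, Sort accepts only simple projections and the other node kinds are modeled as fully projection-capable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified expression kinds; not PostgreSQL's node tags. */
typedef enum ExprKind { EXPR_OUTER_VAR, EXPR_CONST, EXPR_OTHER } ExprKind;

/* Hypothetical plan node kinds. */
typedef enum NodeKind { NODE_SORT, NODE_RESULT, NODE_SEQSCAN } NodeKind;

/* A projection is "simple" if every output expression is either a plain
 * reference to an input column or a constant: it only selects, reorders
 * or drops columns, so no expression evaluation is needed. */
static bool
is_simple_projection(const ExprKind *target, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (target[i] != EXPR_OUTER_VAR && target[i] != EXPR_CONST)
            return false;
    return true;
}

/* Hypothetical target-aware capability test: Sort accepts only simple
 * projections; every other node kind is modeled as fully capable. */
static bool
projection_capable_for_target(NodeKind kind, const ExprKind *target, size_t n)
{
    assert(target != NULL || n == 0);
    switch (kind)
    {
        case NODE_SORT:
            return is_simple_projection(target, n);
        default:
            return true;
    }
}
```

Making the answer depend on the requested target, rather than on the node type alone, is what allows a Sort to absorb a column-dropping projection without needing a Result node above it.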

The implementation uses a junkfilter, which assumes that nothing other than
Consts and outer Vars will be present in the target list.
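The junkfilter approach amounts to a per-column copy map with no expression evaluation. The sketch below assumes, as stated above, that every output column is either an outer Var (a copy of an input column) or a Const. ColMap and apply_column_map() are hypothetical names. The design loosely follows PostgreSQL's JunkFilter, whose jf_cleanMap records an input attribute number for each output column. Columns here are 0-based for simplicity, whereas PostgreSQL attribute numbers are 1-based.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one output column of a "simple" projection:
 * either a copy of an input column (src_attno >= 0) or a constant. */
typedef struct ColMap {
    int src_attno;  /* 0-based input column index, or -1 for a constant */
    int constval;   /* used only when src_attno == -1 */
} ColMap;

/* Apply the map to one input row: each output value is copied from an
 * input column or filled from a constant, without evaluating expressions. */
static void
apply_column_map(const ColMap *map, size_t nout,
                 const int *in, size_t nin, int *out)
{
    for (size_t i = 0; i < nout; i++)
    {
        if (map[i].src_attno < 0)
            out[i] = map[i].constval;
        else
        {
            assert((size_t) map[i].src_attno < nin);
            out[i] = in[map[i].src_attno];
        }
    }
}
```

Anything outside those two expression kinds, such as a function call, cannot be expressed in such a map, which is why the assumption has to hold before the interpreter can be bypassed.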

I don't feel like this is going anywhere, but at least it's here for
discussion and posterity, in case anyone is interested.

--
Ronan Dunklau

Attachment Content-Type Size
v1-0001-reduce_resjunk_sortgroup_columns.patch text/x-patch 22.0 KB
v1-0002-reduce_resjunk_sortgroup_columns.patch text/x-patch 13.0 KB
