Re: Ordered Append Node

From: Markus Schiltknecht <markus(at)bluegap(dot)ch>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Ordered Append Node
Date: 2007-11-22 19:37:20
Message-ID: 4745DA70.9060209@bluegap.ch
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello Gregory,

Gregory Stark wrote:
> I've been hacking on the idea of an Append node which maintains the ordering
> of its subtables merging their records in order. This is important for
> partitioned tables since otherwise a lot of plans are not available such as
> merge joins.

Cool!

Some time ago, I've been trying something very similar, but didn't get
very far. I'd like to share my thoughts anyway.

First of, I envisioned that append node to be applicable also to
unordered reading from multiple partitions, simply interleaving between
the partitions.

> The logic I've followed is to do as follows:
>
> 1) Go through all subrels asking for any interesting pathkey lists. Gather up
> the union of all of these.

I also tried to modify the Append node first, then figured that it might
be better to base on the merge join node instead. While this seems
farther away, I had the hope that a binary tree of such 'plain merge'
nodes would require less comparisons in total.

Plus, I found it a lot simpler to cope with exactly two input relations
instead of n, as with the append node. :-)

> 2) Go back through all the subrels and for each accumulated pathkey list ask
> for the best path for that subrel for that pathkey list.
>
> 3) Generate an two append paths for each of these pathkey lists, one using the
> best startup_cost subpath and one with the best total_cost subpath
> available for the pathkey list for each child rel. If there is none
> available take the best unordered path and put a sort node above it.
>
> 4) Additionally advertise the traditional unordered append node which our
> parent could choose to put a sort node above same as ever.
>
> 5) Append plan nodes look a lot like Sort plan nodes glued onto an Append plan
> node, with sort function oids and so on.
>
> 6) Append exec state nodes look a lot like a (a few fields from)
> tuplesortstate glued onto an Append node. They have the ScanKey array and
> the fmgr sort functions. They also have an array of TupleTableSlot and a
> heap of indexes into that array.
>
> 8) ExecAppend does something similar to tuplesort's bounded sort (I fear I'm
> going to get typecasted) or more to the point, similar to the final merge
> of a merge sort. It directly returns the slot the child rel last returned
> to it.

Uh.. well, yeah. I guess you have a better understanding of the planner
and executor that I do.

> Open questions:
>
> 1) I still haven't completely figured out what to do with equivalence classes.
> My hack of just stuffing all the append subrel vars into there seems to
> work fine. I need to understand what's going on to see if there's really a
> problem with it or not.
>
> 2) I'm not sure this code will work when the append rel is a target (ie UPDATE
> and DELETE stmts).
>
> 3) It does seem to work when the columns in the subrels don't line up but I
> didn't do anything special to handle this case.

Uh.. is there any use case for that? WRT partitioning certainly not, is
there?

I'll have a look at your patch.

Regards

Markus

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Gregory Stark 2007-11-22 20:10:38 Re: Ordered Append Node
Previous Message Simon Riggs 2007-11-22 19:14:57 Re: Autovacuum and OldestXmin