Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> I've been hacking on the idea of an Append node which maintains the ordering
> of its subtables merging their records in order.
I finally got round to looking at this ...
> 1) I still haven't completely figured out what to do with equivalence classes.
> My hack of just stuffing all the append subrel vars into there seems to
> work fine. I need to understand what's going on to see if there's really a
> problem with it or not.
This still makes me itchy, but it might be all right, because we'd never
really consider a plan for a single subnode as representing a useful
plan for the whole query. What it would need is some additions to the
README files (ahem) describing what's going on.
> 2) I'm not sure this code will work when the append rel is a target (ie UPDATE
> and DELETE stmts).
It won't, at least not for the case where the appendrel is an
inheritance tree in which the children are unlike the parent.
You have missed updating the estate->es_result_relation_info and
estate->es_junkFilter to match the tuple actually returned. In a plain
Append it is only necessary to update those when switching from one
subnode to the next, so we handle it in exec_append_initialize_next.
In an ordered Append you'd have to track which subnode each entry in the
heap came from (easy, but you aren't doing it now) and update those
fields for every tuple returned.
If you need an example of what I'm talking about:
create table p1(f1 int, f2 int);
create table p2(f3 int, f4 int);
create table c() inherits(p1, p2);
... insert some data ...
update p2 set ...
Of course a plain UPDATE p2 wouldn't have much interest in
ordered-Append plans, but you could force the failure by doing
a joining update with a mergejoin plan.
> 4) I haven't handled mark/restore or random access. I think they could be
> handled and they'll probably be worth the complexity but I'm not sure.
I don't think it'd be nearly as easy as you think, eg just how are you
going to "back up" the merge? A heap merge doesn't normally remember
the past few tuples it emitted. I'd definitely recommend not trying
that in v1.
BTW, I concur with Heikki's suggestion that this might be better done
as a separate executor node type. There'd be a certain amount of
duplicated boilerplate, but that would force you to look at each and
every reference to Append nodes and decide what's needed. There are a
whole bunch of places you've missed touching that know something about
what Append nodes can do, and you'll need to look at them all in any
> 5) Is it considering too many paths?
Hard to say for sure, but I concur that that needs thinking about.
One point is that as soon as you have even one subnode that hasn't
got a cheap-startup-cost path, it probably stops being interesting
to try to build a cheap-startup-cost ordered-Append path. But
exactly what's the threshold of "cheap enough", I'm not sure.
regards, tom lane
In response to
pgsql-hackers by date
|Next:||From: Tom Lane||Date: 2008-04-07 02:16:15|
|Subject: Re: [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC) |
|Previous:||From: Bruce Momjian||Date: 2008-04-06 23:54:37|
|Subject: Re: Kludge in pg_standby.c|