Re: partition routing layering in nodeModifyTable.c

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Amit Langote <amitlangote09(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: partition routing layering in nodeModifyTable.c
Date: 2020-10-17 15:54:29
Message-ID: 20201017155429.GA9862@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2020-Oct-17, Amit Langote wrote:

> Hmm, I don't see ri_PartitionCheckExpr as being a piece of routing
> information, because it's primarily meant to be used when inserting
> *directly* into a partition, although it's true we do initialize it in
> routing target partitions too in some cases.
>
> Also, ChildToRootMap was introduced by the trigger transition table
> project, not tuple routing. I think we misjudged this when we added
> PartitionToRootMap to PartitionRoutingInfo, because it doesn't really
> belong there. This patch fixes that by removing PartitionToRootMap.
>
> RootToPartitionMap and the associated partition slot is the only piece
> of extra information that is needed by tuple routing target relations.

Well, I was thinking on making the ri_PartitionInfo be about
partitioning in general, not just specifically for partition tuple
routing. Maybe Heikki is right that it may end up being simpler to
remove ri_PartitionInfo altogether. It'd just be a couple of additional
pointers in ResultRelInfo after all. (Remember that we wanted to get
rid of fields specific to only certain kinds of RTEs in RangeTblEntry
for example, to keep things cleanly separated, although that project
eventually found its demise for other reasons.)

> As I said in my previous email, I don't see how we can make
> initializing the map any lazier than it already is. If a partition
> has a different tuple descriptor than the root table, then we know for
> sure that any tuples that are routed to it will need to be converted
> from the root tuple format to its tuple format, so we might as well
> build the map when the ResultRelInfo is built. If no tuple lands into
> a partition, we would neither build its ResultRelInfo nor the map.
> With the current arrangement, if the map field is NULL, it simply
> means that the partition has the same tuple format as the root table.

I see -- makes sense.

> On Fri, Oct 16, 2020 at 11:45 PM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:

> > BTW it is curious that ExecInitRoutingInfo is called both in
> > ExecInitPartitionInfo() (from ExecFindPartition when the ResultRelInfo
> > for the partition is not found) *and* from ExecFindPartition again, when
> > the ResultRelInfo for the partition *is* found. Doesn't this mean that
> > ri_PartitionInfo is set up twice for the same partition?
>
> No. ExecFindPartition() directly calls ExecInitRoutingInfo() only for
> reused update result relations, that too, only the first time a tuple
> lands into such a partition. For the subsequent tuples that land into
> the same partition, ExecFindPartition() will be able to find that
> ResultRelInfo in the proute->partitions[] array. All ResultRelInfos
> in that array are assumed to have been processed by
> ExecInitRoutingInfo().

Doh, right, sorry, I was misreading the if/else maze there.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-10-17 15:58:26 Re: warn_unused_results
Previous Message Tom Lane 2020-10-17 15:50:57 Re: Sometimes the output to the stdout in Windows disappears