Re: ON CONFLICT DO UPDATE for partitioned tables

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ON CONFLICT DO UPDATE for partitioned tables
Date: 2018-02-28 09:24:27
Message-ID: 3c3bbb2c-c28f-5b09-c124-b99e2231fe9b@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2018/02/28 9:46, Alvaro Herrera wrote:
> I updated Amit Langote's patch for INSERT ON CONFLICT DO UPDATE[1].
> Following the lead of edd44738bc88 ("Be lazier about partition tuple
> routing.") this incarnation only does the necessary push-ups for the
> specific partition that needs it, at execution time. As far as I can
> tell, it works as intended.

Thanks. I too have been meaning to send an updated (nay, significantly
rewritten) version of the patch, but you beat me to it.

> I chose to refuse the case where the DO UPDATE clause causes the tuple
> to move to another partition (i.e. you're updating the partition key of
> the tuple). While it's probably possible to implement that, it doesn't
> seem a very productive use of time.

Probably a good idea to save it for later.

> However, there is a shortcoming in the design: it fails if there are
> multiple levels of partitioning, because there is no easy (efficient)
> way to map the index OID more than one level. I had already mentioned
> this shortcoming to Amit's patch. So this case (which I added in the
> regression tests) fails unexpectedly:
>
> -- multiple-layered partitioning
> create table parted_conflict_test (a int primary key, b text) partition by range (a);
> create table parted_conflict_test_1 partition of parted_conflict_test
> for values from (0) to (10000) partition by range (a);
> create table parted_conflict_test_1_1 partition of parted_conflict_test_1
> for values from (0) to (100);
> insert into parted_conflict_test values ('10', 'ten');
> insert into parted_conflict_test values ('10', 'ten two')
> on conflict (a) do update set b = excluded.b;
> ERROR: invalid ON CONFLICT DO UPDATE specification
> DETAIL: An inferred index was not found in partition "parted_conflict_test_1_1".
>
> So the problem here is that my MapPartitionIndexList() implementation is
> too stupid. I think it would be smarter to use the ResultRelInfo
> instead of bare Relation, for one. But that still doesn't answer how to
> find a "path" from root to leaf partition, which is what I'd need to
> verify that there are valid pg_inherits relationships for the partition
> indexes. I'm probably missing something about the partitionDesc or
> maybe the partitioned_rels lists that helps me do it efficiently, but I
> hope figure out soon.
>
> One idea I was toying with is to add RelationData->rd_children as a list
> of OIDs of children relations. So it should be possible to walk the
> list from the root to the desired descendant, without having to scan
> pg_inherits repeatedly ... although that would probably require doing
> relation_open() for every index, which sounds undesirable.
>
> (ISTM that having RelationData->rd_children might be a good idea in
> general anyway -- I mean to speed up some code that currently scans
> pg_inherits via find_inheritance_children. However, since the partition
> descriptor itself is already in relcache, maybe this doesn't matter too
> much.)
>
> Another idea is to abandon the notion that we need to find a path from
> parent index to descendant index, and just run the inference algorithm
> again on the partition. I'm not sure how much I like this idea, yet.
>
> Anyway, while this is still WIP, I think it works correctly for the case
> where there is a single partition level.

Actually, after your comment on my original patch [1], I did make it work
for multiple levels by teaching the partition initialization code to find
a given partition's indexes that are inherited from the root table (that
is the table mentioned in command). So, after a tuple is routed to a
partition, we switch from the original arbiterIndexes list to the one we
created for the partition, which must contain OIDs corresponding to those
in the original list. After all, for each of the parent's indexes that
the planner put into the original arbiterIndexes list, there must exist an
index in each of the leaf partitions.

I had also observed when working on the patch that various TupleTableSlots
used by the ON CONFLICT DO UPDATE code must be based on TupleDesc of the
inheritance-translated target list (DO UPDATE SET target list). In fact,
that has to take into account also the dropped columns; we may have
dropped columns either in parent or in a partition or in both at same or
different attnum positions. That means, simple map_partition_varattnos()
translation doesn't help in this case.

For example, with your patch (sorry, I know you said it's a WIP patch), I
see either a crash or errors when dealing with such differing attribute
numbers:

drop table p;
create table p (a int, b text) partition by list (a);
create table p12 (b text, a int);
alter table p attach partition p12 for values in (1, 2);
alter table p drop b, add b text;
create table p4 partition of p for values in (4);
create unique index on p (a);

insert into p values (1, 'a') on conflict (a) do update set b = excluded.b;

insert into p values (1, 'b') on conflict (a) do update set b = excluded.b;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

insert into p values (4, 'a') on conflict (a) do update set b = excluded.b;

postgres=# insert into p values (4, 'b') on conflict (a) do update set b =
excluded.b;
ERROR: attribute number 3 exceeds number of columns 2

I attach my patch here for your reference, which I polished this morning
after seeing your email and the patch. It works for most of the cases, as
you can see in the updated tests in insert_conflict.sql. Since I agree
with you that we should, for now, error out if DO UPDATE causes row
movement, I adopted the code from your patch for that.

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/20171227225031.osh6vunpuhsath25%40alvherre.pgsql

Attachment Content-Type Size
v1-0001-Fix-ON-CONFLICT-to-work-with-partitioned-tables.patch text/plain 31.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2018-02-28 10:14:51 Re: [HACKERS] path toward faster partition pruning
Previous Message Amit Langote 2018-02-28 09:10:38 Re: inserts into partitioned table may cause crash