Re: UPDATE of partition key

From: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: UPDATE of partition key
Date: 2017-06-26 03:07:36
Message-ID: CAJ3gD9dWs2j5W0dzSQJANUuW8wMC7qtcCxwO2=QjM0Mq3pUhsw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 22 June 2017 at 01:41, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> Second, it will amount to a functional bug if you get a
>>> different answer than the planner did.
>>
>> Actually, the per-leaf WCOs are meant to be executed on the
>> destination partitions where the tuple is moved, while the WCOs
>> belonging to the per-subplan resultRelInfo are meant for the
>> resultRelinfo used for the UPDATE plans. So actually it should not
>> matter whether they look same or different, because they are fired at
>> different objects. Now these objects can happen to be the same
>> relations though.
>>
>> But in any case, it's not clear to me how the mapped WCO and the
>> planner's WCO would yield a different answer if they are both the same
>> relation. I am possibly missing something. The planner has already
>> generated the withCheckOptions for each of the resultRelInfo. And then
>> we are using one of those to re-generate the WCO for a leaf partition
>> by only adjusting the attnos. If there is already a WCO generated in
>> the planner for that leaf partition (because that partition was
>> present in mtstate->resultRelInfo), then the re-built WCO should be
>> exactly look same as the earlier one, because they are the same
>> relations, and so the attnos generated in them would be same since the
>> Relation TupleDesc is the same.
>
> If the planner's WCOs and mapped WCOs are always the same, then I
> think we should try to avoid generating both. If they can be
> different, but that's intentional and correct, then there's no
> substantive problem with the patch but the comments need to make it
> clear why we are generating both.
>
>> Actually I meant, "above works for only local updates. For
>> row-movement-updates, we need per-leaf partition WCOs, because when
>> the row is inserted into target partition, that partition may be not
>> be included in the above planner resultRelInfo, so we need WCOs for
>> all partitions". I think this said comment should be sufficient if I
>> add this in the code ?
>
> Let's not get too focused on updating the comment until we are in
> agreement about what the code ought to be doing. I'm not clear
> whether you accept the point that the patch needs to be changed to
> avoid generating the same WCOs and returning lists in both the planner
> and the executor.

Yes, we can re-use the WCOs generated in the planner, as an
optimization, since those we re-generate for the same relations will
look exactly the same. The WCOs generated by planner (in
inheritance_planner) are generated when (in adjust_appendrel_attrs())
we change attnos used in the query to refer to the child RTEs and this
adjusts the attnos of the WCOs of the child RTEs. So the WCOs of
subplan resultRelInfo are actually the parent table WCOs, but only the
attnos changed. And in ExecInitModifyTable() we do the same thing for
leaf partitions, although using different function
map_variable_attnos().

>
>>> Also, I feel like it's probably not correct to use the first result
>>> relation as the nominal relation for building WCOs and returning lists
>>> anyway. I mean, if the first result relation has a different column
>>> order than the parent relation, isn't this just broken? If it works
>>> for some reason, the comments don't explain what that reason is.

One thing I didn't mention earlier about the WCOs, is that for child
rels, we don't use the WCOs defined for the child rels. We only
inherit the WCO expressions defined for the root rel. That's the
reason they are the same expressions, only the attnos changed to match
the respective relation tupledesc. If the WCOs of each of the subplan
resultRelInfo() were different, then definitely it was not possible to
use the first resultRelinfo to generate other leaf partition WCOs,
because the WCO defined for relation A is independent of that defined
for relation B.

So, since the WCOs of all the relations are actually those of the
parent, we only need to adjust the attnos of any of these
resultRelInfos.

For e.g., if the root rel WCO is defined as "col > 5" where col is the
4th column, the expression will look like "var_1.attno_4 > 5". And the
WCO that is generated for a subplan resultRelInfo will look something
like "var_n.attno_2 > 5" if col is the 2nd column in this table.

All of the above logic assumes that we never use the WCO defined for
the child relation. At least that's how it looks by looking at the way
we generate WCOs in ExecInitModifyTable() for INSERTs as well looking
at the code in inheritance_planner() for UPDATEs. At both these
places, we never use the WCOs defined for child tables.

So suppose we define the tables and their WCOs like this :

CREATE TABLE range_parted ( a text, b int, c int) partition by range (a, b);

ALTER TABLE range_parted ENABLE ROW LEVEL SECURITY;
GRANT ALL ON range_parted TO PUBLIC ;
create policy seeall ON range_parted as PERMISSIVE for SELECT using ( true);

create table part_b_10_b_20 partition of range_parted for values from
('b', 10) to ('b', 20) partition by range (c);

create table part_c_1_100 (b int, c int, a text);
alter table part_b_10_b_20 attach partition part_c_1_100 for values
from (1) to (100);
create table part_c_100_200 (c int, a text, b int);
alter table part_b_10_b_20 attach partition part_c_100_200 for values
from (100) to (200);

GRANT ALL ON part_c_100_200 TO PUBLIC ;
ALTER TABLE part_c_100_200 ENABLE ROW LEVEL SECURITY;
create policy seeall ON part_c_100_200 as PERMISSIVE for SELECT using ( true);

insert into part_c_1_100 (a, b, c) values ('b', 12, 96);
insert into part_c_1_100 (a, b, c) values ('b', 13, 97);
insert into part_c_100_200 (a, b, c) values ('b', 15, 105);
insert into part_c_100_200 (a, b, c) values ('b', 17, 105);

-- For root table, allow updates only if NEW.c is even number.
create policy pu on range_parted for UPDATE USING (true) WITH CHECK (c % 2 = 0);
-- For this table, allow updates only if NEW.c is divisible by 4.
create policy pu on part_c_100_200 for UPDATE USING (true) WITH CHECK
(c % 4 = 0);

Now, if we try to update the child table using UPDATE on root table,
it will allow setting c to a number which would otherwise violate WCO
constraint of the child table if the query would have run on the child
table directly :

postgres=# set role user1;
SET
postgres=> select tableoid::regclass, * from range_parted where b = 17;
tableoid | a | b | c
----------------+---+----+-----
part_c_100_200 | b | 17 | 105
-- root table does not allow updating c to odd numbers
postgres=> update range_parted set c = 107 where a = 'b' and b = 17 ;
ERROR: new row violates row-level security policy for table "range_parted"
-- child table does not allow to modify it to 106 because it is not
divisble by 4.
postgres=> update part_c_100_200 set c = 106 where a = 'b' and b = 17 ;
ERROR: new row violates row-level security policy for table "part_c_100_200"
-- But we can update it to 106 by running update on the root table,
because here child table WCOs won't get used.
postgres=> update range_parted set c = 106 where a = 'b' and b = 17 ;
UPDATE 1
postgres=> select tableoid::regclass, * from range_parted where b = 17;
tableoid | a | b | c
----------------+---+----+-----
part_c_100_200 | b | 17 | 106

Same applies for INSERTs. I hope this is expected behaviour. Initially
I had found this weird, but then saw that is consistent for both
inserts as well as updates.

>>
>> Not sure why parent relation should come into picture. As long as the
>> first result relation belongs to one of the partitions in the whole
>> partition tree, we should be able to use that to build WCOs of any
>> other partitions, because they have a common set of attributes having
>> the same name. So we are bound to find each of the attributes of first
>> resultRelInfo in the other leaf partitions during attno mapping.
>
> Well, at least for returning lists, we've got to generate the
> returning lists so that they all match the column order of the parent,
> not the parent's first child.
> Otherwise, for example, UPDATE
> parent_table ... RETURNING * will not work correctly. The tuples
> returned by the returning clause have to have the attribute order of
> parent_table, not the attribute order of parent_table's first child.
> I'm not sure whether WCOs have the same issue, but it's not clear to
> me why they wouldn't: they contain a qual which is an expression tree,
> and presumably there are Var nodes in there someplace, and if so, then
> they have varattnos that have to be right for the purpose for which
> they're going to be used.

So once we put the attnos right according to the child relation
tupdesc, the rest part of generating the final RETURNING expressions
as per the root able column order is taken care of by the returning
projection, no ?

This scenario is included in the update.sql regression test here :
-- ok (row movement, with subset of rows moved into different partition)
update range_parted set b = b - 6 where c > 116 returning a, b + c;

>
>>> + for (i = 0; i < num_rels; i++)
>>> + {
>>> + ResultRelInfo *resultRelInfo = &result_rels[i];
>>> + Relation rel = resultRelInfo->ri_RelationDesc;
>>> + Bitmapset *expr_attrs = NULL;
>>> +
>>> + pull_varattnos((Node *) rel->rd_partcheck, 1, &expr_attrs);
>>> +
>>> + /* Both bitmaps are offset by FirstLowInvalidHeapAttributeNumber. */
>>> + if (bms_overlap(expr_attrs, GetUpdatedColumns(resultRelInfo, estate)))
>>> + return true;
>>> + }
>>>
>>> This seems like an awfully expensive way of performing this test.
>>> Under what circumstances could this be true for some result relations
>>> and false for others;
>>
>> One resultRelinfo can have no partition key column used in its quals,
>> but the next resultRelinfo can have quite different quals, and these
>> quals can have partition key referred. This is possible if the two of
>> them have different parents that have different partition-key columns.
>
> Hmm, true. So if we have a table foo that is partitioned by list (a),
> and one of its children is a table bar that is partitioned by list
> (b), then we need to consider doing tuple-routing if either column a
> is modified, or if column b is modified for a partition which is a
> descendant of bar. But visiting that only requires looking at the
> partitioned table and those children that are also partitioned, not
> all of the leaf partitions as the patch does.
>

Will give a thought on this and get back on this, and remaining points.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2017-06-26 03:12:09 Re: Get stuck when dropping a subscription during synchronizing table
Previous Message Michael Paquier 2017-06-26 03:06:19 Re: Timing-sensitive case in src/test/recovery TAP tests