Re: [HACKERS] UPDATE of partition key

From: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Subject: Re: [HACKERS] UPDATE of partition key
Date: 2018-01-10 15:00:05
Message-ID: CAJ3gD9dixkkMzNnnP1CaZ1H17-U17ok_sVbjZZo+wnB=rJH6yg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 4 January 2018 at 02:52, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
>
> 1.
>
> NEW.c = NEW.c + 1; -- Make even number odd, or vice versa
>
> This seems to be worded as if there'd only ever be one number. I think
> it should be plural and read "Make even numbers odd, and vice versa"

Done.

>
> 2. The following comment does not make a huge amount of sense.
>
> -- UPDATE with
> -- partition key or non-partition columns, with different column ordering,
> -- triggers.
>
> Should "or" be "on"? Does ", triggers" mean "with triggers"?

Actually I was trying to summarize what kinds of scenarios are going
to be tested. Now I think we don't have to give this summary. Rather,
we should describe each of the scenarios individually. But I did want
to use list partitions at least in a subset of update-partition-key
scenarios. So I have removed this comment, and replaced it by :

-- Some more update-partition-key test scenarios below. This time use list
-- partitions.

>
> 3. The follow test tries to test a BEFORE DELETE trigger stopping a
> DELETE on sub_part1, but going by the SELECT, there are no rows in
> that table to stop being DELETEd.
>
> select tableoid::regclass::text , * from list_parted order by 1, 2, 3, 4;
> tableoid | a | b | c
> ------------+---+----+----
> list_part1 | 2 | 52 | 50
> list_part1 | 3 | 6 | 60
> sub_part2 | 1 | 2 | 10
> sub_part2 | 1 | 2 | 70
> (4 rows)
>
> drop trigger parted_mod_b ON sub_part1 ;
> -- If BR DELETE trigger prevented DELETE from happening, we should also skip
> -- the INSERT if that delete is part of UPDATE=>DELETE+INSERT.
> create or replace function func_parted_mod_b() returns trigger as $$
> begin return NULL; end $$ language plpgsql;
> create trigger trig_skip_delete before delete on sub_part1
> for each row execute procedure func_parted_mod_b();
> update list_parted set b = 1 where c = 70;
> select tableoid::regclass::text , * from list_parted order by 1, 2, 3, 4;
> tableoid | a | b | c
> ------------+---+----+----
> list_part1 | 2 | 52 | 50
> list_part1 | 3 | 6 | 60
> sub_part1 | 1 | 1 | 70
> sub_part2 | 1 | 2 | 10
> (4 rows)
>
> You've added the BEFORE DELETE trigger to sub_part1, but you can see
> the tuple was DELETEd from sub_part2 and INSERTed into sub_part1, so
> the test is not working as you've commented.
>
> It's probably a good idea to RAISE NOTICE 'something useful here'; in
> the trigger function to verify they're actually being called in the
> test.

Done. The trigger should have been for sub_part2, not sub_part1. Corrected that.
Also, dropped the trigger and again tested the UPDATE.

>
> 4. I think the final drop function in the following should be before
> the UPDATE FROM test. You've already done some cleanup for that test
> by doing "drop trigger trig_skip_delete ON sub_part1 ;"
>
> drop trigger trig_skip_delete ON sub_part1 ;
> -- UPDATE partition-key with FROM clause. If join produces multiple output
> -- rows for the same row to be modified, we should tuple-route the row
> only once.
> -- There should not be any rows inserted.
> create table non_parted (id int);
> insert into non_parted values (1), (1), (1), (2), (2), (2), (3), (3), (3);
> update list_parted t1 set a = 2 from non_parted t2 where t1.a = t2.id and a = 1;
> select tableoid::regclass::text , * from list_parted order by 1, 2, 3, 4;
> tableoid | a | b | c
> ------------+---+----+----
> list_part1 | 2 | 1 | 70
> list_part1 | 2 | 2 | 10
> list_part1 | 2 | 52 | 50
> list_part1 | 3 | 6 | 60
> (4 rows)
>
> drop table non_parted;
> drop function func_parted_mod_b();

Done. Moved it to relevant place.

>
> Also, there's a space before the ; in the drop trigger above. Can that
> be removed?
Removed.

>
> 5. The following comment:
>
> -- update to a partition should check partition bound constraint for
> the new tuple.
> -- If partition key is updated, the row should be moved to the appropriate
> -- partition. updatable views using partitions should enforce the check options
> -- for the rows that have been moved.
>
> Can this be changed a bit? I think it's not accurate to say that an
> update to a partition key causes the row to move. The row movement
> only occurs when the new tuple does not match the partition bound and
> another partition exists that does have a partition bound that matches
> the tuple. How about:
>
> -- When a partitioned table receives an UPDATE to the partitioned key and the
> -- new values no longer meet the partition's bound, the row must be moved to
> -- the correct partition for the new partition key (if one exists). We must
> -- also ensure that updatable views on partitioned tables properly enforce any
> -- WITH CHECK OPTION that is defined. The situation with triggers in this case
> -- also requires thorough testing as partition key updates causing row
> -- movement convert UPDATEs into DELETE+INSERT.

Done.

>
> 6. What does the following actually test?
>
> -- This tests partition-key UPDATE on a partitioned table that does
> not have any child partitions
> update part_b_10_b_20 set b = b - 6;
>
> There are no records in that partition, or anywhere in the hierarchy.
> Are you just testing that there's no error? If so then the comment
> should say so.

Yes, I understand that there won't be any update scan plans. But, with
the modifications done in ExecInitModifyTable(), I wanted to run that
code with this scenario where there are no partitions, to make sure it
does not behave weirdly or crash. Any suggestions for comments, given
this perspective ? For now, I have made the comment this way:

-- Check that partition-key UPDATE works sanely on a partitioned table
that does not have any child partitions.

>
> 7. I think the following comment:
>
> -- As mentioned above, the partition creation is intentionally kept in
> descending bound order.
>
> should instead say:
>
> -- Create some more partitions following the above pattern of descending bound
> -- order, but let's make the situation a bit more complex by having the
> -- attribute numbers of the columns vary from their parent partition.

Done.

>
> 8. Just to make the tests a bit easier to follow, can you move the
> following down to where you're first using it:
>
> create table mintab(c1 int);
> insert into mintab values (120);
>
> and
>
> CREATE VIEW upview AS SELECT * FROM range_parted WHERE (select c > c1
> from mintab) WITH CHECK OPTION;

Done.

>
> 9. It seems that the existing part of update.sql capitalises SQL
> keywords, but you mostly don't. I understand we're not always
> consistent, but can you keep it the same as the existing part of the
> file?

Done.

>
> 10. Stray space before trailing ':'
>
> -- fail (row movement happens only within the partition subtree) :

Done, at other applicable places also.

>
> 11. Can the following become:
>
> -- succeeds, row movement , check option passes
>
> -- success, update with row movement, check option passes:
>
> Seems there's also quite a mix of comment formats in your tests.
>
> You're using either one of; ok, success, succeeds followed by
> sometimes a comma, and sometimes a reason in parentheses. The existing
> part of the file seems to use:
>
> -- fail, <reason>:
>
> and just
>
> -- <reason>:
>
> for non-failures.
>
> Would be great to stick to what's there.

There were existing lines where "ok, " was used.
So, now used this everywhere :
ok, ...
fail, ...

>
> 12. The following comment seems to indicate that you're installing
> triggers on all leaf partitions, but that's not the case:
>
> -- Install BR triggers on child partition, so that transition tuple
> conversion takes place.
>
> maybe you should write "on some child partitions"? Or did you mean to
> define a trigger on them all?

Trigger should be installed at least on the partitions onto which rows
are moved. I have corrected the comment accordingly.

Actually, to test transition tuple conversion with
update-row-movement, it requires a statement level trigger that
references transition tables. And trans_updatetrig already was
dropped. So transition tuple conversion for rows being inserted did
not get tested (I had manually tested though). So I have moved down
the drop statement.

>
> 13. Stray space at the end of the case statement:
>
> update range_parted set c = (case when c = 96 then 110 else c + 1 end
> ) where a = 'b' and b > 10 and c >= 96;

Done.

>
> 14. Stray space in the USING clause:
>
> create policy seeall ON range_parted as PERMISSIVE for SELECT using ( true);
Done

>
> 15. we -> we're
> -- part_a_10_a_20 to part_d_1_15, because we setting 'c' to an odd number.
Changed it to "we are"

>
> 16. The comment probably should be before the "update range_parted",
> not the "set session authorization":
> -- This should fail with RLS violation error while moving row from
> -- part_a_10_a_20 to part_d_1_15, because we setting 'c' to an odd number.
> set session authorization regress_range_parted_user;
> update range_parted set a = 'b', c = 151 where a = 'a' and c = 200;

Moved "set session authorization" statement above the comment.

>
> 17. trigger -> the trigger function
>
> -- part_d_1_15, because trigger makes 'c' value an even number.
>
> likewise in:
>
> -- This should fail with RLS violation error because trigger makes 'c' value
> -- an odd number.

I have made changes to the comment to make it clearer. Finally, the statement
contains phrase "trigger at the destination partition again makes it
an even number". With this phrase, "trigger function at destination
partition" looks odd. So I think "trigger at destination partition
makes ..." looks ok. It is implied that it is the trigger function
that is actually changing the value.

>
> 18. Why two RESET SESSION AUTHORIZATIONs?
>
> reset session authorization;
> drop trigger trig_d_1_15 ON part_d_1_15;
> drop function func_d_1_15();
> -- Policy expression contains SubPlan
> reset session authorization;

The second reset is actually in a different paragraph. The reason it's
there is to ensure we have reset it regardless of the earlier cleanup.

>
> 19. The following should be cleaned up in the final test that its used
> on rather than randomly after the next test after it:
>
> drop table mintab;

Done.

>
> 20. Comment is not worded very well:
>
> -- UPDATE which does not modify partition key of partitions that are
> chosen for update.
>
> Does "partitions that are chosen for update" mean "the UPDATE target"?

Actually it means the partitions participating in the update subplans,
i.e the unpruned ones.

I have modified the comment as :
-- Test update-partition-key, where the unpruned partitions do not have their
-- partition keys updated.

>
> I'm also not quite sure what the test is testing. In the past I've
> written tests that have a header comment as -- Ensure that <what the
> test is testing>. Perhaps if you can't think of what you're ensuring
> with the test, then the test might not be that worthwhile.

I am just testing that the update behaves sanely in the particular scenario.

BTW, it was a concious decision made that in this particular scenario,
we still conclude internally that update-tuple-routing is needed, and
do the tuple routing setup.

>
> 21. The following comment could be improved:
>
> -- Triggers can cause UPDATE row movement if it modified partition key.
>
> Might be better to write:
>
> -- Tests for BR UPDATE triggers changing the partition key.

Done

I have also done this following suggestion of yours :

>
> 22. In copy.c CopyFrom() you have the following code:
>
> /*
> * We might need to convert from the parent rowtype to the
> * partition rowtype.
> */
> map = proute->partition_tupconv_maps[leaf_part_index];
> if (map)
> {
> Relation partrel = resultRelInfo->ri_RelationDesc;
>
> tuple = do_convert_tuple(tuple, map);
>
> /*
> * We must use the partition's tuple descriptor from this
> * point on. Use a dedicated slot from this point on until
> * we're finished dealing with the partition.
> */
> slot = proute->partition_tuple_slot;
> Assert(slot != NULL);
> ExecSetSlotDescriptor(slot, RelationGetDescr(partrel));
> ExecStoreTuple(tuple, slot, InvalidBuffer, true);
> }
>
> Should this use ConvertPartitionTupleSlot() instead?

Attached v35 patch. Thanks.

Attachment Content-Type Size
update-partition-key_v35.patch application/octet-stream 119.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Konstantin Knizhnik 2018-01-10 15:07:38 Re: [HACKERS] Secondary index access optimizations
Previous Message Vik Fearing 2018-01-10 14:28:27 Re: [HACKERS] PATCH: psql tab completion for SELECT