Quick Links

UPDATE of partition key

From:	Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
To:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	UPDATE of partition key
Date:	2017-02-13 12:01:56
Message-ID:	CAJ3gD9do9o2ccQ7j7+tSgiE1REY65XRiMb=yJO3u3QhyP8EEPQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Currently, an update of a partition key of a partition is not allowed,
since it requires to move the row(s) into the applicable partition.

Attached is a WIP patch (update-partition-key.patch) that removes this
restriction. When an UPDATE causes the row of a partition to violate
its partition constraint, then a partition is searched in that subtree
that can accommodate this row, and if found, the row is deleted from
the old partition and inserted in the new partition. If not found, an
error is reported.

There are a few things that can be discussed about :

1. We can run an UPDATE using a child partition at any level in a
nested partition tree. In such case, we should move the row only
within that child subtree.

For e.g. , in a tree such as :
tab ->
t1 ->
t1_1
t1_2
t2 ->
t2_1
t2_2

For "UPDATE t2 set col1 = 'AAA' " , if the modified tuple does not fit
in t2_1 but can fit in t1_1, it should not be moved to t1_1, because
the UPDATE is fired using t2.

2. In the patch, as part of the row movement, ExecDelete() is called
followed by ExecInsert(). This is done that way, because we want to
have the ROW triggers on that (sub)partition executed. If a user has
explicitly created DELETE and INSERT BR triggers for this partition, I
think we should run those. While at the same time, another question
is, what about UPDATE trigger on the same table ? Here again, one can
argue that because this UPDATE has been transformed into a
DELETE-INSERT, we should not run UPDATE trigger for row-movement. But
there can be a counter-argument. For e.g. if a user needs to make sure
about logging updates of particular columns of a row, he will expect
the logging to happen even when that row was transparently moved. In
the patch, I have retained the firing of UPDATE BR trigger.

3. In case of a concurrent update/delete, suppose session A has locked
the row for deleting it. Now a session B has decided to update this
row and that is going to cause row movement, which means it will
delete it first. But when session A is finished deleting it, session B
finds that it is already deleted. In such case, it should not go ahead
with inserting a new row as part of the row movement. For that, I have
added a new parameter 'already_delete' for ExecDelete().

Of course, this still won't completely solve the concurrency anomaly.
In the above case, the UPDATE of Session B gets lost. May be, for a
user that does not tolerate this, we can have a table-level option
that disallows row movement, or will cause an error to be thrown for
one of the concurrent session.

4. The ExecSetupPartitionTupleRouting() is re-used for routing the row
that is to be moved. So in ExecInitModifyTable(), we call
ExecSetupPartitionTupleRouting() even for UPDATE. We can also do this
only during execution time for the very first time we find that we
need to do a row movement. I will think over that, but I am thinking
it might complicate things, as compared to always doing the setup for
UPDATE. WIll check on that.

5. Regarding performance testing, I have compared the results of
row-movement with partition versus row-movement with inheritance tree
using triggers. Below are the details :

Schema :

CREATE TABLE ptab (a date, b int, c int);

CREATE TABLE ptab (a date, b int, c int) PARTITION BY RANGE (a, b);

CREATE TABLE ptab_1_1 PARTITION OF ptab
for values from ('1900-01-01', 1) to ('1900-01-01', 101)
PARTITION BY range (c);

CREATE TABLE ptab_1_1_1 PARTITION OF ptab_1_1
for values from (1) to (51);
CREATE TABLE ptab_1_1_2 PARTITION OF ptab_1_1
for values from (51) to (101);
.....
.....
CREATE TABLE ptab_1_1_n PARTITION OF ptab_1_1
for values from (n) to (n+m);

......
......

CREATE TABLE ptab_5_n PARTITION OF ptab
for values from ('1905-01-01', 101) to ('1905-01-01', 201)
PARTITION BY range (c);

CREATE TABLE ptab_1_2_1 PARTITION OF ptab_1_2
for values from (1) to (51);
CREATE TABLE ptab_1_2_2 PARTITION OF ptab_1_2
for values from (51) to (101);
.....
.....
CREATE TABLE ptab_1_2_n PARTITION OF ptab_1_2
for values from (n) to (n+m);
.....
.....

Similarly for inheritance :

CREATE TABLE ptab_1_1
(constraint check_ptab_1_1 check (a = '1900-01-01' and b >= 1 and b <
8)) inherits (ptab);
create trigger brutrig_ptab_1_1 before update on ptab_1_1 for each row
execute procedure ptab_upd_trig();
CREATE TABLE ptab_1_1_1
(constraint check_ptab_1_1_1 check (c >= 1 and c < 51))
inherits (ptab_1_1);
create trigger brutrig_ptab_1_1_1 before update on ptab_1_1_1 for each
row execute procedure ptab_upd_trig();
CREATE TABLE ptab_1_1_2
(constraint check_ptab_1_1_2 check (c >= 51 and c < 101))
inherits (ptab_1_1);

create trigger brutrig_ptab_1_1_2 before update on ptab_1_1_2 for each
row execute procedure ptab_upd_trig();

I had to have a BR UPDATE trigger on each of the leaf tables.

Attached is the BR trigger function update_trigger.sql. There it
generates the table name assuming a fixed pattern of distribution of
data over the partitions. It first deletes the row and then inserts a
new one. I also skipped the deletion part, and it did not show any
significant change in results.

parts partitioned inheritance no. of rows subpartitions
===== =========== =========== =========== =============

500 10 sec 3 min 02 sec 1,000,000 0
1000 10 sec 3 min 05 sec 1,000,000 0
1000 1 min 38sec 30min 50 sec 10,000,000 0
4000 28 sec 5 min 41 sec 1,000,000 10

part : total number of partitions including subparitions if any.
partitioned : Partitions created using declarative syntax.
inheritence : Partitions created using inheritence , check constraints
and insert,update triggers.
subpartitions : Number of subpartitions for each partition (in a 2-level tree)

Overall the UPDATE in partitions is faster by 10-20 times compared
with inheritance with triggers.

The UPDATE query moved all of the rows into another partition. It was
something like this :
update ptab set a = '1949-01-1' where a <= '1924-01-01'

For a plain table with 1,000,000 rows, the UPDATE took 8 seconds, and
with 10,000,000 rows, it took 1min 32sec.

In general, for both partitioned and inheritence tables, the time
taken linearly rose with the number of rows.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

Attachment	Content-Type	Size
update_trigger.sql	application/octet-stream	794 bytes
update-partition-key.patch	application/octet-stream	5.8 KB

Responses

Re: UPDATE of partition key at 2017-02-14 14:44:11 from Robert Haas
Re: UPDATE of partition key at 2017-02-14 16:54:01 from David Fetter
Re: UPDATE of partition key at 2017-02-16 10:47:28 from Greg Stark
Re: UPDATE of partition key at 2017-02-23 10:32:55 from Amit Langote

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Robert Haas	2017-02-13 12:17:22	Re: Parallel Index Scans
Previous Message	Fabien COELHO	2017-02-13 12:01:11	Re: \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)