Re: UPDATE of partition key

From: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: UPDATE of partition key
Date: 2017-09-28 07:09:20
Message-ID: CAJ3gD9cC_gBtNZnwyea5f4OxgJbvZbLesPObVRagLPy=L9gcBg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Below are some performance figures. Overall, there does not appear to
be a noticeable difference in the figures in partition key updates
with and without row movement (which is surprising), and
non-partition-key updates with and without the patch.

All the values are in milliseconds.

Configuration :

shared_buffers = 8GB
maintenance_work_mem = 4GB
synchronous_commit = off
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9
log_line_prefix = '%t [%p] '
max_wal_size = 5GB
max_connections = 200

The attached files were used to create a partition tree made up of 16
partitioned tables, each containing 125 partitions. First half of the
2000 partitions are filled with 10 million rows. Update row movement
moves the data to the other half of the partitions.

gen.sql : Creates the partitions.
insert.data : This data file is uploaded here [1]. Used "COPY ptab
from '$PWD/insert.data' "
index.sql : Optionally, Create index on column d.

The schema looks like this :

CREATE TABLE ptab (a date, b int, c int, d int) PARTITION BY RANGE (a, b);

CREATE TABLE ptab_1_1 PARTITION OF ptab
for values from ('1900-01-01', 1) to ('1900-01-01', 7501)
PARTITION BY range (c);
CREATE TABLE ptab_1_1_1 PARTITION OF ptab_1_1
for values from (1) to (81);
CREATE TABLE ptab_1_1_2 PARTITION OF ptab_1_1
for values from (81) to (161);
..........
..........
CREATE TABLE ptab_1_2 PARTITION OF ptab
for values from ('1900-01-01', 7501) to ('1900-01-01', 15001)
PARTITION BY range (c);
..........
..........

On 20 September 2017 at 00:06, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I wonder how much more expensive it
> is to execute UPDATE root SET a = a + 1 WHERE a = 1 on a table with
> 1000 subpartitions with this patch than without, assuming the update
> succeeds in both cases.

UPDATE query used : UPDATE ptab set d = d + 1 where d = 1; -- where d
is not a partition key of any of the partitions.
This query updates 8 rows out of 10 million rows.
With HEAD : 2953.691 , 2862.298 , 2855.286 , 2835.879 (avg : 2876)
With Patch : 2933.719 , 2832.463 , 2749.979 , 2820.416 (avg : 2834)
(All the values are in milliseconds.)

> suppose you make a table with 1000 partitions each containing
> 10,000 tuples and update them all, and consider three scenarios: (1)
> partition key not updated but all tuples subject to non-HOT updates
> because the updated column is indexed, (2) partition key updated but
> no tuple movement required as a result, (3) partition key updated and
> all tuples move to a different partition.

Note that the following figures do not represent a consistent set of
figures. They keep on varying. For e.g. , even though the
partition-key-update without row movement appears to have taken a bit
more time with patch than with HEAD, a new set of tests run might even
end up the other way round.

NPK : 42089 (patch)
NPKI : 81593 (patch)
PK : 45250 (patch) , 44944 (HEAD)
PKR : 46701 (patch)

The above figures are in milliseconds. The explanations of the above
short-forms :

NPK :
Update of column that is not a partition-key.
UPDATE query used : UPDATE ptab set d = d + 1 ; This update *all* rows.

NPKI :
Update of column that is not a partition-key. And this column is
indexed (Used attached file index.sql).
UPDATE query used : UPDATE ptab set d = d + 1 ; This update *all* rows.

PK :
Update of partition key, but row movement does not occur. There are no
indexed columns.
UPDATE query used : UPDATE ptab set a = a + '1 hour'::interval ;

PKR :
Update of partition key, with all rows moved to other partitions.
There are no indexed columns.
UPDATE query used : UPDATE ptab set a = a + '2 years'::interval ;

[1] https://drive.google.com/open?id=0B_YJCqIAxKjeN3hMXzdDejlNYmlpWVJpaU9mWUhFRVhXTG5Z

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

Attachment Content-Type Size
gen.tar.gz application/x-gzip 13.8 KB
index.tar.gz application/x-gzip 5.2 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2017-09-28 07:13:58 Re: [HACKERS] PartitionSchemaData & partcollation (Re: pgsql: Associate partitioning information with each RelOptInfo.)
Previous Message Amit Langote 2017-09-28 07:00:13 Re: Use of RangeVar for partitioned tables in autovacuum