partitioning and identity column

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: partitioning and identity column
Date: 2023-10-27 11:32:11
Message-ID: CAExHW5uOykuTC+C6R1yDSp=o8Q83jr8xJdZxgPkxfZ1Ue5RRGg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi All,
Reading [1] I have been experimenting with behaviour of identity
columns and serial column in case of partitioned tables. My
observations related to serial column can be found at [2]. This email
is about identity column behaviour with partitioned tables. Serial and
identity columns have sufficiently different behaviour and
implementation to have separate discussions on their behaviour. I
don't want to mix this with [1] since that thread is about replacing
serial with identity. The discussion in this and [2] will be useful to
drive [1] forward.

Behaviour 1
=========
If a partitioned table has an identity column, the partitions do not
inherit identity property.
#create table tpart (a int generated always as identity primary key,
src varchar) partition by range(a);
#create table t_p1 partition of tpart for values from (1) to (3);
#\d tpart
Partitioned table "public.tpart"
Column | Type | Collation | Nullable | Default
--------+-------------------+-----------+----------+------------------------------
a | integer | | not null | generated always
as identity
src | character varying | | |
Partition key: RANGE (a)
Indexes:
"tpart_pkey" PRIMARY KEY, btree (a)
Number of partitions: 2 (Use \d+ to list them.)

#\d t_p1
Table "public.t_p1"
Column | Type | Collation | Nullable | Default
--------+-------------------+-----------+----------+---------
a | integer | | not null |
src | character varying | | |
Partition of: tpart FOR VALUES FROM (1) TO (3)
Indexes:
"t_p1_pkey" PRIMARY KEY, btree (a)

Notice that the default column of t_p1. This means that a direct
INSERT into partition will fail if it does not specify value for the
identity column. As a consequence such a value may conflict with an
existing value or a future value of the identity column. In the
example, the identity column is a primary key and also a partition
key, thus the conflict would result in an error. But when it's not a
partition key (and hence a primary key), it will just allow those
conflicting values.

Behaviour 2
=========
If a table being attached as a partition to a partitioned table and
both of them have column with same name as identity column, the ATTACH
succeeds and allow both tables to use different sequences.
#create table t_p5 (a int primary key, b int generated always as
identity, src varchar);
#alter table tpart attach partition t_p5 for values from (7) to (9);
#\d t_p5
Table "public.t_p5"
Column | Type | Collation | Nullable | Default
--------+-------------------+-----------+----------+------------------------------
a | integer | | not null |
b | integer | | not null | generated always
as identity
src | character varying | | |
Partition of: tpart FOR VALUES FROM (7) TO (9)
Indexes:
"t_p5_pkey" PRIMARY KEY, btree (a)

As a consequence a direct INSERT into the partition will result in a
value for identity column which conflicts with an existing value or a
future value in the partitioned table. Again, if the identity column
in a primary key (and partition key), the conflicting INSERT will
fail. But otherwise, the conflicting values will go unnoticed.

I consulted Vik Fearing, offline, about SQL standard's take on
identity columns in partitioned table. SQL standard does not specify
partitioned tables as a separate entity. Thus a partitioned table is
at par with a regular table. Hence an identity column in partitioned
table should share the same identity space across all the partitions.

Behaviour 3
=========
We allow identity column to be added to a partitioned table which
doesn't have partitions but we disallow adding identity column to a
partitioned table which has partitions.
#create table tpart (a int primary key,
src varchar) partition by range(a);
#create table t_p1 partition of tpart for values from (1) to (3);
#alter table tpart add column b int generated always as identity;
ERROR: cannot recursively add identity column to table that has child tables

I don't understand why is that prohibited. If we allow partitions to
be added to a partitioned table with identity column, we should allow
an identity column to be added to a partitioned table with partitions.

Behaviour 4
=========
Even though we don't allow an identity column to be added to a
partitioned table with partitions, we allow an existing column to be
converted to an identity column in such a table.

#create table tpart (a int primary key,
src varchar) partition by range(a);
#create table t_p1 partition of tpart for values from (1) to (3);
#create table t_p2 partition of tpart for values from (3) to (5);
#alter table tpart alter column a add generated always as identity;

#\d tpart
Partitioned table "public.tpart"
Column | Type | Collation | Nullable | Default
--------+-------------------+-----------+----------+------------------------------
a | integer | | not null | generated always
as identity
src | character varying | | |
Partition key: RANGE (a)
Indexes:
"tpart_pkey" PRIMARY KEY, btree (a)
Number of partitions: 2 (Use \d+ to list them.)

#\d t_p1
Table "public.t_p1"
Column | Type | Collation | Nullable | Default
--------+-------------------+-----------+----------+---------
a | integer | | not null |
src | character varying | | |
Partition of: tpart FOR VALUES FROM (1) TO (3)
Indexes:
"t_p1_pkey" PRIMARY KEY, btree (a)

Behaviour 3 and 4 are conflicting with each other themselves.

I think we should fix these anomalies as follows
1. Allow identity columns to be added to the partitioned table
irrespective of whether they have partitions of not.
2. Propagate identity property to partitions.
3. Use the same underlying sequence for getting default value of an
identity column when INSERTing directly in a partition.
4. Disallow attaching a partition with identity column.

1 will fix inconsistencies in Behaviour 3 and 4. 2 and 3 will fix
anomalies in Behaviour 1. 4 will fix Behaviour 2.

Note on point 3: The current implementation uses pg_depend to find the
sequence associated with the identity column. We don't necessarily
need to add dependencies for individual partitions though. Instead we
could use the dependency on the partitioned table itself. I haven't
checked feasibility of this option, but it makes things simpler esp.
for DETACH and DROP of partition.

Note on point 4: The proposal again simplifies DETACH and DROP. If we
decide to somehow coalesce the identity columns of partition and
partitioned table, it would make DETACH and DROP complex. Also if the
identity column of the partition being attached is not identity column
in partition table, INSERT on partition table would fail in case of
ALWAYS. Of course the risk is we will break backward compatibility.
But given that the current behaviour is quite erroneous, I doubt if
there are users relying on this behaviour.

Thoughts?

[1] https://www.postgresql.org/message-id/flat/70be435b-05db-06f2-7c01-9bb8ee2fccce%40enterprisedb.com
[2] https://www.postgresql.org/message-id/CAExHW5toAsjc7uwSeSzX6sgvktFxsv7pd606zP6DnTX7Y6O4jg@mail.gmail.com

--
Best Wishes,
Ashutosh Bapat

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2023-10-27 11:35:39 Re: pg_upgrade and logical replication
Previous Message shveta malik 2023-10-27 10:34:16 Re: Synchronizing slots from primary to standby