Partitioning, Identity and Uniqueness (given pg 16 changes)

From: Darryl Green <darryl(dot)green(at)gmail(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Partitioning, Identity and Uniqueness (given pg 16 changes)
Date: 2024-02-19 03:32:24
Message-ID: CADJ=SENg81s5fLEEkV89ufifi-gpsaWPgzttTE=y6Qki9f9T+g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi,
I have a table that is capturing what is, basically, time series data. We
use identity column as primary key.

I'm considering the potential to partition it by the "source" of the time
series (100s .. 1000s of sources) as often (but not always) use of this
table is to get the time series for one source.
I note that in Postgresql 16 identity column handling in partitioned tables
has been aligned to the view that the partitioned table as a whole is a
single relation (and so a unique identity across partitions). This makes
sense.
Now, I want to partition but not by that identity column. But I have to
include the identity in the partitioning key (if I have it as the pk) so
that the uniqueness can be checked. But - using a (shared across all
partition tables) identity sequence should be enough to ensure uniqueness
without this.

Currently I need to define my partitioned table as something like

#create table tpart (id int generated always as identity not null, src int,
data varchar) partition by hash(src);
#create table t_p001 partition of tpart for values with (modulus 32,
remainder 0) primary key id;
#create table t_p002 partition of tpart for values with (modulus 32,
remainder 1) primary key id;
#...
#create table t_p031 partition of tpart for values with (modulus 32,
remainder 31) primary key id;

Note there is no pk on the partitioned table, but the (in practice unique
across all partitions) id identity column is pk of each / every partition.

The src column is not indexed at all in this example (it could be of
course, but one of the things we are trying to get out of this is high
insert performance for concurrent inserts from multiple venues).

Two things:

1) Is my attempt to improve performance of insert AND (as the number of
sources goes up, as it has) querying which is invariably by id but it is
easy to include "and src = x" in the majority of usage (the other common
case is to query across all sources, asking for a set is unusual) a
reasonable approach?

2) It would be nice to be able to specify the id as pk on the table being
partitioned (as it was in the non-partitioned definition of the table) once
to document and enforce that the partitions simply inherit the id pk. This
would seem only to need the "partition by" validation to allow a column not
mentioned in partition by clause to be defined as pk or unique if and only
if the pk/unique column is an identity column. Not a big deal but is this
practical/valid?

Thanks
Darryl.

Responses

Browse pgsql-general by date

  From Date Subject
Next Message jian he 2024-02-19 03:43:39 Re: Emitting JSON to file using COPY TO
Previous Message Alec Lazarescu 2024-02-18 22:20:07 Re: Partitioning options