pg_dump is broken for partition tablespaces

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Subject: pg_dump is broken for partition tablespaces
Date: 2019-03-06 06:45:06
Message-ID: CAKJS1f_1c260nOt_vBJ067AZ3JXptXVRohDVMLEBmudX1YEx-A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Over on [1] Andres pointed out that the pg_dump support for the new to
PG12 tablespace inheritance feature is broken. This is the feature
added in ca4103025dfe26 to allow a partitioned table to have a
tablespace that acts as the default tablespace for newly attached
partitions. The idea being that you can periodically change the
default tablespace for new partitions to a tablespace that sits on a
disk partition with more free space without affecting the default
tablespace for normal non-partitioned tables. Anyway...

pg_dump is broken with this. Consider:

create tablespace newts location '/tmp/newts';
create table listp (a int) partition by list(a) tablespace newts;
create table listp1 partition of listp for values in(1) tablespace pg_default;
create table listp2 partition of listp for values in(2);

select relname,relkind,reltablespace from pg_class where relname like
'listp%' and relkind in('r','p') order by relname;
produces:

relname | relkind | reltablespace
---------+---------+---------------
listp | p | 16384
listp1 | r | 0
listp2 | r | 16384
(3 rows)

after dump/restore:

relname | relkind | reltablespace
---------+---------+---------------
listp | p | 16384
listp1 | r | 16384
listp2 | r | 16384
(3 rows)

Here the tablespace for listp1 was inherited from listp, but we really
should have restored this to use pg_default like was specified.

The reason this occurs is that in pg_dump we do:

SET default_tablespace = '';

CREATE TABLE public.listp1 PARTITION OF public.listp
FOR VALUES IN (1);

so, since we're creating the table initially as a partition the logic
that applies the default partition from the parent kicks in.

If we instead did:

CREATE TABLE public.listp1 (a integer
);

ALTER TABLE public.list1 ATTACH PARTITION public.listp FOR VALUES IN (1);

then we'd have no issue, as tablespace will be set to whatever
default_tablespace is set to.

Partitioned indexes have this similar inherit tablespace from parent
feature, so ca4103025dfe26 was intended to align the behaviour of the
two. Partitioned indexes happen not to suffer from the same issue as
the indexes are attached after their creation similar to what I
propose above.

Can anyone see any fundamental reason that we should not create a
partitioned table by doing CREATE TABLE followed by ATTACH PARTITION?
If not, I'll write a patch that fixes it that way.

As far as I can see, the biggest fundamental difference with doing
things this way will be that the column order of partitions will be
preserved, where before it would inherit the order of the partitioned
table. I'm a little unsure if doing this column reordering was an
intended side-effect or not.

[1] https://www.postgresql.org/message-id/20190305060804.jv5mz4slrnelh3jy@alap3.anarazel.de

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2019-03-06 06:48:25 Re: BUG #15668: Server crash in transformPartitionRangeBounds
Previous Message Tatsuro Yamada 2019-03-06 06:38:54 Re: [HACKERS] CLUSTER command progress monitor