|From:||Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>|
|To:||David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>|
|Cc:||Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>|
|Subject:||Re: pg_dump is broken for partition tablespaces|
|Views:||Raw Message | Whole Thread | Download mbox | Resend email|
On 2019-Apr-17, David Rowley wrote:
> On Mon, 15 Apr 2019 at 15:26, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
> > On 2019-Apr-15, David Rowley wrote:
> > > To be honest, if I'd done a better job of thinking through the
> > > implications of this tablespace inheritance in ca4103025d, then I'd
> > > probably have not bothered submitting a patch for it. We could easily
> > > revert that, but we'd still be left with the same behaviour in
> > > partitioned indexes, which is in PG11.
> > Well, I suppose if we do decide to revert it for tables, we should do it
> > for both tables and indexes. But as I said, I'm not yet convinced that
> > this is the best way forward.
> Ok. Any ideas or suggestions on how we move on from here? It seems
> like a bit of a stalemate.
Well, here's my proposed patch. I'm now fairly happy with how this
looks now, concerning partitioned tables.
This is mostly what was already discussed:
1. pg_dump now uses regular CREATE TABLE followed by ALTER TABLE / ATTACH
PARTITION when creating partitions, rather than CREATE TABLE
PARTITION OF. pg_dump --binary-upgrade was already doing that, so
this part mostly removes some code. In order to make the partitions
reach the correct tablespace, the "default_tablespace" GUC is used.
No TABLESPACE clause is added to the dump. This is David's patch
near the start of the thread.
2. When creating a partition using the CREATE TABLE PARTITION OF syntax,
the TABLESPACE clause has highest precedence; if that is not given,
the partitioned table's tablespace is used; if that is set to 0 (the
default), default_tablespace is used; if that's set to empty or a
nonexistant tablespace, the database's default tablespace is used.
This is (I think) what Andres proposed in
3. Partitioned relations can have the database tablespace in
pg_class.reltablespace, as opposed to storage-bearing relations which
cannot. This is useful to be able to put partitions in the database
tablespace even if the default_tablespace is set to something else.
4. For partitioned tables, ALTER TABLE .. SET TABLESPACE DEFAULT is
available as suggested by David, which makes future partition
creations target default_tablespace or the database's tablespace.
5. Recreating indexes during table-rewriting ALTER TABLE resulted in
broken indexes. We already had some adhesive tape in place to make
that work for regular indexes (commit bd673e8e864a); my approach to
fix it for partitioned indexes is to temporarily reset
default_tablespace to empty.
As for Tom's question in https://firstname.lastname@example.org :
> It's possible that Alvaro's patch is also broken, but I haven't had time
> to review it. The immediate question is what happens when somebody makes
> a partitioned table in template1 and then does CREATE DATABASE with a
> tablespace option. Does the partitioned table end up in the same
> tablespace as ordinary tables do?
Note that partitioned don't have any files, so they don't end up
anywhere; when a partition is created, the target tablespace is
determined using four rules instead of three (see #2 above) so yes, they
do end up in the same places as ordinary tables. Note that even if you
do put a partitioned table in some tablespace, you will not later run
afoul of this check:
errmsg("cannot assign new default tablespace \"%s\"",
errdetail("There is a conflict because database \"%s\" already has some tables in this tablespace.",
because that check uses ReadDir() and raise an error if any entry is
found; but partitioned tables don't have files in directory, so nothing
happens. (Of course, it will hit if you have a partition in that
tablespace, but that's also the case for regular tables.)
(I propose to commit both 0002 and 0003 as a single unit that fixes the
whole problem, rather than attacking backend and pg_dump separately.
0001 appears logically separate and I would push on its own.)
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
|Next Message||Fabien COELHO||2019-04-17 21:46:09||Re: pgsql: Fix unportable code in pgbench.|
|Previous Message||Tom Lane||2019-04-17 21:30:36||pgsql: Fix unportable code in pgbench.|