Re: pg_dump versus hash partitioning

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Andrew <pgsqlhackers(at)andrewrepp(dot)com>
Subject: Re: pg_dump versus hash partitioning
Date: 2023-02-01 20:34:52
Message-ID: 1412538.1675283692@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> ... I like the
> fact that we have --load-via-partition-root, but it is a bit of a
> hack. You don't get a single copy into the partition root, you get one
> per child table -- and those COPY statements are listed as data for
> the partitions where the data lives now, not for the parent table. I
> am not completely sure whether there is a scenario where that's an
> issue, but it's certainly an oddity.

I spent a bit more time thinking about that, and while I agree that
it's an oddity, I don't see that it matters in the case of hash
partitioning. You would notice an issue if you tried to do a selective
restore of just one partition --- but under what circumstance would
that be a useful thing to do? By definition, under hash partitioning
there is no user-meaningful difference between different partitions.
Moreover, in the case at hand you would get constraint failures without
--load-via-partition-root, or tuple routing failures with it,
so what's the difference? (Unless you'd created all the partitions
to start with and were doing a selective restore of just one partition's
data, in which case the outcome is "fails" or "works" respectively.)

> Also, and I think pretty
> significantly, using --load-via-partition-root forces you to pay the
> overhead of rerouting every tuple to the target partition whether you
> need it or not, which is potentially a large unnecessary expense.

Oddly, I always thought that we prioritize correctness over speed.
I don't mind having an option that allows people to select a less-safe
way of doing this, but I do think it's unwise for less-safe to be the
default, especially when it's something you can't fix after the fact.

What do you think of "--load-via-partition-root=on/off/auto", where
auto means "not with hash partitions" or the like? I'm still
uncomfortable about the collation aspect, but I'm willing to concede
that range partitioning is less likely to fail in this way than hash.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jacob Champion 2023-02-01 20:37:26 Re: postgres_fdw, dblink, and CREATE SUBSCRIPTION security
Previous Message Nathan Bossart 2023-02-01 20:15:29 Re: recovery modules