pg_dump versus hash partitioning

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Andrew <pgsqlhackers(at)andrewrepp(dot)com>
Subject: pg_dump versus hash partitioning
Date: 2023-02-01 16:17:59
Message-ID: 1376149.1675268279@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Over at [1] we have a complaint that dump-and-restore fails for
hash-partitioned tables if a partitioning column is an enum,
because the enum values are unlikely to receive the same OIDs
in the destination database as they had in the source, and the
hash codes are dependent on those OIDs. So restore tries to
load rows into the wrong leaf tables, and it's all a mess.
The patch approach proposed at [1] doesn't really work, but
what does work is to use pg_dump's --load-via-partition-root
option, so that the tuple routing decisions are all re-made.

I'd initially proposed that we force --load-via-partition-root
if we notice that we have hash partitioning on an enum column.
But the more I thought about this, the more comparable problems
came to mind:

1. Hash partitioning on text columns will likely fail if the
destination uses a different encoding.

2. Hash partitioning on float columns will fail if you use
--extra-float-digits to round off the values. And then
there's the fact that the behavior of strtod() might vary
across platforms.

3. Hash partitioning on floats is also endian-dependent,
and the same is likely true for some other types.

4. Anybody want to bet that complex types such as jsonb
are entirely free of similar hazards? (Yes, somebody
thought it'd be a good idea to provide jsonb_hash.)

In general, we've never thought that hash values are
required to be consistent across platforms.

That was leading me to think that we should force
--load-via-partition-root for any hash-partitioned table,
just to pre-emptively avoid these problems. But then
I remembered that

5. Range partitioning on text columns will likely fail if the
destination uses a different collation.

This is looking like a very large-caliber foot-gun, isn't it?
And remember that --load-via-partition-root acts at pg_dump
time, not restore. If all you have is a dump file with no
opportunity to go back and get a new one, and it won't load
into your new server, you have got a nasty problem.

I don't think this is an acceptable degree of risk, considering
that the primary use-cases for pg_dump involve target systems
that aren't 100.00% identical to the source.

So here's what I think we should actually do: make
--load-via-partition-root the default. We can provide a
switch to turn it off, for those who are brave or foolish
enough to risk that in the name of saving a few cycles,
but it ought to be the default.

Furthermore, I think we should make this happen in time for
next week's releases. I can write the patch easily enough,
but we need a consensus PDQ that this is what to do.

Anyone want to bikeshed on the spelling of the new switch?
I'm contemplating "--load-via-partition-leaf" or perhaps
"--no-load-via-partition-root".

regards, tom lane

[1] https://www.postgresql.org/message-id/flat/765e5968-6c39-470f-95bf-7b14e6b9a1c0%40app.fastmail.com

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Matthias van de Meent 2023-02-01 16:27:26 Re: Progress report of CREATE INDEX for nested partitioned tables
Previous Message Yugo NAGATA 2023-02-01 16:15:25 Re: RLS makes COPY TO process child tables