Re: pg_dump versus hash partitioning

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Andrew <pgsqlhackers(at)andrewrepp(dot)com>
Subject: Re: pg_dump versus hash partitioning
Date: 2023-02-01 18:23:26
Message-ID: 1396807.1675275806@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Wed, Feb 1, 2023 at 11:18 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Over at [1] we have a complaint that dump-and-restore fails for
>> hash-partitioned tables if a partitioning column is an enum,
>> because the enum values are unlikely to receive the same OIDs
>> in the destination database as they had in the source, and the
>> hash codes are dependent on those OIDs.

> It seems to me that this is the root of the problem.

Well, that was what I thought too to start with, but I now think that
it is far too narrow-minded a view of the problem. The real issue
is something I said that you trimmed:

>> In general, we've never thought that hash values are
>> required to be consistent across platforms.

hashenum() is doing something that is perfectly fine in the context
of hash indexes, which have traditionally been our standard of how
reproducible hash values need to be. Hash partitioning has moved
those goalposts, and if there was any discussion of the consequences
of that, I didn't see it.

>> Furthermore, I think we should make this happen in time for
>> next week's releases. I can write the patch easily enough,
>> but we need a consensus PDQ that this is what to do.

> This seems extremely precipitous to me and I'm against it.

Yeah, it's been this way for awhile, so if there's debate then
we can let it go awhile longer.

> So I think we should be asking ourselves what we could do about the
> enum case specifically, rather than resorting to a bazooka.

My idea of solving this with a bazooka would be to deprecate hash
partitioning. Quite frankly that's where I think we will end up
someday, but I'm not going to try to make that argument today.

In the meantime, I think we need to recognize that hash values are
not very portable. I do not think we do our users a service by
letting them discover the corner cases the hard way.

I'd be willing to compromise on the intermediate idea of forcing
--load-via-partition-root just for hashed partitioning.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Matthias van de Meent 2023-02-01 18:24:48 Re: Progress report of CREATE INDEX for nested partitioned tables
Previous Message João Paulo Labegalini de Carvalho 2023-02-01 18:19:55 Re: Optimizing PostgreSQL with LLVM's PGO+LTO