Re: pg_dump versus hash partitioning

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Andrew <pgsqlhackers(at)andrewrepp(dot)com>
Subject: Re: pg_dump versus hash partitioning
Date: 2023-02-01 22:08:37
Message-ID: 1422349.1675289317@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> It seems to me that the job of pg_dump is to produce a dump that, when
> reloaded on another system, recreates the same database state. That
> means that we end up with all of the same objects, each defined in the
> same way, and that all of the tables end up with all the same contents
> that they had before.

No, its job is to produce *logically* the same state. We don't expect
it to produce, say, the same CTID value that each row had before ---
if you want that, you use pg_upgrade or physical replication, not
pg_dump. There are fairly strong constraints on how identical we
can make the results given that we're not replicating physical state.
And that's not even touching the point that frequently what the user
is after is not an identical copy anyway.

> Here, you'd like to argue that it's perfectly
> fine if we instead insert some of the rows into different tables than
> where they were on the original system.

I can agree with that argument for range or list partitioning, where
the partitions have some semantic meaning to the user. I don't buy it
for hash partitioning. It was an implementation artifact to begin
with that a given row ended up in partition 3 not partition 11, so why
would users care which partition it ends up in after a dump/reload?
If they think there is a difference between the partitions, they need
education.

> ... But just as we normally prioritize correctness over speed, so
> also do we normally throw errors when things aren't right instead of
> silently accepting bad input. The partitions in this scenario are
> tables that have constraints.

Again, for hash partitioning those constraints are implementation
artifacts not something that users should have any concern with.

> Keep in mind that there's no rule that a user can't
> query a partition directly.

Sure, and in the case of a hash partition, what he is going to
get is an implementation-dependent subset of the rows. I use
"implementation-dependent" here in the same sense that the SQL
standard does, which is "there is a rule but the implementation
doesn't have to tell you what it is". In particular, we are not
bound to make the subset be the same on different installations.
We already didn't promise that, because of issues like endianness.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2023-02-01 22:12:41 Re: pg_dump versus hash partitioning
Previous Message Robert Haas 2023-02-01 21:52:30 Re: pg_dump versus hash partitioning