Re: pg_dump versus hash partitioning

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Andrew <pgsqlhackers(at)andrewrepp(dot)com>
Subject: Re: pg_dump versus hash partitioning
Date: 2023-02-01 21:14:10
Message-ID: CA+TgmoYoSoX8bN_yzabmRDQBNxEaWmjBKPnkZPMUDpPTBy=d4A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Feb 1, 2023 at 3:34 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I spent a bit more time thinking about that, and while I agree that
> it's an oddity, I don't see that it matters in the case of hash
> partitioning. You would notice an issue if you tried to do a selective
> restore of just one partition --- but under what circumstance would
> that be a useful thing to do? By definition, under hash partitioning
> there is no user-meaningful difference between different partitions.
> Moreover, in the case at hand you would get constraint failures without
> --load-via-partition-root, or tuple routing failures with it,
> so what's the difference? (Unless you'd created all the partitions
> to start with and were doing a selective restore of just one partition's
> data, in which case the outcome is "fails" or "works" respectively.)

I guess I was worried that pg_dump's dependency ordering stuff might
do something weird in some case that I'm not smart enough to think up.

> > Also, and I think pretty
> > significantly, using --load-via-partition-root forces you to pay the
> > overhead of rerouting every tuple to the target partition whether you
> > need it or not, which is potentially a large unnecessary expense.
>
> Oddly, I always thought that we prioritize correctness over speed.

That's a bit rich.

It seems to me that the job of pg_dump is to produce a dump that, when
reloaded on another system, recreates the same database state. That
means that we end up with all of the same objects, each defined in the
same way, and that all of the tables end up with all the same contents
that they had before. Here, you'd like to argue that it's perfectly
fine if we instead insert some of the rows into different tables than
where they were on the original system. Under normal circumstances, of
course, we wouldn't consider any such thing, because then we would not
be faithfully replicating the database state, which would be
incorrect. But here you want to argue that it's OK to create a
different database state because trying to recreate the same one would
produce an error and the user might not like getting an error so let's
just do something else instead and not even bother telling them.

As you have quite rightly pointed out, the --load-via-partition-root
behavior is useful for working around a variety of unfortunate things
that can happen. If a user is willing to say that getting a row into
one partition of some table is just as good as getting it into another
partition of that same table and that you don't mind paying the cost
associated with that, then that is something that we can do for that
user. But just as we normally prioritize correctness over speed, so
also do we normally throw errors when things aren't right instead of
silently accepting bad input. The partitions in this scenario are
tables that have constraints. If a dump contains a row that doesn't
satisfy some constraint on the table into which it is being loaded,
that's an error. Keep in mind that there's no rule that a user can't
query a partition directly.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2023-02-01 21:23:26 Re: recovery modules
Previous Message Melanie Plageman 2023-02-01 21:12:54 Re: heapgettup refactoring