Re: To make pg_dump and pg_restore parallel in processing limited number of LOs

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: fkfk000 <fkfk000(at)126(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: To make pg_dump and pg_restore parallel in processing limited number of LOs
Date: 2025-05-18 14:46:02
Message-ID: 2260523.1747579562@sss.pgh.pa.us
Lists: pgsql-hackers

fkfk000 <fkfk000(at)126(dot)com> writes:
> However, a user may have only a limited number of LOs, say 1k, which seems sensible since LOs should be large. In that scenario only one process would do the work. Therefore, I'm proposing a change: instead of grouping LOs with the same owner/ACL pair by a fixed count, we can use a SQL query to distribute each pair into a fixed number of batches, with each batch assigned its own ArchiveEntry. That way the workload for each pair can be spread across processes even when there are only a few LOs.
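
For illustration only, the kind of batching query being described might
look like the sketch below; the batch count of 8 and the use of ntile()
are assumptions made for this example, not taken from the patch:

    -- spread each owner/ACL pair's blobs across a fixed number of
    -- batches (8 here, arbitrarily), so that even ~1k blobs can feed
    -- several parallel workers
    SELECT oid, lomowner, lomacl,
           ntile(8) OVER (PARTITION BY lomowner, lomacl
                          ORDER BY oid) AS batch
    FROM pg_largeobject_metadata
    ORDER BY lomowner, lomacl, batch;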

I do not care for this idea. I think this behavior will seem
entirely random to most users. Also, you appear not to be thinking
at all about what will happen with huge numbers (millions) of blobs.
Forcing them all into a relatively small number of TOC entries will
break exactly the same cases that we intended to fix by breaking them
up into multiple TOC entries.

I'd rather do what's speculated in the existing comment:

* At some point we might want to make this user-controllable, but for now
* a hard-wired setting will suffice.

That would in particular allow people to split things up as finely
as one blob per TOC entry, which would be useful for selective-restore
purposes.
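
For concreteness, a sketch of how that might be used, assuming a
hypothetical pg_dump switch spelled --blobs-per-entry (the option name
is invented here; only the pg_restore -l/-L steps exist today):

    pg_dump -Fc --blobs-per-entry=1 -f db.dump mydb  # hypothetical switch
    pg_restore -l db.dump > items.list               # one TOC line per blob
    # edit items.list down to the desired BLOB entries, then:
    pg_restore -L items.list -d targetdb db.dump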

regards, tom lane
