From: Ron Johnson <ronljohnsonjr(at)gmail(dot)com>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: pg_restore scan
Date: 2025-09-19 04:06:09
Message-ID: CANzqJaBDMG+94DROJonQfG0S2782RQYCVQSzHXFKiZbK+qb=3w@mail.gmail.com
Lists: pgsql-general
On Thu, Sep 18, 2025 at 5:37 PM R Wahyudi <rwahyudi(at)gmail(dot)com> wrote:
> I've been given a database dump file daily and I've been asked to restore
> it.
> I tried everything I could to speed up the process, including using -j 40.
>
> I discovered that at a later stage of the restore process, the
> following behaviour repeated a few times:
> 40 pg_restore processes at 100% CPU
>
Threads are not magic. IO and memory limitations still exist.
> 40 postgres processes doing COPY but using 0% CPU
> ..... and zero disk write activity
>
> I don't see this behaviour when restoring the database that was dumped
> with -Fd.
> Also with an un-piped backup file, I can restore a specific table without
> having to wait for hours.
>
We explained this three days ago. Heck, it's in this very email. Click
on "the three dots", scroll down a bit.
> On Fri, 19 Sept 2025 at 01:54, Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
> wrote:
>
>> On 9/18/25 05:58, R Wahyudi wrote:
>> > Hi All,
>> >
>> > Thanks for the quick and accurate response! I have never been so happy
>> > seeing IOwait on my system!
>>
>> Because?
>>
>> What did you find?
>>
>> >
>> > I might be blind, as I can't find information about 'offset' in the
>> > pg_dump documentation.
>> > Where can I find more info about this?
>>
>> It is not in the user documentation.
>>
>> From the thread Ron referred to, there is an explanation here:
>>
>> https://www.postgresql.org/message-id/366773.1756749256%40sss.pgh.pa.us
>>
>> I believe the actual code, for the -Fc format, is in pg_backup_custom.c
>> here:
>>
>>
>> https://github.com/postgres/postgres/blob/master/src/bin/pg_dump/pg_backup_custom.c#L723
>>
>> Per comment at line 755:
>>
>> "
>> If possible, re-write the TOC in order to update the data offset
>> information. This is not essential, as pg_restore can cope in most
>> cases without it; but it can make pg_restore significantly faster
>> in some situations (especially parallel restore). We can skip this
>> step if we're not dumping any data; there are no offsets to update
>> in that case.
>> "
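>>
>> A rough way to picture the effect (just a sketch; mydb is a placeholder):
>> pg_dump can only rewrite the TOC when its output is seekable, so the same
>> -Fc dump ends up with or without data offsets depending on where it goes:
>>
>> # seekable output file: the TOC is rewritten with data offsets at the end
>> pg_dump -Fc -d mydb -f mydb.dump
>>
>> # pipe: not seekable, no offsets, so a parallel pg_restore has to scan
>> pg_dump -Fc -d mydb | lbzip2 --best > mydb.dump.bz2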
>>
>> >
>> > Regards,
>> > Rianto
>> >
>> > On Wed, 17 Sept 2025 at 13:48, Ron Johnson <ronljohnsonjr(at)gmail(dot)com> wrote:
>> >
>> >
>> > PG 17 has integrated zstd compression, while --format=directory lets
>> > you do multi-threaded dumps. That's much faster than a single-
>> > threaded pg_dump into a multi-threaded compression program.
>> >
>> > (If for _Reasons_ you require a single-file backup, then tar the
>> > directory of compressed files using the --remove-files option.)
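>> >
>> > Roughly like this (only a sketch; assumes a pg_dump new enough to accept
>> > --compress=zstd and GNU tar for --remove-files, with ${db} and the -j
>> > count as placeholders):
>> >
>> > # parallel, compressed directory-format dump
>> > pg_dump -Fd -j 8 --compress=zstd -d ${db} -f ${db}.dir 2> ${db}.log
>> > # single file for transport; --remove-files deletes originals as they are added
>> > tar --remove-files -cf ${db}.dir.tar ${db}.dir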
>> >
>> > On Tue, Sep 16, 2025 at 10:50 PM R Wahyudi <rwahyudi(at)gmail(dot)com> wrote:
>> >
>> > Sorry for not including the full command - yes, it's piping to a
>> > compression command:
>> > | lbzip2 -n <threadsforbzipgoeshere> --best > <filenamegoeshere>
>> >
>> >
>> > I think we found the issue! I'll do further testing and see how
>> > it goes!
>> >
>> >
>> >
>> >
>> >
>> > On Wed, 17 Sept 2025 at 11:02, Ron Johnson <ronljohnsonjr(at)gmail(dot)com> wrote:
>> >
>> > So, piping or redirecting to a file? If so, then that's the
>> > problem.
>> >
>> > pg_dump directly to a file puts file offsets in the TOC.
>> >
>> > This is how I do custom dumps:
>> > cd $BackupDir
>> > pg_dump -Fc --compress=zstd:long -v -d${db} -f ${db}.dump 2> ${db}.log
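>> >
>> > The matching restore is something like this (a sketch; tune -j to what
>> > the server's IO can actually absorb). Because the dump was written
>> > straight to a file with -f, the TOC carries data offsets and the restore
>> > workers can seek directly to each table instead of scanning the archive:
>> >
>> > pg_restore -j 8 -d ${db} -v ${db}.dump 2> ${db}.restore.log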
>> >
>> > On Tue, Sep 16, 2025 at 8:54 PM R Wahyudi <rwahyudi(at)gmail(dot)com> wrote:
>> >
>> > pg_dump was done using the following command:
>> > pg_dump -Fc -Z 0 -h <host> -U <user> -w -d <database>
>> >
>> > On Wed, 17 Sept 2025 at 08:36, Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com> wrote:
>> >
>> > On 9/16/25 15:25, R Wahyudi wrote:
>> > >
>> > > I'm trying to troubleshoot the slowness issue with pg_restore and
>> > > stumbled across a recent post about pg_restore scanning the whole file:
>> > >
>> > > > "scanning happens in a very inefficient way, with many seek calls and
>> > > > small block reads. Try strace to see them. This initial phase can take
>> > > > hours in a huge dump file, before even starting any actual restoration."
>> > > see: https://www.postgresql.org/message-id/E48B611D-7D61-4575-A820-B2C3EC2E0551%40gmx.net
>> >
>> > This was for pg_dump output that was streamed to a Borg archive and as a
>> > result had no object offsets in the TOC.
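>> >
>> > One way to watch that scan yourself, if you're curious, is something
>> > along these lines (just a sketch; the syscall filter is only a guess at
>> > what's interesting, and -f is needed because the -j workers are child
>> > processes):
>> >
>> > strace -f -c -e trace=lseek,read pg_restore -j 8 -d ${db} <dumpfile>
>> >
>> > With -c, strace summarises the lseek and read call counts instead of
>> > printing every one.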
>> >
>> > How are you doing your pg_dump?
>> >
>> >
>> >
>> > --
>> > Adrian Klaver
>> > adrian(dot)klaver(at)aklaver(dot)com
>> >
>> >
>> >
>> > --
>> > Death to <Redacted>, and butter sauce.
>> > Don't boil me, I'm still alive.
>> > <Redacted> lobster!
>> >
>> >
>> >
>> > --
>> > Death to <Redacted>, and butter sauce.
>> > Don't boil me, I'm still alive.
>> > <Redacted> lobster!
>> >
>>
>>
>> --
>> Adrian Klaver
>> adrian(dot)klaver(at)aklaver(dot)com
>>
>
--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!