Re: pg_restore scan

From: R Wahyudi <rwahyudi(at)gmail(dot)com>
To: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
Cc: Ron Johnson <ronljohnsonjr(at)gmail(dot)com>, "pgsql-generallists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_restore scan
Date: 2025-09-18 23:45:22
Message-ID: CALWQLzRr34aZ+Dk_vhvz2VYtFjsChe1PQp3Nc_F9ENKzw3c7Tg@mail.gmail.com
Lists: pgsql-general

>> The input must be a regular file or directory (not, for example, a pipe
or standard input).

Thanks again for the pointer!

I successfully ran a parallel restore with no warnings.
I didn't really pay attention to how the dump was taken until I
accidentally stumbled upon your post.
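
For anyone scripting the restore, here is a minimal sketch of the quoted constraint as a pre-flight check. It is not from the thread: the helper name `restore_jobs`, the database name, and the paths are hypothetical. It only tests the pg_restore side (the input is a regular file or directory). It cannot tell whether the dump itself was written straight to a file, which is what puts the data offsets in the TOC.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print the job count that is
# safe to pass to pg_restore -j for a given archive path.
restore_jobs() {
    archive=$1   # path to a -Fc file or a -Fd directory
    wanted=$2    # job count you would like to use
    if [ -f "$archive" ] || [ -d "$archive" ]; then
        # Regular file or directory: parallel restore is allowed.
        echo "$wanted"
    else
        # Pipe, missing path, etc.: fall back to a single job.
        echo 1
    fi
}

# Example use (database name and path are placeholders):
#   pg_restore -j "$(restore_jobs /backups/mydb.dump 8)" -d mydb /backups/mydb.dump
```

A dump that was piped through a compressor and later unpacked to disk would still pass this check, so the dump side needs its own fix (pg_dump with -f, as suggested further down the thread).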

Regards,
Rianto

On Fri, 19 Sept 2025 at 07:45, Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
wrote:

>
>
> On 9/18/25 2:36 PM, R Wahyudi wrote:
> > I've been given a database dump file daily and I've been asked to
> > restore it.
> > I tried everything I could to speed up the process, including using
> > -j 40.
> >
> > I discovered that at the later stage of the restore process, the
> > following behaviour repeated a few times :
> > 40 x pg_restore process doing 100% CPU
> > 40 x postgres process doing COPY but using 0% CPU
> > ..... and zero disk write activity
> >
> > I don't see this behaviour when restoring the database that was dumped
> > with -Fd.
> > Also with an un-piped backup file, I can restore a specific table
> > without having to wait for hours.
>
> From the docs:
>
> https://www.postgresql.org/docs/current/app-pgrestore.html
>
> "
> -j number-of-jobs
>
> Only the custom and directory archive formats are supported with this
> option. The input must be a regular file or directory (not, for example,
> a pipe or standard input). Also, multiple jobs cannot be used together
> with the option --single-transaction.
> "
>
>
> >
> >
> > --
> >
> >
> >
> >
> >
> > On Fri, 19 Sept 2025 at 01:54, Adrian Klaver
> > <adrian(dot)klaver(at)aklaver(dot)com> wrote:
> >
> > On 9/18/25 05:58, R Wahyudi wrote:
> > > Hi All,
> > >
> > > Thanks for the quick and accurate response! I have never been so happy
> > > to see IOwait on my system!
> >
> > Because?
> >
> > What did you find?
> >
> > >
> > > I might be blind, as I can't find information about 'offset' in the
> > > pg_dump documentation.
> > > Where can I find more info about this?
> >
> > It is not in the user documentation.
> >
> > From the thread Ron referred to, there is an explanation here:
> >
> > https://www.postgresql.org/message-id/366773.1756749256%40sss.pgh.pa.us
> >
> > I believe the actual code, for the -Fc format, is in pg_backup_custom.c
> > here:
> >
> > https://github.com/postgres/postgres/blob/master/src/bin/pg_dump/pg_backup_custom.c#L723
> >
> > Per comment at line 755:
> >
> > "
> > If possible, re-write the TOC in order to update the data offset
> > information. This is not essential, as pg_restore can cope in most
> > cases without it; but it can make pg_restore significantly faster
> > in some situations (especially parallel restore). We can skip this
> > step if we're not dumping any data; there are no offsets to update
> > in that case.
> > "
> >
> > >
> > > Regards,
> > > Rianto
> > >
> > > On Wed, 17 Sept 2025 at 13:48, Ron Johnson
> > > <ronljohnsonjr(at)gmail(dot)com> wrote:
> > >
> > >
> > > PG 17 has integrated zstd compression, while --format=directory lets
> > > you do multi-threaded dumps. That's much faster than a
> > > single-threaded pg_dump into a multi-threaded compression program.
> > >
> > > (If for _Reasons_ you require a single-file backup, then tar the
> > > directory of compressed files using the --remove-files option.)
> > >
> > > On Tue, Sep 16, 2025 at 10:50 PM R Wahyudi
> > > <rwahyudi(at)gmail(dot)com> wrote:
> > >
> > > Sorry for not including the full command. Yes, it's piping to a
> > > compression command:
> > > | lbzip2 -n <threadsforbzipgoeshere> --best > <filenamegoeshere>
> > >
> > >
> > > I think we found the issue! I'll do further testing and see how
> > > it goes!
> > >
> > >
> > >
> > >
> > >
> > > On Wed, 17 Sept 2025 at 11:02, Ron Johnson
> > > <ronljohnsonjr(at)gmail(dot)com> wrote:
> > >
> > > So, piping or redirecting to a file? If so, then that's the
> > > problem.
> > >
> > > pg_dump directly to a file puts file offsets in the TOC.
> > >
> > > This is how I do custom dumps:
> > > cd $BackupDir
> > > pg_dump -Fc --compress=zstd:long -v -d${db} -f ${db}.dump 2> ${db}.log
> > >
> > > On Tue, Sep 16, 2025 at 8:54 PM R Wahyudi
> > > <rwahyudi(at)gmail(dot)com> wrote:
> > >
> > > pg_dump was done using the following command:
> > > pg_dump -Fc -Z 0 -h <host> -U <user> -w -d <database>
> > >
> > > On Wed, 17 Sept 2025 at 08:36, Adrian Klaver
> > > <adrian(dot)klaver(at)aklaver(dot)com> wrote:
> > >
> > > On 9/16/25 15:25, R Wahyudi wrote:
> > > >
> > > > I'm trying to troubleshoot the slowness issue with pg_restore and
> > > > stumbled across a recent post about pg_restore scanning the whole file:
> > > >
> > > > > "scanning happens in a very inefficient way, with many seek calls and
> > > > > small block reads. Try strace to see them. This initial phase can take
> > > > > hours in a huge dump file, before even starting any actual restoration."
> > > >
> > > > see: https://www.postgresql.org/message-id/E48B611D-7D61-4575-A820-B2C3EC2E0551%40gmx.net
> > >
> > > This was for pg_dump output that was streamed to a Borg archive and as
> > > a result had no object offsets in the TOC.
> > >
> > > How are you doing your pg_dump?
> > >
> > >
> > >
> > > --
> > > Adrian Klaver
> > > adrian(dot)klaver(at)aklaver(dot)com
> > >
> > >
> > >
> > > --
> > > Death to <Redacted>, and butter sauce.
> > > Don't boil me, I'm still alive.
> > > <Redacted> lobster!
> > >
> > >
> > >
> >
> >
> > --
> > Adrian Klaver
> > adrian(dot)klaver(at)aklaver(dot)com
> >
>
> --
> Adrian Klaver
> adrian(dot)klaver(at)aklaver(dot)com
>
>
