Re: Adding pg_dump flag for parallel export to pipes

From: solai v <solai(dot)cdac(at)gmail(dot)com>
To: Nitin Motiani <nitinmotiani(at)google(dot)com>
Cc: Hannu Krosing <hannuk(at)google(dot)com>, Mahendra Singh Thalor <mahi6run(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Adding pg_dump flag for parallel export to pipes
Date: 2026-06-12 05:20:46
Message-ID: CAF0whucxFZp2sDMJL4UtwPZrDUXKCJk9+5FPzXNXn7yRZ4hdzQ@mail.gmail.com
Lists: pgsql-hackers

Hi all,

On Tue, Jun 9, 2026 at 12:04 PM Nitin Motiani <nitinmotiani(at)google(dot)com> wrote:
>
> On Fri, May 22, 2026 at 4:04 PM solai v <solai(dot)cdac(at)gmail(dot)com> wrote:
> > I then tested the patch introducing --pipe support. The feature is
> > quite useful for modern workflows where users want to stream dump
> > output directly to compression or upload pipelines without relying on
> > intermediate storage.
>
> Thank you for the feedback.
>
> > gzip: dump.gz: unexpected end of file
> > This suggests that concurrent writes to a shared output target are not
> > coordinated and can result in invalid dumps. It would be helpful to
> > clarify expected usage patterns here. For example: whether users are
> > expected to generate distinct outputs per worker, or whether
> > safeguards should be implemented to prevent multiple workers from
> > writing to the same destination.
>
> I added a warning for cases where the pipe command provided with
> parallel dump and restore doesn't contain a `%f`. We can also add it
> to the documentation. Let me know what you think.
>
> > scenarios I observed backend logs such as:
> > FATAL: connection to client lost
> > Broken pipe
> > While this is expected when the pipe terminates prematurely, it may be
> > worth considering whether error messaging or cleanup behavior can be
> > made clearer from the user perspective.
>
> I added the failed command to the error message. I'm not sure if we
> can do any auto-cleanup commands which succeeded.
>

Thank you for the updated patches. I reviewed and tested the latest
v17 patch series on my current cluster. I tested the new --pipe
functionality with both serial and parallel directory dumps, and it
correctly generated:

    pipe_dump/
        toc.dat
        3475.dat
        3476.dat

The same behavior was verified with parallel jobs (-j 4). I also
tested compression through an external command:

    pg_dump -p 55432 -Fd -j 4 \
        --pipe='gzip > pipe_dump/%f.gz' \
        postgres

which successfully generated:

    pipe_dump/
        toc.dat.gz
        3475.dat.gz
        3476.dat.gz

without creating an intermediate dump directory.
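
To illustrate why a per-file name matters, here is a minimal shell
sketch that simulates the expansion outside pg_dump. It is only my
assumption of the effect of %f (each output file name substituted into
the command), not the patch's actual code; the template variable, the
temporary directory, and the placeholder data are hypothetical.

```shell
#!/bin/sh
# Hypothetical simulation of per-file "%f" expansion; not pg_dump code.
set -eu

out_dir=$(mktemp -d)
template="gzip > $out_dir/%f.gz"

for name in toc.dat 3475.dat 3476.dat; do
    # Replace every "%f" in the template with the current file name.
    cmd=$(printf '%s\n' "$template" | sed "s|%f|$name|g")
    # Feed placeholder data through the expanded command.
    printf 'data for %s\n' "$name" | sh -c "$cmd"
done

# Each simulated writer ends up with its own compressed file.
ls "$out_dir"
```

Because every expanded command targets a distinct file, no two writers
share a destination, which is the situation the dump.gz corruption
report ran into.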

I also verified error handling. --pipe='invalid_cmd' correctly
reported:

    pg_dump: error: pipe command failed: "invalid_cmd": command not found

Similarly, --pipe='gzip | false' correctly propagated the child
process failure.

I also performed an end-to-end validation of the new functionality by
restoring the generated archive using pg_restore --pipe. The restore
completed successfully, and row counts for the test tables matched
the original database, confirming that dump generation, compression,
restore, and data integrity all worked correctly.

One particularly useful improvement I noticed is the handling of
parallel jobs when the %f placeholder is omitted, which produces the
warning:

    pg_dump: warning: parallel jobs with --pipe usually require the
      "%f" placeholder to avoid data corruption from multiple workers
      writing to the same file

This is a usability improvement, as it explicitly warns users against
a common misuse that could otherwise lead to corrupted output when
multiple workers write to the same destination file.

Overall, the patch series worked well and the new --pipe support
behaves correctly in my testing. The newly added warning for a missing
%f also significantly improves the user experience for parallel jobs.
The patch looks good to me.
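
For reference, the two failure cases from my error-handling test can
be reproduced with a plain shell, independent of pg_dump. This sketch
assumes the pipe command is handed to a shell; how pg_dump turns the
resulting status into its error message is not shown here, and the
variable names are only illustrative.

```shell
#!/bin/sh
# Sketch: exit statuses a shell reports for the two tested failures.

# Unknown command: POSIX shells exit with status 127.
status_missing=0
sh -c 'invalid_cmd' </dev/null 2>/dev/null || status_missing=$?

# Pipeline: the status is that of the last command, here "false".
status_pipeline=0
sh -c 'gzip | false' </dev/null 2>/dev/null || status_pipeline=$?

echo "invalid_cmd exit status: $status_missing"
echo "gzip | false exit status: $status_pipeline"
```

Note that a pipeline only reports the status of its last command by
default, so a failure earlier in a user-supplied pipeline (for example
in 'false | gzip') would not be visible this way.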

Regards,
Solai
