Re: Non-blocking archiver process

From: BharatDB <bharatdbpg(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Artem Gavrilov <artem(dot)gavrilov(at)percona(dot)com>, Patrick Stählin <me(at)packi(dot)ch>, Noah Misch <noah(at)leadboat(dot)com>, Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Non-blocking archiver process
Date: 2025-11-10 10:41:21
Message-ID: CAAh00EQy3saDEdVpYL=TX8JKYTdg0VFOAkLBJGt9tvhkKr+N7g@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear PostgreSQL Community,

I found 8 Bugs in `shell_archive.c`

While studying how `archive_command` works, I discovered **8 real issues**
that affect reliability, performance, and safety:

| # | Bug | Impact | |---|-----|--------| | 1 | Infinite loop on command
failure | Archiver hangs forever | | 2 | Unreachable code after `return` |
Dead logic | | 3 | Discarded `stdout` from archive command | `DROP
DATABASE` hangs for full command duration | | 4 | Aggressive polling with
`Sleep(100)` | CPU waste (already fixed in core) | | 5 | `malloc`/`free` in
backend | **Memory corruption risk** | | 6 | Poor variable names (`dwRc`,
`bytesRead`) | Hard to read | | 7 | Manual `popen` / `CreateProcess` |
Missed PG infrastructure | | 8 | Redundant `return;` in `void` function |
Style issue |

I refactored `src/backend/archive/shell_archive.c` with **PostgreSQL-style
fixes**:

- Replaced `popen()` and `CreateProcess()` → `OpenPipeStream()` - Used
`fileno(archiveFile)` → `archiveFd` correctly - Switched to `palloc()` /
`pfree()` for memory safety - Renamed variables: `dwRc` → `exit_code`,
`bytesRead` → `nread`, etc. - Read `stdout` to prevent `DROP DATABASE`
hangs - Used `WaitLatchOrSocket()` for interruptible waiting - Removed
redundant `return;` and dead code This is my contribution to improve the
PostgreSQL archiver process. I have attached a patch implementing the
discussed fixes — please review and share any suggestions or feedback. Thanks
!
Lakshmi

On Wed, Oct 15, 2025 at 12:20 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> Here are a few comments from me:
>
> I think that by calling system(), anything that is printed out by the
> child process ends up in the PostgreSQL log. With the patch, the output of
> the command seems like it will be read into a buffer and discarded. That's
> a significant behavior difference.
>
> This patch has a 10ms polling loop after which it checks for interrupts,
> and then repeats. I think our normal convention in these kinds of cases is
> to use WaitLatchOrSocket(). That allows for a longer sleep time (like 1s)
> which results in fewer processor wakeups, while actually still being more
> responsive because the arrival of an interrupt should set the process latch
> and terminate the wait.
>
> In general, I agree with Artem's comments about writing code that fits the
> PostgreSQL style. We don't want to invent new ways of solving problems for
> which we already have infrastructure, nor do we want to solve problems in
> this case in a way that is different from what we do in other cases. Using
> ereport appropriately is part of that, but there's a lot more to it. Artem
> mentioned malloc and free, but notice, for example, that we also have our
> own wrapper around popen() in OpenPipeStream(). Maybe we shouldn't use that
> here, but we shouldn't *accidentally* not use that here.
>
> I wonder whether we should really be using popen() in any form, actually.
> If what we want is for the output of the command to go to our standard
> output, as it does with system(), that's exactly what popen() is designed
> not to do, so it doesn't seem like a natural fit. Perhaps we should be
> using fork() + exec()? Or maybe we already have some good infrastructure
> for this we can reuse somewhere?
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
>

Attachment Content-Type Size
0001-refactor-shell_archive.c-use-OpenPipeStream-improve-.patch text/x-patch 6.2 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2025-11-10 10:44:27 Re: Improve pg_sync_replication_slots() to wait for primary to advance
Previous Message Xuneng Zhou 2025-11-10 10:27:41 Re: Add tests for object size limits of injection points