Re: pg_receivewal fail to streams when the partial file to write is not fully initialized present in the wal receiver directory

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_receivewal fail to streams when the partial file to write is not fully initialized present in the wal receiver directory
Date: 2022-07-25 11:12:00
Message-ID: CALj2ACXGcoZJ4wRZ8u-yArDFKMV8pQjAMJ+on9dMBLAQKyUQHg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Apr 25, 2022 at 5:17 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Mon, Apr 25, 2022 at 6:38 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> >
> > On Fri, Apr 22, 2022 at 07:17:37PM +0530, Bharath Rupireddy wrote:
> > > Right. We find enough disk space and go to write and suddenly the
> > > write operations fail for some reason or the VM crashes because of a
> > > reason other than disk space. I think the foolproof solution is to
> > > figure out the available disk space before prepadding or compressing
> > > and also use the
> > > write-first-to-temp-file-and-then-rename-it-to-original-file as
> > > proposed in the earlier patches in this thread.
> >
> > Yes, what would count here is only the amount of free space in a
> > partition. The total amount of space available becomes handy once you
> > begin introducing things like percentage-based quota policies for the
> > disk when archiving. The free amount of space could be used to define
> > a policy based on the maximum number of bytes you need to leave
> > around, as well, but this is not perfect science as this depends of
> > what FSes decide to do underneath. There are a couple of designs
> > possible here. When I had to deal with my upthread case I have chosen
> > one as I had no need to worry only about Linux, it does not mean that
> > this is the best choice that would fit with the long-term community
> > picture. This comes down to how much pg_receivewal should handle
> > automatically, and how it should handle it.
>
> Thanks. I'm not sure why we are just thinking of crashes due to
> out-of-disk space. Figuring out free disk space before writing a huge
> file (say a WAL file) is a problem in itself to the core postgres as
> well, not just pg_receivewal.
>
> I think we are off-track a bit here. Let me illustrate what's the
> whole problem is and the idea:
>
> If the node/VM on which pg_receivewal runs, goes down/crashes or fails
> during write operation while padding the target WAL file (the .partial
> file) with zeros, the unfilled target WAL file ((let me call this file
> a partially padded .partial file) will be left over and subsequent
> reads/writes to that it will fail with "write-ahead log file \"%s\"
> has %zd bytes, should be 0 or %d" error which requires manual
> intervention to remove it. In a service, this manual intervention is
> what we would like to avoid. Let's not much bother right now for
> compressed file writes (for now at least) as they don't have a
> prepadding phase.
>
> The proposed solution is to make the prepadding atomic - prepad the
> XXXX.partial file as XXXX.partial.tmp name and after the prepadding
> rename (durably if sync option is chosen for pg_receivewal) to
> XXXX.partial. Before prepadding XXXX.partial.tmp, delete the
> XXXX.partial.tmp if it exists.
>
> The above problem isn't unique to pg_receivewal alone, pg_basebackup
> too uses CreateWalDirectoryMethod and dir_open_for_write via
> ReceiveXlogStream.
>
> IMHO, pg_receivewal checking for available disk space before writing
> any file should better be discussed separately?

Here's the v3 patch after rebasing.

I just would like to reiterate the issue the patch is trying to solve:
At times (when no space is left on the device or when the process is
taken down forcibly (VM/container crash)), there can be leftover
uninitialized .partial files (note that padding i.e. filling 16MB WAL
files with all zeros is done in non-compression cases) due to which
pg_receivewal fails to come up after the crash. To address this, the
proposed patch makes the padding 16MB WAL files atomic (first write a
.temp file, pad it and durably rename it to the original .partial
file, ready to be filled with received WAL records). This approach is
similar to what core postgres achieves atomicity while creating new
WAL file (see how XLogFileInitInternal() creates xlogtemp.%d file
first and then how InstallXLogFileSegment() durably renames it to
original WAL file).

Thoughts?

Regards,
Bharath Rupireddy.

Attachment Content-Type Size
v3-0001-Avoid-partially-padded-files-under-pg_receivewal-.patch application/x-patch 5.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2022-07-25 11:12:05 Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths
Previous Message shiy.fnst@fujitsu.com 2022-07-25 10:57:43 RE: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns