Re: pg_receivewal fail to streams when the partial file to write is not fully initialized present in the wal receiver directory

From: mahendrakar s <mahendrakarforpg(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_receivewal fail to streams when the partial file to write is not fully initialized present in the wal receiver directory
Date: 2022-07-31 15:05:58
Message-ID: CABkiuWryWtGcwZ=cutV7d3FSMP1spE10Az=p-yDWvBi6WwxFcA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Bharath,

Idea to atomically allocate WAL file by creating tmp file and renaming it
is nice.
I have one question though:
How is partially temp file created will be cleaned if the VM crashes or out
of disk space cases? Does it endup creating multiple files for every VM
crash/disk space during process of pg_receivewal?

Thoughts?

Thanks,
Mahendrakar.

On Mon, 25 Jul 2022 at 16:42, Bharath Rupireddy <
bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:

> On Mon, Apr 25, 2022 at 5:17 PM Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> >
> > On Mon, Apr 25, 2022 at 6:38 AM Michael Paquier <michael(at)paquier(dot)xyz>
> wrote:
> > >
> > > On Fri, Apr 22, 2022 at 07:17:37PM +0530, Bharath Rupireddy wrote:
> > > > Right. We find enough disk space and go to write and suddenly the
> > > > write operations fail for some reason or the VM crashes because of a
> > > > reason other than disk space. I think the foolproof solution is to
> > > > figure out the available disk space before prepadding or compressing
> > > > and also use the
> > > > write-first-to-temp-file-and-then-rename-it-to-original-file as
> > > > proposed in the earlier patches in this thread.
> > >
> > > Yes, what would count here is only the amount of free space in a
> > > partition. The total amount of space available becomes handy once you
> > > begin introducing things like percentage-based quota policies for the
> > > disk when archiving. The free amount of space could be used to define
> > > a policy based on the maximum number of bytes you need to leave
> > > around, as well, but this is not perfect science as this depends of
> > > what FSes decide to do underneath. There are a couple of designs
> > > possible here. When I had to deal with my upthread case I have chosen
> > > one as I had no need to worry only about Linux, it does not mean that
> > > this is the best choice that would fit with the long-term community
> > > picture. This comes down to how much pg_receivewal should handle
> > > automatically, and how it should handle it.
> >
> > Thanks. I'm not sure why we are just thinking of crashes due to
> > out-of-disk space. Figuring out free disk space before writing a huge
> > file (say a WAL file) is a problem in itself to the core postgres as
> > well, not just pg_receivewal.
> >
> > I think we are off-track a bit here. Let me illustrate what's the
> > whole problem is and the idea:
> >
> > If the node/VM on which pg_receivewal runs, goes down/crashes or fails
> > during write operation while padding the target WAL file (the .partial
> > file) with zeros, the unfilled target WAL file ((let me call this file
> > a partially padded .partial file) will be left over and subsequent
> > reads/writes to that it will fail with "write-ahead log file \"%s\"
> > has %zd bytes, should be 0 or %d" error which requires manual
> > intervention to remove it. In a service, this manual intervention is
> > what we would like to avoid. Let's not much bother right now for
> > compressed file writes (for now at least) as they don't have a
> > prepadding phase.
> >
> > The proposed solution is to make the prepadding atomic - prepad the
> > XXXX.partial file as XXXX.partial.tmp name and after the prepadding
> > rename (durably if sync option is chosen for pg_receivewal) to
> > XXXX.partial. Before prepadding XXXX.partial.tmp, delete the
> > XXXX.partial.tmp if it exists.
> >
> > The above problem isn't unique to pg_receivewal alone, pg_basebackup
> > too uses CreateWalDirectoryMethod and dir_open_for_write via
> > ReceiveXlogStream.
> >
> > IMHO, pg_receivewal checking for available disk space before writing
> > any file should better be discussed separately?
>
> Here's the v3 patch after rebasing.
>
> I just would like to reiterate the issue the patch is trying to solve:
> At times (when no space is left on the device or when the process is
> taken down forcibly (VM/container crash)), there can be leftover
> uninitialized .partial files (note that padding i.e. filling 16MB WAL
> files with all zeros is done in non-compression cases) due to which
> pg_receivewal fails to come up after the crash. To address this, the
> proposed patch makes the padding 16MB WAL files atomic (first write a
> .temp file, pad it and durably rename it to the original .partial
> file, ready to be filled with received WAL records). This approach is
> similar to what core postgres achieves atomicity while creating new
> WAL file (see how XLogFileInitInternal() creates xlogtemp.%d file
> first and then how InstallXLogFileSegment() durably renames it to
> original WAL file).
>
> Thoughts?
>
> Regards,
> Bharath Rupireddy.
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Anton A. Melnikov 2022-07-31 15:22:33 Re: Improve TAP tests of pg_upgrade for cross-version tests
Previous Message Justin Pryzby 2022-07-31 13:41:35 pg_upgrade errno