Re: pg_receivewal fail to streams when the partial file to write is not fully initialized present in the wal receiver directory

From: SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_receivewal fail to streams when the partial file to write is not fully initialized present in the wal receiver directory
Date: 2022-01-03 20:10:32
Message-ID: CAHg+QDdPrVig+h2X5s-2MRR_19Vmehpv0upktyRc85ZA_UQHbg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks Michael!

On Sun, Jan 2, 2022 at 11:56 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:

> On Sun, Jan 02, 2022 at 09:27:43PM -0800, SATYANARAYANA NARLAPURAM wrote:
> > I noticed that pg_receivewal fails to stream when the partial file to
> write
> > is not fully initialized and fails with the error message something like
> > below. This requires an extra step of deleting the partial file that is
> not
> > fully initialized before starting the pg_receivewal. Attaching a simple
> > patch that creates a temp file, fully initialize it and rename the file
> to
> > the desired wal segment name.
>
> Are you referring to the pre-padding when creating a new partial
> segment, aka when we write chunks of XLOG_BLCKSZ full of zeros until
> the file is fully created? What kind of error did you see? I guess
> that a write() with ENOSPC would be more likely, but you got a
> different problem?

I see two cases, 1/ when no space is left on the device and 2/ when the
process is taken down forcibly (a VM/container crash)

> I don't disagree with improving such cases, but we
> should not do things so as there is a risk of leaving behind an
> infinite set of segments in case of repeated errors

Do you see a problem with the proposed patch that leaves the files behind,
at least in my testing I don't see any files left behind?

> , and partial
> segments are already a kind of temporary file.
>

if the .partial file exists with not zero-padded up to the wal segment size
(WalSegSz), then open_walfile fails with the below error. I have two
options here, 1/ to continue padding the existing partial file and let it
zero up to WalSegSz , 2/create a temp file as I did in the patch. I thought
the latter is safe because it can handle corrupt cases as described below.
Thoughts?

* When streaming to files, if an existing file exists we verify that it's
* either empty (just created), or a complete WalSegSz segment (in which
* case it has been created and padded). Anything else indicates a corrupt
* file. Compressed files have no need for padding, so just ignore this
* case.

>
> - if (dir_data->sync)
> + if (shouldcreatetempfile)
> + {
> + if (durable_rename(tmpsuffixpath, targetpath) != 0)
> + {
> + close(fd);
> + unlink(tmpsuffixpath);
> + return NULL;
> + }
> + }
>
> durable_rename() does a set of fsync()'s, but --no-sync should not
> flush any data.
>
I need to look into this further, without this I am seeing random file
close and rename failures and disconnecting the stream. Also it appears we
are calling durable_rename when we are closing the file (dir_close) even
without --no-sync. Should we worry about the padding case?

> --
> Michael
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stanislav Bashkyrtsev 2022-01-03 20:26:32 Re: PostgreSQL stops when adding a breakpoint in CLion
Previous Message Justin Pryzby 2022-01-03 19:56:09 Re: [PATCH] pg_stat_toast v6