Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record

From: Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Shutdown indefinitely stuck due to unflushed FPI_FOR_HINT record
Date: 2026-03-04 08:51:56
Message-ID: CAO6_Xqq1h6kggb1o206rgouPS0H5jnjahzZ0We-9ggnBjB2JsA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Mar 3, 2026 at 6:29 PM Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> The approach of calling XLogSetAsyncXactLSN() in RecordTransactionAbort() seems
> more like an improvement than a bug fix. Since it changes
> RecordTransactionAbort(), it could have unintended impact on the system.
>
> It may be a reasonable idea (though I'm not certain yet), but for a bug fix
> I believe we should first apply the minimal change necessary to resolve
> the issue. If needed, this approach could then be proposed later separately as
> an improvement for the next major version.

Agreed, that's definitely a change that can have a large impact. I
will open a separate thread later.

> As a simpler alternative, would it make sense for walsender to call
> XLogFlush(GetXLogInsertRecPtr()) instead of XLogBackgroundFlush() during
> shutdown? I'm not sure why walsender currently uses XLogBackgroundFlush().
> If there isn't a clear reason for that choice, directly calling XLogFlush()
> might be the simpler solution. Thoughts?

That sounds like a good solution. I've tried it and it fixes the
issue. It also only changes the shutdown behaviour in the walsender.

The use of XLogBackgroundFlush() was introduced in c6c333436491, but
there's no mention of why it was specifically chosen. I guess the
assumption was that a change would either be flushed by a commit or
tracked by the async LSN on rollback, so XLogBackgroundFlush() would
always write pending records. That assumption turns out to be false
in the case of this bug.
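
To illustrate the difference between the two calls, here is a toy C
model. It is not PostgreSQL source code: ToyWal, toy_flush(),
toy_background_flush() and toy_all_flushed() are hypothetical names, and
the background-flush rule (flush up to the last complete page, otherwise
up to the async LSN) is a simplified assumption about what
XLogBackgroundFlush() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: none of these names exist in PostgreSQL. */
#define TOY_BLCKSZ 8192u        /* stands in for the WAL page size */

typedef uint64_t ToyLsn;

typedef struct ToyWal
{
    ToyLsn insert_ptr;          /* end of the last inserted record */
    ToyLsn flush_ptr;           /* everything below this is durable */
    ToyLsn async_lsn;           /* highest LSN registered for async flush */
} ToyWal;

/* Models an explicit flush request up to a given LSN. */
static void
toy_flush(ToyWal *wal, ToyLsn upto)
{
    assert(upto <= wal->insert_ptr);    /* cannot flush WAL not yet inserted */
    if (upto > wal->flush_ptr)
        wal->flush_ptr = upto;
}

/*
 * Models the assumed background-flush rule: target the last complete
 * page; if that is already durable, fall back to the async LSN.
 * Returns true if anything was flushed.
 */
static bool
toy_background_flush(ToyWal *wal)
{
    ToyLsn target = wal->insert_ptr - (wal->insert_ptr % TOY_BLCKSZ);

    if (target <= wal->flush_ptr)
        target = wal->async_lsn;
    if (target <= wal->flush_ptr)
        return false;           /* nothing to do, even if a partial page is pending */

    toy_flush(wal, target);
    return true;
}

/* In this model, shutdown can finish only once all inserted WAL is durable. */
static bool
toy_all_flushed(const ToyWal *wal)
{
    return wal->flush_ptr >= wal->insert_ptr;
}
```

In this model, a record that sits on a partially filled page and was
never registered in async_lsn is skipped by toy_background_flush() no
matter how often it is called, whereas toy_flush(wal, wal->insert_ptr)
always makes it durable.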

I've updated the patch with this approach.

Regards,
Anthonin Bonnefoy

Attachment: v5-0001-Fix-stuck-shutdown-due-to-unflushed-records.patch (application/octet-stream, 2.3 KB)
