Re: archive status ".ready" files may be created too early

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: bossartn(at)amazon(dot)com
Cc: a(dot)lubennikova(at)postgrespro(dot)ru, hlinnaka(at)iki(dot)fi, matsumura(dot)ryo(at)fujitsu(dot)com, masao(dot)fujii(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: archive status ".ready" files may be created too early
Date: 2020-12-16 02:01:20
Message-ID: 20201216.110120.887433782054853494.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Tue, 15 Dec 2020 19:32:57 +0900 (JST), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote in
> At Mon, 14 Dec 2020 18:25:23 +0000, "Bossart, Nathan" <bossartn(at)amazon(dot)com> wrote in
> > I wonder if these are safe assumptions to make. For your example, if
> > we've written record B to the WAL buffers, but neither record A nor B
> > have been written to disk or flushed, aren't we still in trouble?
>
> You're right in that regard. There's a window where a partial record
> is written when the write location passes F0 after the insertion
> location passes F1. However, remembering all spanning records seems
> like overkill to me.
>
> I modified the previous patch so that it remembers the start LSN of the
> *oldest* cross-segment continuation record in the last consecutive
> bonded segments, and the end LSN of the latest cross-segment
> continuation record. This doesn't forget past segment boundaries.
> The region is cleared when the WAL-write LSN goes beyond the remembered
> end LSN. So the region may contain several WAL segments that are not
> connected to the next one, but that doesn't matter so much.

Mmm. Even though it's a PoC, it was too bogus. I fixed it to work in a
saner way.

- Record the beginning LSN of the first cross-seg record and the end
LSN of the last cross-seg record in a run of consecutive segments
bonded by cross-seg records. Specifically, X and Y below.

X Z Y
[recA] [recB] [recC]
[seg A] [seg B] [seg C] [seg D] [seg E]
(1) (2.2) (2.2) (2.1) (2.1) (1)

1. If we wrote up to before X or beyond Y at a segment boundary, notify
the finished segment immediately.

1.1. If we have written beyond Y, clear the recorded region.

2. Otherwise we don't notify the segment immediately:

2.1. If write request was up to exactly the current segment boundary
and we know the end LSN of the record there (that is, it is recC
above), extend the request to the end LSN. Then notify the segment
after the record is written to the end.

2.2. Otherwise (that is, recA or recB), we don't know whether the
last record of the last segment ends exactly at the segment boundary
(Z) or a record spans between segments (recB). Anyway, even if there
is such a record there, we don't know where it ends. As a result,
all we can do there is refrain from notifying. It doesn't matter
much, since we have already inserted recC, so we will soon
reach recC and notify up to seg C.
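
As a rough illustration of rules 1 through 2.2 (this is not the code from
the attached patch), the decision made at a segment boundary could be
sketched in C as below. The names Lsn, BondedRegion,
remember_cross_seg_record and decide_at_boundary, and the 16MB SEG_SIZE,
are all made up for this sketch. It also assumes that a record is shorter
than one segment, so that the last cross-seg record is the one crossing
the last segment boundary below Y.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t Lsn;                      /* hypothetical stand-in for XLogRecPtr */
#define INVALID_LSN ((Lsn) 0)
#define SEG_SIZE    ((Lsn) 16 * 1024 * 1024)   /* assumed WAL segment size */

/* Region X..Y covered by segments bonded by cross-seg records. */
typedef struct
{
    Lsn start;   /* X: start LSN of the first cross-seg record */
    Lsn end;     /* Y: end LSN of the last cross-seg record */
} BondedRegion;

typedef enum
{
    NOTIFY_NOW,           /* rule 1 */
    EXTEND_THEN_NOTIFY,   /* rule 2.1 */
    DEFER                 /* rule 2.2 */
} SegAction;

/* At insertion time: remember a record [rec_start, rec_end) if it crosses
 * a segment boundary.  Simplification: X is kept once set, Y always moves
 * forward, so non-connected boundaries may end up inside the region. */
static void
remember_cross_seg_record(BondedRegion *r, Lsn rec_start, Lsn rec_end)
{
    if (rec_start / SEG_SIZE == (rec_end - 1) / SEG_SIZE)
        return;                      /* record stays within one segment */
    if (r->start == INVALID_LSN)
        r->start = rec_start;
    r->end = rec_end;
}

/* At write time: the write request ends exactly at segment boundary
 * 'boundary'.  Returns what to do and, via write_upto, how far to write. */
static SegAction
decide_at_boundary(BondedRegion *r, Lsn boundary, Lsn *write_upto)
{
    bool have_region = (r->start != INVALID_LSN);

    *write_upto = boundary;

    /* Rule 1: before X or beyond Y, notify immediately. */
    if (!have_region || boundary <= r->start || boundary >= r->end)
    {
        /* Rule 1.1: written beyond Y, forget the region. */
        if (have_region && boundary >= r->end)
            r->start = r->end = INVALID_LSN;
        return NOTIFY_NOW;
    }

    /* Rule 2.1: this boundary is crossed by the last cross-seg record,
     * whose end LSN (Y) is known, so extend the write request to Y. */
    if (boundary == ((r->end - 1) / SEG_SIZE) * SEG_SIZE)
    {
        *write_upto = r->end;
        return EXTEND_THEN_NOTIFY;
    }

    /* Rule 2.2: we cannot tell where a record crossing here would end. */
    return DEFER;
}
```

With recA crossing the first boundary and recC crossing the third, this
defers at the first and second boundaries, extends the request to Y at the
third, and notifies immediately (clearing the region) from the fourth on.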

There might be a case where we insert up to Y before writing up to Z;
the segment region X-Y then contains a non-connected segment boundary.
It is processed as if it were a connected segment boundary. However,
as in 2.2 above, it doesn't matter since we will write up to Y soon.

At Tue, 15 Dec 2020 19:32:57 +0900 (JST), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote in
me> I added an assertion that a record must be shorter than a wal segment
me> to XLogRecordAssemble(). This guarantees the assumption to be true?
me> (The condition is tentative, would need to be adjusted.)

Changed the assertion to a more direct form.

me> Also, the attached is a PoC.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v4-0001-Avoid-archiving-a-WAL-segment-that-continues-to-t.patch text/x-patch 11.1 KB
v4-0002-debug-print.patch text/x-patch 3.9 KB
