Re: logical decoding and replication of sequences

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: logical decoding and replication of sequences
Date: 2022-08-29 10:21:30
Message-ID: CA+hUKGJOnWuQqj42Q8xKfWN9NxGYUvEDEvEd_a+9Y9dF=78Oyw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Aug 8, 2022 at 8:56 PM Kyotaro Horiguchi
<horikyota(dot)ntt(at)gmail(dot)com> wrote:
> At Mon, 08 Aug 2022 17:33:22 +0900 (JST), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote in
> > If WaitForWALToBecomeAvailable returned by promotion, ReadPageInteral
> > misses the chance to inavlidate reader-state. That state is not an
> > error while in StandbyMode.
>
> Mmm... Maybe I wanted to say: (Still I'm not sure the rewrite works..)
>
> If WaitForWALToBecomeAvailable returned by promotion, ReadPageInteral
> would miss the chance to invalidate reader-state. When XLogPageRead
> is called in blocking mode while in StandbyMode (that is, the
> traditional condition) , the function continues retrying until it
> succeeds, or returns XLRAD_FAIL if promote is triggered. In other
> words, it was not supposed to return non-failure while the header
> validation is failing while in standby mode. But while in nonblocking
> mode, the function can return non-failure with lastSourceFailed =
> true, which seems wrong.

New ideas:

0001: Instead of figuring out when to invalidate the cache, let's
just invalidate it before every read attempt. It is only marked valid
after success (ie state->readLen > 0). No need to worry about error
cases.

0002: While here, I don't like xlogrecovery.c clobbering
xlogreader.c's internal error state, so I think we should have a
function for that with a documented purpose. It was also a little
inconsistent that it didn't clear a flag (but not buggy AFAICS; kinda
wondering if I should just get rid of that flag, but that's for
another day).

0003: Thinking about your comments above made me realise that I don't
really want XLogReadPage() to be internally retrying for obscure
failures while reading ahead. I think I prefer to give up on
prefetching as soon as anything tricky happens, and deal with
complexities once recovery catches up to that point. I am still
thinking about this point.

Here's the patch set I'm testing.

Attachment Content-Type Size
0001-Fix-cache-invalidation-in-rare-recovery_prefetch-cas.patch text/x-patch 2.1 KB
0002-Improve-xlogrecovery.c-xlogreader.c-boundary.patch text/x-patch 2.5 KB
0003-Don-t-retry-inside-XLogPageRead-if-reading-ahead.patch text/x-patch 1.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jelte Fennema 2022-08-29 10:47:28 Re: [PATCH] Optimize json_lex_string by batching character copying
Previous Message Dilip Kumar 2022-08-29 10:16:58 Re: standby promotion can create unreadable WAL