Re: pg15b3: recovery fails with wal prefetch enabled

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "Shinoda, Noriyoshi (PN Japan FSIP)" <noriyoshi(dot)shinoda(at)hpe(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, Jakub Wartak <Jakub(dot)Wartak(at)tomtom(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg15b3: recovery fails with wal prefetch enabled
Date: 2022-09-05 04:54:07
Message-ID: CA+hUKGJXfQJHs2jmVOoOo2J12-6m36E0ytDiyrqp-EvFwupvew@mail.gmail.com
Lists: pgsql-hackers

On Mon, Sep 5, 2022 at 1:28 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> I had this more or less figured out on Friday when I wrote last, but I
> got stuck on a weird problem with 026_overwrite_contrecord.pl. I
> think that failure case should report an error, no? I find it strange
> that we end recovery in silence. That was a problem for the new
> coding in this patch, because it is confused by XLREAD_FAIL without
> queuing an error, and then retries, which clobbers the aborted recptr
> state. I'm still looking into that.

On reflection, it'd be better not to clobber any pre-existing error
there, but report one only if there isn't one already queued. I've
done that in this version, which I'm planning to do a bit more testing
on and commit soonish if there are no comments/objections, especially
for that part.
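
To illustrate the "report one only if there isn't one already queued"
idea, here is a minimal self-contained sketch. It is not the code in the
attached patch: DemoReader, demo_queue_error() and
demo_queue_error_if_none() are hypothetical names invented for this
example, and the real WAL reader state is assumed to be more involved.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for a reader's error slot. This is NOT
 * PostgreSQL's real reader struct; it exists only for illustration.
 */
typedef struct DemoReader
{
    char errormsg[128];     /* empty string means "no error queued" */
} DemoReader;

/* Queue an error message unconditionally, overwriting any earlier one. */
static void
demo_queue_error(DemoReader *r, const char *msg)
{
    assert(r != NULL && msg != NULL);
    snprintf(r->errormsg, sizeof(r->errormsg), "%s", msg);
}

/*
 * Queue a fallback message only if nothing is queued yet.
 * Returns 1 if the fallback was stored, 0 if an existing error was kept.
 */
static int
demo_queue_error_if_none(DemoReader *r, const char *fallback)
{
    assert(r != NULL && fallback != NULL);
    if (r->errormsg[0] != '\0')
        return 0;           /* keep the pre-existing, more specific error */
    demo_queue_error(r, fallback);
    return 1;
}
```

The point of the second function is that a caller which sees a failure
without a message can still report something, while a caller which
already has a specific message never loses it.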

I'll have to check whether a doc change is necessary somewhere to
advertise that maintenance_io_concurrency=0 turns off prefetching, but
IIRC that's kinda already implied.

I've tested quite a lot of scenarios, including make check-world with
maintenance_io_concurrency = 0, 1, 10, 1000, and ALTER SYSTEM for all
relevant GUCs on a standby running a large pgbench, to check the expected
effect on the pg_stat_recovery_prefetch view and on the generated system
calls.

Attachment Content-Type Size
v2-0001-Fix-recovery_prefetch-with-low-maintenance_io_con.patch text/x-patch 6.0 KB
