Re: WIP: WAL prefetch (another approach)

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: WAL prefetch (another approach)
Date: 2020-09-01 13:14:26
Message-ID: 20200901131426.5lse2jlepm3uligv@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Aug 13, 2020 at 06:57:20PM +1200, Thomas Munro wrote:
>On Thu, Aug 6, 2020 at 10:47 PM Tomas Vondra
><tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> On Thu, Aug 06, 2020 at 02:58:44PM +1200, Thomas Munro wrote:
>> >On Tue, Aug 4, 2020 at 3:47 AM Tomas Vondra
>> >> Any luck trying to reproduce thigs? Should I try again and collect some
>> >> additional debug info?
>> >
>> >No luck. I'm working on it now, and also trying to reduce the
>> >overheads so that we're not doing extra work when it doesn't help.
>>
>> OK, I'll see if I can still reproduce it.
>
>Since someone else ask me off-list, here's a rebase, with no
>functional changes. Soon I'll post a new improved version, but this
>version just fixes the bitrot and hopefully turns cfbot green.

I've decided to do some tests with this patch version, but I immediately
ran into issues. What I did was initializing a 32GB pgbench database,
backed it up (shutdown + tar) and then ran 2h pgbench with archiving.
And then I restored the backed-up data directory and instructed it to
replay WAL from the archive. There's about 16k WAL segments, so about
256GB of WAL.

Unfortunately, the very first thing that happens after starting the
recovery is this:

LOG: starting archive recovery
LOG: restored log file "000000010000001600000080" from archive
LOG: consistent recovery state reached at 16/800000A0
LOG: redo starts at 16/800000A0
LOG: database system is ready to accept read only connections
LOG: recovery started prefetching on timeline 1 at 0/800000A0
LOG: recovery no longer prefetching: unexpected pageaddr 8/84000000 in log segment 000000010000001600000081, offset 0
LOG: restored log file "000000010000001600000081" from archive
LOG: restored log file "000000010000001600000082" from archive

So we start applying 000000010000001600000081 and it fails almost
immediately on the first segment. This is confirmed by prefetch stats,
which look like this:

-[ RECORD 1 ]---+-----------------------------
stats_reset | 2020-09-01 15:02:31.18766+02
prefetch | 1044
skip_hit | 1995
skip_new | 87
skip_fpw | 2108
skip_seq | 27
distance | 0
queue_depth | 0
avg_distance | 135838.95
avg_queue_depth | 8.852459

So we do a little bit of prefetching and then it gets disabled :-(

The segment looks perfectly fine when inspected using pg_waldump, see
the attached file.

I've tested this applied on 6ca547cf75ef6e922476c51a3fb5e253eef5f1b6,
and the failure seems fairly similar to what I reported before, except
that now it happened right at the very beginning.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
000000010000001600000081.log.gz application/gzip 99.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2020-09-01 13:29:17 Re: clarify "rewritten" in pg_checksums docs
Previous Message Michael Banck 2020-09-01 13:13:58 clarify "rewritten" in pg_checksums docs