Re: WIP: WAL prefetch (another approach)

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: WAL prefetch (another approach)
Date: 2020-06-05 20:40:57
Message-ID: 20200605204057.anopw4rntkg6hsag@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jun 05, 2020 at 10:04:14PM +0200, Tomas Vondra wrote:
>On Fri, Jun 05, 2020 at 05:20:52PM +0200, Tomas Vondra wrote:
>>
>>...
>>
>>which is not particularly great, I guess. There however seems to be
>>something wrong, because with the prefetching I see this in the log:
>>
>>prefetch:
>>2020-06-05 02:47:25.970 CEST 1591318045.970 [22961] LOG: recovery no
>>longer prefetching: unexpected pageaddr 108/E8000000 in log segment
>>0000000100000108000000FF, offset 0
>>
>>prefetch2:
>>2020-06-05 15:29:23.895 CEST 1591363763.895 [26676] LOG: recovery no
>>longer prefetching: unexpected pageaddr 108/E8000000 in log segment
>>000000010000010900000001, offset 0
>>
>>Which seems pretty suspicious, but I have no idea what's wrong. I admit
>>the archive/restore commands are a bit hacky, but I've only seen this
>>with prefetching on the SATA storage, while all other cases seem to be
>>just fine. I haven't seen in on NVME (which processes much more WAL).
>>And the SATA baseline (no prefetching) also worked fine.
>>
>>Moreover, the pageaddr value is the same in both cases, but the WAL
>>segments are different (but just one segment apart). Seems strange.
>>
>
>I suspected it might be due to a somewhat hackish restore_command that
>prefetches some of the WAL segments, so I tried again with a much
>simpler restore_command - essentially just:
>
> restore_command = 'cp /archive/%f %p.tmp && mv %p.tmp %p'
>
>which I think should be fine for testing purposes. And I got this:
>
> LOG: recovery no longer prefetching: unexpected pageaddr 108/57000000
> in log segment 0000000100000108000000FF, offset 0
> LOG: restored log file "0000000100000108000000FF" from archive
>
>which is the same segment as in the earlier examples, but with a
>different pageaddr value. Of course, there's no such pageaddr in the WAL
>segment (and recovery of that segment succeeds).
>
>So I think there's something broken ...
>

BTW in all three cases it happens right after the first restart point in
the WAL stream:

LOG: restored log file "0000000100000108000000FD" from archive
LOG: restartpoint starting: time
LOG: restored log file "0000000100000108000000FE" from archive
LOG: restartpoint complete: wrote 236092 buffers (22.5%); 0 WAL ...
LOG: recovery restart point at 108/FC000028
DETAIL: Last completed transaction was at log time 2020-06-04
15:27:00.95139+02.
LOG: recovery no longer prefetching: unexpected pageaddr
108/57000000 in log segment 0000000100000108000000FF, offset 0
LOG: restored log file "0000000100000108000000FF" from archive

It looks exactly like this in case of all 3 failures ...

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joe Conway 2020-06-05 20:53:01 Re: repeat() function, CHECK_FOR_INTERRUPTS(), and unlikely()
Previous Message Tomas Vondra 2020-06-05 20:04:14 Re: WIP: WAL prefetch (another approach)