Re: WIP: WAL prefetch (another approach)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Dagfinn Ilmari Mannsåker <ilmari(at)ilmari(dot)org>, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Stephen Frost <sfrost(at)snowman(dot)net>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, David Steele <david(at)pgmasters(dot)net>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Jakub Wartak <Jakub(dot)Wartak(at)tomtom(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: WAL prefetch (another approach)
Date: 2022-04-25 18:11:49
Message-ID: 111657.1650910309@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I believe that the WAL prefetch patch probably accounts for the
intermittent errors that buildfarm member topminnow has shown
since it went in, eg [1]:

diff -U3 /home/nm/ext4/HEAD/pgsql/contrib/pg_walinspect/expected/pg_walinspect.out /home/nm/ext4/HEAD/pgsql.build/contrib/pg_walinspect/results/pg_walinspect.out
--- /home/nm/ext4/HEAD/pgsql/contrib/pg_walinspect/expected/pg_walinspect.out 2022-04-10 03:05:15.972622440 +0200
+++ /home/nm/ext4/HEAD/pgsql.build/contrib/pg_walinspect/results/pg_walinspect.out 2022-04-25 05:09:49.861642059 +0200
@@ -34,11 +34,7 @@
(1 row)

SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_records_info_till_end_of_wal(:'wal_lsn1');
- ok
-----
- t
-(1 row)
-
+ERROR: could not read WAL at 0/1903E40
SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_stats(:'wal_lsn1', :'wal_lsn2');
ok
----
@@ -46,11 +42,7 @@
(1 row)

SELECT COUNT(*) >= 0 AS ok FROM pg_get_wal_stats_till_end_of_wal(:'wal_lsn1');
- ok
-----
- t
-(1 row)
-
+ERROR: could not read WAL at 0/1903E40
-- ===================================================================
-- Test for filtering out WAL records of a particular table
-- ===================================================================

I've reproduced this manually on that machine, and confirmed that the
proximate cause is that XLogNextRecord() is returning NULL because
state->decode_queue_head == NULL, without bothering to provide an errormsg
(which doesn't seem very well thought out in itself). I obtained the
contents of the xlogreader struct at failure:

(gdb) p *xlogreader
$1 = {routine = {page_read = 0x594270 <read_local_xlog_page_no_wait>,
segment_open = 0x593b44 <wal_segment_open>,
segment_close = 0x593d38 <wal_segment_close>}, system_identifier = 0,
private_data = 0x0, ReadRecPtr = 26230672, EndRecPtr = 26230752,
abortedRecPtr = 26230752, missingContrecPtr = 26230784,
overwrittenRecPtr = 0, DecodeRecPtr = 26230672, NextRecPtr = 26230752,
PrevRecPtr = 0, record = 0x0, decode_buffer = 0xf25428 "\240",
decode_buffer_size = 65536, free_decode_buffer = true,
decode_buffer_head = 0xf25428 "\240", decode_buffer_tail = 0xf25428 "\240",
decode_queue_head = 0x0, decode_queue_tail = 0x0,
readBuf = 0xf173f0 "\020\321\005", readLen = 0, segcxt = {
ws_dir = '\000' <repeats 1023 times>, ws_segsize = 16777216}, seg = {
ws_file = 25, ws_segno = 0, ws_tli = 1}, segoff = 0,
latestPagePtr = 26222592, latestPageTLI = 1, currRecPtr = 26230752,
currTLI = 1, currTLIValidUntil = 0, nextTLI = 0,
readRecordBuf = 0xf1b3f8 "<", readRecordBufSize = 40960,
errormsg_buf = 0xef3270 "", errormsg_deferred = false, nonblocking = false}

I don't have an intuition about where to look beyond that, any
suggestions?

What I do know so far is that while the failure reproduces fairly
reliably under "make check" (more than half the time, which squares
with topminnow's history), it doesn't reproduce at all under "make
installcheck" (after removing NO_INSTALLCHECK), which seems odd.
Maybe it's dependent on how much WAL history the installation has
accumulated?

It could be that this is a bug in pg_walinspect or a fault in its
test case; hard to tell since that got committed at about the same
time as the prefetch changes.

regards, tom lane

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=topminnow&dt=2022-04-25%2001%3A48%3A47

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2022-04-25 18:31:48 Re: WIP: WAL prefetch (another approach)
Previous Message Pavel Stehule 2022-04-25 17:39:58 Re: proposal: possibility to read dumped table's name from file