BRIN desummarization writes junk WAL records

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: BRIN desummarization writes junk WAL records
Date: 2017-04-07 00:27:04
Message-ID: 20191.1491524824@sss.pgh.pa.us
Lists: pgsql-hackers

I am seeing the database fail to restart after a crash during the
regression tests, due to a divide-by-zero fault in BRIN WAL replay.

Core was generated by `postgres: startup'.
Program terminated with signal 8, Arithmetic exception.
#0 brinSetHeapBlockItemptr (buf=<value optimized out>, pagesPerRange=0,
heapBlk=0, tid=...) at brin_revmap.c:169
169 iptr += HEAPBLK_TO_REVMAP_INDEX(pagesPerRange, heapBlk);
(gdb) bt
#0 brinSetHeapBlockItemptr (buf=<value optimized out>, pagesPerRange=0,
heapBlk=0, tid=...) at brin_revmap.c:169
#1 0x0000000000478cdc in brin_xlog_desummarize_page (record=0x2403ac8)
at brin_xlog.c:274
#2 brin_redo (record=0x2403ac8) at brin_xlog.c:320
#3 0x0000000000513174 in StartupXLOG () at xlog.c:7171
#4 0x00000000006dea91 in StartupProcessMain () at startup.c:217
#5 0x000000000052214a in AuxiliaryProcessMain (argc=2, argv=0x7fff4bb8d1f0)
at bootstrap.c:425
#6 0x00000000006d98b7 in StartChildProcess (type=StartupProcess)
at postmaster.c:5256
#7 0x00000000006ddae6 in PostmasterMain (argc=3, argv=<value optimized out>)
at postmaster.c:1329
#8 0x0000000000658038 in main (argc=3, argv=0x2402b20) at main.c:228

The proximate cause of the exception seems to be that
brinSetHeapBlockItemptr is being passed pagesPerRange = 0,
which is problematic since HEAPBLK_TO_REVMAP_INDEX tries to
divide by that. Looking one level down, the bogus value
seems to be coming out of an xl_brin_desummarize WAL record:

(gdb) f 1
#1 0x0000000000478cdc in brin_xlog_desummarize_page (record=0x2403ac8)
at brin_xlog.c:274
274 brinSetHeapBlockItemptr(buffer, xlrec->pagesPerRange, xlrec->heapBlk, iptr);
(gdb) p *xlrec
$1 = {pagesPerRange = 0, heapBlk = 0, regOffset = 1}

This is, perhaps, not unrelated to the fact that
brinRevmapDesummarizeRange doesn't seem to be bothering to fill
the pagesPerRange field of the record.
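
To make the failure mode concrete, here is a minimal standalone sketch,
not the PostgreSQL source. Every Demo*/demo_* name and the
DEMO_REVMAP_PAGE_MAXITEMS value are assumptions made up for illustration,
and the index arithmetic is only a simplified model of what
HEAPBLK_TO_REVMAP_INDEX is described as doing above (dividing by
pagesPerRange). It shows a record writer that never sets pagesPerRange,
and a replay step that would divide by that zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constant; the real value depends on the page size. */
#define DEMO_REVMAP_PAGE_MAXITEMS 1360

/*
 * Simplified model of the WAL record shown in the gdb output.  Field
 * types are simplified here; the message questions the declared type
 * of heapBlk separately.
 */
typedef struct DemoDesummarizeRecord
{
    uint32_t pagesPerRange;
    uint32_t heapBlk;
    uint16_t regOffset;
} DemoDesummarizeRecord;

/*
 * Model of the crash site.  The assert stands in for the arithmetic
 * exception: without it, pagesPerRange == 0 is a division by zero.
 */
static uint32_t
demo_revmap_index(uint32_t pagesPerRange, uint32_t heapBlk)
{
    assert(pagesPerRange != 0);
    return (heapBlk / pagesPerRange) % DEMO_REVMAP_PAGE_MAXITEMS;
}

/*
 * Writer that forgets pagesPerRange.  The record is zero-initialized
 * only to keep the demo deterministic; an uninitialized stack struct
 * could hold any junk in that field.
 */
static DemoDesummarizeRecord
demo_build_record_buggy(uint32_t pagesPerRange, uint32_t heapBlk,
                        uint16_t regOffset)
{
    DemoDesummarizeRecord rec = {0};

    (void) pagesPerRange;       /* never copied into the record */
    rec.heapBlk = heapBlk;
    rec.regOffset = regOffset;
    return rec;
}

/* Writer that fills every field before the record is emitted. */
static DemoDesummarizeRecord
demo_build_record_fixed(uint32_t pagesPerRange, uint32_t heapBlk,
                        uint16_t regOffset)
{
    DemoDesummarizeRecord rec = {0};

    rec.pagesPerRange = pagesPerRange;
    rec.heapBlk = heapBlk;
    rec.regOffset = regOffset;
    return rec;
}

/*
 * Replay step with a defensive check added purely for the demo, so the
 * bad record can be observed without crashing.  Returns false when the
 * record cannot be replayed.
 */
static bool
demo_replay(const DemoDesummarizeRecord *rec, uint32_t *index_out)
{
    if (rec->pagesPerRange == 0)
        return false;
    *index_out = demo_revmap_index(rec->pagesPerRange, rec->heapBlk);
    return true;
}
```

With these stand-ins, demo_replay() rejects the record produced by
demo_build_record_buggy() and accepts the one from
demo_build_record_fixed(). The check in demo_replay() is only there to
make the demo observable; the point is that the writer has to fill the
field.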

BTW, is it actually sensible that xl_brin_desummarize's heapBlk
is declared OffsetNumber and not BlockNumber? If there's a reason
why that's correct, the field name seems damn misleading.
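
For reference on why the declared type matters: OffsetNumber is a
16-bit unsigned type, while BlockNumber is 32 bits. The sketch below
uses stand-in typedefs (the Sketch*/sketch_* names are hypothetical,
not PostgreSQL identifiers) to show what would happen to a block number
stored through a 16-bit field, assuming the field really is meant to
hold a heap block number.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins with the same widths as the PostgreSQL typedefs. */
typedef uint32_t SketchBlockNumber;     /* 32-bit block number */
typedef uint16_t SketchOffsetNumber;    /* 16-bit item offset */

/*
 * Store a block number in a 16-bit field and read it back, as a WAL
 * record field of the narrower type would.  Values above 65535 lose
 * their high bits (the conversion is modulo 65536).
 */
static SketchBlockNumber
sketch_roundtrip_through_offset(SketchBlockNumber heapBlk)
{
    SketchOffsetNumber stored = (SketchOffsetNumber) heapBlk;

    /* The narrower field is the whole point of the demonstration. */
    assert(sizeof(stored) < sizeof(heapBlk));
    return (SketchBlockNumber) stored;
}
```

Under this model, block 65535 survives the round trip but block 65536
comes back as 0. At the default 8 kB block size, block 65536 is where a
table passes 512 MB, so the truncation would only show up on relations
larger than that.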

regards, tom lane
