Re: WAL consistency check facility

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila(at)enterprisedb(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Subject: Re: WAL consistency check facility
Date: 2016-08-27 23:56:49
Message-ID: CAM3SWZQC8nUgp8SjKDY3d74VLpdf9puHc7-n3zf4xcr_bghPzg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Aug 26, 2016 at 7:24 AM, Alvaro Herrera
<alvherre(at)2ndquadrant(dot)com> wrote:
>> As the block numbers are different, I was getting the following warning:
>> WARNING: Inconsistent page (at byte 8166) found for record
>> 0/127F4A48, rel 1663/16384/16946, forknum 0, blkno 0, Backup Page
>> Header : (pd_lower: 28 pd_upper: 8152 pd_special: 8192) Current Page
>> Header: (pd_lower: 28 pd_upper: 8152 pd_special: 8192)
>> CONTEXT: xlog redo at 0/127F4A48 for Heap/INSERT+INIT: off 1
>>
>> In heap_xlog_insert, t_ctid is always set to blkno and xlrec->offnum.
>> I think this is why I was getting the above warning.
>
> Umm, really? Then perhaps this *is* a bug. Peter?

It's a matter of perspective, but probably not. The facts are:

* heap_insert() treats speculative insertions differently. In
particular, it does not set ctid in the caller-passed heap tuple
itself, leaving the ctid field to contain a speculative token value --
a per-backend monotonically increasing identifier. This identifier
represents some particular speculative insertion attempt within a
backend.

* On the redo side, heap_xlog_insert() was only changed mechanically
when upsert went in. So, it doesn't actually care about the stuff that
heap_insert() was made to care about to support speculative insertion.

* Furthermore, heap_insert() does *not* WAL log ctid under any
circumstances (that didn't change, either). Traditionally, the ctid
field was completely redundant anyway (since, of course, we're only
dealing with newly inserted tuples in heap_insert()). With speculative
insertions, there is a token within ctid, whose value represents
actual information that cannot be reconstructed after the fact (the
speculative token value). Even still, that isn't WAL-logged (see
comments above xl_heap_header struct). That's okay, because the
speculative insertion token value is only needed due to obscure issues
with "unprincipled deadlocks". The speculative token value itself is
only of interest to other, conflicting inserters, and only for the
instant in time immediately following physical insertion. The token
doesn't matter in the slightest to crash recovery, nor to Hot Standby
replicas.

While this design had some really nice properties (ask me if you are
unclear on this), it does break tools like the proposed WAL-checker
tool. I would compare this speculative token situation to the
situation with hint bits (when checksums are disabled, and
wal_log_hints = off).

I actually have a lot of sympathy for the idea that, in general, cases
like this should be avoided. But, would it really be worth it to
create a useless special case for speculative insertion (to WAL-log
the virtually useless speculative insertion token value)? I'm certain
that the answer must be "no": This tool ought to deal with speculative
insertion as a special case, and not vice-versa.

--
Peter Geoghegan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2016-08-28 00:24:46 src/include/catalog/pg_foreign_table.h still refers genbki.sh
Previous Message Andres Freund 2016-08-27 21:48:29 Re: Implement targetlist SRFs using ROWS FROM() (was Changed SRF in targetlist handling)