|From:||Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>|
|Cc:||noah(at)leadboat(dot)com, 9erthalion6(at)gmail(dot)com, andrew(dot)dunstan(at)2ndquadrant(dot)com, hlinnaka(at)iki(dot)fi, michael(at)paquier(dot)xyz, pgsql-hackers(at)postgresql(dot)org|
|Subject:||Re: [HACKERS] WAL logging problem in 9.4.3?|
At Thu, 4 Apr 2019 10:52:59 -0400, Robert Haas <robertmhaas(at)gmail(dot)com> wrote in <CA+TgmoZE0jW0jbQxAtoJgJNwrR1hyx3x8pUjQr=ggenLxnPoEQ(at)mail(dot)gmail(dot)com>
> On Wed, Apr 3, 2019 at 10:03 PM Kyotaro HORIGUCHI
> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> > > * Insert log record, using delete or insert instead of update log
> > > * when only one of the two buffers needs WAL-logging. If this were a
> > > * HOT-update, redoing the WAL record would result in a broken
> > > * hot-chain. However, that never happens because updates complete on
> > > * a single page always use log_update.
> It makes sense grammatically, but I'm not sure I believe that it's
Great to hear that! I rewrote it as the following.
+ * Insert log record. When we are not running WAL-skipping, always use
+ * update log. Otherwise use delete or insert log instead when only
+ * one of the two buffers needs WAL-logging. If this were a
+ * HOT-update, redoing the WAL record would result in a broken
+ * hot-chain. However, that never happens because updates complete on
+ * a single page always use log_update.
+ * Using delete or insert log in place of update log leads to
+ * an inconsistent series of WAL records. But note that WAL-skipping
+ * happens only when we are updating a tuple in a relation that has
+ * been created in the same transaction. Once committed, the WAL records
+ * recover the same state of the relation as the synced state at the
+ * commit. Or the maybe-broken relation due to a crash before commit
+ * will be removed in recovery.
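The selection rule in the comment above can be sketched as a small standalone C model. This is only an illustration under stated assumptions, not the patch code: `WalRecordKind`, `choose_wal_record()`, and its boolean parameters are hypothetical names, and the "no record at all" outcome when neither buffer needs WAL-logging is an assumption that the comment itself does not spell out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kinds of WAL record an update may emit. */
typedef enum WalRecordKind
{
    WAL_NONE,       /* assumption: nothing to log when no buffer needs it */
    WAL_UPDATE,     /* ordinary update record */
    WAL_DELETE,     /* only the old tuple's buffer needs WAL-logging */
    WAL_INSERT      /* only the new tuple's buffer needs WAL-logging */
} WalRecordKind;

/*
 * Toy model of the rule described in the comment.  The flags replace the
 * real buffer and relation state that heap_update() would inspect.
 */
WalRecordKind
choose_wal_record(bool skipping_wal, bool same_page,
                  bool old_needs_wal, bool new_needs_wal)
{
    /* Without WAL-skipping, always use the update record. */
    if (!skipping_wal)
        return WAL_UPDATE;

    /*
     * An update that completes on a single page touches one buffer, so the
     * two flags must agree and a delete/insert split can never occur.  This
     * is why a HOT update is never logged as a delete or an insert.
     */
    if (same_page)
    {
        assert(old_needs_wal == new_needs_wal);
        return old_needs_wal ? WAL_UPDATE : WAL_NONE;
    }

    if (old_needs_wal && new_needs_wal)
        return WAL_UPDATE;
    if (old_needs_wal)
        return WAL_DELETE;
    if (new_needs_wal)
        return WAL_INSERT;
    return WAL_NONE;
}
```

For example, with WAL-skipping active and the tuple moving to another page, `choose_wal_record(true, false, true, false)` yields `WAL_DELETE` in this model, while any call with `skipping_wal` set to false yields `WAL_UPDATE`.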
> sound technically. Even though it's only used in the non-HOT case,
> it's still important that the CTID, XMIN, and XMAX fields are set
> correctly during both normal operation and recovery.
log_heap_delete()/log_heap_update() record the infomasks of the
deleted tuple as is. Xmax is stored from the same variable. offnum
is taken from the deleted tuple, the buffer is registered, and
xlrec.flags is set to the same value. As a result, Xmax, the
infomasks, and ctid are restored to the same state by
heap_xlog_delete(). I didn't add a comment about that.
log_heap_insert()/log_heap_update() record the infomasks of the
inserted tuple as is. Xmin/Cmin and ctid-related info are handled
the same way. However, log_heap_insert() assumes that Xmax is
invalid. Xmax can be valid only when another transaction can see
the tuple, which is not the case here. I added a comment and an
assertion before calling log_heap_insert().
+ * Coming here means that the old tuple is invisible and
+ * inoperable to another transaction. So xmax_new_tuple is
+ * expected to be InvalidTransactionId here.
+ Assert (xmax_new_tuple == InvalidTransactionId);
+ recptr = log_heap_insert(relation, buffer, newtup,
I noticed that I accidentally moved log_heap_new_cid stuff to
log_heap_insert/delete(). I restored them.
The attached v11 is the new version addressing the above and
NTT Open Source Software Center