Re: Reviewing freeze map code

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reviewing freeze map code
Date: 2016-07-08 13:24:41
Message-ID: CAA4eK1LbEeyJkvZR7aad5vTpSpKNCG+hcckFUV1a6-_Y0w755A@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jul 7, 2016 at 12:34 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> Thank you for reviewing!
>
> On Thu, Jul 7, 2016 at 7:06 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> On 2016-07-05 23:37:59 +0900, Masahiko Sawada wrote:
>>> diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
>>> index 57da57a..fd66527 100644
>>> --- a/src/backend/access/heap/heapam.c
>>> +++ b/src/backend/access/heap/heapam.c
>>> @@ -3923,6 +3923,17 @@ l2:
>>>
>>> if (need_toast || newtupsize > pagefree)
>>> {
>>> + /*
>>> + * To prevent data corruption due to updating old tuple by
>>> + * other backends after released buffer
>>
>> That's not really the reason, is it? The prime problem is crash safety /
>> replication. The row lock we're faking (by setting xmax to our xid)
>> prevents concurrent updates until this backend dies.
>
> Fixed.
>
>>> , we need to emit that
>>> + * xmax of old tuple is set and clear visibility map bits if
>>> + * needed before releasing buffer. We can reuse xl_heap_lock
>>> + * for this purpose. It should be fine even if we crash midway
>>> + * from this section and the actual updating one later, since
>>> + * the xmax will appear to come from an aborted xid.
>>> + */
>>> + START_CRIT_SECTION();
>>> +
>>> /* Clear obsolete visibility flags ... */
>>> oldtup.t_data->t_infomask &= ~(HEAP_XMAX_BITS | HEAP_MOVED);
>>> oldtup.t_data->t_infomask2 &= ~HEAP_KEYS_UPDATED;
>>> @@ -3936,6 +3947,46 @@ l2:
>>> /* temporarily make it look not-updated */
>>> oldtup.t_data->t_ctid = oldtup.t_self;
>>> already_marked = true;
>>> +
>>> + /* Clear PD_ALL_VISIBLE flags */
>>> + if (PageIsAllVisible(BufferGetPage(buffer)))
>>> + {
>>> + all_visible_cleared = true;
>>> + PageClearAllVisible(BufferGetPage(buffer));
>>> + visibilitymap_clear(relation, BufferGetBlockNumber(buffer),
>>> + vmbuffer);
>>> + }
>>> +
>>> + MarkBufferDirty(buffer);
>>> +
>>> + if (RelationNeedsWAL(relation))
>>> + {
>>> + xl_heap_lock xlrec;
>>> + XLogRecPtr recptr;
>>> +
>>> + /*
>>> + * For logical decoding we need combocids to properly decode the
>>> + * catalog.
>>> + */
>>> + if (RelationIsAccessibleInLogicalDecoding(relation))
>>> + log_heap_new_cid(relation, &oldtup);
>>
>> Hm, I don't see that being necessary here. Row locks aren't logically
>> decoded, so there's no need to emit this here.
>
> Fixed.
>
>>
>>> + /* Clear PD_ALL_VISIBLE flags */
>>> + if (PageIsAllVisible(page))
>>> + {
>>> + Buffer vmbuffer = InvalidBuffer;
>>> + BlockNumber block = BufferGetBlockNumber(*buffer);
>>> +
>>> + all_visible_cleared = true;
>>> + PageClearAllVisible(page);
>>> + visibilitymap_pin(relation, block, &vmbuffer);
>>> + visibilitymap_clear(relation, block, vmbuffer);
>>> + }
>>> +
>>
>> That clears all-visible unnecessarily; we only need to clear all-frozen.
>>
>
> Fixed.
>
>>
>>> @@ -8694,6 +8761,23 @@ heap_xlog_lock(XLogReaderState *record)
>>> }
>>> HeapTupleHeaderSetXmax(htup, xlrec->locking_xid);
>>> HeapTupleHeaderSetCmax(htup, FirstCommandId, false);
>>> +
>>> + /* The visibility map need to be cleared */
>>> + if ((xlrec->infobits_set & XLHL_ALL_VISIBLE_CLEARED) != 0)
>>> + {
>>> + RelFileNode rnode;
>>> + Buffer vmbuffer = InvalidBuffer;
>>> + BlockNumber blkno;
>>> + Relation reln;
>>> +
>>> + XLogRecGetBlockTag(record, 0, &rnode, NULL, &blkno);
>>> + reln = CreateFakeRelcacheEntry(rnode);
>>> +
>>> + visibilitymap_pin(reln, blkno, &vmbuffer);
>>> + visibilitymap_clear(reln, blkno, vmbuffer);
>>> + PageClearAllVisible(page);
>>> + }
>>> +
>>
>>
>>> PageSetLSN(page, lsn);
>>> MarkBufferDirty(buffer);
>>> }
>>> diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
>>> index a822d0b..41b3c54 100644
>>> --- a/src/include/access/heapam_xlog.h
>>> +++ b/src/include/access/heapam_xlog.h
>>> @@ -242,6 +242,7 @@ typedef struct xl_heap_cleanup_info
>>> #define XLHL_XMAX_EXCL_LOCK 0x04
>>> #define XLHL_XMAX_KEYSHR_LOCK 0x08
>>> #define XLHL_KEYS_UPDATED 0x10
>>> +#define XLHL_ALL_VISIBLE_CLEARED 0x20
>>
>> Hm. We can't easily do that in the back-patched version, because a
>> standby won't know to check for the flag. That's kinda ok, since we
>> don't yet need to clear all-visible at that point of heap_update.
>> But that means we'd better not do so on the master either.
>>
>
> Attached latest version patch.

+ /* Clear only the all-frozen bit on visibility map if needed */
+ if (PageIsAllVisible(BufferGetPage(buffer)) &&
+ VM_ALL_FROZEN(relation, block, &vmbuffer))
+ {
+ visibilitymap_clear_extended(relation, block, vmbuffer,
+ VISIBILITYMAP_ALL_FROZEN);
+ }
+
+ if (RelationNeedsWAL(relation))
+ {
..
+ XLogBeginInsert();
+ XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+
+ xlrec.offnum = ItemPointerGetOffsetNumber(&oldtup.t_self);
+ xlrec.locking_xid = xmax_old_tuple;
+ xlrec.infobits_set = compute_infobits(oldtup.t_data->t_infomask,
+ oldtup.t_data->t_infomask2);
+ XLogRegisterData((char *) &xlrec, SizeOfHeapLock);
+ recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_LOCK);
..

One thing that looks awkward in this code is that it doesn't record
whether the all-frozen bit was actually cleared during the original
operation; during replay, it always clears the bit regardless of
whether the original operation cleared it or not.
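
For illustration, recording that state might look something like the
sketch below (XLHL_ALL_FROZEN_CLEARED is a hypothetical flag name, and
Andres's back-patching concern about new XLHL_* flags above would
still apply):

/* heap_update: remember whether we actually cleared the bit */
bool        cleared_all_frozen = false;

if (PageIsAllVisible(BufferGetPage(buffer)) &&
    VM_ALL_FROZEN(relation, block, &vmbuffer))
{
    visibilitymap_clear_extended(relation, block, vmbuffer,
                                 VISIBILITYMAP_ALL_FROZEN);
    cleared_all_frozen = true;
}
...
xlrec.infobits_set = compute_infobits(oldtup.t_data->t_infomask,
                                      oldtup.t_data->t_infomask2);
if (cleared_all_frozen)
    xlrec.infobits_set |= XLHL_ALL_FROZEN_CLEARED;  /* hypothetical */

/* heap_xlog_lock: clear the bit only when the record says it was cleared */
if ((xlrec->infobits_set & XLHL_ALL_FROZEN_CLEARED) != 0)
{
    ...
    visibilitymap_pin(reln, blkno, &vmbuffer);
    visibilitymap_clear_extended(reln, blkno, vmbuffer,
                                 VISIBILITYMAP_ALL_FROZEN);
}

That way the master and the standby always agree on what was done to
the visibility map.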

+ /* Clear only the all-frozen bit on visibility map if needed */
+ if (PageIsAllVisible(page) &&
+ VM_ALL_FROZEN(relation, BufferGetBlockNumber(*buffer), &vmbuffer))
+ {
+ BlockNumber block = BufferGetBlockNumber(*buffer);
+
+ visibilitymap_pin(relation, block, &vmbuffer);

I think it is not right to call visibilitymap_pin after acquiring the
buffer lock, since visibilitymap_pin can perform I/O. Refer to
heap_update for how to pin the visibility map page.
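
Roughly, the pattern heap_update follows is to pin the visibility map
page before taking the buffer lock, and to drop and retake the lock if
the page became all-visible in the meantime; a simplified sketch:

/* Pin the visibility map page before locking the heap buffer. */
if (PageIsAllVisible(page))
    visibilitymap_pin(relation, block, &vmbuffer);

LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);

/*
 * If the page became all-visible while we were waiting for the lock,
 * release the lock, pin the visibility map page, and relock, so that
 * we never do visibility map I/O while holding the buffer lock.
 */
if (vmbuffer == InvalidBuffer && PageIsAllVisible(page))
{
    LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
    visibilitymap_pin(relation, block, &vmbuffer);
    LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
    /* recheck the page's state after reacquiring the lock */
}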

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
