Re: Reviewing freeze map code

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reviewing freeze map code
Date: 2016-07-11 14:51:05
Message-ID: CAD21AoAqXDbR0TwK42YLJsujK9HqA=pT9O_yRWmyq5WiW9GoaQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jul 8, 2016 at 10:24 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Thu, Jul 7, 2016 at 12:34 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> Thank you for reviewing!
>>
>> On Thu, Jul 7, 2016 at 7:06 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>>> On 2016-07-05 23:37:59 +0900, Masahiko Sawada wrote:
>>>> diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
>>>> index 57da57a..fd66527 100644
>>>> --- a/src/backend/access/heap/heapam.c
>>>> +++ b/src/backend/access/heap/heapam.c
>>>> @@ -3923,6 +3923,17 @@ l2:
>>>>
>>>> if (need_toast || newtupsize > pagefree)
>>>> {
>>>> + /*
>>>> + * To prevent data corruption due to updating old tuple by
>>>> + * other backends after released buffer
>>>
>>> That's not really the reason, is it? The prime problem is crash safety /
>>> replication. The row-lock we're faking (by setting xmax to our xid)
>>> prevents concurrent updates until this backend dies.
>>
>> Fixed.
>>
>>>> , we need to emit that
>>>> + * xmax of old tuple is set and clear visibility map bits if
>>>> + * needed before releasing buffer. We can reuse xl_heap_lock
>>>> + * for this purpose. It should be fine even if we crash midway
>>>> + * from this section and the actual updating one later, since
>>>> + * the xmax will appear to come from an aborted xid.
>>>> + */
>>>> + START_CRIT_SECTION();
>>>> +
>>>> /* Clear obsolete visibility flags ... */
>>>> oldtup.t_data->t_infomask &= ~(HEAP_XMAX_BITS | HEAP_MOVED);
>>>> oldtup.t_data->t_infomask2 &= ~HEAP_KEYS_UPDATED;
>>>> @@ -3936,6 +3947,46 @@ l2:
>>>> /* temporarily make it look not-updated */
>>>> oldtup.t_data->t_ctid = oldtup.t_self;
>>>> already_marked = true;
>>>> +
>>>> + /* Clear PD_ALL_VISIBLE flags */
>>>> + if (PageIsAllVisible(BufferGetPage(buffer)))
>>>> + {
>>>> + all_visible_cleared = true;
>>>> + PageClearAllVisible(BufferGetPage(buffer));
>>>> + visibilitymap_clear(relation, BufferGetBlockNumber(buffer),
>>>> + vmbuffer);
>>>> + }
>>>> +
>>>> + MarkBufferDirty(buffer);
>>>> +
>>>> + if (RelationNeedsWAL(relation))
>>>> + {
>>>> + xl_heap_lock xlrec;
>>>> + XLogRecPtr recptr;
>>>> +
>>>> + /*
>>>> + * For logical decoding we need combocids to properly decode the
>>>> + * catalog.
>>>> + */
>>>> + if (RelationIsAccessibleInLogicalDecoding(relation))
>>>> + log_heap_new_cid(relation, &oldtup);
>>>
>>> Hm, I don't see that being necessary here. Row locks aren't logically
>>> decoded, so there's no need to emit this here.
>>
>> Fixed.
>>
>>>
>>>> + /* Clear PD_ALL_VISIBLE flags */
>>>> + if (PageIsAllVisible(page))
>>>> + {
>>>> + Buffer vmbuffer = InvalidBuffer;
>>>> + BlockNumber block = BufferGetBlockNumber(*buffer);
>>>> +
>>>> + all_visible_cleared = true;
>>>> + PageClearAllVisible(page);
>>>> + visibilitymap_pin(relation, block, &vmbuffer);
>>>> + visibilitymap_clear(relation, block, vmbuffer);
>>>> + }
>>>> +
>>>
>>> That clears all-visible unnecessarily, we only need to clear all-frozen.
>>>
>>
>> Fixed.
>>
>>>
>>>> @@ -8694,6 +8761,23 @@ heap_xlog_lock(XLogReaderState *record)
>>>> }
>>>> HeapTupleHeaderSetXmax(htup, xlrec->locking_xid);
>>>> HeapTupleHeaderSetCmax(htup, FirstCommandId, false);
>>>> +
>>>> + /* The visibility map need to be cleared */
>>>> + if ((xlrec->infobits_set & XLHL_ALL_VISIBLE_CLEARED) != 0)
>>>> + {
>>>> + RelFileNode rnode;
>>>> + Buffer vmbuffer = InvalidBuffer;
>>>> + BlockNumber blkno;
>>>> + Relation reln;
>>>> +
>>>> + XLogRecGetBlockTag(record, 0, &rnode, NULL, &blkno);
>>>> + reln = CreateFakeRelcacheEntry(rnode);
>>>> +
>>>> + visibilitymap_pin(reln, blkno, &vmbuffer);
>>>> + visibilitymap_clear(reln, blkno, vmbuffer);
>>>> + PageClearAllVisible(page);
>>>> + }
>>>> +
>>>
>>>
>>>> PageSetLSN(page, lsn);
>>>> MarkBufferDirty(buffer);
>>>> }
>>>> diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
>>>> index a822d0b..41b3c54 100644
>>>> --- a/src/include/access/heapam_xlog.h
>>>> +++ b/src/include/access/heapam_xlog.h
>>>> @@ -242,6 +242,7 @@ typedef struct xl_heap_cleanup_info
>>>> #define XLHL_XMAX_EXCL_LOCK 0x04
>>>> #define XLHL_XMAX_KEYSHR_LOCK 0x08
>>>> #define XLHL_KEYS_UPDATED 0x10
>>>> +#define XLHL_ALL_VISIBLE_CLEARED 0x20
>>>
>>> Hm. We can't easily do that in the back-patched version, because a
>>> standby won't know to check for the flag. That's kinda OK, since we
>>> don't yet need to clear all-visible at that point of heap_update.
>>> But that then means we'd better not do so on the master either.
>>>
>>
>> Attached latest version patch.
>
> + /* Clear only the all-frozen bit on visibility map if needed */
>
> + if (PageIsAllVisible(BufferGetPage(buffer)) &&
>
> + VM_ALL_FROZEN(relation, block, &vmbuffer))
> + {
> + visibilitymap_clear_extended(relation, block, vmbuffer,
> + VISIBILITYMAP_ALL_FROZEN);
> + }
> +
>
> + if (RelationNeedsWAL(relation))
> + {
> ..
>
> + XLogBeginInsert();
> + XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
> +
> + xlrec.offnum = ItemPointerGetOffsetNumber(&oldtup.t_self);
> + xlrec.locking_xid = xmax_old_tuple;
> + xlrec.infobits_set = compute_infobits(oldtup.t_data->t_infomask,
> + oldtup.t_data->t_infomask2);
> + XLogRegisterData((char *) &xlrec, SizeOfHeapLock);
> + recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_LOCK);
> ..
>
> One thing that looks awkward in this code is that it doesn't record
> whether the frozen bit was actually cleared during the original
> operation; then, during replay, it always clears the frozen bit
> irrespective of whether the original operation cleared it or not.
>

I changed it so that we look the all-frozen bit up first, and then
clear it if needed.
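
To illustrate that check-then-clear ordering, here is a minimal
self-contained model; the struct, names, and bitmap layout are
illustrative stand-ins, not the actual PostgreSQL API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical visibility-map bits for one heap page, mirroring
 * VISIBILITYMAP_ALL_VISIBLE and VISIBILITYMAP_ALL_FROZEN. */
#define VM_BIT_ALL_VISIBLE 0x01
#define VM_BIT_ALL_FROZEN  0x02

typedef struct
{
    uint8_t vm_bits;          /* visibility-map bits for the page */
    bool    page_all_visible; /* models PD_ALL_VISIBLE on the heap page */
} page_state;

/* Look the all-frozen bit up first, then clear only that bit, leaving
 * all-visible intact; return whether anything was actually cleared. */
static bool
clear_all_frozen_if_set(page_state *p)
{
    if (p->page_all_visible && (p->vm_bits & VM_BIT_ALL_FROZEN))
    {
        p->vm_bits &= (uint8_t) ~VM_BIT_ALL_FROZEN;
        return true;
    }
    return false;
}
```

Returning whether the bit was actually cleared is also what would let
the WAL record carry that fact to replay, rather than replay clearing
the bit unconditionally.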

> + /* Clear only the all-frozen bit on visibility map if needed */
> + if (PageIsAllVisible(page) &&
> + VM_ALL_FROZEN(relation, BufferGetBlockNumber(*buffer), &vmbuffer))
> + {
> + BlockNumber block = BufferGetBlockNumber(*buffer);
> +
> + visibilitymap_pin(relation, block, &vmbuffer);
>
> I think it is not right to call visibilitymap_pin after holding a
> buffer lock (visibilitymap_pin can perform I/O). Refer to heap_update
> for how to pin the visibility map.
>

Thank you for your advice!
Fixed.
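
The ordering rule being followed here can be modeled in a few lines
(all names below are illustrative stubs, not the real buffer-manager
functions): visibilitymap_pin may perform I/O, so it must happen
before the buffer content lock is taken, as heap_update does.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative state flags */
static bool holding_content_lock = false;
static bool vm_page_pinned = false;

/* visibilitymap_pin may read the VM page from disk, so it must not
 * run while a buffer content lock is held. */
static void
pin_vm_page(void)
{
    assert(!holding_content_lock);  /* the ordering rule being modeled */
    vm_page_pinned = true;
}

static void
lock_buffer_content(void)
{
    holding_content_lock = true;
}

/* Correct sequence: pin first, then lock. */
static void
prepare_to_clear_vm_bits(void)
{
    pin_vm_page();
    lock_buffer_content();
}
```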

Attached are two separate patches; please give me feedback.
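
For reference, the XLHL_ALL_VISIBLE_CLEARED bit proposed earlier in
the thread does not collide with the existing xl_heap_lock flag bits;
a quick self-contained check, with the constants copied from the
heapam_xlog.h hunk quoted above and a sketch of the replay-side test:

```c
#include <assert.h>

/* Constants from the quoted heapam_xlog.h hunk */
#define XLHL_XMAX_EXCL_LOCK      0x04
#define XLHL_XMAX_KEYSHR_LOCK    0x08
#define XLHL_KEYS_UPDATED        0x10
#define XLHL_ALL_VISIBLE_CLEARED 0x20 /* the bit the patch adds */

/* How replay would decide whether to touch the visibility map */
static int
must_clear_vm_on_replay(int infobits_set)
{
    return (infobits_set & XLHL_ALL_VISIBLE_CLEARED) != 0;
}
```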

Regards,

--
Masahiko Sawada

Attachment Content-Type Size
0001-Fix-heap_udpate-set-xmax-without-WAL-logging-in-the-.patch text/plain 2.0 KB
0002-Clear-all-frozen-bit-on-visibility-map-when-xmax-is-.patch text/plain 5.8 KB
