Re: free space map and visibility map

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: free space map and visibility map
Date: 2017-03-27 07:49:08
Message-ID: CAD21AoAnCw8y37dJSEhdbpue5H1v5FVLKUA_uh2MhZe331HNyw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Mar 27, 2017 at 2:38 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> At Sat, 25 Mar 2017 19:53:47 -0700, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote in <CAMkU=1x3+DPsfSU+AF7WAzAVugmEhUA2+jNf7SuAL-MSKQ+_KA(at)mail(dot)gmail(dot)com>
>> On Thu, Mar 23, 2017 at 7:01 PM, Kyotaro HORIGUCHI <
>> horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>
>> > At Wed, 22 Mar 2017 02:15:26 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
>> > wrote in <CAD21AoAq2YHs3MvSky6TxX1oKqyiPwUphdSa2sJCab_V4ci4VQ(at)mail(dot)
>> > gmail.com>
>> > > On Mon, Mar 20, 2017 at 11:28 PM, Robert Haas <robertmhaas(at)gmail(dot)com>
>> > wrote:
>> > > > On Sat, Mar 18, 2017 at 5:42 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
>> > wrote:
>> > > >> Isn't HEAP2_CLEAN only issued before an intended HOT update? (Which
>> > then
>> > > >> can't leave the block as all visible or all frozen). I think the
>> > issue is
>> > > >> here is HEAP2_VISIBLE or HEAP2_FREEZE_PAGE. Am I reading this
>> > correctly,
>> > > >> that neither of those ever update the FSM, regardless of FPI?
>> > > >
>> > > > Yes, updates to the FSM are never logged. Forcing replay of
>> > > > HEAP2_FREEZE_PAGE to update the FSM might be a good idea.
>> > > >
>> > >
>> > > I think I was missing something. I imaged your situation is that FPI
>> > > is replayed during crash recovery after the crashed server vacuums the
>> > > page and marked it as all-frozen. But this situation is also resolved
>> > > by that solution.
>> >
>> > # HEAP2_CLEAN is issued in lazy_vacuum_page
>> >
>> > It will work but I'm not sure it is right direction for
>> > HEAP2_FREEZE_PAGE to touch FSM.
>> >
>> > As Masahiko said, the situation must be created by HEAP2_VISIBLE
>> > without preceding HEAP2_CLEAN, or with HEAP2_CLEAN with FPI. I
>> > think only the latter can happen. The comment in heap_xlog_clean
>> > below is right generally but if a page filled with tuples becomes
>> > almost empty and freezable by this cleanup, a problematic
>> > situation like this occurs.
>> >
>>
>> I now think this is not the cause of the problem I am seeing. I made the
>> replay of FREEZE_PAGE update the FSM (both with and without FPI), but that
>> did not fix it. With frequent crashes, it still accumulated a lot of
>> frozen and empty (but full according to FSM) pages. I also set up replica
>> streaming and turned off crashing on the master, and the FSM of the replica
>> stays accurate, so the WAL stream and replay logic is doing the right thing
>> on the replica.
>>
>> I now think the dirtied FSM pages are somehow not getting marked as dirty,
>> or are getting marked as dirty but somehow the checkpoint is skipping
>> them. It looks like MarkBufferDirtyHint does do some operations unlocked
>> which could explain lost update, but it seems unlikely that that would
>> happen often enough to see the amount of lost updates I am seeing.
>
> Hmm.. clearing dirty hint seems already protected by exclusive
> lock. And I think it can occur without lock failure.
>
> Other than by FPI, FSM update is omitted when record LSN is older
> than page LSN. If heap page is evicted but FSM page is not after
> vacuuming and before power cut, replaying HEAP2_CLEAN skips
> update of FSM even though FPI is not attached. Of course this
> cannot occur on standby. One FSM page covers as many heap pages
> as about 4k, so FSM can stay far longer than heap pages.
>
> ALL_FROZEN is set with other than HEAP2_FREEZE_PAGE. When a page
> is already empty when entering lazy_sacn_heap, or a page of
> non-indexed heap is empitied in lazy_scan_heap, HRAP2_VISIBLE is
> issued to set ALL_FROZEN.
>
> Perhaps the problem will be fixed by forcing heap_xlog_visible to
> update FSM (addition to FREEZE_PAGE), or the same in
> heap_xlog_clean. (As menthined in the previous mail, I prefer the
> latter.)

Maybe it's enough just to make both heap_xlog_visible and
heap_xlog_freeze_page forcibly updates the FSM (heap_xlog_freeze_page
might be unnecessary). Because the problem happens on the page that is
full according to FSM but is empty and marked as all-visible or
all-frozen. Though heap_xlog_clean loads the heap page to the memory
for redo operation, forcing heap_xlog_clean to update FSM might be
overkill for this solution. Because it can happen on every pages that
are not marked as neither all-visible nor all-frozen. Basically 100%
accuracy of FSM is not required. On the other hand, if we makes
heap_xlog_visible updates the FSM, it requires to load both heap page
and FSM page, which can also be overhead. Another idea is, we can
heap_xlog_visible to have the freespace of corresponding heap page,
and then update FSM during recovery.

>
>> > > /*
>> > > * Update the FSM as well.
>> > > *
>> > > * XXX: Don't do this if the page was restored from full page image. We
>> > > * don't bother to update the FSM in that case, it doesn't need to be
>> > > * totally accurate anyway.
>> > > */
>> >
>>
>> What does that save us? If we restored from FPI, we already have the block
>> in memory (we don't need to see the old version, just the new one), so it
>> doesn't save us a random read IO.
>
> Updates on random pages can cause visits to many unloaded FSM
> pages. It may be intending to avoid that. Or, especially for
> INSERT, successive operations tends to occur on the same heap
> page, the complexity of calculating FSM wouldn't be so small
> relatively. FMS tells a lie that the page has spare space after
> that but it doesn't harm. But I think that the things are
> different for operations that increments free space.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2017-03-27 07:51:32 Re: Monitoring roles patch
Previous Message Craig Ringer 2017-03-27 07:42:08 Re: PATCH: Batch/pipelining support for libpq