Re: free space map and visibility map

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: jeff(dot)janes(at)gmail(dot)com
Cc: sawada(dot)mshk(at)gmail(dot)com, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: free space map and visibility map
Date: 2017-03-27 05:38:27
Message-ID: 20170327.143827.52033775.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

At Sat, 25 Mar 2017 19:53:47 -0700, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote in <CAMkU=1x3+DPsfSU+AF7WAzAVugmEhUA2+jNf7SuAL-MSKQ+_KA(at)mail(dot)gmail(dot)com>
> On Thu, Mar 23, 2017 at 7:01 PM, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>
> > At Wed, 22 Mar 2017 02:15:26 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoAq2YHs3MvSky6TxX1oKqyiPwUphdSa2sJCab_V4ci4VQ(at)mail(dot)gmail(dot)com>
> > > On Mon, Mar 20, 2017 at 11:28 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > > > On Sat, Mar 18, 2017 at 5:42 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> > > >> Isn't HEAP2_CLEAN only issued before an intended HOT update? (Which then
> > > >> can't leave the block as all visible or all frozen). I think the issue is
> > > >> here is HEAP2_VISIBLE or HEAP2_FREEZE_PAGE. Am I reading this correctly,
> > > >> that neither of those ever update the FSM, regardless of FPI?
> > > >
> > > > Yes, updates to the FSM are never logged. Forcing replay of
> > > > HEAP2_FREEZE_PAGE to update the FSM might be a good idea.
> > > >
> > >
> > > I think I was missing something. I imaged your situation is that FPI
> > > is replayed during crash recovery after the crashed server vacuums the
> > > page and marked it as all-frozen. But this situation is also resolved
> > > by that solution.
> >
> > # HEAP2_CLEAN is issued in lazy_vacuum_page
> >
> > It will work but I'm not sure it is right direction for
> > HEAP2_FREEZE_PAGE to touch FSM.
> >
> > As Masahiko said, the situation must be created by HEAP2_VISIBLE
> > without preceding HEAP2_CLEAN, or with HEAP2_CLEAN with FPI. I
> > think only the latter can happen. The comment in heap_xlog_clean
> > below is right generally but if a page filled with tuples becomes
> > almost empty and freezable by this cleanup, a problematic
> > situation like this occurs.
> >
>
> I now think this is not the cause of the problem I am seeing. I made the
> replay of FREEZE_PAGE update the FSM (both with and without FPI), but that
> did not fix it. With frequent crashes, it still accumulated a lot of
> frozen and empty (but full according to FSM) pages. I also set up replica
> streaming and turned off crashing on the master, and the FSM of the replica
> stays accurate, so the WAL stream and replay logic is doing the right thing
> on the replica.
>
> I now think the dirtied FSM pages are somehow not getting marked as dirty,
> or are getting marked as dirty but somehow the checkpoint is skipping
> them. It looks like MarkBufferDirtyHint does do some operations unlocked
> which could explain lost update, but it seems unlikely that that would
> happen often enough to see the amount of lost updates I am seeing.

Hmm. Clearing the dirty hint already seems to be protected by an
exclusive lock, and I think the problem can occur without any
locking failure.

Apart from the FPI case, the FSM update is also skipped when the
record LSN is not newer than the page LSN. If the heap page has
been written out but the FSM page has not, after vacuuming and
before a power cut, replaying HEAP2_CLEAN skips the FSM update
even though no FPI is attached. Of course this cannot occur on a
standby. One FSM page covers about 4k heap pages, so a dirty FSM
page can stay unwritten far longer than the heap pages it covers.
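
As a rough check of the "about 4k" figure, here is a small arithmetic
sketch. It is not PostgreSQL code. The constants are assumptions: the
default 8 kB block size, a 24-byte page header, and a 4-byte fixed field
ahead of the FSM node array, with the leaf count derived the way I
recall fsm_internals.h doing it. Please verify against the tree.

```python
# Assumed constants, not taken from this thread.
BLCKSZ = 8192              # default block size
PAGE_HEADER_SIZE = 24      # aligned page header size (assumed)
FSM_PAGE_FIXED_FIELDS = 4  # bytes before the node array (assumed)


def fsm_slots_per_page(blcksz=BLCKSZ):
    """Leaf nodes on one FSM page; each leaf tracks one heap page."""
    nodes_per_page = blcksz - PAGE_HEADER_SIZE - FSM_PAGE_FIXED_FIELDS
    non_leaf_nodes = blcksz // 2 - 1  # upper levels of the binary tree
    return nodes_per_page - non_leaf_nodes


slots = fsm_slots_per_page()
heap_bytes_covered = slots * BLCKSZ
print(slots, heap_bytes_covered)
```

Under these assumptions a single FSM leaf page stands for roughly 32 MB
of heap, so it is dirtied again and again while individual heap pages
come and go.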

ALL_FROZEN is also set by records other than HEAP2_FREEZE_PAGE.
When a page is already empty on entering lazy_scan_heap, or a
page of a heap without indexes is emptied in lazy_scan_heap,
HEAP2_VISIBLE is issued to set ALL_FROZEN.

Perhaps the problem would be fixed by forcing heap_xlog_visible
to update the FSM (in addition to FREEZE_PAGE), or by doing the
same in heap_xlog_clean. (As mentioned in my previous mail, I
prefer the latter.)
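
To make the scenario concrete, here is a toy model of the redo
decision. It is a sketch under stated assumptions, not the actual
heap_xlog_clean. The names RedoAction, HeapPage, replay_clean and
force_fsm_update are hypothetical, the FSM is a plain dict, and free
space is kept in bytes rather than FSM categories.

```python
from dataclasses import dataclass
from enum import Enum


class RedoAction(Enum):
    NEEDS_REDO = 1  # record is newer than the page: apply it
    DONE = 2        # page already contains the change: skip it
    RESTORED = 3    # page was overwritten from a full-page image


@dataclass
class HeapPage:
    lsn: int         # LSN of the last change present in the on-disk page
    free_space: int  # free bytes in the on-disk page


def classify(record_lsn, has_fpi, page):
    """Decide how a record applies to a page (simplified model)."""
    if has_fpi:
        return RedoAction.RESTORED
    if record_lsn <= page.lsn:
        return RedoAction.DONE
    return RedoAction.NEEDS_REDO


def replay_clean(record_lsn, has_fpi, free_after, page, fsm, blkno,
                 force_fsm_update=False):
    """Replay a cleanup record; fsm maps block number -> free bytes."""
    action = classify(record_lsn, has_fpi, page)
    if action is not RedoAction.DONE:
        page.free_space = free_after
        page.lsn = record_lsn
    # Current behaviour in this model: the FSM is touched only on NEEDS_REDO.
    # force_fsm_update stands for the proposed change (hypothetical flag).
    if action is RedoAction.NEEDS_REDO or force_fsm_update:
        fsm[blkno] = page.free_space
    return action


def crash_scenario(force_fsm_update):
    # Heap page was written out after vacuum (LSN 100, nearly empty),
    # but the FSM page was not, so the on-disk FSM still says "full".
    page = HeapPage(lsn=100, free_space=8000)
    fsm = {7: 0}
    action = replay_clean(100, False, 8000, page, fsm, 7, force_fsm_update)
    return action, fsm[7]
```

In this model crash_scenario(False) leaves the FSM entry at 0 although
the page is almost empty, while crash_scenario(True) records the real
free space.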

> > > /*
> > > * Update the FSM as well.
> > > *
> > > * XXX: Don't do this if the page was restored from full page image. We
> > > * don't bother to update the FSM in that case, it doesn't need to be
> > > * totally accurate anyway.
> > > */
> >
>
> What does that save us? If we restored from FPI, we already have the block
> in memory (we don't need to see the old version, just the new one), so it
> doesn't save us a random read IO.

Updates on random pages can cause visits to many unloaded FSM
pages. The comment may intend to avoid that. Or, especially for
INSERT, successive operations tend to hit the same heap page, so
the cost of recalculating the FSM each time would not be
negligible. After that the FSM wrongly claims that the page has
spare space, but that does no harm. I think things are different,
though, for operations that increase free space.
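
The asymmetry can be shown with a small sketch. It is a simplified
model, not PostgreSQL code: pick_page is a hypothetical helper, the FSM
and the heap are plain dicts of free bytes, and I assume that an
inserter who finds a page fuller than advertised corrects the FSM entry
and asks again.

```python
def pick_page(fsm, actual_free, needed):
    """Return a block with room for `needed` bytes, or None.

    fsm:         block number -> free bytes the FSM believes
    actual_free: block number -> free bytes really on the page
    None means the caller would have to extend the relation.
    """
    while True:
        candidates = [b for b, free in sorted(fsm.items()) if free >= needed]
        if not candidates:
            return None
        blkno = candidates[0]
        if actual_free[blkno] >= needed:
            return blkno
        # Overestimate: costs one wasted visit, then the entry is corrected.
        fsm[blkno] = actual_free[blkno]


# Block 0 is overestimated, block 1 is underestimated (really almost empty).
fsm = {0: 4000, 1: 0}
actual = {0: 100, 1: 8000}
result = pick_page(fsm, actual, 500)
```

Here the overestimate for block 0 is repaired on the first visit, but
block 1 is never visited, so its stale "full" entry survives and the
result is None even though the page has plenty of room.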

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
