Fwd: free space map and visibility map

From:	Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Fwd: free space map and visibility map
Date:	2017-03-28 15:50:58
Message-ID:	CAMkU=1zKfqGePWG+qqKthmWERBn8UAA2_9Sb+qTUUREhFkqLCA@mail.gmail.com
Lists:	pgsql-hackers

I accidentally sent this off-list; sending it to the list now:

On Sun, Mar 26, 2017 at 10:38 PM, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:

> At Sat, 25 Mar 2017 19:53:47 -0700, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
> wrote in <CAMkU=1x3+DPsfSU+AF7WAzAVugmEhUA2+jNf7SuAL-MSKQ+_KA(at)mail(dot)gmail.com>
> > On Thu, Mar 23, 2017 at 7:01 PM, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> >
> > > At Wed, 22 Mar 2017 02:15:26 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
> > > wrote in <CAD21AoAq2YHs3MvSky6TxX1oKqyiPwUphdSa2sJCab_V4ci4VQ(at)mail(dot)gmail.com>
> > > > On Mon, Mar 20, 2017 at 11:28 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > > > > On Sat, Mar 18, 2017 at 5:42 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> > > > >> Isn't HEAP2_CLEAN only issued before an intended HOT update? (Which then
> > > > >> can't leave the block as all visible or all frozen). I think the issue
> > > > >> here is HEAP2_VISIBLE or HEAP2_FREEZE_PAGE. Am I reading this correctly,
> > > > >> that neither of those ever updates the FSM, regardless of FPI?
> > > > >
> > > > > Yes, updates to the FSM are never logged. Forcing replay of
> > > > > HEAP2_FREEZE_PAGE to update the FSM might be a good idea.
> > > > >
> > > >
> > > > I think I was missing something. I imagined your situation is that an FPI
> > > > is replayed during crash recovery after the crashed server vacuumed the
> > > > page and marked it as all-frozen. But this situation is also resolved
> > > > by that solution.
> > >
> > > # HEAP2_CLEAN is issued in lazy_vacuum_page
> > >
> > > It will work, but I'm not sure it is the right direction for
> > > HEAP2_FREEZE_PAGE to touch the FSM.
> > >
> > > As Masahiko said, the situation must be created by HEAP2_VISIBLE
> > > without a preceding HEAP2_CLEAN, or with a HEAP2_CLEAN with FPI. I
> > > think only the latter can happen. The comment in heap_xlog_clean
> > > below is generally right, but if a page filled with tuples becomes
> > > almost empty and freezable by this cleanup, a problematic
> > > situation like this occurs.
> > >
> >
> > I now think this is not the cause of the problem I am seeing. I made the
> > replay of FREEZE_PAGE update the FSM (both with and without FPI), but that
> > did not fix it. With frequent crashes, it still accumulated a lot of
> > frozen and empty (but full according to FSM) pages. I also set up replica
> > streaming and turned off crashing on the master, and the FSM of the replica
> > stays accurate, so the WAL stream and replay logic is doing the right thing
> > on the replica.
> >
> > I now think the dirtied FSM pages are somehow not getting marked as dirty,
> > or are getting marked as dirty but somehow the checkpoint is skipping
> > them. It looks like MarkBufferDirtyHint does do some operations unlocked
> > which could explain lost update, but it seems unlikely that that would
> > happen often enough to see the amount of lost updates I am seeing.
>
> Hmm.. clearing dirty hint seems already protected by exclusive
> lock. And I think it can occur without lock failure.
>
> Other than by FPI, the FSM update is omitted when the record LSN is older
> than the page LSN. If the heap page is evicted but the FSM page is not, after
> vacuuming and before a power cut, replaying HEAP2_CLEAN skips the
> update of the FSM even though no FPI is attached. Of course this
> cannot occur on a standby. One FSM page covers about 4k heap
> pages, so FSM pages can stay far longer than heap pages.
>

This corresponds to the action == BLK_DONE case, right?
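
For illustration only, here is a toy model of how a redo action could be chosen for a block during replay, following the description quoted above. The enum value names match PostgreSQL's XLogRedoAction; the function classify_redo and its parameters are hypothetical and are not the server's API. It is a sketch assuming that a full page image always takes precedence and that a record is skipped when its LSN is not newer than the page LSN.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value names follow PostgreSQL's XLogRedoAction; everything else is a toy model. */
typedef enum
{
    BLK_NEEDS_REDO,   /* page is older than the record: changes must be applied */
    BLK_DONE,         /* page already contains the record's changes */
    BLK_RESTORED,     /* page was overwritten from a full page image (FPI) */
    BLK_NOTFOUND      /* page no longer exists */
} RedoAction;

/*
 * Hypothetical helper: decide what replay has to do for one block.
 * has_fpi     - the WAL record carries a full page image for this block
 * page_exists - the block is still present in the relation
 * page_lsn    - LSN stamped on the on-disk page
 * record_lsn  - LSN of the WAL record being replayed
 */
RedoAction
classify_redo(bool has_fpi, bool page_exists,
              uint64_t page_lsn, uint64_t record_lsn)
{
    if (has_fpi)
        return BLK_RESTORED;
    if (!page_exists)
        return BLK_NOTFOUND;
    if (record_lsn <= page_lsn)
        return BLK_DONE;      /* heap page was flushed after the change */
    return BLK_NEEDS_REDO;
}
```

In the scenario quoted above, the heap page reached disk before the power cut while the FSM page did not, so the heap block comes back as BLK_DONE and the replay code that only updates the FSM on BLK_NEEDS_REDO never repairs the stale FSM entry.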

>
> ALL_FROZEN is also set by records other than HEAP2_FREEZE_PAGE. When a page
> is already empty when entering lazy_scan_heap, or a page of a
> non-indexed heap is emptied in lazy_scan_heap, HEAP2_VISIBLE is
> issued to set ALL_FROZEN.
>
> Perhaps the problem will be fixed by forcing heap_xlog_visible to
> update the FSM (in addition to FREEZE_PAGE), or by doing the same in
> heap_xlog_clean. (As mentioned in the previous mail, I prefer the
> latter.)
>

When I make heap_xlog_clean update the FSM even on BLK_RESTORED (but not on
BLK_DONE), it solves the problem I was seeing. That still leaves me
wondering why the problem doesn't show up on the standby because, unlike
BLK_DONE, BLK_RESTORED should have the same issue on a standby as it does on
a recovering master, shouldn't it? Maybe the difference is that the
existence of a replication slot delays the cleanup in a way that causes a
different pattern of WAL records.

> > > > /*
> > > > * Update the FSM as well.
> > > > *
> > > > * XXX: Don't do this if the page was restored from full page image. We
> > > > * don't bother to update the FSM in that case, it doesn't need to be
> > > > * totally accurate anyway.
> > > > */
> > >
> >
> > What does that save us? If we restored from FPI, we already have the block
> > in memory (we don't need to see the old version, just the new one), so it
> > doesn't save us a random read IO.
>
> Updates on random pages can cause visits to many unloaded FSM
> pages. The intent may be to avoid that.

But I think that that would be no worse for BLK_RESTORED than it is for
BLK_NEEDS_REDO. Why optimize only one of the cases, if it is worth
optimizing either one?

Cheers,

Jeff

Attachment	Content-Type	Size
fsm_clean.patch	application/octet-stream	1.8 KB

In response to

Re: free space map and visibility map at 2017-03-27 05:38:27 from Kyotaro HORIGUCHI

Responses

Re: free space map and visibility map at 2017-03-29 01:40:07 from Kyotaro HORIGUCHI
