
Re: free space map and visibility map

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	jeff(dot)janes(at)gmail(dot)com
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: free space map and visibility map
Date:	2017-03-29 01:40:07
Message-ID:	20170329.104007.89580821.horiguchi.kyotaro@lab.ntt.co.jp
Lists:	pgsql-hackers

Hello,

At Tue, 28 Mar 2017 08:50:58 -0700, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote in <CAMkU=1zKfqGePWG+qqKthmWERBn8UAA2_9Sb+qTUUREhFkqLCA(at)mail(dot)gmail(dot)com>
> > > I now think this is not the cause of the problem I am seeing. I made the
> > > replay of FREEZE_PAGE update the FSM (both with and without FPI), but that
> > > did not fix it. With frequent crashes, it still accumulated a lot of
> > > frozen and empty (but full according to FSM) pages. I also set up replica
> > > streaming and turned off crashing on the master, and the FSM of the replica
> > > stays accurate, so the WAL stream and replay logic is doing the right thing
> > > on the replica.
> > >
> > > I now think the dirtied FSM pages are somehow not getting marked as dirty,
> > > or are getting marked as dirty but somehow the checkpoint is skipping
> > > them. It looks like MarkBufferDirtyHint does do some operations unlocked
> > > which could explain lost update, but it seems unlikely that that would
> > > happen often enough to see the amount of lost updates I am seeing.
> >
> > Hmm.. clearing dirty hint seems already protected by exclusive
> > lock. And I think it can occur without lock failure.
> >
> > Other than by FPI, FSM update is omitted when record LSN is older
> > than page LSN. If heap page is evicted but FSM page is not after
> > vacuuming and before power cut, replaying HEAP2_CLEAN skips
> > update of FSM even though FPI is not attached. Of course this
> > cannot occur on standby. One FSM page covers as many heap pages
> > as about 4k, so FSM can stay far longer than heap pages.
> >
>
> This corresponds to action == BLK_DONE case, right?

Yes. A WAL record whose LSN is not newer than the page LSN results
in BLK_DONE. That works as long as the heap page and the FSM are
consistent, but in this situation it leaves the FSM broken during
crash recovery.
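
To make the BLK_DONE case concrete, here is a toy model of the replay decision. It is only a sketch and not the PostgreSQL code: every type and function name below (ToyHeapPage, ToyFsmSlot, toy_redo_action, toy_replay_clean) is made up for illustration, and the FSM is reduced to a single integer per heap page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for illustration; these are NOT the PostgreSQL definitions. */
typedef uint64_t ToyLsn;

typedef enum { TOY_BLK_NEEDS_REDO, TOY_BLK_DONE, TOY_BLK_RESTORED } ToyRedoAction;

typedef struct
{
    ToyLsn page_lsn;   /* LSN of the last change already on the heap page */
    int    free_bytes; /* actual free space on the heap page */
} ToyHeapPage;

typedef struct
{
    int recorded_free; /* free space the FSM believes the heap page has */
} ToyFsmSlot;

/* Classify a record against its target page, following the rule discussed above. */
static ToyRedoAction
toy_redo_action(const ToyHeapPage *page, ToyLsn record_lsn, bool has_fpi)
{
    if (has_fpi)
        return TOY_BLK_RESTORED;    /* page contents come from the image */
    if (record_lsn <= page->page_lsn)
        return TOY_BLK_DONE;        /* page already contains this change */
    return TOY_BLK_NEEDS_REDO;
}

/*
 * Replay a "clean" record after which the page has free_after bytes free.
 * The FSM slot is touched only for TOY_BLK_NEEDS_REDO, which is the
 * behavior under discussion in this thread.
 */
static ToyRedoAction
toy_replay_clean(ToyHeapPage *page, ToyFsmSlot *fsm,
                 ToyLsn record_lsn, bool has_fpi, int free_after)
{
    ToyRedoAction action;

    assert(page != NULL && fsm != NULL);
    action = toy_redo_action(page, record_lsn, has_fpi);

    if (action != TOY_BLK_DONE)
    {
        page->free_bytes = free_after;
        page->page_lsn = record_lsn;
    }
    if (action == TOY_BLK_NEEDS_REDO)
        fsm->recorded_free = free_after;

    return action;
}
```

In this model, a heap page that reached disk at LSN 100 with 8000 bytes free, paired with an FSM slot that still says 0, stays inconsistent: replaying the record at LSN 100 returns TOY_BLK_DONE and leaves the FSM slot at 0.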

> > ALL_FROZEN is set with other than HEAP2_FREEZE_PAGE. When a page
> > is already empty when entering lazy_scan_heap, or a page of
> > non-indexed heap is emptied in lazy_scan_heap, HEAP2_VISIBLE is
> > issued to set ALL_FROZEN.
> >
> > Perhaps the problem will be fixed by forcing heap_xlog_visible to
> > update FSM (addition to FREEZE_PAGE), or the same in
> > heap_xlog_clean. (As mentioned in the previous mail, I prefer the
> > latter.)
> >
>
> When I make heap_xlog_clean update FSM even on BLK_RESTORED (but not on
> BLK_DONE), it solves the problem I was seeing. Which still leaves me
> wondering why the problem doesn't show up on the standby because, unlike
> BLK_DONE, BLK_RESTORED should have the same issue on standby as it does on
> a recovering master, shouldn't it? Maybe the difference is that the
> existence of a replication slot delays the clean up in a way that causes a
> different pattern of WAL records.

During standby recovery every WAL record is newer than its target
page, whereas in crash recovery the first several WAL records can
be older than the page they target.
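
A small sketch of that difference, under the simplifying assumption that a record counts as "old" exactly when its LSN is not greater than the page LSN. The helper name toy_count_already_applied and the LSN values are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t ToyLsn; /* toy stand-in, not the PostgreSQL LSN type */

/*
 * Count the records that replay would treat as already applied to a page,
 * i.e. those whose LSN is not newer than the page LSN.
 */
static size_t
toy_count_already_applied(ToyLsn page_lsn, const ToyLsn *record_lsns, size_t n)
{
    size_t skipped = 0;

    assert(record_lsns != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
    {
        if (record_lsns[i] <= page_lsn)
            skipped++;
    }
    return skipped;
}
```

For records at LSNs 110, 120 and 130, a standby page that starts at LSN 100 skips none of them, while a page that was flushed at LSN 120 before a crash skips the first two during crash recovery.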

> > > > > /*
> > > > > * Update the FSM as well.
> > > > > *
> > > > > * XXX: Don't do this if the page was restored from full page image. We
> > > > > * don't bother to update the FSM in that case, it doesn't need to be
> > > > > * totally accurate anyway.
> > > > > */
> > > >
> > >
> > > What does that save us? If we restored from FPI, we already have the block
> > > in memory (we don't need to see the old version, just the new one), so it
> > > doesn't save us a random read IO.
> >
> > Updates on random pages can cause visits to many unloaded FSM
> > pages. It may be intending to avoid that.
>
>
> But I think that that would be no worse for BLK_RESTORED than it is for
> BLK_NEEDS_REDO. Why optimize only one of the cases, if it is worth
> optimizing either one?

I agree with you. An FPI increases and decreases free space just
the same as redoing a WAL record does. The following is the
discussion about that.

https://www.postgresql.org/message-id/49072021.7010801%40enterprisedb.com

https://www.postgresql.org/message-id/24334.1225205478%40sss.pgh.pa.us

Tom Lane wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> > One issue with this patch is that it doesn't update the FSM at all when
> > pages are restored from full page images. It would require fetching the
> > page and checking the free space on it, or peeking into the size of the
> > backup block data, and I'm not sure if it's worth the extra code to do that.
>
> I'd vote not to bother, at least not in the first cut. As you say, 100%
> accuracy isn't required, and I think that in typical scenarios an
> insert/update that causes a page to become full would be relatively less
> likely to have a full-page image.

So the reason seems to be simply that it was not considered necessary.

Counting the other branch of this thread, the following options
have been proposed.

- Let FREEZE_PAGE and VISIBLE update FSM.

This causes an extra fetch of a heap page, summing up of free
space and an FSM update for every frozen page.

- Let CLEAN always update FSM.

This causes extra counting of free space and FSM update for
every vacuuming of heap pages regardless of frozen-ness.

- Let FREEZE_PAGE/VISIBLE or CLEAN records have free space.

This doesn't need to fetch a heap page, but it breaks the policy
(really?) that the FSM is not WAL-logged, or that the FSM is
updated only as a result of heap updates.
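
The options can be compared in a toy model of what the FSM ends up holding after one record is replayed. This is a sketch under assumptions, not a patch: ToyFsmPolicy and toy_fsm_after_replay are hypothetical names. The first two options are folded into one policy because both recompute free space from the heap page and differ only in which record type triggers the update.

```c
#include <assert.h>

/* Toy stand-ins for illustration; these are NOT the PostgreSQL definitions. */
typedef enum { TOY_BLK_NEEDS_REDO, TOY_BLK_DONE, TOY_BLK_RESTORED } ToyRedoAction;

typedef enum
{
    TOY_FSM_ONLY_ON_REDO,     /* behavior under discussion: update only on NEEDS_REDO */
    TOY_FSM_ALWAYS_FROM_PAGE, /* first two options: always recompute from the heap page */
    TOY_FSM_FROM_RECORD       /* third option: the record itself carries the free space */
} ToyFsmPolicy;

/*
 * Return the FSM value after replaying one record.
 *   fsm_before  - what the FSM said before replay
 *   page_free   - free space found on the heap page after replay
 *   record_free - free space stored in the record (third option only)
 */
static int
toy_fsm_after_replay(ToyFsmPolicy policy, ToyRedoAction action,
                     int fsm_before, int page_free, int record_free)
{
    switch (policy)
    {
        case TOY_FSM_ONLY_ON_REDO:
            return (action == TOY_BLK_NEEDS_REDO) ? page_free : fsm_before;
        case TOY_FSM_ALWAYS_FROM_PAGE:
            return page_free;   /* costs a look at the heap page every time */
        case TOY_FSM_FROM_RECORD:
            return record_free; /* no heap page needed, but free space goes into WAL */
    }
    assert(!"unknown policy");
    return fsm_before;
}
```

With a stale FSM value of 0 and a page that really has 8000 bytes free, only the first policy leaves the FSM at 0 when the action is TOY_BLK_DONE. The other two correct it, each at the cost noted for the corresponding option above.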

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Fwd: free space map and visibility map at 2017-03-28 15:50:58 from Jeff Janes
