Re: lazy_scan_heap() should release lock on buffer before vacuuming FSM

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: lazy_scan_heap() should release lock on buffer before vacuuming FSM
Date: 2023-11-15 21:21:45
Message-ID: CAAKRu_a7oMNTWh3T-=vM=VNuV6pV_fMHnb5M69OQ6oPhx2201A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Nov 14, 2023 at 7:15 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> On 2023-11-14 07:46:10 -0500, Melanie Plageman wrote:
> > > FreeSpaceMapVacuumRange()'s comment says:
> > > * As above, but assume that only heap pages between start and end-1 inclusive
> > > * have new free-space information, so update only the upper-level slots
> > > * covering that block range. end == InvalidBlockNumber is equivalent to
> > > * "all the rest of the relation".
> > >
> > > So FreeSpaceMapVacuumRange(..., blkno) will not actually process the "effects"
> > > of the RecordPageWithFreeSpace() above it - which seems confusing.
> >
> > Ah, so shall I pass blkno + 1 as end?
>
> I think there's no actual overflow danger, because MaxBlockNumber + 1 is
> InvalidBlockNumber, which scans the rest of the relation (i.e. exactly the
> intended block). Perhaps worth noting?

Attached

> > > =# DELETE FROM copytest_0;
> > > =# VACUUM (VERBOSE) copytest_0;
> > > ...
> > > INFO: 00000: table "copytest_0": truncated 146264 to 110934 pages
> > > ...
> > > tuples missed: 5848 dead from 89 pages not removed due to cleanup lock contention
> > > ...
> > >
> > > A bit of debugging later I figured out that this is due to the background
> > > writer. If I SIGSTOP bgwriter, the whole relation is truncated...
> >
> > That's a bit sad. But isn't that what you would expect bgwriter to do?
> > Write out dirty buffers? It doesn't know that those pages consist of
> > only dead tuples and that you will soon truncate them.
>
> I think the issue more that it's feels wrong that a pin by bgwriter blocks
> vacuum cleaning up. I think the same happens with on-access pruning - where
> it's more likely for bgwriter to focus on those pages. Checkpointer likely
> causes the same. Even normal backends can cause this while writing out the
> page.
>
> It's not like bgwriter/checkpointer would actually have a problem if the page
> were pruned - the page is locked while being written out and neither keep
> pointers into the page after unlocking it again.
>
>
> At this point I started to wonder if we should invent a separate type of pin
> for a buffer undergoing IO. We basically have the necessary state already:
> BM_IO_IN_PROGRESS. We'd need to look at that in some places, e.g. in
> InvalidateBuffer(), instead of BUF_STATE_GET_REFCOUNT(), we'd need to also
> look at BM_IO_IN_PROGRESS.
>
>
> Except of course that that'd not change anything -
> ConditionalLockBufferForCleanup() locks the buffer conditionally, *before*
> even looking at the refcount and returns false if not. And writing out a
> buffer takes a content lock. Which made me realize that
> "tuples missed: 5848 dead from 89 pages not removed due to cleanup lock contention"
>
> is often kinda wrong - the contention is *not* cleanup lock specific. It's
> often just plain contention on the lwlock.

Do you think we should change the error message?

> Perhaps we should just treat IO_IN_PROGRESS buffers differently in
> lazy_scan_heap() and heap_page_prune_opt() and wait?

Hmm. This is an interesting idea. lazy_scan_heap() waiting for
checkpointer to write out a buffer could have an interesting property
of shifting who writes out the FPI. I wonder if it would matter.

- Melanie

Attachment Content-Type Size
v2-0001-Release-lock-on-heap-buffer-before-vacuuming-FSM.patch text/x-patch 2.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2023-11-15 21:32:48 Re: lazy_scan_heap() should release lock on buffer before vacuuming FSM
Previous Message Peter Eisentraut 2023-11-15 21:20:41 Re: ALTER COLUMN ... SET EXPRESSION to alter stored generated column's expression