Re: massive FPI_FOR_HINT load after promote

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, James Coleman <jtc331(at)gmail(dot)com>
Subject: Re: massive FPI_FOR_HINT load after promote
Date: 2020-08-11 17:41:59
Message-ID: 20200811174159.GA9374@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2020-Aug-11, Masahiko Sawada wrote:

> On Tue, 11 Aug 2020 at 07:56, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:

> > So if you have some table where tuples gain hint bits in bulk, and
> > rarely modify the pages afterwards, and promote before those pages are
> > frozen, then you may end up with a massive amount of pages that will
> > need hinting after the promote, which can become troublesome.
>
> Did the case you observed not use hot standby? I thought the impact of
> this issue could be somewhat alleviated in hot standby cases since
> read queries on the hot standby can set hint bits.

Oh, interesting, I didn't know that. However, it's not 100% true: the
standby can set the bit in shared buffers, but it does not mark the page
dirty. So when the page is evicted, those bits that were set are lost.
That's not great. See MarkBufferDirtyHint:

/*
* If we need to protect hint bit updates from torn writes, WAL-log a
* full page image of the page. This full page image is only necessary
* if the hint bit update is the first change to the page since the
* last checkpoint.
*
* We don't check full_page_writes here because that logic is included
* when we call XLogInsert() since the value changes dynamically.
*/
if (XLogHintBitIsNeeded() &&
(pg_atomic_read_u32(&bufHdr->state) & BM_PERMANENT))
{
/*
* If we must not write WAL, due to a relfilenode-specific
* condition or being in recovery, don't dirty the page. We can
* set the hint, just not dirty the page as a result so the hint
* is lost when we evict the page or shutdown.
*
* See src/backend/storage/page/README for longer discussion.
*/
if (RecoveryInProgress() ||
RelFileNodeSkippingWAL(bufHdr->tag.rnode))
return;

> > One simple idea to try to forestall this problem would be to modify the
> > algorithm so that all tuples are scanned and hinted if the page is going
> > to be dirtied -- then send a single FPI setting bits for all tuples,
> > instead of just on the first tuple.
>
> This idea seems good to me but I'm concerned a bit that the
> probability of concurrent processes writing FPI for the same page
> might get higher since concurrent processes could set hint bits at the
> same time. If it's true, I wonder if we can advertise hint bits are
> being updated to prevent concurrent FPI writes for the same page.

Hmm, a very good point. Sounds like we would need to obtain an
exclusive lock on the page .. but that would be very problematic.

I don't have a concrete proposal to solve this problem ATM, but it's
more and more looking like it's a serious problem.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-08-11 17:46:48 Re: autovac issue with large number of tables
Previous Message Andres Freund 2020-08-11 17:22:41 Re: posgres 12 bug (partitioned table)