Re: massive FPI_FOR_HINT load after promote

From: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, James Coleman <jtc331(at)gmail(dot)com>
Subject: Re: massive FPI_FOR_HINT load after promote
Date: 2020-08-11 06:55:11
Message-ID: CA+fd4k5hSTFnWNs0wziETSL5AH-d4kZRgF3Orb3wBMdTJ11zbg@mail.gmail.com
Lists: pgsql-hackers

On Tue, 11 Aug 2020 at 07:56, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
>
> Last week, James reported to us that after promoting a replica, some
> seqscan was taking a huge amount of time; on investigation he saw that
> there was a high rate of FPI_FOR_HINT wal messages by the seqscan.
> Looking closely at the generated traffic, HEAP_XMIN_COMMITTED was being
> set on some tuples.
>
> Now this may seem obvious to some as a drawback of the current system,
> but I was taken by surprise. The problem was simply that when a page is
> examined by a seqscan, we do HeapTupleSatisfiesVisibility of each tuple
> in isolation; and for each tuple we call SetHintBits(). And only the
> first time the FPI happens; by the time we get to the second tuple, the
> page is already dirty, so there's no need to emit an FPI. But the FPI
> we sent only had the bit on the first tuple ... so the standby will not
> have the bit set for any subsequent tuple. And on promotion, the
> standby will have to have the bits set for all those tuples, unless you
> happened to dirty the page again later for other reasons.
>
> So if you have some table where tuples gain hint bits in bulk, and
> rarely modify the pages afterwards, and promote before those pages are
> frozen, then you may end up with a massive amount of pages that will
> need hinting after the promote, which can become troublesome.

Did the case you observed not use hot standby? I thought the impact of
this issue could be somewhat alleviated in hot standby cases since
read queries on the hot standby can set hint bits.

>
> One simple idea to try to forestall this problem would be to modify the
> algorithm so that all tuples are scanned and hinted if the page is going
> to be dirtied -- then send a single FPI setting bits for all tuples,
> instead of just on the first tuple.
>

This idea seems good to me, but I'm a bit concerned that the
probability of concurrent processes writing an FPI for the same page
might get higher, since concurrent processes could set hint bits at the
same time. If that's true, I wonder if we could advertise that hint bits
are being updated, to prevent concurrent FPI writes for the same page.
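
To make the behaviour under discussion concrete, here is a toy model in
Python. It is not PostgreSQL code: ToyPage, scan_per_tuple,
scan_whole_page and standby_hinted are names made up for this sketch. It
assumes an FPI is a copy of the page taken at the moment the page first
becomes dirty, as described in the quoted text, and it ignores locking
and concurrency entirely.

```python
from dataclasses import dataclass, field
from typing import List

NTUPLES = 8  # hypothetical number of tuples on one page


@dataclass
class ToyPage:
    # hinted[i] stands in for HEAP_XMIN_COMMITTED on tuple i
    hinted: List[bool] = field(default_factory=lambda: [False] * NTUPLES)
    dirty: bool = False

    def snapshot(self) -> "ToyPage":
        # Stand-in for a full-page image: a copy of the page as it is now.
        return ToyPage(hinted=list(self.hinted), dirty=False)


def scan_per_tuple(primary: ToyPage) -> List[ToyPage]:
    """Hint tuples one at a time; emit an FPI only when the page first gets dirty."""
    fpis = []
    for i in range(NTUPLES):
        if primary.hinted[i]:
            continue
        primary.hinted[i] = True
        if not primary.dirty:
            primary.dirty = True
            fpis.append(primary.snapshot())  # image carries only the bits set so far
    return fpis


def scan_whole_page(primary: ToyPage) -> List[ToyPage]:
    """Hint every tuple first, then emit a single FPI if the page was clean."""
    changed = False
    for i in range(NTUPLES):
        if not primary.hinted[i]:
            primary.hinted[i] = True
            changed = True
    if changed and not primary.dirty:
        primary.dirty = True
        return [primary.snapshot()]
    return []


def standby_hinted(fpis: List[ToyPage]) -> int:
    """Number of hinted tuples the standby holds after replaying the FPIs."""
    return sum(fpis[-1].hinted) if fpis else 0


if __name__ == "__main__":
    print("per-tuple :", standby_hinted(scan_per_tuple(ToyPage())), "of", NTUPLES)
    print("whole-page:", standby_hinted(scan_whole_page(ToyPage())), "of", NTUPLES)
```

In this model both schemes emit exactly one FPI for a clean page, but the
standby ends up with 1 of 8 tuples hinted in the per-tuple case and 8 of
8 in the whole-page case. The model says nothing about the concurrency
concern above, since it has only one scanner.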

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
