massive FPI_FOR_HINT load after promote

From: Alvaro Herrera <alvherre(at)2ndQuadrant(dot)com>
To: Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: James Coleman <jtc331(at)gmail(dot)com>
Subject: massive FPI_FOR_HINT load after promote
Date: 2020-08-10 22:56:37
Message-ID: 20200810225637.GA2424@alvherre.pgsql
Lists: pgsql-hackers

Last week, James reported to us that after promoting a replica, some
seqscan was taking a huge amount of time; on investigation he saw that
the seqscan was generating a high rate of FPI_FOR_HINT WAL records.
A close look at the generated traffic showed that HEAP_XMIN_COMMITTED
was being set on some tuples.

Now this may seem obvious to some as a drawback of the current system,
but I was taken by surprise. The problem is simply that when a page is
examined by a seqscan, we do HeapTupleSatisfiesVisibility of each tuple
in isolation, and for each tuple we call SetHintBits(). The FPI is
emitted only the first time; by the time we get to the second tuple, the
page is already dirty, so there's no need to emit another FPI. But the
FPI we sent only had the bit set on the first tuple ... so the standby
will not have the bit set for any subsequent tuple. After promotion, the
former standby will have to set the bits for all those tuples itself,
unless you happened to dirty the page again later for other reasons.
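
The sequence described above can be illustrated with a toy model. This is
only a sketch, not PostgreSQL code: the function name and the list-of-booleans
page representation are made up for illustration, and it assumes every tuple
on the page is eligible for a hint bit and that the page starts out clean.

```python
# Toy model of per-tuple hinting on a single heap page.
# Hypothetical names throughout; this is not PostgreSQL code.

def simulate_per_tuple_hinting(ntuples):
    primary = [False] * ntuples   # hint bit per tuple on the primary's page
    standby = [False] * ntuples   # the same page as replayed on the standby
    page_dirty = False
    fpi_count = 0

    for i in range(ntuples):
        # The visibility check for tuple i sets its hint bit on the primary.
        primary[i] = True
        if not page_dirty:
            # First change to a clean page: log a full-page image.
            # The image captures the page as it is right now, so it
            # carries only the bits set so far.
            standby = list(primary)
            fpi_count += 1
            page_dirty = True
        # Otherwise the page is already dirty and nothing is logged,
        # so the standby never learns about this bit.

    return primary, standby, fpi_count


primary, standby, fpi_count = simulate_per_tuple_hinting(5)
print(primary)    # [True, True, True, True, True]
print(standby)    # [True, False, False, False, False]
print(fpi_count)  # 1
```

In this model the primary ends up fully hinted after one FPI, while the
standby's copy has the bit on the first tuple only.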

So if you have some table where tuples gain hint bits in bulk and the
pages are rarely modified afterwards, and you promote before those pages
are frozen, then you may end up with a massive number of pages that will
need hinting after the promote, which can become troublesome.

Attached is a TAP file that reproduces the problem. It always fails,
but in the log file you can see the tuples in the primary are all hinted
committed, while on the standby only the first one is hinted committed.

One simple idea to try to forestall this problem would be to modify the
algorithm so that all tuples are scanned and hinted if the page is going
to be dirtied -- then send a single FPI setting bits for all tuples,
instead of just on the first tuple.
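
As a rough illustration of that idea, the toy model below hints every tuple
on the page first and only then logs a single page image. It is a sketch
under the same simplifying assumptions (all names are hypothetical, every
tuple is assumed eligible for a hint bit, the page starts clean) and says
nothing about how an actual patch would be structured.

```python
# Toy model of the "hint everything, then log once" idea.
# Hypothetical names throughout; this is not PostgreSQL code.

def simulate_batched_hinting(ntuples):
    primary = [False] * ntuples   # hint bit per tuple on the primary's page
    standby = [False] * ntuples   # the same page as replayed on the standby
    fpi_count = 0

    # Pass 1: find every tuple on the page that still lacks its hint bit.
    unhinted = [i for i, hinted in enumerate(primary) if not hinted]

    if unhinted:
        # Pass 2: set all of the bits before anything is logged.
        for i in unhinted:
            primary[i] = True
        # A single full-page image now carries every bit to the standby.
        standby = list(primary)
        fpi_count += 1

    return primary, standby, fpi_count


primary, standby, fpi_count = simulate_batched_hinting(5)
print(primary == standby)  # True
print(fpi_count)           # 1
```

The WAL volume is the same one FPI per page, but the image the standby
replays already has all the bits set.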

--
Álvaro Herrera

Attachment: 021_hintbits.pl (text/x-perl, 950 bytes)
