Re: [BUG]"FailedAssertion" reported in lazy_scan_heap() when running logical replication

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, "Amit(dot)Kapila(at)fujitsu(dot)com" <Amit(dot)Kapila(at)fujitsu(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUG]"FailedAssertion" reported in lazy_scan_heap() when running logical replication
Date: 2021-05-06 12:40:18
Message-ID: CAD21AoDzgc8_MYrA5m1fyydomw_eVKtQiYh7sfDK4KEhdMsf_g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Apr 29, 2021 at 1:22 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> On Wed, Apr 28, 2021 at 7:34 PM tanghy(dot)fnst(at)fujitsu(dot)com
> <tanghy(dot)fnst(at)fujitsu(dot)com> wrote:
> > >TRAP: FailedAssertion("!all_visible_according_to_vm || prunestate.all_visible", File: "vacuumlazy.c", Line: 1347, PID: 1274675)
> >
> > BTW, in asynchronous mode, the same problem can also happen but in a low frequency.(I tried many times, but the problem happened only 2 times)
> > As for synchronous mode, I found it seems easier to reproduce the problem with setting "autovacuum_naptime = 1".
> > But it still can't be 100% to reproduced it. (I tested it 5 times, 3 of them reproduced it.)

Thanks for reporting the issue!

>
> Is setting all_visible_according_to_vm false as below enough to avoid
> the assertion failure?
>
> diff --git a/src/backend/access/heap/vacuumlazy.c
> b/src/backend/access/heap/vacuumlazy.c
> index c3fc12d76c..76c17e063e 100644
> --- a/src/backend/access/heap/vacuumlazy.c
> +++ b/src/backend/access/heap/vacuumlazy.c
> @@ -1146,6 +1146,7 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams
> *params, bool aggressive)
> {
> ReleaseBuffer(vmbuffer);
> vmbuffer = InvalidBuffer;
> + all_visible_according_to_vm = false;
> }
>
> /* Remove the collected garbage tuples from table and indexes */

Since we set all_visible_according_to_vm before acquiring the buffer
lock it's likely to happen that the page gets modified and all-visible
bit is cleared after setting true to all_visible_according_to_vm. This
assertion can easily be reproduced by adding a delay before the buffer
lock and invoking autovacuums frequently:

diff --git a/src/backend/access/heap/vacuumlazy.c
b/src/backend/access/heap/vacuumlazy.c
index c3fc12d76c..76f067a7e4 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1180,6 +1180,8 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams
*params, bool aggressive)
buf = ReadBufferExtended(vacrel->rel, MAIN_FORKNUM, blkno,
RBM_NORMAL, vacrel->bstrategy);

+ pg_usleep(100000);
+
/*
* We need buffer cleanup lock so that we can prune HOT chains and
* defragment the page.

So we should recheck also visibility map bit there but I think we can
remove this assertion since we already do that in later code and we
don’t treat this case as a should-not-happen case:

/*
* As of PostgreSQL 9.2, the visibility map bit should never be set if
* the page-level bit is clear. However, it's possible that the bit
* got cleared after we checked it and before we took the buffer
* content lock, so we must recheck before jumping to the conclusion
* that something bad has happened.
*/
else if (all_visible_according_to_vm && !PageIsAllVisible(page)
&& VM_ALL_VISIBLE(vacrel->rel, blkno, &vmbuffer))
{
elog(WARNING, "page is not marked all-visible but
visibility map bit is set in relation \"%s\" page %u",
vacrel->relname, blkno);
visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
VISIBILITYMAP_VALID_BITS);
}

I've attached a patch removing the assertion.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachment Content-Type Size
remove_assertion_lazy_vacuum.patch application/x-patch 613 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2021-05-06 13:30:33 Re: [bug?] Missed parallel safety checks, and wrong parallel safety
Previous Message Dilip Kumar 2021-05-06 12:32:29 Re: decoupling table and index vacuum