From: | Peter Smith <smithpb2250(at)gmail(dot)com> |
---|---|
To: | Alexander Lakhin <exclusion(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: BF mamba failure |
Date: | 2023-03-20 06:10:46 |
Message-ID: | CAHut+PvVrjwJm_9ZqnXJk4x9k8dN0dYrV+T5_Rd30BSneDhv1A@mail.gmail.com |
Lists: | pgsql-hackers |
On Sun, Mar 19, 2023 at 2:00 AM Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
>
> Hi,
>
> 18.03.2023 07:26, Tom Lane wrote:
>
> Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> writes:
>
> Peter Smith has recently reported a BF failure [1]. AFAICS, the call
> stack of failure [2] is as follows:
>
> Note the assertion report a few lines further up:
>
> TRAP: failed Assert("pg_atomic_read_u32(&entry_ref->shared_entry->refcount) == 0"), File: "pgstat_shmem.c", Line: 560, PID: 25004
>
>
> This assertion failure can be reproduced easily with the attached patch:
> ============== running regression test queries ==============
> test oldest_xmin ... ok 55 ms
> test oldest_xmin ... FAILED (test process exited with exit code 1) 107 ms
> test oldest_xmin ... FAILED (test process exited with exit code 1) 8 ms
> ============== shutting down postmaster ==============
>
> contrib/test_decoding/output_iso/log/postmaster.log contains:
> TRAP: failed Assert("pg_atomic_read_u32(&entry_ref->shared_entry->refcount) == 0"), File: "pgstat_shmem.c", Line: 561, PID: 456844
>
> With the sleep placed above Assert(entry_ref->shared_entry->dropped) this Assert fails too.
>
> Best regards,
> Alexander
I used a slightly modified version of Alexander's patch [1], applied to the
latest HEAD code (but with my "toptxn" patch reverted). The modification
was that I injected a 'sleep' both above and below the
Assert(entry_ref->shared_entry->dropped).
With this I was also able to reproduce the problem, but test failures
were rare. make check-world seemed OK, and the test_decoding tests
also appeared to PASS around 14 out of 15 times.
============== running regression test queries ==============
test oldest_xmin ... ok 342 ms
test oldest_xmin ... ok 121 ms
test oldest_xmin ... ok 283 ms
============== shutting down postmaster ==============
============== removing temporary instance ==============
=====================
All 3 tests passed.
=====================
~~
Often (but not always), despite test_decoding reporting all 3 tests as
"ok", I still observed a TRAP in the logfile
(contrib/test_decoding/output_iso/log/postmaster.log).
TRAP: failed Assert("entry_ref->shared_entry->dropped")
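
Because a run can report every test as "ok" while the postmaster log still records a TRAP, it may help to scan the log after each run instead of relying on the test summary. The following is only a minimal sketch of that idea: the helper name find_traps is hypothetical, and the regular expression assumes TRAP lines look exactly like the excerpts quoted in this message.

```python
import re
from pathlib import Path

# Assumption: assertion failures appear in the log in the same form as the
# excerpts in this message, i.e. TRAP: failed Assert("<expression>")...
TRAP_RE = re.compile(r'TRAP: failed Assert\("(?P<expr>.*)"\)')


def find_traps(log_text):
    """Return the asserted expression from every TRAP line in log_text."""
    found = []
    for line in log_text.splitlines():
        match = TRAP_RE.search(line)
        if match:
            found.append(match.group("expr"))
    return found


if __name__ == "__main__":
    # Log path as given in this message, relative to the source tree.
    log = Path("contrib/test_decoding/output_iso/log/postmaster.log")
    if log.exists():
        for expr in find_traps(log.read_text(errors="replace")):
            print("TRAP:", expr)
```

Run from the top of the source tree after the test_decoding tests, this prints one line per TRAP found, regardless of whether the tests themselves passed.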
~~
Occasionally (about 1 in 15 test runs) the test would fail the same
way as described by Alexander [1], with the accompanying TRAP.
TRAP: failed Assert("pg_atomic_read_u32(&entry_ref->shared_entry->refcount) == 0"), File: "pgstat_shmem.c", Line: 562, PID: 32013
============== running regression test queries ==============
test oldest_xmin ... ok 331 ms
test oldest_xmin ... ok 91 ms
test oldest_xmin ... FAILED 702 ms
============== shutting down postmaster ==============
======================
1 of 3 tests failed.
======================
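
With a failure rate of roughly 1 in 15, a handful of clean runs says little about whether the problem is present. As a rough illustration only, and assuming each test run fails independently with the same probability (an assumption not established anywhere in this thread), the number of runs needed to see at least one failure with a given confidence can be estimated as below. The function name runs_needed is hypothetical.

```python
import math


def runs_needed(p_fail, confidence):
    """Smallest n such that 1 - (1 - p_fail)**n >= confidence.

    Assumes independent runs with a constant per-run failure probability.
    """
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))


if __name__ == "__main__":
    # About 1 failure in 15 runs, as observed above.
    print(runs_needed(1 / 15, 0.95))  # prints 44
```

Under those assumptions, about 44 runs are needed for a 95% chance of hitting the failure at least once.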
~~
FWIW, the "toptxn" patch, whose push coincided with the build-farm
error I first reported [2], turns out to be an innocent party in this
TRAP. We know this because all of the above results were obtained using
HEAD code with that "toptxn" patch reverted.
------
[1] https://www.postgresql.org/message-id/1941b7e2-be7c-9c4c-8505-c0fd05910e9a%40gmail.com
[2] https://www.postgresql.org/message-id/CAHut%2BPsHdWFjU43VEX%2BR-8de6dFQ-_JWrsqs%3DvWek1hULexP4Q%40mail.gmail.com
Kind Regards,
Peter Smith.
Fujitsu Australia