Re: BUG #17695: Failed Assert in logical replication snapbuild.

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>
Cc: bowenshi <zxwsbg(at)qq(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17695: Failed Assert in logical replication snapbuild.
Date: 2023-05-18 14:00:00
Message-ID: 7e4d4a80-3e3c-231f-f886-6cada2aa582b@gmail.com
Lists: pgsql-bugs

Hello Sawada-san,

17.05.2023 08:34, Masahiko Sawada wrote:
>
> When it comes to the original issue, I already shared the reproducible
> steps[4] and I've confirmed again with the steps that the issue still
> happens on 14 or later and the patch . However I don't find a way to
> reproduce it without sleep/gdb attach.

I can easily reproduce the issue on master, without gdb or sleep(), using
the following script:
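#!/bin/bash
# Run from the top of a PostgreSQL source tree, against an already running
# server configured for logical decoding (wal_level = logical).
numclients=10

# Give each client its own copy of contrib/test_decoding, so that the
# concurrent runs write their results into separate directories.
rm -rf contrib/test_decoding_*
for ((c=1;c<=numclients;c++)); do
  cp -r contrib/test_decoding contrib/test_decoding_$c
done

# Start installcheck for every copy in the background, each against its own
# database (regress_$c) and logging to installcheck-$c.log, then wait for all.
for ((c=1;c<=numclients;c++)); do
  EXTRA_REGRESS_OPTS="--dbname=regress_$c" make -s installcheck-force -C contrib/test_decoding_$c USE_MODULE_DB=1 >"installcheck-$c.log" 2>&1 &
done
wait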

It leads to:
TRAP: failed Assert("builder->next_phase_at == InvalidTransactionId"), File: "snapbuild.c", Line: 1628, PID: 907918
...
2023-05-18 16:23:33.290 MSK [907502] LOG:  server process (PID 907918) was terminated by signal 6: Aborted
2023-05-18 16:23:33.290 MSK [907502] DETAIL:  Failed process was running: SELECT count(*) FROM pg_logical_slot_get_changes('regression_slot_stats1', NULL, NULL, 'skip-empty-xacts', '1');

...
Core was generated by `postgres: postgres regress_10 [local] SELECT                  '.
Program terminated with signal SIGABRT, Aborted.

warning: Section `.reg-xstate/907918' in core file too small.
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140405033059264) at ./nptl/pthread_kill.c:44
44      ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140405033059264) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140405033059264) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140405033059264, signo=signo(at)entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007fb29a0cc476 in __GI_raise (sig=sig(at)entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007fb29a0b27f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x0000557371bd57bb in ExceptionalCondition (
    conditionName=conditionName(at)entry=0x557371d56860 "builder->next_phase_at == InvalidTransactionId",
    fileName=fileName(at)entry=0x557371d572e7 "snapbuild.c", lineNumber=lineNumber(at)entry=1628) at assert.c:66
#6  0x0000557371a28a29 in SnapBuildSerialize (builder=builder(at)entry=0x557372879158, lsn=lsn(at)entry=312723008)
    at snapbuild.c:1628
#7  0x0000557371a2a657 in SnapBuildProcessRunningXacts (builder=builder(at)entry=0x557372879158, lsn=312723008,
    running=running(at)entry=0x557373095190) at snapbuild.c:1230
...
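When the script is run repeatedly, it can be convenient to check automatically whether a run hit this assertion. The following is a minimal sketch, assuming the server's log output ends up in a file you can pass in; the function name hit_snapbuild_trap is hypothetical and not part of the reproducer. It only greps for the TRAP line quoted above.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original reproducer): returns success
# if the given server log contains the snapbuild.c assertion failure above.
hit_snapbuild_trap() {
  local logfile="$1"
  # -F: match the message literally (it contains quotes and parentheses)
  # -q: print nothing, only set the exit status
  # -s: stay silent if the log file does not exist (treated as "not hit")
  grep -qsF 'TRAP: failed Assert("builder->next_phase_at == InvalidTransactionId")' "$logfile"
}
```

For example, after `wait` returns, `hit_snapbuild_trap "$SERVER_LOG" && echo reproduced` (with SERVER_LOG a hypothetical variable). The actual log path depends on how the server was started, for instance `pg_ctl -l` or the `logging_collector`/`log_directory` settings.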

If it would help, I can reduce it to concrete SQL queries.

Best regards,
Alexander
