Re: BUG #17695: Failed Assert in logical replication snapbuild.

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Daniel Gustafsson <daniel(at)yesql(dot)se>, bowenshi <zxwsbg(at)qq(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17695: Failed Assert in logical replication snapbuild.
Date: 2023-05-23 11:00:00
Message-ID: 102cb85d-2205-c8ec-ac37-797c03e025e1@gmail.com
Lists: pgsql-bugs

22.05.2023 03:56, Masahiko Sawada wrote:
> On Thu, May 18, 2023 at 11:00 PM Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
>
>> I can easily (without gdb and sleep()) reproduce the issue on master with
>> the following script:
>> ...
> Thank you for sharing the script. But it seems not stable as I could
> not reproduce the issue in my environment. I think we need a stable
> reproducer so that we can include it in core regression tests. Or it
> may be okay not to include it if we could not find a convenient way
> and the fix is trivial.

I've come up with a minimal reproducer:
#!/bin/bash
# Create one database per client.
numclients=40
for ((c=1;c<=numclients;c++)); do
  createdb regress_$c
done

# In every database, concurrently: create a table, create a logical
# replication slot and decode the pending changes.
for ((c=1;c<=numclients;c++)); do
  (
    echo "
CREATE TABLE replication_example(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot_$c', 'test_decoding');
SELECT data FROM pg_logical_slot_get_changes('regression_slot_$c', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
" | psql -d regress_$c >psql-$c.log
  ) &
done
wait

# Look for assertion failures in the server log (written to server.log here).
grep TRAP server.log

(I've set
fsync = off
wal_level = logical
in postgresql.conf)
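
A minimal sketch of one way to prepare such a cluster, assuming the PostgreSQL binaries (initdb, pg_ctl) are on PATH and no other server is listening on the default port. The function names `write_repro_conf` and `start_repro_cluster` and the `./repro_data` directory are hypothetical, not taken from the report. Starting the server with `pg_ctl -l server.log` sends its log to the file that the script greps.

```bash
#!/bin/bash
# Hypothetical helpers for a throwaway test cluster (not part of the original report).

# Append the settings used for the reproducer to the given postgresql.conf.
# Later lines in postgresql.conf override earlier ones, so appending is enough.
write_repro_conf() {
    local conf="$1"
    {
        echo "fsync = off"
        echo "wal_level = logical"
    } >> "$conf"
}

# Initialize a new cluster in $1 (default: ./repro_data, an assumed path), apply
# the settings and start it, writing the server log to ./server.log.
start_repro_cluster() {
    local datadir="${1:-./repro_data}"
    initdb -D "$datadir" || return 1
    write_repro_conf "$datadir/postgresql.conf"
    pg_ctl -D "$datadir" -l server.log -w start
}
```

For example, call `start_repro_cluster ./repro_data` and then run the reproducer from the same directory. Because fsync = off risks data loss on an operating-system crash, such a cluster is only suitable for disposable testing.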

When using a build made with ASAN (and gcc-12), I get several assertion failures at once:
grep TRAP server.log  | wc -l
12
Without ASAN, I get no failures with numclients=40, but I still get a series
of them with numclients=80.

It's hardly suitable for a regression test, but it clearly demonstrates the
issue without using gdb. With the fix from [1] applied, I got no failures in
10 runs, even with numclients=100.
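
The repeated runs could be scripted roughly as follows. This is a sketch, assuming a server that is already running with wal_level = logical and logging to server.log in the current directory; the functions `count_traps`, `cleanup_repro`, `run_repro` and `main`, and the `RUNS`/`NUMCLIENTS` variables, are hypothetical names. Each run drops and recreates the per-client databases and slots, replays the same SQL as the reproducer, and prints how many new TRAP lines appeared in the log.

```bash
#!/bin/bash
# Hypothetical driver for repeating the reproducer; assumes a running server
# whose log is written to ./server.log.

LOG=server.log

# Print the number of lines containing "TRAP" in the given log (0 if it is missing).
count_traps() {
    local log="$1"
    if [ ! -f "$log" ]; then
        echo 0
        return
    fi
    # grep -c prints 0 but exits with status 1 when nothing matches.
    grep -c TRAP "$log" || true
}

# Drop each client's slot (if present) and database, so a run can start fresh.
cleanup_repro() {
    local n="$1" c
    for ((c = 1; c <= n; c++)); do
        psql -X -q -d "regress_$c" -c \
            "SELECT pg_drop_replication_slot(slot_name) FROM pg_replication_slots WHERE slot_name = 'regression_slot_$c'" \
            >/dev/null 2>&1
        dropdb --if-exists "regress_$c" 2>/dev/null
    done
}

# One run of the reproducer with $1 concurrent clients.
run_repro() {
    local n="$1" c
    for ((c = 1; c <= n; c++)); do
        createdb "regress_$c"
    done
    for ((c = 1; c <= n; c++)); do
        psql -X -d "regress_$c" > "psql-$c.log" 2>&1 <<EOF &
CREATE TABLE replication_example(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot_$c', 'test_decoding');
SELECT data FROM pg_logical_slot_get_changes('regression_slot_$c', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
EOF
    done
    wait
}

# Run the reproducer RUNS times and report the new TRAP lines of each run.
main() {
    local runs="${RUNS:-10}" numclients="${NUMCLIENTS:-100}" r before after
    for ((r = 1; r <= runs; r++)); do
        # Wait until the server accepts connections (e.g. after a crash restart).
        until pg_isready -q; do sleep 1; done
        cleanup_repro "$numclients"
        before=$(count_traps "$LOG")
        run_repro "$numclients"
        after=$(count_traps "$LOG")
        echo "run $r: $((after - before)) new TRAP line(s)"
    done
}

# Only start the runs when invoked as "<script> run".
if [ "${1:-}" = run ]; then
    main
fi
```

For example, `RUNS=10 NUMCLIENTS=100 ./repro_loop.sh run` (the file name is arbitrary). Comparing the TRAP count before and after each run, rather than grepping the whole log, keeps failures from earlier runs from being counted again.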

I also think that the fix is simple enough to be committed without a
complicated, resource-intensive regression test.

[1] https://www.postgresql.org/message-id/CAD21AoDNv09ZMr-E%2BfNzhduvkE6eK2fjCRA7wJHOhF8APH5JdQ%40mail.gmail.com

Best regards,
Alexander
