Re: START_REPLICATION SLOT causing a crash in an assert build

From: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: START_REPLICATION SLOT causing a crash in an assert build
Date: 2022-09-13 05:39:45
Message-ID: YyAXoU4CQhlZ4/ZN@ahch-to
Lists: pgsql-hackers

On Wed, Sep 07, 2022 at 12:39:08PM -0700, Andres Freund wrote:
> Hi,
>
> On 2022-09-06 18:40:49 -0500, Jaime Casanova wrote:
> > I'm not sure what is causing this, but I have seen this twice. The
> > second time without activity after changing the set of tables in a
> > PUBLICATION.

This crash happens after resetting the statistics of a replication slot.

> Can you describe the steps to reproduce?
>

bin/pg_ctl -D data1 initdb
bin/pg_ctl -D data1 -l logfile1 -o "-c port=54315 -c wal_level=logical" start
bin/psql -p 54315 postgres <<EOF
create table t1 (i int primary key);
create publication pub1 for table t1;
EOF

bin/pg_ctl -D data2 initdb
bin/pg_ctl -D data2 -l logfile2 -o "-c port=54316" start
bin/psql -p 54316 postgres <<EOF
create table t1 (i int primary key);
create subscription sub1 connection 'host=/tmp port=54315 dbname=postgres' publication pub1;
EOF

bin/psql -p 54315 postgres <<EOF
select pg_stat_reset_replication_slot('sub1');
insert into t1 values(1);
EOF

> Which git commit does this happen on?
>

Just tested again on f5047c1293acce3c6c3802b06825aa3a9f9aa55a.

>
> > gdb says that debug_query_string contains:
> >
> > """
> > START_REPLICATION SLOT "sub_pgbench" LOGICAL 0/0 (proto_version '3', publication_names '"pub_pgbench"')
> > """
> >
> > attached the backtrace.
> >
>
> > #2 0x00005559bfd4f0ed in ExceptionalCondition (
> > conditionName=0x5559bff30e20 "namestrcmp(&statent->slotname, NameStr(slot->data.name)) == 0", errorType=0x5559bff30e0d "FailedAssertion", fileName=0x5559bff30dbb "pgstat_replslot.c",
> > lineNumber=89) at assert.c:69
>
> what are statent->slotname and slot->data.name?
>

The problem seems to be that zeroing the stats also zeroes the name of
the replication slot, which is stored in the same stats entry. The
attached simple patch fixes it, though I'm not sure it's the right fix.
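
To illustrate the failure mode described above, here is a minimal,
self-contained sketch in C. It is not the attached patch and not
PostgreSQL source: SlotStats, reset_all, reset_keep_name and
name_matches are hypothetical names, and SLOT_NAME_LEN only stands in
for NAMEDATALEN. It shows why a reset that zeroes the whole entry makes
a later name comparison fail, and one possible way to keep the name
across the reset.

```c
#include <string.h>

#define SLOT_NAME_LEN 64		/* assumption: stand-in for NAMEDATALEN */

/* Hypothetical, simplified stats entry: the name lives next to the counters. */
typedef struct SlotStats
{
	char		slotname[SLOT_NAME_LEN];
	long		total_txns;
	long		total_bytes;
} SlotStats;

/* Reset that wipes the whole entry, slot name included. */
static void
reset_all(SlotStats *s)
{
	memset(s, 0, sizeof(*s));
}

/* Reset that saves the name, zeroes the entry, then restores the name. */
static void
reset_keep_name(SlotStats *s)
{
	char		saved[SLOT_NAME_LEN];

	memcpy(saved, s->slotname, sizeof(saved));
	memset(s, 0, sizeof(*s));
	memcpy(s->slotname, saved, sizeof(saved));
}

/* Same shape as the failing assertion: does the entry's name match the slot? */
static int
name_matches(const SlotStats *s, const char *slot_name)
{
	return strncmp(s->slotname, slot_name, SLOT_NAME_LEN) == 0;
}
```

With an entry named "sub1", name_matches() returns 0 after reset_all()
and nonzero after reset_keep_name(); the counters are zero in both
cases. Whether restoring the name like this is the right approach for
the real code is exactly the open question.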

--
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL

Attachment Content-Type Size
fix_reset_stat_replslot.patch text/x-diff 507 bytes
