Re: Catalog_xmin is not advanced when a logical slot is lost

From: sirisha chamarthi <sirichamarthi22(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Subject: Re: Catalog_xmin is not advanced when a logical slot is lost
Date: 2022-11-21 17:35:53
Message-ID: CAKrAKeVW2uw70aXCANWz16QCVY38kOzCj7uhmm7HhqaSWiGspA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Nov 21, 2022 at 9:12 AM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
wrote:

> On 2022-Nov-21, sirisha chamarthi wrote:
>
> > On Mon, Nov 21, 2022 at 8:05 AM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
> > wrote:
>
> > > Thank you. I had pushed mine for CirrusCI to test, and it failed the
> > > assert I added in slot.c:
> > > https://cirrus-ci.com/build/4786354503548928
> > > Not yet sure why, looking into it.
> >
> > Can this be because restart_lsn is not set to InvalidXLogRecPtr for the
> > physical slots?
>
> Hmm, that makes no sense. Is that yet another bug? Looking.
>

It appears to be. The walsender is setting restart_lsn to a valid LSN even
when the slot is invalidated.

postgres=# select pg_Create_physical_replication_slot('s1');
pg_create_physical_replication_slot
-------------------------------------
(s1,)
(1 row)

postgres=# select * from pg_replication_slots;
 slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase
-----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------+-----------
 s1        |        | physical  |        |          | f         | f      |            |      |              |             |                     |            |   -8254390272 | f
(1 row)

postgres=# checkpoint;
CHECKPOINT
postgres=# select * from pg_replication_slots;
 slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase
-----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------+-----------
 s1        |        | physical  |        |          | f         | f      |            |      |              |             |                     |            |   -8374095064 | f
(1 row)

postgres=# \q
postgres@pgvm:~$ /usr/local/pgsql/bin/pg_receivewal -S s1 -D .
pg_receivewal: error: unexpected termination of replication stream: ERROR: requested WAL segment 0000000100000000000000EB has already been removed
pg_receivewal: disconnected; waiting 5 seconds to try again
^Cpostgres@pgvm:~$ /usr/local/pgsql/bin/psql
psql (16devel)
Type "help" for help.

postgres=# select * from pg_replication_slots;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
The connection to the server was lost. Attempting reset: Failed.
!?> ^C
!?>

In the log:
2022-11-21 17:31:48.159 UTC [3953664] STATEMENT: START_REPLICATION SLOT "s1" 0/EB000000 TIMELINE 1
TRAP: failed Assert("XLogRecPtrIsInvalid(slot_contents.data.restart_lsn)"), File: "slotfuncs.c", Line: 371, PID: 3953707
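
For reference, the segment named in the pg_receivewal error and the start
position in the START_REPLICATION statement appear to describe the same place
in the WAL. The sketch below is toy arithmetic in Python, not PostgreSQL code.
It assumes the default 16 MB wal_segment_size (the session above does not show
the setting), and the helper names are made up for illustration. It also
mirrors the XLogRecPtrIsInvalid() check, which compares against
InvalidXLogRecPtr (0): any slot whose restart_lsn has been set to a position
like this one would fail that check. The email does not say which LSN the
walsender actually stored, only that it was a valid one.

```python
# Toy arithmetic only; not PostgreSQL source code.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default 16 MB wal_segment_size
INVALID_XLOG_REC_PTR = 0             # mirrors InvalidXLogRecPtr, which is 0


def segment_start_lsn(filename, seg_size=WAL_SEGMENT_SIZE):
    """Decode a 24-hex-digit WAL segment file name into (timeline, start LSN)."""
    if len(filename) != 24:
        raise ValueError("expected a 24-character WAL segment file name")
    timeline = int(filename[0:8], 16)      # first 8 hex digits: timeline ID
    log_id = int(filename[8:16], 16)       # middle 8 hex digits: high part
    seg_in_log = int(filename[16:24], 16)  # last 8 hex digits: segment within it
    segs_per_log_id = 0x1_0000_0000 // seg_size
    segno = log_id * segs_per_log_id + seg_in_log
    return timeline, segno * seg_size


def format_lsn(lsn):
    """Render a 64-bit WAL position in the usual high/low hex notation."""
    return f"{lsn >> 32:X}/{lsn & 0xFFFFFFFF:X}"


def lsn_is_invalid(lsn):
    """Hypothetical stand-in for the XLogRecPtrIsInvalid() macro."""
    return lsn == INVALID_XLOG_REC_PTR


timeline, start = segment_start_lsn("0000000100000000000000EB")
print(timeline, format_lsn(start), lsn_is_invalid(start))
```

Under the 16 MB assumption this prints `1 0/EB000000 False`: timeline 1, the
same position requested in the log line above, and a position that is not the
invalid pointer.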

>
> --
> Álvaro Herrera 48°01'N 7°57'E —
> https://www.EnterpriseDB.com/
> "No es bueno caminar con un hombre muerto"
>
