Re: Failure of subscription tests with topminnow

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Failure of subscription tests with topminnow
Date: 2021-08-16 08:32:47
Message-ID: CAA4eK1LT-zf5cU5paubxo1=Us2WiWdTMjFU4utLjNuTwgjh65w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Aug 16, 2021 at 9:24 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> Hi all,
>
> topminnow has just failed in a weird fashion:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=topminnow&dt=2021-08-15%2013%3A24%3A58
> # SELECT pid != FROM pg_stat_replication WHERE application_name = 'tap_sub';
> # expecting this output:
> # t
> # last actual query output:
> #
> # with stderr:
> # ERROR: syntax error at or near "FROM"
> # LINE 1: SELECT pid != FROM pg_stat_replication WHERE application_na...
>
> Looking at the logs, it seems like the problem boils down to an active
> slot when using ALTER SUBSCRIPTION tap_sub CONNECTION:
> 2021-08-15 18:44:38.027 CEST [16473:2] ERROR: could not start WAL
> streaming: ERROR: replication slot "tap_sub" is active for PID 16336
>

The "ALTER SUBSCRIPTION tap_sub CONNECTION" would lead to restart the
walsender process. Now, here the problem seems to be that the previous
walsender process (16336) didn't exit and the new process started to
use the slot which leads to the error shown in the log. This is
evident from the below part of log where we can see that 16336 has
exited after new process started to use the slot.

2021-08-15 18:44:38.027 CEST [16475:6] tap_sub LOG: received
replication command: START_REPLICATION SLOT "tap_sub" LOGICAL
0/16BEEB0 (proto_version '1', publication_names
'"tap_pub","tap_pub_ins_only"')
2021-08-15 18:44:38.027 CEST [16475:7] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/16BEEB0 (proto_version '1',
publication_names '"tap_pub","tap_pub_ins_only"')
2021-08-15 18:44:38.027 CEST [16475:8] tap_sub ERROR: replication
slot "tap_sub" is active for PID 16336
2021-08-15 18:44:38.027 CEST [16475:9] tap_sub STATEMENT:
START_REPLICATION SLOT "tap_sub" LOGICAL 0/16BEEB0 (proto_version '1',
publication_names '"tap_pub","tap_pub_ins_only"')
2021-08-15 18:44:38.041 CEST [16475:10] tap_sub LOG: disconnection:
session time: 0:00:00.036 user=nm database=postgres host=[local]
2021-08-15 18:44:38.043 CEST [16336:14] tap_sub LOG: disconnection:
session time: 0:00:06.367 user=nm database=postgres host=[local]

One idea to solve this is to first disable the subscription, wait for
the walsender process to exit, and then change the connection string
and re-enable the subscription.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Wenjing 2021-08-16 09:15:05 Is it worth pushing conditions to sublink/subplan?
Previous Message Greg Nancarrow 2021-08-16 08:30:46 Re: Skipping logical replication transactions on subscriber side