Re: subscriptionCheck failures on nightjar

From: Andres Freund <andres(at)anarazel(dot)de>
To: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: subscriptionCheck failures on nightjar
Date: 2019-09-20 17:08:31
Message-ID: 20190920170831.aaljabal6lyivre5@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2019-09-19 17:20:15 +0530, Kuntal Ghosh wrote:
> There seems to be a pattern in how the error occurs on the different
> systems. The relevant log snippets follow:
>
> nightjar:
> sub3 LOG: received replication command: CREATE_REPLICATION_SLOT
> "sub3_16414_sync_16394" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT
> sub3 LOG: logical decoding found consistent point at 0/160B578
> sub1 PANIC: could not open file
> "pg_logical/snapshots/0-160B578.snap": No such file or directory
>
> dromedary scenario 1:
> sub3_16414_sync_16399 LOG: received replication command:
> CREATE_REPLICATION_SLOT "sub3_16414_sync_16399" TEMPORARY LOGICAL
> pgoutput USE_SNAPSHOT
> sub3_16414_sync_16399 LOG: logical decoding found consistent point at 0/15EA694
> sub2 PANIC: could not open file
> "pg_logical/snapshots/0-15EA694.snap": No such file or directory
>
>
> dromedary scenario 2:
> sub3_16414_sync_16399 LOG: received replication command:
> CREATE_REPLICATION_SLOT "sub3_16414_sync_16399" TEMPORARY LOGICAL
> pgoutput USE_SNAPSHOT
> sub3_16414_sync_16399 LOG: logical decoding found consistent point at 0/15EA694
> sub1 PANIC: could not open file
> "pg_logical/snapshots/0-15EA694.snap": No such file or directory
>
> While subscription 3 is being created, it eventually reaches a
> consistent snapshot point and logs the corresponding WAL location.
> Immediately afterwards, sub1/sub2 seem to fail to serialize the
> snapshot to the .snap file for that same WAL location.
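The correspondence between the two log lines is easier to see once the LSN-to-filename mapping is spelled out. The file names in the PANIC messages follow the "pg_logical/snapshots/%X-%X.snap" scheme used in snapbuild.c: the high and low 32 bits of the LSN in unpadded uppercase hex, separated by a dash. What follows is a minimal standalone sketch under that assumption. `XLogRecPtr` is redefined locally, and `parse_lsn`, `snap_path_for_lsn` and `snap_file_matches` are hypothetical helpers, not PostgreSQL functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Local stand-in for PostgreSQL's XLogRecPtr (a 64-bit WAL position). */
typedef uint64_t XLogRecPtr;

/* Hypothetical helper: parse the "hi/lo" text form, e.g. "0/160B578". */
static int
parse_lsn(const char *text, XLogRecPtr *lsn)
{
    unsigned int hi;
    unsigned int lo;

    if (sscanf(text, "%X/%X", &hi, &lo) != 2)
        return -1;
    *lsn = ((XLogRecPtr) hi << 32) | lo;
    return 0;
}

/*
 * Hypothetical helper: build the serialized-snapshot path for an LSN,
 * following the "pg_logical/snapshots/%X-%X.snap" naming scheme (high
 * 32 bits, a dash, then the low 32 bits, in unpadded uppercase hex).
 */
static void
snap_path_for_lsn(char *buf, size_t buflen, XLogRecPtr lsn)
{
    int n = snprintf(buf, buflen, "pg_logical/snapshots/%X-%X.snap",
                     (unsigned int) (lsn >> 32), (unsigned int) lsn);

    assert(n > 0 && (size_t) n < buflen);   /* output was not truncated */
}

/*
 * Hypothetical check: does the file named in a PANIC message belong to
 * the consistent point logged by another walsender?  Returns 1 if so,
 * 0 if not, and -1 if the LSN text cannot be parsed.
 */
static int
snap_file_matches(const char *consistent_lsn, const char *panic_path)
{
    XLogRecPtr lsn;
    char expected[64];

    if (parse_lsn(consistent_lsn, &lsn) != 0)
        return -1;
    snap_path_for_lsn(expected, sizeof(expected), lsn);
    return strcmp(expected, panic_path) == 0;
}
```

Applied to the nightjar lines quoted above, the consistent point 0/160B578 maps to exactly "pg_logical/snapshots/0-160B578.snap", the file sub1 could not open; the same holds for 0/15EA694 in both dromedary scenarios.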

Since a number of people (myself included) have now failed to reproduce
this locally, I propose that we increase the log level for this test on
master, and perhaps expand the set of debugging information, in the
hope that the additional information from the cases encountered on the
buildfarm helps us build a reproducer or, even better, diagnose the
issue directly. If people agree, I'll come up with a patch.

Greetings,

Andres Freund
