Re: Failure of subscription tests with topminnow

From: Ajin Cherian <itsajin(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Failure of subscription tests with topminnow
Date: 2021-08-25 14:04:23
Message-ID: CAFPTHDbGSt4G9JdsTv-0ACZZWiTKExNxc5w4e5z=8YbCC+Ft5g@mail.gmail.com
Lists: pgsql-hackers

On Wed, Aug 25, 2021 at 11:17 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Aug 25, 2021 at 6:10 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > I did a quick check with the following tap test code:
> >
> > $node_publisher->poll_query_until('postgres',
> > qq(
> > select 1 != foo.column1 from (values(0), (1)) as foo;
> > ));
> >
> > The query returns {t, f} but poll_query_until() never finished. The
> > same is true when the query returns {f, t}.
> >

Yes, this is true, I also see the same behaviour.
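
To illustrate why a two-row result can never satisfy the poll, here is a rough Python model (not the actual Perl helper). It assumes poll_query_until() compares the entire psql output, rows joined by newlines, against the expected string, which defaults to 't'. The function names are made up for this sketch.

```python
# Simplified model of how poll_query_until() decides it is done.
# Assumption: the helper compares the whole psql output (rows joined by
# newlines) against the expected string, which defaults to 't'.

def psql_output(rows):
    """Render boolean rows the way `psql -At` would: one 't'/'f' per line."""
    return "\n".join("t" if r else "f" for r in rows)

def poll_succeeds(rows, expected="t"):
    """Return True only when the full output equals the expected string."""
    return psql_output(rows) == expected

# select 1 != foo.column1 from (values(0), (1)) as foo;  gives rows {t, f}
two_rows = [1 != v for v in (0, 1)]
print(poll_succeeds(two_rows))   # multi-row output never equals a lone 't'
print(poll_succeeds([True]))     # a single 't' row is the passing case
```

Under that assumption, the order of the rows does not matter: any output with more than one row differs from 't', so the poll keeps retrying until it times out.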

>
> This means something different is going on in Ajin's setup. Ajin, can
> you please share how did you confirm your findings about poll_query?

Looking at my logs again, I think what happens is this:

1. First walsender 'a' is running.
2. Second walsender 'b' starts and, on attempting to acquire the slot,
finds that the slot is active for pid a.
3. Now both walsenders are active, so the query does not return.
4. First walsender 'a' times out and exits.
5. Now only the second walsender is active and the query returns OK
because pid != a.
6. Second walsender exits with error.
7. Another query attempts to get the pid of the running walsender for
tap_sub but returns null because both walsenders have exited.
8. This null return value results in the next query erroring out and
the test failing.
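
The sequence above can be replayed with a small Python sketch. This is only a model of my reading of the logs, not the test code itself: the pids 100 and 200 are invented, the poll is modelled as "pid != old pid" evaluated once per active walsender for tap_sub, and the helper names are hypothetical.

```python
# Replay of the sequence above against a simplified poll check.
# Assumptions: pids 100 ('a') and 200 ('b') are made up; the poll query is
# modelled as "pid != old_pid" over the walsenders listed for tap_sub.

OLD_PID = 100  # walsender 'a'
NEW_PID = 200  # walsender 'b'

def poll_output(active_pids, old_pid):
    """One 't'/'f' line per active walsender, as psql -At would print."""
    return "\n".join("t" if pid != old_pid else "f" for pid in active_pids)

def current_pid(active_pids):
    """Model of the follow-up query that fetches the walsender pid."""
    return active_pids[0] if active_pids else None

timeline = [
    ("1. only 'a' running", [OLD_PID]),
    ("3. 'a' and 'b' both active", [OLD_PID, NEW_PID]),
    ("5. 'a' timed out, only 'b' left", [NEW_PID]),
    ("7. 'b' exited with error", []),
]

for label, pids in timeline:
    out = poll_output(pids, OLD_PID)
    print(f"{label}: poll passes={out == 't'}, pid query={current_pid(pids)}")
```

In this model the poll passes only at step 5, while 'b' is briefly the sole walsender, and the pid lookup at step 7 finds nothing, which matches the null seen in the failing run.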

> Can you additionally check the value of 'state' from
> pg_stat_replication for both the old and new walsender sessions?

Yes, will try this and post a patch tomorrow.

regards,
Ajin Cherian
Fujitsu Australia
