Re: logical replication deranged sender

From: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logical replication deranged sender
Date: 2017-05-09 18:06:25
Message-ID: bb4ea17b-82a4-6794-f2f1-02740c8e7bcb@2ndquadrant.com
Lists: pgsql-hackers

On 09/05/17 19:13, Jeff Janes wrote:
> On Tue, May 9, 2017 at 9:18 AM, Petr Jelinek
> <petr(dot)jelinek(at)2ndquadrant(dot)com> wrote:
>
> On 08/05/17 13:47, Petr Jelinek wrote:
> > On 08/05/17 01:17, Jeff Janes wrote:
> >> After dropping a subscription, it says it succeeded and that it dropped
> >> the slot on the publisher.
> >>
> >> But the publisher still has the slot, and a full-tilt process described
> >> by ps as
> >>
> >> postgres: wal sender process jjanes [local] idle in transaction
> >>
> >> Strace shows that this process is doing nothing but opening, reading,
> >> lseek, and closing from pg_wal, and calling sbrk. It never sends anything.
> >>
> >> This is not how it should work, correct?
> >>
> >
> > No, and I don't see how this happens though, we only report success if
> > the publisher side said that DROP_REPLICATION_SLOT succeeded. So far I
> > don't see anything in source that would explain this. I will need to
> > reproduce it first to see what's happening (wasn't able to do that yet,
> > but it might just need more time since you say it does not always happen).
> >
>
> Hm, I wonder, are there any workers left on the subscriber when this happens?
>
>
> Yes. Using ps, I get this:
>
> postgres: bgworker: logical replication worker for subscription 16408 sync 16391
> postgres: bgworker: logical replication worker for subscription 16408 sync 16388
>
> They seem to be permanently blocked on a socket to read from the publisher.
>
> On the publisher side, I think it is very slowly assembling a snapshot.
> It seems to be adding one xid at a time, and then re-sorting the entire
> list. Over and over.
>

Okay, then it's the same issue Masahiko Sawada reported in a nearby
thread, or at least it has the same cause.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
