Re: Unresolved repliaction hang and stop problem.

From: Lukasz Biegaj <lukasz(dot)biegaj(at)unitygroup(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Cc: Krzysztof Kois <krzysztof(dot)kois(at)unitygroup(dot)com>
Subject: Re: Unresolved repliaction hang and stop problem.
Date: 2021-05-04 13:21:07
Message-ID: 224ac459-9ea9-32ab-8b5e-18fb885267e3@unitygroup.com
Lists: pgsql-hackers

Hey, thanks for reaching out, and sorry for the late reply; we had a few
days of national holidays.

On 29.04.2021 15:55, Alvaro Herrera wrote:
> https://www.postgresql.org/message-id/flat/CANDwggKYveEtXjXjqHA6RL3AKSHMsQyfRY6bK%2BNqhAWJyw8psQ%40mail.gmail.com
> https://www.postgresql.org/message-id/flat/8bf8785c-f47d-245c-b6af-80dc1eed40db%40unitygroup.com
>
> Krzysztof said "after upgrading to pg13 we started having problems",
> which implicitly indicates that the same thing worked well in pg10 ---
> but if the problem has been correctly identified, then this wouldn't
> have worked in pg10 either. So something in the story doesn't quite
> match up. Maybe it's not the same problem after all, or maybe they
> weren't doing X in pg10 which they are attempting in pg13.
>

The problem started occurring after the upgrade from pg10 to pg13. No other
changes were made; in particular, neither the database structure nor the
operations performed on it were changed.

The problem is as described in
https://www.postgresql.org/message-id/flat/8bf8785c-f47d-245c-b6af-80dc1eed40db%40unitygroup.com

It occurs on two separate production clusters and one test cluster, all
belonging to the same customer but processing slightly different data
(it's an e-commerce store with multiple languages and a separate
production database for each language).

We've tried recreating the database from a dump and recreating the
replication, without any positive effect; the problem persists.

We did not roll back the databases to pg10. Instead, we stayed on pg13
and implemented a shell script that kills the walsender process if it
seems stuck in `hash_seq_search`. It's ugly, but it works, and we back up
and monitor data integrity anyway.
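The script itself is not included here. As a purely illustrative sketch (not the actual script), the "seems stuck" check could look like the Python below. It assumes the watchdog periodically samples each walsender's `sent_lsn` from the `pg_stat_replication` view together with a backtrace (for example captured with `gdb -p <pid> -batch -ex bt`), and treats a walsender as stuck when its LSN stops advancing while every sample shows `hash_seq_search`. The names `Sample`, `lsn_to_int` and `stuck_walsenders`, and the three-sample threshold, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Sample:
    """One observation of a walsender (hypothetical structure)."""
    pid: int
    sent_lsn: str   # textual pg_lsn from pg_stat_replication, e.g. "16/B374D848"
    backtrace: str  # assumed to be gdb output for this pid


def lsn_to_int(lsn: str) -> int:
    """Convert the 'XXX/YYY' pg_lsn text form into a 64-bit integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def stuck_walsenders(samples: List[Sample], min_samples: int = 3) -> List[int]:
    """Return pids whose sent_lsn never changed across at least min_samples
    observations while every backtrace mentioned hash_seq_search."""
    by_pid: Dict[int, List[Sample]] = {}
    for s in samples:
        by_pid.setdefault(s.pid, []).append(s)

    stuck = []
    for pid, obs in sorted(by_pid.items()):
        if len(obs) < min_samples:
            continue  # not enough evidence yet; avoid killing a busy walsender
        lsn_unchanged = len({lsn_to_int(o.sent_lsn) for o in obs}) == 1
        in_hash_scan = all("hash_seq_search" in o.backtrace for o in obs)
        if lsn_unchanged and in_hash_scan:
            stuck.append(pid)
    return stuck
```

Each returned pid could then be passed to `SELECT pg_terminate_backend(pid)`. Requiring several consecutive samples with no LSN progress makes it less likely that a walsender which is merely busy gets killed.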

I'd be happy to help debug the issue if I knew how to do it :-). If
you'd like, we can also try rolling the installation back to pg10 to
confirm that this was not caused by schema changes.

--
Lukasz Biegaj | Unity Group | https://www.unitygroup.com/
System Architect, AWS Certified Solutions Architect
