Re: Slow synchronous logical replication

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Slow synchronous logical replication
Date: 2017-10-07 19:58:55
Message-ID: 59D931FF.5010408@postgrespro.ru
Lists: pgsql-hackers

On 10/07/2017 10:42 PM, Andres Freund wrote:
> Hi,
>
> On 2017-10-07 22:39:09 +0300, konstantin knizhnik wrote:
>> In our sharded cluster project we are trying to use logical replication to provide HA (maintaining redundant shard copies).
>> Asynchronous logical replication makes little sense in the context of HA. This is why we are trying to use synchronous logical replication.
>> Unfortunately, it shows very bad performance. With 50 shards and a redundancy level of 1 (just one copy), the cluster is 20 times slower than without logical replication.
>> With asynchronous replication it is "only" two times slower.
>>
>> As far as I understand, the reason for such bad performance is that the synchronous replication mechanism was originally developed for streaming replication, where all replicas have the same content and LSNs. When it is used for logical replication, it behaves very inefficiently. A commit has to wait for confirmations from all receivers listed in "synchronous_standby_names", so we are waiting not only for our own single logical replication standby but for all other standbys as well. The number of synchronous standbys equals the number of shards divided by the number of nodes. To provide uniform distribution, the number of shards should be much greater than the number of nodes; for example, for 10 nodes we usually create 100 shards. As a result, we get awful performance, and blocking of any replication channel blocks all backends.
>>
>> So my question is whether my understanding is correct, and synchronous logical replication cannot be used efficiently in this way.
>> If so, the next question is how difficult it would be to make the synchronous replication mechanism more efficient for logical replication, and whether there are any plans to work in this direction.
> This seems to be a question that is a) about a commercial project we
> don't know much about b) hasn't received a lot of investigation.
>
Sorry if I was not clear.
The question was about the logical replication mechanism in the mainstream version of Postgres.
I think that most people are using asynchronous logical replication, and synchronous LR is something exotic that is not well tested or investigated.
It would be great if I am wrong :)

Concerning our sharded cluster (pg_shardman): it is not a commercial product yet; it is in the development phase.
We are going to open its sources once it is more or less stable.
But unlike multimaster, this sharded cluster is mostly built from existing components: pg_pathman + postgres_fdw + logical replication.
So we are just trying to combine them into an integrated system.
But currently the least understood part is logical replication.

And the main goal of my e-mail was to learn whether the authors and users of LR think it is a good idea to use LR to provide fault tolerance in a sharded cluster.
Or are other approaches, for example sharding with redundancy or streaming replication, preferable?
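The commit-wait behaviour described in the quoted message can be illustrated with a toy Python model. This is not PostgreSQL code and not a benchmark: it assumes that each standby's confirmation delay is an independent exponential random variable with a mean of 1 ms, that standby 0 is the one actually receiving this backend's changes, and that a commit waits for every listed standby, as the message describes. The function names (sync_standbys_per_node, commit_latency, simulate) are hypothetical.

```python
import random
import statistics


def sync_standbys_per_node(shards: int, nodes: int) -> int:
    # Per the message: synchronous standbys per node = shards / nodes
    # (assumes uniform shard placement and redundancy = 1).
    return shards // nodes


def commit_latency(standby_delays_ms: list[float], wait_for_all: bool = True) -> float:
    # standby_delays_ms[0] is the standby that actually receives this
    # backend's changes; the others hold unrelated shard copies.
    if wait_for_all:
        # The commit returns only after the slowest listed standby confirms.
        return max(standby_delays_ms)
    # Hypothetical alternative: wait only for the relevant standby.
    return standby_delays_ms[0]


def simulate(k: int, trials: int = 10_000, seed: int = 42) -> tuple[float, float]:
    # Toy model: each confirmation delay is Exp(mean = 1 ms), independent.
    # Returns (mean latency waiting for own standby, mean waiting for all k).
    rng = random.Random(seed)
    own, everyone = [], []
    for _ in range(trials):
        delays = [rng.expovariate(1.0) for _ in range(k)]
        own.append(commit_latency(delays, wait_for_all=False))
        everyone.append(commit_latency(delays, wait_for_all=True))
    return statistics.mean(own), statistics.mean(everyone)


if __name__ == "__main__":
    k = sync_standbys_per_node(shards=100, nodes=10)
    own_ms, all_ms = simulate(k)
    print(f"{k} sync standbys: own-only {own_ms:.2f} ms, wait-all {all_ms:.2f} ms")
    # A stalled channel (infinite delay) makes every wait-all commit infinite.
    print(commit_latency([1.0, float("inf")]))
```

In this model the mean wait-all latency approaches the harmonic number H_k (about 2.93 ms for k = 10) versus 1 ms for the relevant standby alone, and a single stalled channel blocks every commit. It does not attempt to reproduce the 20x slowdown reported above, which also depends on decoding, network, and apply costs that the model ignores.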

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
