Re: Slow synchronous logical replication

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Slow synchronous logical replication
Date: 2017-10-09 07:37:01
Message-ID: 4e8ad98c-7399-7795-dec9-07952952abb1@postgrespro.ru
Lists: pgsql-hackers

Thank you for explanations.

On 08.10.2017 16:00, Craig Ringer wrote:
> I think it'd be helpful if you provided reproduction instructions,
> test programs, etc, making it very clear when things are / aren't
> related to your changes.

It will not be so easy to provide a reproduction scenario, because it
involves many components (postgres_fdw, pg_pathman, pg_shardman, logical
replication, ...) and requires a multi-node installation.
But let me try to explain what is going on:
We have implemented sharding: data is split between several remote
tables using pg_pathman and postgres_fdw.
An insert or update of the parent table causes an insert or update of
one of the derived partitions, which postgres_fdw forwards to the
corresponding node.
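
To give an idea, the sharding layer is set up roughly like this (a
minimal sketch; the table, column and partition count here are
hypothetical illustrations, not our actual schema):

    -- Split a table into 50 hash partitions; pg_shardman then turns
    -- the partitions living on other nodes into postgres_fdw foreign
    -- tables, so writes to them are forwarded over the network.
    CREATE EXTENSION pg_pathman;
    CREATE TABLE orders (order_id bigint, amount numeric);
    SELECT create_hash_partitions('orders', 'order_id', 50);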
The number of shards is significantly larger than the number of nodes:
for 5 nodes we have 50 shards, which means 10 shards on each node.
To provide fault tolerance, each shard is replicated to one or more
other nodes using logical replication. Right now we consider only
redundancy level 1: each shard has exactly one replica.
So from each node we establish 10 logical replication channels, one per
local shard.
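
For illustration, one such channel could be created like this (the
publication and subscription names are hypothetical):

    -- On the node owning shard 0 (publisher side):
    CREATE PUBLICATION pub_orders_0 FOR TABLE orders_0;

    -- On the node holding its replica (subscriber side):
    CREATE SUBSCRIPTION sub_orders_0
        CONNECTION 'host=node1 dbname=postgres'
        PUBLICATION pub_orders_0;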

We want commit to wait until the data is actually stored on all
replicas, so we use synchronous replication:
we set the synchronous_commit option to "on" and include all ten
subscriptions in the synchronous_standby_names list.
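
On each publisher node that amounts to the following (the subscription
names are hypothetical again; by default a subscription reports its own
name as application_name, which is what synchronous_standby_names
matches against):

    ALTER SYSTEM SET synchronous_commit = on;
    ALTER SYSTEM SET synchronous_standby_names =
        'FIRST 10 (sub_orders_0, sub_orders_1, sub_orders_2,
                   sub_orders_3, sub_orders_4, sub_orders_5,
                   sub_orders_6, sub_orders_7, sub_orders_8,
                   sub_orders_9)';
    SELECT pg_reload_conf();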

In this setup commit latency is very high (about 100 ms, and most of
the time is actually spent in commit) and performance is very poor:
pgbench shows about 300 TPS at the optimal number of clients (about 10;
with more clients performance stays almost the same). Without logical
replication the same setup gives about 6000 TPS.

I have checked the syncrepl.c file, particularly the
SyncRepGetSyncRecPtr function. Each WAL sender independently calculates
the minimal LSN among all synchronous replicas and wakes up the
backends waiting for this LSN. It means that a transaction updating
data in one shard will actually wait for confirmation from the
replication channels of all shards.
If some shard is updated more rarely than the others, or is not updated
at all (for example, because the communication channel to its node is
broken), then all backends will get stuck.
Also, all backends compete for the single SyncRepLock, which can be
another contention point.
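
The first problem is easy to demonstrate at the SQL level (hypothetical
names again): if one replication channel stops confirming, a commit
touching a completely different shard hangs:

    -- On the replica node of shard 7, stop applying (and confirming):
    ALTER SUBSCRIPTION sub_orders_7 DISABLE;

    -- On the publisher, a write touching only shard 3 now hangs in
    -- COMMIT, because the confirmed LSN for the whole
    -- synchronous_standby_names set no longer advances:
    UPDATE orders SET amount = amount + 1 WHERE order_id = 12345;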

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
