Re: Slow synchronous logical replication

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Slow synchronous logical replication
Date: 2017-10-08 13:00:31
Message-ID: CAMsr+YE6aE6Re6smrMr-xCabRmCr=yzXEf2Yuv5upEDY5nMX8g@mail.gmail.com
Lists: pgsql-hackers

On 8 October 2017 at 03:58, Konstantin Knizhnik
<k(dot)knizhnik(at)postgrespro(dot)ru> wrote:

> The question was about logical replication mechanism in mainstream version
> of Postgres.

I think it'd be helpful if you provided reproduction instructions,
test programs, etc, making it very clear when things are / aren't
related to your changes.

> I think that most of people are using asynchronous logical replication and
> synchronous LR is something exotic and not well tested and investigated.
> It will be great if I am wrong:)

I doubt it's widely used. That said, a lot of people use synchronous
replication with BDR and pglogical, which are ancestors of the core
logical rep code and design.

I think you actually need to collect some proper timings and
diagnostics here, rather than hand-waving about it being "slow". A
good starting point might be setting some custom 'perf' tracepoints,
or adding some elog() calls that log timestamps. Then scrape the
results and build a latency graph.
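
As a rough illustration of the "scrape the results" step, here is a
minimal sketch in Python. It assumes a made-up log line format
(timestamp, an "LRTRACE" tag, xid, event name) that hypothetical added
elog() calls would emit; the event names "commit_start" and
"sync_confirmed" are likewise invented for the example and are not
anything PostgreSQL logs today.

```python
import re
from collections import defaultdict
from datetime import datetime

# Hypothetical line format written by added elog(LOG, ...) calls, e.g.:
#   2017-10-08 13:00:31.123456 LRTRACE xid=1234 event=commit_start
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{1,6}) LRTRACE "
    r"xid=(?P<xid>\d+) event=(?P<event>\w+)$"
)


def parse_events(lines):
    """Return {xid: {event_name: timestamp}} for lines matching LINE_RE."""
    events = defaultdict(dict)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip unrelated log output
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
        events[int(m.group("xid"))][m.group("event")] = ts
    return events


def latencies_ms(events, start="commit_start", end="sync_confirmed"):
    """Per-xact latency in milliseconds between two (assumed) event names."""
    out = []
    for _xid, ev in sorted(events.items()):
        if start in ev and end in ev:
            out.append((ev[end] - ev[start]).total_seconds() * 1000.0)
    return out


def histogram(values, bucket_ms=10):
    """Count latencies per bucket; keys are the lower bound of each bucket."""
    buckets = defaultdict(int)
    for v in values:
        buckets[int(v // bucket_ms) * bucket_ms] += 1
    return dict(sorted(buckets.items()))
```

Feeding the server log through parse_events(), then latencies_ms() and
histogram(), gives bucket counts that can be plotted with whatever tool
is at hand.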

That said, if I had to guess why it's slow, I'd say that you're facing
a number of factors:

* By default, logical replication in PostgreSQL does not do an
immediate flush to disk after downstream commit. In the interests of
faster apply performance it instead delays sending flush confirmations
until the next time WAL is flushed out. See the docs for CREATE
SUBSCRIPTION, notably the synchronous_commit option. The delayed
confirmations will obviously greatly increase latencies on sync commit.

* Logical decoding doesn't *start* streaming a transaction until the
origin node finishes the xact and writes a COMMIT, then the xlogreader
picks it up.

* As a consequence of the above, a big xact holds up commit
confirmations of smaller ones by a LOT more than is the case for
streaming physical replication.
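
The last two points can be illustrated with a toy model in Python. This
is only a sketch under simplifying assumptions, not PostgreSQL code: it
assumes a single serial stream that handles whole transactions in
commit order, starting each one only after its COMMIT, and it lumps
decode, send and apply time into one invented "replay_cost" figure.

```python
from dataclasses import dataclass


@dataclass
class Xact:
    name: str
    commit_time: float  # when COMMIT is written on the origin (seconds)
    replay_cost: float  # assumed time to decode, send and apply the xact


def confirmation_latencies(xacts):
    """Return {name: seconds from COMMIT to downstream confirmation}.

    Transactions are processed one at a time in commit order, and none
    is started before its own COMMIT has been written.
    """
    stream_free_at = 0.0
    result = {}
    for x in sorted(xacts, key=lambda x: x.commit_time):
        start = max(x.commit_time, stream_free_at)
        done = start + x.replay_cost
        result[x.name] = done - x.commit_time
        stream_free_at = done
    return result


if __name__ == "__main__":
    # Illustrative numbers only: a big xact commits just before a small one.
    big = Xact("big", commit_time=10.0, replay_cost=5.0)
    small = Xact("small", commit_time=10.1, replay_cost=0.01)
    print(confirmation_latencies([small]))       # small xact on its own
    print(confirmation_latencies([big, small]))  # small xact queued behind big
```

In this model the small transaction's wait is dominated by the replay
cost of the big one ahead of it, which is the head-of-line effect
described above.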

Hopefully that gives you something to look into, anyway. Maybe you'll
be inspired to work on parallelized logical decoding :)

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
