Re: Slow synchronous logical replication

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Slow synchronous logical replication
Date: 2017-10-12 08:24:57
Message-ID: 5f5143cc-9f73-3909-3ef7-d3895cc6cc90@postgrespro.ru
Lists: pgsql-hackers

On 12.10.2017 04:23, Craig Ringer wrote:
> On 12 October 2017 at 00:57, Konstantin Knizhnik
> <k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
>
>> The reason of such behavior is obvious: wal sender has to decode huge
>> transaction generate by insert although it has no relation to this
>> publication.
> It does. Though I wouldn't expect anywhere near the kind of drop you
> report, and haven't observed it here.
>
> Is the CREATE TABLE and INSERT done in the same transaction?

No. The table was created in a separate transaction.
Moreover, the same effect takes place if the table is created before
replication is started.
The problem in this case seems to be caused by spilling the decoded
transaction to a file in ReorderBufferSerializeTXN.
Please look at two profiles:
http://garret.ru/lr1.svg corresponds to normal operation of pgbench with
synchronous replication to one replica,
http://garret.ru/lr2.svg - the same workload with concurrent execution of
a huge insert statement.

And here is the output of pgbench (the insert is started at the fifth second):

progress: 1.0 s, 10020.9 tps, lat 0.791 ms stddev 0.232
progress: 2.0 s, 10184.1 tps, lat 0.786 ms stddev 0.192
progress: 3.0 s, 10058.8 tps, lat 0.795 ms stddev 0.301
progress: 4.0 s, 10230.3 tps, lat 0.782 ms stddev 0.194
progress: 5.0 s, 10335.0 tps, lat 0.774 ms stddev 0.192
progress: 6.0 s, 4535.7 tps, lat 1.591 ms stddev 9.370
progress: 7.0 s, 419.6 tps, lat 20.897 ms stddev 55.338
progress: 8.0 s, 105.1 tps, lat 56.140 ms stddev 76.309
progress: 9.0 s, 9.0 tps, lat 504.104 ms stddev 52.964
progress: 10.0 s, 14.0 tps, lat 797.535 ms stddev 156.082
progress: 11.0 s, 14.0 tps, lat 601.865 ms stddev 93.598
progress: 12.0 s, 11.0 tps, lat 658.276 ms stddev 138.503
progress: 13.0 s, 9.0 tps, lat 784.120 ms stddev 127.206
progress: 14.0 s, 7.0 tps, lat 870.944 ms stddev 156.377
progress: 15.0 s, 8.0 tps, lat 1111.578 ms stddev 140.987
progress: 16.0 s, 7.0 tps, lat 1258.750 ms stddev 75.677
progress: 17.0 s, 6.0 tps, lat 991.023 ms stddev 229.058
progress: 18.0 s, 5.0 tps, lat 1063.986 ms stddev 269.361

It seems to be an effect of large transactions.
The presence of several synchronous logical replication channels reduces
performance, but not that much.
Below are results from another machine, with pgbench at scale 10.

Configuration               TPS
standalone                  15k
1 async logical replica     13k
1 sync logical replica      10k
3 async logical replicas    13k
3 sync logical replicas      8k

>
> Only partly true. The output plugin can register a transaction origin
> filter and use that to say it's entirely uninterested in a
> transaction. But this only works based on filtering by origins. Not
> tables.
Yes, I know about the origin filtering mechanism (and we are using it in
multimaster).
But I am speaking about the standard pgoutput.c output plugin: its
pgoutput_origin_filter always returns false.

>
> I imagine we could call another hook in output plugins, "do you care
> about this table", and use it to skip some more work for tuples that
> particular decoding session isn't interested in. Skip adding them to
> the reorder buffer, etc. No such hook currently exists, but it'd be an
> interesting patch for Pg11 if you feel like working on it.
>
>> Unfortunately it is not quite clear how to make wal-sender smarter and let
>> him skip transaction not affecting its publication.
> As noted, it already can do so by origin. Mostly. We cannot totally
> skip over WAL, since we need to process various invalidations etc. See
> ReorderBufferSkip.
The problem is that before the end of a transaction we do not know whether
it touches this publication or not.
So filtering by origin will not work in this case.

I am really not sure that it is possible to skip over WAL. But the
particular problem with invalidation records etc. can be solved by
having the WAL sender always process such records.
I.e. if a backend is inserting an invalidation record or some other record
which should always be processed by the WAL sender, it can always promote
the LSN of this record to the WAL sender.
So the WAL sender will skip only those WAL records which are safe to skip
(insert/update/delete records not affecting this publication).

I wonder whether there can be other problems with the WAL sender skipping
part of a transaction.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
