Re: Logical replication and multimaster

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Logical replication and multimaster
Date: 2015-12-02 17:30:32
Message-ID: CA+TgmoY1o3G0B-21zv2Pw5iEkpR8=J42GdsUOs4m0inKka3FEA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Nov 30, 2015 at 11:20 AM, Konstantin Knizhnik
<k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
> We have implemented ACID multimaster based on logical replication and our
> DTM (distributed transaction manager) plugin.
> Good news is that it works and no inconsistency is detected.
> But unfortunately it is very very slow...
>
> At standalone PostgreSQL I am able to achieve about 30000 TPS with 10
> clients performing simple debit-credit transactions.
> And with multimaster consisting of three nodes spawned at the same system I
> got about 100 (one hundred) TPS.
> There are two main reasons of such awful performance:
>
> 1. Logical replication serializes all transactions: there is single
> connection between wal-sender and receiver BGW.
> 2. 2PC synchronizes transaction commit at all nodes.
>
> None of these two reasons are show stoppers themselves.
> If we remove DTM and do asynchronous logical replication then performance of
> multimaster is increased to 6000 TPS
> (please notice that in this test all multimaster nodes are spawned at the
> same system, sharing its resources,
> so 6k is not bad result comparing with 30k at standalone system).
> And according to 2ndquadrant results, BDR performance is very close to hot
> standby.

Logical decoding only begins decoding a transaction once the
transaction is complete. So I would guess that the sequence of
operations here is something like this - correct me if I'm wrong:

1. Do the transaction.
2. PREPARE.
3. Replay the transaction.
4. PREPARE the replay.
5. COMMIT PREPARED on original machine.
6. COMMIT PREPARED on replica.

Step 3 introduces latency proportional to the amount of work the
transaction did, which could be a lot. If you were doing synchronous
physical replication, the commit would only need to wait for the
replay of the commit record itself. But with
synchronous logical replication, you've got to wait for the replay of
the entire transaction. That's a major bummer, especially if replay
is single-threaded and there are a large number of backends generating
transactions. Of course, the 2PC dance itself can also add latency -
that's most likely to be the issue if the transactions are each very
short.
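
A toy model of the single-threaded replay effect described above, as a
sketch only. It assumes one serial replay worker and a fixed replay cost
per transaction; the function name and both parameters are hypothetical
and nothing here is measured from a real system.

```python
def commit_wait_times(arrival_times, replay_cost):
    """Time each transaction waits between PREPARE and end of its replay.

    arrival_times: sorted times at which transactions are PREPAREd on the
                   origin (assumed input, arbitrary time units).
    replay_cost:   assumed fixed time for the single replay worker to
                   replay one whole transaction.
    """
    waits = []
    worker_free_at = 0.0
    for prepared_at in arrival_times:
        # The transaction cannot start replaying until the worker is idle.
        start = max(prepared_at, worker_free_at)
        # The whole transaction must be replayed, not just a commit record.
        finish = start + replay_cost
        worker_free_at = finish
        waits.append(finish - prepared_at)
    return waits
```

With ten backends that all PREPARE at time 0 and a replay cost of 1 unit,
commit_wait_times([0.0] * 10, 1.0) gives waits growing from 1 to 10 units:
the last backend pays for every transaction queued ahead of it.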

What I'd suggest is trying to measure where the latency is coming
from. You should be able to measure how much time each transaction
spends (a) executing, (b) preparing itself, (c) waiting for the replay
thread to begin replaying it, (d) waiting for the replay thread to
finish replaying it, and (e) committing. Separating (c) and (d) might
be a little bit tricky, but I bet it's worth putting some effort in,
because the answer is probably important to understanding what sort of
change will help here. If (c) is the problem, you might be able to
get around it by having multiple apply processes, though that only helps if
applying is slower than decoding. But if (d) is the problem, then the
only solution is probably to begin applying the transaction
speculatively before it's prepared/committed. I think.
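
One possible way to turn the (a) to (e) breakdown into numbers, as a
minimal sketch. It assumes each transaction has been instrumented to
record six timestamps under the hypothetical event names below, and that
all timestamps come from one clock (reasonable when every node runs on
the same machine, as in the test described). None of these names exist
in PostgreSQL.

```python
from collections import defaultdict

# (phase label, start event, end event); all event names are hypothetical.
PHASES = [
    ("a_executing", "begin", "prepare_start"),
    ("b_preparing", "prepare_start", "prepare_done"),
    ("c_waiting_for_replay", "prepare_done", "replay_start"),
    ("d_replaying", "replay_start", "replay_done"),
    ("e_committing", "replay_done", "commit_done"),
]


def phase_breakdown(events):
    """Map one transaction's {event name: timestamp} to per-phase durations."""
    return {name: events[end] - events[start] for name, start, end in PHASES}


def average_breakdown(all_events):
    """Average each phase duration over a list of per-transaction event dicts."""
    if not all_events:
        return {}
    totals = defaultdict(float)
    for events in all_events:
        for name, duration in phase_breakdown(events).items():
            totals[name] += duration
    return {name: totals[name] / len(all_events) for name in totals}
```

If c_waiting_for_replay dominates the averages, queueing behind the
replay worker is the likely culprit; if d_replaying dominates, the replay
of the transaction itself is.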

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
