Re: Single transaction in the tablesync worker?

From: Craig Ringer <craig(dot)ringer(at)enterprisedb(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>
Subject: Re: Single transaction in the tablesync worker?
Date: 2020-12-07 04:31:53
Message-ID: CAGRY4nyjhZgHG+mGEES+QaRQKy7ya8gDZkqoDMC4rHqRsQmneQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, 7 Dec 2020 at 11:44, Peter Smith <smithpb2250(at)gmail(dot)com> wrote:

>
> Basically, I was wondering why can't the "tablesync" worker just
> gather messages in a similar way to how the current streaming feature
> gathers messages into a "changes" file, so that they can be replayed
> later.
>
>
See the related thread "Logical archiving"

https://www.postgresql.org/message-id/20D9328B-A189-43D1-80E2-EB25B9284AD6@yandex-team.ru

where I addressed some parts of this topic in detail earlier today.

> A) The "tablesync" worker (after the COPY) does not ever apply any of
> the incoming messages, but instead it just gobbles them into a
> "changes" file until it decides it has reached SYNCDONE state and
> exits.
>

This has a few issues.

Most importantly, the sync worker must cooperate with the main apply worker
to achieve a consistent end-of-sync cutover. The sync worker must have
replayed the pending changes before that cutover can happen: the non-sync
apply worker may need to start applying changes on top of the resynced
table as soon as the next transaction it processes, so it has to see those
rows already in place.

Doing this would also add another round of write multiplication, since the
data would be spooled to disk, then written to WAL, then to the heap. Write
multiplication is already an issue for logical replication, so adding to it
isn't particularly desirable without a really compelling reason. With the
write multiplication come disk space management issues for big transactions,
as well as the obvious performance/throughput impact.

It adds even more latency between upstream commit and downstream apply,
something that is again already an issue for logical replication.

Right now we don't have any concept of a durable and locally flushed spool.

It's not impossible to do as you suggest but the cutover requirement makes
it far from simple. As discussed in the logical archiving thread I think
it'd be good to have something like this, and there are times the write
multiplication price would be well worth paying. But it's not easy.

> B) Then, when the "apply" worker proceeds, if it detects the existence
> of the "changes" file it will replay/apply_dispatch all those gobbled
> messages before just continuing as normal.
>

That's going to introduce a really big stall in the apply worker's progress
in many cases. During that time it won't be receiving from upstream (since
we don't spool logical changes to disk at this time), so the upstream lag
will grow. That will impact synchronous replication, pg_wal size
management, catalog bloat, etc. It'll also leave the upstream logical
decoding session idle, so when it resumes it may create a spike of I/O and
CPU load as it catches up, as well as a spike of network traffic. And
depending on how close the upstream write rate is to the maximum decode
speed, maximum network throughput, and maximum downstream apply speed, it
may take some time to catch up over the resulting lag.

Not a big fan of that approach.
