Re: Replication & recovery_min_apply_delay

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Replication & recovery_min_apply_delay
Date: 2019-09-04 13:37:39
Message-ID: 992ed453-656a-70bc-a7c2-4cc5d173112e@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 04.09.2019 1:22, Alvaro Herrera wrote:
> On 2019-Aug-02, Michael Paquier wrote:
>
>> On Wed, Jul 31, 2019 at 04:43:26PM -0400, Alvaro Herrera wrote:
>>> As for the test module, the one I submitted takes a lot of time to run
>>> (well, 60s) and I don't think it's a good idea to include it as
>>> something to be run all the time by every buildfarm member. I'm not
>>> sure that there's a leaner way to test for this bug, though, but
>>> certainly it'd be a good idea to ensure that this continues to work.
>> Hmmm. Instead of that, wouldn't it be cleaner to maintain in the
>> context of the startup process a marker similar to receivedUpto for
>> the last LSN? The issue with this one is that it gets reset easily so
>> we would lose track of it easily, and we need also to count with the
>> case where a WAL receiver is not started. So I think that we should
>> do that as a last replayed or received LSN if a WAL receiver is up and
>> running, whichever is newer. Splitting the WAL receiver restart logic
>> into a separate routine is a good idea in itself, the patch attempting
>> to switch primary_conninfo to be reloadable could make use of that.
> Konstantin, any interest in trying this?
>
Sorry, I do not understand this proposal.

> and we need also to count with the case where a WAL receiver is not
started.

May be i missed something, but what this patch is trying to achieve is
to launch WAL receiver before already received transactions are applied.
So definitely WAL receiver is not started at this moment.

receivedUpto is just static variable in xlog.c, maintained by WAL receiver.
But as I mentioned above, WAL receiver is not started at the moment when
we need to know LSN of last record.

Certainly it should be possible to somehow persist receveidUpto, so we
do not need to scan WAL to determine the last LSN at next start.
By persisting last LSN introduce a lot of questions and problems.
For example when it needs to be flushed for the disk. If it is done
after each received transaction, then it can significantly suffer
performance.
If it is done more or less asynchronously, then there us a risk that we
requested streaming with wrong position.
In any case it will significantly complicate the patch and make it more
sensible for various errors.

I wonder what is wrong with determining LSN of last record by just
scanning WAL?
Certainly it is not the most efficient way. But I do not expect that
somebody will have hundreds or thousands megabytes of WAL.
Michael, do you see some other problems with GetLastLSN() functions
except time of its execution?

IMHO one of the ,ani advantages of this patch is that it is very simple.
We need to scan WAL to locate last LSN only if recovery_min_apply_delay
is set.
So this patch should not affect performance of all other cases.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Janes 2019-09-04 13:56:28 Default JIT setting in V12
Previous Message Robert Haas 2019-09-04 13:31:44 Re: block-level incremental backup