Re: standby apply lag on inactive servers

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: standby apply lag on inactive servers
Date: 2020-01-31 14:47:57
Message-ID: 20200131144757.GA3354@alvherre.pgsql
Lists: pgsql-hackers

On 2020-Jan-31, Fujii Masao wrote:
> On 2020/01/31 22:40, Alvaro Herrera wrote:
> > On 2020-Jan-31, Fujii Masao wrote:
> >
> > > You're thinking to apply this change to the back branches? Sorry
> > > if my understanding is not right. But I don't think that back-patch
> > > is ok because it changes the documented existing behavior
> > > of pg_last_xact_replay_timestamp(). So it looks like the behavior
> > > change not a bug fix.
> >
> > Yeah, I am thinking in backpatching it. The documented behavior is
> > already not what the code does.
>
> Maybe you thought this because getRecordTimestamp() extracts the
> timestamp from even WAL record of a restore point? That is, you're
> concerned about that pg_last_xact_replay_timestamp() returns the
> timestamp of not only commit/abort record but also restore point one.
> Right?

Right.

> As far as I read the code, this problem doesn't occur because
> SetLatestXTime() is called only for commit/abort records, in
> recoveryStopsAfter(). No?

... uh, wow, you're right about that too. IMO this is extremely
fragile, easy to break, and under-documented. But you're right, there's
no bug there at present.

> > Do you have a situation where this
> > change would break something? If so, can you please explain what it is?
>
> For example, use the return value of pg_last_xact_replay_timestamp()
> (and also the timestamp in the log message output at the end of
> recovery) as a HINT when setting recovery_target_time later.

Hmm.

I'm not sure how you would use it in that way. I mean, I understand how
it *can* be used that way, but it seems too fragile to be done in
practice, in a scenario that's not just laboratory games.

> Use it to compare with the timestamp retrieved from the master server,
> in order to monitor the replication delay.

That's precisely the use case that I'm aiming at. Currently the
timestamp is not useful for this, because it stops advancing when the
primary is inactive (no COMMIT records occur), so the standby appears
to fall behind. With the proposed change, CHECKPOINT records would keep
the "last xtime" current during such periods of inactivity. This has
actually happened in a production setting; it's not a thought
experiment.
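
To make the failure mode concrete, here is a toy model in Python. It is
only a sketch and not PostgreSQL code: the record kinds, the function
names and the timestamps are all made up for illustration. It assumes
the monitor computes lag as "now minus the last replayed timestamp",
and compares the result when only commit/abort records advance that
timestamp against the result when checkpoint records advance it too.

```python
from datetime import datetime, timedelta

# Hypothetical record kinds for illustration; not PostgreSQL identifiers.
COMMIT, ABORT, CHECKPOINT = "commit", "abort", "checkpoint"


def last_replayed_xtime(records, kinds=(COMMIT, ABORT)):
    """Return the timestamp of the newest replayed record whose kind is
    in `kinds`, or None if no such record has been replayed."""
    latest = None
    for kind, ts in records:
        if kind in kinds:
            latest = ts
    return latest


def apparent_lag(now, records, kinds=(COMMIT, ABORT)):
    """Lag as a naive monitor would compute it: now minus last xtime."""
    latest = last_replayed_xtime(records, kinds)
    return None if latest is None else now - latest


# The primary commits once, then sits idle and only emits checkpoints.
t0 = datetime(2020, 1, 31, 12, 0, 0)
replayed = [
    (COMMIT, t0),
    (CHECKPOINT, t0 + timedelta(minutes=5)),
    (CHECKPOINT, t0 + timedelta(minutes=10)),
]
now = t0 + timedelta(minutes=11)

# Only commit/abort records count: lag keeps growing while idle.
current = apparent_lag(now, replayed)
# Checkpoint records count too: lag stays bounded.
proposed = apparent_lag(now, replayed, kinds=(COMMIT, ABORT, CHECKPOINT))

print(current)   # 0:11:00
print(proposed)  # 0:01:00
```

In this model the standby has replayed everything the primary produced,
yet the first computation reports eleven minutes of lag and keeps
growing until the next commit arrives.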

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
