Re: Measuring replay lag

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Measuring replay lag
Date: 2017-03-05 07:31:42
Message-ID: CANP8+jJ6pkZjXEccZe+wGECUoEmLCt2bUxnM7mXGO=bAmeKknw@mail.gmail.com
Lists: pgsql-hackers

On 1 March 2017 at 10:47, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> On Fri, Feb 24, 2017 at 9:05 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> On 21 February 2017 at 21:38, Thomas Munro
>> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>>> However, I think a call like LagTrackerWrite(SendRqstPtr,
>>> GetCurrentTimestamp()) needs to go into XLogSendLogical, to mirror
>>> what happens in XLogSendPhysical. I'm not sure about that.
>>
>> Me neither, but I think we need this for both physical and logical.
>>
>> The use cases and graphs are the same for both, I think. There might
>> be issues with the way LSNs work for logical.
>
> This seems to be problematic. Logical peers report LSN changes for
> all three operations (write, flush, commit) only on commit. I suppose
> that might work OK for synchronous replication, but it makes it a bit
> difficult to get lag measurements that don't look really strange and
> sawtoothy when you have long transactions, and overlapping
> transactions might interfere with the measurements in odd ways. I
> wonder if the way LSNs are reported by logical rep would need to be
> changed first. I need to study this some more and would be grateful
> for ideas from any of the logical rep people.

I have no doubt there are problems with the nature of logical
replication that affect this. Those problems are not caused by this
patch, but that does not mean the patch can ignore them.

What we want from this patch is something that works for both physical
and logical replication, as far as that is possible.

With that in mind, this patch should be able to provide sensible lag
measurements for a simple case such as logical replication of a standard
pgbench run. If that highlights problems with this patch then we can
fix them here.
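
To make the concern about commit-only LSN reports concrete, here is a
rough standalone sketch in Python. It is a toy model, not code from the
patch: the LagTracker class, its write/read methods (named loosely after
the LagTrackerWrite call quoted above) and the simulate helper are all
hypothetical, and "age of the oldest unacknowledged sample" is only one
assumed way of looking at lag. It assumes one WAL record is sent per
tick and that a peer's report reaches the sender one tick later.

```python
from collections import deque


class LagTracker:
    """Toy sender-side buffer of (lsn, send_time) samples (hypothetical)."""

    def __init__(self):
        self.samples = deque()

    def write(self, lsn, now):
        # Remember when WAL up to `lsn` was sent.
        self.samples.append((lsn, now))

    def read(self, reported_lsn, now):
        # Drop every sample the peer has acknowledged. Return the lag of the
        # newest sample at or below reported_lsn, or None if nothing is new.
        sent_at = None
        while self.samples and self.samples[0][0] <= reported_lsn:
            sent_at = self.samples.popleft()[1]
        return None if sent_at is None else now - sent_at

    def oldest_unacked_age(self, now):
        # Age of the oldest sample not yet acknowledged (0 if none pending).
        return now - self.samples[0][1] if self.samples else 0


def simulate(commit_ticks, ticks):
    """Send one record per tick (LSN = tick * 100).

    Assumption: the peer only advances its reported LSN for ticks listed in
    commit_ticks, and that report reaches the sender one tick later.
    Returns the age of the oldest unacknowledged sample seen at each tick,
    taken just before that tick's report is processed.
    """
    tracker = LagTracker()
    reported = 0
    ages = []
    for now in range(1, ticks + 1):
        ages.append(tracker.oldest_unacked_age(now))
        if (now - 1) in commit_ticks:
            reported = (now - 1) * 100
        tracker.read(reported, now)
        tracker.write(now * 100, now)
    return ages


if __name__ == "__main__":
    # Peer reports progress every tick (physical-style in this toy model).
    print(simulate(set(range(1, 10)), ticks=9))
    # Peer reports only at commits on ticks 4 and 8 (logical-style).
    print(simulate({4, 8}, ticks=9))
```

Under these assumptions the first call gives a flat curve,
[0, 1, 1, 1, 1, 1, 1, 1, 1], while the second climbs during each
transaction and drops at commit, [0, 1, 2, 3, 4, 1, 2, 3, 4]. Whether
the real measurements behave this way depends on how the patch samples
and reports lag, which this sketch does not attempt to reproduce.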

Thanks

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
