Re: Minimal logical decoding on standbys

From: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, fabriziomello(at)gmail(dot)com, tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>, Rahila Syed <rahila(dot)syed(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minimal logical decoding on standbys
Date: 2023-04-07 18:27:51
Message-ID: 7e111173-6e0a-ad36-8b35-1cbabb9f2426@gmail.com
Lists: pgsql-hackers

Hi,

On 4/7/23 8:12 PM, Andres Freund wrote:
> Hi,
>
> On 2023-04-07 08:47:57 -0700, Andres Freund wrote:
>> Integrated all of these.
>
> Here's my current version. Changes:
> - Integrated Bertrand's changes
> - polished commit messages of 0001-0003
> - edited code comments for 0003, including
> InvalidateObsoleteReplicationSlots()'s header
> - added a bump of SLOT_VERSION to 0001
> - moved addition of pg_log_standby_snapshot() to 0007
> - added a catversion bump for pg_log_standby_snapshot()
> - moved all the bits dealing with procsignals from 0003 to 0004, now the split
> makes sense IMO
> - combined a few more successive ->safe_psql() calls
>

Thanks!

> I see occasional failures in the tests, particularly in the new test using
> pg_authid, but not solely. cfbot also seems to have seen these:
> https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest%2F42%2F3740
>
> I made a bogus attempt at a workaround for the pg_authid case last night. But
> that didn't actually fix anything, it just changed the timing.
>
> I think the issue is that VACUUM does not force WAL to be flushed at the end
> (since it does not assign an xid). wait_for_replay_catchup() uses
> $node->lsn('flush'), which, due to VACUUM not flushing, can be an LSN from
> before VACUUM completed.
>
> The problem can be made more likely by adding pg_usleep(1000000); before
> walwriter.c's call to XLogBackgroundFlush().
>
> We probably should introduce some infrastructure in Cluster.pm for this, but
> for now I just added a 'flush_wal' table that we insert into after a
> VACUUM. That guarantees a WAL flush.
>
>
Ack for the Cluster.pm "improvement" and thanks for the "workaround"!
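
For anyone following along, the race described above can be sketched with a toy model. This is only an illustration in Python, not PostgreSQL or Cluster.pm code: the ToyPrimary class, its methods, and the LSN increments are all made up, and it assumes (as stated in the quoted text) that VACUUM does not force a WAL flush while a committing insert does.

```python
class ToyPrimary:
    """Toy model of a primary's WAL positions (hypothetical, for illustration)."""

    def __init__(self):
        self.insert_lsn = 0  # end of the WAL generated so far
        self.flush_lsn = 0   # end of the WAL known to be flushed

    def vacuum(self):
        # Assumption from the thread: VACUUM assigns no xid, so finishing it
        # does not force a flush. Its records exist but may be unflushed.
        self.insert_lsn += 10
        return self.insert_lsn

    def insert_row(self):
        # Assumption from the thread: inserting into a table (the 'flush_wal'
        # table in the workaround) commits and thereby flushes the WAL.
        self.insert_lsn += 1
        self.flush_lsn = self.insert_lsn
        return self.insert_lsn

    def lsn(self, mode):
        # Loosely mirrors the idea of $node->lsn('flush') / lsn('insert').
        return {"insert": self.insert_lsn, "flush": self.flush_lsn}[mode]


def catchup_target(primary):
    # The test waits for the standby to replay up to the flush position.
    return primary.lsn("flush")


primary = ToyPrimary()
vacuum_end = primary.vacuum()

# Without the workaround, the wait target can precede the VACUUM records.
stale_target = catchup_target(primary)

# With the workaround, an insert after VACUUM moves the flush position
# past the VACUUM records, so waiting on it covers them.
primary.insert_row()
good_target = catchup_target(primary)
```

In this model stale_target lies before vacuum_end, which is the window in which the standby can be considered "caught up" without having replayed the VACUUM; after the insert, good_target lies past it.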

> I think some of the patches might have more reviewers than really applicable,
> and might also miss some. I'd appreciate if you could go over that...
>

Sure, will do in a couple of hours.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
