Re: Minimal logical decoding on standbys

From: Andres Freund <andres(at)anarazel(dot)de>
To: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, fabriziomello(at)gmail(dot)com, tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>, Rahila Syed <rahila(dot)syed(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minimal logical decoding on standbys
Date: 2023-04-07 18:12:26
Message-ID: 20230407181226.6oyt4jcazy6eh7rx@awork3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2023-04-07 08:47:57 -0700, Andres Freund wrote:
> Integrated all of these.

Here's my current version. Changes:
- Integrated Bertrand's changes
- polished commit messages of 0001-0003
- edited code comments for 0003, including
InvalidateObsoleteReplicationSlots()'s header
- added a bump of SLOT_VERSION to 0001
- moved addition of pg_log_standby_snapshot() to 0007
- added a catversion bump for pg_log_standby_snapshot()
- moved all the bits dealing with procsignals from 0003 to 0004, now the split
makes sense IMO
- combined a few more successive ->safe_psql() calls

I see occasional failures in the tests, particularly in the new test using
pg_authid, but not solely. cfbot also seems to have seen these:
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest%2F42%2F3740

I made a bogus attempt at a workaround for the pg_authid case last night, but
that didn't actually fix anything; it just changed the timing.

I think the issue is that VACUUM does not force WAL to be flushed at the end
(since it does not assign an xid). wait_for_replay_catchup() uses
$node->lsn('flush'), which, due to VACUUM not flushing, can be an LSN from
before VACUUM completed.

The problem can be made more likely by adding pg_usleep(1000000); before
walwriter.c's call to XLogBackgroundFlush().

We probably should introduce some infrastructure in Cluster.pm for this, but
for now I just added a 'flush_wal' table that we insert into after a
VACUUM. That guarantees a WAL flush.
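
To illustrate the race, here is a toy model in Python. It is not PostgreSQL code: the ToyPrimary class, its methods and the byte counts are all made up for illustration. The only things taken from the description above are that VACUUM writes WAL without forcing a flush, that a committing INSERT does flush, and that the catchup target comes from the flush LSN.

```python
from dataclasses import dataclass


@dataclass
class ToyPrimary:
    """Hypothetical stand-in for a primary's WAL positions."""
    insert_lsn: int = 0  # end of WAL generated so far
    flush_lsn: int = 0   # end of WAL flushed to disk so far

    def vacuum(self, wal_bytes: int = 100) -> int:
        # VACUUM emits WAL but assigns no xid, so in this model nothing
        # forces a flush when it finishes.
        self.insert_lsn += wal_bytes
        return self.insert_lsn

    def insert_and_commit(self, wal_bytes: int = 10) -> int:
        # A committing transaction with an xid flushes WAL up to its commit
        # record, which also covers everything written before it.
        self.insert_lsn += wal_bytes
        self.flush_lsn = self.insert_lsn
        return self.insert_lsn

    def catchup_target(self) -> int:
        # Models the target wait_for_replay_catchup() takes from
        # $node->lsn('flush').
        return self.flush_lsn


def wait_covers_record(target_lsn: int, record_end_lsn: int) -> bool:
    """True if waiting for target_lsn implies the record was replayed."""
    return target_lsn >= record_end_lsn


# Without the workaround: the target can precede the end of VACUUM's WAL.
racy = ToyPrimary()
vacuum_end = racy.vacuum()
racy_ok = wait_covers_record(racy.catchup_target(), vacuum_end)

# With the workaround: an INSERT into a 'flush_wal'-style table after VACUUM.
fixed = ToyPrimary()
vacuum_end_fixed = fixed.vacuum()
fixed.insert_and_commit()
fixed_ok = wait_covers_record(fixed.catchup_target(), vacuum_end_fixed)

print(racy_ok, fixed_ok)  # False True
```

In the real tests the background flush usually happens soon enough, which is why the failure is only occasional; the model just shows the worst-case ordering.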

I think some of the patches might list more reviewers than really applicable,
and might also be missing some. I'd appreciate it if you could go over that...

Greetings,

Andres Freund

Attachment                                                       Content-Type  Size
va67-0001-Replace-a-replication-slot-s-invalidated_at-LSN.patch  text/x-diff   6.6 KB
va67-0002-Prevent-use-of-invalidated-logical-slot-in-Crea.patch  text/x-diff   4.1 KB
va67-0003-Support-invalidating-replication-slots-due-to-h.patch  text/x-diff   12.5 KB
va67-0004-Handle-logical-slot-conflicts-on-standby.patch         text/x-diff   12.2 KB
va67-0005-Arrange-for-a-new-pg_stat_database_conflicts-an.patch  text/x-diff   10.2 KB
va67-0006-For-cascading-replication-wake-physical-and-log.patch  text/x-diff   9.6 KB
va67-0007-Allow-logical-decoding-on-standby.patch                text/x-diff   16.4 KB
va67-0008-New-TAP-test-for-logical-decoding-on-standby.patch     text/x-diff   30.4 KB
va67-0008-TAP-test-for-logical-decoding-on-standby.patch         text/x-diff   30.3 KB
va67-0009-Doc-changes-describing-details-about-logical-de.patch  text/x-diff   2.6 KB
