Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed

From: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: Alexander Lakhin <exclusion(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, "Yu Shi (Fujitsu)" <shiy(dot)fnst(at)fujitsu(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed
Date: 2024-01-12 13:46:08
Message-ID: ZaFCoJ1FxBQ9jQkX@ip-10-97-1-34.eu-west-3.compute.internal
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Fri, Jan 12, 2024 at 02:00:01PM +0300, Alexander Lakhin wrote:
> Hi,
>
> 12.01.2024 10:15, Bertrand Drouvot wrote:
> >
> > For this one, the "good" news is that it looks like that we don’t see the
> > "terminating" message not followed by an "obsolete" message (so the engine
> > behaves correctly) anymore.
> >
> > There is simply nothing related to the row_removal_activeslot at all (the catalog_xmin
> > advanced and there is no conflict).
>
> Yes, judging from all the failures that we see now, it looks like the
> 0001-Fix-race-condition...patch works as expected.

Yeah, thanks for confirming, I'll create a dedicated hackers thread for that one.

> > Let's give v5 a try? (please apply v1-0001-Fix-race-condition-in-InvalidatePossiblyObsoleteS.patch
> > too).
>
> Unfortunately, I've got the failure again (please see logs attached).
> (_primary.log can confirm that I have used exactly v5 — I see no
> txid_current() calls there...)

Okay ;-( Thanks for the testing. Then I can think of:

1) Michael's proposal up-thread (means tweak the test with a retry logic, retrying
things if such a standby snapshot is found).

2) Don't report a test error for active slots in case its catalog_xmin advanced.

I'd vote for 2) as:

- this is a corner case and the vast majority of the animals don't report any
issues (means the active slot conflict detection is already well covered).

- even on the same animal it should be pretty rare to not have an active slot
conflict detection not covered at all (and the "failing" one would be probably
moving over time).

- It may be possible that 1) ends up failing (as we'd need to put a limit on the
retry logic anyhow).

What do you think?

And BTW, looking closely at wait_until_vacuum_can_remove(), I'm not sure it's
fully correct, so I'll give it another look.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Banck 2024-01-12 13:53:45 Re: plpgsql memory leaks
Previous Message Aleksander Alekseev 2024-01-12 13:35:52 Re: Improve the connection failure error messages