Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>
Cc: "Yu Shi (Fujitsu)" <shiy(dot)fnst(at)fujitsu(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed
Date: 2024-01-12 11:00:01
Message-ID: cc7925b8-30cc-c76d-b1b6-c9ec6bd36a03@gmail.com
Lists: pgsql-hackers

Hi,

12.01.2024 10:15, Bertrand Drouvot wrote:
>
> For this one, the "good" news is that it looks like we no longer see a
> "terminating" message that is not followed by an "obsolete" message (so the
> engine behaves correctly).
>
> There is simply nothing related to the row_removal_activeslot at all (the catalog_xmin
> advanced and there is no conflict).

Yes, judging from all the failures that we see now, it looks like the
0001-Fix-race-condition...patch works as expected.

> And I agree that this is due to the Standby/RUNNING_XACTS that is "advancing" the
> catalog_xmin of the active slot.
>
>> Standby/RUNNING_XACTS is exactly why 039_end_of_wal.pl uses wal_level
>> = minimal, because these lead to unpredictable records being inserted,
>> impacting the reliability of the tests. We cannot do that here,
>> obviously. That may be a long shot, but could it be possible to tweak
>> the test with a retry logic, retrying things if such a standby
>> snapshot is found because we know that the invalidation is not going
>> to work anyway?
> I think it all depends on what the xl_running_xacts record contains (that is,
> whether or not it "advances" the catalog_xmin in our case).
>
> In our case it does advance it (should it occur) due to the "select txid_current()"
> that is done in wait_until_vacuum_can_remove() in 035_standby_logical_decoding.pl.
>
> I suggest making use of txid_current_snapshot() instead (which does not produce
> a Transaction/COMMIT WAL record, as opposed to txid_current()).
>
> I think that it could be "enough" for our case here, and it's what v5 attached is
> now doing.
>
> Let's give v5 a try? (please apply v1-0001-Fix-race-condition-in-InvalidatePossiblyObsoleteS.patch
> too).

Unfortunately, I've got the failure again (please see logs attached).
(_primary.log can confirm that I have used exactly v5 — I see no
txid_current() calls there...)
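
For anyone who wants to repeat that check on their own logs, here is a
minimal sketch of one way to do it. The log excerpt and the helper name are
hypothetical, made up for illustration, and it assumes statements are logged
as plain text. The pattern matches txid_current( but deliberately not
txid_current_snapshot(.

```python
import re

# Matches "txid_current(" but not "txid_current_snapshot(".
TXID_CURRENT_CALL = re.compile(r"\btxid_current\s*\(")


def find_txid_current_calls(log_text):
    """Return (line_number, line) pairs for lines that call txid_current()."""
    return [
        (number, line)
        for number, line in enumerate(log_text.splitlines(), start=1)
        if TXID_CURRENT_CALL.search(line)
    ]


# Hypothetical excerpt, not taken from the attached logs.
sample_log = """\
LOG:  statement: SELECT txid_current_snapshot();
LOG:  statement: VACUUM pg_authid;
"""

hits = find_txid_current_calls(sample_log)
print(hits)
```

An empty result means no txid_current() calls were logged, which is what I
would expect with v5 applied.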

Best regards,
Alexander

Attachment Content-Type Size
035-failures-vacuum-pg_authid.tar.gz application/gzip 150.7 KB
