| From: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com> | 
|---|---|
| To: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> | 
| Cc: | 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org> | 
| Subject: | Re: Fix 035_standby_logical_decoding.pl race conditions | 
| Date: | 2025-04-03 06:59:44 | 
| Message-ID: | Z+4x4HM9ZWlD0D8P@ip-10-97-1-34.eu-west-3.compute.internal | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
Hi,
On Thu, Apr 03, 2025 at 05:34:10AM +0000, Hayato Kuroda (Fujitsu) wrote:
> Dear Bertrand, Amit,
> 
> > > I do prefer v5-PG17-2 as it is "closer" to HEAD. That said, I think that we should
> > > keep the slots active and only avoid doing the checks for them (they are
> > invalidated
> > > that's fine, they are not that's fine too).
> > >
> > 
> > I don't mind doing that, but there is no benefit in making slots
> > active unless we can validate them. And we will end up adding some
> > more checks, as in function check_slots_conflict_reason without any
> > advantage.
I think that there is advantage. The pros are:
- the test would be closer to HEAD from a behavioural point of view
- it's very rare to hit the corner cases: so the test would behave the same
as on HEAD most of the time (and when it does not that would not hurt as the
checks are nor done)
- Kuroda-San's patch removes "or die "replication slot stats of vacuum_full_activeslot not updated"
while keeping the slot active is able to keep it (should the slot being invalidated
or not). But more on that in the comment === 1 below.
> I feel Kuroda-San's second patch is simple, and we have
> > fewer chances to make mistakes and easy to maintain in the future as
> > well.
Yeah maybe but the price to pay is to discard the pros above. That said, I'm also
fine with Kuroda-San's patch if both of you feel that it's better.
> I have concerns for Bertrand's patch that it could introduce another timing
> issue. E.g., if the activated slots are not invalidated, dropping slots is keep
> being activated so the dropping might be fail.
Yeah, but the drop is done with "$node_standby->psql" so that the test does not
produce an error. It would produce an error should we use "$node_standby->safe_psql"
instead.
> I did not reproduce this but
> something like this can happen if we activate slots.
You can see it that way (+ reproducer.txt):
"
+       my $bdt = $node_standby->safe_psql('postgres', qq[SELECT * from pg_replication_slots]);
+       note "BDT: $bdt";
+
        $node_standby->psql('postgres',
                qq[SELECT pg_drop_replication_slot('$inactive_slot')]);
"
You'd see the slot being active and the "$node_standby->psql" not reporting
any error.
> Attached patch has a conclusion of these discussions, slots are created but
> it seldomly be activated.
Thanks for the patch!
=== 1
-$node_standby->poll_query_until('testdb',
-       qq[SELECT total_txns > 0 FROM pg_stat_replication_slots WHERE slot_name = 'vacuum_full_activeslot']
-) or die "replication slot stats of vacuum_full_activeslot not updated";
-
 # This should trigger the conflict
 wait_until_vacuum_can_remove(
I wonder if we could not keep this test and make the slot active for the
vacuum full case. Looking at drongo's failure in [1], there is no occurence
of "vacuum full" and that's probably linked to Andres's explanation in [2]:
"
a VACUUM FULL on pg_class is
used, which prevents logical decoding from progressing after it started (due
to the logged AEL at the start of VACFULL).
"
meaning that the active slot is invalidated even if the catalog xmin's moves
forward due to xl_running_xacts.
[1]: https://www.postgresql.org/message-id/386386.1737736935%40sss.pgh.pa.us
[2]: https://www.postgresql.org/message-id/zqypkuvtihtd2zbmwdfmcceujg4fuakrhojmjkxpp7t4udqkty%40couhenc7dsor
Regards,
-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Jakub Wartak | 2025-04-03 07:01:43 | Re: Draft for basic NUMA observability | 
| Previous Message | Hayato Kuroda (Fujitsu) | 2025-04-03 06:57:00 | RE: pg_recvlogical cannot create slots with failover=true |