| From: | Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com> |
|---|---|
| To: | fabriziomello(at)gmail(dot)com |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: [PATCH] pg_surgery: Fix WAL corruption from concurrent heap_force_kill |
| Date: | 2026-05-05 20:42:43 |
| Message-ID: | CAN4CZFMT=MUfD_14vX+0Goaymn7_zx0_EkZz2pJFVTeeO5faaw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hello!
I verified both the patch, and that the test case catches the bug
without the patch, the overall change seems good to me.
+ if (did_modify_vm)
+ INJECTION_POINT_CACHED("heap-force-kill-before-vm-wal", NULL);
+
if (did_modify_vm && RelationNeedsWAL(rel))
+ {
log_newpage_buffer(vmbuf, false);
+ LockBuffer(vmbuf, BUFFER_LOCK_UNLOCK);
+ }
Is the additional if intentional here? Based on the name it seems like
it could be simply part of the next if, currently it also fires for
unlogged tables.
+# Give session 2 time to reach the VM buffer lock (or complete if
+# unfixed). We cannot reliably detect blocking from Perl, so just
+# sleep briefly.
+use Time::HiRes qw(usleep);
+usleep(500_000);
Wouldn't this be possibly unstable on CI?
The following seems to work for me, both for detecting the issue in
the unpatched version and to result in a quicker continuation in the
patched version: (~1.55sec total execution time compared to ~2sec
original)
(with the patch, s2 wait for the loc, without the patch it finishes
and becomes idle)
my $s2 = $node->background_psql('postgres');
my $s2_pid = $s2->query_safe(q{SELECT pg_backend_pid()});
chomp $s2_pid;
$s2->query_until(qr/starting_s2/,
q(\echo starting_s2
SELECT heap_force_kill('test_vm'::regclass, ARRAY['(1,1)']::tid[]);
\echo s2_done
));
use Time::HiRes qw(usleep);
my $observed = '';
my $deadline = time() + 10;
while (time() < $deadline) {
$observed = $node->safe_psql('postgres', qq{
SELECT format('%s/%s/%s',
coalesce(wait_event_type, 'NULL'),
coalesce(wait_event, 'NULL'),
coalesce(state, 'NULL'))
FROM pg_stat_activity WHERE pid = $s2_pid
});
last if $observed =~ m{^Buffer/BufferExclusive/}
|| $observed =~ m{/idle$};
usleep(50_000);
}
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Zsolt Parragi | 2026-05-05 20:53:20 | Re: Bug in ALTER SUBSCRIPTION ... SERVER / ... CONNECTION with broken old server |
| Previous Message | Tom Lane | 2026-05-05 20:32:41 | Re: [BUG] CRASH: ECPGprepared_statement() and ECPGdeallocate_all() when connection is NULL |