Re: Test of a partition with an incomplete detach has a timing issue

From: Noah Misch <noah(at)leadboat(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, "'amitlangote09(at)gmail(dot)com'" <amitlangote09(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Test of a partition with an incomplete detach has a timing issue
Date: 2021-05-25 03:56:42
Message-ID: 20210525035642.GA3804869@rfd.leadboat.com
Lists: pgsql-hackers

On Mon, May 24, 2021 at 09:12:40PM -0400, Tom Lane wrote:
> The experiments I did awhile ago are coming back to me now. I tried
> a number of variations on this same theme, and none of them closed
> the gap entirely. The fundamental problem is that it's possible
> for backend A to complete its transaction, and for backend B (which
> is the isolationtester's monitoring session) to observe that A has
> completed its transaction, and for B to report that fact to the
> isolationtester, and for that report to arrive at the isolationtester
> *before A's query result does*. You need some bad luck for that
> to happen, like A losing the CPU right before it flushes its output
> buffer to the client, but I was able to demonstrate it fairly
> repeatably.

> So a completely bulletproof interlock seems out of reach.

What if we had a standard that the step after the cancel shall send a query to
the backend that just received the cancel? Something like:

--- a/src/test/isolation/specs/detach-partition-concurrently-3.spec
+++ b/src/test/isolation/specs/detach-partition-concurrently-3.spec
@@ -34,16 +34,18 @@ step "s1describe" { SELECT 'd3_listp' AS root, * FROM pg_partition_tree('d3_list
 session "s2"
 step "s2begin" { BEGIN; }
 step "s2snitch" { INSERT INTO d3_pid SELECT pg_backend_pid(); }
 step "s2detach" { ALTER TABLE d3_listp DETACH PARTITION d3_listp1 CONCURRENTLY; }
+step "s2noop" { UNLISTEN noop; }
+# TODO follow every instance of s1cancel w/ s2noop
 step "s2detach2" { ALTER TABLE d3_listp DETACH PARTITION d3_listp2 CONCURRENTLY; }
 step "s2detachfinal" { ALTER TABLE d3_listp DETACH PARTITION d3_listp1 FINALIZE; }
 step "s2drop" { DROP TABLE d3_listp1; }
 step "s2commit" { COMMIT; }

 # Try various things while the partition is in "being detached" state, with
 # no session waiting.
-permutation "s2snitch" "s1b" "s1s" "s2detach" "s1cancel" "s1c" "s1describe" "s1alter"
+permutation "s2snitch" "s1b" "s1s" "s2detach" "s1cancel" "s2noop" "s1c" "s1describe" "s1alter"
 permutation "s2snitch" "s1b" "s1s" "s2detach" "s1cancel" "s1insert" "s1c"
 permutation "s2snitch" "s1brr" "s1s" "s2detach" "s1cancel" "s1insert" "s1c" "s1spart"
 permutation "s2snitch" "s1b" "s1s" "s2detach" "s1cancel" "s1c" "s1insertpart"
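
To illustrate why a follow-up query to the canceled backend might pin down the output order, here is a toy Python model. It is only a sketch and not isolationtester's actual logic: the Connection class, the run() function, and the result strings are all hypothetical. It assumes only that results on a single connection reach the client in the order the backend produced them, and that s2 can lose the CPU between producing the cancel error and flushing it.

```python
from collections import deque


class Connection:
    """Toy model of one client/backend connection (hypothetical, for illustration).

    Results are delivered to the client strictly in the order produced.
    """

    def __init__(self):
        self.server_buffer = deque()  # produced by the backend, not yet flushed
        self.client_inbox = deque()   # already visible to the test driver

    def produce(self, result):
        self.server_buffer.append(result)

    def flush(self):
        while self.server_buffer:
            self.client_inbox.append(self.server_buffer.popleft())


def run(with_noop):
    """Return the order in which the driver observes step results."""
    s1, s2 = Connection(), Connection()
    observed = []

    # s1cancel cancels the blocked s2detach. Backend s2 produces the error
    # but (bad luck) loses the CPU before flushing it to the client.
    s2.produce("s2detach: ERROR (canceled)")

    # s1cancel itself completes and its result reaches the driver.
    s1.produce("s1cancel: t")
    s1.flush()
    observed.append(s1.client_inbox.popleft())

    if with_noop:
        # The driver sends s2noop to s2 and waits for that result. Because the
        # connection is ordered, the pending s2detach error must arrive first.
        s2.produce("s2noop: ok")
        s2.flush()
        while True:
            result = s2.client_inbox.popleft()
            observed.append(result)
            if result.startswith("s2noop"):
                break

    # The next s1 step runs and reports.
    s1.produce("s1c: ok")
    s1.flush()
    observed.append(s1.client_inbox.popleft())

    # Whatever s2 still had buffered shows up late.
    s2.flush()
    while s2.client_inbox:
        observed.append(s2.client_inbox.popleft())
    return observed


if __name__ == "__main__":
    print(run(with_noop=False))
    print(run(with_noop=True))
```

In this model, without the no-op the cancel error lands wherever s2 happens to flush (here, after s1c). With the no-op, the error is always observed before s2noop and therefore before s1c. Whether the real tester behaves this way depends on how it waits for step results, which the model does not capture.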
