RE: Test of a partition with an incomplete detach has a timing issue

From: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>
To: 'Alvaro Herrera' <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: "'amitlangote09(at)gmail(dot)com'" <amitlangote09(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Test of a partition with an incomplete detach has a timing issue
Date: 2021-05-25 00:42:34
Message-ID: OSBPR01MB4888036C48E999EA8EA44B1EED259@OSBPR01MB4888.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

On Tuesday, May 25, 2021 3:07 AM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> On 2021-May-24, osumi(dot)takamichi(at)fujitsu(dot)com wrote:
>
> > Also, I've gotten some logs left.
> > * src/test/isolation/output_iso/regression.out
> >
> > test detach-partition-concurrently-1 ... ok 682 ms
> > test detach-partition-concurrently-2 ... ok 321 ms
> > test detach-partition-concurrently-3 ... FAILED 1084 ms
> > test detach-partition-concurrently-4 ... ok 1078 ms
> > test fk-contention ... ok 77 ms
> >
> > * src/test/isolation/output_iso/regression.diffs
> >
> > diff -U3 /(where/I/put/PG)/src/test/isolation/expected/detach-partition-concurrently-3.out /(where/I/put/PG)/src/test/isolation/output_iso/results/detach-partition-concurrently-3.out
> > --- /(where/I/put/PG)/src/test/isolation/expected/detach-partition-concurrently-3.out 2021-05-24 03:30:15.735488295 +0000
> > +++ /(where/I/put/PG)/src/test/isolation/output_iso/results/detach-partition-concurrently-3.out 2021-05-24 04:46:48.851488295 +0000
> > @@ -12,9 +12,9 @@
> > pg_cancel_backend
> >
> > t
> > -step s2detach: <... completed>
> > -error in steps s1cancel s2detach: ERROR: canceling statement due to user request
> > step s1c: COMMIT;
> > +step s2detach: <... completed>
> > +error in steps s1c s2detach: ERROR: canceling statement due to user request
>
> Uh, how annoying. If I understand correctly, I agree that this is a timing issue:
> sometimes it is fast enough that the cancel is reported together with its own
> step, but other times it takes longer so it is reported with the next command of
> that session instead, s1c (commit).
>
> I suppose a fix would imply that the error report waits until after the "cancel"
> step is over, but I'm not sure how to do that.
>
> Maybe we can change the "cancel" query to something like
>
> SELECT pg_cancel_backend(pid), somehow_wait_for_detach_to_terminate()
> FROM d3_pid;
>
> ... where maybe that function can check the "state" column in s3's
> pg_stat_activity row? I'll give that a try.
Thank you so much for addressing this issue.
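
For reference, the polling idea above could be sketched roughly as follows.
This is only an illustration under stated assumptions: the function name
wait_for_backend_idle, its max_wait parameter, and the 10 ms polling interval
are hypothetical and not taken from the tree. It relies only on the "state"
column of pg_stat_activity, pg_stat_clear_snapshot() and pg_sleep().

```sql
-- Hypothetical helper (name, timeout and polling interval are assumptions).
-- Returns true once the target backend is no longer running a statement,
-- or false if max_wait elapses first.
CREATE FUNCTION wait_for_backend_idle(target_pid integer,
                                      max_wait interval DEFAULT '10 seconds')
RETURNS boolean
LANGUAGE plpgsql
AS $$
DECLARE
    deadline timestamptz := clock_timestamp() + max_wait;
BEGIN
    LOOP
        -- Backend status is cached within a transaction, so discard the
        -- cached copy before each look at pg_stat_activity.
        PERFORM pg_stat_clear_snapshot();

        -- state = 'active' means the backend is still executing a query.
        PERFORM 1 FROM pg_stat_activity
         WHERE pid = target_pid AND state = 'active';
        IF NOT FOUND THEN
            RETURN true;
        END IF;

        IF clock_timestamp() > deadline THEN
            RETURN false;
        END IF;

        PERFORM pg_sleep(0.01);
    END LOOP;
END;
$$;

-- Possible use in the "cancel" step, assuming d3_pid holds the pid of the
-- session running the detach, as in the query quoted above.
SELECT pg_cancel_backend(pid), wait_for_backend_idle(pid) FROM d3_pid;
```

Note that this depends on pg_cancel_backend() being evaluated before the wait
function within the same select list, which I believe holds in practice but is
not something SQL guarantees, so the two calls might need to be separated.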

Best Regards,
Takamichi Osumi
