Re: Test of a partition with an incomplete detach has a timing issue

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, "'amitlangote09(at)gmail(dot)com'" <amitlangote09(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Test of a partition with an incomplete detach has a timing issue
Date: 2021-05-24 18:21:04
Message-ID: 1012540.1621880464@sss.pgh.pa.us
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> writes:
> On 2021-May-24, osumi(dot)takamichi(at)fujitsu(dot)com wrote:
>> t
>> -step s2detach: <... completed>
>> -error in steps s1cancel s2detach: ERROR: canceling statement due to user request
>> step s1c: COMMIT;
>> +step s2detach: <... completed>
>> +error in steps s1c s2detach: ERROR: canceling statement due to user request

> Uh, how annoying. If I understand correctly, I agree that this is a
> timing issue: sometimes it is fast enough that the cancel is reported
> together with its own step, but other times it takes longer so it is
> reported with the next command of that session instead, s1c (commit).

Yeah, we see such failures in the buildfarm with various isolation
tests; some recent examples:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-05-23%2019%3A43%3A04
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=anole&dt=2021-05-08%2006%3A34%3A13
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=anole&dt=2021-04-29%2009%3A43%3A04
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2021-04-22%2021%3A24%3A02
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2021-04-21%2010%3A38%3A32
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fossa&dt=2021-04-08%2019%3A36%3A06

I remember having tried to rewrite the isolation tester to eliminate
the race condition, without success (and I don't seem to have kept
my notes, which now I regret).

However, the existing hazards seem to hit rarely enough to not be
much of a problem. We might need to see if we can rejigger the
timing in this test to make it a little more stable.

regards, tom lane
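
To illustrate the timing issue described in the quoted text, here is a toy model in Python. It is only a sketch under stated assumptions, not the isolation tester's actual logic: the function names, the notion of fixed "check times", and the millisecond values are all hypothetical. The idea is that the cancel error for the blocked step (s2detach) is printed together with whichever later step the tester is handling when the error finally arrives.

```python
# Toy model (hypothetical, not isolationtester code): the blocked step's
# completion is reported with the first later step whose check time is
# at or after the moment the cancel error arrives.

def reported_with(steps, error_arrival_ms, check_times_ms):
    """Return the step that the blocked step's completion is reported with.

    steps            -- steps run after the blocked step was launched, in order
    error_arrival_ms -- assumed time at which the cancel error becomes visible
    check_times_ms   -- assumed time at which the tester checks after each step
    """
    for step, check_time in zip(steps, check_times_ms):
        if error_arrival_ms <= check_time:
            return step
    return None  # the error never arrived within the modelled steps


def error_line(step, blocked_step):
    """Format the report line the way the diff in this thread shows it."""
    return (f"error in steps {step} {blocked_step}: "
            "ERROR: canceling statement due to user request")


later_steps = ["s1cancel", "s1c"]
checks = [10, 20]  # made-up check times in milliseconds

# Fast machine: the error shows up before the s1cancel check.
print(error_line(reported_with(later_steps, 5, checks), "s2detach"))
# Slow machine: the error shows up only by the time s1c is handled.
print(error_line(reported_with(later_steps, 15, checks), "s2detach"))
```

The two print calls reproduce the two variants seen in the diff: the expected output attributes the error to s1cancel, while the slower run attributes it to s1c.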
