Re: Transaction timeout

From: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, 邱宇航 <iamqyh(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Andrew Borodin <amborodin86(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, pgsql-hackers mailing list <pgsql-hackers(at)postgresql(dot)org>, pgsql-hackers mailing list <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Transaction timeout
Date: 2024-02-18 19:16:02
Message-ID: A738D5A4-23BA-4AD6-A01E-EF34A5A98C57@yandex-team.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Alexander, thanks for pushing this! This is small but very awaited feature.

> On 16 Feb 2024, at 02:08, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Isn't this test going to be very fragile on busy / slow machines? What if the
> pg_sleep() takes one second, because there were other tasks to schedule? I'd
> be surprised if this didn't fail under valgrind, for example.

Even more robust tests that were bullet-proof in CI previously exhibited some failures on buildfarm. Currently there are 5 failures through this weekend.
Failing tests are testing interaction of idle_in_transaction_session_timeout vs transaction_timeout(5), and rescheduling transaction_timeout(6).
Symptoms:

[0] transaction timeout occurs when it is being scheduled. Seems like SET was running to long.
step s6_begin: BEGIN ISOLATION LEVEL READ COMMITTED;
step s6_tt: SET statement_timeout = '1s'; SET transaction_timeout = '10ms';
+s6: FATAL: terminating connection due to transaction timeout
step checker_sleep: SELECT pg_sleep(0.1);

[1] transaction timeout 10ms is not detected after 1s
step s6_check: SELECT count(*) FROM pg_stat_activity WHERE application_name = 'isolation/timeouts/s6';
count
-----
- 0
+ 1

[2] transaction timeout is not detected in both session 5 and session 6.

So far not signle animal reported failures twice, so it's hard to say anything about frequency. But it seems to be significant source of failures.

So far I have these ideas:

1. Remove test sessions 5 and 6. But it seems a little strange that session 3 did not fail at all (it is testing interaction of statement_timeout and transaction_timeout). This test is very similar to test sessiont 5...
2. Increase wait times.
step checker_sleep { SELECT pg_sleep(0.1); }
Seems not enough to observe backend timed out from pg_stat_activity. But this won't help from [0].
3. Reuse waiting INJECTION_POINT from [3] to make timeout tests deterministic and safe from race conditions. With waiting injection points we can wait as much as needed in current environment.

Any advices are welcome.

Best regards, Andrey Borodin.

[0] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tamandua&dt=2024-02-16%2020%3A06%3A51
[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kestrel&dt=2024-02-16%2001%3A45%3A10
[2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2024-02-17%2001%3A55%3A45
[3] https://www.postgresql.org/message-id/0925F9A9-4D53-4B27-A87E-3D83A757B0E0@yandex-team.ru

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2024-02-18 19:44:02 Re: Running the fdw test from the terminal crashes into the core-dump
Previous Message Tom Lane 2024-02-18 19:13:20 Re: Running the fdw test from the terminal crashes into the core-dump