Re: undetected deadlock in ALTER SUBSCRIPTION ... REFRESH PUBLICATION

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: undetected deadlock in ALTER SUBSCRIPTION ... REFRESH PUBLICATION
Date: 2023-12-04 12:00:27
Message-ID: c4029baf-693d-4bb5-7c57-5bfcdc5572ff@enterprisedb.com
Lists: pgsql-hackers

On 12/4/23 12:37, Amit Kapila wrote:
> On Sat, Dec 2, 2023 at 9:52 PM Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com> wrote:
>>
>>> thread. I think you can compare the timing of regression tests in
>>> subscription, with and without the patch to show there is no
>>> regression. And probably some tests with a large number of tables for
>>> sync with very little data.
>>
>> I have tested the regression test timings for subscription with and
>> without patch. I also did the timing test for sync of subscription
>> with the publisher for 100 and 1000 tables respectively.
>> I have attached the test script and results of the timing test are as follows:
>>
>> Time taken for test to run in Linux VM
>>
>> Summary               | Subscription Test (sec) | 100 tables in pub and Sub (sec) | 1000 tables in pub and Sub (sec)
>> Without patch Release | 95.564                  | 7.877                           | 58.919
>> With patch Release    | 96.513                  | 6.533                           | 45.807
>>
>> Time taken for test to run in another Linux VM
>>
>> Summary               | Subscription Test (sec) | 100 tables in pub and Sub (sec) | 1000 tables in pub and Sub (sec)
>> Without patch Release | 109.8145                | 6.4675                          | 83.001
>> With patch Release    | 113.162                 | 7.947                           | 87.113
>>
>
> So, on some machines, it may increase the test timing although not too
> much. I think the reason is probably doing the work in multiple
> transactions for multiple relations. I am wondering that instead of
> committing and starting a new transaction before
> wait_for_relation_state_change(), what if we do it inside that
> function just before we decide to wait? It is quite possible that in
> many cases we don't need any wait at all.
>

I'm not sure what you mean by "do it". What should the function do?

As for the test results, I very much doubt the differences are caused by
anything other than random timing variations, or something like that. And I
don't understand what "Performance Machine Linux" is, considering those
timings are slower than the other two machines.
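
To put rough numbers on that, here is a minimal sketch (Python, not part of
the patch or of the attached test script) that computes the relative change
between the "without patch" and "with patch" timings quoted above. The
labels "VM 1" and "VM 2" and the function names are made up for
illustration; only the timings themselves come from the quoted results.

```python
# Timings in seconds, copied from the quoted results.
# Each entry is (without patch, with patch).
# "VM 1" / "VM 2" are illustrative labels for the two Linux VMs.
TIMINGS = {
    "VM 1": {
        "subscription test": (95.564, 96.513),
        "100 tables": (7.877, 6.533),
        "1000 tables": (58.919, 45.807),
    },
    "VM 2": {
        "subscription test": (109.8145, 113.162),
        "100 tables": (6.4675, 7.947),
        "1000 tables": (83.001, 87.113),
    },
}


def relative_change(without_patch, with_patch):
    """Percentage change of the patched run relative to the unpatched run.

    Positive means the patched run was slower, negative means faster.
    """
    return (with_patch - without_patch) / without_patch * 100.0


def summarize(timings):
    """Return {machine: {test: percent change}} for all quoted timings."""
    return {
        machine: {
            test: relative_change(before, after)
            for test, (before, after) in tests.items()
        }
        for machine, tests in timings.items()
    }


if __name__ == "__main__":
    for machine, tests in summarize(TIMINGS).items():
        for test, pct in tests.items():
            print(f"{machine:5s} {test:18s} {pct:+.1f}%")
```

For both table-sync cases the sign of the change flips between the two VMs
(faster with the patch on one, slower on the other), which looks more like
noise than a systematic slowdown. With a single run per configuration there
is no way to estimate the variance, so this is only a sanity check.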

Also, even if it was a bit slower, does it really matter? I mean, the
current code is wrong, and can hang indefinitely if it happens to hit
the deadlock. And it's a one-time action, I don't think it's very
sensitive in terms of performance.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
