RE: suboverflowed subtransactions concurrency performance optimize

From: "Pengchengliu" <pengchengliu(at)tju(dot)edu(dot)cn>
To: "'Andrey Borodin'" <x4mmm(at)yandex-team(dot)ru>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: suboverflowed subtransactions concurrency performance optimize
Date: 2021-08-31 06:43:02
Message-ID: 000d01d79e33$7270ba30$57522e90$@tju.edu.cn
Lists: pgsql-hackers

Hi Andrey,
Thanks a lot for your reply and the references.

The default NUM_SUBTRANS_BUFFERS is 32. In my implementation, local_cache_subtrans_pages can be adjusted dynamically.
If we set local_cache_subtrans_pages to 64, each backend uses only an extra 64*8192 bytes = 512KB of memory.
So the local cache acts as the first-level cache, and the subtrans SLRU as the second-level cache.
I think the extra memory is well worth it: it really resolves the massive subtrans stuck issue that I described in my previous email.
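
A minimal sketch of the two-level lookup described above, assuming a direct-mapped per-backend cache. The slot layout, the function names, and the toy shared-page reader are illustrative assumptions and are not taken from the patch. Invalidation of stale local copies is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define BLCKSZ            8192
#define XACTS_PER_PAGE    (BLCKSZ / sizeof(TransactionId))  /* 2048 entries */
#define LOCAL_CACHE_PAGES 64    /* stands in for local_cache_subtrans_pages */

/* One locally cached copy of a subtrans page (hypothetical layout). */
typedef struct
{
    bool          valid;
    uint32_t      pageno;
    TransactionId parent[XACTS_PER_PAGE];
} LocalPage;

static LocalPage local_cache[LOCAL_CACHE_PAGES];
static int       shared_page_reads = 0;  /* second-level accesses */

/*
 * Hypothetical stand-in for reading a page from the shared subtrans SLRU.
 * Toy data: every xid that is a multiple of 128 is top-level (parent 0);
 * any other xid has xid - 1 as its parent.
 */
static void
shared_read_page(uint32_t pageno, TransactionId *dst)
{
    shared_page_reads++;
    for (uint32_t i = 0; i < XACTS_PER_PAGE; i++)
    {
        TransactionId xid = (TransactionId) (pageno * XACTS_PER_PAGE + i);

        dst[i] = (xid % 128 == 0) ? 0 : xid - 1;
    }
}

/* First-level lookup; falls back to the shared page only on a miss. */
static TransactionId
local_get_parent(TransactionId xid)
{
    uint32_t   pageno = (uint32_t) (xid / XACTS_PER_PAGE);
    uint32_t   entry = (uint32_t) (xid % XACTS_PER_PAGE);
    LocalPage *slot = &local_cache[pageno % LOCAL_CACHE_PAGES];

    assert(xid != 0);           /* 0 is InvalidTransactionId */

    if (!slot->valid || slot->pageno != pageno)
    {
        /* Miss: copy the whole page so later lookups stay local. */
        shared_read_page(pageno, slot->parent);
        slot->pageno = pageno;
        slot->valid = true;
    }
    return slot->parent[entry];
}

/* Page payload held by the local cache, in bytes. */
static size_t
local_cache_payload_bytes(void)
{
    return (size_t) LOCAL_CACHE_PAGES * BLCKSZ;
}
```

In this sketch, repeated lookups for xids on the same page touch the shared structure only once; a real implementation would also have to decide when a local copy may be trusted.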

I have looked at the patch in [0] before. Adding GUC configuration parameters for the SLRU buffers is very nice.
But for subtrans, I think that optimization is not enough. In SubTransGetTopmostTransaction, we should acquire SubtransSLRULock once and then call SubTransGetParent in the loop.
This avoids acquiring and releasing SubtransSLRULock on every iteration of the SubTransGetTopmostTransaction -> SubTransGetParent loop.
Even after applying this patch, with SubTransGetTopmostTransaction optimized as described, my test case still gets stuck.
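
A toy sketch of the locking change described above, comparing one lock acquisition per parent lookup with a single acquisition around the whole walk. The lock is simulated by counters, and all names (toy_parent, toy_lock_acquire, topmost_lock_once, and so on) are hypothetical; this is not PostgreSQL's SLRU code, and details such as stopping at TransactionXmin or releasing the lock for page I/O are omitted.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId ((TransactionId) 0)
#define TOY_NXIDS 1024

static TransactionId toy_parent[TOY_NXIDS];  /* toy parent table */
static int lock_acquisitions = 0;
static int lock_depth = 0;

/* Simulated lock: only counts acquisitions and checks correct nesting. */
static void
toy_lock_acquire(void)
{
    assert(lock_depth == 0);
    lock_depth++;
    lock_acquisitions++;
}

static void
toy_lock_release(void)
{
    assert(lock_depth == 1);
    lock_depth--;
}

/* Build top <- top+1 <- ... <- top+nsub; returns the deepest xid. */
static TransactionId
toy_fill_chain(TransactionId top, int nsub)
{
    assert(top > 0 && top + (TransactionId) nsub < TOY_NXIDS);
    toy_parent[top] = InvalidTransactionId;
    for (int i = 1; i <= nsub; i++)
        toy_parent[top + i] = top + i - 1;
    return top + (TransactionId) nsub;
}

/* Parent lookup; the caller must already hold the lock. */
static TransactionId
get_parent_locked(TransactionId xid)
{
    assert(lock_depth == 1);
    return toy_parent[xid];
}

/* Variant A: acquire and release the lock for every parent lookup. */
static TransactionId
topmost_lock_per_step(TransactionId xid)
{
    TransactionId prev = xid;

    while (xid != InvalidTransactionId)
    {
        prev = xid;
        toy_lock_acquire();
        xid = get_parent_locked(xid);
        toy_lock_release();
    }
    return prev;
}

/* Variant B: acquire the lock once, then walk the whole chain. */
static TransactionId
topmost_lock_once(TransactionId xid)
{
    TransactionId prev = xid;

    toy_lock_acquire();
    while (xid != InvalidTransactionId)
    {
        prev = xid;
        xid = get_parent_locked(xid);
    }
    toy_lock_release();
    return prev;
}

/* Helper: number of lock acquisitions one walk needs. */
static int
count_lock_acquisitions(TransactionId (*walk) (TransactionId),
                        TransactionId xid)
{
    lock_acquisitions = 0;
    (void) walk(xid);
    return lock_acquisitions;
}
```

For a chain of 128 subtransactions under one top-level transaction, variant A takes the lock once per lookup while variant B takes it once in total. As noted above, this alone did not remove the stuck behaviour in the test.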

About the solution in [1]: actually, we first tried to use the buffer manager to replace the SLRU for subtrans as well, and we have implemented it.
With the test case I mentioned in my previous mail, it still got stuck. By default there are 2048 subtrans entries in one page.
When several processes look up top-level transactions on the same page, they have to pin/unpin and lock/unlock that page repeatedly.
I found that some backends were stuck at pinning/unpinning the page.

Comparison of test results, pgbench with subtrans_128.sql:

Concurrency   PG master   PG master with patch [0]   Local cache optimization
300           stuck       stuck                      not stuck
500           stuck       stuck                      not stuck
1000          stuck       stuck                      not stuck

Maybe we can test the different approaches with my test case. An approach that does not get stuck under massive concurrency would be a good solution.

[0] https://commitfest.postgresql.org/34/2627/
[1] https://www.postgresql.org/message-id/flat/20180814213500.GA74618%4060f81dc409fc.ant.amazon.com

Thanks
Pengcheng

-----Original Message-----
From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Sent: August 30, 2021 18:25
To: Pengchengliu <pengchengliu(at)tju(dot)edu(dot)cn>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: suboverflowed subtransactions concurrency performance optimize

Hi Pengcheng!

You are solving an important problem, thank you!

> On 30 Aug 2021, at 13:43, Pengchengliu <pengchengliu(at)tju(dot)edu(dot)cn> wrote:
>
> To resolve this performance problem, we came up with a solution that
> caches SubtransSLRU pages in a local cache.
> First we query the parent transaction id from SubtransSLRU, and copy
> the SLRU page to a local cache page.
> After that, if we need to query the parent transaction id again, we can
> get it from the local cache directly.

A copy of the SLRU in each backend's cache can consume a lot of memory. Why create a copy if we can optimise the shared representation of the SLRU?

JFYI, there is a related patch to make SimpleLruReadPage_ReadOnly() faster for bigger SLRU buffers [0].
Also, Nik Samokhvalov recently published an interesting investigation on the topic, but for some reason his message did not pass moderation. [1]

Also it's important to note that there was a community request to move SLRUs to shared_buffers [2].

Thanks!

Best regards, Andrey Borodin.

[0] https://commitfest.postgresql.org/34/2627/
[1] https://www.postgresql.org/message-id/flat/BE73A0BB-5929-40F4-BAF8-55323DE39561%40yandex-team.ru
[2] https://www.postgresql.org/message-id/flat/20180814213500.GA74618%4060f81dc409fc.ant.amazon.com
