On 31 Aug 2021, at 11:43, Pengchengliu <pengchengliu@tju.edu.cn> wrote:

Hi Andrey,
Thanks a lot for your reply and the reference information.

The default NUM_SUBTRANS_BUFFERS is 32. In my implementation, local_cache_subtrans_pages can be adjusted dynamically.
If we configure local_cache_subtrans_pages as 64, every backend uses only an extra 64*8192 = 512KB of memory.
So the local cache acts as a first-level cache, and the subtrans SLRU is the second-level cache.
I think the extra memory is well worth it: it really resolves the massive subtrans stuck issue which I mentioned in my previous email.
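Just to make the idea concrete, here is a minimal sketch of what such a backend-local first-level cache could look like. Apart from the local_cache_subtrans_pages GUC and the 8192-byte page size mentioned above, every name and structure below is only illustrative and not taken from the actual patch:

    #include <stdlib.h>

    #define SUBTRANS_PAGE_SIZE 8192

    /* Hypothetical backend-local copy of one subtrans page. */
    typedef struct LocalSubtransPage
    {
        int     pageno;                     /* subtrans page held here, -1 if empty */
        char    data[SUBTRANS_PAGE_SIZE];   /* local copy of the page contents */
    } LocalSubtransPage;

    static LocalSubtransPage *local_pages = NULL;
    static int  local_cache_subtrans_pages = 64;    /* GUC: 64 * 8192 = 512KB per backend */

    /*
     * Return the cached copy of a subtrans page, or NULL on a miss, in which
     * case the caller falls back to the shared SLRU (and would insert the page
     * here afterwards).  A hit touches no shared lock at all.
     */
    static char *
    LocalSubtransLookup(int pageno)
    {
        int     slot = pageno % local_cache_subtrans_pages;  /* direct-mapped for simplicity */

        if (local_pages == NULL)
        {
            local_pages = malloc(local_cache_subtrans_pages * sizeof(LocalSubtransPage));
            for (int i = 0; i < local_cache_subtrans_pages; i++)
                local_pages[i].pageno = -1;
        }

        if (local_pages[slot].pageno == pageno)
            return local_pages[slot].data;  /* first-level cache hit */

        return NULL;                        /* miss: go to the subtrans SLRU */
    }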

I have reviewed the patch in [0] before. Adding GUC configuration parameters for the SLRU buffers is very nice.
But I think its optimization is not enough for subtrans. In SubTransGetTopmostTransaction we should acquire SubtransSLRULock first and then call SubTransGetParent in the loop,
preventing the acquire/release of SubtransSLRULock on every SubTransGetParent call inside the SubTransGetTopmostTransaction loop.
Even after applying that patch together with my optimization of SubTransGetTopmostTransaction, my test case still gets stuck.
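For illustration, holding the lock across the whole walk in src/backend/access/transam/subtrans.c could look roughly like this. SubTransGetParentLocked() is a hypothetical helper that expects SubtransSLRULock to already be held, so this is only a sketch of the idea, not the actual change:

    /*
     * Sketch only: walk the parent chain under a single shared acquisition of
     * SubtransSLRULock instead of taking/releasing it inside every
     * SubTransGetParent() call.  SubTransGetParentLocked() is an assumed
     * helper; a real version would also need to handle pages that are not in
     * the SLRU and must be read from disk under an exclusive lock.
     */
    TransactionId
    SubTransGetTopmostTransaction(TransactionId xid)
    {
        TransactionId parentXid = xid;
        TransactionId previousXid = xid;

        LWLockAcquire(SubtransSLRULock, LW_SHARED);
        while (TransactionIdIsValid(parentXid))
        {
            previousXid = parentXid;
            if (TransactionIdPrecedes(parentXid, TransactionXmin))
                break;
            parentXid = SubTransGetParentLocked(parentXid);  /* lock already held */
        }
        LWLockRelease(SubtransSLRULock);

        Assert(TransactionIdIsValid(previousXid));
        return previousXid;
    }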

SubTransGetParent() acquires only a shared lock on SubtransSLRULock. The problem may arise only when someone reads a page from disk. But if you have a big enough cache, this will never happen. And this cache will be much smaller than 512KB * max_connections.

I think if we really want to get rid of the exclusive SubtransSLRULock, the best option would be to split the SLRU control lock into an array of locks, one for each bank (as in v17-0002-Divide-SLRU-buffers-into-n-associative-banks.patch). With this approach we would have to rename s/bank/partition/g for consistency with the lock and buffer partitions. I really liked having my own banks, but consistency is worth it anyway.
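To illustrate the idea (this is not code from v17-0002; bank count, bank size and all names below are made up), mapping a page to a bank and taking only that bank's lock could look roughly like this:

    /*
     * Sketch of per-bank (per-partition) SLRU locking: instead of one
     * SubtransSLRULock protecting all buffers, each bank gets its own LWLock,
     * so lookups of pages that land in different banks no longer contend.
     * The locks would be set up from a named tranche at shared memory init.
     */
    #define SLRU_NUM_BANKS  8           /* assumed number of banks/partitions */
    #define SLRU_BANK_SIZE  8           /* assumed buffers per bank */

    typedef struct SlruBank
    {
        LWLock *lock;                       /* protects only this bank's buffers */
        int     pageno[SLRU_BANK_SIZE];     /* which page each buffer holds */
        char   *page[SLRU_BANK_SIZE];       /* buffer contents */
    } SlruBank;

    static SlruBank banks[SLRU_NUM_BANKS];

    /* Look up a page while holding only the lock of the bank it maps to. */
    static char *
    SlruBankReadPage(int pageno)
    {
        SlruBank *bank = &banks[pageno % SLRU_NUM_BANKS];
        char     *result = NULL;

        LWLockAcquire(bank->lock, LW_SHARED);
        for (int slot = 0; slot < SLRU_BANK_SIZE; slot++)
        {
            if (bank->pageno[slot] == pageno)
            {
                result = bank->page[slot];
                break;
            }
        }
        LWLockRelease(bank->lock);

        /* On a miss the caller re-acquires this bank's lock exclusively and reads the page in. */
        return result;
    }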

Thanks!

Best regards, Andrey Borodin.