Re: Slow standby snapshot

From: Michail Nikolaev <michail(dot)nikolaev(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Slow standby snapshot
Date: 2021-08-09 21:45:17
Message-ID: CANtu0ojw4-vFQk4nscmGXYj2sPRNAuez39=1Oy1JEhO97Ekp9A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello, Andres.

Thanks for the feedback again.

> The problem is that we don't want to add a lot of work
> KnownAssignedXidsAdd/Remove, because very often nobody will build a snapshot
> for that moment and building a sorted, gap-free, linear array of xids isn't
> cheap. In my experience it's more common to be bottlenecked by replay CPU
> performance than on replica snapshot building.

Yes, but my patch adds almost the smallest possible amount for
KnownAssignedXidsAdd/Remove - a single write to the array by index.
It differs from the first version in this thread which is based on linked lists.
The "next valid offset" is just "optimistic optimization" - it means
"you could safely skip KnownAssignedXidsNext[i] while finding the next
valid".
But KnownAssignedXidsNext is not updated by Add/Remove.

> During recovery, where there's only one writer to the procarray / known xids,
> it might not be hard to opportunistically build a dense version of the known
> xids whenever a snapshot is built.

AFAIU the patch does exactly the same.
On the first snapshot building, offsets to the next valid entry are
set. So, a dense version is created on-demand.
And this version is reused (even partly if something was removed) on
the next snapshot building.

> I'm not entirely sure what datastructure would work best. I can see something
> like a radix tree work well, or a skiplist style approach. Or a hashtable:

We could try to use some other structure (for example - linked hash
map) - but the additional cost (memory management, links, hash
calculation) will probably significantly reduce performance.
And it is a much harder step to perform.

So, I think "next valid offset" optimization is a good trade-off for now:
* KnownAssignedXidsAdd/Remove are almost not affected in their complexity
* KnownAssignedXidsGetAndSetXmin is eliminated from the CPU top on
typical read scenario - so, more TPS, less ProcArrayLock contention
* it complements GetSnapshotData() scalability - now on standby
* changes are small

Thanks,
Michail.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Meskes 2021-08-09 21:48:07 Re: ECPG bug fix: DECALRE STATEMENT and DEALLOCATE, DESCRIBE
Previous Message Melanie Plageman 2021-08-09 21:27:18 Re: Asynchronous and "direct" IO support for PostgreSQL.