Re: Smoothing the subtrans performance catastrophe

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Smoothing the subtrans performance catastrophe
Date: 2022-08-03 11:55:42
Message-ID: CAFiTN-s2Jh2jD1khT6suO1Yszcn0rbSggGOCd9cGQ9cnBDencQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Aug 1, 2022 at 10:13 PM Simon Riggs
<simon(dot)riggs(at)enterprisedb(dot)com> wrote:
>
> "A mathematical catastrophe is a point in a model of an input-output
> system, where a vanishingly small change in the input can produce a
> large change in the output."
>
> We have just such a change in Postgres: when a snapshot overflows. In
> this case it takes only one subxid over the subxid cache limit to slow
> down every request in XidInMVCCSnapshot(), which becomes painful when
> a long running transaction exists at the same time. This situation has
> been noted by various bloggers, but is illustrated clearly in the
> attached diagram, generated by test results from Julien Tachoires.
>
> The reason for the slowdown is clear: when we overflow we check every
> xid against subtrans, producing a large stream of lookups. Some
> previous hackers have tried to speed up subtrans - this patch takes a
> different approach: remove as many subtrans lookups as possible. (So
> it is not competing with those other solutions.)
>
> Attached patch improves on the situation, as also shown in the attached diagram.
>
> The patch does these things:
>
> 1. Rework XidInMVCCSnapshot() so that it always checks the snapshot
> first, before attempting to look up subtrans. A related change means
> that we always keep full subxid info in the snapshot, even if one of
> the backends has overflowed.
>
> 2. Use binary search for standby snapshots, since the snapshot subxip
> is in sorted order.
>
> 3. Rework GetTopmostTransaction so that it a) checks xmin as it goes,
> b) only does one iteration on standby snapshots, both of which save
> subtrans lookups in appropriate cases.
> (This was newly added in v6)
>
> Now, is this a panacea? Not at all. What this patch does is smooth out
> the catastrophic effect so that a few overflowed subxids don't spoil
> everybody else's performance, but eventually, if many or all sessions
> have overflowed their subxid caches then the performance will descend
> as before, albeit that the attached patch has some additional
> optimizations (2, 3 above). So what this gives is a better flight
> envelope in case of a small number of occasional overflows.
>
> Please review. Thank you.

+1.
I had a quick look at the patch to understand the idea, and it looks
really promising to me.
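
As a rough illustration of points 1 and 2 in the quoted description, here
is a minimal sketch. It is not taken from the patch. ToySnapshot,
subxip_contains(), slow_subtrans_check() and toy_xid_in_snapshot() are
hypothetical names, the xid type ignores wraparound, and the slow path is
a placeholder counter rather than a real subtrans lookup. It only shows
the ordering: search the sorted subxid array first, and fall back to the
expensive lookup only when the xid is not found and the snapshot is
marked as overflowed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for TransactionId; xid wraparound is deliberately ignored. */
typedef uint32_t Xid;

/* Hypothetical, simplified snapshot; not PostgreSQL's SnapshotData. */
typedef struct
{
    const Xid  *subxip;         /* subtransaction xids, sorted ascending */
    size_t      subxcnt;        /* number of entries in subxip */
    bool        suboverflowed;  /* some backend overflowed its subxid cache */
} ToySnapshot;

/* Counts slow-path calls so the saving can be observed. */
static unsigned long slow_lookups = 0;

/*
 * Placeholder for the expensive subtrans lookup (an assumption of this
 * sketch). It only counts the call and answers "not in progress".
 */
static bool
slow_subtrans_check(Xid xid)
{
    (void) xid;
    slow_lookups++;
    return false;
}

/* Binary search over the sorted subxip array (point 2). */
static bool
subxip_contains(const ToySnapshot *snap, Xid xid)
{
    size_t      lo = 0;
    size_t      hi = snap->subxcnt;

    while (lo < hi)
    {
        size_t      mid = lo + (hi - lo) / 2;

        if (snap->subxip[mid] == xid)
            return true;
        if (snap->subxip[mid] < xid)
            lo = mid + 1;
        else
            hi = mid;
    }
    return false;
}

/*
 * Check the snapshot first (point 1). The slow path runs only when the
 * xid is absent from the snapshot and the snapshot has overflowed.
 */
static bool
toy_xid_in_snapshot(const ToySnapshot *snap, Xid xid)
{
    assert(snap != NULL);

    if (subxip_contains(snap, xid))
        return true;
    if (snap->suboverflowed)
        return slow_subtrans_check(xid);
    return false;
}
```

With an overflowed toy snapshot holding subxids {100, 105, 110, 120}, a
lookup of 105 is answered from the array and leaves slow_lookups at 0,
while a lookup of 107 misses the array and takes the slow path once.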

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
