Smoothing the subtrans performance catastrophe

From: Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Smoothing the subtrans performance catastrophe
Date: 2022-08-01 16:42:49
Message-ID: CANbhV-Fdnr5F0YyPAG88qhq9KhUtGMJ7ZGVKp1SD+YY12jVdPA@mail.gmail.com
Lists: pgsql-hackers

"A mathematical catastrophe is a point in a model of an input-output
system, where a vanishingly small change in the input can produce a
large change in the output."

We have just such a change in Postgres: when a snapshot overflows. In
this case it takes only one subxid over the subxid cache limit to slow
down every call to XidInMVCCSnapshot(), which becomes painful when
a long-running transaction exists at the same time. This situation has
been noted by various bloggers, but is illustrated clearly in the
attached diagram, generated from test results by Julien Tachoires.
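
To make the "one subxid over the limit" threshold concrete, here is a
toy model in C. It is a sketch only, not code from PostgreSQL or from
the patch. The type and function names (ToySubxidCache, toy_cache_add,
toy_snapshot_overflowed) are hypothetical. The limit of 64 mirrors
PGPROC_MAX_CACHED_SUBXIDS in the PostgreSQL source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors PGPROC_MAX_CACHED_SUBXIDS; everything else here is a toy. */
#define TOY_MAX_CACHED_SUBXIDS 64

/* Hypothetical per-backend cache of subtransaction ids. */
typedef struct ToySubxidCache {
    uint32_t xids[TOY_MAX_CACHED_SUBXIDS];
    int      nxids;
    bool     overflowed;   /* set once a subxid no longer fits */
} ToySubxidCache;

/* Record a new subxid; the first one past the limit flips the flag. */
static void toy_cache_add(ToySubxidCache *c, uint32_t subxid)
{
    assert(c != NULL);
    if (c->nxids < TOY_MAX_CACHED_SUBXIDS)
        c->xids[c->nxids++] = subxid;
    else
        c->overflowed = true;
}

/* A snapshot counts as overflowed if any single backend overflowed. */
static bool toy_snapshot_overflowed(const ToySubxidCache *backends, int n)
{
    for (int i = 0; i < n; i++)
        if (backends[i].overflowed)
            return true;
    return false;
}
```

In this model, 64 subxids in one backend leave the snapshot intact,
and the 65th marks every snapshot taken afterwards as overflowed, no
matter how well behaved the other backends are.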

The reason for the slowdown is clear: when we overflow we check every
xid against subtrans, producing a large stream of lookups. Some
previous hackers have tried to speed up subtrans; this patch takes a
different approach: remove as many subtrans lookups as possible. (So
it is not competing with those other solutions.)

The attached patch improves on the situation, as also shown in the attached diagram.

The patch does these things:

1. Rework XidInMVCCSnapshot() so that it always checks the snapshot
first, before attempting to look up subtrans. A related change means
that we always keep full subxid info in the snapshot, even if one of
the backends has overflowed.

2. Use binary search for standby snapshots, since the snapshot subxip
is in sorted order.

3. Rework GetTopmostTransaction so that it a) checks xmin as it goes
and b) does only one iteration on standby snapshots. Both changes save
subtrans lookups in appropriate cases.
(This was newly added in v6.)
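
The ordering described in items 1 and 2 can be sketched as a toy
model in C. This is an illustration under stated assumptions, not the
patch itself. All names (ToySnapshot, ToySubtrans,
toy_xid_in_snapshot and the helpers) are hypothetical, xids are
compared as plain integers with no wraparound handling, and subtrans
is replaced by an in-memory parent table that counts how often it is
read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;   /* stand-in for TransactionId; 0 means "none" */

/* Hypothetical, simplified snapshot. */
typedef struct ToySnapshot {
    Xid        xmin;           /* xids below this are finished */
    Xid        xmax;           /* xids at or above this are in progress */
    const Xid *xip;            /* in-progress top-level xids */
    size_t     xcnt;
    const Xid *subxip;         /* known in-progress subxids, sorted ascending */
    size_t     subxcnt;
    bool       suboverflowed;  /* subxip may be incomplete */
} ToySnapshot;

/* Hypothetical stand-in for subtrans: parent_of[xid], 0 if no parent. */
typedef struct ToySubtrans {
    const Xid *parent_of;
    size_t     len;
    unsigned   lookups;        /* number of parent reads performed */
} ToySubtrans;

static Xid toy_get_parent(ToySubtrans *st, Xid xid)
{
    st->lookups++;
    return (xid < st->len) ? st->parent_of[xid] : 0;
}

/* Walk parent links until a top-level xid is reached. */
static Xid toy_topmost(ToySubtrans *st, Xid xid)
{
    Xid parent;
    while ((parent = toy_get_parent(st, xid)) != 0)
        xid = parent;
    return xid;
}

/* Binary search; relies on the array being sorted ascending. */
static bool xid_in_sorted(Xid xid, const Xid *a, size_t n)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] == xid)
            return true;
        if (a[mid] < xid)
            lo = mid + 1;
        else
            hi = mid;
    }
    return false;
}

static bool xid_in_array(Xid xid, const Xid *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (a[i] == xid)
            return true;
    return false;
}

/* True if xid is still in progress according to the snapshot. */
static bool toy_xid_in_snapshot(Xid xid, const ToySnapshot *s, ToySubtrans *st)
{
    assert(s != NULL && st != NULL);
    if (xid < s->xmin)
        return false;
    if (xid >= s->xmax)
        return true;

    /* Check the snapshot first: no subtrans traffic on a hit. */
    if (xid_in_sorted(xid, s->subxip, s->subxcnt))
        return true;
    if (xid_in_array(xid, s->xip, s->xcnt))
        return true;
    if (!s->suboverflowed)
        return false;          /* subxip is complete, so this is final */

    /* Only now fall back to subtrans to find the top-level parent. */
    Xid top = toy_topmost(st, xid);
    if (top == xid || top < s->xmin)
        return false;
    return xid_in_array(top, s->xip, s->xcnt);
}
```

In this model an xid that is already present in subxip or xip costs
no subtrans reads even when suboverflowed is set; only xids missing
from both arrays pay for the parent walk.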

Now, is this a panacea? Not at all. What this patch does is smooth out
the catastrophic effect so that a few overflowed subxids don't spoil
everybody else's performance. Eventually, if many or all sessions
have overflowed their subxid caches, performance will degrade
as before, although the attached patch has some additional
optimizations (2, 3 above). So what this gives is a better flight
envelope in case of a small number of occasional overflows.

Please review. Thank you.

--
Simon Riggs http://www.EnterpriseDB.com/

Attachment Content-Type Size
subx_redesign_snapshot_search_at_overflow.png image/png 94.4 KB
subx_redesign_snapshot_search.v6.patch application/x-patch 15.3 KB
