subtransaction performance regression [kind of] due to snapshot caching

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Subject: subtransaction performance regression [kind of] due to snapshot caching
Date: 2021-04-06 04:35:21
Message-ID: 20210406043521.lopeo7bbigad3n6t@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

In a recent thread ([1]) I found a performance regression of the
following statement
DO $do$
BEGIN FOR i IN 1 .. 10000 LOOP
BEGIN
EXECUTE $cf$CREATE OR REPLACE FUNCTION foo() RETURNS VOID LANGUAGE plpgsql AS $f$BEGIN frakbar; END;$f$;$cf$;
EXCEPTION WHEN others THEN
END;
END LOOP;
END;$do$;

13: 1617.798
14-dev: 34088.505

The time in 14 is spent mostly below:
- 94.58% 0.01% postgres postgres [.] CreateFunction
- 94.57% CreateFunction
- 94.49% ProcedureCreate
- 90.95% record_object_address_dependencies
- 90.93% recordMultipleDependencies
- 89.65% isObjectPinned
- 89.12% systable_getnext
- 89.06% index_getnext_slot
- 56.13% index_fetch_heap
- 54.82% table_index_fetch_tuple
+ 53.79% heapam_index_fetch_tuple
0.07% heap_hot_search_buffer
0.01% ReleaseAndReadBuffer
0.01% LockBuffer
0.08% heapam_index_fetch_tuple

After a bit of debugging I figured out that the direct failure lies with
623a9ba79b. The problem is that subtransaction abort does not increment
ShmemVariableCache->xactCompletionCount. That's trivial to remedy (we
already lock ProcArrayLock during XidCacheRemoveRunningXids).

What happens is that heap_hot_search_buffer()-> correctly recognizes the
aborted subtransaction's rows as dead, but they are not recognized as
"surely dead". Which then leads to O(iterations^2) index->heap lookups,
because the rows from previous iterations are never recognized as dead.

I initially was a bit worried that this could be a correctness issue as
well. The more I thought about it the more confused I got. A
transaction's subtransaction abort should not actually change the
contents of a snapshot, right?

Snapshot
GetSnapshotData(Snapshot snapshot)
...
/*
* We don't include our own XIDs (if any) in the snapshot. It
* needs to be includeded in the xmin computation, but we did so
* outside the loop.
*/
if (pgxactoff == mypgxactoff)
continue;

The sole reason for the behavioural difference is that the cached
snapshot's xmax is *lower* than a new snapshot's would be, because
GetSnapshotData() initializes xmax as
ShmemVariableCache->latestCompletedXid - which
XidCacheRemoveRunningXids() incremented, without incrementing
ShmemVariableCache->xactCompletionCount.

Which then causes HeapTupleSatisfiesMVCC to go down
if (!HeapTupleHeaderXminCommitted(tuple))
...
else if (XidInMVCCSnapshot(HeapTupleHeaderGetRawXmin(tuple), snapshot))
return false;
else if (TransactionIdDidCommit(HeapTupleHeaderGetRawXmin(tuple)))
SetHintBits(tuple, buffer, HEAP_XMIN_COMMITTED,
HeapTupleHeaderGetRawXmin(tuple));
else
{
/* it must have aborted or crashed */
SetHintBits(tuple, buffer, HEAP_XMIN_INVALID,
InvalidTransactionId);
return false;
}

the "return false" for XidInMVCCSnapshot() rather than the
SetHintBits(HEAP_XMIN_INVALID) path. Which then in turn causes
HeapTupleIsSurelyDead() to not recognize the rows as surely dead.

bool
XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
{
uint32 i;

/*
* Make a quick range check to eliminate most XIDs without looking at the
* xip arrays. Note that this is OK even if we convert a subxact XID to
* its parent below, because a subxact with XID < xmin has surely also got
* a parent with XID < xmin, while one with XID >= xmax must belong to a
* parent that was not yet committed at the time of this snapshot.
*/

/* Any xid < xmin is not in-progress */
if (TransactionIdPrecedes(xid, snapshot->xmin))
return false;
/* Any xid >= xmax is in-progress */
if (TransactionIdFollowsOrEquals(xid, snapshot->xmax))
return true;

I *think* this issue doesn't lead to actually wrong query results. For
HeapTupleSatisfiesMVCC purposes there's not much of a difference between
an aborted transaction and one that's "in progress" according to the
snapshot (that's required - we don't check for aborts for xids in the
snapshot).

It is a bit disappointing that there - as far as I could find - are no
tests for kill_prior_tuple actually working. I guess that lack, and that
there's no difference in query results, explains why nobody noticed the
issue in the last ~9 months.

See the attached fix. I did include a test that verifies that the
kill_prior_tuples optimization actually prevents the index from growing,
when subtransactions are involved. I think it should be stable, even
with concurrent activity. But I'd welcome a look.

I don't think that's why the issue exists, but I very much hate the
XidCache* name. Makes it sound much less important than it is...

Greetings,

Andres Freund

[1] https://postgr.es/m/20210317055718.v6qs3ltzrformqoa%40alap3.anarazel.de

Attachment Content-Type Size
0001-snapshot-caching-vs-subtransactions.patch text/x-diff 6.7 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2021-04-06 04:46:59 Re: Get memory contexts of an arbitrary backend process
Previous Message Bharath Rupireddy 2021-04-06 04:33:28 Re: TRUNCATE on foreign table