performance issue in remove_from_unowned_list()

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: performance issue in remove_from_unowned_list()
Date: 2019-02-06 23:38:02
Message-ID: 80c27103-99e4-1d0c-642c-d9f3b94aaa0a@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

While working on benchmarks for the syscache patch (negative entries and
all of that), I've ran into a an issue in remove_from_unowned_list.

If you create a lot of relations in a single transaction (say, 100k) and
then abort the transaction (or if it fails for some reason, e.g. because
of hitting max_locks_per_transaction), smgrDoPendingDeletes ought to
clean relation files.

Unfortunately, the constructed srels array is ordered in exactly the
opposite way compared to "unowned list" (used by smgrclose). So what
happens is that we start with the first relation, walk the whole list
unowned list and remove the last item. Then we get the second rel and
again walk the whole unowned list until the last item. And so on.

For large number of objects that's pretty significant. With 100k rels
I've lost the patience after a couple of minutes, and just nuked the
whole cluster (interrupting ROLLBACK is a no go, and after a kill the
recovery just starts chewing on it again).

Of course, transactions creating 100k tables in a single transaction are
not that common, but it's not unheard of either (such applications
exist, are discussed in the syscache thread, and restoring them from a
pg_dump is likely to hit this issue). And the same issue applies to
cases that drop a bunch of tables in a single transaction (and I've seen
plenty of scripts doing that, although in smaller batches).

But it's a bit funnier, because there's also DropRelationFiles() which
does smgrclose on a batch of relations too, and it says this

/*
* Call smgrclose() in reverse order as when smgropen() is called.
* This trick enables remove_from_unowned_list() in smgrclose()
* to search the SMgrRelation from the unowned list,
* with O(1) performance.
*/
for (i = ndelrels - 1; i >= 0; i--)
...

but it's called from two places in xact.c only. And if you trigger the
issue with 100k x CREATE TABLE + ROLLBACK, and then force a recovery by
killing postmaster, you actually get the very same behavior with always
traversing the unowned list for some reason. I'm not quite sure why, but
it seems the optimization is exactly the wrong thing to do here.

Attached is a WIP patch removing the optimization from DropRelationFiles
and adding it to smgrDoPendingDeletes. This resolves the issue, at least
in the cases I've been able to reproduce. But maybe we should deal with
this issue earlier by ensuring the two lists are ordered the same way
from the very beginning, somehow.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
smgr-fix.patch text/x-patch 1.3 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2019-02-06 23:44:08 Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Previous Message Justin Pryzby 2019-02-06 23:21:11 Re: dsa_allocate() faliure