Re: advance local xmin more aggressively

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: advance local xmin more aggressively
Date: 2014-12-10 17:57:04
Message-ID: 54888970.5020807@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 12/10/2014 06:56 PM, Robert Haas wrote:
> On Wed, Dec 10, 2014 at 9:49 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> I guess this bears some further thought. I certainly don't like the
>> fact that this makes the whole system crap out at a lower number of
>> subtransactions than presently. The actual performance numbers don't
>> bother me very much; I'm comfortable with the possibility that closing
>> a cursor will be some modest percentage slower if you've got thousands
>> of active savepoints.
>
> Here's a new version with two changes:
>
> 1. I changed the traversal of the resource owner tree to iterate
> instead of recurse. It now does a depth-first, pre-order traversal of
> the tree; when we reach the last child of a node, we follow its parent
> pointer to get back to where we were. That way, we don't need to keep
> anything on the stack. That fixed the crash at 100k cursors, but it
> was still 4x slower.

Clever. Could we use that method in ResourceOwnerReleaseInternal and
ResourceOwnerDelete, too? Might be best to have a
ResourceOwnerWalk(resowner, callback) function for walking all resource
owners in a tree, instead of one for walking the snapshots in them.

> 2. Instead of traversing the tree until we find an xmin equal to the
> one we're currently advertising, the code now traverses the entire
> tree each time it runs. However, it also keeps a record of how many
> times the oldest xmin occurred in the tree, which is decremented each
> time we unregister a snapshot with that xmin; the traversal doesn't
> run again until that count reaches 0. That fixed the performance
> regression on your test case.
>
> With a million subtransactions:
>
> master 34.464s 33.742s 34.317s
> advance-xmin 34.516s 34.069s 34.196s

Well, you can still get the slowness back by running other stuff in the
background. I admit that it's a very obscure case, probably fine in
practice. I would still feel better if snapmgr.c did its own
bookkeeping, from an aesthetic point of view. In a heap, or even just a
linked list, if the performance characteristics of that is acceptable.

It occurs to me that the pairing heap I just posted in another thread
(http://www.postgresql.org/message-id/54886BB8.9040000@vmware.com) would
be a good fit for this. It's extremely cheap to insert to and to find
the minimum item (O(1)), and the delete operation is O(log N),
amortized. I didn't implement a delete operation, for removing a
particular node, I only did delete-min, but it's basically the same
code. Using a pairing heap for this might be overkill, but if we have
that implementation anyway, the code in snapmgr.c to use it would be
very simple, so I see little reason not to. It might even be simpler
than your patch, because you wouldn't need to have the heuristics on
whether to attempt updating the xmin; it would be cheap enough to always
try it.

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2014-12-10 18:13:36 Re: On partitioning
Previous Message Andres Freund 2014-12-10 17:10:51 Re: logical column ordering