Re: cheaper snapshots redux

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jim Nasby <jim(at)nasby(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: cheaper snapshots redux
Date: 2011-08-22 23:22:47
Message-ID: CA+TgmoZHt4iGyVp3vxOTOi8ev4J_faWZNQ-3OXgcabed7uFaDA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Aug 22, 2011 at 6:45 PM, Jim Nasby <jim(at)nasby(dot)net> wrote:
> Something that would be really nice to fix is our reliance on a fixed size of shared memory, and I'm wondering if this could be an opportunity to start in a new direction. My thought is that we could maintain two distinct shared memory snapshots and alternate between them. That would allow us to actually resize them as needed. We would still need something like what you suggest to allow for adding to the list without locking, but with this scheme we wouldn't need to worry about extra locking when taking a snapshot since we'd be doing that in a new segment that no one is using yet.
>
> The downside is such a scheme does add non-trivial complexity on top of what you proposed. I suspect it would be much better if we had a separate mechanism for dealing with shared memory requirements (shalloc?). But if it's just not practical to make a generic shared memory manager it would be good to start thinking about ways we can work around fixed shared memory size issues.

Well, the system I'm proposing is actually BETTER than having two
distinct shared memory snapshots. For example, right now we cache up
to 64 subxids per backend. I'm imagining that going away and using
that memory for the ring buffer. Out of the box, that would imply a
ring buffer of 64 * 103 = 6592 slots. If the average snapshot lists
100 XIDs, you could rewrite the snapshot dozens of times before
the buffer wraps around, which is obviously a lot more than two. Even
if subtransactions are being heavily used and each snapshot lists 1000
XIDs, you still have enough space to rewrite the snapshot several
times over before wraparound occurs. Of course, at some point the
snapshot gets too big and you have to switch to retaining only the
toplevel XIDs, which is more or less the equivalent of what happens
under the current implementation when any single transaction's subxid
cache overflows.
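
To make the arithmetic concrete, here is a toy sketch of the ring-buffer
idea. All the names (snap_ring, SNAPSHOT_RING_SLOTS, publish_snapshot)
are invented for illustration - this is not code from any patch, just the
sizing math and the write-at-the-insert-position behavior described above:

/*
 * Toy sketch of a shared ring of XID slots (illustration only; names
 * are invented).  Every snapshot is written starting at the current
 * insert position; readers remember the offset and length of the
 * version they copied out.
 */
#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;

#define SUBXIDS_PER_BACKEND 64      /* today's per-backend subxid cache */
#define ASSUMED_BACKENDS    103     /* the 103 from the arithmetic above */
#define SNAPSHOT_RING_SLOTS (SUBXIDS_PER_BACKEND * ASSUMED_BACKENDS) /* 6592 */

static TransactionId snap_ring[SNAPSHOT_RING_SLOTS];
static uint64_t ring_insert_pos;    /* always increasing; mod size gives slot */

/* Publish a new snapshot (list of running XIDs) into the ring. */
static uint64_t
publish_snapshot(const TransactionId *xids, int nxids)
{
    uint64_t    start = ring_insert_pos;

    for (int i = 0; i < nxids; i++)
        snap_ring[(start + i) % SNAPSHOT_RING_SLOTS] = xids[i];
    ring_insert_pos = start + nxids;
    return start;               /* readers copy nxids slots from here */
}

int
main(void)
{
    TransactionId running[] = {100, 101, 105};

    publish_snapshot(running, 3);
    printf("slots = %d, rewrites before wrap at 100 XIDs/snapshot = %d\n",
           SNAPSHOT_RING_SLOTS, SNAPSHOT_RING_SLOTS / 100);
    return 0;
}

With 100-XID snapshots that works out to 65 rewrites before the insert
position wraps, which is where "dozens of times" above comes from.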

With respect to a general-purpose shared memory allocator, I think
that there are cases where that would be useful to have, but I don't
think there are as many of them as people seem to think. I
wouldn't choose to implement this using a general-purpose allocator
even if we had it, both because it's undesirable to allow this or any
subsystem to consume an arbitrary amount of memory (nor can allocation
be allowed to fail, especially in the abort path) and because a ring
buffer is almost
certainly faster than a general-purpose allocator. We have enough
trouble with palloc overhead already. That having been said, I do
think there are cases where it would be nice to have... and it
wouldn't surprise me if I end up working on something along those
lines in the next year or so. It turns out that memory management is
a major issue in lock-free programming; you can't assume that it's
safe to recycle an object once the last pointer to it has been removed
from shared memory - because someone may have fetched the pointer just
before you removed it and still be using it to examine the object. An
allocator with some built-in capabilities for handling such problems
seems like it might be very useful....
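
By way of illustration, here is a minimal single-hazard-pointer sketch
in C11 - one standard answer to that reclamation problem, not anything
in PostgreSQL, and every name in it is invented:

/*
 * Minimal hazard-pointer sketch (one reader slot, one reclaimer).
 * Illustration of the general technique only.
 */
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct Node { int value; } Node;

static _Atomic(Node *) shared_ptr;   /* pointer visible "in shared memory" */
static _Atomic(Node *) hazard_slot;  /* reader advertises what it is using */

/* Reader: advertise the pointer, then re-check it before dereferencing. */
static int
reader_read(void)
{
    Node   *p;
    int     v;

    do
    {
        p = atomic_load(&shared_ptr);
        atomic_store(&hazard_slot, p);
        /* if shared_ptr changed meanwhile, our advertisement came too late */
    } while (p != atomic_load(&shared_ptr));

    v = p ? p->value : -1;            /* safe: reclaimer can see our hazard */
    atomic_store(&hazard_slot, NULL); /* done with it */
    return v;
}

/* Reclaimer: unlink the object, but free it only if no reader holds it. */
static void
reclaimer_replace(Node *newnode)
{
    Node   *old = atomic_exchange(&shared_ptr, newnode);

    if (old != NULL && old != atomic_load(&hazard_slot))
        free(old);                    /* nobody is looking at it */
    /* else: defer the free, e.g. park it on a to-free list and retry */
}

int
main(void)
{
    Node   *n = malloc(sizeof(Node));

    if (n == NULL)
        return 1;
    n->value = 42;
    atomic_store(&shared_ptr, n);
    printf("read %d\n", reader_read());
    reclaimer_replace(NULL);          /* no hazard published, so it's freed */
    return 0;
}

The re-check loop is exactly the race described above: the reader may
have fetched the pointer just before the reclaimer unlinked it, and the
advertise-then-re-check step is what lets the reclaimer notice that the
object is still in use.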

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
