From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Greg Stark <stark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Date: 2013-09-15 15:23:07
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2013-09-15 02:19:41 -0700, Peter Geoghegan wrote:
> On Sat, Sep 14, 2013 at 1:57 AM, Greg Stark <stark(at)mit(dot)edu> wrote:
> > It seems to me that the nature of the problem is that there will unavoidably
> > be a nexus between the two parts of the code here. We can try to isolate it
> > as much as possible but we're going to need a bit of a compromise.
> Exactly. That's why all the proposals with the exception of this one
> have to date involved unacceptable bloating - that's how they try and
> span the nexus.

> I'll find it very difficult to accept any implementation that is going
> to bloat things even worse than our upsert looping example.

How would any even halfway sensible example cause *more* bloat than the
upsert looping thing?
I'll concede that bloat is something to be aware of, but just because
it's *an* issue, it's not *the* only issue.

In all the solutions I can think of/have heard of that have the chance
of producing additional bloat also have good chance of cleaning up the
additional bloat.

In the "promises" approach you simply can mark the promise index tuples
as LP_DEAD in the IGNORE case if you've found a conflicting tuple. In
the OR UPDATE case you can immediately reuse them. There's no heap
bloat. The logic for dead items already exists in nbtree, so that's not
too much complication. The case where that doesn't work is when postgres
dies inbetween or we're signalled to abort. But that produces bloat for
normal DML anyway. Any vacuum or insert can check whether the promise
xid has committed and remove the promise otherwise.

In the proposals that involve just inserting the heaptuple and then
handle the uniqueness violation when inserting the index tuples you can
immediately mark the index tuples as dead and mark it as prunable.

> The only advantage of such an implementation over the upsert example is that
> it'll avoid burning through subxacts. The main reason I don't want to
> take that approach is that I know it won't be accepted, because it's a
> disaster. That's why the people that proposed this in various forms
> down through the years haven't gone and implemented it themselves. I
> do not accept that all of this is like the general situation with row
> locks.

The primary advantage will be that it's actually usable by users without
massive overhead in writing dozens of functions.

I don't think the bloat issue had much to do with the feature not
getting implemented so far. It's that nobody was willing to do the work
and endure the discussions around it. And I definitely applaud you for
finally tackling the issue despite that.

> I do not think that the big costs of having many dead
> duplicates in a unique index can be overlooked

Why would there be so many duplicate index tuples? The primary user of
this is going to be UPSERT. In case there's a conflicting tuple, there
is going to be a new tuple version. Which will need a new index entry
quite often. If there's no conflict, we will insert anyway.
So, there's the case of UPSERTs that could be done as HOT updates
because there's enough space on the page and none of the indexes
actually have changed. As explained above, we can simply mark the index
tuple as dead in that case (don't even need an exclusive lock for that,
if done right).

> (or perhaps the cost of
> cleaning them up eagerly, which is something I'd also expect to work
> very badly).

Why? Remember the page you did the insert to, do a _bt_moveright() to
catch eventual splits. Mark the item as dead. Done. The next insert will
repack the page if necessary (cf. _bt_findinsertloc).

> What I will concede (what I have conceded, actually) is that it would
> be better if the locks were more granular. Now, I'm not so much
> concerned about concurrent inserters inserting values that just so
> happen to be values that were locked. It's more the case that I'm
> worried about inserters blocking on other values that are incidentally
> locked despite not already existing, that would go on the locked page
> or maybe a later page. In particular, I'm concerned about the impact
> on SERIAL primary key columns. Not exactly an uncommon case (though
> one I'd already thought to optimize by locking last).

Yes, I think that's the primary issue from a scalability and performance
POV. Locking entire ranges of values, potentially even on inner pages
(because you otherwise would have to split) isn't going to work.

> What I think might actually work acceptably is if we were to create an
> SLRU that kept track of value-locks per buffer. The challenge there
> would be to have regular unique index inserters care about them, while
> having little to no impact on their regular performance. This might be
> possible by having them check the buffer for external value locks in
> the SLRU immediately after exclusive locking the buffer - usually that
> only has to happen once per index tuple insertion (assuming no
> duplicates necessitate retry). If they find their value in the SLRU,
> they do something like unlock and block on the other xact and restart.
> Now, obviously a lot of the details would have to be worked out, but
> it seems possible.

If you can make that work, without locking heap and btree pages at the
same time, yes, I think that's a possible way forward. One way to offset
the cost of SLRU in the common case where there is no contention would
be to have a page level flag that triggers that lookup. There should be
space in btpo_flags.

> In order for any of this to really be possible, there'd have to be
> some concession made to my position, as Greg mentions here. In other
> words, I'd need buy-in for the general idea of holding locks in shared
> memory from indexes across heap tuple insertion (subject to a sound
> deadlock analysis, of course).

I don't have a fundamental problem with holding locks during the
insert. I have a problem with holding page level lightweight locks on
the btree and the heap at the same time.

> Some modest compromises may need to be made around interruptibility.

Why? As far as I understand that proposal, I don't see why that would be needed?

> I'd also probably need agreement that
> it's okay that value locks can not last more than an instant (they
> cannot be held indefinitely pending the end of a transaction). This
> isn't something that I imagine to be too controversial, because it's
> true today for a single unique index. As I've already outlined, anyone
> waiting on another transaction with a would-be duplicate to commit has
> very few guarantees about the order that it'll get its second shot
> relative to the order it initial queued up behind the successful but
> not-yet-committed inserter.

I forsee problems here.


Andres Freund

Andres Freund
PostgreSQL Development, 24x7 Support, Training & Services

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2013-09-15 15:25:59 Re: Minmax indexes
Previous Message Peter Eisentraut 2013-09-15 15:20:20 Re: logical changeset generation v6