Re: Refactoring speculative insertion with unique indexes a little

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Refactoring speculative insertion with unique indexes a little
Date: 2016-09-27 14:00:40
Message-ID: CAM3SWZQGR6sOCgkZJog65REJU_1-XuYrUCSRB2amt7m8QHv1BA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Sep 27, 2016 at 2:13 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I can see the value of processing unique indexes before non-unique ones.
> I'm pretty suspicious of trying to prioritize primary keys first, though,
> because (a) it's not clear why bother, and (b) PG is a tad squishy about
> whether an index is a primary key or not, so that I'd be worried about
> different backends sometimes choosing different orders. I'd simplify
> this to "uniques in OID order then non-uniques in OID order".

I see your point. A more considered ordering of indexes for processing
by the executor (prepared for it by the relcache), including something
more that goes further than your proposal is useful in the context of
fixing the bug I mentioned [1], but for non-obvious reasons. I would
like to clarify what I meant there specifically. I am repeating
myself, but maybe I just wasn't clear before.

The theory of putting the PK first there is that we then have a
well-defined (uh, better defined) user-visible ordering *across unique
indexes* such that the problem case would *reliably* be fixed. With
only this refactoring patch applied (and no change to the relcache
ordering thing), it is then only a sheer accident of OID assignment
ordering that the INSERT ON CONFLICT problem case happens to take the
alternative path on the OP's *inferred* index (which, as it happens,
was the PK for him), rather than the other unique index that was
involved (the one that is not actually inferred, and yet is equivalent
to the PK, UPSERT-semantics-wise). So, the reporter of the bug [1] is
happy with his exact case now working, at least.

You might now counter: "But why prefer one convention over the other?
Prioritizing the PK would reliably fix that particular problem case,
but that's still pretty arbitrary."

It's true that it's somewhat arbitrary to always (speculatively)
insert into the PK first. But, I think that it's more likely that the
PK is inferred in general, and so it's more likely that users will
fall on the right side of that in practice. Besides, at least we now
have a consistent behavior.

You might also reasonably ask: "But what if there are multiple unique
indexes, none of which happen to be the PK? Isn't that subject to the
same vagaries of OID ordering anyway?"

I must admit that it is. But I don't really know where to draw the
line here. Is it worth contemplating a more complicated scheme still?
For example, trigger-style ordering; a sort order that considers index
name as a "secondary attribute", in order to ensure perfectly
consistent behavior? I must admit that I don't really have a clue
whether or not that's a good idea. It's an idea.

[1] https://www.postgresql.org/message-id/CAM3SWZTFTbK_Y%3D7uWJaXYLHnYQ99pV4KFpmjTKbNJR5_%2BQThzA%40mail.gmail.com
--
Peter Geoghegan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2016-09-27 14:08:14 Re: Parallel tuplesort (for parallel B-Tree index creation)
Previous Message Robert Haas 2016-09-27 13:58:59 Re: Showing parallel status in \df+