Quick Links

Re: Making joins involving ctid work for the benefit of UPSERT

From:	David G Johnston <david(dot)g(dot)johnston(at)gmail(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Making joins involving ctid work for the benefit of UPSERT
Date:	2014-07-23 22:48:47
Message-ID:	1406155727865-5812640.post@n5.nabble.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Peter Geoghegan-3 wrote
>> with semantics like this:
>>
>> 1. Search the table, using any type of scan you like, for a row
>> matching the given predicate.
>
> Perhaps I've misunderstood, but this is fundamentally different to
> what I'd always thought would need to happen. I always understood that
> the UPSERT should be insert-driven, and so not really have an initial
> scan at all in the same sense that every Insert lacks one. Moreover,
> I've always thought that everyone agreed on that. We go to insert, and
> then in the course of inserting index tuples do something with dirty
> snapshots. That's how we get information on conflicts.

SQL standard not-withstanding there really is no way to pick one
implementation and not have it be sub-optimal in some situations. Unless
there is a high barrier why not introduce syntax:

SCAN FIRST; INSERT FIRST

that allows the user to specify the behavior that they expect would be most
efficient given their existing/new data ratio?

>> Having said all that, I believe the INSERT ON CONFLICT syntax is more
>> easily comprehensible than previous proposals. But I still tend to
>> agree with Andres that an explicit UPSERT syntax or something like it,
>> that captures all of the MVCC games inside itself, is likely
>> preferable from a user standpoint, whatever the implementation ends up
>> looking like.
>
> Okay then. If you both feel that way, I will come up with something
> closer to what you sketch. But for now I still strongly feel it ought
> to be driven by an insert. Perhaps I've misunderstood you entirely,
> though.

Getting a little syntax crazy here but having all of:

UPSERT [SCAN|INSERT] FIRST
INSERT ON CONFLICT UPDATE - same as INSERT FIRST
UPDATE ON MISSING INSERT - same as SCAN FIRST

with the corresponding 2 implementations would make the user interface
slightly more complicated but able to be conformed to the actual data that
the user has.

You could basically perform a two-phase pass where you run the
user-requested algorithm and then for all failures attempt the alternate
algorithm and then error if both fail.

I am not at all fluent on the concurrency issues here, and the MVCC
violations and re-tries that might be considered, but at a high-level there
is disagreement here simply because both answers are "correct" and ideally
both can be provided to the user.

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Making-joins-involving-ctid-work-for-the-benefit-of-UPSERT-tp5811919p5812640.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

In response to

Re: Making joins involving ctid work for the benefit of UPSERT at 2014-07-23 20:05:22 from Peter Geoghegan

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Peter Geoghegan	2014-07-23 22:59:14	Re: Making joins involving ctid work for the benefit of UPSERT
Previous Message	Krystian Bigaj	2014-07-23 22:29:41	Re: System shutdown signal on Windows (was Re: [GENERAL])