Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject: Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}
Date: 2014-10-25 15:12:42
Message-ID: CA+TgmoabOS4g2PWrwFrE4BCZ0vebs6zwW7PTMx3XtVUsowAgAQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Oct 24, 2014 at 6:48 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
>> You're conflating the user-visible syntax with the parse tree
>> representation in way that is utterly without foundation. I don't
>> have a position at this point on which parse-analysis representation
>> is preferable, but it's completely wrong-headed to say that you
>> "can't" make NEW.x produce the same parse-analysis output that your
>> CONFLICTING(x) syntax would have created. Sure, it might be harder,
>> but it's not that much harder, and it's definitely not an unsolvable
>> problem.
>
> I don't believe I did. The broader point is that the difficulty in
> making that work reflects the conceptually messiness, from
> user-visible aspects down. I can work with the difficulty, and I may
> even be able to live with the messiness, but I'm trying to bring the
> problems with it to a head sooner rather than later for good practical
> reasons. In all sincerity, my real concern is that you or the others
> will change your mind when I actually go and implement an OLD.* style
> syntax, and see the gory details. I regret it if to ask this is to ask
> too much of you, but FYI that's the thought process behind it.

I think what's more likely is that we'll complain about the way you
chose to implement it. I don't believe your contention (in the other
email) that "Support
for an OLD.* style syntax would have to exist at *all* stages of query
execution, from parse analysis through to rewriting, planning, and
execution." I think if your design for implementing that syntax
requires that, and your design for some other syntax doesn't require
that, then you're not thinking clearly enough about what needs to
happen in parse analysis. Make the parse analysis for this syntax
emit the same representation that you would have had it emit in the
other syntax, and you won't have this problem.

> What if we spelled
> EXCLUDING/CONFLICTING as follows:
>
> INSERT INTO upsert VALUES(1, 'Art') ON CONFLICT (key) UPDATE SET val =
> EXCLUDED || 'this works' WHERE another_col != EXCLUDED;
>
> What do you think of that?

I am in complete agreement with the comments made by Petr and Alvaro.

>>>> You're wrong. The choice of which index to use is clearly wildly
>>>> inappropriately placed in the parse analysis phase, and if you think
>>>> that has any chance of ever being committed by anyone, then you are
>>>> presuming the existence of a committer who won't mind ignoring the
>>>> almost-immediate revert request that would draw from, at the very
>>>> least, Tom.
>>>
>>> Why? This has nothing to do with optimization.
>>
>> That is false. If there is more than one index that could be used,
>> the system should select the best one. That is an optimization
>> decision per se. Also, if a plan is saved - in the plancache, say, or
>> in a view - the query can be re-planned if the index it depends on is
>> dropped, but there's no way to do parse analysis.
>
> Generating index paths for the UPDATE is a waste of cycles.
> Theoretically, there could be an (a, b, c) unique index and a (c,b,a)
> unique index, and those two might have a non-equal cost to scan. But
> that almost certainly isn't going to happen in practice, since that's
> a rather questionable indexing strategy, and even when it does, you're
> going to have to insert into all the unique indexes a good proportion
> of the time anyway, making the benefits of that approach pale in
> comparison to the costs.

I don't care whether you actually generate index-paths or not, and in
fact I suspect it makes no sense to do so. But I do care that you do
a cost comparison between the available indexes and pick the one that
looks cheapest. If somebody's got a bloated index and builds a new
one, they will want it to get used even before they drop the old one.

Even if that weren't an issue, though, the fact that you can't
re-parse but you can re-plan is a killer point AFAICS. It means you
are borked if the statement gets re-executed after the index you
picked is dropped.

> From my point of view, I spent a significant amount of time making the
> patch more or less match your proposed design for unique index
> inference. It is discouraging to hear that you think I'm not
> cooperating with community process. I'm doing my best. I think it
> would be a bad idea for me to not engage with the community for an
> extended period at this point. There were plenty of other issues
> address by V1.3 that were not the CONFLICTING()/EXCLUDING thing that
> you highlighted (or the other thing you highlighted around where to do
> unique index inference).

I think this gets at a point that has been bugging me and, perhaps,
other people here. You often post a new patch with some fixes but
without fixes for the issues that reviewers have indicated are
top-of-mind for them. Sometimes, but not always, you say something
like "I know this doesn't fix X but I'd like comments on other aspects
of the patch". Even when you do, though, it can be difficult to
overlook something that appears to be a fundamental structural problem
to comment on details, and sometimes it feels like that's what we're
being asked to do. When I'm reviewing, I tend to find issues more or
less in proportion to the time I spend on the patch. If things that I
complained about before haven't been fixed, I tend to find the same
ones again, but not necessarily all that much faster than I found them
the first time. So that's not efficient for me. I would not make an
absolute rule that no patch should ever be re-posted without
addressing every issue; I wouldn't be able to follow that rule in
every case myself. But I try to follow it as often as I can, and I
would suggest that you try to lean quite a bit more firmly in that
direction. I think you will make more progress, and spend less time
arguing with others.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2014-10-25 18:00:17 Re: Reducing lock strength of adding foreign keys
Previous Message Andreas Karlsson 2014-10-25 14:38:41 [WIP Patch] Using 128-bit integers for sum, avg and statistics aggregates