Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Anssi Kääriäinen <anssi(dot)kaariainen(at)thl(dot)fi>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject: Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}
Date: 2014-11-21 20:38:41
Message-ID: CAM3SWZSBpioO3A7rx4_8wO86Si=09dmq=hk6V-UaK3N+cZqZgQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Nov 20, 2014 at 10:58 PM, Anssi Kääriäinen
<anssi(dot)kaariainen(at)thl(dot)fi> wrote:
> Django uses the command tag currently to check if a row was updated. We
> also use RETURNING to get SERIAL values back from the database on
> insert.
>
> The most likely place to use this functionality in Django is
> Model.save(). This method is best defined as "make sure this object's
> state is either inserted or updated to the database by the primary key
> of the object". The Model.save() method needs to also report if the
> model was created or updated. The command tag is sufficient for this
> case.
>
> So, the proposed feature now has everything Django needs for
> Model.save().

So, to be clear, it would be okay if the command tag reported number
of rows *upserted*, without making any distinction between whether
they were actually inserted or updated? That seems like the least
invasive thing, since top level commands historically report a single
number of affected rows (FWIW I wouldn't report the first OID of a
single inserted tuple, as currently happens with the INSERT command
tag). The ON CONFLICT IGNORE variant would still report number of rows
inserted, though. RETURNING would show the resulting rows from either
the insert or update for each slot processed to completion (i.e.
actually inserted or updated) by the upsert, without making any
user-visible distinction (you asked for an upsert, so you must not
care).

> Django might add a bulk_merge(objs) command later on. This is defined as
> "make sure each obj in objs is persisted to the database using the
> fastest way available". The INSERT ON CONFLICT UPDATE command looks
> excellent for this case. In this case it will be more problematic to
> check which rows were inserted, which update, as we need information for
> each primary key value separately for this case.

Cool.

> When I think of this feature outside of Django, it seems it is
> completely reasonable to return database computed values on UPSERT. This
> requires two queries with the proposed API. My opinion is that RETURNING
> for the update case is better than not having it.

I am almost convinced that that behavior is better. I would like to
hear more opinions before looking at adding the necessary changes to
the implementation, though. It might be a bit questionable that ON
CONFLICT IGNORE and ON CONFLICT UPDATE have different command tags,
for example.

What do other people think? Should RETURNING project updated tuples as
well as inserted tuples, as described here?

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message José Luis Tallón 2014-11-21 20:38:55 Re: Additional role attributes && superuser review
Previous Message Andrew Dunstan 2014-11-21 20:20:12 Re: psql \sf doesn't show it's SQL when ECHO_HIDDEN is on