Re: Promise index tuples for UPSERT

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)2ndquadrant(dot)com>
Subject: Re: Promise index tuples for UPSERT
Date: 2014-10-07 23:34:09
Message-ID: CAM3SWZQvSf+UWpt1YfTqudc4Z2j5dwyBX2fQQfDR-1k-CB6eog@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Oct 7, 2014 at 11:25 AM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> Now, we're not talking about a huge advantage here (I should re-test
> that).

I attach raw output when running the bash scripts insert.sh and
update.sh. These are benchmarks that concern performance in terms of
total system throughput (TPS). The scripts are available from my
stress-test suite: https://github.com/petergeoghegan/upsert

These scripts were originally designed to compare UPSERT with an
unsympathetic "gold-standard" for performance: "equivalent" INSERTs
and UPDATEs. I looked at a few runs of 60 seconds, on unlogged tables,
making the most direct comparison between UPSERTs and "equivalent"
INSERTs and UPDATEs that is possible. To be clear, by "equivalent" I
mean UPSERTS where we know we'll only UPDATE (in the case of
update.sh), and UPSERTS where we know we'll only INSERT (in the case
of insert.sh).

Both #1 and #2 do respectably as compared to "equivalent" INSERTs and
UPDATEs. There may be even less sympathetic though more representative
cases, but certainly for these simple cases, performance is solid
across the board. I got these numbers on my laptop, and it may be
necessary to devise a more rigorous benchmark later, but performance
is quite consistent between runs shown here.

Approach #1 wins out with UPDATEs. The heavyweight-lock avoidance
stuff is enough to compensate for the fact that we never INSERT (and
never need B-Tree leaf page heavyweight locks). Median TPS was
19,310.79 for #1. Whereas for #2, it was 18,872.63 TPS. This is the
case even though the "pre-check" for #2 is always appropriate, while
we still acquire page-level heavyweight locks sometimes with #1.

INSERTs see #2 win, and by a wider margin than #1 beat #2 with
UPDATEs. However, insert.sh is by design very unsympathetic towards
#1. It uses a serial primary key, so every INSERT uselessly obtains a
HW lock on the same leaf page for the duration of heap insertion.
Anyway, the median INSERT TPS numbers is 17,759.53 for #1, and
20,441.57 TPS for #2. So you're pretty much seeing the full brunt of
page heavyweight locking, and it isn't all that bad. However, Heikki
has said something about being more clever with when and how #2 is
made to pre-check (which is always wasted here); so it's possible to
imagine INSERTs becoming faster for #2, while that probably isn't the
case for #1. I think that if I wanted to, I could get #1 to do much
better on another case where page heavyweight locking is more varied.
My goal here was to do the opposite, though.

--
Peter Geoghegan

Attachment Content-Type Size
upsert_benchmark.txt text/plain 16.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jim Nasby 2014-10-08 00:42:10 Re: Add regression tests for autocommit-off mode for psql and fix some omissions
Previous Message Tom Lane 2014-10-07 22:05:20 Re: Proposal for better support of time-varying timezone abbreviations