Re: Problems with approach #2 to value locking (INSERT ... ON CONFLICT UPDATE/IGNORE patch)

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Subject: Re: Problems with approach #2 to value locking (INSERT ... ON CONFLICT UPDATE/IGNORE patch)
Date: 2015-01-06 01:53:24
Message-ID: CAM3SWZSd3MU=c9HcP-SxCbUcLvFWT8t9MZCotOJxuE3RbTvd0A@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jan 1, 2015 at 11:17 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> The attached patch fixes Jeff's test case, as far as it goes. But, as
> I mentioned, it's not intended as a real fix. For one thing, I have
> made multiple changes to HeapTupleSatisfies*() routines, but haven't
> fully considered the consequences for, say, ri_triggers.c.
> HeapTupleSatisfiesSelf() and HeapTupleSatisfiesHistoricMVCC() were not
> modified. I've modified HeapTupleSatisfiesVacuum() to see
> InvalidTransactionId (not invalid xid infomask bits) as visible for
> garbage collection (alongside HeapTupleSatisfiesDirty() and
> HeapTupleSatisfiesMVCC(), which I changed first), and I wouldn't be
> surprised if that created new problems in other areas. In short, this
> patch is just for illustrative purposes.
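
To spell out the visibility change described above (this is only a
sketch of the idea, not the attached patch; the placement inside
HeapTupleSatisfiesVacuum() and the choice of the raw-xmin macro are
shorthand on my part):

    /*
     * Sketch: a "super deleted" promise tuple has had its xmin
     * overwritten with InvalidTransactionId, so treat it as dead to
     * everyone and fair game for removal.
     */
    if (!TransactionIdIsValid(HeapTupleHeaderGetRawXmin(tuple)))
        return HEAPTUPLE_DEAD;

The boolean routines (HeapTupleSatisfiesDirty(), HeapTupleSatisfiesMVCC())
simply return false on the same test, which is why those tuples are no
longer visible to any relevant snapshot type.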

I think that I see another race condition, one that requires custom
instrumentation of the server to demonstrate.

Basically, even with the fix above, there are still similar races.
This isn't a case of calling ExecLockUpdateTuple() against a "super
deleted" NULL tuple; those are no longer visible to any relevant
snapshot type, which is what the fix described above took care of.
However, I can still see the tuple header xmin change between the
dirty index scan and the attempt to lock that row for update. If I'm
not mistaken, the race window is small enough (and the tuple now
occupying that TID happens to have the same index key value) that
something like Jeff's tool won't catch it directly in a reasonable
amount of time.
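
Concretely, the instrumentation cross-checks xmin at the two points
(variable names below are illustrative only, not lifted from the
attached patch): note the xmin of the tuple the dirty index scan
returned, then complain if the tuple found at that same TID when we
come back to lock it carries a different xmin, i.e. the heap slot has
been reused for an unrelated tuple in the meantime:

    TransactionId   priorXmin;

    /* xmin as seen when the dirty index scan returned this TID */
    priorXmin = HeapTupleHeaderGetXmin(scannedTuple->t_data);

    /*
     * ... later, having re-fetched the tuple at the same TID in order
     * to lock it ...
     */
    if (!TransactionIdEquals(priorXmin,
                             HeapTupleHeaderGetXmin(lockedTuple->t_data)))
        ereport(ERROR,
                (errmsg("mismatch in xmin for (%u,%u). Initially %u, then %u",
                        ItemPointerGetBlockNumber(&lockedTuple->t_self),
                        ItemPointerGetOffsetNumber(&lockedTuple->t_self),
                        priorXmin,
                        HeapTupleHeaderGetXmin(lockedTuple->t_data))));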

It's not all that easy to recreate, even with my custom
instrumentation. With fsync=off, it shows up only occasionally on the
instrumented Postgres build:

pg(at)hamster:~/jjanes_upsert$ perl count_upsert.pl 8 100000
[Mon Jan 5 17:31:57 2015] init done at count_upsert.pl line 101.
[Mon Jan 5 17:32:17 2015] child abnormal exit [Mon Jan 5 17:32:17 2015] DBD::Pg::db selectall_arrayref failed: ERROR: mismatch in xmin for (322,264). Initially 4324145, then 4324429 at count_upsert.pl line 210.\n at count_upsert.pl line 227.
[Mon Jan 5 17:32:49 2015] sum is -800
[Mon Jan 5 17:32:49 2015] count is 9515
[Mon Jan 5 17:32:49 2015] normal exit at 1420507969 after 733912 items processed at count_upsert.pl line 182.

I think that "super deleting" broken promise tuples undermines how
vacuum interlocking is supposed to work [1]. We end up at the wrong
tuple, that happens to occupy the same slot as the old one (and
happens to have the same index key value, because the race is so small
are there are relatively few distinct values in play - it's hard to
recreate). Attached instrumentation patch was used with Jeff Jane's
tool.

If we weren't super deleting in the patch, then I think that our
backend's xmin horizon would be enough of an interlock (I haven't
tested that theory against the value locking approach #1
implementation yet, but I worried about this scenario quite a lot
before and things seemed okay). But we are "super deleting" with
implementation #2, and we're also not holding a pin on the heap
buffer or the B-Tree leaf page buffer throughout, so this happens (if
I'm not mistaken).
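
For comparison, the executor already guards against the analogous
TID-reuse hazard when it follows a t_ctid update chain, by insisting
that the xmin of the tuple it lands on match the xmax it followed to
get there. Paraphrasing that existing check from memory (not verbatim
from the tree):

    /*
     * If xmin isn't what we're expecting, the slot must have been
     * recycled and reused for an unrelated tuple; this is not the row
     * version we were chasing.
     */
    if (!TransactionIdEquals(HeapTupleHeaderGetXmin(tuple->t_data),
                             priorXmax))
    {
        ReleaseBuffer(buffer);
        return NULL;
    }

That's the same flavor of xmin identity test that my instrumentation
applies at the dirty-scan-then-lock step here.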

[1] https://github.com/postgres/postgres/blob/REL9_3_STABLE/src/backend/access/nbtree/README#L142
--
Peter Geoghegan

Attachment Content-Type Size
xmin_changes_race.patch text/x-patch 6.2 KB
