Re: Adding a test for speculative insert abort case

From: Andres Freund <andres(at)anarazel(dot)de>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Ashwin Agrawal <aagrawal(at)pivotal(dot)io>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>, Taylor Vesely <tvesely(at)pivotal(dot)io>
Subject: Re: Adding a test for speculative insert abort case
Date: 2020-02-08 00:40:46
Message-ID: 20200208004046.t4odt6rqulhx6sbf@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I'm currently fighting with a race I'm observing in about 1/4 of the
runs. I get:
@@ -361,16 +361,18 @@
locktype mode granted

speculative tokenShareLock f
speculative tokenExclusiveLock t
step controller_unlock_2_4: SELECT pg_advisory_unlock(2, 4);
pg_advisory_unlock

t
s1: NOTICE: blurt_and_lock_123() called for k1 in session 1
s1: NOTICE: acquiring advisory lock on 2
+s1: NOTICE: blurt_and_lock_123() called for k1 in session 1
+s1: NOTICE: acquiring advisory lock on 2
step s1_upsert: <... completed>
step s2_upsert: <... completed>
step controller_show: SELECT * FROM upserttest;
key data

k1 inserted s2 with conflict update s1
(this is the last permutation)

The issue is basically that s1 goes through the
check_exclusion_or_unique_constraint() conflict check twice.

I added a bit of debugging information (using fprintf, with elog it was
much harder to hit):

Success:
2020-02-07 16:14:56.501 PST [1167003][5/1254:8465] CONTEXT: PL/pgSQL function blurt_and_lock_123(text) line 7 at RAISE
1167003: acquiring xact lock
2020-02-07 16:14:56.512 PST [1167001][3/0:0] DEBUG: bind <unnamed> to isolationtester_waiting
2020-02-07 16:14:56.522 PST [1167001][3/0:0] DEBUG: bind <unnamed> to isolationtester_waiting
1167002: releasing xact lock 2 3
1167004: acquired xact lock
1167004: xid 8466 acquiring 5
2020-02-07 16:14:56.523 PST [1167004][6/1014:8466] LOG: INSERT @ 0/9CC70F0: - Heap/INSERT: off 2 flags 0x0C
2020-02-07 16:14:56.523 PST [1167004][6/1014:8466] NOTICE: blurt_and_lock_123() called for k1 in session 2
2020-02-07 16:14:56.523 PST [1167004][6/1014:8466] CONTEXT: PL/pgSQL function blurt_and_lock_123(text) line 3 at RAISE
2020-02-07 16:14:56.523 PST [1167004][6/1014:8466] NOTICE: acquiring advisory lock on 2
2020-02-07 16:14:56.523 PST [1167004][6/1014:8466] CONTEXT: PL/pgSQL function blurt_and_lock_123(text) line 7 at RAISE
1167004: acquiring xact lock
2020-02-07 16:14:56.533 PST [1167001][3/0:0] DEBUG: bind <unnamed> to isolationtester_waiting
2020-02-07 16:14:56.544 PST [1167001][3/0:0] DEBUG: bind <unnamed> to isolationtester_waiting
2020-02-07 16:14:56.555 PST [1167001][3/0:0] DEBUG: bind <unnamed> to isolationtester_waiting
2020-02-07 16:14:56.565 PST [1167001][3/0:0] DEBUG: bind <unnamed> to isolationtester_waiting
2020-02-07 16:14:56.576 PST [1167001][3/0:0] DEBUG: bind <unnamed> to isolationtester_waiting
2020-02-07 16:14:56.587 PST [1167001][3/0:0] DEBUG: bind <unnamed> to isolationtester_waiting
1167002: releasing xact lock 2 2
1167004: acquired xact lock
2020-02-07 16:14:56.588 PST [1167004][6/1014:8466] LOG: INSERT @ 0/9CC7150: - Btree/NEWROOT: lev 0
2020-02-07 16:14:56.588 PST [1167004][6/1014:8466] LOG: INSERT @ 0/9CC7190: - Btree/INSERT_LEAF: off 1
2020-02-07 16:14:56.588 PST [1167004][6/1014:8466] NOTICE: blurt_and_lock_4() called for k1 in session 2
2020-02-07 16:14:56.588 PST [1167004][6/1014:8466] CONTEXT: PL/pgSQL function blurt_and_lock_4(text) line 3 at RAISE
2020-02-07 16:14:56.588 PST [1167004][6/1014:8466] NOTICE: acquiring advisory lock on 4
2020-02-07 16:14:56.588 PST [1167004][6/1014:8466] CONTEXT: PL/pgSQL function blurt_and_lock_4(text) line 4 at RAISE
1167004: acquiring xact lock
2020-02-07 16:14:56.598 PST [1167001][3/0:0] DEBUG: bind <unnamed> to isolationtester_waiting
2020-02-07 16:14:56.609 PST [1167001][3/0:0] DEBUG: bind <unnamed> to isolationtester_waiting
2020-02-07 16:14:56.620 PST [1167001][3/0:0] DEBUG: bind <unnamed> to isolationtester_waiting
2020-02-07 16:14:56.630 PST [1167001][3/0:0] DEBUG: bind <unnamed> to isolationtester_waiting
1167002: releasing xact lock 1 2
1167003: acquired xact lock
2020-02-07 16:14:56.631 PST [1167003][5/1254:8465] LOG: INSERT @ 0/9CC71D0: - Btree/INSERT_LEAF: off 1
2020-02-07 16:14:56.631 PST [1167003][5/1254:8465] NOTICE: blurt_and_lock_4() called for k1 in session 1
2020-02-07 16:14:56.631 PST [1167003][5/1254:8465] CONTEXT: PL/pgSQL function blurt_and_lock_4(text) line 3 at RAISE
2020-02-07 16:14:56.631 PST [1167003][5/1254:8465] NOTICE: acquiring advisory lock on 4
2020-02-07 16:14:56.631 PST [1167003][5/1254:8465] CONTEXT: PL/pgSQL function blurt_and_lock_4(text) line 4 at RAISE
1167003: acquiring xact lock
1167003: acquired xact lock
2020-02-07 16:14:56.632 PST [1167003][5/1254:8465] LOG: INSERT @ 0/9CC7230: - Btree/NEWROOT: lev 0
2020-02-07 16:14:56.632 PST [1167003][5/1254:8465] LOG: INSERT @ 0/9CC7270: - Btree/INSERT_LEAF: off 1
2020-02-07 16:14:56.632 PST [1167003][5/1254:8465] LOG: INSERT @ 0/9CC72A8: - Heap/DELETE: off 1 flags 0x08
1167003: xid 8465 releasing lock 5
1167003: retry due to conflict
2020-02-07 16:14:56.632 PST [1167003][5/1254:8465] NOTICE: blurt_and_lock_123() called for k1 in session 1
2020-02-07 16:14:56.632 PST [1167003][5/1254:8465] CONTEXT: PL/pgSQL function blurt_and_lock_123(text) line 3 at RAISE
2020-02-07 16:14:56.632 PST [1167003][5/1254:8465] NOTICE: acquiring advisory lock on 2
2020-02-07 16:14:56.632 PST [1167003][5/1254:8465] CONTEXT: PL/pgSQL function blurt_and_lock_123(text) line 7 at RAISE
1167003: acquiring xact lock
1167003: acquired xact lock
2020-02-07 16:14:56.632 PST [1167003][5/1254:8465] NOTICE: blurt_and_lock_123() called for k1 in session 1
2020-02-07 16:14:56.632 PST [1167003][5/1254:8465] CONTEXT: PL/pgSQL function blurt_and_lock_123(text) line 3 at RAISE
2020-02-07 16:14:56.632 PST [1167003][5/1254:8465] NOTICE: acquiring advisory lock on 2
2020-02-07 16:14:56.632 PST [1167003][5/1254:8465] CONTEXT: PL/pgSQL function blurt_and_lock_123(text) line 7 at RAISE
1167003: acquiring xact lock
1167003: acquired xact lock
1167003: xid 8465 waiting xwait 8466 (xmin 8466 xmax 0) spec 5
2020-02-07 16:14:56.642 PST [1167001][3/0:0] DEBUG: bind <unnamed> to isolationtester_waiting
2020-02-07 16:14:56.653 PST [1167001][3/0:0] DEBUG: bind <unnamed> to isolationtester_waiting
2020-02-07 16:14:56.667 PST [1167001][3/0:0] DEBUG: bind <unnamed> to isolationtester_waiting
2020-02-07 16:14:56.677 PST [1167001][3/0:0] DEBUG: bind <unnamed> to isolationtester_waiting
1167002: releasing xact lock 2 4
1167004: acquired xact lock
2020-02-07 16:14:56.678 PST [1167004][6/1014:8466] LOG: INSERT @ 0/9CC72E8: - Btree/INSERT_LEAF: off 2
2020-02-07 16:14:56.678 PST [1167004][6/1014:8466] LOG: INSERT @ 0/9CC7318: - Heap/HEAP_CONFIRM: off 2
1167004: xid 8466 releasing lock 5
2020-02-07 16:14:56.678 PST [1167004][6/1014:8466] LOG: INSERT @ 0/9CC7348: - Transaction/COMMIT: 2020-02-07 16:14:56.678602-08
2020-02-07 16:14:56.678 PST [1167004][6/1014:8466] LOG: xlog flush request 0/9CC7348; write 0/9CBDF58; flush 0/9CBDF58
2020-02-07 16:14:56.678 PST [1167004][6/1014:8466] STATEMENT: INSERT INTO upserttest(key, data) VALUES('k1', 'inserted s2') ON CONFLICT (blurt_and_lock_123(key)) DO UPDATE SET data = upserttest.data || ' with conflict update s2';
2020-02-07 16:14:56.678 PST [1167003][5/1254:8465] NOTICE: blurt_and_lock_123() called for k1 in session 1
2020-02-07 16:14:56.678 PST [1167003][5/1254:8465] CONTEXT: PL/pgSQL function blurt_and_lock_123(text) line 3 at RAISE
2020-02-07 16:14:56.678 PST [1167003][5/1254:8465] NOTICE: acquiring advisory lock on 2
2020-02-07 16:14:56.678 PST [1167003][5/1254:8465] CONTEXT: PL/pgSQL function blurt_and_lock_123(text) line 7 at RAISE
1167003: acquiring xact lock
1167003: acquired xact lock
2020-02-07 16:14:56.678 PST [1167003][5/1254:8465] LOG: INSERT @ 0/9CC7380: - Heap/LOCK: off 2: xid 8465: flags 0x00 LOCK_ONLY EXCL_LOCK
2020-02-07 16:14:56.679 PST [1167003][5/1254:8465] LOG: INSERT @ 0/9CC73F0: - Heap/HOT_UPDATE: off 2 xmax 8465 flags 0x10 ; new off 3 xmax 8465
2020-02-07 16:14:56.679 PST [1167003][5/1254:8465] LOG: INSERT @ 0/9CC7420: - Transaction/COMMIT: 2020-02-07 16:14:56.679085-08
2020-02-07 16:14:56.679 PST [1167003][5/1254:8465] LOG: xlog flush request 0/9CC7420; write 0/9CC7060; flush 0/9CC7060
2020-02-07 16:14:56.679 PST [1167003][5/1254:8465] STATEMENT: INSERT INTO upserttest(key, data) VALUES('k1', 'inserted s1') ON CONFLICT (blurt_and_lock_123(key)) DO UPDATE SET data = upserttest.data || ' with conflict update s1';

fail:
1167056: releasing xact lock 2 3
1167058: acquired xact lock
1167058: xid 8490 acquiring 5
2020-02-07 16:16:43.990 PST [1167058][6/11:8490] LOG: INSERT @ 0/9D2D1D8: - Heap/INSERT: off 2 flags 0x0C
2020-02-07 16:16:43.990 PST [1167058][6/11:8490] NOTICE: blurt_and_lock_123() called for k1 in session 2
2020-02-07 16:16:43.990 PST [1167058][6/11:8490] CONTEXT: PL/pgSQL function blurt_and_lock_123(text) line 3 at RAISE
2020-02-07 16:16:43.990 PST [1167058][6/11:8490] NOTICE: acquiring advisory lock on 2
2020-02-07 16:16:43.990 PST [1167058][6/11:8490] CONTEXT: PL/pgSQL function blurt_and_lock_123(text) line 7 at RAISE
1167058: acquiring xact lock
2020-02-07 16:16:44.000 PST [1167055][3/0:0] DEBUG: bind <unnamed> to isolationtester_waiting
2020-02-07 16:16:44.011 PST [1167055][3/0:0] DEBUG: bind <unnamed> to isolationtester_waiting
2020-02-07 16:16:44.022 PST [1167055][3/0:0] DEBUG: bind <unnamed> to isolationtester_waiting
2020-02-07 16:16:44.033 PST [1167055][3/0:0] DEBUG: bind <unnamed> to isolationtester_waiting
2020-02-07 16:16:44.044 PST [1167055][3/0:0] DEBUG: bind <unnamed> to isolationtester_waiting
2020-02-07 16:16:44.054 PST [1167055][3/0:0] DEBUG: bind <unnamed> to isolationtester_waiting
1167056: releasing xact lock 2 2
1167058: acquired xact lock
2020-02-07 16:16:44.055 PST [1167058][6/11:8490] LOG: INSERT @ 0/9D2D238: - Btree/NEWROOT: lev 0
2020-02-07 16:16:44.055 PST [1167058][6/11:8490] LOG: INSERT @ 0/9D2D278: - Btree/INSERT_LEAF: off 1
2020-02-07 16:16:44.056 PST [1167058][6/11:8490] NOTICE: blurt_and_lock_4() called for k1 in session 2
2020-02-07 16:16:44.056 PST [1167058][6/11:8490] CONTEXT: PL/pgSQL function blurt_and_lock_4(text) line 3 at RAISE
2020-02-07 16:16:44.056 PST [1167058][6/11:8490] NOTICE: acquiring advisory lock on 4
2020-02-07 16:16:44.056 PST [1167058][6/11:8490] CONTEXT: PL/pgSQL function blurt_and_lock_4(text) line 4 at RAISE
1167058: acquiring xact lock
2020-02-07 16:16:44.066 PST [1167055][3/0:0] DEBUG: bind <unnamed> to isolationtester_waiting
2020-02-07 16:16:44.076 PST [1167055][3/0:0] DEBUG: bind <unnamed> to isolationtester_waiting
2020-02-07 16:16:44.087 PST [1167055][3/0:0] DEBUG: bind <unnamed> to isolationtester_waiting
2020-02-07 16:16:44.098 PST [1167055][3/0:0] DEBUG: bind <unnamed> to isolationtester_waiting
1167056: releasing xact lock 1 2
1167057: acquired xact lock
2020-02-07 16:16:44.099 PST [1167057][5/13:8489] LOG: INSERT @ 0/9D2D2B8: - Btree/INSERT_LEAF: off 1
2020-02-07 16:16:44.099 PST [1167057][5/13:8489] NOTICE: blurt_and_lock_4() called for k1 in session 1
2020-02-07 16:16:44.099 PST [1167057][5/13:8489] CONTEXT: PL/pgSQL function blurt_and_lock_4(text) line 3 at RAISE
2020-02-07 16:16:44.099 PST [1167057][5/13:8489] NOTICE: acquiring advisory lock on 4
2020-02-07 16:16:44.099 PST [1167057][5/13:8489] CONTEXT: PL/pgSQL function blurt_and_lock_4(text) line 4 at RAISE
1167057: acquiring xact lock
1167057: acquired xact lock
2020-02-07 16:16:44.100 PST [1167057][5/13:8489] LOG: INSERT @ 0/9D2D318: - Btree/NEWROOT: lev 0
2020-02-07 16:16:44.100 PST [1167057][5/13:8489] LOG: INSERT @ 0/9D2D358: - Btree/INSERT_LEAF: off 1
2020-02-07 16:16:44.100 PST [1167057][5/13:8489] LOG: INSERT @ 0/9D2D390: - Heap/DELETE: off 1 flags 0x08
1167057: xid 8489 releasing lock 5
1167057: retry due to conflict
2020-02-07 16:16:44.100 PST [1167057][5/13:8489] NOTICE: blurt_and_lock_123() called for k1 in session 1
2020-02-07 16:16:44.100 PST [1167057][5/13:8489] CONTEXT: PL/pgSQL function blurt_and_lock_123(text) line 3 at RAISE
2020-02-07 16:16:44.100 PST [1167057][5/13:8489] NOTICE: acquiring advisory lock on 2
2020-02-07 16:16:44.100 PST [1167057][5/13:8489] CONTEXT: PL/pgSQL function blurt_and_lock_123(text) line 7 at RAISE
1167057: acquiring xact lock
1167057: acquired xact lock
2020-02-07 16:16:44.100 PST [1167057][5/13:8489] NOTICE: blurt_and_lock_123() called for k1 in session 1
2020-02-07 16:16:44.100 PST [1167057][5/13:8489] CONTEXT: PL/pgSQL function blurt_and_lock_123(text) line 3 at RAISE
2020-02-07 16:16:44.100 PST [1167057][5/13:8489] NOTICE: acquiring advisory lock on 2
2020-02-07 16:16:44.100 PST [1167057][5/13:8489] CONTEXT: PL/pgSQL function blurt_and_lock_123(text) line 7 at RAISE
1167057: acquiring xact lock
1167057: acquired xact lock
1167057: xid 8489 waiting xwait 8490 (xmin 8490 xmax 0) spec 5
2020-02-07 16:16:44.110 PST [1167055][3/0:0] DEBUG: bind <unnamed> to isolationtester_waiting
2020-02-07 16:16:44.121 PST [1167055][3/0:0] DEBUG: bind <unnamed> to isolationtester_waiting
2020-02-07 16:16:44.135 PST [1167055][3/0:0] DEBUG: bind <unnamed> to isolationtester_waiting
2020-02-07 16:16:44.145 PST [1167055][3/0:0] DEBUG: bind <unnamed> to isolationtester_waiting
1167056: releasing xact lock 2 4
1167058: acquired xact lock
2020-02-07 16:16:44.146 PST [1167058][6/11:8490] LOG: INSERT @ 0/9D2D3D0: - Btree/INSERT_LEAF: off 2
2020-02-07 16:16:44.146 PST [1167058][6/11:8490] LOG: INSERT @ 0/9D2D400: - Heap/HEAP_CONFIRM: off 2
1167058: xid 8490 releasing lock 5
2020-02-07 16:16:44.146 PST [1167058][6/11:8490] LOG: INSERT @ 0/9D2D430: - Transaction/COMMIT: 2020-02-07 16:16:44.146767-08
2020-02-07 16:16:44.146 PST [1167058][6/11:8490] LOG: xlog flush request 0/9D2D430; write 0/9D24058; flush 0/9D24058
2020-02-07 16:16:44.146 PST [1167058][6/11:8490] STATEMENT: INSERT INTO upserttest(key, data) VALUES('k1', 'inserted s2') ON CONFLICT (blurt_and_lock_123(key)) DO UPDATE SET data = upserttest.data || ' with conflict update s2';
2020-02-07 16:16:44.146 PST [1167057][5/13:8489] NOTICE: blurt_and_lock_123() called for k1 in session 1
2020-02-07 16:16:44.146 PST [1167057][5/13:8489] CONTEXT: PL/pgSQL function blurt_and_lock_123(text) line 3 at RAISE
2020-02-07 16:16:44.146 PST [1167057][5/13:8489] NOTICE: acquiring advisory lock on 2
2020-02-07 16:16:44.146 PST [1167057][5/13:8489] CONTEXT: PL/pgSQL function blurt_and_lock_123(text) line 7 at RAISE
1167057: acquiring xact lock
1167057: acquired xact lock
1167057: xid 8489 waiting xwait 8490 (xmin 8490 xmax 0) spec 0
2020-02-07 16:16:44.147 PST [1167057][5/13:8489] NOTICE: blurt_and_lock_123() called for k1 in session 1
2020-02-07 16:16:44.147 PST [1167057][5/13:8489] CONTEXT: PL/pgSQL function blurt_and_lock_123(text) line 3 at RAISE
2020-02-07 16:16:44.147 PST [1167057][5/13:8489] NOTICE: acquiring advisory lock on 2
2020-02-07 16:16:44.147 PST [1167057][5/13:8489] CONTEXT: PL/pgSQL function blurt_and_lock_123(text) line 7 at RAISE
1167057: acquiring xact lock
1167057: acquired xact lock
2020-02-07 16:16:44.147 PST [1167057][5/13:8489] LOG: INSERT @ 0/9D2D468: - Heap/LOCK: off 2: xid 8489: flags 0x00 LOCK_ONLY EXCL_LOCK
2020-02-07 16:16:44.147 PST [1167057][5/13:8489] LOG: INSERT @ 0/9D2D4D8: - Heap/HOT_UPDATE: off 2 xmax 8489 flags 0x10 ; new off 3 xmax 8489
2020-02-07 16:16:44.147 PST [1167057][5/13:8489] LOG: INSERT @ 0/9D2D508: - Transaction/COMMIT: 2020-02-07 16:16:44.147348-08
2020-02-07 16:16:44.147 PST [1167057][5/13:8489] LOG: xlog flush request 0/9D2D508; write 0/9D2D148; flush 0/9D2D148
2020-02-07 16:16:44.147 PST [1167057][5/13:8489] STATEMENT: INSERT INTO upserttest(key, data) VALUES('k1', 'inserted s1') ON CONFLICT (blurt_and_lock_123(key)) DO UPDATE SET data = upserttest.data || ' with conflict update s1';

The important bit is here the different "xid .* waiting xwait .* spec*"
lines. In the success case we see:

1167003: xid 8465 waiting xwait 8466 (xmin 8466 xmax 0) spec 5
1167002: releasing xact lock 2 4
1167004: acquired xact lock
1167004: xid 8466 releasing lock 5
2020-02-07 16:14:56.678 PST [1167004][6/1014:8466] LOG: INSERT @ 0/9CC7348: - Transaction/COMMIT: 2020-02-07 16:14:56.678602-08
1167003: acquired xact lock

In the failing case we see:
1167057: xid 8489 waiting xwait 8490 (xmin 8490 xmax 0) spec 5
1167056: releasing xact lock 2 4
1167058: acquired xact lock
1167058: xid 8490 releasing lock 5
2020-02-07 16:16:44.146 PST [1167058][6/11:8490] LOG: INSERT @ 0/9D2D430: - Transaction/COMMIT: 2020-02-07 16:16:44.146767-08
1167057: xid 8489 waiting xwait 8490 (xmin 8490 xmax 0) spec 0
1167057: acquired xact lock

I think the issue here is that determines whether s1 can finish its
check_exclusion_or_unique_constraint() check with one retry is whether
it reaches it does the tuple visibility test before s2's transaction has
actually marked itself as visible (note that ProcArrayEndTransaction is
after RecordTransactionCommit logging the COMMIT above).

I think the fix is quite easy: Ensure that there *always* will be the
second wait iteration on the transaction (in addition to the already
always existing wait on the speculative token). Which is just adding
s2_begin s2_commit steps. Simple, but took me a few hours to
understand :/.

I've attached that portion of my changes. Will interrupt scheduled
programming for a bit of exercise now.

Greetings,

Andres Freund

Attachment Content-Type Size
fix.diff text/x-diff 14.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2020-02-08 01:01:07 Re: error context for vacuum to include block number
Previous Message Tom Lane 2020-02-07 22:18:46 Re: Draft release notes are up for review