Re: BUG #16125: Crash of PostgreSQL's wal sender during logical replication

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Andrey Salnikov <andrey(dot)salnikov(at)dataegret(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16125: Crash of PostgreSQL's wal sender during logical replication
Date: 2019-11-26 17:10:54
Message-ID: 20191126171054.GB4161@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On 2019-Nov-19, Andres Freund wrote:

> > > 2518:rmgr: Heap len (rec/tot): 2063/ 2063, tx: 1667601530, lsn: 25EE/D66F11E8, prev 25EE/D66F0CB0, desc: INSERT off 4, blkref #0: rel 1663/13018/88964795 blk 3125
> > > 2520:rmgr: Btree len (rec/tot): 64/ 64, tx: 1667601530, lsn: 25EE/D66F1A38, prev 25EE/D66F19F8, desc: INSERT_LEAF off 256, blkref #0: rel 1663/13018/88964797 blk 15840
> > > 2522:rmgr: Heap len (rec/tot): 1341/ 9385, tx: 1667601530, lsn: 25EE/D66F2118, prev 25EE/D66F1A78, desc: INSERT off 7, blkref #0: rel 1663/13018/88964795 blk 3139 FPW
> > > 2524:rmgr: Btree len (rec/tot): 64/ 64, tx: 1667601530, lsn: 25EE/D66F54C8, prev 25EE/D66F45E0, desc: INSERT_LEAF off 257, blkref #0: rel 1663/13018/88964797 blk 15840
> > > 2525:rmgr: Heap len (rec/tot): 1671/ 1671, tx: 1667601530, lsn: 25EE/D66F5508, prev 25EE/D66F54C8, desc: INSERT off 2, blkref #0: rel 1663/13018/88964792 blk 18475
> > > 2527:rmgr: Btree len (rec/tot): 64/ 64, tx: 1667601530, lsn: 25EE/D66F6738, prev 25EE/D66F5B90, desc: INSERT_LEAF off 140, blkref #0: rel 1663/13018/88964798 blk 2058
> > > 2530:rmgr: Btree len (rec/tot): 96/ 96, tx: 1667601530, lsn: 25EE/D66F7E40, prev 25EE/D66F7E10, desc: INSERT_LEAF off 28, blkref #0: rel 1663/13018/88964799 blk 5076
> > > 2531:rmgr: Btree len (rec/tot): 80/ 80, tx: 1667601530, lsn: 25EE/D66F7EA0, prev 25EE/D66F7E40, desc: INSERT_LEAF off 88, blkref #0: rel 1663/13018/88964800 blk 3412
> > > 2532:rmgr: Heap len (rec/tot): 54/ 54, tx: 1667601530, lsn: 25EE/D66F7EF0, prev 25EE/D66F7EA0, desc: DELETE off 2 , blkref #0: rel 1663/13018/88964792 blk 18475
> > > 2533:rmgr: Heap len (rec/tot): 54/ 54, tx: 1667601530, lsn: 25EE/D66F7F28, prev 25EE/D66F7EF0, desc: DELETE off 4 , blkref #0: rel 1663/13018/88964795 blk 3125
> > > 2534:rmgr: Heap len (rec/tot): 54/ 54, tx: 1667601530, lsn: 25EE/D66F7F60, prev 25EE/D66F7F28, desc: DELETE off 7 , blkref #0: rel 1663/13018/88964795 blk 3139
> > > 56346:rmgr: Heap len (rec/tot): 378/ 8198, tx: 1667601537, lsn: 25EE/D6DE33B0, prev 25EE/D6DE1628, desc: DELETE off 6 KEYS_UPDATED , blkref #0: rel 1663/13018/88964815 blk 1938695 FPW
>
> > > 56347:rmgr: Heap len (rec/tot): 54/ 54, tx: 1667601538, lsn: 25EE/D6DE53D0, prev 25EE/D6DE33B0, desc: LOCK off 5: xid 1667601538: flags 0 LOCK_ONLY EXCL_LOCK , blkref #0: rel 1663/13018/88964792 blk 18460
> > > 56348:rmgr: Heap len (rec/tot): 6563/ 6563, tx: 1667601538, lsn: 25EE/D6DE5408, prev 25EE/D6DE53D0, desc: UPDATE off 5 xmax 1667601538 ; new off 3 xmax 0, blkref #0: rel 1663/13018/88964792 blk 18499, blkref #1: rel 1663/13018/88964792 blk 18460
> > > 56349:rmgr: Btree len (rec/tot): 64/ 64, tx: 1667601538, lsn: 25EE/D6DE6DC8, prev 25EE/D6DE5408, desc: INSERT_LEAF off 140, blkref #0: rel 1663/13018/88964798 blk 2058
> > > 56350:rmgr: Btree len (rec/tot): 96/ 96, tx: 1667601538, lsn: 25EE/D6DE6E08, prev 25EE/D6DE6DC8, desc: INSERT_LEAF off 28, blkref #0: rel 1663/13018/88964799 blk 5076
> > > 56351:rmgr: Btree len (rec/tot): 80/ 80, tx: 1667601538, lsn: 25EE/D6DE6E68, prev 25EE/D6DE6E08, desc: INSERT_LEAF off 88, blkref #0: rel 1663/13018/88964800 blk 3412
> > > 56352:rmgr: Transaction len (rec/tot): 66/ 66, tx: 1667601527, lsn: 25EE/D6DE6EB8, prev 25EE/D6DE6E68, desc: COMMIT 2019-11-16 03:01:39.326201 UTC; subxacts: 1667601528 1667601530 1667601537 1667601538
>
> > So, this DELETE change triggers the issue. If I understand it correctly,
> > that means the transaction locked the tuple and then deleted it. But I
> > don't see any record about the lock.
>
> Hm. I don't think it necessarily means that. compute_new_xmax_infomask(), when
> called from heap_delete(), will e.g. set HEAP_KEYS_UPDATED when
>
> if (old_infomask & HEAP_XMAX_INVALID)
>
> is true. Which is the most common case.

I don't understand what you guys are saying. There's a DELETE at
25EE/D6DE33B0 (line 56346) which seems to be the one causing trouble.
Right? There's also a LOCK+UPDATE sequence just below that, which is
unrelated (different rel). Are you saying that one of them is toast and
the other is its main heap? It would be useful to know more about the
rels.

> I actually find it *more*
> curious to *not* see that for a delete. I've not traced fully through
> it, yet.
> Alvaro, it's not clear to me whether the the multixact codepaths in
> compute_new_xmax_infomask() actually work as expected for deletes.

Are you saying that the three *prior* deletes (lines 2532-2534) are
strange because they lack the KEYS_UPDATED bit? (Note these are all for
different rels).

I'm not sure what to do about this report at this point.
Notwithstanding my general failure to focus on anything too complex
given my surrounding social context.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tomas Vondra 2019-11-26 17:41:31 Re: Failed assertion clauses != NIL
Previous Message Tomas Vondra 2019-11-26 14:18:54 Re: BUG #16138: Error while installing PostgreSQL