Re: BUG #16125: Crash of PostgreSQL's wal sender during logical replication

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Andrey Salnikov <andrey(dot)salnikov(at)dataegret(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16125: Crash of PostgreSQL's wal sender during logical replication
Date: 2019-11-26 18:27:52
Message-ID: 20191126182752.byojog7xws2or73x@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Tue, Nov 26, 2019 at 02:10:54PM -0300, Alvaro Herrera wrote:
>On 2019-Nov-19, Andres Freund wrote:
>
>> > > 2518:rmgr: Heap len (rec/tot): 2063/ 2063, tx: 1667601530, lsn: 25EE/D66F11E8, prev 25EE/D66F0CB0, desc: INSERT off 4, blkref #0: rel 1663/13018/88964795 blk 3125
>> > > 2520:rmgr: Btree len (rec/tot): 64/ 64, tx: 1667601530, lsn: 25EE/D66F1A38, prev 25EE/D66F19F8, desc: INSERT_LEAF off 256, blkref #0: rel 1663/13018/88964797 blk 15840
>> > > 2522:rmgr: Heap len (rec/tot): 1341/ 9385, tx: 1667601530, lsn: 25EE/D66F2118, prev 25EE/D66F1A78, desc: INSERT off 7, blkref #0: rel 1663/13018/88964795 blk 3139 FPW
>> > > 2524:rmgr: Btree len (rec/tot): 64/ 64, tx: 1667601530, lsn: 25EE/D66F54C8, prev 25EE/D66F45E0, desc: INSERT_LEAF off 257, blkref #0: rel 1663/13018/88964797 blk 15840
>> > > 2525:rmgr: Heap len (rec/tot): 1671/ 1671, tx: 1667601530, lsn: 25EE/D66F5508, prev 25EE/D66F54C8, desc: INSERT off 2, blkref #0: rel 1663/13018/88964792 blk 18475
>> > > 2527:rmgr: Btree len (rec/tot): 64/ 64, tx: 1667601530, lsn: 25EE/D66F6738, prev 25EE/D66F5B90, desc: INSERT_LEAF off 140, blkref #0: rel 1663/13018/88964798 blk 2058
>> > > 2530:rmgr: Btree len (rec/tot): 96/ 96, tx: 1667601530, lsn: 25EE/D66F7E40, prev 25EE/D66F7E10, desc: INSERT_LEAF off 28, blkref #0: rel 1663/13018/88964799 blk 5076
>> > > 2531:rmgr: Btree len (rec/tot): 80/ 80, tx: 1667601530, lsn: 25EE/D66F7EA0, prev 25EE/D66F7E40, desc: INSERT_LEAF off 88, blkref #0: rel 1663/13018/88964800 blk 3412
>> > > 2532:rmgr: Heap len (rec/tot): 54/ 54, tx: 1667601530, lsn: 25EE/D66F7EF0, prev 25EE/D66F7EA0, desc: DELETE off 2 , blkref #0: rel 1663/13018/88964792 blk 18475
>> > > 2533:rmgr: Heap len (rec/tot): 54/ 54, tx: 1667601530, lsn: 25EE/D66F7F28, prev 25EE/D66F7EF0, desc: DELETE off 4 , blkref #0: rel 1663/13018/88964795 blk 3125
>> > > 2534:rmgr: Heap len (rec/tot): 54/ 54, tx: 1667601530, lsn: 25EE/D66F7F60, prev 25EE/D66F7F28, desc: DELETE off 7 , blkref #0: rel 1663/13018/88964795 blk 3139
>> > > 56346:rmgr: Heap len (rec/tot): 378/ 8198, tx: 1667601537, lsn: 25EE/D6DE33B0, prev 25EE/D6DE1628, desc: DELETE off 6 KEYS_UPDATED , blkref #0: rel 1663/13018/88964815 blk 1938695 FPW
>>
>> > > 56347:rmgr: Heap len (rec/tot): 54/ 54, tx: 1667601538, lsn: 25EE/D6DE53D0, prev 25EE/D6DE33B0, desc: LOCK off 5: xid 1667601538: flags 0 LOCK_ONLY EXCL_LOCK , blkref #0: rel 1663/13018/88964792 blk 18460
>> > > 56348:rmgr: Heap len (rec/tot): 6563/ 6563, tx: 1667601538, lsn: 25EE/D6DE5408, prev 25EE/D6DE53D0, desc: UPDATE off 5 xmax 1667601538 ; new off 3 xmax 0, blkref #0: rel 1663/13018/88964792 blk 18499, blkref #1: rel 1663/13018/88964792 blk 18460
>> > > 56349:rmgr: Btree len (rec/tot): 64/ 64, tx: 1667601538, lsn: 25EE/D6DE6DC8, prev 25EE/D6DE5408, desc: INSERT_LEAF off 140, blkref #0: rel 1663/13018/88964798 blk 2058
>> > > 56350:rmgr: Btree len (rec/tot): 96/ 96, tx: 1667601538, lsn: 25EE/D6DE6E08, prev 25EE/D6DE6DC8, desc: INSERT_LEAF off 28, blkref #0: rel 1663/13018/88964799 blk 5076
>> > > 56351:rmgr: Btree len (rec/tot): 80/ 80, tx: 1667601538, lsn: 25EE/D6DE6E68, prev 25EE/D6DE6E08, desc: INSERT_LEAF off 88, blkref #0: rel 1663/13018/88964800 blk 3412
>> > > 56352:rmgr: Transaction len (rec/tot): 66/ 66, tx: 1667601527, lsn: 25EE/D6DE6EB8, prev 25EE/D6DE6E68, desc: COMMIT 2019-11-16 03:01:39.326201 UTC; subxacts: 1667601528 1667601530 1667601537 1667601538
>>
>> > So, this DELETE change triggers the issue. If I understand it correctly,
>> > that means the transaction locked the tuple and then deleted it. But I
>> > don't see any record about the lock.
>>
>> Hm. I don't think it necessarily means that. compute_new_xmax_infomask(), when
>> called from heap_delete(), will e.g. set HEAP_KEYS_UPDATED when
>>
>> if (old_infomask & HEAP_XMAX_INVALID)
>>
>> is true. Which is the most common case.
>
>I don't understand what you guys are saying. There's a DELETE at
>25EE/D6DE33B0 (line 56346) which seems to be the one causing trouble.
>Right? There's also a LOCK+UPDATE sequence just below that, which is
>unrelated (different rel). Are you saying that one of them is toast and
>the other is its main heap? It would be useful to know more about the
>rels.
>

I was wondering why there's no LOCK before the DELETE, when the waldump
output has KEYS_UPDATED for that change. But as Andres says, it's likely
normal because e.g. heap_delete sets flag in certain cases.

I agree it would be useful to know more about the rels. AFAIK 88964815
is the heap, and my guess is the other rels are various indexes on it.
I've already said so in my message from last week.

>> I actually find it *more*
>> curious to *not* see that for a delete. I've not traced fully through
>> it, yet.
>> Alvaro, it's not clear to me whether the the multixact codepaths in
>> compute_new_xmax_infomask() actually work as expected for deletes.
>
>Are you saying that the three *prior* deletes (lines 2532-2534) are
>strange because they lack the KEYS_UPDATED bit? (Note these are all for
>different rels).
>
>I'm not sure what to do about this report at this point.
>Notwithstanding my general failure to focus on anything too complex
>given my surrounding social context.
>

I think having more information about the structure (tables, indexes,
mapping for relfilenodes) and a better idea what the transaction is
doing, would be helpful.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Abhilash Mishra 2019-11-26 18:38:05 Re: BUG #16138: Error while installing PostgreSQL
Previous Message Tomas Vondra 2019-11-26 17:41:31 Re: Failed assertion clauses != NIL