Re: Why is lorikeet so unstable in v14 branch only?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Why is lorikeet so unstable in v14 branch only?
Date: 2022-03-27 16:31:23
Message-ID: 278381.1648398683@sss.pgh.pa.us
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> It appears that it is using PS_USE_NONE, as it doesn't have any of the
> defines required for the other paths. I note that the branch for that in
> get_ps_display() doesn't set *displen, which looks a tad suspicious.

Indeed. I forced it to use PS_USE_NONE on my Linux machine, and got
a core dump on the first try of the regression tests:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  __memmove_avx_unaligned_erms ()
    at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:516
516             VMOVNT %VEC(0), (%r9)
(gdb) bt
#0  __memmove_avx_unaligned_erms ()
    at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:516
#1  0x00000000008299b3 in WaitOnLock (locallock=locallock@entry=0x2a5e700,
    owner=owner@entry=0x2aba8f0) at lock.c:1831
#2  0x000000000082adc6 in LockAcquireExtended (
    locktag=locktag@entry=0x7ffc864fad90, lockmode=lockmode@entry=1,
    sessionLock=sessionLock@entry=false, dontWait=dontWait@entry=false,
    reportMemoryError=reportMemoryError@entry=true,
    locallockp=locallockp@entry=0x7ffc864fad88) at lock.c:1101
#3  0x000000000082861f in LockRelationOid (relid=1259, lockmode=1)
    at lmgr.c:117
#4  0x000000000051c5ed in relation_open (relationId=1259,
    lockmode=lockmode@entry=1) at relation.c:56
...

(gdb) f 1
#1  0x00000000008299b3 in WaitOnLock (locallock=locallock@entry=0x2a5e700,
    owner=owner@entry=0x2aba8f0) at lock.c:1831
1831                    memcpy(new_status, old_status, len);
(gdb) p len
$1 = -1

Problem explained, good detective work!
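
To spell out the failure mode: memcpy() takes its length as a size_t, so
an int of -1 turns into the largest possible size, and the copy runs off
the end of mapped memory. Here is a minimal sketch of the pattern. The
function names are hypothetical stand-ins, not the real ps_status.c code,
and len is pre-set to -1 only to keep the sketch deterministic.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a branch that returns an empty title but
 * never writes through the out-parameter.
 */
const char *
get_display_buggy(int *displen)
{
    (void) displen;             /* caller keeps whatever value it had */
    return "";
}

/* Same stand-in, but the out-parameter is always assigned. */
const char *
get_display_fixed(int *displen)
{
    *displen = 0;
    return "";
}

/*
 * The value memcpy() effectively receives for a given int length.
 * For len = -1 this is (size_t) -1, which equals SIZE_MAX.
 */
size_t
as_copy_size(int len)
{
    return (size_t) len;
}
```

With len starting at -1, calling get_display_buggy(&len) leaves it at -1,
and as_copy_size(len) then yields SIZE_MAX. After get_display_fixed(&len),
len is 0 and the copy is a harmless no-op.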

> And maybe there's a good case for also
> surrounding some of the code in WaitOnLock() with "if (len) ..."

+1. I'll make it so, and check the other callers too.
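
For illustration only, the guard could look something like the sketch
below. make_waiting_status() is a hypothetical helper that uses plain
malloc() rather than palloc(), and it checks len > 0, which is slightly
stricter than the "if (len)" suggested above. It is not the code that
will be committed.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not PostgreSQL code: build "<old_status> waiting".
 * Returns NULL when there is no usable title (len <= 0), so the caller
 * can skip the title update entirely. The caller frees the result.
 */
char *
make_waiting_status(const char *old_status, int len)
{
    static const char suffix[] = " waiting";
    char       *new_status;

    if (old_status == NULL || len <= 0)
        return NULL;

    /* sizeof(suffix) counts the trailing NUL byte */
    new_status = malloc((size_t) len + sizeof(suffix));
    if (new_status == NULL)
        return NULL;

    memcpy(new_status, old_status, (size_t) len);
    memcpy(new_status + len, suffix, sizeof(suffix));
    return new_status;
}
```

A caller would only set the new title when the result is non-NULL, so a
zero or negative length never reaches memcpy().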

Once I push this, you should remove the update_process_title hack
from lorikeet's config, since that was just a workaround until
we tracked down the problem, which I think we just did.

regards, tom lane
