Re: [HACKERS] Deadlock in XLogInsert at AIX

From: Noah Misch <noah(at)leadboat(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Bernd Helmle <mailings(at)oopsware(dot)de>
Subject: Re: [HACKERS] Deadlock in XLogInsert at AIX
Date: 2019-10-09 06:39:00
Message-ID: 20191009063900.GA4066266@rfd.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Oct 07, 2019 at 03:06:35PM -0400, Tom Lane wrote:
> Noah Misch <noah(at)leadboat(dot)com> writes:
> > [ fetch-add-gcc-xlc-unify-v2.patch ]
>
> This still fails on Apple's compilers. The first failure I get is
>
> ccache gcc -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -g -O2 -I../../../src/include -I/usr/local/include -isysroot /Developer/SDKs/MacOSX10.5.sdk -c -o nodeHashjoin.o nodeHashjoin.c
> /var/tmp//ccXUM8ep.s:449:Parameter error: r0 not allowed for parameter 2 (code as 0 not r0)
>
> Line 449 of the assembly file is the addi in
>
> LM87:
> sync
> lwarx r0,0,r2
> addi r11,r0,1
> stwcx. r11,0,r2
> bne $-12
> isync
>
> which I suppose comes out of PG_PPC_FETCH_ADD. I find this idea of
> constructing assembly code by string-pasting pretty unreadable and am not
> tempted to try to debug it, but I don't immediately see why this doesn't
> work when the existing s_lock.h code does. I think that the assembler
> error message is probably misleading: while it seems to be saying to
> s/r0/0/ in the addi, gcc consistently uses "rN" syntax for the second
> parameter elsewhere. I do note that gcc never generates r0 as addi's
> second parameter in several files I checked through, so maybe what it
> means is "you need to use some other register"? (Which would imply that
> the constraint for this asm argument is too loose.)

Thanks for testing. That error boils down to "need to use some other
register". The second operand of addi is one of the ppc instruction operands
that can hold a constant zero or a register number[1], so the proper
constraint is "b". I've made it so and added a comment. I should probably
update s_lock.h, too, in a later patch. I don't know how it has
mostly-avoided this failure mode, but its choice of constraint could explain
https://postgr.es/m/flat/36E70B06-2C5C-11D8-A096-0005024EF27F%40ifrance.com

> I'm also wondering why this isn't following s_lock.h's lead as to
> USE_PPC_LWARX_MUTEX_HINT and USE_PPC_LWSYNC.

Most or all of today's pg_atomic_compare_exchange_u32() usage does not have
the property that the mutex hint would signal.

pg_atomic_compare_exchange_u32() specifies "Full barrier semantics", which
lwsync does not provide. We might want to relax the specification to make
lwsync acceptable, but that would be a separate, architecture-independent
project. (generic-gcc.h:pg_atomic_compare_exchange_u32_impl() speculates
along those lines, writing "FIXME: we can probably use a lower consistency
model".)

[1] "Notice that addi and addis use the value 0, not the contents of GPR 0, if
RA=0." -- https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0

Attachment Content-Type Size
fetch-add-gcc-xlc-unify-v3.patch text/plain 14.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Antonin Houska 2019-10-09 06:57:39 Re: Transparent Data Encryption (TDE) and encrypted files
Previous Message Michael Paquier 2019-10-09 06:26:40 Safeguards against incorrect fd flags for fsync()