Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)
Date: 2023-09-02 21:00:00
Message-ID: 2132c88f-7e32-6dba-1057-2ecc5ce66509@gmail.com
Lists: pgsql-hackers

Hello Robert,

01.09.2023 23:21, Robert Haas wrote:
> On Fri, Sep 1, 2023 at 6:13 AM Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
>> (Placing "pg_compiler_barrier();" just after "waiting = true;" fixed the
>> issue for us.)
> Maybe it'd be worth trying something stronger, like
> pg_memory_barrier(). A compiler barrier doesn't prevent the CPU from
> reordering loads and stores as it goes, and ARM64 has weak memory
> ordering.

Indeed, thank you for the tip!
So perhaps we are dealing here not with a compiler optimization but with a CPU one.
Here is a wider fragment of the code:
  805c48: 52800028      mov     w8, #1 // true
  805c4c: 52800319      mov     w25, #24
  805c50: 5280073a      mov     w26, #57
  805c54: fd446128      ldr     d8, [x9, #2240]
  805c58: 90000d7b      adrp    x27, 0x9b1000 <ModifyWaitEvent+0xb0>
  805c5c: fd415949      ldr     d9, [x10, #688]
  805c60: f9071d68      str     x8, [x11, #3640] // waiting = true (x8 = w8)
  805c64: f90003f3      str     x19, [sp]
  805c68: 14000010      b       0x805ca8 <WaitEventSetWait+0x108>

  805ca8: f9400a88      ldr     x8, [x20, #16] // if (set->latch && set->latch->is_set)
  805cac: b4000068      cbz     x8, 0x805cb8 <WaitEventSetWait+0x118>
  805cb0: f9400108      ldr     x8, [x8]
  805cb4: b5001248      cbnz    x8, 0x805efc <WaitEventSetWait+0x35c>
  805cb8: f9401280      ldr     x0, [x20, #32]

If that CPU can delay the store to the variable waiting
(str x8, [x11, #3640]), keeping it internally as something like
"store 1 to [address]" until the load at 805cb0 (or a later instruction)
has executed, then we can get the behavior discussed. Something like that
is shown in the ARM documentation:
https://developer.arm.com/documentation/102336/0100/Memory-ordering?lang=en
I'll try to test this guess on the target machine...

Best regards,
Alexander
