Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)
Date: 2023-09-01 19:00:00
Message-ID: 7f006842-975a-bb0a-d8cf-ffa4cc2bbe36@gmail.com
Lists: pgsql-hackers

Hello Tomas,

01.09.2023 16:00, Tomas Vondra wrote:
> Hmmm, I'm not very good at reading the binary code, but here's what
> objdump produced for WaitEventSetWait. Maybe someone will see what the
> issue is.

At first glance, I can't see anything suspicious in the disassembly.
IIUC, waiting = true is represented there as:
  805c38: b902ad18      str     w24, [x8, #684] // pgstat_report_wait_start(): proc->wait_event_info = wait_event_info;
// end of pgstat_report_wait_start(wait_event_info);

  805c3c: b0ffdb09      adrp    x9, 0x366000 <dsm_segment_address+0x24>
  805c40: b0ffdb0a      adrp    x10, 0x366000 <dsm_segment_address+0x28>
  805c44: f0000eeb      adrp    x11, 0x9e4000 <PMSignalShmemInit+0x4>

  805c48: 52800028      mov     w8, #1 // true
  805c4c: 52800319      mov     w25, #24
  805c50: 5280073a      mov     w26, #57
  805c54: fd446128      ldr     d8, [x9, #2240]
  805c58: 90000d7b      adrp    x27, 0x9b1000 <ModifyWaitEvent+0xb0>
  805c5c: fd415949      ldr     d9, [x10, #688]
  805c60: f9071d68      str     x8, [x11, #3640] // waiting = true (x8 = w8)
So there are two simple mov instructions and two load operations performed
in parallel, but I don't think this is similar to what we had in that case.

> I thought about maybe just adding the barrier in the code, but then how
> would we know it's the issue and this fixed it? It happens so rarely we
> can't make any conclusions from a couple runs of tests.

I could probably construct a reproducer for the lockup if I had access to
such a machine for a day or two.
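
For anyone following along, here is a rough sketch of the kind of store/load
ordering that an explicit barrier is meant to protect in a "flag, then
recheck" handshake. It is only an illustration under stated assumptions: it
uses C11 atomics rather than PostgreSQL's own barrier primitives, and the
names sketch_latch, sketch_prepare_to_sleep and sketch_set_latch are
hypothetical, not the actual latch code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a latch; NOT PostgreSQL's struct. */
typedef struct
{
    atomic_bool is_set;          /* set by the waker */
    atomic_bool maybe_sleeping;  /* set by the waiter before it blocks */
} sketch_latch;

void
sketch_latch_init(sketch_latch *l)
{
    assert(l != NULL);
    atomic_init(&l->is_set, false);
    atomic_init(&l->maybe_sleeping, false);
}

/*
 * Waiter side. Returns true if it is safe to block, false if the latch
 * is already set and the caller must not sleep.
 */
bool
sketch_prepare_to_sleep(sketch_latch *l)
{
    if (atomic_load_explicit(&l->is_set, memory_order_relaxed))
        return false;

    /* Announce that we may sleep ... */
    atomic_store_explicit(&l->maybe_sleeping, true, memory_order_relaxed);
    /* ... and make that store visible before rechecking is_set. */
    atomic_thread_fence(memory_order_seq_cst);

    if (atomic_load_explicit(&l->is_set, memory_order_relaxed))
    {
        atomic_store_explicit(&l->maybe_sleeping, false, memory_order_relaxed);
        return false;
    }
    return true;
}

/*
 * Waker side. Returns true if the waiter may be sleeping and therefore
 * needs an explicit wakeup (signal, pipe write, etc.).
 */
bool
sketch_set_latch(sketch_latch *l)
{
    atomic_store_explicit(&l->is_set, true, memory_order_relaxed);
    /* Order the is_set store before the maybe_sleeping load. */
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&l->maybe_sleeping, memory_order_relaxed);
}
```

With both fences in place, at least one side has to observe the other's
store, so the waiter cannot go to sleep while the waker skips the wakeup.
Without them, a weakly ordered CPU such as AArch64 is allowed to reorder each
side's store past its following load, which is the sort of rare lost wakeup
that would be very hard to confirm from a handful of test runs.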

Best regards,
Alexander
