Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)
Date: 2023-01-29 12:53:10
Message-ID: 0581936e-ce1c-862f-f6e0-e4e7cbec2936@enterprisedb.com
Lists: pgsql-hackers

On 1/28/23 13:05, Tomas Vondra wrote:
>
> FWIW I'll wait for dikkop to finish the current buildfarm run (it's
> currently chewing on HEAD) and then will try to do runs of the 'joins'
> test in a loop. That's where dikkop got stuck before.
>

So I did that - same configure options as the buildfarm client, and a
'make check' (with only tests up to the 'join' suite, because that's
where it got stuck before). And it took only ~15 runs (~1h) to hit this
again on dikkop.
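
For anyone wanting to reproduce this kind of loop, here is a minimal sketch of a driver that reruns a command until one run fails to finish in time. It is not the script used on dikkop; the function name run_until_hang, the timeout, and the command passed in are all assumptions for illustration. On a suspected lockup it deliberately leaves the process running, so that backtraces and fstat/procstat output can still be collected.

```python
import subprocess
import sys


def run_until_hang(cmd, max_runs, timeout_s):
    """Run cmd up to max_runs times.

    Returns (run_number, process) for the first run that is still alive
    after timeout_s seconds (a suspected lockup), or None if every run
    finished in time. The stuck process is NOT killed, so the caller can
    inspect it (and its children) before cleaning up.
    """
    for run in range(1, max_runs + 1):
        proc = subprocess.Popen(cmd)
        try:
            proc.wait(timeout=timeout_s)
        except subprocess.TimeoutExpired:
            # Hypothetical policy: treat "did not exit in time" as a hang.
            print(f"run {run}: pid {proc.pid} still running after "
                  f"{timeout_s}s", file=sys.stderr)
            return run, proc
    return None
```

A call might look like run_until_hang(["make", "check"], 100, 1800), where the 30-minute timeout is a guess that would have to be tuned to the machine. Note that the timeout only watches the top-level process; the stuck backends have to be located separately, e.g. from ps output.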

As before, there are three processes - leader + 2 workers, but the query
is different - this time it's this one:

-- A couple of other hash join tests unrelated to work_mem management.
-- Check that EXPLAIN ANALYZE has data even if the leader doesn't participate
savepoint settings;
set local max_parallel_workers_per_gather = 2;
set local work_mem = '4MB';
set local parallel_leader_participation = off;
select * from hash_join_batches(
$$
select count(*) from simple r join simple s using (id);
$$);

I managed to collect the fstat/procstat output Thomas asked for, along
with the backtraces - all attached. I still have the core files, in case
we need to look at something else. As before, running gcore on the second
worker (29081) gets this unstuck - it sends some signal that apparently
wakes it up.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment          Content-Type  Size
bt.29046.log        text/x-log    5.3 KB
bt.29080.log        text/x-log    4.1 KB
bt.29081.log        text/x-log    4.3 KB
fstat.29046.log     text/x-log    7.0 KB
fstat.29080.log     text/x-log    1.0 KB
fstat.29081.log     text/x-log    1.0 KB
procstat.29046.log  text/x-log    5.3 KB
procstat.29080.log  text/x-log    5.3 KB
procstat.29081.log  text/x-log    5.3 KB
ps-ax.log           text/x-log    3.0 KB
