Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, James Coleman <jtc331(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash
Date: 2019-12-17 22:21:11
Message-ID: CA+hUKGJMJHJNzLJgyFei-681Av7eLjOrNGK4cL9cw-_S1s78Hw@mail.gmail.com
Lists: pgsql-bugs

On Tue, Dec 17, 2019 at 10:02 AM Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> On Mon, Dec 16, 2019 at 03:09:22PM +0100, Tomas Vondra wrote:
> >On Sat, Dec 14, 2019 at 02:39:12PM +0100, Tomas Vondra wrote:
> >>On Sat, Dec 14, 2019 at 05:01:05PM +1300, Thomas Munro wrote:
> >>>On Wed, Dec 11, 2019 at 6:32 AM Tomas Vondra
> >>><tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> >>>>The annoying thing is that it's the workers that crash, and the leader
> >>>>failed to notice that, so it was waiting in WaitForParallelWorkersToExit
> >>>>forever. Not sure what the issue is.
> >>>
> >>>That's bad. Some edge case in
> >>>WaitForBackgroundWorkerShutdown()/GetBackgroundWorkerPid() that is
> >>>b0rked perhaps? Can you reproduce it, and tell us what the stack
> >>>looks like, maybe on a new thread?
> >>
> >>Sure. I'll try reproducing it and get this extra info. I was planning to
> >>do that, but then I lost the data for the report because of a silly
> >>mistake. Early next week, hopefully.
> >>
> >
> >I've tried reproducing this, but so far no luck. I've only managed to do
> >two runs, though, because this particular machine only has space for the
> >1TB of temp files on spinning rust. I wonder if it's possible to
> >reproduce the issue with a smaller data set? I've tried using smaller
> >work_mem, but then it does not trigger the issue :-(
> >
>
> And of course, the very next run after I send the "can't reproduce"
> message actually reproduces the issue. How typical ...

Thanks for reproducing this.

> 2019-12-16 20:25:00.617 CET [14081] ERROR: invalid DSA memory alloc request size 1711276032
> 2019-12-16 20:25:00.617 CET [14081] STATEMENT: explain analyze select count(*) from sevenb t1 join sevenb t2 using (i);
> 2019-12-16 20:25:00.617 CET [29417] ERROR: invalid DSA memory alloc request size 1711276032
> 2019-12-16 20:25:00.617 CET [29417] CONTEXT: parallel worker

Ok, so 14081 raised an error and exited, ...

> 2019-12-16 20:25:00.746 CET [29408] LOG: background worker "parallel worker" (PID 14081) exited with exit code 1

... and then the leader cancelled the query on 5 other workers,

> 2019-12-16 20:25:00.816 CET [29408] LOG: background worker "parallel worker" (PID 14079) exited with exit code 1
> 2019-12-16 20:25:00.816 CET [29408] LOG: background worker "parallel worker" (PID 14085) exited with exit code 1
> 2019-12-16 20:25:00.817 CET [29408] LOG: background worker "parallel worker" (PID 14084) exited with exit code 1
> 2019-12-16 20:25:00.817 CET [29408] LOG: background worker "parallel worker" (PID 14080) exited with exit code 1
> 2019-12-16 20:25:00.818 CET [29408] LOG: background worker "parallel worker" (PID 14078) exited with exit code 1

... but somehow there were 2 more workers that didn't get the message
and are waiting:

> #0 0x00007f64684d2bfb in epoll_wait (epfd=6, events=0x55d8873d53d8, maxevents=1, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
> #1 0x000055d8864d4e3c in WaitEventSetWaitBlock (set=0x55d8873d5378, cur_timeout=-1, occurred_events=0x7ffc6e805130, nevents=1) at latch.c:1080
> #2 0x000055d8864d4d00 in WaitEventSetWait (set=0x55d8873d5378, timeout=-1, occurred_events=0x7ffc6e805130, nevents=1, wait_event_info=134217742) at latch.c:1032
> #3 0x000055d8864e1848 in ConditionVariableTimedSleep (cv=0x7f63602a060c, timeout=-1, wait_event_info=134217742) at condition_variable.c:186
> #4 0x000055d8864e17b2 in ConditionVariableSleep (cv=0x7f63602a060c, wait_event_info=134217742) at condition_variable.c:126
> #5 0x000055d8864d05f2 in BarrierArriveAndWait (barrier=0x7f63602a05f4, wait_event_info=134217742) at barrier.c:191
> #6 0x000055d886328d4e in ExecParallelHashIncreaseNumBatches (hashtable=0x55d887481e70) at nodeHash.c:1189

They're waiting on a condition variable, in a barrier, while trying to
coordinate an increase in the number of batches, while hashing. I
suspect the details of what exactly they're waiting for are not relevant
here though. ConditionVariableTimedSleep() contains a standard latch
wait loop with CHECK_FOR_INTERRUPTS(), and parallel workers run with
the standard SIGTERM handler die(). So what I expect to happen here
is:

1. DestroyParallelContext() calls TerminateBackgroundWorker() for all workers.
2. TerminateBackgroundWorker() sets slot->terminate, and signals the
postmaster.
3. The postmaster sees the terminate flag and does kill(worker, SIGTERM).
4. The worker handles SIGTERM by setting ProcDiePending and setting
its own latch.
5. ConditionVariableTimedSleep()'s syscall returns,
CHECK_FOR_INTERRUPTS() runs and sees ProcDiePending (rough sketch of
this path below).
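
To make that path concrete, here's a rough sketch of the two pieces
involved, paraphrased from memory rather than quoted from the tree, so
don't hold me to the details:

    /* tcop/postgres.c: the standard SIGTERM handler (step 4 above) */
    void
    die(SIGNAL_ARGS)
    {
        int     save_errno = errno;

        InterruptPending = true;
        ProcDiePending = true;
        SetLatch(MyLatch);      /* wake any latch/WaitEventSet sleep */

        errno = save_errno;
    }

    /* condition_variable.c: the wait loop, roughly (step 5 above) */
    while (true)
    {
        (void) WaitEventSetWait(cv_wait_event_set, cur_timeout,
                                &event, 1, wait_event_info);
        ResetLatch(MyLatch);
        /* ... figure out whether we've been signalled ... */
        CHECK_FOR_INTERRUPTS(); /* ProcessInterrupts() sees ProcDiePending
                                 * and ereport(FATAL)s, ending the wait */
    }

As long as the latch set by die() is still set when we reach
WaitEventSetWait(), the wait returns immediately and the
CHECK_FOR_INTERRUPTS() terminates the worker.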

So how did they escape this fate? Ooooh. Now I'm wondering if the
problem is that ConditionVariablePrepareToSleep() contains a call to
ResetLatch(), but doesn't then run CHECK_FOR_INTERRUPTS(). Perhaps
that ate the latch that was set by die(). There isn't another
CHECK_FOR_INTERRUPTS() until *after* the WaitEventSetWait() in
ConditionVariableTimedSleep().
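
In other words, an interleaving something like this (hypothetical
timing, just to spell out the race I have in mind):

    stuck worker                            leader / postmaster
    ------------                            -------------------
    BarrierArriveAndWait()
      ConditionVariablePrepareToSleep()
                                            PID 14081 errors out; the
                                            leader's error path reaches
                                            DestroyParallelContext()
                                              -> TerminateBackgroundWorker()
                                              -> postmaster: kill(SIGTERM)
        die(): ProcDiePending = true,
               SetLatch(MyLatch)
        ResetLatch(MyLatch)   <-- eats the latch, no CHECK_FOR_INTERRUPTS()
      ConditionVariableSleep()
        WaitEventSetWait()    <-- latch already clear, ProcDiePending never
                                  checked, and nothing is ever going to
                                  signal this CV now, so we wait forever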

> I'll leave the system in this "stuck" space in case we need to do
> further investigation.

What are the values of InterruptHoldoffCount, CritSectionCount,
InterruptPending and ProcDiePending in the stuck workers?
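Something like this against the stuck PIDs should show them (binary path
and PID are placeholders, of course):

    $ gdb /path/to/postgres -p <stuck worker PID>
    (gdb) print InterruptHoldoffCount
    (gdb) print CritSectionCount
    (gdb) print InterruptPending
    (gdb) print ProcDiePending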
