Re: Unexpected "shared memory block is still in use"

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Unexpected "shared memory block is still in use"
Date: 2019-05-09 22:47:58
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> However, I have a new theory, after noticing that c09850992 moved the
> check for shm_nattch == 0. Previously, if a shmem segment had zero attach
> count, it was unconditionally considered not-a-threat. Now, we'll try
> shmat() anyway, and if that fails for any reason other than EACCES, we say
> SHMSTATE_ANALYSIS_FAILURE which leads to the described error report.
> So I suspect that what we hit was a race condition whereby some other
> parallel test was using the same shmem ID and we managed to see its
> segment successfully in shmctl but then it was gone by the time we did
> shmat. This leads me to think that EINVAL and EIDRM failures from
> shmat had better be considered SHMSTATE_ENOENT not
> In principle this is a longstanding race condition, but I wonder
> whether we made it more probable by moving the shm_nattch check.

Hah --- this is a real race condition, and I can demonstrate it very
easily by inserting a sleep right there, as in the attached
for-testing-only patch.

The particular parallelism level I use is

make -s check-world -j4 PROVE_FLAGS='-j4 --quiet --nocolor --nocount'

on a dual-socket 4-cores-per-socket Xeon machine. With that command and
this patch, I frequently get multiple failures per run, and they all
report either EINVAL or EIDRM.

The patch generally reports that nattch had been 1, so my thought that
that change might've made it worse seems unfounded. But we have
absolutely got a hittable race condition here. The real fix should
be on the order of

if (errno == EACCES)
+ else if (errno == EINVAL || errno == EIDRM)

(plus comments of course).

regards, tom lane

Attachment Content-Type Size
sleep-to-expose-shmat-race-condition.patch text/x-diff 856 bytes

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2019-05-09 23:20:02 Re: Inconsistent error message wording for REINDEX CONCURRENTLY
Previous Message Peter Geoghegan 2019-05-09 22:32:52 Re: _bt_split(), and the risk of OOM before its critical section