Re: On-demand running query plans using auto_explain and signals

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: "Shulgin, Oleksandr" <oleksandr(dot)shulgin(at)zalando(dot)de>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: On-demand running query plans using auto_explain and signals
Date: 2015-09-14 17:27:15
Message-ID: CAFj8pRA_yTPDtZcaf=kLE-PD-AZg19s8_OKFd5AwLt+EcgA68g@mail.gmail.com
Lists: pgsql-hackers

2015-09-14 18:46 GMT+02:00 Shulgin, Oleksandr <oleksandr(dot)shulgin(at)zalando(dot)de>:

> On Mon, Sep 14, 2015 at 3:09 PM, Shulgin, Oleksandr <oleksandr(dot)shulgin(at)zalando(dot)de> wrote:
>
>> On Mon, Sep 14, 2015 at 2:11 PM, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>
>>>
>>>> Now the backend that has been signaled on the second call to
>>>> pg_cmdstatus (it can be either some other backend, or the backend B
>>>> again) will not find an unprocessed slot, thus it will not try to
>>>> attach/detach the queue and the backend A will block forever.
>>>>
>>>> This requires really bad timing, and the user should still be able to
>>>> interrupt the querying backend A.
>>>>
>>>
>>> I think we can't rely on the low probability that this won't happen, and
>>> we should not rely on people interrupting the backend. Being able to detect
>>> the situation and fail gracefully should be possible.
>>>
>>> It may be possible to introduce some lock-less protocol preventing such
>>> situations, but it's not there at the moment. If you believe it's possible,
>>> you need to explain and "prove" that it's actually safe.
>>>
>>> Otherwise we may need to introduce some basic locking - for example we
>>> may introduce a LWLock for each slot, and lock it with dontWait=true (and
>>> skip it if we couldn't lock it). This should prevent most scenarios where
>>> one corrupted slot blocks many processes.
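A minimal sketch of the per-slot conditional lock, assuming a hypothetical DemoSlot layout and using a C11 atomic_flag in place of an LWLock (in PostgreSQL the non-blocking acquire would presumably be LWLockConditionalAcquire()). A slot that is busy, or stuck because its owner misbehaved, is skipped rather than waited on:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_SLOTS 8

/* Hypothetical slot layout, for illustration only. */
typedef struct DemoSlot
{
    atomic_flag lock;        /* stands in for a per-slot LWLock */
    int         target_pid;  /* backend this request is for; 0 = unused */
} DemoSlot;

static DemoSlot demo_slots[NUM_SLOTS];

static void demo_init(void)
{
    for (size_t i = 0; i < NUM_SLOTS; i++)
    {
        atomic_flag_clear(&demo_slots[i].lock);
        demo_slots[i].target_pid = 0;
    }
}

/* Non-blocking acquire: true if we now hold the lock, false if it is taken. */
static bool slot_try_lock(DemoSlot *slot)
{
    return !atomic_flag_test_and_set(&slot->lock);
}

static void slot_unlock(DemoSlot *slot)
{
    atomic_flag_clear(&slot->lock);
}

/*
 * Find a slot addressed to my_pid. Slots we cannot lock immediately are
 * skipped, never waited on. Returns the index of a matching slot, which is
 * left locked for the caller, or -1 if none was found.
 */
static int find_my_slot(int my_pid)
{
    assert(my_pid != 0);         /* 0 marks an unused slot */
    for (int i = 0; i < NUM_SLOTS; i++)
    {
        DemoSlot *slot = &demo_slots[i];

        if (!slot_try_lock(slot))
            continue;            /* busy or stuck: move on */
        if (slot->target_pid == my_pid)
            return i;            /* caller processes it, then unlocks */
        slot_unlock(slot);
    }
    return -1;
}
```

Since nothing in the scan blocks, a slot whose lock is never released costs other backends one skipped iteration instead of stalling them.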
>>
>>
>> OK, I will revisit this part then.
>>
>
> I have a radical proposal to remove the need for locking: make the
> CmdStatusSlot struct consist of a mere dsm_handle and move all the required
> metadata like sender_pid, request_type, etc. into the shared memory segment
> itself.
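One way to picture that layout, as a sketch: the shared slot holds only an atomic dsm_handle (typedef'd to uint32_t to match PostgreSQL), and everything else lives in a header at the start of the segment. The header struct and DEMO_INVALID_HANDLE are hypothetical names; only sender_pid, request_type and result_code come from the proposal.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t dsm_handle;          /* PostgreSQL's dsm_handle is a uint32 */
static_assert(sizeof(dsm_handle) == 4, "dsm_handle is a 32-bit value");

#define DEMO_INVALID_HANDLE ((dsm_handle) 0)   /* hypothetical name */

/* Shared slot: nothing but the handle; 0 means "empty". */
typedef struct CmdStatusSlot
{
    _Atomic dsm_handle handle;
} CmdStatusSlot;

/* Hypothetical header at the start of each request's DSM segment. */
typedef struct CmdStatusRequestHeader
{
    int sender_pid;     /* backend expected to send the reply */
    int request_type;   /* kind of status requested */
    int result_code;    /* filled in by the sender */
    /* the shm_mq for the reply would follow here */
} CmdStatusRequestHeader;

static bool slot_is_empty(CmdStatusSlot *slot)
{
    return atomic_load(&slot->handle) == DEMO_INVALID_HANDLE;
}
```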
>
> If we allow only the requesting process to update the slot (that is,
> the handle value itself), this removes the need for locking between sender
> and receiver.
>
> The sender will walk through the slots looking for a non-zero dsm handle
> (according to the dsm_create() implementation, 0 is considered an invalid
> handle), and if it finds a valid one, it will attach and look inside to
> check whether it's destined for this process ID. At first that might sound
> strange, but I would expect that 99% of the time the only valid slot would
> be the one for the process that has just been signaled.
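The sender's scan could look roughly like this. demo_attach() is a hypothetical stand-in for dsm_attach(), backed by an in-process table, and this sketch assumes it returns NULL when the requester has already destroyed its segment, a case the scan has to tolerate:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t dsm_handle;

typedef struct { _Atomic dsm_handle handle; } CmdStatusSlot;

typedef struct
{
    int sender_pid;     /* backend expected to answer */
    int request_type;
    int result_code;
} RequestHeader;

/* Simulated DSM registry (hypothetical), standing in for dsm_attach(). */
#define MAX_SEGMENTS 16
static struct { dsm_handle handle; RequestHeader hdr; } demo_segments[MAX_SEGMENTS];

static RequestHeader *demo_attach(dsm_handle h)
{
    for (size_t i = 0; i < MAX_SEGMENTS; i++)
        if (demo_segments[i].handle == h)
            return &demo_segments[i].hdr;
    return NULL;                /* segment already gone */
}

#define NUM_SLOTS 8
static CmdStatusSlot slots[NUM_SLOTS];

/* Return the header of a request addressed to my_pid, or NULL. */
static RequestHeader *find_request_for(int my_pid)
{
    assert(my_pid != 0);
    for (size_t i = 0; i < NUM_SLOTS; i++)
    {
        dsm_handle h = atomic_load(&slots[i].handle);
        if (h == 0)
            continue;           /* empty slot */
        RequestHeader *hdr = demo_attach(h);
        if (hdr == NULL)
            continue;           /* requester gave up in the meantime */
        if (hdr->sender_pid == my_pid)
            return hdr;
        /* not ours: a real backend would dsm_detach() here */
    }
    return NULL;
}
```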
>
> The sender process will then calculate the response message, update the
> result_code in the shared memory segment and finally send the message
> through the queue. If the receiver has since detached, we get a detached
> result code and bail out.
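A sketch of the sender's reply step, with a toy queue in place of shm_mq. DemoMqResult only imitates the SHM_MQ_SUCCESS / SHM_MQ_DETACHED results that PostgreSQL's shm_mq_send() can return; the DemoRequest layout and the RESULT_* codes are made up for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef enum { DEMO_MQ_SUCCESS, DEMO_MQ_DETACHED } DemoMqResult;

typedef struct
{
    bool   receiver_attached;
    char   buf[256];
    size_t len;
} DemoQueue;

/* Hypothetical result codes. */
enum { RESULT_PENDING = 0, RESULT_OK = 1, RESULT_NO_QUERY = 2 };

typedef struct
{
    int       sender_pid;
    int       result_code;
    DemoQueue queue;    /* in the proposal the queue lives in the same segment */
} DemoRequest;

static DemoMqResult demo_mq_send(DemoQueue *q, const char *data, size_t n)
{
    if (!q->receiver_attached)
        return DEMO_MQ_DETACHED;
    assert(n <= sizeof q->buf);
    memcpy(q->buf, data, n);
    q->len = n;
    return DEMO_MQ_SUCCESS;
}

/*
 * Sender side: record result_code, then send the reply through the queue.
 * Returns false if the requester is gone; the caller simply bails out.
 */
static bool answer_request(DemoRequest *req, const char *plan_text)
{
    const char *msg = plan_text ? plan_text : "";

    req->result_code = plan_text ? RESULT_OK : RESULT_NO_QUERY;
    return demo_mq_send(&req->queue, msg, strlen(msg)) == DEMO_MQ_SUCCESS;
}
```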
>
> Clearing the slot after receiving the message should be the requesting
> process' responsibility. This way the receiver only writes to the slot and
> the sender only reads from it.
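The requester's half, sketched under one added assumption: the proposal does not say how two requesters avoid picking the same empty slot, so this version claims a slot with a compare-and-swap from 0. After that, only the claiming process writes the slot, matching the single-writer rule described above. Names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

typedef uint32_t dsm_handle;

#define NUM_SLOTS 4
static _Atomic dsm_handle slot_handles[NUM_SLOTS];   /* 0 = empty */

/*
 * Requester: publish our segment handle in a free slot. The CAS from 0
 * keeps two requesters from claiming the same slot (an assumption added
 * here). Returns the slot index, or -1 if every slot is in use.
 */
static int claim_slot(dsm_handle my_seg)
{
    assert(my_seg != 0);
    for (int i = 0; i < NUM_SLOTS; i++)
    {
        dsm_handle expected = 0;
        if (atomic_compare_exchange_strong(&slot_handles[i], &expected, my_seg))
            return i;
    }
    return -1;              /* report an error rather than wait */
}

/* Requester: after reading the reply (or giving up), empty the slot. */
static void release_slot(int idx)
{
    assert(idx >= 0 && idx < NUM_SLOTS);
    atomic_store(&slot_handles[idx], 0);
}
```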
>
> By the way, is it safe to assume atomic reads/writes of dsm_handle
> (uint32)? I would be surprised if not.
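On the atomicity question: an aligned 4-byte load or store is generally assumed not to tear, but atomicity alone does not order the handle relative to the segment contents it points to. A sketch with C11 atomics, where a release store and an acquire load provide that ordering; in PostgreSQL the rough equivalents would be pg_atomic_uint32 from port/atomics.h together with explicit barriers such as pg_write_barrier() and pg_read_barrier(). The names below are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

typedef uint32_t dsm_handle;

typedef struct { int sender_pid; int request_type; } Header;

static Header segment;                 /* stands in for the DSM segment contents */
static _Atomic dsm_handle published;   /* the shared slot */

/*
 * Writer: fill in the segment first, then publish the handle. The release
 * store keeps the header writes from becoming visible after the handle.
 */
static void publish(dsm_handle h, int target_pid, int type)
{
    assert(h != 0);
    segment.sender_pid = target_pid;
    segment.request_type = type;
    atomic_store_explicit(&published, h, memory_order_release);
}

/*
 * Reader: the acquire load pairs with the release store, so once a non-zero
 * handle is seen, the header it guards is fully initialised.
 */
static int read_target(dsm_handle *h_out)
{
    dsm_handle h = atomic_load_explicit(&published, memory_order_acquire);

    *h_out = h;
    return h != 0 ? segment.sender_pid : 0;
}
```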
>

I don't see any reason why it should not work: only a few processes will
wait for data, so the wasted attach/detach shm operations will not cost much.

Pavel

>
> --
> Alex
>
>
