Re: On-demand running query plans using auto_explain and signals

From: "Shulgin, Oleksandr" <oleksandr(dot)shulgin(at)zalando(dot)de>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: On-demand running query plans using auto_explain and signals
Date: 2015-09-16 14:31:47
Message-ID: CACACo5SixNu1UQyfOvZsDiMMcT3HkeT2_a-vdG3SE-OWWzLbzA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Sep 15, 2015 at 11:00 AM, Shulgin, Oleksandr <
oleksandr(dot)shulgin(at)zalando(dot)de> wrote:

> On Mon, Sep 14, 2015 at 7:27 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
> wrote:
>
>>
>> 2015-09-14 18:46 GMT+02:00 Shulgin, Oleksandr <
>> oleksandr(dot)shulgin(at)zalando(dot)de>:
>>
>>>
>>> ... This way the receiver only writes to the slot and the sender only
>>> reads from it.
>>>
>>> By the way, is it safe to assume atomic read/writes of dsm_handle
>>> (uint32)? I would be surprised if not.
>>>
>>
>> I don't see any reason why it should not to work - only few processes
>> will wait for data - so lost attach/detach shm operations will not be too
>> much.
>>
>
> Please see attached for implementation of this approach. The most
> surprising thing is that it actually works :)
>
> One problem still remains with the process requesting information when the
> target process exits before it can have a chance to handle the signal. The
> requesting process then waits forever, because nobody attaches/detaches the
> queue. We've discussed this before and looks like we need to introduce a
> timeout parameter to pg_cmdstatus()...
>

I've added the timeout parameter to the pg_cmdstatus call, and more
importantly to the sending side of the queue, so that one can limit the
potential effect of handling the interrupt in case something goes really
wrong.

I've tested a number of possible scenarios with artificial delays in reply
and sending cancel request or SIGTERM to the receiving side and this is all
seems to work nicely due to proper message queue detach event
notification. Still I think it helps to know that there is a timeout in
case the receiving side is really slow to read the message.

I've also decided we really ought to suppress any possible ERROR level
messages generated during the course of processing the status request in
order not to prevent the originally running transaction to complete. The
errors so caught are just logged using LOG level and ignored in this new
version of the patch.

I'm also submitting the instrumentation support as a separate patch on top
of this. I'm not really fond of adding bool parameter to InstrEndLoop, but
for now I didn't find any better way.

What I'm now thinking about is probably we can extend this backend
communication mechanism in order to query GUC values effective in a
different backend or even setting the values. The obvious candidate to
check when you see some query misbehaving would be work_mem, for example.
Or we could provide a report of all settings that were overridden in the
backend's session, to the effect of running something like this:

select * from pg_settings where context = 'user' and setting != reset_val;

The obvious candidates to be set externally are the
cmdstatus_analyze/instrument_*: when you decided you want to turn them on,
you'd rather do that carefully for a single backend than globally
per-cluster. One can still modify the postgresql.conf and then send SIGHUP
to just a single backend, but I think a more direct way to alter the
settings would be better.

In this light should we rename the API to something like "backend control"
instead of "command status"? The SHOW/SET syntax could be extended to
support the remote setting retrieval/update.

--
Alex

Attachment Content-Type Size
0002-explain-instrument-pid-v6.patch text/x-patch 14.0 KB
0001-explain-pid-v6.patch text/x-patch 29.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2015-09-16 14:32:54 Re: Inaccurate results from numeric ln(), log(), exp() and pow()
Previous Message Teodor Sigaev 2015-09-16 14:28:08 Re: pgbench progress with timestamp