Re: [HACKERS] Optional message to user when terminating/cancelling backend

From: Andres Freund <andres(at)anarazel(dot)de>
To: Daniel Gustafsson <daniel(at)yesql(dot)se>, Onder Kalaci <onder(at)citusdata(dot)com>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Optional message to user when terminating/cancelling backend
Date: 2018-06-13 19:16:21
Message-ID: 20180613191621.dl5ko3v5gjltxsbj@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Onder, I CCed you because it seems worthwhile to ensure that the
relevant Citus code could use this instead of the gross hack you and I
committed...

On 2018-06-13 20:54:03 +0200, Daniel Gustafsson wrote:
> > On 9 Apr 2018, at 23:55, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > On 2017-06-20 13:01:35 -0700, Andres Freund wrote:
> >> For extensions it'd also be useful if it'd be possible to overwrite the
> >> error code. E.g. for citus there's a distributed deadlock detector,
> >> running out of process because there's no way to interrupt lock waits
> >> locally, and we've to do some ugly hacking to generate proper error
> >> messages and code from another session.
> >
> > What happened to this request? Seems we're out of the crunch mode and
> > could round the feature out a littlebit more…
>
> Revisiting old patches, I took a stab at this request.
>
> Since I don’t really have a use case for altering the sqlerrcode other than the
> on that Citus.. cited, I modelled the API around that.

Cool. Onder?

> diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
> index b851fe023a..ad9763698f 100644
> --- a/doc/src/sgml/func.sgml
> +++ b/doc/src/sgml/func.sgml
> @@ -18540,7 +18540,7 @@ SELECT set_config('log_statement_stats', 'off', false);
> <tbody>
> <row>
> <entry>
> - <literal><function>pg_cancel_backend(<parameter>pid</parameter> <type>int</type>)</function></literal>
> + <literal><function>pg_cancel_backend(<parameter>pid</parameter> <type>int</type> [, <parameter>message</parameter> <type>text</type>])</function></literal>
> </entry>
> <entry><type>boolean</type></entry>
> <entry>Cancel a backend's current query. This is also allowed if the
> @@ -18565,7 +18565,7 @@ SELECT set_config('log_statement_stats', 'off', false);
> </row>
> <row>
> <entry>
> - <literal><function>pg_terminate_backend(<parameter>pid</parameter> <type>int</type>)</function></literal>
> + <literal><function>pg_terminate_backend(<parameter>pid</parameter> <type>int</type> [, <parameter>message</parameter> <type>text</type>])</function></literal>
> </entry>
> <entry><type>boolean</type></entry>
> <entry>Terminate a backend. This is also allowed if the calling role
> @@ -18596,6 +18596,14 @@ SELECT set_config('log_statement_stats', 'off', false);
> The role of an active backend can be found from the
> <structfield>usename</structfield> column of the
> <structname>pg_stat_activity</structname> view.
> + If the optional <literal>message</literal> parameter is set, the text
> + will be appended to the error message returned to the signalled backend.
> + <literal>message</literal> is limited to 128 bytes, any longer text
> + will be truncated. An example where we cancel our own backend:
> +<programlisting>
> +postgres=# SELECT pg_cancel_backend(pg_backend_pid(), 'Cancellation message text');
> +ERROR: canceling statement due to user request: Cancellation message text
> +</programlisting>
> </para>

I'm not sure I really like the appending bit. There's a security
argument to be made about doing so, but from a user POV that mostly
seems restrictive. I wonder if that aspect would be better handled by
adding an error context that contains information about which process
did the cancellation (or DETAIL?)?

> +/*
> + * Structure for registering a feedback payload to be sent to a cancelled, or
> + * terminated backend. Each backend is registered per pid in the array which is
> + * indexed by Backend ID. Reading and writing the message is protected by a
> + * per-slot spinlock.
> + */
> +typedef struct
> +{
> + pid_t pid;

This is the pid of the receiving process? If so, why do we need this in
here? That's just duplicated data, no?

> + slock_t mutex;
> + char message[MAX_CANCEL_MSG];
> + int len;
> + int sqlerrcode;

I'd vote for including the pid of the process that did the cancelling in
here.

> +Datum
> +pg_cancel_backend(PG_FUNCTION_ARGS)
> +{
> + PG_RETURN_BOOL(pg_cancel_backend_internal(PG_GETARG_INT32(0), NULL));
> +}
> +
> +Datum
> +pg_cancel_backend_msg(PG_FUNCTION_ARGS)
> +{
> + pid_t pid;
> + char *msg = NULL;
> +
> + if (PG_ARGISNULL(0))
> + PG_RETURN_NULL();
> +
> + pid = PG_GETARG_INT32(0);
> +
> + if (PG_NARGS() == 2 && !PG_ARGISNULL(1))
> + msg = text_to_cstring(PG_GETARG_TEXT_PP(1));
> +
> + PG_RETURN_BOOL(pg_cancel_backend_internal(pid, msg));
> +}

Why isn't this just one function? Given that you already have a PG_NARGS
check, I don't quite see the point?

> /*
> * Signal to terminate a backend process. This is allowed if you are a member
> * of the role whose process is being terminated.
> *
> * Note that only superusers can signal superuser-owned processes.
> */
> -Datum
> -pg_terminate_backend(PG_FUNCTION_ARGS)
> +static bool
> +pg_terminate_backend_internal(pid_t pid, char *msg)
> {
> - int r = pg_signal_backend(PG_GETARG_INT32(0), SIGTERM);
> + int r = pg_signal_backend(pid, SIGTERM, msg);
>
> if (r == SIGNAL_BACKEND_NOSUPERUSER)
> ereport(ERROR,
> @@ -146,7 +203,30 @@ pg_terminate_backend(PG_FUNCTION_ARGS)
> (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
> (errmsg("must be a member of the role whose process is being terminated or member of pg_signal_backend"))));
>
> - PG_RETURN_BOOL(r == SIGNAL_BACKEND_SUCCESS);
> + return (r == SIGNAL_BACKEND_SUCCESS);
> +}
> +
> +Datum
> +pg_terminate_backend(PG_FUNCTION_ARGS)
> +{
> + PG_RETURN_BOOL(pg_terminate_backend_internal(PG_GETARG_INT32(0), NULL));
> +}
> +
> +Datum
> +pg_terminate_backend_msg(PG_FUNCTION_ARGS)
> +{
> + pid_t pid;
> + char *msg = NULL;
> +
> + if (PG_ARGISNULL(0))
> + PG_RETURN_NULL();
> +
> + pid = PG_GETARG_INT32(0);
> +
> + if (PG_NARGS() == 2 && !PG_ARGISNULL(1))
> + msg = text_to_cstring(PG_GETARG_TEXT_PP(1));
> +
> + PG_RETURN_BOOL(pg_terminate_backend_internal(pid, msg));
> }
>
> /*
> @@ -168,7 +248,6 @@ pg_reload_conf(PG_FUNCTION_ARGS)
> PG_RETURN_BOOL(true);
> }

Same.

> +/*
> + * Set a message for the cancellation of the backend with the specified pid,
> + * using the default sqlerrcode.
> + */
> +int
> +SetBackendCancelMessage(pid_t backend_pid, char *message)
> +{
> + return backend_feedback(backend_pid, message, ERRCODE_QUERY_CANCELED);
> +}
> +
> +/*
> + * Set a message for the termination of the backend with the specified pid,
> + * using the default sqlerrcode.
> + */
> +int
> +SetBackendTerminationMessage(pid_t backend_pid, char *message)
> +{
> + return backend_feedback(backend_pid, message, ERRCODE_ADMIN_SHUTDOWN);
> +}
> +
> +/*
> + * Set both a message and a sqlerrcode for use when signalling the backend
> + * with the specified pid.
> + */
> +int
> +SetBackendSignalFeedback(pid_t backend_pid, char *message, int sqlerrcode)
> +{
> + return backend_feedback(backend_pid, message, sqlerrcode);
> +}

I'm not quite seeing the point of these variants. What advantage are
they over just doing the same on the caller level?

> +/*
> + * Sets a cancellation message for the backend with the specified pid, and
> + * returns the length of the message actually created. If the returned length
> + * is less than the length of the message parameter, truncation has occurred.
> + * If the backend isn't found, -1 is returned. If no message is passed, zero is
> + * returned. If two backends collide in setting a message, the existing message
> + * will be overwritten by the last one in.
> + */
> +static int
> +backend_feedback(pid_t backend_pid, char *message, int sqlerrcode)
> +{
> + int i;
> + int len;
> +
> + if (!message)
> + return 1;
> +
> + len = pg_mbcliplen(message, strlen(message), MAX_CANCEL_MSG - 1);
> +
> + for (i = 0; i < MaxBackends; i++)
> + {
> + BackendSignalFeedbackShmemStruct *slot = &BackendSignalFeedbackSlots[i];
> +
> + if (slot->pid != 0 && slot->pid == backend_pid)
> + {
> + SpinLockAcquire(&slot->mutex);
> + if (slot->pid != backend_pid)
> + {
> + SpinLockRelease(&slot->mutex);
> + goto error;
> + }
> +
> + MemSet(slot->message, '\0', sizeof(slot->message));
> + memcpy(slot->message, message, len);

This seems unnecessarily expensive.

> + slot->len = len;
> + slot->sqlerrcode = sqlerrcode;
> + SpinLockRelease(&slot->mutex);
> +
> + if (len != strlen(message))
> + ereport(NOTICE,
> + (errmsg("message is too long and has been truncated")));
> + return 0;
> + }
> + }

So we made cancellation a O(N) proposition :/. Probably not too bad, but
still...

> +error:
> +
> + elog(LOG, "Cancellation message requested for missing backend %d by %d",
> + (int) backend_pid, MyProcPid);
> +
> + return 1;
> +}

So we now do log spam if processes exit in the wrong moment?

> +/*
> + * Test whether there is feedback registered for the current backend that can
> + * be consumed and presented to the user.
> + */
> +bool
> +HasBackendSignalFeedback(void)
> +{
> + volatile BackendSignalFeedbackShmemStruct *slot = MyCancelSlot;
> + bool has_message = false;
> +
> + if (slot != NULL)
> + {
> + SpinLockAcquire(&slot->mutex);
> + has_message = ((slot->len > 0) && (slot->sqlerrcode != 0));
> + SpinLockRelease(&slot->mutex);
> + }
> +
> + return has_message;
> +}

I'm somewhat uncomfortable with acquiring a mutex in an error path. We
used to prcoess at least some interrupts in signal handlers in some
cases - I think I removed all of them, but if not, we'd be in deep
trouble here. Wonder if we shouldn't try to get around needing that...

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Davis 2018-06-13 19:42:23 Re: Spilling hashed SetOps and aggregates to disk
Previous Message Daniel Gustafsson 2018-06-13 18:54:03 Re: [HACKERS] Optional message to user when terminating/cancelling backend