From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Daniel Gustafsson <daniel(at)yesql(dot)se>, Onder Kalaci <onder(at)citusdata(dot)com> |
Cc: | PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [HACKERS] Optional message to user when terminating/cancelling backend |
Date: | 2018-06-13 19:16:21 |
Message-ID: | 20180613191621.dl5ko3v5gjltxsbj@alap3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
Onder, I CCed you because it seems worthwhile to ensure that the
relevant Citus code could use this instead of the gross hack you and I
committed...
On 2018-06-13 20:54:03 +0200, Daniel Gustafsson wrote:
> > On 9 Apr 2018, at 23:55, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > On 2017-06-20 13:01:35 -0700, Andres Freund wrote:
> >> For extensions it'd also be useful if it'd be possible to overwrite the
> >> error code. E.g. for citus there's a distributed deadlock detector,
> >> running out of process because there's no way to interrupt lock waits
> >> locally, and we've to do some ugly hacking to generate proper error
> >> messages and code from another session.
> >
> > What happened to this request? Seems we're out of the crunch mode and
> > could round the feature out a littlebit more…
>
> Revisiting old patches, I took a stab at this request.
>
> Since I don’t really have a use case for altering the sqlerrcode other than the
> on that Citus.. cited, I modelled the API around that.
Cool. Onder?
> diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
> index b851fe023a..ad9763698f 100644
> --- a/doc/src/sgml/func.sgml
> +++ b/doc/src/sgml/func.sgml
> @@ -18540,7 +18540,7 @@ SELECT set_config('log_statement_stats', 'off', false);
> <tbody>
> <row>
> <entry>
> - <literal><function>pg_cancel_backend(<parameter>pid</parameter> <type>int</type>)</function></literal>
> + <literal><function>pg_cancel_backend(<parameter>pid</parameter> <type>int</type> [, <parameter>message</parameter> <type>text</type>])</function></literal>
> </entry>
> <entry><type>boolean</type></entry>
> <entry>Cancel a backend's current query. This is also allowed if the
> @@ -18565,7 +18565,7 @@ SELECT set_config('log_statement_stats', 'off', false);
> </row>
> <row>
> <entry>
> - <literal><function>pg_terminate_backend(<parameter>pid</parameter> <type>int</type>)</function></literal>
> + <literal><function>pg_terminate_backend(<parameter>pid</parameter> <type>int</type> [, <parameter>message</parameter> <type>text</type>])</function></literal>
> </entry>
> <entry><type>boolean</type></entry>
> <entry>Terminate a backend. This is also allowed if the calling role
> @@ -18596,6 +18596,14 @@ SELECT set_config('log_statement_stats', 'off', false);
> The role of an active backend can be found from the
> <structfield>usename</structfield> column of the
> <structname>pg_stat_activity</structname> view.
> + If the optional <literal>message</literal> parameter is set, the text
> + will be appended to the error message returned to the signalled backend.
> + <literal>message</literal> is limited to 128 bytes, any longer text
> + will be truncated. An example where we cancel our own backend:
> +<programlisting>
> +postgres=# SELECT pg_cancel_backend(pg_backend_pid(), 'Cancellation message text');
> +ERROR: canceling statement due to user request: Cancellation message text
> +</programlisting>
> </para>
I'm not sure I really like the appending bit. There's a security
argument to be made about doing so, but from a user POV that mostly
seems restrictive. I wonder if that aspect would be better handled by
adding an error context that contains information about which process
did the cancellation (or DETAIL?)?
> +/*
> + * Structure for registering a feedback payload to be sent to a cancelled, or
> + * terminated backend. Each backend is registered per pid in the array which is
> + * indexed by Backend ID. Reading and writing the message is protected by a
> + * per-slot spinlock.
> + */
> +typedef struct
> +{
> + pid_t pid;
This is the pid of the receiving process? If so, why do we need this in
here? That's just duplicated data, no?
> + slock_t mutex;
> + char message[MAX_CANCEL_MSG];
> + int len;
> + int sqlerrcode;
I'd vote for including the pid of the process that did the cancelling in
here.
> +Datum
> +pg_cancel_backend(PG_FUNCTION_ARGS)
> +{
> + PG_RETURN_BOOL(pg_cancel_backend_internal(PG_GETARG_INT32(0), NULL));
> +}
> +
> +Datum
> +pg_cancel_backend_msg(PG_FUNCTION_ARGS)
> +{
> + pid_t pid;
> + char *msg = NULL;
> +
> + if (PG_ARGISNULL(0))
> + PG_RETURN_NULL();
> +
> + pid = PG_GETARG_INT32(0);
> +
> + if (PG_NARGS() == 2 && !PG_ARGISNULL(1))
> + msg = text_to_cstring(PG_GETARG_TEXT_PP(1));
> +
> + PG_RETURN_BOOL(pg_cancel_backend_internal(pid, msg));
> +}
Why isn't this just one function? Given that you already have a PG_NARGS
check, I don't quite see the point?
> /*
> * Signal to terminate a backend process. This is allowed if you are a member
> * of the role whose process is being terminated.
> *
> * Note that only superusers can signal superuser-owned processes.
> */
> -Datum
> -pg_terminate_backend(PG_FUNCTION_ARGS)
> +static bool
> +pg_terminate_backend_internal(pid_t pid, char *msg)
> {
> - int r = pg_signal_backend(PG_GETARG_INT32(0), SIGTERM);
> + int r = pg_signal_backend(pid, SIGTERM, msg);
>
> if (r == SIGNAL_BACKEND_NOSUPERUSER)
> ereport(ERROR,
> @@ -146,7 +203,30 @@ pg_terminate_backend(PG_FUNCTION_ARGS)
> (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
> (errmsg("must be a member of the role whose process is being terminated or member of pg_signal_backend"))));
>
> - PG_RETURN_BOOL(r == SIGNAL_BACKEND_SUCCESS);
> + return (r == SIGNAL_BACKEND_SUCCESS);
> +}
> +
> +Datum
> +pg_terminate_backend(PG_FUNCTION_ARGS)
> +{
> + PG_RETURN_BOOL(pg_terminate_backend_internal(PG_GETARG_INT32(0), NULL));
> +}
> +
> +Datum
> +pg_terminate_backend_msg(PG_FUNCTION_ARGS)
> +{
> + pid_t pid;
> + char *msg = NULL;
> +
> + if (PG_ARGISNULL(0))
> + PG_RETURN_NULL();
> +
> + pid = PG_GETARG_INT32(0);
> +
> + if (PG_NARGS() == 2 && !PG_ARGISNULL(1))
> + msg = text_to_cstring(PG_GETARG_TEXT_PP(1));
> +
> + PG_RETURN_BOOL(pg_terminate_backend_internal(pid, msg));
> }
>
> /*
> @@ -168,7 +248,6 @@ pg_reload_conf(PG_FUNCTION_ARGS)
> PG_RETURN_BOOL(true);
> }
Same.
> +/*
> + * Set a message for the cancellation of the backend with the specified pid,
> + * using the default sqlerrcode.
> + */
> +int
> +SetBackendCancelMessage(pid_t backend_pid, char *message)
> +{
> + return backend_feedback(backend_pid, message, ERRCODE_QUERY_CANCELED);
> +}
> +
> +/*
> + * Set a message for the termination of the backend with the specified pid,
> + * using the default sqlerrcode.
> + */
> +int
> +SetBackendTerminationMessage(pid_t backend_pid, char *message)
> +{
> + return backend_feedback(backend_pid, message, ERRCODE_ADMIN_SHUTDOWN);
> +}
> +
> +/*
> + * Set both a message and a sqlerrcode for use when signalling the backend
> + * with the specified pid.
> + */
> +int
> +SetBackendSignalFeedback(pid_t backend_pid, char *message, int sqlerrcode)
> +{
> + return backend_feedback(backend_pid, message, sqlerrcode);
> +}
I'm not quite seeing the point of these variants. What advantage are
they over just doing the same on the caller level?
> +/*
> + * Sets a cancellation message for the backend with the specified pid, and
> + * returns the length of the message actually created. If the returned length
> + * is less than the length of the message parameter, truncation has occurred.
> + * If the backend isn't found, -1 is returned. If no message is passed, zero is
> + * returned. If two backends collide in setting a message, the existing message
> + * will be overwritten by the last one in.
> + */
> +static int
> +backend_feedback(pid_t backend_pid, char *message, int sqlerrcode)
> +{
> + int i;
> + int len;
> +
> + if (!message)
> + return 1;
> +
> + len = pg_mbcliplen(message, strlen(message), MAX_CANCEL_MSG - 1);
> +
> + for (i = 0; i < MaxBackends; i++)
> + {
> + BackendSignalFeedbackShmemStruct *slot = &BackendSignalFeedbackSlots[i];
> +
> + if (slot->pid != 0 && slot->pid == backend_pid)
> + {
> + SpinLockAcquire(&slot->mutex);
> + if (slot->pid != backend_pid)
> + {
> + SpinLockRelease(&slot->mutex);
> + goto error;
> + }
> +
> + MemSet(slot->message, '\0', sizeof(slot->message));
> + memcpy(slot->message, message, len);
This seems unnecessarily expensive.
> + slot->len = len;
> + slot->sqlerrcode = sqlerrcode;
> + SpinLockRelease(&slot->mutex);
> +
> + if (len != strlen(message))
> + ereport(NOTICE,
> + (errmsg("message is too long and has been truncated")));
> + return 0;
> + }
> + }
So we made cancellation a O(N) proposition :/. Probably not too bad, but
still...
> +error:
> +
> + elog(LOG, "Cancellation message requested for missing backend %d by %d",
> + (int) backend_pid, MyProcPid);
> +
> + return 1;
> +}
So we now do log spam if processes exit in the wrong moment?
> +/*
> + * Test whether there is feedback registered for the current backend that can
> + * be consumed and presented to the user.
> + */
> +bool
> +HasBackendSignalFeedback(void)
> +{
> + volatile BackendSignalFeedbackShmemStruct *slot = MyCancelSlot;
> + bool has_message = false;
> +
> + if (slot != NULL)
> + {
> + SpinLockAcquire(&slot->mutex);
> + has_message = ((slot->len > 0) && (slot->sqlerrcode != 0));
> + SpinLockRelease(&slot->mutex);
> + }
> +
> + return has_message;
> +}
I'm somewhat uncomfortable with acquiring a mutex in an error path. We
used to prcoess at least some interrupts in signal handlers in some
cases - I think I removed all of them, but if not, we'd be in deep
trouble here. Wonder if we shouldn't try to get around needing that...
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Jeff Davis | 2018-06-13 19:42:23 | Re: Spilling hashed SetOps and aggregates to disk |
Previous Message | Daniel Gustafsson | 2018-06-13 18:54:03 | Re: [HACKERS] Optional message to user when terminating/cancelling backend |