Re: [HACKERS] Optional message to user when terminating/cancelling backend

From: Daniel Gustafsson <daniel(at)yesql(dot)se>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Onder Kalaci <onder(at)citusdata(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Optional message to user when terminating/cancelling backend
Date: 2018-06-14 21:56:05
Message-ID: 33159DF2-22F1-40B0-A7B4-ACFC7748B8B0@yesql.se
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

> On 13 Jun 2018, at 21:16, Andres Freund <andres(at)anarazel(dot)de> wrote:

Thanks for the review!

> I'm not sure I really like the appending bit. There's a security
> argument to be made about doing so, but from a user POV that mostly
> seems restrictive. I wonder if that aspect would be better handled by
> adding an error context that contains information about which process
> did the cancellation (or DETAIL?)?

That doesn’t sound like a bad idea at all, I took a stab at that in the
attached patch to see what it would look like. Is that along the lines of what
you had in mind?

It does however make testing harder as the .out file can’t have the matching
pid (commented out the test in the attached hoping that there is a way to test
it that I fail to see).

>> +/*
>> + * Structure for registering a feedback payload to be sent to a cancelled, or
>> + * terminated backend. Each backend is registered per pid in the array which is
>> + * indexed by Backend ID. Reading and writing the message is protected by a
>> + * per-slot spinlock.
>> + */
>> +typedef struct
>> +{
>> + pid_t pid;
>
> This is the pid of the receiving process? If so, why do we need this in
> here? That's just duplicated data, no?

Correct, we need that for finding the right slot in backend_feedback() unless
I’m missing something?

>> + slock_t mutex;
>> + char message[MAX_CANCEL_MSG];
>> + int len;
>> + int sqlerrcode;
>
> I'd vote for including the pid of the process that did the cancelling in
> here.

Did that for the above discussed log message change.

>> [ .. ]
>
> Why isn't this just one function? Given that you already have a PG_NARGS
> check, I don't quite see the point?

I was trying to retain STRICT on pg_cancel_backend(int4) for no food reason,
and I’m not entirely sure what the reason was anymore. Fixed in the attached
version.

>> [ .. ]
>
> I'm not quite seeing the point of these variants. What advantage are
> they over just doing the same on the caller level?

It seemed a cleaner API to provide these for when the caller just want to
provide messaging (which is where I started this a long time ago). I’m not wed
to the idea at all though, they are clearly just for convenience.

>> + MemSet(slot->message, '\0', sizeof(slot->message));
>> + memcpy(slot->message, message, len);
>
> This seems unnecessarily expensive.

My goal was to safeguard against the risk of fragments from an undelivered
message being delivered to another backend, which might be overly cautious.
Using strlcpy() instead to guarantee termination is perhaps enough?

>> + slot->len = len;
>> + slot->sqlerrcode = sqlerrcode;
>> + SpinLockRelease(&slot->mutex);
>> +
>> + if (len != strlen(message))
>> + ereport(NOTICE,
>> + (errmsg("message is too long and has been truncated")));
>> + return 0;
>> + }
>> + }
>
> So we made cancellation a O(N) proposition :/. Probably not too bad, but
> still…

When setting a message to be passed to the backend yes, with an upper bound of
MaxBackends.

>> [ .. ]
>
> So we now do log spam if processes exit in the wrong moment?

Yeah, that was rather pointless. Removed in the attached version.

> I'm somewhat uncomfortable with acquiring a mutex in an error path. We
> used to prcoess at least some interrupts in signal handlers in some
> cases - I think I removed all of them, but if not, we'd be in deep
> trouble here. Wonder if we shouldn't try to get around needing that…

I’m all ears if you have a better idea, this was the only option I could come
up with.

cheers ./daniel

Attachment Content-Type Size
0001-Refactor-backend-signalling-code-v12.patch application/octet-stream 14.5 KB
0002-Support-optional-message-in-backend-cancel-terminate-v12.patch application/octet-stream 22.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message R, Siva 2018-06-14 22:55:19 Re: Duplicate Item Pointers in Gin index
Previous Message Robert Haas 2018-06-14 21:26:33 Re: BUG #15237: I got "ERROR: source for a multiple-column UPDATE item must be a sub-SELECT or ROW() expression"