Re: Error-safe user functions

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Joe Conway <mail(at)joeconway(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Error-safe user functions
Date: 2022-12-05 23:47:34
Message-ID: 20221205234734.sab3fuxy6q2sokjd@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2022-12-05 16:40:06 -0500, Tom Lane wrote:
> +/*
> + * errsave_start --- begin a "safe" error-reporting cycle
> + *
> + * If "context" isn't an ErrorSaveContext node, this behaves as
> + * errstart(ERROR, domain), and the errsave() macro ends up acting
> + * exactly like ereport(ERROR, ...).
> + *
> + * If "context" is an ErrorSaveContext node, but the node creator only wants
> + * notification of the fact of a safe error without any details, just set
> + * the error_occurred flag in the ErrorSaveContext node and return false,
> + * which will cause us to skip the remaining error processing steps.
> + *
> + * Otherwise, create and initialize error stack entry and return true.
> + * Subsequently, errmsg() and perhaps other routines will be called to further
> + * populate the stack entry. Finally, errsave_finish() will be called to
> + * tidy up.
> + */
> +bool
> +errsave_start(void *context, const char *domain)

Why is context a void *?

> +{
> + ErrorSaveContext *escontext;
> + ErrorData *edata;
> +
> + /*
> + * Do we have a context for safe error reporting? If not, just punt to
> + * errstart().
> + */
> + if (context == NULL || !IsA(context, ErrorSaveContext))
> + return errstart(ERROR, domain);

I don't think we should "accept" !IsA(context, ErrorSaveContext) - that
seems likely to hide things like use-after-free.

> + if (++errordata_stack_depth >= ERRORDATA_STACK_SIZE)
> + {
> + /*
> + * Wups, stack not big enough. We treat this as a PANIC condition
> + * because it suggests an infinite loop of errors during error
> + * recovery.
> + */
> + errordata_stack_depth = -1; /* make room on stack */
> + ereport(PANIC, (errmsg_internal("ERRORDATA_STACK_SIZE exceeded")));
> + }

This is the fourth copy of this code...

> +/*
> + * errsave_finish --- end a "safe" error-reporting cycle
> + *
> + * If errsave_start() decided this was a regular error, behave as
> + * errfinish(). Otherwise, package up the error details and save
> + * them in the ErrorSaveContext node.
> + */
> +void
> +errsave_finish(void *context, const char *filename, int lineno,
> + const char *funcname)
> +{
> + ErrorSaveContext *escontext = (ErrorSaveContext *) context;
> + ErrorData *edata = &errordata[errordata_stack_depth];
> +
> + /* verify stack depth before accessing *edata */
> + CHECK_STACK_DEPTH();
> +
> + /*
> + * If errsave_start punted to errstart, then elevel will be ERROR or
> + * perhaps even PANIC. Punt likewise to errfinish.
> + */
> + if (edata->elevel >= ERROR)
> + errfinish(filename, lineno, funcname);

I'd put a pg_unreachable() or such after the errfinish() call.

> + /*
> + * Else, we should package up the stack entry contents and deliver them to
> + * the caller.
> + */
> + recursion_depth++;
> +
> + /* Save the last few bits of error state into the stack entry */
> + if (filename)
> + {
> + const char *slash;
> +
> + /* keep only base name, useful especially for vpath builds */
> + slash = strrchr(filename, '/');
> + if (slash)
> + filename = slash + 1;
> + /* Some Windows compilers use backslashes in __FILE__ strings */
> + slash = strrchr(filename, '\\');
> + if (slash)
> + filename = slash + 1;
> + }
> +
> + edata->filename = filename;
> + edata->lineno = lineno;
> + edata->funcname = funcname;
> + edata->elevel = ERROR; /* hide the LOG value used above */
> +
> + /*
> + * We skip calling backtrace and context functions, which are more likely
> + * to cause trouble than provide useful context; they might act on the
> + * assumption that a transaction abort is about to occur.
> + */

This seems like a fair bit of duplicated code.

> + * This is the same as InputFunctionCall, but the caller may also pass a
> + * previously-initialized ErrorSaveContext node. (We declare that as
> + * "void *" to avoid including miscnodes.h in fmgr.h.)

It seems way cleaner to forward declare ErrorSaveContext instead of
using void *.

> If escontext points
> + * to an ErrorSaveContext, any "safe" errors detected by the input function
> + * will be reported by filling the escontext struct. The caller must
> + * check escontext->error_occurred before assuming that the function result
> + * is meaningful.

I wonder if we shouldn't instead make InputFunctionCallSafe() return a
boolean and return the Datum via a pointer. As callers are otherwise
going to need to do SAFE_ERROR_OCCURRED(escontext) themselves, I think
it should also lead to more concise (and slightly more efficient) code.

> +Datum
> +InputFunctionCallSafe(FmgrInfo *flinfo, char *str,
> + Oid typioparam, int32 typmod,
> + void *escontext)

Is there a reason not to provide this infrastructure for
ReceiveFunctionCall() as well?

Not that I have a suggestion for a better name, but I don't particularly
like "Safe" denoting non-erroring input function calls. There's too many
interpretations of safe - e.g. safe against privilege escalation issues
or such.

> @@ -252,10 +254,13 @@ record_in(PG_FUNCTION_ARGS)
> column_info->column_type = column_type;
> }
>
> - values[i] = InputFunctionCall(&column_info->proc,
> - column_data,
> - column_info->typioparam,
> - att->atttypmod);
> + values[i] = InputFunctionCallSafe(&column_info->proc,
> + column_data,
> + column_info->typioparam,
> + att->atttypmod,
> + escontext);
> + if (SAFE_ERROR_OCCURRED(escontext))
> + PG_RETURN_NULL();

It doesn't *quite* seem right to set ->isnull in case of an error. Not
that it has an obvious harm.

Wonder if it's perhaps worth to add VALGRIND_MAKE_MEM_UNDEFINED() calls
to InputFunctionCallSafe() to more easily detect cases where a caller
ignores that an error occured.

> + if (safe_mode)
> + {
> + ErrorSaveContext *es_context = cstate->es_context;
> +
> + /* Must reset the error_occurred flag each time */
> + es_context->error_occurred = false;

I'd put that into the if (es_context->error_occurred) path. Likely the
window for store-forwarding issues is smaller than
InputFunctionCallSafe(), but it's trivial to write it differently...

> diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
> index 285022e07c..ff77d27cfc 100644
> --- a/src/test/regress/sql/copy.sql
> +++ b/src/test/regress/sql/copy.sql
> @@ -268,3 +268,23 @@ a c b
>
> SELECT * FROM header_copytest ORDER BY a;
> drop table header_copytest;
> +
> +-- "safe" error handling
> +create table on_error_copytest(i int, b bool, ai int[]);
> +
> +copy on_error_copytest from stdin with (null_on_error);
> +1 a {1,}
> +err 1 {x}
> +2 f {3,4}
> +bad x {,
> +\.
> +
> +copy on_error_copytest from stdin with (warn_on_error);
> +3 0 [3:4]={3,4}
> +4 b [0:1000]={3,4}
> +err t {}
> +bad z {"zed"}
> +\.
> +
> +select * from on_error_copytest;
> +drop table on_error_copytest;

Think it'd be good to have a test for a composite type where one of the
columns safely errors out and the other doesn't.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2022-12-05 23:48:45 Re: [PATCH] Add native windows on arm64 support
Previous Message Andrey Borodin 2022-12-05 23:41:29 Re: Transaction timeout