Re: logical replication worker accesses catalogs in error context callback

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: logical replication worker accesses catalogs in error context callback
Date: 2021-01-26 05:58:55
Message-ID: CAD21AoByGFPCjNZim9XB85vfzW_vZwCM4zDF-HiUPoACo=GHSQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jan 12, 2021 at 6:33 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Tue, Jan 12, 2021 at 9:40 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Mon, Jan 11, 2021 at 4:46 PM Bharath Rupireddy
> > <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> > >
> > > On Mon, Jan 11, 2021 at 10:56 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > > Agreed. Attached the updated patch.
> > >
> > > Thanks for the updated patch. Looks like the comment crosses the 80
> > > char limit, I have corrected it. And also changed the variable name
> > > from remotetypeid to remotetypid, so that logicalrep_typmap_gettypname
> > > will not cross the 80 char limit. And also added a commit message.
> > > Attaching v3 patch, please have a look. Both make check and make
> > > check-world passes.
> >
> > Thanks! The change looks good to me.
>
> Thanks!
>
> > > > > I quickly searched in places where error callbacks are being used, I
> > > > > think we need a similar kind of fix for conversion_error_callback in
> > > > > postgres_fdw.c, because get_attname and get_rel_name are being used
> > > > > which do catalogue lookups. ISTM that all the other error callbacks
> > > > > are good in the sense that they are not doing sys catalogue lookups.
> > > >
> > > > Indeed. If we need to disallow the catalog lookup during executing
> > > > error callbacks we might want to have an assertion checking that in
> > > > SearchCatCacheInternal(), in addition to Assert(IsTransactionState()).
> > >
> > > I tried to add(as attached in
> > > v1-0001-Avoid-Sys-Cache-Lookups-in-Error-Callbacks.patch) the
> > > Assert(!error_context_stack); in SearchCatCacheInternal, initdb itself
> > > fails [1]. That means, we are doing a bunch of catalogue access from
> > > error context callbacks. Given this, I'm not quite sure whether we can
> > > have such an assertion in SearchCatCacheInternal.
> >
> > I think checking !error_context_stack is not a correct check if we're
> > executing an error context callback since it's a stack to store
> > callbacks. It can be non-NULL by setting an error callback, see
> > setup_parser_errposition_callback() for instance.
>
> Thanks! Yes, you are right, even though we are not processing the
> actual error callback, the error_context_stack can be non-null, hence
> the Assert(!error_context_stack); doesn't make any sense.
>
> > Probably we need to check if (recursion_depth > 0) and elevel.
> > Attached a patch for that as an example.
>
> IIUC, we must be knowing in SearchCatCacheInternal, whether errstart
> is called with elevel >= ERROR and we have recursion_depth > 0. If
> both are true, then the assertion in SearchCatCacheInternal should
> fail. I see that in your patch, elevel is being fetched from
> errordata, that's fine. What I'm not quite clear is the following
> assumption:
>
> + /* If we doesn't set any error data yet, assume it's an error */
> + if (errordata_stack_depth == -1)
> + return true;
>
> Is it always true that we are in error processing when
> errordata_stack_depth is -1, what happens if errordata_stack_depth <
> -1? Maybe I'm missing something.

You're right. I missed something.

>
> IMHO, adding an assertion in SearchCatCacheInternal(which is a most
> generic code part within the server) with a few error context global
> variables may not be always safe. Because we might miss using the
> error context variables properly. Instead, we could add a comment in
> ErrorContextCallback structure saying something like, "it's not
> recommended to access any system catalogues within an error context
> callback when the callback is expected to be called while processing
> an error, because the transaction might have been broken in that
> case." And let the future callback developers take care of it.
>
> Thoughts?

That sounds good to me. But I still also see the value to add an
assertion in SearchCatCacheInternal(). If we had it, we could find
these two bugs earlier.

Anyway, this seems to be unrelated to this bug fixing so we can start
a new thread for that.

>
> As I said earlier [1], currently only two(there could be more) error
> context callbacks access the sys catalogues, one is in
> slot_store_error_callback which will be fixed with your patch. Another
> is in conversion_error_callback, we can also fix this by storing the
> relname, attname and other required info in ConversionLocation,
> something like the attached. Please have a look.
>
> Thoughts?

Thank you for the patch!

Here are some comments:

+static void set_error_callback_info(ConversionLocation *errpos,
+ Relation rel, int cur_attno,
+ ForeignScanState *fsstate);

I'm concerned a bit that the function name is generic. How about
set_conversion_error_callback_info() or something to make the name
clear?

---
+static void
+conversion_error_callback(void *arg)
+{
+ ConversionLocation *errpos = (ConversionLocation *) arg;
+
+ if (errpos->show_generic_message)
+ errcontext("processing expression at position %d in
select list",
+ errpos->cur_attno);
+

I think we can set this generic message to the error context when
errpos->relname is NULL instead of using show_generic_message.

---
+ /*
+ * Set error context callback info, so that we could avoid accessing
+ * the system catalogues while processing the error in
+ * conversion_error_callback. System catalogue accesses are not safe in
+ * error context callbacks because the transaction might have been
+ * broken by then.
+ */
+ set_error_callback_info(&errpos, rel, i, fsstate);

Looking at other code, we use "catalog" rather than "catalogue" in
most places. Is it better to use "catalog" for consistency? FYI, the
"catalogue" is used at only three places in the server code:

$ git grep "catalogue" -- '*.[ch]'
src/backend/access/brin/brin.c: * This routine scans the complete
index looking for uncatalogued index pages,
src/backend/catalog/pg_constraint.c: * given relation; or InvalidOid
if no such index is catalogued.
src/backend/executor/execMain.c: * As in case of the catalogued
constraints, we treat a NULL result as

FYI I've added those bug fixes to the next Commitfest[1].

Regards,

[1] https://commitfest.postgresql.org/32/2955/

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Fetter 2021-01-26 06:23:23 Re: Add SQL function for SHA1
Previous Message japin 2021-01-26 05:46:11 Re: Support ALTER SUBSCRIPTION ... ADD/DROP PUBLICATION ... syntax