Re: logical replication worker accesses catalogs in error context callback

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: logical replication worker accesses catalogs in error context callback
Date: 2021-01-11 05:25:29
Message-ID: CAD21AoDxD1Z2SWNCxqHBDqrLPFAF+3XGFpWKiZEPL3VCBPhA=g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Jan 9, 2021 at 2:57 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Thu, Jan 7, 2021 at 5:47 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Wed, Jan 6, 2021 at 11:42 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > On Wed, Jan 6, 2021 at 11:02 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > >
> > > > Hi,
> > > >
> > > > Due to a debug ereport I just noticed that worker.c's
> > > > slot_store_error_callback is doing something quite dangerous:
> > > >
> > > > static void
> > > > slot_store_error_callback(void *arg)
> > > > {
> > > > SlotErrCallbackArg *errarg = (SlotErrCallbackArg *) arg;
> > > > LogicalRepRelMapEntry *rel;
> > > > char *remotetypname;
> > > > Oid remotetypoid,
> > > > localtypoid;
> > > >
> > > > /* Nothing to do if remote attribute number is not set */
> > > > if (errarg->remote_attnum < 0)
> > > > return;
> > > >
> > > > rel = errarg->rel;
> > > > remotetypoid = rel->remoterel.atttyps[errarg->remote_attnum];
> > > >
> > > > /* Fetch remote type name from the LogicalRepTypMap cache */
> > > > remotetypname = logicalrep_typmap_gettypname(remotetypoid);
> > > >
> > > > /* Fetch local type OID from the local sys cache */
> > > > localtypoid = get_atttype(rel->localreloid, errarg->local_attnum + 1);
> > > >
> > > > errcontext("processing remote data for replication target relation \"%s.%s\" column \"%s\", "
> > > > "remote type %s, local type %s",
> > > > rel->remoterel.nspname, rel->remoterel.relname,
> > > > rel->remoterel.attnames[errarg->remote_attnum],
> > > > remotetypname,
> > > > format_type_be(localtypoid));
> > > > }
> > > >
> > > >
> > > > that's not code that can run in an error context callback. It's
> > > > theoretically possible (but unlikely) that
> > > > logicalrep_typmap_gettypname() is safe to run in an error context
> > > > callback. But get_atttype() definitely isn't.
> > > >
> > > > get_attype() may do catalog accesses. That definitely can't happen
> > > > inside an error context callback - the entire transaction might be
> > > > borked at this point!
> > >
> > > You're right. Perhaps calling to format_type_be() is also dangerous
> > > since it does catalog access. We should have added the local type
> > > names to SlotErrCallbackArg so we avoid catalog access in the error
> > > context.
> > >
> > > I'll try to fix this.
> >
> > Attached the patch that fixes this issue.
> >
> > Since logicalrep_typmap_gettypname() could search the sys cache by
> > calling to format_type_be(), I stored both local and remote type names
> > to SlotErrCallbackArg so that we can just set the names in the error
> > callback without sys cache lookup.
> >
> > Please review it.
>
> Patch looks good, except one minor comment - can we store
> rel->remoterel.atttyps[remoteattnum] into a local variable and use it
> in logicalrep_typmap_gettypname, just to save on indentation?

Thank you for reviewing the patch!

Agreed. Attached the updated patch.

>
> I quickly searched in places where error callbacks are being used, I
> think we need a similar kind of fix for conversion_error_callback in
> postgres_fdw.c, because get_attname and get_rel_name are being used
> which do catalogue lookups. ISTM that all the other error callbacks
> are good in the sense that they are not doing sys catalogue lookups.

Indeed. If we need to disallow the catalog lookup during executing
error callbacks we might want to have an assertion checking that in
SearchCatCacheInternal(), in addition to Assert(IsTransactionState()).

Regards,

--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/

Attachment Content-Type Size
fix_slot_store_error_callback_v2.patch application/octet-stream 4.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2021-01-11 05:30:14 Re: [HACKERS] Custom compression methods
Previous Message Amit Kapila 2021-01-11 04:34:56 Re: Single transaction in the tablesync worker?