Re: logical replication worker accesses catalogs in error context callback

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: logical replication worker accesses catalogs in error context callback
Date: 2021-01-06 02:42:39
Message-ID: CAD21AoDQZ7YP_pVOPn5ibkZx38OGJN0BgzGPt6frPE=DgzXSxg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 6, 2021 at 11:02 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> Due to a debug ereport I just noticed that worker.c's
> slot_store_error_callback is doing something quite dangerous:
>
> static void
> slot_store_error_callback(void *arg)
> {
> SlotErrCallbackArg *errarg = (SlotErrCallbackArg *) arg;
> LogicalRepRelMapEntry *rel;
> char *remotetypname;
> Oid remotetypoid,
> localtypoid;
>
> /* Nothing to do if remote attribute number is not set */
> if (errarg->remote_attnum < 0)
> return;
>
> rel = errarg->rel;
> remotetypoid = rel->remoterel.atttyps[errarg->remote_attnum];
>
> /* Fetch remote type name from the LogicalRepTypMap cache */
> remotetypname = logicalrep_typmap_gettypname(remotetypoid);
>
> /* Fetch local type OID from the local sys cache */
> localtypoid = get_atttype(rel->localreloid, errarg->local_attnum + 1);
>
> errcontext("processing remote data for replication target relation \"%s.%s\" column \"%s\", "
> "remote type %s, local type %s",
> rel->remoterel.nspname, rel->remoterel.relname,
> rel->remoterel.attnames[errarg->remote_attnum],
> remotetypname,
> format_type_be(localtypoid));
> }
>
>
> that's not code that can run in an error context callback. It's
> theoretically possible (but unlikely) that
> logicalrep_typmap_gettypname() is safe to run in an error context
> callback. But get_atttype() definitely isn't.
>
> get_attype() may do catalog accesses. That definitely can't happen
> inside an error context callback - the entire transaction might be
> borked at this point!

You're right. Perhaps calling to format_type_be() is also dangerous
since it does catalog access. We should have added the local type
names to SlotErrCallbackArg so we avoid catalog access in the error
context.

I'll try to fix this.

Regards,

--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2021-01-06 02:48:45 Re: Deadlock between backend and recovery may not be detected
Previous Message Joel Jacobson 2021-01-06 02:39:42 Re: set_config() documentation clarification