Re: logical replication worker accesses catalogs in error context callback

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Zhihong Yu <zyu(at)yugabyte(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: logical replication worker accesses catalogs in error context callback
Date: 2021-02-05 06:38:43
Message-ID: CALj2ACWRUYJDCsAA-yRJuROYwEo_tRSsGpGcNcFA=eyiTvBjnQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Feb 4, 2021 at 5:42 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > Thanks Amit. I verified it with gdb. I attached gdb to the logical
> > replication worker. In slot_store_data's for loop, I intentionally set
> > CurrentTransactionState->state = TRANS_DEFAULT,
> >
>
> What happens if you don't change CurrentTransactionState->state?

Just a refresher - the problem we are trying to solve with the 2
patches proposed in this thread [1] is that the error callbacks
shouldn't be accessing system catalogues because the txn might be
broken by then or another error can be raised in system catalogue
access paths (if we have a cache miss and access the catalogue table
from the disk). So the required information that's needed in the error
callback is to be filled in the function that register the callback.

Now, coming to testing the patches: I tried to use the error callback
when we are outside of the xact. For 0001patch i.e.
slot_store_error_callback, I called it after the
apply_handle_commit_internal in apply_handle_commit. The expectation
is that the patches proposed in this thread [1], should just be usable
even outside of the xact.

HEAD: The logical replication worker crashes at
Assert(IsTransactionState() in SearchCatCacheInternal. See [2] for the
testing patch that's used.

Patched: The testing DEBUG message gets printed in the subscriber
server log and no crash happens. See [3] for the testing patch that's
used.
2021-02-05 12:02:32.340 IST [2361724] DEBUG: calling slot_store_error_callback
2021-02-05 12:02:32.340 IST [2361724] CONTEXT: processing remote data
for replication target relation "public.t1" column "a1", remote type
remote_test_type, local type local_test_type

Similarly, the 0002 patch in [1] which is avoiding system catalogues
in conversion_error_callback postgres_fdw can also be tested.

Hope this testing is enough for the patches. Please let me know if
anything more is required.

[1] - https://www.postgresql.org/message-id/CALj2ACUkvABzq6yeKQZsgng5Rd_NF%2BikhTvL9k4NR0GSRKsSdg%40mail.gmail.com
[2] -
@@ -713,6 +717,26 @@ apply_handle_commit(StringInfo s)
apply_handle_commit_internal(s, &commit_data);
+ if (rel_for_err_cb_chk)
+ {
+ /* We are out of xact. */
+ Assert(!IsTransactionState());
+
+ /* Push callback + info on the error context stack */
+ errarg.rel = rel_for_err_cb_chk;
+ errarg.local_attnum = 0;
+ errarg.remote_attnum = 0;
+ errcallback.callback = slot_store_error_callback;
+ errcallback.arg = (void *) &errarg;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ elog(DEBUG1, "calling slot_store_error_callback");
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+ rel_for_err_cb_chk = NULL;
+ }
[3]
@@ -719,6 +724,26 @@ apply_handle_commit(StringInfo s)
apply_handle_commit_internal(s, &commit_data);
+ if (rel_for_err_cb_chk)
+ {
+ /* We are out of xact. */
+ Assert(!IsTransactionState());
+ errarg.rel = rel_for_err_cb_chk;
+ errarg.remote_attnum = 0;
+ errarg.local_typename = "local_test_type";
+ errarg.remote_typename = "remote_test_type";
+ errcallback.callback = slot_store_error_callback;
+ errcallback.arg = (void *) &errarg;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ elog(DEBUG1, "calling slot_store_error_callback");
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+ rel_for_err_cb_chk = NULL;
+ }

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2021-02-05 07:01:53 Re: pg_replication_origin_drop API potential race condition
Previous Message Michael Paquier 2021-02-05 06:35:39 Re: Support for NSS as a libpq TLS backend