RE: Forget close an open relation in ReorderBufferProcessTXN()

From: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>
To: 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>
Cc: Japin Li <japinli(at)hotmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>, Andres Freund <andres(at)anarazel(dot)de>
Subject: RE: Forget close an open relation in ReorderBufferProcessTXN()
Date: 2021-05-14 08:50:13
Message-ID: OSBPR01MB48884DDB2C2FA4AAB1BCA0D3ED509@OSBPR01MB4888.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thursday, May 13, 2021 7:21 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Thu, May 13, 2021 at 11:15 AM osumi(dot)takamichi(at)fujitsu(dot)com
> <osumi(dot)takamichi(at)fujitsu(dot)com> wrote:
> >
> > I tried the following scenarios for trying to reproduce this.
> > Scenario2:
> > (1) set up 1 publisher and 1 subscriber
> > (2) create table with user_catalog_table = true on the pub
> > (3) insert some data to this table
> > (4) create publication for the table on the pub
> > (5) create table with user_catalog_table = true on the sub
> > (6) create subscription on the sub
> > (7) add synchronous_standby_names to publisher's configuration and
> > restart the pub
> > (8) have a session to truncate the user_catalog_table on the pub
> >
> > Scenario 2 was successful.
>
> Yeah, because pgoutput or for that matter even test_decoding doesn't
> acquire a lock on user catalog tables.
>
> > Are these the scenario you have in mind, if not please let me know for
> > the missing steps.
> > I would like to reproduce the scenario and write a patch to fix this.
>
> I don't think we can reproduce it with core plugins as they don't lock user
> catalog tables.
OK. My current understanding about how the deadlock happens is below.

1. TRUNCATE command is performed on user_catalog_table.
2. TRUNCATE command locks the table and index with ACCESS EXCLUSIVE LOCK.
3. TRUNCATE waits for the subscriber's synchronization
when synchronous_standby_names is set.
4. Here, the walsender stops, *if* it tries to acquire a lock on the user_catalog_table
because the table where it wants to see is locked by the TRUNCATE already.

If this is right, we need to go back to a little bit higher level discussion,
since whether we should hack any plugin to simulate the deadlock caused by user_catalog_table reference
with locking depends on the assumption if the plugin takes a lock on the user_catalog_table or not.
In other words, if the plugin does read only access to that type of table with no lock
(by RelationIdGetRelation for example ?), the deadlock concern disappears and we don't
need to add anything to plugin sides, IIUC.

Here, we haven't gotten any response about whether output plugin takes (should take)
the lock on the user_catalog_table. But, I would like to make a consensus
about this point before the implementation.

By the way, Amit-san already mentioned the main reason of this
is that we allow decoding of TRUNCATE operation for user_catalog_table in synchronous mode.
The choices are provided by Amit-san already in the past email in [1].
(1) disallow decoding of TRUNCATE operation for user_catalog_tables
(2) disallow decoding of any operation for user_catalog_tables like system catalog tables

Yet, I'm not sure if either option solves the deadlock concern completely.
If application takes an ACCESS EXCLUSIVE lock by LOCK command (not by TRUNCATE !)
on the user_catalog_table in a transaction, and if the plugin tries to take a lock on it,
I think the deadlock happens. Of course, having a consensus that the plugin takes no lock at all
would remove this concern, though.

Like this, I'd like to discuss those two items in question together at first.
* the plugin should take a lock on user_catalog_table or not
* the range of decoding related to user_catalog_table

To me, taking no lock on the user_catalog_table from plugin is fine because
we have historical snapshot mechanism, which doesn't produce deadlock in any combination
even when application executes a LOCK command for ACCESS EXCLUSIVE.
In addition, I agree with the idea that we don't decode any operation on user_catalog_table
and have better alignment with usual system catalogs.

Thoughts ?

[1] - https://www.postgresql.org/message-id/CAA4eK1LP8xTysPEMEBYAZ%3D6KoMWfjyf0gzF-9Bp%3DSgVFvYQDVw%40mail.gmail.com

Best Regards,
Takamichi Osumi

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2021-05-14 08:50:30 Re: dump cannot be restored if schema permissions revoked
Previous Message Greg Nancarrow 2021-05-14 08:47:26 Re: Parallel scan with SubTransGetTopmostTransaction assert coredump