Re: Assert failure due to "drop schema pg_temp_3 cascade" for temporary tables and \d+ is not showing any info after drooping temp table schema

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Mahendra Singh <mahi6run(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Assert failure due to "drop schema pg_temp_3 cascade" for temporary tables and \d+ is not showing any info after drooping temp table schema
Date: 2020-01-06 01:42:02
Message-ID: 20200106014202.GA3598@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

On Sun, Dec 29, 2019 at 07:37:15AM -0500, Robert Haas wrote:
> I think this was a bad idea and that it should be reverted. It seems
> to me that the problem here is that you introduced a feature which had
> a bug, namely that it couldn't tolerate concurrency, and when somebody
> discovered the bug, you "fixed" it not by making the code able to
> tolerate concurrent activity but by preventing concurrent activity
> from happening in the first place. I think that's wrong on general
> principle.

Sorry for the delay, there was a long period off here so I could not
have a serious look.

The behavior of the code in 246a6c8 has changed so as a non-existing
temporary namespace is considered as not in use, in which case
autovacuum would consider this relation to be orphaned, and it would
then try to drop it. Anyway, just a revert of the patch is not a good
idea either, because keeping around the old behavior allows any user
to create orphaned relations that autovacuum would just ignore in
9.4~10, leading to ultimately a forced shutdown of the instance as no
cleanup can happen if this goes unnoticed. This also puts pg_class
into an inconsistent state as pg_class entries would include
references to a namespace that does not exist for sessions still
holding its own references to myTempNamespace/myTempToastNamespace.

> In this specific case, DROP SCHEMA on another temporary sessions's
> schema is a feature which has existed for a very long time and which I
> have used on multiple occasions to repair damaged databases. Suppose,
> for example, there's a catalog entry that prevents the schema from
> being dropped. Before this commit, you could fix it or delete the
> entry and then retry the drop. Now, you can't. You can maybe wait for
> autovacuum to retry it or something, assuming autovacuum is working
> and you're in multi-user mode.

This behavior is broken since its introduction then per the above. If
we were to allow DROP SCHEMA to work properly on temporary schema, we
would need to do more than what we have now, and that does not involve
just mimicking DISCARD TEMP if you really wish to be able to drop the
schema entirely and not only the objects it includes. Allowing a
temporary schema to be dropped only if it is owned by the current
session would be simple enough to implement, but I think that allowing
that to work properly for a schema owned by another session would be
rather difficult to implement for little gains. Now, if you still
wish to be able to do a DROP SCHEMA on a temporary schema, I have no
objections to allow doing that, but under some conditions. So I would
recommend to restrict it so as this operation is not allowed by
default, and I think we ought to use allow_system_table_mods to
control that, because if you were to do that you are an operator and
you know what you are doing. Normally :)

> But even if that weren't the case, this seems like a very fragile fix.
> Maybe someday we'll allow multiple autovacuum workers in the same
> database, and the problem comes back. Maybe some user who can't drop
> the schema because of this arbitrary prohibition will find themselves
> forced to delete the pg_namespace row by hand and then crash the
> server. Most server code is pretty careful that to either tolerate
> missing system catalog tuples or elog(ERROR), not crash (e.g. cache
> lookup failed for ...). This code shouldn't be an exception to that
> rule.

You are right here, things could be done better in 11 and newer
versions, still there are multiple ways to do that. Here are three
suggestions:
1) Issue an elog(ERROR) as that's what we do usually for lookup errors
and such when seeing an orphaned relation which refers to a
non-existing namespace. But this would prevent autovacuum to do
any kind of work and just loop over-and-over on the same error, just
bloating the database involved.
2) Ignore the relation and leave it around, though we really have been
fighting to make autovacuum more aggressive, so that would defeat the
work done lately for that purpose.
3) Still drop the orphaned relation even if it references to a
non-existing schema, generating an appropriate LOG message so as the
problem comes from an incorrect lookup at the namespace name.

Attached is a patch doing two things:
a) Control DROP SCHEMA on a temporary namespace using
allow_system_table_mods.
b) Generate a non-buggy LOG message if trying to remove a temp
relation referring to a temporary schema that does not exist, using
"(null)" as a replacement for the schema name.

My suggestion is to do a) down to 9.4 if that's thought to be helpful
to have, and at least Robert visibly thinks so, then b) in 11~ as
that's where 246a6c8 exists. Comments welcome.
--
Michael

Attachment Content-Type Size
drop-temp-schema-adjust-v1.patch text/x-diff 1.4 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2020-01-06 02:52:51 BUG #16186: The usage of undefined value in pgbench.c
Previous Message Ryan Lambert 2020-01-06 00:21:20 Re: BUG #16183: PREPARED STATEMENT slowed down by jit

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2020-01-06 02:01:45 Re: Commit fest manager for 2020-01
Previous Message tsunakawa.takay@fujitsu.com 2020-01-06 01:37:46 RE: Libpq support to connect to standby server as priority