Re: BUG #17182: Race condition on concurrent DROP and CREATE of dependent object

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: exclusion(at)gmail(dot)com
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17182: Race condition on concurrent DROP and CREATE of dependent object
Date: 2021-09-05 14:15:37
Message-ID: 2872252.1630851337@sss.pgh.pa.us
Lists: pgsql-bugs

PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> As a result of the following script:
> for i in `seq 100`; do
> ( { for n in `seq 20`; do echo "DROP DOMAIN i;"; done } | psql ) >psql1.log 2>&1 &
> ( echo "
> CREATE DOMAIN i AS int;
> CREATE FUNCTION f1() RETURNS i LANGUAGE SQL RETURN 1;
> CREATE FUNCTION f2() RETURNS i LANGUAGE SQL RETURN 2;
> CREATE FUNCTION f3() RETURNS i LANGUAGE SQL RETURN 3;
> CREATE FUNCTION f4() RETURNS i LANGUAGE SQL RETURN 4;
> CREATE FUNCTION f5() RETURNS i LANGUAGE SQL RETURN 5;
> " | psql ) >psql2.log 2>&1 &
> wait
> psql -c "DROP DOMAIN i CASCADE" >psql3.log 2>&1
> done

> I get several broken functions with the invalid return type:
> SELECT f1()
> ERROR: cache lookup failed for type 16519
> CONTEXT: SQL function "f1" during inlining

I don't find this particularly surprising, and I'm unwilling to add the
amount of locking overhead it'd take to prevent it.

The generic problem is that a newly-created dependent object is not
protected against deletion of its referenced object(s) until we commit
its new pg_depend entries; before that, a concurrent DROP won't see
the dependencies. I recall some discussion of trying to take an
anti-deletion lock in recordDependency, but that's too late: the
deletion might have committed and released its own lock since we
looked up the type (or other referenced object). So the only real fix
for this would be to make every object lookup in the entire system do
the sort of dance that's done in RangeVarGetRelidExtended. We have
agreed that the cost is worth it for tables (though I don't think
that that was without controversy, nor am I 100% convinced that
RangeVarGetRelidExtended is correct). But I'm not excited about
extending the principle to other object types.
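
For readers unfamiliar with that dance, the following is a toy Python model
of the general lookup, lock, recheck pattern, set next to a naive lookup. It
is not PostgreSQL code: ToyCatalog, naive_lookup, lookup_and_lock, one_shot
and the inval_count counter are all hypothetical names made up for this
illustration, concurrency is simulated with a callback rather than real
sessions, and the real function deals with many cases this sketch ignores.

```python
class ToyCatalog:
    """Hypothetical stand-in for a catalog mapping names to OIDs."""

    def __init__(self):
        self.names = {}        # name -> oid
        self.locked = set()    # oids on which "we" hold an anti-deletion lock
        self.inval_count = 0   # bumped on every create/drop
        self.next_oid = 16384

    def create(self, name):
        oid = self.next_oid
        self.next_oid += 1
        self.names[name] = oid
        self.inval_count += 1
        return oid

    def drop(self, name):
        oid = self.names[name]
        if oid in self.locked:
            # A real DROP would block here; the toy model just refuses.
            raise RuntimeError("object is locked; DROP must wait")
        del self.names[name]
        self.inval_count += 1

    def lookup(self, name):
        return self.names.get(name)


def one_shot(action):
    """Wrap action so it runs only the first time the hook is called."""
    fired = []

    def hook():
        if not fired:
            fired.append(True)
            action()

    return hook


def naive_lookup(catalog, name, before_use=lambda: None):
    """Look up an OID with no lock: the result can go stale before use."""
    oid = catalog.lookup(name)
    before_use()   # window in which a concurrent DROP can commit
    return oid     # may now refer to an object that no longer exists


def lookup_and_lock(catalog, name, before_lock=lambda: None):
    """Look up, lock, then recheck; retry if the catalog changed meanwhile."""
    retry = False
    old_oid = None
    while True:
        seen = catalog.inval_count
        oid = catalog.lookup(name)
        if retry:
            if oid == old_oid:
                return oid                      # lock is on the right object
            if old_oid is not None:
                catalog.locked.discard(old_oid)  # release the stale lock
        before_lock()  # window in which a concurrent DROP can commit
        if oid is not None:
            catalog.locked.add(oid)
        if catalog.inval_count == seen:
            return oid  # nothing changed between lookup and lock
        retry = True
        old_oid = oid
```

In this model, a drop injected into naive_lookup leaves the caller holding an
OID that the catalog no longer knows about, which corresponds to the "cache
lookup failed for type" symptom in the report. The same drop injected into
lookup_and_lock makes the loop go around once more and return None, so the
caller would report a missing type instead of recording a dangling reference.
The price is the extra lock and recheck on every lookup, which is the overhead
discussed above.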

regards, tom lane
