Database "contrib_regression" does not exist during testing

From: Jingtang Zhang <mrdrivingduck(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Database "contrib_regression" does not exist during testing
Date: 2022-11-10 15:02:29
Message-ID: 755e742f-8476-5a4e-69e1-dc3ac4cf0af8@gmail.com
Lists: pgsql-hackers

Hello everyone.

Recently, when I was running the regression tests, I got a 'database
"contrib_regression" does not exist' error. After reproducing the
problem, I found that the error is reported by an autovacuum worker
process.

Then I tried to analyze the code. When this autovacuum worker process
is forked from the postmaster and enters `InitPostgres` in postinit.c, it
performs the following steps:

1. Use the OID of the current database to look up its tuple in the
pg_database catalog and get the database name. This scan takes an
AccessShareLock on the catalog and releases it afterwards;
2. Call LockSharedObject to take a RowExclusiveLock on the database;
3. Look up pg_database again by database name, to make sure the tuple
for the current database still exists.

Between steps 1 and 2, no lock is held, so another backend process can
successfully drop the database. The current process then reports in
step 3 that the database does not exist.
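To make the window concrete, here is a minimal single-threaded sketch in C of the three steps, run against a toy in-memory "catalog". Everything in it is hypothetical: `ToyDbRow`, `toy_catalog`, the `toy_*` functions and OID 16384 are illustrative names, not PostgreSQL APIs. The step 2 lock is only marked by a comment. A callback stands in for whatever another process does between steps 1 and 2, so the interleaving is forced deterministically rather than left to scheduling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define TOY_MAX_DBS 4

/* Hypothetical stand-in for a pg_database row. */
typedef struct
{
    unsigned int oid;   /* stand-in for a PostgreSQL Oid */
    char name[64];
    bool live;          /* false once the row has been "dropped" */
} ToyDbRow;

static ToyDbRow toy_catalog[TOY_MAX_DBS] = {
    {16384, "contrib_regression", true},
};

/* Step 1 helper: find a live row by OID and copy out its name. */
static bool toy_lookup_by_oid(unsigned int oid, char *name_out, size_t len)
{
    assert(len > 0);
    for (int i = 0; i < TOY_MAX_DBS; i++)
        if (toy_catalog[i].live && toy_catalog[i].oid == oid)
        {
            snprintf(name_out, len, "%s", toy_catalog[i].name);
            return true;
        }
    return false;
}

/* Step 3 helper: find a live row by name; returns its OID, or 0 if none. */
static unsigned int toy_lookup_by_name(const char *name)
{
    for (int i = 0; i < TOY_MAX_DBS; i++)
        if (toy_catalog[i].live && strcmp(toy_catalog[i].name, name) == 0)
            return toy_catalog[i].oid;
    return 0;
}

/* Toy DROP DATABASE issued by "another process". */
static void toy_dropdb(const char *name)
{
    for (int i = 0; i < TOY_MAX_DBS; i++)
        if (toy_catalog[i].live && strcmp(toy_catalog[i].name, name) == 0)
            toy_catalog[i].live = false;
}

/*
 * The three startup steps. 'between_1_and_2' simulates another process
 * acting in the unlocked window; pass NULL for no interference.
 * Returns true if the database still exists at step 3.
 */
static bool toy_init_database(unsigned int oid,
                              void (*between_1_and_2)(const char *))
{
    char dbname[64];

    /* Step 1: read the name by OID (catalog lock released after the scan). */
    if (!toy_lookup_by_oid(oid, dbname, sizeof(dbname)))
        return false;

    /* Unlocked window: nothing stops a concurrent drop here. */
    if (between_1_and_2 != NULL)
        between_1_and_2(dbname);

    /* Step 2: the real code locks the database here; the toy omits it. */

    /* Step 3: re-read by name and check it is still the same database. */
    if (toy_lookup_by_name(dbname) != oid)
    {
        fprintf(stderr, "FATAL: database \"%s\" does not exist\n", dbname);
        return false;
    }
    return true;
}
```

The toy reproduces both outcomes. `toy_init_database(16384, NULL)` succeeds. `toy_init_database(16384, toy_dropdb)` drops the row inside the window and fails at step 3, printing a message modeled on the reported error.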

This issue can happen not only between an autovacuum worker and a
backend process, but also between two backend processes, given a
particular interleaving. For example, we can connect to the database
with psql, make that backend process stop between steps 1 and 2, and
let another backend process drop the database; the first backend
process then reports this error.

I am not sure whether this error is expected during regression
testing. Would it be possible to lock the catalog at step 1 and hold
that lock, so that no other process has a chance to drop the database,
given that dropdb needs to acquire an AccessExclusiveLock? And what are
the design considerations behind these three steps?

Hoping to hear some thoughts from the core hackers. Thanks!

--
Best Regards,

Jingtang

——————————————————————

Jingtang Zhang

E-Mail: mrdrivingduck(at)gmail(dot)com
GitHub: @mrdrivingduck

Sent from Microsoft Surface Book 2.
