Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
Date: 2022-08-04 11:08:35
Message-ID: CAFiTN-v9M6myyLqZt0u5+52dqdpGkGbdjNcM3WMB5KOWcGD2tA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Aug 4, 2022 at 9:41 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Thu, Aug 4, 2022 at 12:18 AM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> >
> > On Wed, Aug 03, 2022 at 11:26:43AM -0700, Andres Freund wrote:
> > > Hm. This looks more like an issue of DROP DATABASE not being interruptible. I
> > > suspect this isn't actually related to STRATEGY wal_log and could likely be
> > > reproduced in older versions too.
> >
> > I couldn't reproduce it with file_copy, but my recipe isn't exactly reliable.
> > That may just mean that it's easier to hit now.
>
> I think this looks like a problem with drop db but IMHO you are seeing
> this behavior only when a database is created using WAL LOG because in
> this strategy we are using buffers to write the destination database
> pages and some of the dirty buffers and sync requests might still be
> pending. And now when we try to drop the database it drops all the
> dirty buffers and all pending sync requests and then before it
> actually removes the directory it gets interrupted and now you see the
> database directory on disk which is partially corrupted. See below
> sequence of drop database
>
>
> dropdb()
> {
> ...
> DropDatabaseBuffers(db_id);
> ...
> ForgetDatabaseSyncRequests(db_id);
> ...
> RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT);
>
> WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_SMGRRELEASE));
> -- Inside this it can process the cancel query and get interrupted
> remove_dbtablespaces(db_id);
> ..
> }
>
> I reproduced the same error by inducing error just before
> WaitForProcSignalBarrier.
>
> postgres[14968]=# CREATE DATABASE a STRATEGY WAL_LOG ; drop database a;
> CREATE DATABASE
> ERROR: XX000: test error
> LOCATION: dropdb, dbcommands.c:1684
> postgres[14968]=# \c a
> connection to server on socket "/tmp/.s.PGSQL.5432" failed: PANIC:
> could not open critical system index 2662
> Previous connection kept
> postgres[14968]=#

So basically, from this we can say it is completely a problem with
drop databases, I mean I can produce any behavior by interrupting drop
database
1. If we created some tables/inserted data and the drop database got
cancelled, it might have a database directory and those objects are
lost.
2. Or you can even drop the database directory and then get cancelled
before deleting the pg_database entry then also you will end up with a
corrupted database, doesn't matter whether you created it with WAL LOG
or FILE COPY.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Shinya Kato 2022-08-04 11:09:51 Fix inconsistencies GUC categories
Previous Message Richard Guo 2022-08-04 11:02:13 Re: Fix obsoleted comments for function prototypes