From: | Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
---|---|
To: | Justin Pryzby <pryzby(at)telsasoft(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints |
Date: | 2022-08-04 11:08:35 |
Message-ID: | CAFiTN-v9M6myyLqZt0u5+52dqdpGkGbdjNcM3WMB5KOWcGD2tA@mail.gmail.com |
Lists: | pgsql-hackers |
On Thu, Aug 4, 2022 at 9:41 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Thu, Aug 4, 2022 at 12:18 AM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> >
> > On Wed, Aug 03, 2022 at 11:26:43AM -0700, Andres Freund wrote:
> > > Hm. This looks more like an issue of DROP DATABASE not being interruptible. I
> > > suspect this isn't actually related to STRATEGY wal_log and could likely be
> > > reproduced in older versions too.
> >
> > I couldn't reproduce it with file_copy, but my recipe isn't exactly reliable.
> > That may just mean that it's easier to hit now.
>
> I think this looks like a problem with drop db but IMHO you are seeing
> this behavior only when a database is created using WAL LOG because in
> this strategy we are using buffers to write the destination database
> pages and some of the dirty buffers and sync requests might still be
> pending. And now when we try to drop the database it drops all the
> dirty buffers and all pending sync requests and then before it
> actually removes the directory it gets interrupted and now you see the
> database directory on disk which is partially corrupted. See below
> sequence of drop database
>
>
> dropdb()
> {
> ...
> DropDatabaseBuffers(db_id);
> ...
> ForgetDatabaseSyncRequests(db_id);
> ...
> RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT);
>
> WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_SMGRRELEASE));
> -- Inside this it can process the cancel query and get interrupted
> remove_dbtablespaces(db_id);
> ..
> }
>
> I reproduced the same error by inducing error just before
> WaitForProcSignalBarrier.
>
> postgres[14968]=# CREATE DATABASE a STRATEGY WAL_LOG ; drop database a;
> CREATE DATABASE
> ERROR: XX000: test error
> LOCATION: dropdb, dbcommands.c:1684
> postgres[14968]=# \c a
> connection to server on socket "/tmp/.s.PGSQL.5432" failed: PANIC: could not open critical system index 2662
> Previous connection kept
> postgres[14968]=#

So basically, this shows that the problem lies entirely in DROP DATABASE:
I can produce any of these behaviors just by interrupting it.

1. If we created some tables or inserted data and the DROP DATABASE then
got cancelled, the database directory may still be there, but those
objects are lost.

2. Or DROP DATABASE can remove the database directory and then get
cancelled before the pg_database entry is deleted. In that case you
also end up with a corrupted database, and it doesn't matter whether
you created it with WAL_LOG or FILE_COPY.
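
The two cases above can be sketched with a toy model. This is only an illustration under stated assumptions, not PostgreSQL code: ToyDatabase, ToyDropStep, toy_dropdb() and toy_is_consistent() are hypothetical names, the steps follow the order of the quoted dropdb() sequence, and the final STEP_COMMIT (the point where the pg_database row removal becomes permanent) is an assumption added for the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one database's state. Hypothetical; not PostgreSQL structures. */
typedef struct ToyDatabase
{
    bool catalog_entry;  /* pg_database row is still visible */
    bool directory;      /* database directory exists on disk */
    int  pages_dirty;    /* pages that live only in shared buffers */
    int  pages_lost;     /* dirty pages discarded without being written */
} ToyDatabase;

/* Steps in the order of the quoted sequence; STEP_COMMIT is an assumption. */
typedef enum ToyDropStep
{
    STEP_NONE = 0,       /* never interrupt */
    STEP_DROP_BUFFERS,   /* models DropDatabaseBuffers() */
    STEP_FORGET_SYNC,    /* models ForgetDatabaseSyncRequests() */
    STEP_CHECKPOINT,     /* models RequestCheckpoint() */
    STEP_BARRIER,        /* models WaitForProcSignalBarrier() */
    STEP_REMOVE_DIR,     /* models remove_dbtablespaces() */
    STEP_COMMIT          /* catalog row removal becomes permanent */
} ToyDropStep;

/* A freshly created database: catalog row and directory both present. */
static ToyDatabase
toy_new_database(int pages_dirty)
{
    ToyDatabase db = {true, true, pages_dirty, 0};
    return db;
}

/*
 * Run the toy drop sequence, raising a simulated error just before
 * 'interrupt_before'. Returns true only if the drop ran to completion.
 * On interruption the catalog row survives (the transaction aborts), but
 * the non-transactional side effects already performed are not undone.
 */
static bool
toy_dropdb(ToyDatabase *db, ToyDropStep interrupt_before)
{
    assert(db != NULL);

    for (int step = STEP_DROP_BUFFERS; step <= STEP_COMMIT; step++)
    {
        if (step == (int) interrupt_before)
            return false;

        switch (step)
        {
            case STEP_DROP_BUFFERS:
                /* dirty pages are thrown away, never written out */
                db->pages_lost += db->pages_dirty;
                db->pages_dirty = 0;
                break;
            case STEP_REMOVE_DIR:
                db->directory = false;
                break;
            case STEP_COMMIT:
                db->catalog_entry = false;
                break;
            default:
                /* the other steps change nothing in this model */
                break;
        }
    }
    return true;
}

/* A visible database needs an intact directory; a dropped one needs none. */
static bool
toy_is_consistent(const ToyDatabase *db)
{
    if (!db->catalog_entry)
        return !db->directory;
    return db->directory && db->pages_lost == 0;
}
```

In this model, calling toy_dropdb() with STEP_BARRIER on a database that still has dirty pages (a fresh WAL_LOG copy, or any database with recent writes) leaves the directory in place with pages lost, which is case 1. Calling it with STEP_COMMIT leaves a catalog entry that points at a missing directory, which is case 2 and does not depend on the number of dirty pages.
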
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com