Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
Date: 2022-03-23 13:19:11
Message-ID: CAFiTN-vPp8czS_dBFoVCZ+nd2B6rUGTSyXOZwuMSPnMBKYExfQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 23, 2022 at 5:54 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Wed, Mar 23, 2022 at 4:42 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > Okay this is an interesting point. So one option is that in case of
> > failure while using the wal log strategy we do not remove the database
> > directory, because an abort transaction will take care of removing the
> > relation file. But then in failure case we will leave the orphaned
> > database directory with version file and the relmap file. Another
> > option is to do the redundant cleanup as we are doing now. Any other
> > options?
>
> I think our overriding goal should be to get everything using one
> mechanism. It doesn't look straightforward to get everything to go
> through the PendingRelDelete mechanism, because as you say, it can't
> handle non-relation files or directories. However, what if we opt out
> of that mechanism? We could do that either by not using
> RelationCreateStorage() in the first place and directly calling
> smgrcreate(), or by using RelationPreserveStorage() afterwards to yank
> the file back out of the list.

I think directly using smgrcreate() is a better idea instead of first
registering and then unregistering it. I have made that change in
the attached patch. After this change now we can merge creating the
MAIN_FORKNUM also in the loop below where we are creating other
fork[1] with one extra condition but I think current code is in more
sync with the other code where we are doing the similar things so I
have not merged it in the loop. Please let me know if you think
otherwise.

[1]
+ /*
+ * Create and copy all forks of the relation. We are not using
+ * RelationCreateStorage() as it is registering the cleanup for the
+ * underlying relation storage on the transaction abort. But during create
+ * database failure, we have a separate cleanup mechanism for the whole
+ * database directory. Therefore, we don't need to register cleanup for
+ * each individual relation storage.
+ */
+ smgrcreate(RelationGetSmgr(dst_rel), MAIN_FORKNUM, false);
+ if (permanent)
+ log_smgrcreate(&dst_rnode, MAIN_FORKNUM);
+
+ /* copy main fork. */
+ RelationCopyStorageUsingBuffer(src_rel, dst_rel, MAIN_FORKNUM, permanent);
+
+ /* copy those extra forks that exist */
+ for (ForkNumber forkNum = MAIN_FORKNUM + 1;
+ forkNum <= MAX_FORKNUM; forkNum++)
+ {
+ if (smgrexists(RelationGetSmgr(src_rel), forkNum))
+ {
+ smgrcreate(RelationGetSmgr(dst_rel), forkNum, false);
+

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
v3-0001-Add-new-block-by-block-strategy-for-CREATE-DATABA.patch text/x-patch 61.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2022-03-23 13:19:42 Re: refactoring basebackup.c (zstd workers)
Previous Message Julien Rouhaud 2022-03-23 13:03:18 Re: make MaxBackends available in _PG_init