Re: making relfilenodes 56 bits

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making relfilenodes 56 bits
Date: 2022-09-28 03:52:56
Message-ID: CA+hUKGLViQEkmM0KJa_KHKmBVfNgps2oUpzVByEWXpzGnvO8kA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Dilip,

I am very happy to see these commits. Here's some belated review for
the tombstone-removal patch.

> v7-0004-Don-t-delay-removing-Tombstone-file-until-next.patch

More things you can remove:

* sync_unlinkfiletag in struct SyncOps
* the macro UNLINKS_PER_ABSORB
* global variable pendingUnlinks

This comment after the question mark is obsolete:

* XXX should we CHECK_FOR_INTERRUPTS in this loop?
Escaping with an
* error in the case of SYNC_UNLINK_REQUEST would leave the
* no-longer-used file still present on disk, which
would be bad, so
* I'm inclined to assume that the checkpointer will
always empty the
* queue soon.

(I think if the answer to the question is now yes, then we should
replace the stupid sleep with a condition variable sleep, but there's
another thread about that somewhere).

In a couple of places in dbcommands.c you could now make this change:

/*
- * Force a checkpoint before starting the copy. This will
force all dirty
- * buffers, including those of unlogged tables, out to disk, to ensure
- * source database is up-to-date on disk for the copy.
- * FlushDatabaseBuffers() would suffice for that, but we also want to
- * process any pending unlink requests. Otherwise, if a checkpoint
- * happened while we're copying files, a file might be deleted just when
- * we're about to copy it, causing the lstat() call in copydir() to fail
- * with ENOENT.
+ * Force all dirty buffers, including those of unlogged tables, out to
+ * disk, to ensure source database is up-to-date on disk for the copy.
*/
- RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE |
- CHECKPOINT_WAIT |
CHECKPOINT_FLUSH_ALL);
+ FlushDatabaseBuffers(src_dboid);

More obsolete comments you could change:

* If we were copying database at block levels then drop pages for the
* destination database that are in the shared buffer cache. And tell
--> * checkpointer to forget any pending fsync and unlink
requests for files

--> * Tell checkpointer to forget any pending fsync and unlink requests for
* files in the database; else the fsyncs will fail at next
checkpoint, or
* worse, it will delete file

In tablespace.c I think you could now make this change:

if (!destroy_tablespace_directories(tablespaceoid, false))
{
- /*
- * Not all files deleted? However, there can be
lingering empty files
- * in the directories, left behind by for example DROP
TABLE, that
- * have been scheduled for deletion at next checkpoint
(see comments
- * in mdunlink() for details). We could just delete
them immediately,
- * but we can't tell them apart from important data
files that we
- * mustn't delete. So instead, we force a checkpoint
which will clean
- * out any lingering files, and try again.
- */
- RequestCheckpoint(CHECKPOINT_IMMEDIATE |
CHECKPOINT_FORCE | CHECKPOINT_WAIT);
-
+#ifdef WIN32
/*
* On Windows, an unlinked file persists in the
directory listing
* until no process retains an open handle for the
file. The DDL
@@ -523,6 +513,7 @@ DropTableSpace(DropTableSpaceStmt *stmt)

/* And now try again. */
if (!destroy_tablespace_directories(tablespaceoid, false))
+#endif
{
/* Still not empty, the files must be important then */
ereport(ERROR,

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2022-09-28 04:07:43 Re: Refactor backup related code (was: Is it correct to say, "invalid data in file \"%s\"", BACKUP_LABEL_FILE in do_pg_backup_stop?)
Previous Message Masahiko Sawada 2022-09-28 03:49:07 Re: [PoC] Improve dead tuple storage for lazy vacuum