Re: making relfilenodes 56 bits

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making relfilenodes 56 bits
Date: 2022-07-05 08:32:42
Message-ID: CAFiTN-uGKi32HUW8xHY_qpxA6zFh4XuuTREPJShL7+-U0_2=Lg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Jul 3, 2022 at 8:02 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Sat, Jul 2, 2022 at 3:29 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > Why did you choose a quite small value for VAR_RFN_PREFETCH? VAR_OID_PREFETCH
> > is 8192, but you chose 64 for VAR_RFN_PREFETCH?
>
> As Dilip mentioned, I suggested a lower value. If that's too low, we
> can go higher, but I think there is value in not making this
> excessively large. Somebody somewhere is going to have a database
> that's crash-restarting like mad, and I don't want that person to run
> through an insane number of relfilenodes for no reason. I don't think
> there are going to be a lot of people creating thousands upon
> thousands of relations in a short period of time, and I'm not sure
> that it's a big deal if those who do end up having to wait for a few
> extra xlog flushes.

Here is the updated version of the patch.

Patch 0001-0003 are the same with review comments fixes given by
Andres, 0004 as an extra assert patch suggested by Andres, this can be
merged with 0003. Basically, during recovery we add asserts checking
"relfilenumbers aren't above ->nextRelFileNumber," and also the assert
for checking that after we allocate a new relfile number the file
should not already exist on the disk so that once we are sure that
this assertion is not hitting then maybe we are safe for removing the
TombStone files immediately what we were doing in 0005.

In 0005 I fixed the file delete order so now we are deleting in
descending order, for that first we need to count the number of
segments by doing stat() on each file and after that we need to go
ahead and unlink in the descending order.

The VAR_RELFILENUMBER_PREFETCH is still 64 as we have not yet
concluded on that, and as discussed I will test some performance to
see whether we have some obvious impact with different values of this.
Maybe I will start with some very small numbers so that we have some
impact.

I thought about this comment from Robert
> that's not quite the same as either of those things. For example, in
> tableam.h we currently say "This callback needs to create a new
> relation filenode for `rel`" and how should that be changed in this
> new naming? We're not creating a new RelFileNumber - those would need
> to be allocated, not created, as all the numbers in the universe exist
> already. Neither are we creating a new locator; that sounds like it
> means assembling it from pieces.

I think that "This callback needs to create a new relation storage
for `rel`" looks better.

I have again reviewed 0001 and 0003 and found some discrepancies in
usage of relfilenumber vs relfilelocator and fixed those, also some
places InvalidOid were use instead of InvalidRelFileNumber.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
v5-0005-Don-t-delay-removing-Tombstone-file-until-next.patch application/x-patch 13.4 KB
v5-0004-Assert-checking-to-be-merged-with-0003.patch application/x-patch 4.7 KB
v5-0002-Preliminary-refactoring-for-supporting-larger.patch application/x-patch 20.2 KB
v5-0003-Use-56-bits-for-relfilenumber-to-avoid-wraparound.patch application/x-patch 77.6 KB
v5-0001-Rename-RelFileNode-to-RelFileLocator-and-relNode-.patch application/x-patch 412.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Svetlana Derevyanko 2022-07-05 08:40:23 [PATCH] Optional OR REPLACE in CREATE OPERATOR statement
Previous Message Tatsuo Ishii 2022-07-05 08:29:57 Wrong provolatile value for to_timestamp (1 argument)