Re: Make relfile tombstone files conditional on WAL level

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Make relfile tombstone files conditional on WAL level
Date: 2022-01-28 14:40:15
Message-ID: CAFiTN-sJqzCnXOVFb0QVAKidXdV1QzA9tR9EDwM65JAScGr0VQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 19, 2022 at 10:37 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Thu, Jan 6, 2022 at 7:22 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>
>> On Thu, Jan 6, 2022 at 3:47 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
>> > Another problem is that relfilenodes are normally allocated with
>> > GetNewOidWithIndex(), and initially match a relation's OID. We'd need
>> > a new allocator, and they won't be able to match the OID in general
>> > (while we have 32 bit OIDs at least).
>>
>> Personally I'm not sad about that. Values that are the same in simple
>> cases but diverge in more complex cases are kind of a trap for the
>> unwary. There's no real reason to have them ever match. Yeah, in
>> theory, it makes it easier to tell which file matches which relation,
>> but in practice, you always have to double-check in case the table has
>> ever been rewritten. It doesn't seem worth continuing to contort the
>> code for a property we can't guarantee anyway.
>
>
> Make sense, I have started working on this idea, I will try to post the first version by early next week.

Here is the first working patch, with that now we don't need to
maintain the TombStone file until the next checkpoint. This is still
a WIP patch with this I can see my problem related to ALTER DATABASE
SET TABLESPACE WAL-logged problem is solved which Robert reported a
couple of mails above in the same thread.

General idea of the patch:
- Change the RelFileNode.relNode to be 64bit wide, out of which 8 bits
for fork number and 56 bits for the relNode as shown below. [1]
- GetNewRelFileNode() will just generate a new unique relfilenode and
check the file existence and if it already exists then throw an error,
so no loop. We also need to add the logic for preserving the
nextRelNode across restart and also WAL logging it but that is similar
to the preserving nextOid.
- mdunlinkfork, will directly forget the relfilenode, so we get rid of
all unlinking code from the code.
- Now, we don't need any post checkpoint unlinking activity.

[1]
/*
* RelNodeId:
*
* this is a storage type for RelNode. The reasoning behind using this is same
* as using the BlockId so refer comment atop BlockId.
*/
typedef struct RelNodeId
{
uint32 rn_hi;
uint32 rn_lo;
} RelNodeId;
typedef struct RelFileNode
{
Oid spcNode; /* tablespace */
Oid dbNode; /* database */
RelNodeId relNode; /* relation */
} RelFileNode;

TODO:

There are a couple of TODOs and FIXMEs which I am planning to improve
by next week. I am also planning to do the testing where relfilenode
consumes more than 32 bits, maybe for that we can set the
FirstNormalRelfileNode to higher value for the testing purpose. And,
Improve comments.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
v1-WIP-0001-Don-t-wait-for-next-checkpoint-to-remove-unwanted.patch text/x-patch 79.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2022-01-28 14:42:17 Re: warn if GUC set to an invalid shared library
Previous Message Aleksander Alekseev 2022-01-28 14:30:17 Re: Add 64-bit XIDs into PostgreSQL 15