Re: making relfilenodes 56 bits

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making relfilenodes 56 bits
Date: 2022-07-12 17:09:44
Message-ID: 20220712170944.uyhlisplliyz2sir@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2022-07-12 09:51:12 -0400, Robert Haas wrote:
> On Mon, Jul 11, 2022 at 7:22 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > I guess I'm not enthused in duplicating the necessary knowledge in evermore
> > places. We've forgotten one of the magic incantations in the past, and needing
> > to find all the places that need to be patched is a bit bothersome.
> >
> > Perhaps we could add extract helpers out of durable_rename()?
> >
> > OTOH, I don't really see what we gain by keeping things out of the critical
> > section? It does seem good to have the temp-file creation/truncation and write
> > separately, but after that I don't think it's worth much to avoid a
> > PANIC. What legitimate issue does it avoid?
>
> OK, so then I think we should just use durable_rename(). Here's a
> patch that does it that way. I briefly considered the idea of
> extracting helpers, but it doesn't seem worthwhile to me. There's not
> that much code in durable_rename() in the first place.

Cool.

> In this version, I also removed the struct padding, changed the limit
> on the number of entries to a nice round 64, and made some comment
> updates.

What does currently happen if we exceed that?

I wonder if we should just reference a new define generated by genbki.pl
documenting the number of relations that need to be tracked. Then we don't
need to maintain this manually going forward.

> I considered trying to go further and actually make the file
> variable-size, so that we never again need to worry about the limit on
> the number of entries, but I don't actually think that's a good idea.

Yea, I don't really see what we'd gain. For this stuff to change we need to
recompile anyway.

> If we were going to split up durable_rename(), the only intelligible
> split I can see would be to have a second version of the function, or
> a flag to the existing function, that caters to the situation where
> the old file is already known to have been fsync()'d.

I was thinking of something like durable_rename_prep() that'd fsync the
file/directories under their old names, and then durable_rename_exec() that
actually renames and then fsyncs. But without a clear usecase...

> + /* Write new data to the file. */
> + pgstat_report_wait_start(WAIT_EVENT_RELATION_MAP_WRITE);
> + if (write(fd, newmap, sizeof(RelMapFile)) != sizeof(RelMapFile))
...
> + pgstat_report_wait_end();
> +

Not for this patch, but we eventually should move this sequence into a
wrapper. Perhaps combined with retry handling for short writes, the ENOSPC
stuff and an error message when the write fails. It's a bit insane how many
copies of this we have.

> diff --git a/src/include/utils/wait_event.h b/src/include/utils/wait_event.h
> index b578e2ec75..5d3775ccde 100644
> --- a/src/include/utils/wait_event.h
> +++ b/src/include/utils/wait_event.h
> @@ -193,7 +193,7 @@ typedef enum
> WAIT_EVENT_LOGICAL_REWRITE_TRUNCATE,
> WAIT_EVENT_LOGICAL_REWRITE_WRITE,
> WAIT_EVENT_RELATION_MAP_READ,
> - WAIT_EVENT_RELATION_MAP_SYNC,
> + WAIT_EVENT_RELATION_MAP_RENAME,

Very minor nitpick: To me REPLACE would be a bit more accurate than RENAME,
since it includes fsync etc?

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2022-07-12 17:18:10 Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
Previous Message Andres Freund 2022-07-12 17:01:31 Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)