From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making relfilenodes 56 bits
On 2022-07-12 09:51:12 -0400, Robert Haas wrote:
> On Mon, Jul 11, 2022 at 7:22 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > I guess I'm not enthused about duplicating the necessary knowledge in ever
> > more places. We've forgotten one of the magic incantations in the past, and
> > needing to find all the places that need to be patched is a bit bothersome.
> > Perhaps we could extract helpers out of durable_rename()?
> > OTOH, I don't really see what we gain by keeping things out of the critical
> > section? It does seem good to have the temp-file creation/truncation and write
> > separately, but after that I don't think it's worth much to avoid a
> > PANIC. What legitimate issue does it avoid?
> OK, so then I think we should just use durable_rename(). Here's a
> patch that does it that way. I briefly considered the idea of
> extracting helpers, but it doesn't seem worthwhile to me. There's not
> that much code in durable_rename() in the first place.
> In this version, I also removed the struct padding, changed the limit
> on the number of entries to a nice round 64, and made some comment [...]
What currently happens if we exceed that?
I wonder if we should just reference a new define generated by genbki.pl
documenting the number of relations that need to be tracked. Then we don't
need to maintain this manually going forward.
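To illustrate the genbki.pl idea: if the generated headers carried the actual count of mapped relations, the relmapper's array size could be checked at compile time. A minimal sketch; both macro names here are made up for illustration, not existing PostgreSQL symbols:

```c
#include <assert.h>

/* Hypothetical: suppose genbki.pl emitted the real number of mapped
 * relations into a generated header, instead of the limit being
 * maintained by hand. */
#define NUM_MAPPED_RELATIONS 62		/* would be generated by genbki.pl */
#define MAX_MAPPINGS 64				/* array size in the on-disk RelMapFile */

/* The build then fails as soon as the catalogs outgrow the file
 * format, rather than somebody noticing at runtime. */
_Static_assert(NUM_MAPPED_RELATIONS <= MAX_MAPPINGS,
			   "too many mapped relations for the relmapper file format");
```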
> I considered trying to go further and actually make the file
> variable-size, so that we never again need to worry about the limit on
> the number of entries, but I don't actually think that's a good idea.
Yea, I don't really see what we'd gain. For this stuff to change we need to [...]
> If we were going to split up durable_rename(), the only intelligible
> split I can see would be to have a second version of the function, or
> a flag to the existing function, that caters to the situation where
> the old file is already known to have been fsync()'d.
I was thinking of something like durable_rename_prep() that'd fsync the
file/directories under their old names, and then durable_rename_exec() that
actually renames and then fsyncs. But without a clear use case...
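Roughly, the split could look like the following. This is only a sketch of the idea under discussion, not PostgreSQL code: the function names are hypothetical, and a real version would also fsync the containing directory and cope with an already-existing target, as durable_rename() does:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Phase 1: flush the file under its old name.  Safe to do before
 * entering any critical section, since nothing has been renamed yet. */
static int
durable_rename_prep(const char *oldfile)
{
	int			fd = open(oldfile, O_RDWR);

	if (fd < 0)
		return -1;
	if (fsync(fd) != 0)
	{
		int			save_errno = errno;

		close(fd);
		errno = save_errno;
		return -1;
	}
	return close(fd);
}

/* Phase 2: the actual rename, plus the fsync that makes it durable.
 * This is the part that would run inside the critical section. */
static int
durable_rename_exec(const char *oldfile, const char *newfile)
{
	int			fd;

	if (rename(oldfile, newfile) != 0)
		return -1;
	fd = open(newfile, O_RDWR);
	if (fd < 0)
		return -1;
	if (fsync(fd) != 0)
	{
		int			save_errno = errno;

		close(fd);
		errno = save_errno;
		return -1;
	}
	return close(fd);
}
```

The point of the split is that only the second, cheaper half needs to be PANIC-on-failure.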
> + /* Write new data to the file. */
> + pgstat_report_wait_start(WAIT_EVENT_RELATION_MAP_WRITE);
> + if (write(fd, newmap, sizeof(RelMapFile)) != sizeof(RelMapFile))
> + pgstat_report_wait_end();
Not for this patch, but we should eventually move this sequence into a
wrapper, perhaps combined with retry handling for short writes, the ENOSPC
stuff, and an error message when the write fails. It's a bit insane how many
copies of this we have.
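Something along these lines, say. The name is hypothetical and this is plain POSIX C rather than backend code; an actual version would also wrap the pgstat_report_wait_start()/end() calls and ereport() on failure:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Loop over write(), retrying EINTR and short writes, and map "no
 * error but no progress" to ENOSPC the way existing callers do by
 * hand.  Returns len on success, -1 (with errno set) on failure. */
static ssize_t
write_all(int fd, const void *buf, size_t len)
{
	size_t		done = 0;

	while (done < len)
	{
		ssize_t		n = write(fd, (const char *) buf + done, len - done);

		if (n < 0)
		{
			if (errno == EINTR)
				continue;		/* interrupted, just retry */
			return -1;			/* real error, errno already set */
		}
		if (n == 0)
		{
			/* write didn't set errno, assume problem is no disk space */
			errno = ENOSPC;
			return -1;
		}
		done += (size_t) n;		/* short write: keep going */
	}
	return (ssize_t) done;
}
```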
> diff --git a/src/include/utils/wait_event.h b/src/include/utils/wait_event.h
> index b578e2ec75..5d3775ccde 100644
> --- a/src/include/utils/wait_event.h
> +++ b/src/include/utils/wait_event.h
> @@ -193,7 +193,7 @@ typedef enum
> - WAIT_EVENT_RELATION_MAP_SYNC,
> + WAIT_EVENT_RELATION_MAP_RENAME,
Very minor nitpick: to me REPLACE would be a bit more accurate than RENAME,
since the operation includes the fsyncs etc.?