Re: [HACKERS] Unlogged tables cleanup

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, konstantin knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Unlogged tables cleanup
Date: 2019-05-13 15:28:32
Message-ID: CA+TgmobiYkqB23VD=nDrRRgCEF9FUDmdn9MwzbLEHmu9vWTLQA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Dec 8, 2016 at 7:49 PM Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Fri, Dec 9, 2016 at 4:54 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > On Wed, Dec 7, 2016 at 11:20 PM, Michael Paquier
> > <michael(dot)paquier(at)gmail(dot)com> wrote:
> >> OK, I rewrote a bit the patch as attached. What do you think?
> >
> > Committed and back-patched all the way back to 9.2.
>
> Thanks!

This thread resulted in commit
fa0f466d5329e10b16f3b38c8eaf5306f7e234e8, and today I had cause to
look at that commit again. The code is now in
heapam_relation_set_new_filenode. I noticed a few things that seem
like they are not quite right.

1. The comment added in that commit says "Recover may as well remove
it while replaying..." but what is really meant is "Recovery may well
remove it well replaying..." The phrase "may as well" means that
there isn't really any reason not to do it even if it's not strictly
necessary. The phrase "may well" means that it's entirely possible.
The latter meaning is the one we want here.

2. The comment as adjusted in that commit refers to needing an
immediate sync "even if the page has been logged, because the write
did not go through shared_buffers," but there is no page and no write,
because an empty heap is just an empty file. That logic makes sense
for index AMs that bypass shared buffers to write a metapage, e.g.
nbtree, as opposed to others which go through shared_buffers and don't
have the immediate sync, e.g. brin. However, the heap writes no pages
either through shared buffers or bypassing shared buffers, so either
I'm confused or the comment makes no sense.

3. Before that commit, the comment said that "the immediate sync is
required, because otherwise there's no guarantee that this will hit
the disk before the next checkpoint moves the redo pointer." However,
that justification seems to apply to *anything* that does smgrcreate +
log_smgrcreate would also need to do smgrimmedsync, and
RelationCreateStorage doesn't. Unless, for some reason that I'm not
thinking of right now, the init fork has stronger durability
requirements that the main fork, it's hard to understand why
heapam_relation_set_new_filenode can call RelationCreateStorage to do
smgrcreate + log_smgrcreate for the main fork and that's OK, but when
it does the same thing itself for the init fork, it now needs
smgrimmedsync as well.

My guess, just shooting from the hip, is that the smgrimmedsync call
can be removed here. If that's wrong, then we need a better
explanation for why it's needed, and we possibly need to add it to
every single place that does smgrcreate that doesn't have it already.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2019-05-13 15:32:40 Re: Fuzzy thinking in is_publishable_class
Previous Message Ashutosh Sharma 2019-05-13 14:47:49 Re: Passing CopyMultiInsertInfo structure to CopyMultiInsertInfoNextFreeSlot()