Re: Unlogged tables cleanup

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, konstantin knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Unlogged tables cleanup
Date: 2016-11-17 15:06:28
Message-ID: CA+TgmoZ4TWaPCKhF-szV-nPxDXL40zCwm9pNFJZURvRgm2oJzQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Nov 16, 2016 at 11:55 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Wed, Nov 16, 2016 at 7:09 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Wed, Nov 16, 2016 at 3:54 PM, Michael Paquier
>> <michael(dot)paquier(at)gmail(dot)com> wrote:
>>> Indeed I missed this comment block. Please let me suggest the following instead:
>>> /*
>>> * Set up an init fork for an unlogged table so that it can be correctly
>>> - * reinitialized on restart. Since we're going to do an immediate sync, we
>>> - * only need to xlog this if archiving or streaming is enabled. And the
>>> - * immediate sync is required, because otherwise there's no guarantee that
>>> - * this will hit the disk before the next checkpoint moves the redo pointer.
>>> + * reinitialized on restart. An immediate sync is required even if the
>>> + * page has been logged, because the write did not go through
>>> + * shared_buffers and therefore a concurrent checkpoint may have moved
>>> + * the redo pointer past our xlog record.
>>> */
>>
>> Hmm. Well, that deletes the comment that's no longer true, but it
>> doesn't replace it with any explanation of why we also need to WAL-log
>> it unconditionally, and I think that explanation is not entirely
>> trivial?
>
> OK, the original code does not give any special reason either
> regarding why doing so is safe for archiving or streaming :)

Yeah, but surely it's obvious that if you don't WAL log it, it's not
going to work for archiving or streaming. It's a lot less obvious why
you have to WAL log it when you're not doing either of those things if
you've already written it and fsync'd it locally. How is it going to
disappear if it's been fsync'd, one wonders?

> More seriously, if there could be more details regarding that, I would
> think that we could say something like "logging the init fork is
> mandatory in any case to ensure its on-disk presence when at recovery
> replay, even on non-default tablespaces whose base location are
> deleted and re-created from scratch if the WAL record in charge of
> creating this tablespace gets replayed". The problem shows up because
> of tablespaces being deleted at replay at the end... So perhaps this
> makes sense. What do you think?

Yes, that's about what I think we need to explain. Actually, I'm
wondering if we ought to just switch this over to using the delayChkpt
mechanism instead of going through this complicated song-and-dance.
Incurring an immediate sync to avoid having to WAL-log this is a bit
tenuous, but having to WAL-log it AND do the immediate sync just seems
silly, and I'm actually a bit worried that even with your fix there
might still be a latent bug somewhere. We couldn't switch mechanisms
cleanly in the 9.2 branch (cf.
f21bb9cfb5646e1793dcc9c0ea697bab99afa523) so perhaps we should
back-patch it in the form you have it in already, but it's probably
worth switching over at least in master.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tobias Bussmann 2016-11-17 15:07:00 Re: Parallel execution and prepared statements
Previous Message Kyotaro HORIGUCHI 2016-11-17 12:24:06 Re: A bug in UCS_to_most.pl