Re: Unlogged relations and WAL-logging

From: Peter Eisentraut <peter(at)eisentraut(dot)org>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Unlogged relations and WAL-logging
Date: 2023-09-01 12:49:52
Message-ID: a5b13df4-a083-8119-5a27-63360a147ece@eisentraut.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Is the patch
0003-Remove-unnecessary-smgrimmedsync-when-creating-unlog.patch still
relevant, or can this commitfest entry be closed?

On 23.08.23 16:40, Heikki Linnakangas wrote:
>>>> 5. In heapam_relation_set_new_filenode(), we do this:
>>>>
>>>>>
>>>>>         /*
>>>>>          * If required, set up an init fork for an unlogged table
>>>>> so that it can
>>>>>          * be correctly reinitialized on restart.  An immediate
>>>>> sync is required
>>>>>          * even if the page has been logged, because the write did
>>>>> not go through
>>>>>          * shared_buffers and therefore a concurrent checkpoint may
>>>>> have moved the
>>>>>          * redo pointer past our xlog record.  Recovery may as well
>>>>> remove it
>>>>>          * while replaying, for example, XLOG_DBASE_CREATE or
>>>>> XLOG_TBLSPC_CREATE
>>>>>          * record. Therefore, logging is necessary even if
>>>>> wal_level=minimal.
>>>>>          */
>>>>>         if (persistence == RELPERSISTENCE_UNLOGGED)
>>>>>         {
>>>>>                 Assert(rel->rd_rel->relkind == RELKIND_RELATION ||
>>>>>                            rel->rd_rel->relkind == RELKIND_MATVIEW ||
>>>>>                            rel->rd_rel->relkind ==
>>>>> RELKIND_TOASTVALUE);
>>>>>                 smgrcreate(srel, INIT_FORKNUM, false);
>>>>>                 log_smgrcreate(newrnode, INIT_FORKNUM);
>>>>>                 smgrimmedsync(srel, INIT_FORKNUM);
>>>>>         }
>>>>
>>>> The comment doesn't make much sense, we haven't written nor WAL-logged
>>>> any page here, with nor without the buffer cache. It made more sense
>>>> before commit fa0f466d53.
>>>
>>> Well, it seems to me (and perhaps I am just confused) that complaining
>>> that there's no page written here might be a technicality. The point
>>> is that there's no synchronization between the work we're doing here
>>> -- which is creating a fork, not writing a page -- and any concurrent
>>> checkpoint. So we both need to log it, and also sync it immediately.
>>
>> I see. I pushed the fix from the other thread that makes smgrcreate()
>> call register_dirty_segment (commit 4b4798e13). I believe that makes
>> this smgrimmedsync() unnecessary. If a concurrent checkpoint happens
>> with a redo pointer greater than this WAL record, it must've received
>> the fsync request created by smgrcreate(). That depends on the fact that
>> we write the WAL record *after* smgrcreate(). Subtle..
>>
>> Hmm, we have a similar smgrimmedsync() call after index build, because
>> we have written pages directly with smgrextend(skipFsync=true). If no
>> checkpoints have occurred during the index build, we could call
>> register_dirty_segment() instead of smgrimmedsync(). That would avoid
>> the fsync() latency when creating an index on an empty or small index.
>>
>> This is all very subtle to get right though. That's why I'd like to
>> invent a new bulk-creation facility that would handle this stuff, and
>> make the callers less error-prone.
>
> Having a more generic and less error-prone bulk-creation mechanism is
> still on my long TODO list..
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2023-09-01 12:55:35 Re: Commitfest 2023-09 starts soon
Previous Message Jelte Fennema 2023-09-01 12:47:56 Re: Should we use MemSet or {0} for struct initialization?