RE: Speed up the removal of WAL files

From: "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>
To: 'Fujii Masao' <masao(dot)fujii(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: Speed up the removal of WAL files
Date: 2018-02-21 07:20:00
Message-ID: 0A3221C70F24FB45833433255569204D1F8D0CB7@G01JPEXMBYT05
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

From: Fujii Masao [mailto:masao(dot)fujii(at)gmail(dot)com]
> On Fri, Nov 17, 2017 at 5:20 PM, Tsunakawa, Takayuki
> <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com> wrote:
> > Yes, I noticed it after submitting the patch and was wondering what to
> do. Thinking simply, I think it would be just enough to replace
> durable_unlink/durable_rename in RemoveXLogFile() with unlink/rename, and
> sync the pg_wal directory once in RemoveNonParentXlogFiles() and
> RemoveOldXlogFiles(). This will benefit the failover time when fast
> promotion is not performed. What do you think?
>
> It seems not good idea to just replace durable_rename() with rename() in
> RemoveOldXlogFiles()->RemoveXlogFiles()->InstallXLogFileSegment().
> Because that change seems to be able to cause the following problem.
>
> 1. Checkpoint calls RemoveOldXlogFiles().
> 2. It recycles the WAL file AAA to BBB. pg_wal directory has not fsync'd
> yet.
> 3. Another transaction (TX1) writes its WAL data into the (recycled) file
> BBB.
> 4. CRASH and RESTART
> 5. The WAL file BBB disappears and you can see AAA,
> but AAA is not used in recovery. This causes data loss of transaction
> by Tx1.

Right. Then I thought of doing the following to avoid making a new function only for RemoveNonParentXlogFiles() which is similar to RemoveXlogFile().

* Add an argument "bool durable" to RemoveXlogFile(). Based on its value, RemoveXlogFile() calls either durable_xx() or xx().

* Likewise, add an argument "bool durable" to InstallXLogFileSegment() and do the same.

* RemoveNonParentXlogFiles() calls RemoveXlogFile() with durable=false. At the end of the function, sync the pg_wal directory with fsync_fname().

* RemoveOldXlogFiles() does the same thing. One difference is that it passes false to RemoveXlogFile() during recovery (InRecovery == true) and true otherwise.

But this requires InstallXLogFileSegment() to also have an argument "bool durable." That means InstallXLogFileSegment() and RemoveXlogFile() have to include something like below in several places, which is dirty:

if (durable)
rc = durable_xx(path, elevel);
else
rc = xx(path);
if (rc != 0)
{
/* durable_xx() output the message */
if (!durable)
ereport(elevel, ...);
}

Do you think of a cleaner way?

From: Kyotaro HORIGUCHI [mailto:horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp]
> The orinal code recycles some of the to-be-removed files, but the
> patch removes all the victims. This may impact on performance.

At most only 10 WAL files are recycled. If there's no good solution to the above matter, can we compromise with the current patch?

if (PriorRedoPtr == InvalidXLogRecPtr)
recycleSegNo = endlogSegNo + 10;

> Likewise the original code is using durable_unlink to actually
> remove a file so separating unlink and fsync might resurrect the
> problem that should have been fixed by
> 1b02be21f271db6bd3cd43abb23fa596fcb6bac3 (I'm not sure what it
> was but you are one of the reviwers of it). I suppose that you
> need to explain the reason why this change doesn't risk anything.

If the host crashes while running RemoveNonParentXlogFiles() and some effects of the function are lost, the next recovery will do them again.

Regards
Takayuki Tsunakawa

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2018-02-21 07:35:50 Re: pg_get_functiondef forgets about most GUC_LIST_INPUT GUCs
Previous Message Thomas Munro 2018-02-21 07:17:54 Re: Hash Joins vs. Bloom Filters / take 2