Re: [HACKERS] WAL logging problem in 9.4.3?

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: sfrost(at)snowman(dot)net, michael(dot)paquier(at)gmail(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, alvherre(at)2ndquadrant(dot)com, david(at)pgmasters(dot)net, hlinnaka(at)iki(dot)fi, simon(at)2ndquadrant(dot)com, andres(at)anarazel(dot)de, masao(dot)fujii(at)gmail(dot)com, kleptog(at)svana(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] WAL logging problem in 9.4.3?
Date: 2018-07-04 04:59:12
Message-ID: 20180704045912.GG1672@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 30, 2018 at 10:06:46AM +0900, Kyotaro HORIGUCHI wrote:
> Hello. I found that c203d6cf81 hit this and this is the rebased
> version on the current master.

Okay, as this is visibly the oldest item in this commit fest, Andrew has
asked me to look at a solution which would allow us to definitely close
the loop for all maintained branches. In consequence, I have been
looking at this problem. Here are my thoughts:
- The set of errors reported on this thread are alarming, depending on
the scenarios used, we could have "could not read file" stuff, or even
data loss after WAL replay comes and wipes out everything.
- Disabling completely the TRUNCATE optimization is definitely not cool,
as there could be an impact for users.
- Removing wal_level = minimal is not acceptable as well, as some people
rely on this feature.
- Rewriting the sync handling of heap relation files in an invasive way
may be something to investigate and improve on HEAD (I am not really
convinced about that actually for the optimizations discussed on this
thread as this may result in more bugs than actual fixes), but that
would do nothing for back-branches.

Hence I propose the patch attached which disables the TRUNCATE and COPY
optimizations for two cases, which are the ones actually causing
problems. One solution has been presented by Simon here for COPY, which
is to disable the optimization when there are no blocks on a relation
with wal_level = minimal:
https://www.postgresql.org/message-id/CANP8+jKN4V4MJEzFN_iEtdZ+1oM=YETxvmuu1YK4UMXQY2gaGw@mail.gmail.com
For back-patching, I find that really appealing.

The second thing that the patch attached does is to tweak
ExecuteTruncateGuts so as the TRUNCATE optimization never runs for
wal_level = minimal.

Another thing that this patch adds is a set of regression tests to
stress all the various scenarios presented on this thread with table
creation, INSERT, COPY and TRUNCATE running in the same transactions for
both wal_level = minimal and replica, which make sure that there are no
failures and no actual data loss. The test is useful anyway, as any
patch presented did not present a way to test easily all the scenarios,
except for a bash script present upthread, but this discarded some of
the cases.

I would propose that for a back-patch, except for the test which can go
down easily to 9.6 but I have not tested that yet.

Thoughts?
--
Michael

Attachment Content-Type Size
wal-minimal-copy-truncate.patch text/x-diff 6.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrey Borodin 2018-07-04 05:49:31 Legacy GiST invalid tuples
Previous Message Andres Freund 2018-07-04 04:35:57 Re: Cache invalidation after authentication (on-the-fly role creation)