Re: [HACKERS] WAL logging problem in 9.4.3?

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: noah(at)leadboat(dot)com
Cc: robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org, 9erthalion6(at)gmail(dot)com, andrew(dot)dunstan(at)2ndquadrant(dot)com, hlinnaka(at)iki(dot)fi, michael(at)paquier(dot)xyz
Subject: Re: [HACKERS] WAL logging problem in 9.4.3?
Date: 2019-11-28 11:56:20
Message-ID: 20191128.205620.2015649987051831334.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Tue, 26 Nov 2019 21:37:52 +0900 (JST), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail> wrote:
> It is not fully checked. I haven't merged it or measured performance yet,
> but I am posting the patch in its current state for now.

It was actually an inconsistency caused by swap_relation_files.

1. rd_createSubid of the relcache entry for r2 is not turned off. This
prevents the relcache entry from being flushed. Commit processes
pendingSyncs and leaves the relcache entry with rd_createSubid !=
Invalid. That is an inconsistency.

2. relation_open(r1) returns a relcache entry whose relfilenode still
has the old value (relfilenode1), since the command counter has not
been incremented. On the other hand, if it is incremented just before,
AssertPendingSyncConsistency() aborts because of the inconsistency
between relfilenode and rd_firstRel*.

As a result, I have come back to thinking that we need to update both
relcache entries with the right relfilenode.
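
The stale-entry problem in point 2 and the conclusion above can be pictured with a small toy model. This is only a sketch in Python, not PostgreSQL code: the dictionaries standing in for the catalog and the relcache, the RelcacheEntry class, and all function names are hypothetical, and only the relfilenode field is modeled.

```python
from dataclasses import dataclass


@dataclass
class RelcacheEntry:
    """Hypothetical stand-in for a relcache entry; only relfilenode is modeled."""
    relfilenode: int


def swap_catalog_only(catalog, r1, r2):
    """Swap the relfilenodes in the toy catalog, leaving cached entries alone."""
    catalog[r1], catalog[r2] = catalog[r2], catalog[r1]


def swap_catalog_and_relcache(catalog, relcache, r1, r2):
    """Swap in the catalog, then bring both cached entries up to date."""
    swap_catalog_only(catalog, r1, r2)
    for rel in (r1, r2):
        relcache[rel].relfilenode = catalog[rel]


def stale_entries(catalog, relcache):
    """Return the relations whose cached relfilenode disagrees with the catalog."""
    return sorted(rel for rel, entry in relcache.items()
                  if entry.relfilenode != catalog[rel])


def fresh_state():
    """Build a catalog and a relcache that agree with each other."""
    catalog = {"r1": 1001, "r2": 1002}
    relcache = {rel: RelcacheEntry(node) for rel, node in catalog.items()}
    return catalog, relcache


# Catalog-only swap: both cached entries still carry the old relfilenode.
catalog, relcache = fresh_state()
swap_catalog_only(catalog, "r1", "r2")
stale_after_catalog_swap = stale_entries(catalog, relcache)

# Swap that also updates both cached entries: nothing is stale.
catalog, relcache = fresh_state()
swap_catalog_and_relcache(catalog, relcache, "r1", "r2")
stale_after_full_swap = stale_entries(catalog, relcache)
```

In this model, anything that opens r1 between the catalog-only swap and the next cache refresh sees the old relfilenode, which is the situation described for relation_open(r1).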

I once thought that taking AEL in the function would have no side
effects, but that code path is also executed when wal_level = replica
or higher. And as I mentioned upthread, we can even get there holding
no lock on r1, or sometimes only ShareLock. So upgrading to AEL would
emit Standby/LOCK WAL, which propagates to the standby. In the end, I'd
like to take the weakest lock (AccessShareLock) there.
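
The reasoning about lock strength can be summarized in a toy rule. The sketch below is a simplified Python model, not the server's locking code: the enum values mirror the relative order of PostgreSQL's table-level lock modes, and logs_standby_lock is a hypothetical helper that assumes only AccessExclusiveLock on a relation is WAL-logged, and only when wal_level is replica or higher.

```python
from enum import IntEnum


class LockMode(IntEnum):
    # Same relative order as PostgreSQL's table-level lock modes.
    AccessShareLock = 1
    RowShareLock = 2
    RowExclusiveLock = 3
    ShareUpdateExclusiveLock = 4
    ShareLock = 5
    ShareRowExclusiveLock = 6
    ExclusiveLock = 7
    AccessExclusiveLock = 8


class WalLevel(IntEnum):
    minimal = 0
    replica = 1
    logical = 2


def logs_standby_lock(lockmode, wal_level):
    """Toy rule: taking the lock produces a Standby/LOCK record only for
    AccessExclusiveLock, and only when wal_level >= replica."""
    return (lockmode >= LockMode.AccessExclusiveLock
            and wal_level >= WalLevel.replica)


# Upgrading to AEL on a replica-level server would be visible to standbys...
ael_on_replica = logs_standby_lock(LockMode.AccessExclusiveLock, WalLevel.replica)
# ...while the weakest lock never is, whatever the wal_level.
asl_on_replica = logs_standby_lock(LockMode.AccessShareLock, WalLevel.replica)
ael_on_minimal = logs_standby_lock(LockMode.AccessExclusiveLock, WalLevel.minimal)
```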

Attached is the new version of the patch.

- v26-0001-version-nm24.patch

Same as v24.

- v26-0002-change-swap_relation_files.patch

Changes to swap_relation_files as mentioned above.

- v26-0003-Improve-the-performance-of-relation-syncs.patch

Do multiple pending syncs in a single scan of shared_buffers.

- v26-0004-Revert-FlushRelationBuffersWithoutRelcache.patch

v26-0003 makes the function useless. Remove it.

- v26-0005-Fix-gistGetFakeLSN.patch

gistGetFakeLSN fix.

- v26-0006-Sync-files-shrinked-by-truncation.patch

Fix the problem of commit-time FPIs after a truncation that follows a
checkpoint. I'm not sure this is the right direction, but
pendingSyncHash is removed from the pendingDeletes list again.
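
The idea described for v26-0003 above, doing several pending syncs in one pass over shared_buffers, can be pictured with a toy model. This is a Python sketch under stated assumptions, not the patch itself: the buffer pool is a plain list of dictionaries, and make_buffers, flush_one_relation_per_scan and flush_all_in_one_scan are hypothetical names.

```python
def make_buffers():
    """Build a tiny stand-in for shared_buffers: (relation, block, dirty)."""
    rows = [("r1", 0, True), ("r2", 0, True), ("r3", 0, True),
            ("r1", 1, False), ("r2", 1, True), ("r4", 0, True)]
    return [{"rel": rel, "block": blk, "dirty": dirty} for rel, blk, dirty in rows]


def flush_one_relation_per_scan(buffers, rels):
    """Scan the whole pool once for every relation to be synced."""
    flushed, visited = [], 0
    for rel in rels:
        for buf in buffers:
            visited += 1
            if buf["rel"] == rel and buf["dirty"]:
                buf["dirty"] = False
                flushed.append((buf["rel"], buf["block"]))
    return flushed, visited


def flush_all_in_one_scan(buffers, rels):
    """Scan the pool once, flushing dirty buffers of any wanted relation."""
    wanted = set(rels)
    flushed, visited = [], 0
    for buf in buffers:
        visited += 1
        if buf["rel"] in wanted and buf["dirty"]:
            buf["dirty"] = False
            flushed.append((buf["rel"], buf["block"]))
    return flushed, visited


pending = ["r1", "r2"]
per_rel_flushed, per_rel_visited = flush_one_relation_per_scan(make_buffers(), pending)
one_scan_flushed, one_scan_visited = flush_all_in_one_scan(make_buffers(), pending)
```

Both functions flush the same buffers; the difference is only in how many buffer headers are visited, which grows with the number of pending relations in the first variant and stays at one pool scan in the second.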

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment                                                 Content-Type  Size
v26-0001-version-nm24.patch                                text/x-patch  70.5 KB
v26-0002-change-swap_relation_files.patch                  text/x-patch  2.4 KB
v26-0003-Improve-the-performance-of-relation-syncs.patch   text/x-patch  8.5 KB
v26-0004-Revert-FlushRelationBuffersWithoutRelcache.patch  text/x-patch  3.3 KB
v26-0005-Fix-gistGetFakeLSN.patch                          text/x-patch  5.7 KB
v26-0006-Sync-files-shrinked-by-truncation.patch           text/x-patch  9.8 KB
