Re: [HACKERS] WAL logging problem in 9.4.3?

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: noah(at)leadboat(dot)com
Cc: robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org, 9erthalion6(at)gmail(dot)com, andrew(dot)dunstan(at)2ndquadrant(dot)com, hlinnaka(at)iki(dot)fi, michael(at)paquier(dot)xyz
Subject: Re: [HACKERS] WAL logging problem in 9.4.3?
Date: 2020-03-23 08:20:27
Message-ID: 20200323.172027.2270553329883636814.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for the labour on this.

At Sat, 21 Mar 2020 15:49:20 -0700, Noah Misch <noah(at)leadboat(dot)com> wrote in
> On Sat, Mar 21, 2020 at 12:01:27PM -0700, Noah Misch wrote:
> > Pushed, after adding a missing "break" to gist_identify() and tweaking two
..
> The proximate cause is the RelFileNodeSkippingWAL() call that we added to
> MarkBufferDirtyHint(). MarkBufferDirtyHint() runs in parallel workers, but
> parallel workers have zeroes for pendingSyncHash and rd_*Subid. I hacked up
> the attached patch to understand the scope of the problem (not to commit). It
> logs a message whenever a parallel worker uses pendingSyncHash or
> RelationNeedsWAL(). Some of the cases happen often enough to make logs huge,
> so the patch suppresses logging for them. You can see the lower-volume calls
> like this:
>
> printf '%s\n%s\n%s\n%s\n' 'log_statement = all' 'wal_level = minimal' 'max_wal_senders = 0' 'force_parallel_mode = regress' >/tmp/minimal_parallel.conf
> make check-world TEMP_CONFIG=/tmp/minimal_parallel.conf
> find . -name log | xargs grep -rl 'nm0 invalid'
>
> Not all are actual bugs. For example, get_relation_info() behaves fine:
>
> /* Temporary and unlogged relations are inaccessible during recovery. */
> if (!RelationNeedsWAL(relation) && RecoveryInProgress())

But the relcache entry shows wrong information about new-ness of its
storage and it is the root cause of the all other problems.

> Kyotaro, can you look through the affected code and propose a strategy for
> good coexistence of parallel query with the WAL skipping mechanism?

Bi-directional communication between leader and workers is too-much.
It wouldn't be acceptable to inhibit the problematic operations on
workers such like heap-prune or btree pin removal. If we do pending
syncs just before worker start, it won't fix the issue.

The attached patch passes a list of pending-sync relfilenodes at
worker start. Workers create (immature) pending sync hash from the
list and create relcache entries using the hash. Given that parallel
workers don't perform transactional operations and DDL operations,
workers needs only the list of relfilenodes. The list might be long,
but I don't think it realistic that such many tables are truncated or
created then scanned in parallel within a transaction while wal_level
= minimal.

> Since I don't expect one strategy to win clearly and quickly, I plan to revert
> the main patch around 2020-03-22 17:30 UTC. That will give the patch about
> twenty-four hours in the buildfarm, so more animals can report in. I will
> leave the three smaller patches in place.

Thank you for your trouble and the check code. Sorry for not
responding in time.

> > fairywren failed differently on 9.5; I have not yet studied it:
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2020-03-21%2018%3A01%3A10
>
> This did not remain specific to 9.5. On platforms where SIZEOF_SIZE_T==4 or
> SIZEOF_LONG==4, wal_skip_threshold cannot exceed 2GB. A simple s/1TB/1GB/ in
> the test should fix this.

Oops. I felt that the 2TB looks too large but didn't get it
seriously. 1GB is 1048576 is less than the said limit 2097151 so the
attached second patch does that.

The attached is a proposal of fix of the issue on top of the reverted
commit.

- v36-0001-Skip-WAL-for-new-relfilenodes-under-wal_level-mi.patch
The reverted patch.

- v36-0002-Fix-GUC-value-in-TAP-test.patch
Change wal_skip_threashold to 2TB to 2GB in TAP test.

v36-0003-Fix-the-name-of-struct-pendingSyncs.patch
I found that the struct of pending sync hash entry is named
differently way from pending delete hash entry. Change it so that the
two are in similarly naming convention.

v36-0004-Propagage-pending-sync-information-to-parallel-w.patch

The proposed fix for the parallel-worker problem.

The make check-world above didn't fail with this patch.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v36-0001-Skip-WAL-for-new-relfilenodes-under-wal_level-mi.patch text/x-patch 116.4 KB
v36-0002-Fix-GUC-value-in-TAP-test.patch text/x-patch 952 bytes
v36-0003-Fix-the-name-of-struct-pendingSyncs.patch text/x-patch 2.4 KB
v36-0004-Propagage-pending-sync-information-to-parallel-w.patch text/x-patch 9.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dean Rasheed 2020-03-23 08:21:42 Re: Additional improvements to extended statistics
Previous Message Kyotaro Horiguchi 2020-03-23 08:17:13 Re: shared-memory based stats collector