Re: [Patch] Optimize dropping of relation buffers using dlist

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: "k(dot)jamison(at)fujitsu(dot)com" <k(dot)jamison(at)fujitsu(dot)com>
Cc: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "amit(dot)kapila16(at)gmail(dot)com" <amit(dot)kapila16(at)gmail(dot)com>, "tgl(at)sss(dot)pgh(dot)pa(dot)us" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>, "robertmhaas(at)gmail(dot)com" <robertmhaas(at)gmail(dot)com>, "tomas(dot)vondra(at)2ndquadrant(dot)com" <tomas(dot)vondra(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Patch] Optimize dropping of relation buffers using dlist
Date: 2020-10-22 03:35:27
Message-ID: CA+hUKGKNDZy0Ni9+WC5DhwHM8fa54ExXmfYLq0NX9t4g1UDxmw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Oct 22, 2020 at 3:07 PM k(dot)jamison(at)fujitsu(dot)com
<k(dot)jamison(at)fujitsu(dot)com> wrote:
+    /*
+     * Get the total number of to-be-invalidated blocks of a relation as well
+     * as the total blocks for a given fork. The cached value returned by
+     * smgrnblocks could be smaller than the actual number of existing buffers
+     * of the file. This is caused by buggy Linux kernels that might not have
+     * accounted for the recent write. Give up the optimization if the block
+     * count of any fork cannot be trusted.
+     */
+    for (i = 0; i < nforks; i++)
+    {
+        /* Get the number of blocks for a relation's fork */
+        nForkBlocks[i] = smgrnblocks(smgr_reln, forkNum[i], &accurate);
+
+        if (!accurate)
+            break;

Hmmm. The Linux comment led me to commit ffae5cc and a 2006 thread[1]
showing a buggy sequence of system calls. AFAICS it was not even an
SMP/race problem of the type you might half expect; it was a single
process not seeing its own write. I didn't find details on the
version, filesystem etc.
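
For reference, the expectation that was violated can be written down as a small check. This is only a minimal sketch, not the sequence of system calls from the 2006 thread: one process appends blocks with write() and then asks lseek(SEEK_END) for the file size on the same descriptor. The helper name size_after_append, the temporary file path and the 8192-byte block size are assumptions made for illustration; none of this is PostgreSQL code.

```c
#define _POSIX_C_SOURCE 200809L	/* for mkstemp() under strict -std=c99/c11 */

#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SKETCH_BLCKSZ 8192		/* assumed block size, illustration only */

/*
 * Append nblocks zero-filled blocks to a fresh temporary file, then return
 * the size reported by lseek(SEEK_END), or -1 on any error.
 * Hypothetical helper, not part of PostgreSQL.
 */
static off_t
size_after_append(int nblocks)
{
	char		path[] = "/tmp/lseek_sketch_XXXXXX";
	char		block[SKETCH_BLCKSZ];
	off_t		size;
	int			fd;
	int			i;

	assert(nblocks >= 0);

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);				/* the file disappears once fd is closed */

	memset(block, 0, sizeof(block));
	for (i = 0; i < nblocks; i++)
	{
		if (write(fd, block, sizeof(block)) != (ssize_t) sizeof(block))
		{
			close(fd);
			return -1;
		}
	}

	/* Same process, same descriptor: this should reflect every write above. */
	size = lseek(fd, 0, SEEK_END);
	close(fd);
	return size;
}
```

On a file system that behaves as POSIX requires, size_after_append(n) should come back as n * SKETCH_BLCKSZ; a smaller result would be the "process not seeing its own write" symptom.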

Searching for our message "This has been seen to occur with buggy
kernels; consider updating your system" turns up recent-ish results
too. The reports I read involved GlusterFS, which I don't personally
know anything about, but it claims full POSIX compliance, and POSIX is
strict about that sort of thing, so I'd guess that is/was a fairly
serious bug or misconfiguration. Surely there must be other symptoms
for PostgreSQL on such systems too, like sequential scans that don't
see recently added pages.

But... does the proposed caching behaviour and "accurate" flag really
help with any of that? Cached values come from lseek() anyway. If we
just trusted unmodified smgrnblocks(), someone running on such a
forgetful file system might eventually see nasty errors because we
left buffers in the buffer pool that prevent a checkpoint from
completing (and panic?), but they might also see other really strange
errors, and that applies with or without that "accurate" flag, no?
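
To make that concern concrete, here is a toy model, loudly not the patch's implementation. It assumes that the flag only reports whether the value was served from the cache, and that the cache is filled from a size probe standing in for lseek(SEEK_END). ToyFork, toy_nblocks and forgetful_probe are hypothetical names invented for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for "lseek(SEEK_END) / BLCKSZ"; returns a block count. */
typedef long (*SizeProbe) (void *arg);

typedef struct ToyFork
{
	long		cached_nblocks; /* -1 means nothing cached yet */
	SizeProbe	probe;			/* where an uncached size comes from */
	void	   *probe_arg;
} ToyFork;

/*
 * Return the block count, preferring the cached value.  *accurate is set
 * to true only when the cache was used (an assumption of this toy model).
 */
static long
toy_nblocks(ToyFork *fork, bool *accurate)
{
	assert(fork != NULL && accurate != NULL);

	if (fork->cached_nblocks >= 0)
	{
		*accurate = true;
		return fork->cached_nblocks;
	}

	/* Cache miss: the cached value is whatever the probe reported. */
	fork->cached_nblocks = fork->probe(fork->probe_arg);
	*accurate = false;
	return fork->cached_nblocks;
}

/* A "forgetful" file system: reports one block fewer than really exists. */
static long
forgetful_probe(void *arg)
{
	long		real = *(long *) arg;

	return real > 0 ? real - 1 : 0;
}
```

In this model the first call caches the probe's short answer, and the second call returns the same short answer with the flag set to true, so under these assumptions the flag says nothing about whether the underlying lseek() result could be trusted.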

[1] https://www.postgresql.org/message-id/flat/26202.1159032931%40sss.pgh.pa.us
