Re: [HACKERS] WAL logging problem in 9.4.3?

From: Noah Misch <noah(at)leadboat(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, 9erthalion6(at)gmail(dot)com, andrew(dot)dunstan(at)2ndquadrant(dot)com, hlinnaka(at)iki(dot)fi, robertmhaas(at)gmail(dot)com, michael(at)paquier(dot)xyz
Subject: Re: [HACKERS] WAL logging problem in 9.4.3?
Date: 2019-08-26 05:08:43
Message-ID: 20190826050843.GB3153606@rfd.leadboat.com
Lists: pgsql-hackers

On Thu, Aug 22, 2019 at 09:06:06PM +0900, Kyotaro Horiguchi wrote:
> At Mon, 19 Aug 2019 23:03:14 -0700, Noah Misch <noah(at)leadboat(dot)com> wrote in <20190820060314(dot)GA3086296(at)rfd(dot)leadboat(dot)com>
> > On Mon, Aug 19, 2019 at 06:59:59PM +0900, Kyotaro Horiguchi wrote:
> > > At Sat, 17 Aug 2019 20:52:30 -0700, Noah Misch <noah(at)leadboat(dot)com> wrote in <20190818035230(dot)GB3021338(at)rfd(dot)leadboat(dot)com>
> > > > The https://postgr.es/m/559FA0BA.3080808@iki.fi design had another component
> > > > not appearing here. It said, "Instead, at COMMIT, we'd fsync() the relation,
> > > > or if it's smaller than some threshold, WAL-log the contents of the whole file
> > > > at that point." Please write the part to WAL-log the contents of small files
> > > > instead of syncing them.
> > >
> > > I'm not sure the point of the behavior. I suppose that the "log"
> > > is a sequence of new_page records. It also needs to be synced and
> > > it is always larger than the file to be synced. I can't think of
> > > an appropriate threshold without the point.
> >
> > Yes, it would be a sequence of new-page records. FlushRelationBuffers() locks
> > every buffer header containing a buffer of the current database. The belief
> > has been that writing one page to xlog is cheaper than FlushRelationBuffers()
> > in a busy system with large shared_buffers.
>
> I'm at a loss.. The decision between WAL and sync is made at
> commit time, when we no longer have a pin on a buffer. When
> emitting WAL, opposite to the assumption, lock needs to be
> re-acquired for every page to emit log_new_page. What is worse,
> we may need to reload evicted buffers. If the file has been
> CopyFrom'ed, ring buffer strategy makes the situation farther
> worse. That doesn't seem cheap at all..

Consider a one-page relfilenode. Doing all the things you list for a single
page may be cheaper than locking millions of buffer headers.
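
To put rough numbers on that comparison, here is a minimal sketch of how many
buffer headers exist for a given shared_buffers size. It assumes the default
8kB block size; the names SKETCH_BLCKSZ and sketch_buffer_header_count are
hypothetical and not taken from the PostgreSQL source.

```c
#include <stdint.h>

/* Assumed default PostgreSQL block size in bytes; it is a build-time option. */
#define SKETCH_BLCKSZ 8192

/*
 * Hypothetical helper: number of shared buffers, and therefore buffer
 * headers, for a shared_buffers setting given in bytes.
 */
static uint64_t
sketch_buffer_header_count(uint64_t shared_buffers_bytes)
{
    return shared_buffers_bytes / SKETCH_BLCKSZ;
}
```

With shared_buffers at 8GB this gives about one million headers to visit,
against a single page to lock, possibly reload, and WAL-log.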

> If there were any chance on WAL for smaller files here, it would
> be on the files smaller than the ring size of bulk-write
> strategy(16MB).

Like you, I expect the optimal threshold is less than 16MB, though you should
benchmark to see. With the ideal threshold in effect, a transaction that
creates a new relfilenode just smaller than the threshold will be somewhat
slower than it would be if the threshold were zero. Locking every buffer
header causes a distributed slow-down for other queries, and protecting the
latency of non-DDL queries is typically more useful than accelerating
TRUNCATE, CREATE TABLE, etc. Writing more WAL also slows down other queries;
beyond a certain relfilenode size, the extra WAL harms non-DDL queries more
than the buffer scan harms them. That's about where the threshold should be.
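
The commit-time choice described above could be sketched as follows. This
illustrates the decision only, not the patch under discussion; the enum and
function names are hypothetical, and the strict "smaller than" comparison
follows the wording of the quoted design.

```c
#include <stdint.h>

/* Hypothetical labels for the two ways to make a new relfilenode durable. */
typedef enum
{
    SKETCH_WAL_LOG_PAGES,   /* emit new-page WAL records for every page */
    SKETCH_SYNC_FILE        /* flush the relation's buffers and fsync() it */
} SketchCommitAction;

/*
 * Hypothetical decision function: WAL-log files smaller than the threshold,
 * sync everything else. A threshold of zero means "always sync".
 */
static SketchCommitAction
sketch_choose_commit_action(uint64_t relfilenode_bytes,
                            uint64_t threshold_bytes)
{
    if (relfilenode_bytes < threshold_bytes)
        return SKETCH_WAL_LOG_PAGES;
    return SKETCH_SYNC_FILE;
}
```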

This should be GUC-controlled, especially since this is back-patch material.
We won't necessarily pick the best value on the first attempt, and the best
value could depend on factors like the filesystem, the storage hardware, and
the database's latency goals. One could define the GUC as an absolute size
(e.g. 1MB) or as a ratio of shared_buffers (e.g. GUC value of 0.001 means the
threshold is 1MB when shared_buffers is 1GB). I'm not sure which is better.
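
As a rough sketch of the two GUC definitions, the threshold in bytes could be
derived as below. Both function names are hypothetical, and no such GUC exists
at this point in the thread.

```c
#include <stdint.h>

/* Option 1 (hypothetical): the GUC is an absolute size in bytes. */
static uint64_t
sketch_threshold_absolute(uint64_t guc_bytes)
{
    return guc_bytes;
}

/*
 * Option 2 (hypothetical): the GUC is a fraction of shared_buffers.
 * The result is truncated to whole bytes.
 */
static uint64_t
sketch_threshold_ratio(double guc_ratio, uint64_t shared_buffers_bytes)
{
    return (uint64_t) (guc_ratio * (double) shared_buffers_bytes);
}
```

With a ratio of 0.001 and shared_buffers of 1GB (1073741824 bytes), the second
form yields 1073741 bytes, which is roughly the 1MB mentioned above. The ratio
form scales with the cost of the buffer scan, while the absolute form is
easier to reason about when sizing WAL volume.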
