Re: Re: Clarifying "server starting" messaging in pg_ctl start without --wait

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Ryan Murphy <ryanfmurphy(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Clarifying "server starting" messaging in pg_ctl start without --wait
Date: 2017-01-20 01:59:10
Lists: pgsql-hackers

* Andres Freund (andres(at)anarazel(dot)de) wrote:
> On 2017-01-19 20:45:57 -0500, Stephen Frost wrote:
> > * Andres Freund (andres(at)anarazel(dot)de) wrote:
> > > On 2017-01-19 10:06:09 -0500, Stephen Frost wrote:
> > > > WAL replay does do more work, generally speaking (the WAL has to be
> > > > read, the checksum validated on it, and then the write has to go out,
> > > > while the checkpointer just writes the page out from memory), but it's
> > > > also dealing with less contention on the system (there aren't a bunch of
> > > > backends hammering the disks to pull data in with reads when you're
> > > > doing crash recovery...).
> > >
> > > There's a huge difference though: WAL replay is single threaded, whereas
> > > generating WAL is not.
> >
> > I'm aware- but *checkpointing* is still single-threaded, unless, as I
> > mentioned, you end up with backends pushing out their own changes to the
> > heap to make room for new pages to come in.
> Sure, but buffer checkpointing isn't necessarily that large a portion of
> the work done in one checkpoint cycle, in comparison to all the WAL
> being generated. Quite commonly a lot of the buffers will already have
> been flushed to disk by backend and/or bgwriter, and are clean by the
> time checkpointer gets to them. So I don't think checkpointer being
> single threaded necessarily means much WRT replay performance.

Yes, good point, we also have the bgwriter going through and helping.

> > > Especially if there's synchronous IO required
> > > (most commonly reading in data, because more data was modified in the
> > > current checkpoint than fit in shared buffers, so FPIs don't pre-fill
> > > buffers), you can be significantly slower than generating the WAL.
> >
> > That is an interesting point, if I'm following what you're saying
> > correctly- during the replay we can end up having more pages modified
> > than fit in shared buffers, which means that we have to read back in
> > pages that we pushed out to implement the non-FPI WAL changes to that
> > page.
> Right. (And not just during replay obviously, also during the initial WAL
> generation).
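
To make that effect concrete, here is a toy model (a sketch only, not
PostgreSQL code). It assumes a fixed-size LRU cache standing in for
shared buffers and reduces each WAL record to the page number it
touches. The first record for a page since the checkpoint is treated as
an FPI, so no read is needed; a later record for a page that has since
been evicted forces a synchronous read. The function name and the LRU
policy are assumptions made for illustration.

```python
from collections import OrderedDict


def count_read_backs(page_refs, n_buffers):
    """Replay page references through an LRU cache of n_buffers pages.

    Returns how many non-FPI records hit a page that had already been
    evicted, i.e. how many synchronous reads replay would need.
    """
    cache = OrderedDict()  # page -> None, ordered from oldest to newest use
    seen = set()           # pages that already received their FPI
    reads = 0
    for page in page_refs:
        if page in cache:
            cache.move_to_end(page)  # cache hit, just refresh recency
            continue
        if page in seen:
            reads += 1               # non-FPI record, page must be read back
        seen.add(page)
        cache[page] = None
        if len(cache) > n_buffers:
            cache.popitem(last=False)  # evict the least recently used page
    return reads


if __name__ == "__main__":
    refs = [0, 1, 2, 3] * 2
    print(count_read_backs(refs, n_buffers=4))  # working set fits
    print(count_read_backs(refs, n_buffers=2))  # working set does not fit
```

With four pages modified twice each, a four-page cache needs no reads,
while a two-page cache has to read every page back on the second pass.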


> > I wonder if we should have a way to configure the amount of memory
> > allowed to be used for WAL replay, independent of shared_buffers?
> I don't quite see how that'd work, especially with HS. We just use the
> normal shared buffers code etc, and there we can't just resize the
> amount of shared_buffers allocated after doing crash recovery.

It wouldn't work with HS (or, at least, I have no idea how it would). I
was specifically thinking about *just* during crash recovery there
(sorry that I didn't make that clear), and my thought was that we'd just
allocate the memory locally, not as shared memory, and then drop the
whole thing and allocate shared_buffers after crash recovery was done.

Obviously, this is a lot of hand-waving, but that's what I was

> > That said, I wonder if our eviction algorithm could be
> > improved/changed when performing WAL replay too to reduce the chances
> > that we'll have to read a page back in.
> I don't think that's that promising an angle of attack. Having a separate
> pre-fetching backend that parses the WAL and pre-reads everything
> necessary seems more promising.

I agree, that would be helpful and could help with HS too, which I agree
is an important piece.
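
As a rough sketch of what such a pre-fetching pass might decide (again
a toy model, not PostgreSQL code): given a window of upcoming WAL
records, it would only need to request pages that are not already
cached and whose first upcoming record is not an FPI. The record layout
`(page, is_fpi)`, the function name, and the `lookahead` parameter are
all hypothetical, and evictions inside the window are ignored.

```python
def pages_to_prefetch(records, start, lookahead, cached):
    """Return pages worth pre-reading for records[start:start + lookahead].

    records: list of (page, is_fpi) tuples in WAL order (hypothetical layout).
    cached:  iterable of pages currently held in the buffer cache.
    """
    covered = set(cached)  # pages that will be in memory without a read
    wanted = []
    for page, is_fpi in records[start:start + lookahead]:
        if page in covered:
            continue
        if not is_fpi:
            wanted.append(page)  # replay would block on a read for this page
        # After an FPI or a prefetch the page is available to later records.
        covered.add(page)
    return wanted


if __name__ == "__main__":
    wal = [(1, True), (2, False), (1, False), (3, False), (2, False)]
    print(pages_to_prefetch(wal, start=0, lookahead=5, cached={3}))
```

In this example only page 2 needs a pre-read: page 1 is restored from
its FPI and page 3 is already cached.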


