Re: Re: Clarifying "server starting" messaging in pg_ctl start without --wait

From: Andres Freund <andres(at)anarazel(dot)de>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Ryan Murphy <ryanfmurphy(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Clarifying "server starting" messaging in pg_ctl start without --wait
Date: 2017-01-20 01:54:27
Message-ID: 20170120015427.z7kv5avhju56hr7a@alap3.anarazel.de
Lists: pgsql-hackers

On 2017-01-19 20:45:57 -0500, Stephen Frost wrote:
> * Andres Freund (andres(at)anarazel(dot)de) wrote:
> > On 2017-01-19 10:06:09 -0500, Stephen Frost wrote:
> > > WAL replay does do more work, generally speaking (the WAL has to be
> > > read, the checksum validated on it, and then the write has to go out,
> > > while the checkpointer just writes the page out from memory), but it's
> > > also dealing with less contention on the system (there aren't a bunch of
> > > backends hammering the disks to pull data in with reads when you're
> > > doing crash recovery...).
> >
> > There's a huge difference though: WAL replay is single threaded, whereas
> > generating WAL is not.
>
> I'm aware- but *checkpointing* is still single-threaded, unless, as I
> mentioned, you end up with backends pushing out their own changes to the
> heap to make room for new pages to come in.

Sure, but buffer checkpointing isn't necessarily that large a portion of
the work done in one checkpoint cycle, in comparison to all the WAL
being generated. Quite commonly a lot of the buffers will already have
been flushed to disk by backend and/or bgwriter, and are clean by the
time checkpointer gets to them. So I don't think checkpointer being
single threaded necessarily means much WRT replay performance.

> > Especially if there's synchronous IO required
> > (most commonly reading in data, because more data was modified in the
> > current checkpoint than fit in shared buffers, so FPIs don't pre-fill
> > buffers), you can be significantly slower than generating the WAL.
>
> That is an interesting point, if I'm following what you're saying
> correctly- during the replay we can end up having more pages modified
> than fit in shared buffers, which means that we have to read back in
> pages that we pushed out to implement the non-FPI WAL changes to that
> page.

Right. (And obviously not just during replay, but also during the initial
WAL generation.)
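
As a rough illustration of the effect described above, here is a toy model
(not PostgreSQL code). It assumes a plain LRU buffer pool, which is a
simplification of the real clock-sweep eviction, and it assumes that the
first record touching a page after a checkpoint carries a full-page image
(FPI) while later records do not. The function name and the workload are
hypothetical.

```python
from collections import OrderedDict

def count_replay_reads(page_refs, shared_buffers):
    """Count synchronous reads needed to replay page_refs with an LRU pool."""
    pool = OrderedDict()  # page id -> None, ordered from oldest to newest use
    seen = set()          # pages already touched since the checkpoint
    reads = 0
    for page in page_refs:
        first_touch = page not in seen
        seen.add(page)
        if page in pool:
            pool.move_to_end(page)  # buffer hit, just refresh recency
            continue
        # Buffer miss. An FPI (first touch) restores the page without a read;
        # any later record has to read the evicted page back in first.
        if not first_touch:
            reads += 1
        pool[page] = None
        if len(pool) > shared_buffers:
            pool.popitem(last=False)  # evict the least recently used page
    return reads

# Eight pages, each modified three times within one checkpoint cycle.
workload = list(range(8)) * 3
fits = count_replay_reads(workload, shared_buffers=8)
overflows = count_replay_reads(workload, shared_buffers=4)
print(fits, overflows)
```

With the pool as large as the modified set, the FPIs pre-fill every buffer
and no reads are needed. With half the pool, every non-FPI record in this
(deliberately worst-case) cyclic workload forces a read.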

> I wonder if we should have a way to configure the amount of memory
> allowed to be used for WAL replay, independent of shared_buffers?

I don't quite see how that'd work, especially with HS. We just use the
normal shared buffers code etc, and there we can't just resize the
amount of shared_buffers allocated after doing crash recovery.

> That said, I wonder if our eviction algorithm could be
> improved/changed when performing WAL replay too to reduce the chances
> that we'll have to read a page back in.

I don't think that's a very promising angle of attack. Having a separate
pre-fetching backend that parses the WAL and pre-reads everything
necessary seems more promising.
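
A minimal sketch of what such a pre-fetcher would have to decide, under
stated assumptions: WAL records are reduced to hypothetical
(block, has_fpi) pairs, the look-ahead window is a made-up parameter, and
the actual pre-read (for example some asynchronous read hint to the OS) is
left out entirely. This is not an existing PostgreSQL facility, only an
illustration of the idea.

```python
def blocks_to_prefetch(records, replay_pos, lookahead):
    """Return blocks worth pre-reading for the next `lookahead` records."""
    wanted = []
    covered = set()  # blocks already queued, or restored by an FPI in the window
    for block, has_fpi in records[replay_pos:replay_pos + lookahead]:
        if block in covered:
            continue
        covered.add(block)
        # A record with an FPI overwrites the whole page, so reading the
        # old contents from disk first would be wasted I/O.
        if not has_fpi:
            wanted.append(block)
    return wanted

# Hypothetical WAL stream: (block number, record carries a full-page image)
wal = [(10, True), (11, False), (10, False), (12, False), (11, False), (13, True)]
prefetch = blocks_to_prefetch(wal, replay_pos=0, lookahead=5)
print(prefetch)
```

The sketch ignores eviction: a block restored by an FPI early in the window
could still be pushed out before a later record needs it, which a real
design would have to account for.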

Greetings,

Andres Freund
