Re: Re: Clarifying "server starting" messaging in pg_ctl start without --wait

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Ryan Murphy <ryanfmurphy(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Clarifying "server starting" messaging in pg_ctl start without --wait
Date: 2017-01-20 01:45:57
Message-ID: 20170120014557.GB18360@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

* Andres Freund (andres(at)anarazel(dot)de) wrote:
> On 2017-01-19 10:06:09 -0500, Stephen Frost wrote:
> > WAL replay does do more work, generally speaking (the WAL has to be
> > read, the checksum validated on it, and then the write has to go out,
> > while the checkpointer just writes the page out from memory), but it's
> > also dealing with less contention on the system (there aren't a bunch of
> > backends hammering the disks to pull data in with reads when you're
> > doing crash recovery...).
>
> There's a huge difference though: WAL replay is single threaded, whereas
> generating WAL is not.

I'm aware- but *checkpointing* is still single-threaded, unless, as I
mentioned, you end up with backends pushing out their own changes to the
heap to make room for new pages to come in. Or is there some other way
the checkpoint ends up being performed with multiple processes?

> Especially if there's synchronous IO required
> (most commonly reading in data, because more data was modified in the
> current checkpointthan fit in shared buffers, so FPIs don't pre-fill
> buffers), you can be significantly slower than generating the WAL.

That is an interesting point, if I'm following what you're saying
correctly- during the replay we can end up having more pages modified
than fit in shared buffers, which means that we have to read back in
pages that we pushed out to implement the non-FPI WAL changes to that
page. I wonder if we should have a way to configure the amount of
memory allowed to be used for WAL replay, independent of shared_buffers?

I mean, really, during crash recovery on a dedicated database box, you'd
probably want to say "ALL the memory can be used if it makes crash
recovery faster!". That said, I wonder if our eviction algorithm could
be improved/changed when performing WAL replay too to reduce the chances
that we'll have to read a page back in.

Very interesting.

Thanks!

Stephen

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2017-01-20 01:54:27 Re: Re: Clarifying "server starting" messaging in pg_ctl start without --wait
Previous Message Peter Geoghegan 2017-01-20 01:45:37 Possible issue with expanded object infrastructure on Postgres 9.6.1