Re: WAL prefetch

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Sean Chittenden <seanc(at)joyent(dot)com>
Subject: Re: WAL prefetch
Date: 2018-06-19 09:08:27
Message-ID: 27163fe9-fc41-b3de-76b3-a850f1b3c9e7@postgrespro.ru
Lists: pgsql-hackers

On 18.06.2018 23:47, Andres Freund wrote:
> On 2018-06-18 16:44:09 -0400, Robert Haas wrote:
>> On Sat, Jun 16, 2018 at 3:41 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>>>> The posix_fadvise approach is not perfect, no doubt about that. But it
>>>> works pretty well for bitmap heap scans, and it's about 13249x better
>>>> (rough estimate) than the current solution (no prefetching).
>>> Sure, but investing in an architecture we know might not live long also
>>> has it's cost. Especially if it's not that complicated to do better.
>> My guesses are:
>>
>> - Using OS prefetching is a very small patch.
>> - Prefetching into shared buffers is a much bigger patch.
> Why? The majority of the work is standing up a bgworker that does
> prefetching (i.e. reads WAL, figures out reads not in s_b, does
> prefetch). Allowing a configurable number + some synchronization between
> them isn't that much more work.

I do not think that prefetching into shared buffers requires much more
effort or makes the patch more invasive...
It even simplifies it somewhat, because there is no need to maintain a
separate cache of prefetched pages...
But it will definitely have much more impact on Postgres performance:
contention for buffer locks, evicting pages accessed by read-only
queries,...

Also there are two points which make prefetching into shared buffers
more complex:
1. The need to spawn multiple workers to prefetch in parallel and
somehow distribute the work between them.
2. The need to synchronize the recovery process with the prefetcher, so
that prefetch does not run too far ahead and do useless work.
The same problem exists for prefetching into the OS cache, but there the
risk of a false prefetch is less critical.
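
To make point 2 concrete, here is a minimal sketch of limiting how far
the prefetcher may run ahead of redo. Everything in it is hypothetical
and only illustrates the idea: records are modelled as (lsn, block)
pairs sorted by LSN, and plan_prefetch, is_cached and issue_prefetch are
made-up names, not PostgreSQL functions.

```python
from typing import Callable, Iterable, List, Tuple


def plan_prefetch(
    records: Iterable[Tuple[int, int]],
    replay_lsn: int,
    max_distance: int,
    is_cached: Callable[[int], bool],
    issue_prefetch: Callable[[int], None],
) -> List[int]:
    """Prefetch blocks referenced by WAL records ahead of replay_lsn,
    but never further ahead than max_distance bytes of WAL.

    Assumption: records are (lsn, block) pairs in increasing LSN order.
    """
    issued: List[int] = []
    seen = set()
    for lsn, block in records:
        if lsn <= replay_lsn:
            continue  # already replayed, nothing to prefetch
        if lsn - replay_lsn > max_distance:
            break  # too far ahead: the page could be evicted before redo needs it
        if block in seen or is_cached(block):
            continue  # already requested in this pass, or already in the cache
        seen.add(block)
        issue_prefetch(block)
        issued.append(block)
    return issued
```

For prefetch into the OS cache, issue_prefetch could wrap
os.posix_fadvise(fd, offset, length, os.POSIX_FADV_WILLNEED); for
prefetch into shared buffers it would be a buffer read instead. The
throttling logic is the same in both cases, only the cost of going too
far differs.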

>
>
>> - It'll be five years before we have direct I/O.
> I think we'll have lost a significant market share by then if that's the
> case. Deservedly so.

I have implemented a number of DBMS engines (GigaBASE, GOODS, FastDB,
...) and supported direct IO (as an option) in most of them.
But for most workloads I did not get any significant improvement in
performance.
Certainly, it may be some problem with my implementations... and the
Linux kernel has changed significantly since that time.
But there is one "axiom" which complicates the use of direct IO: only
the OS knows at each moment how much free memory it has.
So only the OS can efficiently schedule memory so that all system RAM is
used. It is very hard, if at all possible, to do this at the application
level.

As a result you will have to be very conservative in choosing the size
of shared buffers so that it fits in RAM and avoids swapping.
It may be possible if you have complete control over the server and
there is just one Postgres instance running on it.
But now there is a trend towards virtualization and clouds, and such an
assumption does not hold in most cases. So double buffering
(or even triple, if you take on-device internal caches into account) is
definitely an issue. But direct IO does not seem to be a silver bullet
for solving it...

Concerning WAL prefetch, I still have serious doubts whether it is
needed at all:
if the checkpoint interval is less than the size of free memory in the
system, then the redo process should not need to read much.
And if the checkpoint interval is much larger than the OS cache (are
there cases when this is really needed?), then a quite small patch (as it
seems to me now) that forces a full page write when the distance between
the page LSN and the current WAL insertion point exceeds some threshold
should eliminate random reads in this case as well.
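
A rough sketch of the check such a patch would add, only to show the
idea. The function and parameter names are hypothetical, LSNs are plain
integers, and the first condition is a simplified form of the existing
rule (a full page image is logged on the first modification of a page
after a checkpoint); this is not actual PostgreSQL code.

```python
def needs_full_page_image(
    page_lsn: int,
    redo_ptr: int,
    insert_lsn: int,
    threshold: int,
) -> bool:
    """Decide whether the next WAL record for a page should carry a full
    page image.

    page_lsn   -- LSN of the last WAL record that modified the page
    redo_ptr   -- redo pointer of the latest checkpoint
    insert_lsn -- current WAL insertion point
    threshold  -- hypothetical maximum allowed distance, in bytes of WAL
    """
    # Simplified existing rule: first change to the page since the checkpoint.
    if page_lsn <= redo_ptr:
        return True
    # Proposed extra rule: the page was last logged too far back in the WAL,
    # so redo would likely have to read it from disk; log it in full instead.
    return insert_lsn - page_lsn > threshold
```

With a threshold chosen close to the size of the OS cache, redo would
find a recent full image in the WAL for pages that are no longer cached,
instead of doing a random read for them.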

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
