Re: O_DIRECT for relations and SLRUs (Prototype)

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Kevin Grittner <kgrittn(at)gmail(dot)com>
Subject: Re: O_DIRECT for relations and SLRUs (Prototype)
Date: 2019-01-12 21:35:55
Message-ID: CAEepm=01B6YsdMBR9i3K8MyBAAVH1SacTaB4c+skJ3XUc7w+dA@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jan 13, 2019 at 5:13 AM Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>
> Hi!
>
> > On 12 Jan 2019, at 9:46, Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> >
> > Attached is a toy patch that I have begun using for tests in this
> > area. That's nothing really serious at this stage, but you can use
> > that if you would like to see the impact of O_DIRECT. Of course,
> > things get significantly slower.
>
> Cool!
> I've just gathered a group of students and will task them with experimenting with shared buffer eviction algorithms during their February internship at the Yandex-Sirius edu project. Your patch seems very handy for benchmarks in this area.

+1, thanks for sharing the patch. Even though just turning on
O_DIRECT is the trivial part of this project, it's good to encourage
discussion. We may indeed become more sensitive to the quality of
buffer eviction algorithms, but it seems like the main work to regain
lost performance will be the background IO scheduling piece:

1. We need a new "bgreader" process to do read-ahead. I think you'd
want a way to give it explicit hints (for example, perhaps
sequential scans would advertise that they're reading sequentially so
that it starts to slurp future blocks into the buffer pool, and
streaming replicas might look ahead in the WAL and tell it what's
coming). In theory this might be better than the heuristics OSes use
to guess our access pattern and pre-fetch into the page cache, since
we have better information (and of course we're skipping a buffer
layer).

2. We need a new kind of bgwriter/syncer that aggressively creates
clean pages so that foreground processes rarely have to evict (since
that is now super slow), but also efficiently finds ranges of dirty
blocks that it can write in big sequential chunks.

3. We probably want SLRUs to use the main buffer pool, instead of
their own mini-pools, so they can benefit from the above.
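
To make the "big sequential chunks" part of item 2 concrete, here is a
minimal sketch in C of how a writer might coalesce dirty block numbers
into contiguous runs before issuing writes. It is only an illustration:
DirtyRun and coalesce_dirty_blocks() are hypothetical names, not
existing PostgreSQL code, and the sketch assumes the caller has already
collected the dirty block numbers of one file in sorted order without
duplicates.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: one contiguous run of dirty blocks. */
typedef struct DirtyRun
{
    uint32_t start;   /* first block number in the run */
    uint32_t nblocks; /* number of consecutive blocks in the run */
} DirtyRun;

/*
 * Coalesce a sorted, duplicate-free array of dirty block numbers into
 * contiguous runs of at most max_run blocks each.  Stores up to max_out
 * runs in out and returns the number of runs stored.
 */
size_t
coalesce_dirty_blocks(const uint32_t *blocks, size_t n,
                      uint32_t max_run, DirtyRun *out, size_t max_out)
{
    size_t nruns = 0;
    size_t i = 0;

    if (max_run == 0)
        return 0;

    while (i < n && nruns < max_out)
    {
        uint32_t start = blocks[i];
        uint32_t len = 1;

        /* Extend the run while the next block is adjacent. */
        while (i + len < n && len < max_run &&
               blocks[i + len] == start + len)
            len++;

        out[nruns].start = start;
        out[nruns].nblocks = len;
        nruns++;
        i += len;
    }
    return nruns;
}
```

Each resulting run could then be handed to a single large write, with
max_run bounding the size of any one IO.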

Whether we need multiple bgreader and bgwriter processes or perhaps a
general IO scheduler process may depend on whether we also want to
switch to async IO (multiplexing from a single process). Starting simple
with traditional sync IO and N processes seems OK to me.
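
In the simple synchronous design, a bgreader worker driven by the hints
from item 1 might loop over "which blocks should I read next?" and then
issue blocking reads for them. The sketch below shows only that range
computation for a sequential scan. SeqScanHint, next_prefetch_range()
and the fixed read-ahead distance are all assumptions made up for
illustration; nothing like them exists in the patch or in PostgreSQL.

```c
#include <stdint.h>

/* Hypothetical hint a sequential scan might advertise to a bgreader. */
typedef struct SeqScanHint
{
    uint32_t scan_pos;    /* block the scan is currently reading */
    uint32_t rel_nblocks; /* size of the relation in blocks */
} SeqScanHint;

/*
 * Compute the next range of blocks a read-ahead worker should fetch.
 *
 * prefetched_upto is the first block that has not been requested yet.
 * The worker tries to stay at most "distance" blocks ahead of the scan
 * and never reads past the end of the relation.  On success the first
 * block to read is stored in *first and the block count is returned;
 * a return value of 0 means there is nothing to do right now.
 */
uint32_t
next_prefetch_range(const SeqScanHint *hint, uint32_t prefetched_upto,
                    uint32_t distance, uint32_t *first)
{
    /* Use 64-bit arithmetic so scan_pos + distance cannot overflow. */
    uint64_t lo = (uint64_t) hint->scan_pos + 1;
    uint64_t hi = lo + distance;    /* exclusive upper bound */

    if (prefetched_upto > lo)
        lo = prefetched_upto;       /* skip blocks already requested */
    if (hi > hint->rel_nblocks)
        hi = hint->rel_nblocks;     /* clamp to end of relation */
    if (lo >= hi)
        return 0;

    *first = (uint32_t) lo;
    return (uint32_t) (hi - lo);
}
```

A worker would call this each time the scan advances, read the returned
range into the buffer pool, and then bump prefetched_upto by the count.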

--
Thomas Munro
http://www.enterprisedb.com
