Re: posix_fadvise in base backups

From: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org, Magnus Hagander <magnus(at)hagander(dot)net>
Subject: Re: posix_fadvise in base backups
Date: 2011-09-24 21:26:16
Message-ID: CAF6yO=0xcROb-Wz53GdC8=Xu=fiMRU3GQvMOLEcVAqhDzqdCmg@mail.gmail.com
Lists: pgsql-hackers

2011/9/24 Andres Freund <andres(at)anarazel(dot)de>:
> Hi,
>
> On Saturday, September 24, 2011 05:16:48 PM Magnus Hagander wrote:
>> On Sat, Sep 24, 2011 at 17:14, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> > On Saturday, September 24, 2011 05:08:17 PM Magnus Hagander wrote:
>> >> Attached patch adds a simple call to posix_fadvise with
>> >> POSIX_FADV_DONTNEED on all the files being read when doing a base
>> >> backup, to help the kernel not to trash the filesystem cache.
>> >> Seems like a simple enough fix - in fact, I don't remember why I took
>> >> it out of the original patch :O
>> >> Any reason not to put this in? Is it even safe enough to put into 9.1
>> >> (probably not, but maybe?)
>> > Won't that possibly throw a formerly fully cached database out of the
>> > cache?
>> I was assuming the kernel was smart enough to read this as "*this*
>> process is not going to be using this file anymore", not "nobody in
>> the whole machine is going to use this file anymore". And the process
>> running the base backup is certainly not going to read it again.
>> But that's a good point - do you know if that is the case, or does it
>> mandate more testing?
> I am pretty but not totally sure that the kernel does not track each process
> that uses a page. For one, doing so would probably be prohibitively expensive. For
> another, I am pretty (but not ...) sure that I restructured an application not
> to fadvise(DONTNEED) memory that is also used in other processes.

DONTNEED will remove the pages from the cache. It may happen that it does
not (DONTNEED and WILLNEED are only hints, but DONTNEED is honored most of
the time).
You can read the mincore status of each page beforehand to decide whether
you need to remove it afterwards (this is what some modified versions of dd do).
You can also use pgfincore before and after the base backup to recover
the previous state of the page cache.
There are some ideas floating around pgfincore for doing a seqscan (pg_dump)
with less impact on the page cache this way (probably possible with
ExecStart/Stop hooks).
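
A rough sketch of the mincore idea, assuming Linux: record which pages
of a file are resident before reading it, then advise DONTNEED only for
the pages that were not resident beforehand. The function names
snapshot_residency and drop_newly_cached are hypothetical; this is not
code from pgfincore, from any dd patch, or from the proposed backup patch.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for mincore() on glibc */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: return a malloc'ed vector with one byte per page
 * of the first len bytes of fd. Bit 0 of each byte is set when the page
 * is resident in the page cache. Returns NULL on error or when len is 0.
 */
unsigned char *
snapshot_residency(int fd, size_t len, size_t *npages)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    assert(pagesize > 0);
    if (len == 0)
        return NULL;

    size_t n = (len + (size_t) pagesize - 1) / (size_t) pagesize;
    unsigned char *vec = malloc(n);
    if (vec == NULL)
        return NULL;

    /* Mapping the file does not read it; mincore only inspects the cache. */
    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
    {
        free(vec);
        return NULL;
    }
    int rc = mincore(map, len, vec);
    munmap(map, len);
    if (rc != 0)
    {
        free(vec);
        return NULL;
    }
    *npages = n;
    return vec;
}

/*
 * Hypothetical helper: after the file has been read, advise the kernel
 * to drop every run of pages that was NOT resident in the snapshot.
 * Returns the number of posix_fadvise calls that failed.
 */
int
drop_newly_cached(int fd, const unsigned char *vec, size_t npages)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    int failures = 0;
    size_t i = 0;

    while (i < npages)
    {
        if (vec[i] & 1)         /* was cached before: leave it alone */
        {
            i++;
            continue;
        }
        size_t start = i;
        while (i < npages && !(vec[i] & 1))
            i++;
        if (posix_fadvise(fd, (off_t) start * pagesize,
                          (off_t) (i - start) * pagesize,
                          POSIX_FADV_DONTNEED) != 0)
            failures++;
    }
    return failures;
}
```

The intended order would be: snapshot_residency(), read the file for the
backup, then drop_newly_cached(). Note that mincore reports residency
for the whole machine, not per process, and the state can change between
the snapshot and the fadvise call, so this only approximates restoring
the earlier cache state.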

>
> Currently I can only think of two workarounds, both OS-specific:
> - Use O_DIRECT for reading the base backup. Will be slow in fully cached
> situations, but should work OK in all others. Need to be careful about
> the usual O_DIRECT pitfalls (page size, alignment, etc.).
> - Use mmap/mincore() to check whether data is in cache and restore that state
> afterwards.
>
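
For reference, the O_DIRECT variant quoted above might look roughly like
this on Linux. It is only a sketch: read_file_direct, DIRECT_ALIGN and
DIRECT_CHUNK are made-up names, and the 4096-byte alignment is an
assumption (the real requirement depends on the filesystem and device).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* O_DIRECT is a Linux extension */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0              /* no O_DIRECT here: plain buffered reads */
#endif

#define DIRECT_ALIGN 4096           /* assumed buffer/offset alignment */
#define DIRECT_CHUNK (256 * 1024)   /* read size, a multiple of the alignment */

/*
 * Hypothetical helper: read a whole file while bypassing the page cache.
 * Returns the number of bytes read, or -1 on error (including filesystems
 * that reject O_DIRECT).
 */
long long
read_file_direct(const char *path)
{
    assert(DIRECT_CHUNK % DIRECT_ALIGN == 0);

    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0)
        return -1;

    /* O_DIRECT needs an aligned buffer; plain malloc is not sufficient. */
    void *buf = NULL;
    if (posix_memalign(&buf, DIRECT_ALIGN, DIRECT_CHUNK) != 0)
    {
        close(fd);
        return -1;
    }

    long long total = 0;
    for (;;)
    {
        ssize_t n = read(fd, buf, DIRECT_CHUNK);
        if (n < 0)
        {
            if (errno == EINTR)
                continue;
            total = -1;
            break;
        }
        if (n == 0)             /* end of file */
            break;
        /* A backup would send the first n bytes of buf to the client here. */
        total += n;
    }

    free(buf);
    close(fd);
    return total;
}
```

Because every read goes to storage, a fully cached database gets no
benefit from the cache, which is the slowdown mentioned in the quoted
text.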
> Too bad that POSIX_FADV_NOREUSE is not really implemented.

yes.

>
>
> Andres
>

--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation
