Re: Cache relation sizes?

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: "Jamison, Kirk" <k(dot)jamison(at)jp(dot)fujitsu(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>, "Ideriha, Takeshi" <ideriha(dot)takeshi(at)jp(dot)fujitsu(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cache relation sizes?
Date: 2019-12-31 04:05:31
Message-ID: CA+hUKG+d-9sETQaGfBGbGBOAPS-GjDns_vSMYhDuRW=VsYrzZw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 31, 2019 at 4:43 PM Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> I still believe that one shared memory element for every
> non-mapped relation is not only too complex but also too much, as
> Andres (and implicitly I) wrote. I feel that just one flag for
> all relations would work fine, but partitioned flags (that is,
> relations or files that correspond to the same hash value share one
> flag) can reduce the number of shared memory elements to a fixed
> small number.
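
Just to check that I'm picturing the partitioned-flags idea correctly,
I think you mean something like this: a small, fixed-size array of
invalidation flags in shared memory, with each relation mapped onto one
slot by hashing its RelFileNode (a sketch only, all names invented, and
how the flag gets cleared again is hand-waved):

#define RELSIZE_FLAG_PARTITIONS 128

typedef struct RelSizeInvalFlags
{
    pg_atomic_uint32 flags[RELSIZE_FLAG_PARTITIONS];
} RelSizeInvalFlags;

static inline int
relsize_flag_partition(RelFileNode rnode)
{
    /* any cheap hash over the RelFileNode would do */
    return (rnode.spcNode ^ rnode.dbNode ^ rnode.relNode) %
        RELSIZE_FLAG_PARTITIONS;
}

/* Writers set the flag when a relation's size changes ... */
static void
relsize_invalidate(RelSizeInvalFlags *shared, RelFileNode rnode)
{
    pg_atomic_write_u32(&shared->flags[relsize_flag_partition(rnode)], 1);
}

/* ... and readers drop their locally cached size if it is set. */
static bool
relsize_cache_is_stale(RelSizeInvalFlags *shared, RelFileNode rnode)
{
    return pg_atomic_read_u32(&shared->flags[relsize_flag_partition(rnode)]) != 0;
}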

There is one potentially interesting case that doesn't require any
kind of shared cache invalidation AFAICS. XLogReadBufferExtended()
calls smgrnblocks() for every buffer access, even if the buffer is
already in our buffer pool. I tried yet another quick,
experiment-grade patch that caches the relation size[1], this time for
use in recovery only.
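
To show the shape of the idea, here's roughly what it boils down to (a
sketch only, not the actual patch at [1]; "smgr_cached_nblocks" is an
invented field name):

/*
 * Remember the last size we looked up in the SMgrRelation, and trust
 * it while in recovery, where only the startup process extends or
 * truncates relations.
 */
BlockNumber
smgrnblocks(SMgrRelation reln, ForkNumber forknum)
{
    BlockNumber result;

    /* During recovery, reuse a previously cached answer if we have one. */
    if (InRecovery && reln->smgr_cached_nblocks[forknum] != InvalidBlockNumber)
        return reln->smgr_cached_nblocks[forknum];

    /* Otherwise ask the storage manager, which does an lseek(SEEK_END). */
    result = smgrsw[reln->smgr_which].smgr_nblocks(reln, forknum);

    if (InRecovery)
        reln->smgr_cached_nblocks[forknum] = result;

    return result;
}

smgrextend() and smgrtruncate() would of course have to keep the cached
value in sync, and it would start out as InvalidBlockNumber when the
SMgrRelation is opened.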

initdb -D pgdata
postgres -D pgdata -c checkpoint_timeout=60min

In another shell:
pgbench -i -s100 postgres
pgbench -M prepared -T60 postgres
killall -9 postgres
mv pgdata pgdata-save

Master branch:

cp -r pgdata-save pgdata
strace -c -f postgres -D pgdata
[... wait for "redo done", then hit ^C ...]
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
...
 18.61   22.492286          26    849396           lseek
  6.95    8.404369          30    277134           pwrite64
  6.63    8.009679          28    277892           pread64
  0.50    0.604037          39     15169           sync_file_range
...

Patched:

rm -fr pgdata
cp -r pgdata-save pgdata
strace -c -f ~/install/bin/postgres -D pgdata
[... wait for "redo done", then hit ^C ...]
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
...
 16.33    8.097631          29    277134           pwrite64
 15.56    7.715052          27    277892           pread64
  1.13    0.559648          39     14137           sync_file_range
...
  0.00    0.001505          25        59           lseek

> Note: I'm still not sure how much lseek impacts performance.

It doesn't seem great that we are effectively making a system call for
most WAL records we replay, but, sadly, in this case the patch didn't
make any measurable difference when run without strace on this Linux
VM. I suspect there is some workload and stack where it would make a
difference (cf. the read() of the postmaster pipe that we used to do
for every WAL record, which has since been removed), but this is just
something I noticed in passing while working on something else, so I
haven't investigated it much.

[1] https://github.com/postgres/postgres/compare/master...macdice:cache-nblocks
(just a test, unfinished, probably has bugs)
