Re: Cache relation sizes?

From: Andres Freund <andres(at)anarazel(dot)de>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, "Jamison, Kirk" <k(dot)jamison(at)jp(dot)fujitsu(dot)com>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>, "Ideriha, Takeshi" <ideriha(dot)takeshi(at)jp(dot)fujitsu(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cache relation sizes?
Date: 2020-02-03 13:23:19
Message-ID: 20200203132319.x7my43whtefeznz7@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2019-12-31 17:05:31 +1300, Thomas Munro wrote:
> There is one potentially interesting case that doesn't require any
> kind of shared cache invalidation AFAICS. XLogReadBufferExtended()
> calls smgrnblocks() for every buffer access, even if the buffer is
> already in our buffer pool.

Yeah, that's really quite bad*. The bit about doing so even when the
buffer is already in the buffer pool is particularly absurd. Needing
special handling in mdcreate() because XLogReadBufferExtended() always
calls it is also fairly ugly.
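
To make the cost concrete, here is a minimal sketch of the idea under discussion, not PostgreSQL code. It assumes the size lookup ultimately comes down to an lseek(SEEK_END) on the relation file, and it uses hypothetical names throughout (RelHandle, rel_nblocks_uncached(), rel_nblocks_cached(), rel_extend()). The cached variant is only safe when a single process changes the file size, which is the no-shared-invalidation case described in the quoted text.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 8192            /* PostgreSQL's default BLCKSZ */
#define INVALID_NBLOCKS (-1L)

/* Hypothetical stand-in for a relation file handle; not a PostgreSQL struct. */
typedef struct RelHandle
{
    int  fd;              /* open file descriptor of the relation file */
    long cached_nblocks;  /* INVALID_NBLOCKS until the size is known */
    long syscalls;        /* lseek() calls issued, counted for illustration */
} RelHandle;

/* Uncached lookup: one lseek(SEEK_END) system call per invocation. */
long
rel_nblocks_uncached(RelHandle *rel)
{
    off_t end;

    assert(rel != NULL);
    end = lseek(rel->fd, 0, SEEK_END);
    rel->syscalls++;
    if (end < 0)
        return INVALID_NBLOCKS;
    return (long) (end / BLOCK_SIZE);
}

/* Cached lookup: only the first call reaches the kernel. */
long
rel_nblocks_cached(RelHandle *rel)
{
    assert(rel != NULL);
    if (rel->cached_nblocks == INVALID_NBLOCKS)
        rel->cached_nblocks = rel_nblocks_uncached(rel);
    return rel->cached_nblocks;
}

/*
 * Change the file size and keep the cache in step. This is the only place
 * the size changes, which is the assumption that makes the cache valid.
 */
int
rel_extend(RelHandle *rel, long new_nblocks)
{
    assert(rel != NULL && new_nblocks >= 0);
    if (ftruncate(rel->fd, (off_t) new_nblocks * BLOCK_SIZE) != 0)
        return -1;
    rel->cached_nblocks = new_nblocks;
    return 0;
}
```

With this shape, repeated size checks for the same file cost a memory read rather than a system call; the syscalls counter is there only so the difference can be observed.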

> It doesn't seem great that we are effectively making system calls for
> most WAL records we replay, but, sadly, in this case the patch didn't
> really make any measurable difference when run without strace on this
> Linux VM. I suspect there is some workload and stack where it would
> make a difference (CF the read(postmaster pipe) call for every WAL
> record that was removed), but this is just something I noticed in
> passing while working on something else, so I haven't investigated
> much.

I wonder if that's just because your workload is too heavily
bottlenecked elsewhere:

> postgres -D pgdata -c checkpoint_timeout=60min

> In another shell:
> pgbench -i -s100 postgres
> pgbench -M prepared -T60 postgres
> killall -9 postgres
> mv pgdata pgdata-save

With scale 100 but the default shared_buffers, you'll frequently hit
the OS for reads/writes. Those require the same metadata in the kernel,
but then also memcpys between kernel and userspace.

A word of caution about strace's -c: In my experience the total time
measurements are very imprecise somehow. I think it might be that some
of the overhead of ptracing will be attributed to the syscalls or such,
which means frequent syscalls appear relatively more expensive than they
really are.
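
One way to sidestep the ptrace overhead is to time the call in-process. The following is a rough sketch, assuming the syscall of interest is lseek(SEEK_END); time_lseek_end_ns() is a hypothetical helper, and the figure it returns includes loop and clock overhead, so treat it as an upper bound rather than a precise cost.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/*
 * Return the average wall-clock time in nanoseconds of one
 * lseek(fd, 0, SEEK_END) call, measured over `iterations` calls without
 * any tracer attached. Returns -1.0 if lseek() or clock_gettime() fails.
 */
double
time_lseek_end_ns(int fd, long iterations)
{
    struct timespec start;
    struct timespec stop;
    double          elapsed_ns;

    assert(iterations > 0);

    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1.0;

    for (long i = 0; i < iterations; i++)
    {
        if (lseek(fd, 0, SEEK_END) < 0)
            return -1.0;
    }

    if (clock_gettime(CLOCK_MONOTONIC, &stop) != 0)
        return -1.0;

    elapsed_ns = (double) (stop.tv_sec - start.tv_sec) * 1e9 +
                 (double) (stop.tv_nsec - start.tv_nsec);
    return elapsed_ns / (double) iterations;
}
```

Comparing this number against the per-call time that strace -c reports for the same syscall gives a rough idea of how much of the reported time is tracing overhead.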

Greetings,

Andres Freund

* it insults my sense of aesthetics