Re: Benchmarking: How to identify bottleneck (limiting factor) and achieve "linear scalability"?

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Saurabh Nanda <saurabhnanda(at)gmail(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Benchmarking: How to identify bottleneck (limiting factor) and achieve "linear scalability"?
Date: 2019-01-29 15:42:12
Message-ID: CAMkU=1xjTvQAN5034t=DxHdW9TwtisPGTzSO_BaJAAacpTX=fA@mail.gmail.com
Lists: pgsql-performance

On Mon, Jan 28, 2019 at 12:03 AM Saurabh Nanda <saurabhnanda(at)gmail(dot)com>
wrote:

> All this benchmarking has led me to a philosophical question, why does PG
> need shared_buffers in the first place?
>

PostgreSQL cannot let the OS get its hands on a dirty shared buffer until
the WAL record "protecting" that buffer has been flushed to disk. If a
dirty shared buffer got written to disk, but then a crash happened before
the WAL record got flushed to disk, the data could be corrupted when the
server comes back up. So shared_buffers effectively serves as a cooling
pond where dirty buffers wait for their WAL to be flushed naturally, so
they can be written without instigating a performance-reducing flush just
for them.
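
If you want to watch that pool directly, the pg_buffercache contrib
extension exposes it. Something along these lines (just an illustration;
the extension has to be installed first with CREATE EXTENSION
pg_buffercache, and the LSN functions need v10 or later):

    -- How many dirty buffers are currently waiting in shared_buffers,
    -- and where the WAL insert and flush positions currently stand.
    SELECT count(*) FILTER (WHERE isdirty) AS dirty_buffers,
           count(*)                        AS total_buffers
    FROM pg_buffercache;

    SELECT pg_current_wal_insert_lsn() AS wal_inserted,
           pg_current_wal_flush_lsn()  AS wal_flushed;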

Also, concurrent clients need to access the same disk pages at overlapping
times without corrupting each other. Perhaps that could be implemented by
keeping just the buffer headers in shared memory to coordinate the locking,
without keeping the buffers themselves in shared memory. But that is not
how it is currently implemented.

> What's wrong with letting the OS do the caching/buffering?
>

Nothing, and that is what it does, which is why the advice for
shared_buffers is often to use a small fraction of RAM, leaving the rest
for the OS to do its thing. But PostgreSQL still needs a way to lock those
pages, both against concurrent access by its own clients, and against
getting flushed out of order by the OS. There is no performant way to
release the dirty pages immediately to the OS while still constraining the
order in which the OS flushes them to disk.
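
For what it's worth, you can check what the server is actually running with
(the "small fraction" is only a rule of thumb, not gospel):

    -- Current settings; effective_cache_size is how the planner is told
    -- roughly how much cache (shared_buffers plus OS cache) to assume.
    SHOW shared_buffers;
    SELECT name, setting, unit, source
    FROM pg_settings
    WHERE name IN ('shared_buffers', 'effective_cache_size');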

Finally, while reading a page from the OS cache into shared_buffers is much
faster than reading it from disk, it is still much slower than finding it
already located in shared_buffers. So if your entire database fits in RAM,
you will get better performance if shared_buffers is large enough for the
entire thing to fit in there, as well. This is an exception to the rule
that shared_buffers should be a small fraction of RAM.
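
A quick way to check whether you are in that situation (rough numbers only;
it ignores future growth and per-connection memory):

    -- Total size of all databases in the cluster vs. current shared_buffers.
    SELECT pg_size_pretty(sum(pg_database_size(datname))) AS total_database_size
    FROM pg_database;

    SHOW shared_buffers;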

> Isn't it optimised for this kind of stuff?
>

Maybe. But you might be surprised at how poorly optimized it is. It
depends on your OS and its version, of course. If you have a high
usage_count buffer which is re-dirtied constantly, it will only get written
and flushed to disk once per checkpoint if it is under PostgreSQL's
control. But I've seen pages like that get written many times per second
under kernel control. Whatever optimization the kernel tried to do, it
wasn't very good at it. Also, if many contiguous pages are dirtied in a
close time-frame, but not dirtied in their physical order, the kernel
should be able to re-order them into long sequential writes, correct? But
empirically, it doesn't, at least back in the late 2.* series kernels when
I did the experiments. I don't know if it didn't even try, or tried but
failed. (Of course, back then PostgreSQL didn't do a good job of it
either.)
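
If you have pg_buffercache installed (as above), you can also see how much
of shared_buffers sits at a high usage_count, and how much of that is
dirty, with something like:

    -- Buffers grouped by usage_count and dirtiness; the high-usage_count
    -- dirty buffers are the ones that normally get written only once per
    -- checkpoint while they remain under PostgreSQL's control.
    SELECT usagecount, isdirty, count(*)
    FROM pg_buffercache
    GROUP BY usagecount, isdirty
    ORDER BY usagecount, isdirty;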

Cheers,

Jeff
