Re: Bypassing shared_buffers

From: Vladimir Churyukin <vladimir(at)churyukin(dot)com>
To: Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Bypassing shared_buffers
Date: 2023-06-15 08:16:31
Message-ID: CAFSGpE2Lt+sdh4mXM48RwJdP0mzgQvpiRGRb6N3sin77KYuu8w@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jun 15, 2023 at 12:32 AM Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
wrote:

>
>
> On 15.06.2023 4:37 AM, Vladimir Churyukin wrote:
> > Ok, got it, thanks.
> > Is there any alternative approach to measuring the performance as if
> > the cache was empty?
> > The goal is basically to calculate the max possible I/O time for a
> > query, to get a range between min and max timing.
> > It's ok if it's done during EXPLAIN ANALYZE call only, not for regular
> > executions.
> > One thing I can think of is even if the data in storage might be
> > stale, issue read calls from it anyway, for measuring purposes.
> > For EXPLAIN ANALYZE it should be fine as it doesn't return real data
> > anyway.
> > Is it possible that some pages do not exist in storage at all? Is
> > there a different way to simulate something like that?
> >
>
> I do not completely understand what you want to measure: how fast the
> cache can be prewarmed, or what the performance is when the working set
> doesn't fit in memory?
>
>
No, it's not about the working set or prewarming speed.
We're trying to see the worst-case performance in terms of I/O, i.e. when
the database has just started up or the data/indexes being queried are not
cached at all.
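For reference, a cold-start measurement along these lines is usually
scripted as a server restart plus an OS cache drop before each timed run.
A minimal sketch (an ops fragment, not runnable as-is: it assumes root,
`pg_ctl` on PATH, `PGDATA` set, and a hypothetical query file
`cold_query.sql`):

```shell
#!/bin/sh
# Sketch: measure fully cold I/O timings for one query.
# Assumptions: root access, local cluster managed by pg_ctl,
# PGDATA exported, cold_query.sql is a hypothetical placeholder.
set -e

pg_ctl -D "$PGDATA" stop -m fast   # restarting empties shared_buffers
sync                               # flush dirty pages so drop_caches is effective
echo 3 > /proc/sys/vm/drop_caches  # drop OS page cache, dentries, inodes
pg_ctl -D "$PGDATA" start -w

# EXPLAIN (ANALYZE, BUFFERS) reports shared hits vs. reads,
# which confirms whether the run was actually cold.
psql -f cold_query.sql
```

In the BUFFERS output, a cold run shows `shared read=N` with few or no
`shared hit` entries; repeating the query without the flush shows the warm
numbers for comparison.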

> Why doesn't changing the `shared_buffers` size to some very small value
> (e.g. 1MB) work?
>
> As was already noted, there are two levels of caching: shared buffers
> and the OS file cache.
> By reducing the size of shared buffers you rely mostly on the OS file
> cache. And actually there is no big gap in performance here: with most
> workloads I didn't see more than a 15% difference.
>

I thought about the option of setting a minimal shared_buffers, but it
requires a server restart anyway, which is something I'd like to avoid.
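The restart requirement can be confirmed from the server itself:
shared_buffers is a postmaster-context GUC, so `ALTER SYSTEM` or a
postgresql.conf edit only takes effect after a restart. A small check
(assumes a running local server reachable by `psql`):

```shell
# shared_buffers has context 'postmaster' in pg_settings,
# meaning the value can only change at server start.
psql -Atc "SELECT context FROM pg_settings WHERE name = 'shared_buffers'"
# prints 'postmaster' on a running server
```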

You can certainly flush the OS cache (`echo 3 > /proc/sys/vm/drop_caches`)
> and so simulate a cold start.
> But the OS cache will be prewarmed quite fast (unlike shared buffers,
> because of Postgres's ring-buffer strategies, which cause eviction of
> pages from shared buffers even when there is a lot of free space).
>
> So please specify the goal of your experiment more precisely.
> The "max possible I/O time for a query" depends on so many factors...
> Do you consider just one client working in isolation, or will there be
> many concurrent queries and background tasks like autovacuum and the
> checkpointer competing for resources?
>

> My point is that if you need some deterministic result, then you will
> have to exclude a lot of different factors which may affect performance,
> and then ... you calculate the speed of a horse in a vacuum, which has
> almost no relation to real performance.
>
>
Exactly, we need more or less deterministic results for how bad the I/O
timings can get.
Even though these aren't necessarily the numbers we will see in real life,
they give us an idea of the distribution, which is useful because we care
about the long tail (p99+) of our queries.
For simplicity, let's say it will be a single client only (it would be hard
to apply the proposed solutions reliably with other stuff running in
parallel anyway).

-Vladimir Churyukin
