From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "k(dot)jamison(at)fujitsu(dot)com" <k(dot)jamison(at)fujitsu(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Patch] Optimize dropping of relation buffers using dlist
Date: 2020-07-31 20:23:32
Message-ID: 20200731202332.feicx3miocjfanka@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2020-07-31 15:50:04 -0400, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > Indeed. The buffer mapping hashtable already is visible as a major
> > bottleneck in a number of workloads. Even in readonly pgbench if s_b is
> > large enough (so the hashtable is larger than the cache). Not to speak
> > of things like a cached sequential scan with a cheap qual and wide rows.
>
> To be fair, the added overhead is in buffer allocation not buffer lookup.
> So it shouldn't add cost to fully-cached cases. As Tomas noted upthread,
> the potential trouble spot is where the working set is bigger than shared
> buffers but still fits in RAM (so there's no actual I/O needed, but we do
> still have to shuffle buffers a lot).

Oh, right, not sure what I was thinking.

> > Wonder if the temporary fix is just to do explicit hashtable probes for
> > all pages iff the size of the relation is < s_b / 500 or so. That'll
> > address the case where small tables are frequently dropped - and
> > dropping large relations is more expensive from the OS and data loading
> > perspective, so it's not gonna happen as often.
>
> Oooh, interesting idea. We'd need a reliable idea of how long the
> relation is (preferably without adding an lseek call), but maybe
> that's do-able.

IIRC we already do smgrnblocks nearby, when doing the truncation (to
figure out which segments we need to remove). Perhaps we can arrange to
combine the two? The layering probably makes that somewhat ugly :(

We could also just use pg_class.relpages. It'll probably mostly be
accurate enough?

Or we could just cache the result of the last smgrnblocks call...

One of the cases where this type of strategy is most interesting to me is
the partial truncations that autovacuum does... There we even know the
range of blocks ahead of time.

Greetings,

Andres Freund
