Re: [Patch] Optimize dropping of relation buffers using dlist

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "k(dot)jamison(at)fujitsu(dot)com" <k(dot)jamison(at)fujitsu(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Patch] Optimize dropping of relation buffers using dlist
Date: 2020-08-07 17:33:43
Message-ID: CA+TgmoaqsayB8p2+tT1QPKv-nbNkeeiRShi=PCz6yWamC6Yoeg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Aug 7, 2020 at 12:52 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> At least in the case of segment zero, the file will still exist. It'll
> have been truncated to zero length, and if the filesystem is stupid about
> holes in files then maybe a write to a high block number would consume
> excessive disk space, but does anyone still care about such filesystems?
> I don't remember at the moment how we handle higher segments, but likely
> we could make them still exist too, postponing all the unlinks till after
> checkpoint. Or we could just have the backends give up on recycling a
> particular buffer if they can't write it (which is the response to an I/O
> failure already, I hope).

None of this sounds very appealing. Postponing the unlinks means
postponing recovery of the space at the OS level, which I think will
be noticeable and undesirable for users. The other notions all seem to
involve treating as valid on-disk states we currently treat as
invalid, and our sanity checks in this area are already far too weak.
And all you're buying for it is putting a hash table that would
otherwise be shared memory into backend-private memory, which seems
like quite a minor gain. Having that information visible to everybody
seems a lot cleaner.

> Yeah, trying to amortize the cost across multiple drops seems like
> what we really want here. I'm starting to think about a "relation
> dropper" background process, which would be somewhat like the checkpointer
> but it wouldn't have any interest in actually doing buffer I/O.
> We'd send relation drop commands to it, and it would scan all of shared
> buffers and flush related buffers, and then finally do the file truncates
> or unlinks. Amortization would happen by considering multiple target
> relations during any one scan over shared buffers. I'm not very clear
> on how this would relate to the checkpointer's handling of relation
> drops, but it could be worked out; if we were lucky maybe the checkpointer
> could stop worrying about that.

I considered that, too, but it might be overkill. I think that one
scan of shared_buffers every now and then might be cheap enough that
we could just not worry too much about which process gets stuck doing
it. So for example if the number of buffers allocated since the hash
table ended up non-empty reaches NBuffers, the process wanting to do
the next eviction gets handed the job of cleaning it out. Or maybe the
background writer could help; it's not like it does much anyway, zing.
It's possible that a dedicated process is the right solution, but we
might want to at least poke a bit at other alternatives.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2020-08-07 17:37:59 Re: Parallel worker hangs while handling errors.
Previous Message Tom Lane 2020-08-07 17:12:06 Re: Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks