Re: [Patch] Optimize dropping of relation buffers using dlist

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "k(dot)jamison(at)fujitsu(dot)com" <k(dot)jamison(at)fujitsu(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Patch] Optimize dropping of relation buffers using dlist
Date: 2020-08-07 16:52:26
Message-ID: 2758009.1596819146@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Fri, Aug 7, 2020 at 12:09 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> ... it's not very clear what to do in
>> backends that haven't got access to that table, but maybe we could just
>> accept that backends that are forced to flush dirty buffers might do some
>> useless writes in such cases.

> I don't see how that can work. It's not that the writes are useless;
> it's that they will fail outright because the file doesn't exist.

At least in the case of segment zero, the file will still exist. It'll
have been truncated to zero length, and if the filesystem is stupid about
holes in files then maybe a write to a high block number would consume
excessive disk space, but does anyone still care about such filesystems?
I don't remember at the moment how we handle higher segments, but likely
we could make them still exist too, postponing all the unlinks till after
checkpoint. Or we could just have the backends give up on recycling a
particular buffer if they can't write it (which is the response to an I/O
failure already, I hope).
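The point about segment zero can be illustrated outside PostgreSQL. This is a toy sketch (plain POSIX file operations, not backend code): a file truncated to zero length still exists, and a later write at a high block offset succeeds, leaving a hole that a hole-aware filesystem stores sparsely.

```python
import os
import tempfile

BLCKSZ = 8192  # PostgreSQL's default block size

# Create a stand-in for "segment zero" containing one block.
fd, path = tempfile.mkstemp()
os.write(fd, b"x" * BLCKSZ)

# Simulate the relation being truncated: the file still exists.
os.ftruncate(fd, 0)

# A write to a high block number still succeeds; everything below
# the written block becomes a hole rather than allocated space.
os.lseek(fd, 100 * BLCKSZ, os.SEEK_SET)
os.write(fd, b"y" * BLCKSZ)

st = os.stat(path)
print(st.st_size)  # logical size: 101 * BLCKSZ = 827392
# On a filesystem that is not "stupid about holes", the physical
# allocation (st.st_blocks * 512) is far smaller than st.st_size.

os.close(fd)
os.remove(path)
```

The concern in the text is exactly the case where the filesystem does *not* store the hole sparsely and the stray write consumes real disk space.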

> My viewpoint on this is - I have yet to see anybody really get hosed
> because they drop one relation and that causes a full scan of
> shared_buffers. I mean, it's slightly expensive, but computers are
> fast. Whatever. What hoses people is dropping a lot of relations in
> quick succession, either by spamming DROP TABLE commands or by running
> something like DROP SCHEMA, and then suddenly they're scanning
> shared_buffers over and over again, and their standby is doing the
> same thing, and now it hurts.

Yeah, trying to amortize the cost across multiple drops seems like
what we really want here. I'm starting to think about a "relation
dropper" background process, which would be somewhat like the checkpointer
but it wouldn't have any interest in actually doing buffer I/O.
We'd send relation drop commands to it, and it would scan all of shared
buffers and flush related buffers, and then finally do the file truncates
or unlinks. Amortization would happen by considering multiple target
relations during any one scan over shared buffers. I'm not very clear
on how this would relate to the checkpointer's handling of relation
drops, but it could be worked out; if we were lucky maybe the checkpointer
could stop worrying about that.
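The amortization idea can be sketched in miniature (a toy model, not PostgreSQL internals; the names and the flat-list buffer pool are invented for illustration): instead of one full scan of shared buffers per dropped relation, the hypothetical dropper process accumulates pending drop requests and invalidates matching buffers in a single pass.

```python
def drop_relations_batched(buffer_pool, pending_drops):
    """Invalidate every buffer belonging to any relation in pending_drops.

    buffer_pool:   list of (relation_id, block_no) tags; None means free.
    pending_drops: iterable of relation ids queued at the dropper process.
    """
    targets = set(pending_drops)  # O(1) membership test per buffer
    for i, tag in enumerate(buffer_pool):
        if tag is not None and tag[0] in targets:
            # Real code would flush or discard the buffer here; the file
            # truncate/unlink happens only after the whole scan finishes.
            buffer_pool[i] = None
    return buffer_pool

# One scan handles drops of relations 1 and 2 together.
pool = [(1, 0), (2, 0), (1, 1), (3, 0), (2, 5)]
drop_relations_batched(pool, [1, 2])
print(pool)  # [None, None, None, (3, 0), None]
```

With k queued drops this is one O(NBuffers) pass instead of k of them, which is where the win over per-DROP scans would come from.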

regards, tom lane
