Re: [Patch] Optimize dropping of relation buffers using dlist

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, "k(dot)jamison(at)fujitsu(dot)com" <k(dot)jamison(at)fujitsu(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Patch] Optimize dropping of relation buffers using dlist
Date: 2020-08-08 10:32:27
Message-ID: CAA4eK1JDUUXUJf41ah7gnzgZK+dszk0FheLk=npjdFg5TQJLAg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Aug 7, 2020 at 11:03 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Fri, Aug 7, 2020 at 12:52 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > At least in the case of segment zero, the file will still exist. It'll
> > have been truncated to zero length, and if the filesystem is stupid about
> > holes in files then maybe a write to a high block number would consume
> > excessive disk space, but does anyone still care about such filesystems?
> > I don't remember at the moment how we handle higher segments,
> >

We do unlink them and register the request to forget the Fsync
requests for those. See mdunlinkfork.

> > but likely
> > we could make them still exist too, postponing all the unlinks till after
> > checkpoint. Or we could just have the backends give up on recycling a
> > particular buffer if they can't write it (which is the response to an I/O
> > failure already, I hope).
> >

Note that we don't often try to flush the buffers from the backend. We
first try to forward the request to checkpoint queue and only if the
queue is full, the backend tries to flush it, so even if we decide to
give up flushing such a buffer (where we get an error) via backend, it
shouldn't impact very many cases. I am not sure but if we can somehow
reliably distinguish this type of error from any other I/O failure
then we can probably give up on flushing this buffer and continue or
maybe just retry to push this request to checkpointer.

>
> None of this sounds very appealing. Postponing the unlinks means
> postponing recovery of the space at the OS level, which I think will
> be noticeable and undesirable for users. The other notions all seem to
> involve treating as valid on-disk states we currently treat as
> invalid, and our sanity checks in this area are already far too weak.
> And all you're buying for it is putting a hash table that would
> otherwise be shared memory into backend-private memory, which seems
> like quite a minor gain. Having that information visible to everybody
> seems a lot cleaner.
>

The one more benefit of giving this responsibility to a single process
like checkpointer is that we can avoid unlinking the relation until we
scan all the buffers corresponding to it. Now, surely keeping it in
shared memory and allow other processes to work on it has other merits
which are that such buffers might get invalidated faster but not sure
we can retain the benefit of another approach which is to perform all
such invalidation of buffers before unlinking the relation's first
segment.

--
With Regards,
Amit Kapila.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Konstantin Knizhnik 2020-08-08 14:07:31 Re: LSM tree for Postgres
Previous Message Amit Kapila 2020-08-08 09:52:50 Re: [Patch] Optimize dropping of relation buffers using dlist