
RE: [Patch] Optimize dropping of relation buffers using dlist

From:	"tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
To:	'Thomas Munro' <thomas(dot)munro(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc:	"k(dot)jamison(at)fujitsu(dot)com" <k(dot)jamison(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	RE: [Patch] Optimize dropping of relation buffers using dlist
Date:	2020-10-22 07:31:55
Message-ID:	TYAPR01MB299007E63A1E128D3BF39579FE1D0@TYAPR01MB2990.jpnprd01.prod.outlook.com
Lists:	pgsql-hackers

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
> On Thu, Oct 22, 2020 at 7:33 PM Kyotaro Horiguchi
> <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> > Mmm. Not exact. The requirement here is that we must be certain that
> > we don't have a buffer for blocks after the file size known to
> > the process. While recovering, if the first lseek() returned a smaller
> > size than the actual one, we cannot have a buffer for the blocks after
> > that size. After we have truncated or extended the file, we are certain
> > that we don't have a buffer for unknown blocks.
>
> Thanks, I understand now. Something feels fragile about it, perhaps
> because it's not really acting as a "cache" anymore despite its name,
> but I see the logic now. It becomes the authoritative source of
> information, even if the kernel decides to make our file smaller
> asynchronously.

Thank you, Horiguchi-san, you are a savior! I was worried as if the end of the world had come.

> I think a synchronised file size cache wouldn't be enough to use this
> trick outside the recovery process, because the initial value would
> come from a call to lseek(), but unlike recovery, that wouldn't happen
> *before* we start putting pages in the buffer pool. Also, if we one
> day have a size-limited relcache, even recovery could get into
> trouble, if it evicts the RelationData that holds the authoritative
> nblocks value.

That's too bad, because we had hoped to be able to apply this optimization to various operations during normal operation (TRUNCATE, DROP TABLE/INDEX, DROP DATABASE, etc.). It's hell when an honest man can't trust a system call.

I'm probably being silly, but can't we avoid the problem by using fstat() instead of lseek(SEEK_END)? Would they return the same value from the i-node?
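
To make the question concrete, here is a minimal standalone sketch (not PostgreSQL code) that reads a file's size both ways and compares the results. The helper names compare_size_calls() and demo_sizes_agree(), the /tmp path, and the 8192-byte block size are assumptions made for illustration. It only shows that the two calls agree for a quiet local file; it does not reproduce the stale-size case discussed in this thread, so it cannot by itself tell us whether fstat() would avoid the problem.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define DEMO_BLCKSZ 8192		/* assumed block size, for illustration only */

/*
 * Hypothetical helper: report the size of an open file as seen through
 * lseek(SEEK_END) and through fstat().  Returns 0 on success, -1 if
 * either call fails.
 */
int
compare_size_calls(int fd, off_t *lseek_size, off_t *fstat_size)
{
	struct stat st;
	off_t		end;

	assert(lseek_size != NULL && fstat_size != NULL);

	end = lseek(fd, 0, SEEK_END);
	if (end == (off_t) -1)
		return -1;
	if (fstat(fd, &st) != 0)
		return -1;

	*lseek_size = end;
	*fstat_size = st.st_size;
	return 0;
}

/*
 * Hypothetical demo: write nblocks zero-filled blocks to a temporary file
 * and check that both calls report nblocks * DEMO_BLCKSZ bytes.
 * Returns 1 if they agree, 0 if they differ, -1 on error.
 */
int
demo_sizes_agree(int nblocks)
{
	char		path[] = "/tmp/sizecheckXXXXXX";
	static char block[DEMO_BLCKSZ];	/* zero-initialized */
	off_t		by_lseek = 0;
	off_t		by_fstat = 0;
	int			result = -1;
	int			fd = mkstemp(path);

	if (fd < 0)
		return -1;

	for (int i = 0; i < nblocks; i++)
	{
		if (write(fd, block, sizeof(block)) != (ssize_t) sizeof(block))
			goto cleanup;
	}

	if (compare_size_calls(fd, &by_lseek, &by_fstat) == 0)
		result = (by_lseek == by_fstat &&
				  by_lseek == (off_t) nblocks * DEMO_BLCKSZ) ? 1 : 0;

cleanup:
	close(fd);
	unlink(path);
	return result;
}
```

Calling demo_sizes_agree(1) from a small main() is expected to return 1 on an ordinary local file system. The interesting case for this thread is the one where the kernel reports a smaller size than the actual one, which this sketch does not exercise.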

Or, can't we just try to do BufTableLookup() one block after what smgrnblocks() returns?

Regards
Takayuki Tsunakawa

In response to

Re: [Patch] Optimize dropping of relation buffers using dlist at 2020-10-22 06:45:08 from Thomas Munro

Responses

Re: [Patch] Optimize dropping of relation buffers using dlist at 2020-10-22 08:50:36 from Kyotaro Horiguchi
Re: [Patch] Optimize dropping of relation buffers using dlist at 2020-10-22 21:45:05 from Thomas Munro
