Re: Cache relation sizes?

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Cache relation sizes?
Date: 2020-12-17 09:22:30
Message-ID: CAKU4AWpUNQtzAggKMJ+kNZkmNwi56ghec1NuxQmJjneEPDTwYg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Thomas,

Thank you for your quick response.

On Thu, Dec 17, 2020 at 3:05 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:

> Hi Andy,
>
> On Thu, Dec 17, 2020 at 7:29 PM Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> wrote:
> > I spent one day studying the patch and I want to talk about one question
> for now.
> > What is the purpose of calling smgrimmedsync to evict a DIRTY sr (what
> will happen
> > if we remove it and the SR_SYNCING and SR_JUST_DIRTIED flags)?
>
> The underlying theory of the patch is that after fsync() succeeds, the
> new size is permanent, but before that it could change at any time
> because of asynchronous activities in the kernel*. Therefore, if you
> want to evict the size from shared memory, you have to fsync() the
> file first. I used smgrimmedsync() to do that.
>
> *I suspect that it can really only shrink with network file systems
> such as NFS and GlusterFS, where the kernel has to take liberties to
> avoid network round trips for every file extension, but I don't know
> the details on all 9 of our supported OSes and standards aren't very
> helpful for understanding buffered I/O.
>

Let me try to understand your point. Suppose process 1 extends a file to
2 blocks from 1 block, and fsync is not called, then a). the lseek *may*
still
return 1 based on the comments in the ReadBuffer_common ("because
of buggy Linux kernels that sometimes return an seek SEEK_END result
that doesn't account for a recent write"). b). When this patch is
introduced,
we would get 2 blocks for sure if we can get the size from cache. c). After
user reads the 2 blocks from cache and then the cache is evicted, and user
reads the blocks again, it is possible to get 1 again.

Do we need to consider case a) in this patch? and Is case c). the situation
you
want to avoid and do we have to introduce fsync to archive this? Basically I
think case a) should not happen by practice so we don't need to handle case
c) and finally we don't need the fsync/SR_SYNCING/SR_JUST_DIRTIED flag.
I'm not opposed to adding them, I am just interested in the background in
case
you are also willing to discuss it.

I would suggest the below change for smgrnblock_fast, FYI

@@ -182,7 +182,11 @@ smgrnblocks_fast(SMgrRelation reln, ForkNumber forknum)
/*
* The generation doesn't match, the shared relation must
have been
* evicted since we got a pointer to it. We'll need to do
more work.
+ * and invalid the fast path for next time.
*/
+ reln->smgr_shared = NULL;
}

--
Best Regards
Andy Fan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2020-12-17 09:36:59 Re: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
Previous Message Fujii Masao 2020-12-17 09:12:39 Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module