RE: [Patch] Optimize dropping of relation buffers using dlist

From: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
To: 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: "k(dot)jamison(at)fujitsu(dot)com" <k(dot)jamison(at)fujitsu(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: [Patch] Optimize dropping of relation buffers using dlist
Date: 2020-09-23 06:30:52
Message-ID: TYAPR01MB29907AC4C6218EDE02CF72F0FE380@TYAPR01MB2990.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> The idea is that we can't use this optimization if the value is not
> cached because we can't rely on lseek behavior. See all the discussion
> between Horiguchi-San and me in the thread above. So, how would you
> ensure that if we don't use Kirk-San's proposal?

Hmm, buggy Linux kernel... (Until when should we be worried about the bug?)

According to the following Horiguchi-san's suggestion, it's during normal operation, not during recovery, when we should be careful, right? Then, we can use the current smgrnblocks() as is?

+ /*
+ * We cannot believe the result from smgr_nblocks is always accurate
+ * because lseek of buggy Linux kernels doesn't account for a recent
+ * write. However, we can rely on the result from lseek while recovering
+ * because the first call to this function is not happen just after a file
+ * extension. Return values on subsequent calls return cached nblocks,
+ * which should be accurate during recovery.
+ */
+ if (!InRecovery && must_accurate)
+ return InvalidBlockNumber;
+
return result;
}

If smgrnblocks() could return a smaller value than the actual file size by one block even during recovery, how about always adding one to the return value of smgrnblocks() in DropRelFileNodeBuffers()? When smgrnblocks() actually returned the correct value, the extra one block is not found in the shared buffer, so DropRelFileNodeBuffers() does no harm.

Or, add a new function like smgrnblocks_precise() to avoid adding an argument to smgrnblocks()?

Regards
Takayuki Tsunakawa

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Katsuragi Yuta 2020-09-23 06:48:15 Re: [PATCH] Add features to pg_stat_statements
Previous Message Peter Eisentraut 2020-09-23 06:11:59 Re: Range checks of pg_test_fsync --secs-per-test and pg_test_timing --duration