Re: logtape.c stats don't account for unused "prefetched" block numbers

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logtape.c stats don't account for unused "prefetched" block numbers
Date: 2020-09-09 06:28:49
Message-ID: dd630e6cec768004430d35fb92cc42cd8d0b7bfc.camel@j-davis.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, 2020-09-05 at 12:03 -0700, Peter Geoghegan wrote:
> The comments in ltsWriteBlock() added by the 2017 bugfix commit
> 7ac4a389a7d clearly say that the zero block writing stuff is only
> supposed to happen at the edge of a tape boundary, which ought to be
> rare -- see the comment block in ltsWriteBlock(). And yet the new
> preallocation stuff explicitly relies on that it writing zero blocks
> much more frequently. I'm concerned that that can result in increased
> and unnecessary I/O, especially for sorts, but also for hash aggs
> that
> spill. I'm also concerned that having preallocated-but-allocated
> blocks confuses the accounting used by
> trace_sort/LogicalTapeSetBlocks().

Preallocation showed significant gains for HashAgg, and BufFile doesn't
support sparse writes. So, for HashAgg, it seems like we should just
update the comment and consider it the price of using BufFile.

(Aside: is there a reason why BufFile doesn't support sparse writes, or
is it just a matter of implementation?)

For Sort, we can just disable preallocation.

> Separately, it's possible to make the
> trace_sort/LogicalTapeSetBlocks() instrumentation agree with the
> filesystem by replacing the use of nBlocksAllocated within
> LogicalTapeSetBlocks() with nBlocksWritten -- that seems to make the
> instrumentation correct without changing the current behavior at all.
> But I'm not ready to endorse that approach, since it's not quite
> clear
> what nBlocksAllocated and nBlocksWritten mean right now -- those two
> fields were both added by the aforementioned 2017 bugfix commit,
> which
> introduced the "allocated vs written" distinction in the first place

Right now, it seems nBlocksAllocated means "number of blocks returned
by ltsGetFreeBlock(), plus nHoleBlocks".

nBlocksWritten seems to mean "the logical size of the BufFile". The
BufFile can have holes in it after concatenation, but from the
perspective of logtape.c, nBlocksWritten seems like a better fit for
instrumentation purposes. So I'd be inclined to return (nBlocksWritten
- nHoleBlocks).

The only thing I can think of that would be better is if BufFile
tracked for itself the logical vs. physical size, which might be a good
improvement to make (and would mean that logtape.c wouldn't be
responsible for tracking the holes itself).

Thoughts?

Regards,
Jeff Davis

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2020-09-09 06:35:58 Re: Inconsistent Japanese name order in v13 contributors list
Previous Message Ian Barwick 2020-09-09 06:28:07 Re: Improving connection scalability: GetSnapshotData()